Include MelGAN, HifiGAN and Multiband-HifiGAN, maybe NHV in the future.

Overview

Fast (GAN Based Neural) Vocoder

Chinese README

Todo

  • Submit demo
  • Support NHV

Discription

Include MelGAN, HifiGAN and Multiband-HifiGAN, maybe include NHV in the future. Developed on BiaoBei dataset, you can modify conf and hparams.py to fit your own dataset and model.

Usage

  • Prepare data
    • write path of wav data in a file, for example: cd dataset && python3 biaobei.py
    • bash preprocess.sh <wav path file> <path to save processed data> dataset/audio dataset/mel
    • for example: bash preprocess.sh dataset/BZNSYP.txt processed dataset/audio dataset/mel
  • Train
    • command:
    bash train.sh \
        <GPU ids> \
        /path/to/audio/train \
        /path/to/audio/valid \
        /path/to/mel/train \
        /path/to/mel/valid \
        <model name> \
        <if multi band> \
        <if use scheduler> \
        <path to configuration file>
    
    • for example:
    bash train.sh \
    0 \
    dataset/audio/train \
    dataset/audio/valid \
    dataset/mel/train \
    dataset/mel/valid \
    hifigan \
    0 0 0 \
    conf/hifigan/light.yaml
    
  • Train from checkpoint
    • command:
    bash train.sh \
        <GPU ids> \
        /path/to/audio/train \
        /path/to/audio/valid \
        /path/to/mel/train \
        /path/to/mel/valid \
        <model name> \
        <if multi band> \
        <if use scheduler> \
        <path to configuration file> \
        /path/to/checkpoint \
        <step of checkpoint>
    
  • Synthesize
    • command:
    bash synthesize.sh \
        /path/to/checkpoint \
        /path/to/mel \
        /path/for/saving/wav \
        <model name> \
        /path/to/configuration/file
    

Acknowledgments

Comments
  • why set the L=30 ?

    why set the L=30 ?

    hello,I have some question, in the paper ,the shape of basis matrix is [32, 256] , but in the code ,the shape is [30, 256] . And according to the function "overlap_and_add" , output_size = (frames - 1) * frame_step + frame_length, if the L=30, I think it cannot match the real wave length ? for example, hop_len=256, mel.shape=[80, 140] , theoretically the output wave length is 140*256=35840. according to the code, the output wave length is 33600.

    Thanks in advance.

    opened by yingfenging 3
  • Link to Basis-MelGAN paper?

    Link to Basis-MelGAN paper?

    Hi Zhengxi, congrats on your paper's acceptance on Interspeech 2021!

    I got pretty interested in your paper while reading the abstract of Basis-MelGAN on the README, but I could not find any link to the paper. Though the Interspeech conference is only 2 months away, don't you have any plans on publishing the paper on arXiv in near future?

    opened by seungwonpark 2
  • Random start index in WeightDataset

    Random start index in WeightDataset

    At this line: https://github.com/xcmyz/FastVocoder/blob/a9af370be896b1096e746ce6489fb16fef8ca585/data/dataset.py#L97

    If the input mel size smaller than fix-length, the random raise issue, I have try except to pass these short audios, but I just wonder it is handle in collate.

    More than that, the segment size as I found in hifigan is 32, but in basic-melgan it (fix-length) is set to 140. Are there any difference between the 140 for biaobei and the one for LJspeech

    opened by v-nhandt21 0
  • can basis-melgan  be used as  unversial vocoder?

    can basis-melgan be used as unversial vocoder?

    I tried it for a single speaker dataset, rtf surprises me. Have you ever use basis-melgan for a multi-speaker dataset, or is it suitable for unseen speaker tts synthesis?

    opened by mayfool 0
  • Shape mismatch error on new dataset

    Shape mismatch error on new dataset

    Hi, thanks for your work!

    The frame rate of my dataset is 22050, and hop size of text2mel model is 256. I have changed hparams.py accordingly, but training results in an expcetion: (preprocessing was fine, anyway)

      File "/home/user/speechlab/FastVocoder-main/model/loss/loss.py", line 23, in forward
        assert est_source_sub_band.size(1) == wav_sub_band.size(1)
    

    I figured out that model inference still uses hop-size of 240. So how to make your code fully compatible with other datasets? it seems that the codes are somehow hardcoded for Biaobei dataset.

    opened by tekinek 1
  • Multiband Architecture

    Multiband Architecture

    Hi author, I have found the notes as "the generated audio has interference at a specific frequency" in this repo. I have encountered with the straight line at a specific frequency when developing similar multiband architecture, and I wonder if such phenomenon is the one you mentioned? And do you have some advice or solutions? Thanks. audio

    help wanted 
    opened by Rongjiehuang 6
Owner
Zhengxi Liu (刘正曦)
Interested in high performance neural vocoder and expressive TTS acoustic model. Member of DeepMist and developed MistGPU.
Zhengxi Liu (刘正曦)
Simple virtual assistant using pyttsx3 and speech recognition optionally with pywhatkit and pther libraries.

VirtualAssistant Simple virtual assistant using pyttsx3 and speech recognition optionally with pywhatkit and pther libraries. Third Party Libraries us

Logadheep 1 Nov 27, 2021
⚖️ A Statutory Article Retrieval Dataset in French.

A Statutory Article Retrieval Dataset in French This repository contains the Belgian Statutory Article Retrieval Dataset (BSARD), as well as the code

Maastricht Law & Tech Lab 19 Nov 17, 2022
Built for cleaning purposes in military institutions

Ferramenta do AL Construído para fins de limpeza em instituições militares. Instalação Requer python = 3.2 pip install -r requirements.txt Usagem Exe

0 Aug 13, 2022
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

MMF is a modular framework for vision and language multimodal research from Facebook AI Research. MMF contains reference implementations of state-of-t

Facebook Research 5.1k Dec 26, 2022
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application with a focus on embedded systems.

Welcome to Spokestack Python! This library is intended for developing voice interfaces in Python. This can include anything from Raspberry Pi applicat

Spokestack 133 Sep 20, 2022
A python script to prefab your scripts/text files, and re create them with ease and not have to open your browser to copy code or write code yourself

Scriptfab - What is it? A python script to prefab your scripts/text files, and re create them with ease and not have to open your browser to copy code

DevNugget 3 Jul 28, 2021
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"

UNITER: UNiversal Image-TExt Representation Learning This is the official repository of UNITER (ECCV 2020). This repository currently supports finetun

Yen-Chun Chen 680 Dec 24, 2022
JaQuAD: Japanese Question Answering Dataset

JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension (2022, Skelter Labs)

SkelterLabs 84 Dec 27, 2022
Graphical user interface for Argos Translate

Argos Translate GUI Website | GitHub | PyPI Graphical user interface for Argos Translate. Install pip3 install argostranslategui

Argos Open Tech 16 Dec 07, 2022
Suite of 500 procedurally-generated NLP tasks to study language model adaptability

TaskBench500 The TaskBench500 dataset and code for generating tasks. Data The TaskBench dataset is available under wget http://web.mit.edu/bzl/www/Tas

Belinda Li 20 May 17, 2022
Automatic privilege escalation for misconfigured capabilities, sudo and suid binaries

GTFONow Automatic privilege escalation for misconfigured capabilities, sudo and suid binaries. Features Automatically escalate privileges using miscon

101 Jan 03, 2023
A high-level yet extensible library for fast language model tuning via automatic prompt search

ruPrompts ruPrompts is a high-level yet extensible library for fast language model tuning via automatic prompt search, featuring integration with Hugg

Sber AI 37 Dec 07, 2022
GraphNLI: A Graph-based Natural Language Inference Model for Polarity Prediction in Online Debates

GraphNLI: A Graph-based Natural Language Inference Model for Polarity Prediction in Online Debates Vibhor Agarwal, Sagar Joglekar, Anthony P. Young an

Vibhor Agarwal 2 Jun 30, 2022
超轻量级bert的pytorch版本,大量中文注释,容易修改结构,持续更新

bert4pytorch 2021年8月27更新: 感谢大家的star,最近有小伙伴反映了一些小的bug,我也注意到了,奈何这个月工作上实在太忙,更新不及时,大约会在9月中旬集中更新一个只需要pip一下就完全可用的版本,然后会新添加一些关键注释。 再增加对抗训练的内容,更新一个完整的finetune

muqiu 317 Dec 18, 2022
Deploying a Text Summarization NLP use case on Docker Container Utilizing Nvidia GPU

GPU Docker NLP Application Deployment Deploying a Text Summarization NLP use case on Docker Container Utilizing Nvidia GPU, to setup the enviroment on

Ritesh Yadav 9 Oct 14, 2022
DeepAmandine is an artificial intelligence that allows you to talk to it for hours, you won't know the difference.

DeepAmandine This is an artificial intelligence based on GPT-3 that you can chat with, it is very nice and makes a lot of jokes. We wish you a good ex

BuyWithCrypto 3 Apr 19, 2022
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

GenSen Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning Sandeep Subramanian, Adam Trischler, Yoshua B

Maluuba Inc. 309 Oct 19, 2022
ProtFeat is protein feature extraction tool that utilizes POSSUM and iFeature.

Description: ProtFeat is designed to extract the protein features by employing POSSUM and iFeature python-based tools. ProtFeat includes a total of 39

GOKHAN OZSARI 5 Dec 16, 2022