Includes MelGAN, HifiGAN and Multiband-HifiGAN; NHV may be added in the future.

Last update: Dec 16, 2022

Overview

Fast (GAN Based Neural) Vocoder

Todo

Submit demo
Support NHV

Description

Includes MelGAN, HifiGAN and Multiband-HifiGAN; NHV may be added in the future. The code was developed on the BiaoBei dataset; modify the configuration files under conf and hparams.py to fit your own dataset and model.

Usage

Prepare data
- Write the paths of the wav files to a file, for example: cd dataset && python3 biaobei.py
- Run: bash preprocess.sh <wav path file> <path to save processed data> dataset/audio dataset/mel
- For example: bash preprocess.sh dataset/BZNSYP.txt processed dataset/audio dataset/mel
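
The first step only needs a text file that lists the wav files. As a minimal sketch for a dataset other than BiaoBei, assuming the list holds one wav path per line (the exact format that biaobei.py writes is not shown here), a hypothetical helper named write_wav_list could build such a file:

```python
from pathlib import Path


def write_wav_list(wav_dir, list_path):
    """Write the path of every .wav file under wav_dir to list_path.

    Hypothetical helper, not part of the repository. It assumes the
    preprocessing script expects one wav path per line.
    """
    # Search recursively and sort so the list is reproducible.
    wav_paths = sorted(str(p) for p in Path(wav_dir).rglob("*.wav"))
    Path(list_path).write_text(
        "".join(f"{p}\n" for p in wav_paths), encoding="utf-8"
    )
    return wav_paths
```

Calling write_wav_list with your corpus directory and an output path of your choice would produce a file that can be passed as the <wav path file> argument, provided the one-path-per-line assumption holds.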

Train

command:

bash train.sh \
    <GPU ids> \
    /path/to/audio/train \
    /path/to/audio/valid \
    /path/to/mel/train \
    /path/to/mel/valid \
    <model name> \
    <if multi band> \
    <if use scheduler> \
    <path to configuration file>

for example:

bash train.sh \
    0 \
    dataset/audio/train \
    dataset/audio/valid \
    dataset/mel/train \
    dataset/mel/valid \
    hifigan \
    0 0 0 \
    conf/hifigan/light.yaml

Train from checkpoint

command:

bash train.sh \
    <GPU ids> \
    /path/to/audio/train \
    /path/to/audio/valid \
    /path/to/mel/train \
    /path/to/mel/valid \
    <model name> \
    <if multi band> \
    <if use scheduler> \
    <path to configuration file> \
    /path/to/checkpoint \
    <step of checkpoint>

Synthesize

command:

bash synthesize.sh \
    /path/to/checkpoint \
    /path/to/mel \
    /path/for/saving/wav \
    <model name> \
    /path/to/configuration/file

Acknowledgments

Comments

Why set L=30?

Hello, I have a question. In the paper, the shape of the basis matrix is [32, 256], but in the code it is [30, 256]. According to the function "overlap_and_add", output_size = (frames - 1) * frame_step + frame_length. If L=30, I think the output cannot match the real waveform length. For example, with hop_len=256 and mel.shape=[80, 140], the theoretical output waveform length is 140*256=35840, but according to the code the output waveform length is 33600.

Thanks in advance.

opened by yingfenging 3
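
The two lengths quoted in this question can be reproduced with a few lines of arithmetic. This is only a sketch: the factor of 8 basis frames per mel frame and the choice frame_step = frame_length = L are assumptions picked because they reproduce the quoted numbers, not values confirmed from the repository. Under those assumptions, L=30 corresponds to an effective hop of 240 samples and L=32 to a hop of 256.

```python
def overlap_add_length(frames, frame_step, frame_length):
    """Output length of overlap-add, using the formula quoted in the question."""
    return (frames - 1) * frame_step + frame_length


mel_frames = 140
# Assumption: 8 basis frames per mel frame, and frame_step == frame_length == L.
frames = mel_frames * 8

print(overlap_add_length(frames, 30, 30))  # 33600, which equals 140 * 240
print(overlap_add_length(frames, 32, 32))  # 35840, which equals 140 * 256
```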
Link to Basis-MelGAN paper?

Hi Zhengxi, congrats on your paper's acceptance to Interspeech 2021!

I became interested in your paper while reading the abstract of Basis-MelGAN in the README, but I could not find any link to the paper. The Interspeech conference is only 2 months away; do you have any plans to publish the paper on arXiv in the near future?

opened by seungwonpark 2
Random start index in WeightDataset

At this line: https://github.com/xcmyz/FastVocoder/blob/a9af370be896b1096e746ce6489fb16fef8ca585/data/dataset.py#L97

If the input mel is shorter than the fixed length, the random start index raises an error. I have used try/except to skip these short audio clips, but I wonder whether this case is handled in collate.

In addition, the segment size I found in hifigan is 32, but in basis-melgan the fixed length is set to 140. Is there any difference between the 140 used for BiaoBei and the value for LJSpeech?

opened by v-nhandt21 0
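
One common way to avoid the failure described in this issue is to pad short clips instead of sampling a start index from an empty range. The following is a sketch of that idea only; pick_segment is a hypothetical helper and does not reflect how the repository's dataset or collate code actually behaves.

```python
import random


def pick_segment(num_frames, segment_len, rng=random):
    """Return (start, pad) for cropping a fixed-length mel segment.

    Hypothetical helper, not the repository's code. For clips that are not
    longer than segment_len it returns start=0 and the number of frames to
    pad, instead of calling randint with an empty range.
    """
    if num_frames <= segment_len:
        return 0, segment_len - num_frames
    # randint is inclusive on both ends, so the crop always fits.
    start = rng.randint(0, num_frames - segment_len)
    return start, 0
```

With a fixed length of 140, a 100-frame clip would give start 0 and 40 frames of padding, while a 200-frame clip would give a start index between 0 and 60 with no padding.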
Can basis-melgan be used as a universal vocoder?

I tried it on a single-speaker dataset, and the RTF surprised me. Have you ever used basis-melgan on a multi-speaker dataset, and is it suitable for unseen-speaker TTS synthesis?

opened by mayfool 0
Shape mismatch error on new dataset
Hi, thanks for your work!

The sampling rate of my dataset is 22050, and the hop size of the text2mel model is 256. I have changed hparams.py accordingly, but training raises an exception (preprocessing was fine):

File "/home/user/speechlab/FastVocoder-main/model/loss/loss.py", line 23, in forward
    assert est_source_sub_band.size(1) == wav_sub_band.size(1)

I found that model inference still uses a hop size of 240. How can the code be made fully compatible with other datasets? It seems to be hardcoded for the BiaoBei dataset.
opened by tekinek 1
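
A mismatch like the one in this issue can be caught before training by comparing each waveform length with the number of mel frames times the hop size. The check below is a sketch under stated assumptions: hop_is_consistent is a hypothetical helper, and the tolerance of one hop is an assumption about padding at the clip edges rather than a rule taken from the repository.

```python
def hop_is_consistent(num_wav_samples, num_mel_frames, hop_size):
    """Return True if the waveform length matches num_mel_frames * hop_size.

    Hypothetical check, not part of the repository. It tolerates a difference
    of less than one hop, assuming edge padding may add or drop a few samples.
    """
    expected = num_mel_frames * hop_size
    return abs(num_wav_samples - expected) < hop_size


# 140 mel frames extracted with hop 256 describe 35840 samples.
print(hop_is_consistent(35840, 140, 256))  # True
# A generator that upsamples by 240 produces 33600 samples, so the check fails.
print(hop_is_consistent(33600, 140, 256))  # False
```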
Multiband Architecture

Hi author, I found the note "the generated audio has interference at a specific frequency" in this repo. I have encountered a straight line at a specific frequency when developing a similar multiband architecture, and I wonder whether it is the phenomenon you mentioned. Do you have any advice or solutions? Thanks.
help wanted

opened by Rongjiehuang 6

Releases(v1.0)

v1.0(Jun 24, 2021)

Source code(tar.gz)
Source code(zip)
basis.melgan.pt(53.36 MB)

Owner

Zhengxi Liu (刘正曦)

Interested in high performance neural vocoder and expressive TTS acoustic model. Member of DeepMist and developed MistGPU.
