Fast and Simple Neural Vocoder, the Multiband RNNMS

Last update: Jan 11, 2022

Overview

Multiband RNN_MS

Fast and Simple vocoder, Multiband RNN_MS.

Demo
Quick Training
How to Use
System Details
Results
References

Demo

TODO: Link super great impressive high-quality audio demo.

Quick Training

Jump to ☞ , then Run. That's all!

How to Use

1. Install

# pip install "torch==1.10.0" -q      # Based on your environment (validated with v1.10)
# pip install "torchaudio==0.10.0" -q # Based on your environment
pip install git+https://github.com/tarepan/MultibandRNNMS

2. Data & Preprocessing

"Batteries Included".
RNNMS transparently downloads the corpus and preprocesses it for you 😉

3. Train

python -m mbrnnms.main_train

For arguments, check ./mbrnnms/config.py

Advanced: Other datasets

You can switch datasets with arguments.
All of speechcorpusy's preset corpora are supported.

# LJSpeech corpus
python -m mbrnnms.main_train data.data_name=LJ
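
The `data.data_name=LJ` argument is a dotted-key override: the part before `=` names a nested config field and the part after it is the new value. The sketch below only illustrates that idea in plain Python. `apply_overrides` and the default values are hypothetical and are not taken from this repository; the real fields and defaults are defined in ./mbrnnms/config.py.

```python
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with `key.path=value` overrides applied.

    Hypothetical helper for illustration; values are kept as strings.
    """
    merged = deepcopy(config)
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            # Walk (or create) the nested dictionaries named by the key path.
            node = node.setdefault(name, {})
        node[leaf] = value
    return merged


# Hypothetical defaults; only the key `data.data_name` comes from the README.
defaults = {"data": {"data_name": "<default corpus>"}}
config = apply_overrides(defaults, ["data.data_name=LJ"])
print(config["data"]["data_name"])  # LJ
```

The defaults are copied before merging, so the original config object stays unchanged.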

Advanced: Custom dataset

Copy mbrnnms.main_train and replace the DataModule.

    # datamodule = LJSpeechDataModule(batch_size, ...)
    datamodule = YourSuperCoolDataModule(batch_size, ...)
    # That's all!

System Details

Model

PreNet: GRU
Upsampler: time-directional nearest interpolation
Decoder: Embedding-auto-regressive generative RNN with 10-bit μ-law encoding
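
Two of the components above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names and the upsampling factor `scale` are hypothetical, and the μ-law formulas are the standard companding equations with μ = 2^10 - 1 = 1023, which gives the 1024 classes of a 10-bit encoding.

```python
import math

MU = 2 ** 10 - 1  # 10-bit mu-law: class indices 0..1023


def upsample_nearest(frames, scale):
    """Time-directional nearest interpolation: repeat each frame `scale` times."""
    return [list(frame) for frame in frames for _ in range(scale)]


def mulaw_encode(x, mu=MU):
    """Map a sample in [-1, 1] to an integer class index in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid input range
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((y + 1) / 2 * mu + 0.5)  # shift to [0, mu] and round


def mulaw_decode(q, mu=MU):
    """Map a class index in [0, mu] back to a sample in [-1, 1]."""
    y = 2 * q / mu - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


# Two conditioning frames stretched to eight time steps (scale is illustrative).
print(len(upsample_nearest([[0.1], [0.9]], 4)))  # 8
# Silence maps to the middle class; full scale maps to the extremes.
print(mulaw_encode(0.0), mulaw_encode(1.0), mulaw_encode(-1.0))  # 512 1023 0
```

μ-law spends more of its classes on small amplitudes, so quiet samples are quantized more finely than loud ones.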

Results

Output Sample

Demo

Performance

X [iter/sec] @ NVIDIA T4 on Google Colaboratory (AMP+, num_workers=8)

It takes about Y days for full training.

References

Acknowledgements

: Basic vocoder concept came from this paper.
bshall/UniversalVocoding: The model and hyperparameters are derived from this repository. All code has been rewritten.

Owner

tarepan
