PyTorch implementation of Tacotron speech synthesis model.

Last update: Dec 09, 2022

Overview

tacotron_pytorch

PyTorch implementation of Tacotron speech synthesis model.

Inspired by keithito/tacotron. The speech quality is currently not as good as what keithito/tacotron can generate, but the model basically works. Some generated speech examples, from a model trained on the LJ Speech Dataset, can be found here.

If you are comfortable working with TensorFlow, I recommend trying https://github.com/keithito/tacotron instead. The reason for rewriting it in PyTorch is that it is easier to debug and extend (multi-speaker architecture, etc.), at least for me.

Requirements

PyTorch
TensorFlow (required only if you want to run the training script. It could be made optional, but for now it is required.)

Installation

git clone --recursive https://github.com/r9y9/tacotron_pytorch
pip install -e . # or python setup.py develop

If you want to run the training script, then you need to install additional dependencies.

pip install -e ".[train]"

Training

The package relies on keithito/tacotron for text processing, audio preprocessing and audio reconstruction (added as a submodule). Please follow the quick start section at https://github.com/keithito/tacotron and prepare your dataset accordingly.

Once your data is prepared, and assuming it is in "~/tacotron/training" (the default), you can train your model with:

python train.py
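
Before starting a long run, it can help to confirm that the prepared data is where the training script expects it. The following is a minimal sketch, not part of this package: the helper name check_training_data is hypothetical, and the metadata file name train.txt is an assumption about what the keithito/tacotron preprocessing step writes, so adjust it to match your dataset.

```python
from pathlib import Path


def check_training_data(data_root="~/tacotron/training", metadata_name="train.txt"):
    """Return a list of problems found; an empty list means the layout looks plausible.

    `metadata_name` is an assumed file name, not something this README specifies.
    """
    root = Path(data_root).expanduser()
    problems = []
    if not root.is_dir():
        problems.append(f"missing directory: {root}")
        return problems
    metadata = root / metadata_name
    if not metadata.is_file():
        problems.append(f"missing metadata file: {metadata}")
    elif metadata.stat().st_size == 0:
        problems.append(f"empty metadata file: {metadata}")
    return problems


if __name__ == "__main__":
    # Print every problem found for the default location, if any.
    for problem in check_training_data():
        print(problem)
```

If the function returns an empty list, the directory and metadata file exist. It does not validate the contents of the features themselves.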

Alignments, predicted spectrograms, target spectrograms, predicted waveforms and checkpoints (model and optimizer states) are saved every 1000 global steps in the checkpoints directory. Training progress can be monitored with:

tensorboard --logdir=log

Testing model

Open the notebook in the notebooks directory and change checkpoint_path to point to your model.
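
Since checkpoints accumulate every 1000 steps, picking the most recent one by hand gets tedious. The sketch below shows one way to find it. It assumes checkpoint files are named like checkpoint_step<N>.pth, which this README does not state, and latest_checkpoint is a hypothetical helper, so change the pattern to match the files you actually see in the checkpoints directory.

```python
import re
from pathlib import Path

# Assumed naming scheme; adjust to match the files in your checkpoints directory.
STEP_PATTERN = re.compile(r"checkpoint_step(\d+)\.pth$")


def latest_checkpoint(checkpoint_dir="checkpoints"):
    """Return the Path with the largest step number, or None if nothing matches."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None
    best_step, best_path = -1, None
    for path in directory.iterdir():
        match = STEP_PATTERN.search(path.name)
        # Compare steps as integers so that step 10000 ranks above step 9000.
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

In the notebook, the result could then be assigned with checkpoint_path = str(latest_checkpoint("checkpoints")), after checking that it is not None.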

Owner

Ryuichi Yamamoto
