PyTorch implementation of Tacotron

Last update: Dec 02, 2022

Overview

Tacotron-pytorch

A PyTorch implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model.

Requirements

Install Python 3
Install PyTorch == 0.2.0
Install the remaining requirements:
```
pip install -r requirements.txt
```
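Before training, it can help to confirm that the environment matches these requirements. The following is a minimal sketch that is not part of the repository. The helper names are hypothetical. It also assumes that `torch.__version__` may carry a build suffix such as `_3` or `+cu80`, which the check ignores.

```python
import re
import sys

REQUIRED_TORCH = "0.2.0"  # version pinned in the requirements above


def base_version(version):
    """Strip build suffixes such as '_3' or '+cu80' from a version string."""
    return re.split(r"[_+]", version, maxsplit=1)[0]


def check_environment():
    """Return a list of human-readable problems (empty if all checks pass)."""
    problems = []
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required, found %s" % sys.version.split()[0])
    try:
        import torch  # imported lazily so the check works even without PyTorch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if base_version(torch.__version__) != REQUIRED_TORCH:
            problems.append("PyTorch %s is required, found %s"
                            % (REQUIRED_TORCH, torch.__version__))
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script prints nothing when both checks pass. A mismatched PyTorch version is reported as a warning rather than an error, because newer releases changed many APIs and the code may or may not still run on them.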

Data

I used the LJSpeech dataset, which consists of pairs of text transcripts and wav files. The complete dataset (13,100 pairs) can be downloaded here. I referred to https://github.com/keithito/tacotron for the preprocessing code.
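The extracted LJSpeech folder contains `metadata.csv` and a `wavs/` directory. Each line of `metadata.csv` holds three `|`-separated fields: the clip ID, the raw transcription, and a normalized transcription in which numbers and abbreviations are spelled out. Below is a minimal sketch of reading it into (wav path, text) pairs. The function name is hypothetical and this is not the repository's data.py code. The sketch also assumes the normalized column is the one wanted for training.

```python
import os


def load_ljspeech_metadata(lines, wav_dir="wavs"):
    """Parse LJSpeech metadata lines into (wav_path, normalized_text) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Fields: clip ID | raw transcription | normalized transcription
        clip_id, _raw_text, normalized_text = line.split("|", 2)
        pairs.append((os.path.join(wav_dir, clip_id + ".wav"), normalized_text))
    return pairs


if __name__ == "__main__":
    # Made-up line in the LJSpeech format, not taken from the dataset.
    sample = ["LJ999-0001|Dr. Smith paid $5.|Doctor Smith paid five dollars.\n"]
    print(load_ljspeech_metadata(sample))
    # With the real file:
    # with open(os.path.join(data_path, "metadata.csv"), encoding="utf-8") as f:
    #     pairs = load_ljspeech_metadata(f, os.path.join(data_path, "wavs"))
```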

File description

hyperparams.py contains all the hyperparameters the code needs.
data.py loads the training data, converting text into index sequences and wav files into spectrograms. The text preprocessing code is in the text/ directory.
module.py contains the building blocks, including CBHG, highway networks, prenet, and so on.
network.py contains the networks, including the encoder, decoder, and post-processing network.
train.py is for training.
synthesis.py is for generating TTS samples.
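The two conversions that data.py performs can be illustrated in a few lines. The following is a simplified sketch under stated assumptions, not the repository's code. The symbol inventory, padding and end-of-sentence markers, and `n_fft`/`hop_length` values are hypothetical. The real values live in the text/ directory and hyperparams.py. The spectrogram part uses a naive DFT purely to show the idea. Practical pipelines use an FFT library, and Tacotron additionally derives mel and log-compressed spectrograms.

```python
import cmath
import math

# Hypothetical symbol inventory; the repository's text/ directory defines its own.
PAD, EOS = "_", "~"
SYMBOLS = [PAD, EOS] + list("abcdefghijklmnopqrstuvwxyz!'(),-.:;? ")
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def text_to_sequence(text):
    """Lowercase the text, drop unknown or reserved characters, append EOS."""
    ids = [SYMBOL_TO_ID[c] for c in text.lower()
           if c in SYMBOL_TO_ID and c not in (PAD, EOS)]
    return ids + [SYMBOL_TO_ID[EOS]]


def magnitude_spectrogram(samples, n_fft=64, hop_length=16):
    """Naive STFT: Hann-windowed frames -> |DFT| for bins 0..n_fft//2."""
    # Periodic Hann window
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop_length):
        frame = [samples[start + n] * window[n] for n in range(n_fft)]
        spectrum = []
        for k in range(n_fft // 2 + 1):  # non-negative frequency bins only
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                      for n in range(n_fft))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames  # shape: (num_frames, n_fft // 2 + 1)


if __name__ == "__main__":
    print(text_to_sequence("Hi!"))
    # A sine completing 8 cycles per 64 samples peaks in frequency bin 8.
    tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
    spec = magnitude_spectrogram(tone)
    print(len(spec), len(spec[0]), max(range(len(spec[0])), key=spec[0].__getitem__))
```

Each spectrogram frame becomes one time step of the decoder's target, which is why the hop length directly controls how many output frames the model must predict per utterance.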

Training the network

STEP 1. Download and extract the LJSpeech data into any directory you want.
STEP 2. Adjust the hyperparameters in hyperparams.py, especially 'data_path', which is the directory where you extracted the files, and the others if necessary.
STEP 3. Run train.py.
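Before running train.py, it can save time to check that 'data_path' really points at the extracted dataset. The following is a minimal sketch. The `missing_entries` helper and the example path are hypothetical. The sketch assumes the standard LJSpeech layout, with `metadata.csv` and a `wavs/` directory directly under 'data_path'.

```python
import os

# Hypothetical value; set it to match 'data_path' in hyperparams.py.
data_path = "./LJSpeech-1.1"


def missing_entries(path):
    """List the expected LJSpeech entries that are absent under `path`."""
    missing = []
    if not os.path.isfile(os.path.join(path, "metadata.csv")):
        missing.append("metadata.csv")
    if not os.path.isdir(os.path.join(path, "wavs")):
        missing.append("wavs/")
    return missing


if __name__ == "__main__":
    problems = missing_entries(data_path)
    if problems:
        print("data_path %r is missing: %s" % (data_path, ", ".join(problems)))
```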

Generate TTS wav file

STEP 1. Run synthesis.py. Make sure the restore step is set to the training step of the checkpoint you want to load.
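If you are unsure which restore step to use, the most recent checkpoint can be found by parsing file names. This is a minimal sketch. The `checkpoint_<step>.pth.tar` naming scheme, the `checkpoint` directory, and both helper functions are assumptions, so check train.py for the naming it actually uses.

```python
import os
import re

# Hypothetical naming scheme; check train.py for the pattern it actually uses.
CHECKPOINT_PATTERN = re.compile(r"^checkpoint_(\d+)\.pth\.tar$")


def checkpoint_path(checkpoint_dir, restore_step):
    """Build the file name for a given training step."""
    return os.path.join(checkpoint_dir, "checkpoint_%d.pth.tar" % restore_step)


def latest_restore_step(filenames):
    """Return the largest step among matching checkpoint names, or None."""
    steps = [int(m.group(1))
             for m in map(CHECKPOINT_PATTERN.match, filenames) if m]
    return max(steps) if steps else None


if __name__ == "__main__":
    names = ["checkpoint_1000.pth.tar", "checkpoint_60000.pth.tar", "notes.txt"]
    step = latest_restore_step(names)
    print(step, checkpoint_path("checkpoint", step))
    # With a real directory and PyTorch installed:
    # step = latest_restore_step(os.listdir("checkpoint"))
    # state = torch.load(checkpoint_path("checkpoint", step))
```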

Samples

You can find the generated samples in the 'samples/' directory. The model was trained for only 60K steps, so the quality is not good yet.

Reference

Keith Ito: https://github.com/keithito/tacotron

Comments

Any comments on the code are always welcome.

Owner

soobin seo