NATSpeech: A Non-Autoregressive Text-to-Speech Framework
中文文档 (Chinese documentation)
This repo contains the official PyTorch implementations of:
- PortaSpeech: Portable and High-Quality Generative Text-to-Speech (NeurIPS 2021). Demo page | HuggingFace🤗 Demo
- DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (DiffSpeech) (AAAI 2022). Demo page | Project page | HuggingFace🤗 Demo
Key Features
We implement the following features in this framework:
- Data processing for non-autoregressive Text-to-Speech using Montreal Forced Aligner.
- Convenient and scalable framework for training and inference.
- Simple but efficient random-access dataset implementation.
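To illustrate the last feature, below is a minimal sketch of a random-access dataset backed by a single binary record file plus a byte-offset index. The class names (`IndexedDataset`, `IndexedDatasetBuilder`) and the file naming scheme are hypothetical and only demonstrate the general technique, not the exact implementation in this repo.

```python
# Minimal sketch of a random-access indexed dataset.
# Hypothetical names and file layout; not this repo's actual code.
import pickle
import numpy as np
from torch.utils.data import Dataset


class IndexedDataset(Dataset):
    """Reads pickled samples from one binary file via a byte-offset index."""

    def __init__(self, prefix):
        # offsets[i] .. offsets[i + 1] is the byte range of sample i.
        self.offsets = np.load(f"{prefix}.idx.npy")
        self.data_file = open(f"{prefix}.data", "rb")

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, index):
        # Seek directly to the sample: O(1) access without loading the whole file.
        self.data_file.seek(self.offsets[index])
        raw = self.data_file.read(self.offsets[index + 1] - self.offsets[index])
        return pickle.loads(raw)


class IndexedDatasetBuilder:
    """Appends pickled samples to the binary file and records their offsets."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.data_file = open(f"{prefix}.data", "wb")
        self.offsets = [0]

    def add_item(self, item):
        raw = pickle.dumps(item)
        self.data_file.write(raw)
        self.offsets.append(self.offsets[-1] + len(raw))

    def finalize(self):
        self.data_file.close()
        np.save(f"{self.prefix}.idx.npy", np.array(self.offsets, dtype=np.int64))
```

In this pattern, the builder runs once during preprocessing; training then instantiates the reader and wraps it in a standard PyTorch DataLoader.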
Install Dependencies
## We tested on Linux/Ubuntu 18.04.
## Install Python 3.6+ first (Anaconda recommended).
export PYTHONPATH=.
# build a virtual env (recommended).
python -m venv venv
source venv/bin/activate
# install requirements.
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0 # torch >= 1.9.0 recommended
pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install forced alignment tool
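After installation, a quick sanity check like the following (plain PyTorch calls, not a script shipped with this repo) confirms that the expected torch version is visible and that CUDA is detected:

```python
# Quick sanity check after installing dependencies.
import torch

print("torch", torch.__version__)             # expect >= 1.9.0
print("CUDA available:", torch.cuda.is_available())
```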
Documents
Citation
If you find this useful for your research, please cite the following papers:
- PortaSpeech
@article{ren2021portaspeech,
title={PortaSpeech: Portable and High-Quality Generative Text-to-Speech},
author={Ren, Yi and Liu, Jinglin and Zhao, Zhou},
journal={Advances in Neural Information Processing Systems},
volume={34},
year={2021}
}
- DiffSpeech
@article{liu2021diffsinger,
title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
journal={arXiv preprint arXiv:2105.02446},
volume={2},
year={2021}
}
Acknowledgments
Our code is influenced by the following repos: