vits chinese, tts chinese, tts mandarin

Last update: Dec 14, 2022

Related tags

Text Data & NLP tts

Overview

vits实现的中文TTS

this is the copy of https://github.com/jaywalnut310/vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Espnet连接：github.com/espnet/espnet/tree/master/espnet2/gan_tts/vits

coqui-ai/TTS连接：github.com/coqui-ai/TTS/tree/main/recipes/ljspeech/vits_tts

如果有侵权行为，请联系我，我将删除项目

If there is infringement, please contact me and I will delete the item

基于VITS 实现 16K baker TTS 的流程记录

apt-get install espeak

pip install -r requirements.txt

cd monotonic_align

python setup.py build_ext --inplace

将16K标贝音频拷贝到./baker_waves/，启动训练

python train.py -c configs/baker_base.json -m baker_base

两张1080卡，训练两天，基本可以使用了

测试

python vits_strings.py

上面的模型训练出来后存在，明显停顿的问题

原因：

1，本来已经在音素后面强插边界了，VITS又强插边界了，具体是配置参数："add_blank": true

2，可能影响，随机时长预测，具体配置参数：use_sdp=True

vits chinese, tts chinese, tts mandarin

Related tags

Overview

基于VITS 实现 16K baker TTS 的流程记录

将16K标贝音频拷贝到./baker_waves/，启动训练

测试

Owner

AmorTX

History Aware Multimodal Transformer for Vision-and-Language Navigation

nlp-tutorial is a tutorial for who is studying NLP(Natural Language Processing) using Pytorch

Implementation of "Adversarial purification with Score-based generative models", ICML 2021

Unsupervised Language Model Pre-training for French

Longformer: The Long-Document Transformer

Code for the paper "Language Models are Unsupervised Multitask Learners"

Blue Brain text mining toolbox for semantic search and structured information extraction

Implementation of paper Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa.

A Python package implementing a new model for text classification with visualization tools for Explainable AI :octocat:

A highly sophisticated sequence-to-sequence model for code generation

Code for the project carried out fulfilling the course requirements for Fall 2021 NLP at NYU

Creating a chess engine using GPT-3

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".

HAN2HAN : Hangul Font Generation

Awesome Treasure of Transformers Models Collection

texlive expressions for documents

A very simple framework for state-of-the-art Natural Language Processing (NLP)

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation (SIGGRAPH Asia 2021)

Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

This project converts your human voice input to its text transcript and to an automated voice too.