PyTorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech"

Last update: Dec 23, 2022

Overview

GradTTS

Unofficial PyTorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (arxiv)

About this repo

This is an unofficial implementation of GradTTS, built on top of GlowTTS (https://github.com/jaywalnut310/glow-tts). We replace the GlowDecoder with a DiffusionDecoder that follows the settings of the original paper. For convenience, we also replace torch.distributed with horovod. fp16 training is not used at the moment.
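To give a feel for what the DiffusionDecoder has to undo, here is a minimal sketch of the forward diffusion as we understand it from the paper: the data x_0 drifts toward the encoder output mu while noise is added, following dx = 0.5 * (mu - x) * beta_t dt + sqrt(beta_t) dW with a linear schedule beta_t. The function names and the schedule end points (0.05 and 20.0) are assumptions made for this illustration. They are not read from this repo's code, so check the config under egs/ for the values actually used.

```python
import math

# Assumed end points of the linear noise schedule (illustrative only).
BETA_MIN = 0.05
BETA_MAX = 20.0


def cumulative_beta(t, beta_min=BETA_MIN, beta_max=BETA_MAX):
    """Integral over [0, t] of beta_s = beta_min + (beta_max - beta_min) * s."""
    return beta_min * t + 0.5 * (beta_max - beta_min) * t * t


def forward_marginal(x0, mu, t):
    """Mean and variance of x_t given x_0 for a scalar coordinate.

    Closed-form solution of dx = 0.5 * (mu - x) * beta_t dt + sqrt(beta_t) dW.
    """
    big_b = cumulative_beta(t)
    decay = math.exp(-0.5 * big_b)
    mean = decay * x0 + (1.0 - decay) * mu   # interpolates from x0 to mu
    variance = 1.0 - math.exp(-big_b)        # grows from 0 toward 1
    return mean, variance
```

At t = 0 the sample is exactly x0 with zero variance. By t = 1 the mean has almost reached mu and the variance is close to 1, so the decoder's reverse process starts from noise centred on the encoder output rather than on zero.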

Training and inference

See run.sh and inference_waveglow_vocoder.py in the egs/ folder for example usage.

Before training:

1. Download and extract the LJ Speech dataset, then rename the dataset folder or create a link to it: ln -s /path/to/LJSpeech-1.1/wavs DUMMY
2. Build the Monotonic Alignment Search code (Cython): cd monotonic_align; python setup.py build_ext --inplace

Before inference, download the WaveGlow checkpoint from download_link and put it into the waveglow folder.
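If you prefer to script the dataset link instead of typing the ln -s command, a small helper along these lines should work. This is only a sketch: link_dataset is a hypothetical name, not part of this repo, and it assumes you run it from the directory where the DUMMY link is expected.

```python
import os
from pathlib import Path


def link_dataset(wav_dir, link_name="DUMMY"):
    """Create link_name as a symlink to the LJ Speech wavs folder."""
    wav_dir = Path(wav_dir).resolve()
    if not wav_dir.is_dir():
        raise FileNotFoundError(f"wavs folder not found: {wav_dir}")

    link = Path(link_name)
    if link.is_symlink():
        # Replace a stale link so the helper can be re-run safely.
        link.unlink()
    elif link.exists():
        # Refuse to overwrite a real file or folder.
        raise FileExistsError(f"{link} exists and is not a symlink")

    os.symlink(wav_dir, link, target_is_directory=True)
    return link
```

For example, link_dataset("/path/to/LJSpeech-1.1/wavs") has the same effect as the ln -s command above.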

Reference Materials

Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

GlowTTS

Score-Based Generative Modeling through Stochastic Differential Equations

score_sde_pytorch

denoising-diffusion-pytorch

Authors

Heyang Xue (https://github.com/WelkinYang) and Qicong Xie (https://github.com/QicongXie)

Owner

HeyangXue1997
