FG-transformer-TTS Fine-grained style control in transformer-based text-to-speech synthesis

Overview

LST-TTS

Official implementation of the paper "Fine-grained style control in transformer-based text-to-speech synthesis", submitted to ICASSP 2022. Audio samples and a demo of our system can be accessed here.

Setting up submodules

git submodule update --init --recursive

Get the WaveGlow vocoder checkpoint from here (it comes from the official NVIDIA WaveGlow repo).

Setup environment

See docker/Dockerfile for the packages that need to be installed.

Dataset preprocessing

LJSpeech

python preprocess_LJSpeech.py --datadir LJSpeechDir --outputdir OutputDir

VCTK

Get the leading and trailing silence marks from this repo, and put vctk-silences.0.92.txt in your VCTK dataset directory.

python preprocess_VCTK.py --datadir VCTKDir --outputdir Output_Train_Dir
python preprocess_VCTK.py --datadir VCTKDir --outputdir Output_Test_Dir --make_test_set
  • --make_test_set: specify this flag to process the speakers in the test set; otherwise, only the training speakers are processed.
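
Before running the VCTK preprocessing, it can help to confirm that the silence-mark file is where the instructions above say it should be. The following is a minimal sketch, not part of this repository: the helper name find_silence_file is hypothetical, and it only checks that the file exists in the dataset directory, without reading its contents.

```python
from pathlib import Path

# File name as given in the VCTK instructions above.
SILENCE_FILE = "vctk-silences.0.92.txt"


def find_silence_file(vctk_dir):
    """Return the path to the silence-mark file inside vctk_dir.

    Hypothetical helper: raises FileNotFoundError when the file is
    missing, so the problem surfaces before preprocessing starts.
    """
    path = Path(vctk_dir) / SILENCE_FILE
    if not path.is_file():
        raise FileNotFoundError(
            f"{SILENCE_FILE} not found in {vctk_dir}; copy it there first."
        )
    return path
```

Calling find_silence_file("VCTKDir") before preprocess_VCTK.py would fail early with a clear message if the file has not been copied yet.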

Training

LJSpeech

python train_TTS.py --precision 16 \
                    --datadir FeatureDir \
                    --vocoder_ckpt_path WaveGlowCKPT_PATH \
                    --sampledir SampleDir \
                    --batch_size 128 \
                    --check_val_every_n_epoch 50 \
                    --use_guided_attn \
                    --training_step 250000 \
                    --n_guided_steps 250000 \
                    --saving_path Output_CKPT_DIR \
                    --datatype LJSpeech \
                    [--distributed]
  • --distributed: enable DDP multi-GPU training
  • --batch_size: batch size per GPU; scale it down if you train on multiple GPUs and want to keep the same total batch size
  • --check_val_every_n_epoch: sample and validate every n epochs
  • --datadir: output directory of the preprocessing scripts
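
As an illustration of the --batch_size note, the sketch below computes the per-GPU value needed to keep a given total batch size. It assumes the total batch size is the per-GPU batch size multiplied by the number of GPUs, which is the usual behaviour under DDP; the function name per_gpu_batch_size is hypothetical and not part of train_TTS.py.

```python
def per_gpu_batch_size(total_batch_size, num_gpus):
    """Return the value to pass as --batch_size on each GPU.

    Assumes total batch size = per-GPU batch size * number of GPUs.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if total_batch_size % num_gpus != 0:
        raise ValueError("total_batch_size must be divisible by num_gpus")
    return total_batch_size // num_gpus


# Keeping the LJSpeech total batch size of 128 on 4 GPUs.
print(per_gpu_batch_size(128, 4))  # prints 32
```

Under that assumption, training on 4 GPUs with --distributed would use --batch_size 32 to match the single-GPU setting of 128.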

VCTK

python train_TTS.py --precision 16 \
                    --datadir FeatureDir \
                    --vocoder_ckpt_path WaveGlowCKPT_PATH \
                    --sampledir SampleDir \
                    --batch_size 64 \
                    --check_val_every_n_epoch 50 \
                    --use_guided_attn \
                    --training_step 150000 \
                    --n_guided_steps 150000 \
                    --etts_checkpoint LJSpeech_Model_CKPT \
                    --saving_path Output_CKPT_DIR \
                    --datatype VCTK \
                    [--distributed]
  • --etts_checkpoint: path to the checkpoint of the model pretrained on LJSpeech

Synthesis

We provide synthesis examples in synthesis.py; you can adapt this script to your own usage. To run synthesis.py:

python synthesis.py --etts_checkpoint VCTK_Model_CKPT \
                    --sampledir SampleDir \
                    --datatype VCTK \
                    --vocoder_ckpt_path WaveGlowCKPT_PATH
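
If you want to launch synthesis from another Python script, for example to loop over several checkpoints, the command above can be assembled programmatically. This is only a sketch: build_synthesis_command is a hypothetical helper, and it uses only the four flags shown in the example command.

```python
import sys


def build_synthesis_command(etts_checkpoint, sampledir, vocoder_ckpt_path,
                            datatype="VCTK"):
    """Build the argument list for the synthesis.py command shown above.

    Hypothetical helper: it only assembles the command; it does not run it.
    """
    return [
        sys.executable, "synthesis.py",
        "--etts_checkpoint", etts_checkpoint,
        "--sampledir", sampledir,
        "--datatype", datatype,
        "--vocoder_ckpt_path", vocoder_ckpt_path,
    ]


# Placeholder values taken from the example command.
cmd = build_synthesis_command("VCTK_Model_CKPT", "SampleDir", "WaveGlowCKPT_PATH")
```

Passing cmd to subprocess.run(cmd, check=True) from the repository root would then start synthesis.py with these arguments.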
Owner
Li-Wei Chen