Official implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis with PyTorch.

Last update: Sep 07, 2022

Multi-speaker DGP

This repository provides the official PyTorch implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis.

Our paper: Deep Gaussian Process Based Multi-speaker Speech Synthesis with Latent Speaker Representation

Test environment

This repository has been tested in the following environment.

Ubuntu 18.04
NVIDIA GeForce RTX 2080 Ti
Python 3.7.3
CUDA 11.1
cuDNN 8.1.1

Setup

You can complete the setup by sourcing setup.sh in your current shell.

$ . ./setup.sh

*Please make sure that the installed PyTorch build is compatible with your CUDA version (see https://pytorch.org/ for more info). Otherwise, CUDA errors will occur during training.
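One quick way to check the PyTorch/CUDA pairing before training is to ask PyTorch which CUDA version it was built against and whether it can see a GPU. The snippet below is a minimal sketch, not part of this repository; the function name describe_cuda_setup is hypothetical, and it only uses torch.__version__, torch.version.cuda, and torch.cuda.is_available().

```python
def describe_cuda_setup():
    """Return a small report on the installed PyTorch build and its CUDA support.

    Hypothetical helper for illustration; it is not shipped with this repository.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment.
        return {"torch_installed": False}

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the PyTorch binary was built with (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are visible to PyTorch.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_cuda_setup().items():
        print(f"{key}: {value}")
```

If built_with_cuda is None or cuda_available is False, the installed PyTorch build probably does not match the CUDA setup on the machine, and reinstalling a matching build from https://pytorch.org/ is likely needed.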

How to use

This repository follows the Kaldi-style recipe layout. To run the scripts, follow the instructions below. The JVS corpus [Takamichi et al., 2020] can be downloaded from here.

# Move to the recipe directory
$ cd egs/jvs

# Download the corpus to be used. The directory structure will be as follows:

├── conf/     # directory containing YAML format configuration files
├── jvs_ver1/ # downloaded data
├── local/    # directory containing corpus-dependent scripts
└── run.sh    # main script

# Run the recipe from scratch
$ ./run.sh

# Or you can run the recipe step by step
$ ./run.sh --stage 0 --stop-stage 0  # train/dev/eval split
$ ./run.sh --stage 1 --stop-stage 1  # preprocessing
$ ./run.sh --stage 2 --stop-stage 2  # train phoneme duration model
$ ./run.sh --stage 3 --stop-stage 3  # train acoustic model
$ ./run.sh --stage 4 --stop-stage 4  # decoding

# During stages 2 and 3, you can monitor logs using TensorBoard
# for example:
$ tensorboard --logdir exp/dgp
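The step-by-step invocation above can also be driven from Python, for example to re-run only the training stages. The following is a minimal sketch under the assumption that it is launched from the repository root; the helper names stage_commands and run_stages are hypothetical, and the only run.sh options used are the --stage and --stop-stage flags shown above.

```python
import subprocess

# Stage numbers and descriptions as listed in the recipe instructions.
STAGES = {
    0: "train/dev/eval split",
    1: "preprocessing",
    2: "train phoneme duration model",
    3: "train acoustic model",
    4: "decoding",
}


def stage_commands(first, last):
    """Build one run.sh command per stage, from `first` to `last` inclusive."""
    if first not in STAGES or last not in STAGES or first > last:
        raise ValueError(f"invalid stage range: {first}..{last}")
    return [
        ["./run.sh", "--stage", str(stage), "--stop-stage", str(stage)]
        for stage in range(first, last + 1)
    ]


def run_stages(first, last, recipe_dir="egs/jvs"):
    """Run the selected stages one by one, stopping at the first failure."""
    for stage, command in zip(range(first, last + 1), stage_commands(first, last)):
        print(f"Stage {stage} ({STAGES[stage]}): {' '.join(command)}")
        # check=True raises CalledProcessError if the stage exits with a non-zero status.
        subprocess.run(command, cwd=recipe_dir, check=True)
```

Calling run_stages(2, 3) would then train the phoneme duration model and the acoustic model in sequence, assuming stages 0 and 1 have already been completed.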

How to customize

The conf/*.yaml files contain all settings for data preparation, preprocessing, training, and decoding. We have prepared two configuration files, dgp.yaml and dgplvm.yaml. You can change the experimental conditions by editing these files.
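Before editing, it can help to see which settings actually differ between the two prepared configurations. The snippet below is a minimal sketch that compares two configuration files line by line using only the Python standard library; the function name diff_configs is hypothetical, and no assumption is made about the keys the files contain.

```python
import difflib
from pathlib import Path


def diff_configs(path_a, path_b):
    """Return the unified diff between two text configuration files as a list of lines."""
    lines_a = Path(path_a).read_text(encoding="utf-8").splitlines()
    lines_b = Path(path_b).read_text(encoding="utf-8").splitlines()
    return list(
        difflib.unified_diff(
            lines_a,
            lines_b,
            fromfile=str(path_a),
            tofile=str(path_b),
            lineterm="",  # input lines carry no trailing newline, so add none
        )
    )
```

Run from egs/jvs, printing each line of diff_configs("conf/dgp.yaml", "conf/dgplvm.yaml") would list the settings that distinguish the two setups; an empty result means the files are identical.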

Owner

sarulab-speech
