StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

Overview

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

Yinghao Aaron Li, Ali Zare, Nima Mesgarani

We present an unsupervised non-parallel many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2. Using a combination of adversarial source classifier loss and perceptual loss, our model significantly outperforms previous VC models. Although our model is trained only with 20 English speakers, it generalizes to a variety of voice conversion tasks, such as any-to-many, cross-lingual, and singing conversion. Using a style encoder, our framework can also convert plain reading speech into stylistic speech, such as emotional and falsetto speech. Subjective and objective evaluation experiments on a non-parallel many-to-many voice conversion task revealed that our model produces natural sounding voices, close to the sound quality of state-of-the-art text-tospeech (TTS) based voice conversion methods without the need for text labels. Moreover, our model is completely convolutional and with a faster-than-real-time vocoder such as Parallel WaveGAN can perform real-time voice conversion.

Paper: https://arxiv.org/abs/2107.10394

Audio samples: https://starganv2-vc.github.io/

Pre-requisites

  1. Python >= 3.7
  2. Clone this repository:
git https://github.com/yl4579/StarGANv2-VC.git
cd StarGANv2-VC
  1. Install python requirements:
pip install SoundFile torchaudio munch parallel_wavegan torch pydub
  1. Download and extract the VCTK dataset and use VCTK.ipynb to prepare the data (downsample to 24 kHz etc.). You can also download the dataset we have prepared and unzip it to the Data folder, use the provided config.yml to reproduce our models.

Training

python train.py --config_path ./Configs/config.yml

Please specify the training and validation data in config.yml file. Change num_domains to the number of speakers in the dataset. The data list format needs to be filename.wav|speaker_number, see train_list.txt as an example.

Checkpoints and Tensorboard logs will be saved at log_dir. To speed up training, you may want to make batch_size as large as your GPU RAM can take. However, please note that batch_size = 5 will take around 10G GPU RAM.

Inference

Please refer to inference.ipynb for details.

The pretrained StarGANv2 and ParallelWaveGAN on VCTK corpus can be downloaded at StarGANv2 Link and ParallelWaveGAN Link. Please unzip to Models and Vocoder respectivey and run each cell in the notebook.

ASR & F0 Models

The pretrained F0 and ASR models are provided under the Utils folder. Both the F0 and ASR models are trained with melspectrograms preprocessed using meldataset.py, and both models are trained on speech data only.

The ASR model is trained on English corpus, but it appears to work when training StarGANv2 models in other languages such as Japanese. The F0 model also appears to work with singing data. For the best performance, however, training your own ASR and F0 models is encouraged for non-English and non-speech data.

You can edit the meldataset.py with your own melspectrogram preprocessing, but the provided pretrained models will no longer work. You will need to train your own ASR and F0 models with the new preprocessing. You may refer to repo Diamondfan/CTC_pytorch and keums/melodyExtraction_JDC to train your own the ASR and F0 models, for example.

References

Acknowledgement

The author would like to thank @tosaka-m for his great repository and valuable discussions.

Owner
Aaron (Yinghao) Li
Aaron (Yinghao) Li
Tree-based Search Graph for Approximate Nearest Neighbor Search

TBSG: Tree-based Search Graph for Approximate Nearest Neighbor Search. TBSG is a graph-based algorithm for ANNS based on Cover Tree, which is also an

Fanxbin 2 Dec 27, 2022
A PyTorch toolkit for 2D Human Pose Estimation.

PyTorch-Pose PyTorch-Pose is a PyTorch implementation of the general pipeline for 2D single human pose estimation. The aim is to provide the interface

Wei Yang 1.1k Dec 30, 2022
Code for Fully Context-Aware Image Inpainting with a Learned Semantic Pyramid

SPN: Fully Context-Aware Image Inpainting with a Learned Semantic Pyramid Code for Fully Context-Aware Image Inpainting with a Learned Semantic Pyrami

12 Jun 27, 2022
PFLD pytorch Implementation

PFLD-pytorch Implementation of PFLD A Practical Facial Landmark Detector by pytorch. 1. install requirements pip3 install -r requirements.txt 2. Datas

zhaozhichao 669 Jan 02, 2023
A C implementation for creating 2D voronoi diagrams

Branch OSX/Linux Windows master dev jc_voronoi A fast C/C++ header only implementation for creating 2D Voronoi diagrams from a point set Uses Fortune'

Mathias Westerdahl 481 Dec 29, 2022
Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis

Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis This is a PyTorch implementation of the model described in our pape

qzhb 6 Jul 08, 2021
天勤量化开发包, 期货量化, 实时行情/历史数据/实盘交易

TqSdk 天勤量化交易策略程序开发包 TqSdk 是一个由信易科技发起并贡献主要代码的开源 python 库. 依托快期多年积累成熟的交易及行情服务器体系, TqSdk 支持用户使用极少的代码量构建各种类型的量化交易策略程序, 并提供包含期货、期权、股票的 历史数据-实时数据-开发调试-策略回测-

信易科技 2.8k Dec 30, 2022
SMD-Nets: Stereo Mixture Density Networks

SMD-Nets: Stereo Mixture Density Networks This repository contains a Pytorch implementation of "SMD-Nets: Stereo Mixture Density Networks" (CVPR 2021)

Fabio Tosi 115 Dec 26, 2022
Source code for "OmniPhotos: Casual 360° VR Photography"

OmniPhotos: Casual 360° VR Photography Project Page | Video | Paper | Demo | Data This repository contains the source code for creating and viewing Om

Christian Richardt 144 Dec 30, 2022
Pose Transformers: Human Motion Prediction with Non-Autoregressive Transformers

Pose Transformers: Human Motion Prediction with Non-Autoregressive Transformers This is the repo used for human motion prediction with non-autoregress

Idiap Research Institute 26 Dec 14, 2022
PyTorch Implementation of [1611.06440] Pruning Convolutional Neural Networks for Resource Efficient Inference

PyTorch implementation of [1611.06440 Pruning Convolutional Neural Networks for Resource Efficient Inference] This demonstrates pruning a VGG16 based

Jacob Gildenblat 836 Dec 26, 2022
This repo contains the code and data used in the paper "Wizard of Search Engine: Access to Information Through Conversations with Search Engines"

Wizard of Search Engine: Access to Information Through Conversations with Search Engines by Pengjie Ren, Zhongkun Liu, Xiaomeng Song, Hongtao Tian, Zh

19 Oct 27, 2022
A curated list of awesome deep long-tailed learning resources.

A curated list of awesome deep long-tailed learning resources.

vanint 210 Dec 25, 2022
Discover hidden deepweb pages

DeepWeb Scapper Att: Demo version An simple script to scrappe deepweb to find pages. Will return if any of those exists and will save on a file. You s

Héber Júlio 77 Oct 02, 2022
Official code of our work, AVATAR: A Parallel Corpus for Java-Python Program Translation.

AVATAR Official code of our work, AVATAR: A Parallel Corpus for Java-Python Program Translation. AVATAR stands for jAVA-pyThon progrAm tRanslation. AV

Wasi Ahmad 26 Dec 03, 2022
✅ How Robust are Fact Checking Systems on Colloquial Claims?. In NAACL-HLT, 2021.

How Robust are Fact Checking Systems on Colloquial Claims? Official PyTorch implementation of our NAACL paper: Byeongchang Kim*, Hyunwoo Kim*, Seokhee

Byeongchang Kim 19 Mar 15, 2022
This is the repository for CVPR2021 Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales

Intro This is the repository for CVPR2021 Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales Vehicle Sam

39 Jul 21, 2022
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval

Disentangled Representation Learning for Text-Video Retrieval This is a PyTorch implementation of the paper Disentangled Representation Learning for T

Qiang Wang 49 Dec 18, 2022
An open source Python package for plasma science that is under development

PlasmaPy PlasmaPy is an open source, community-developed Python 3.7+ package for plasma science. PlasmaPy intends to be for plasma science what Astrop

PlasmaPy 444 Jan 07, 2023
This repository contains code and data for "On the Multimodal Person Verification Using Audio-Visual-Thermal Data"

trimodal_person_verification This repository contains the code, and preprocessed dataset featured in "A Study of Multimodal Person Verification Using

ISSAI 7 Aug 31, 2022