Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Last update: Dec 07, 2022

Related tags

Deep Learning WadaIN-VC

Overview

One-Shot Voice Conversion with Weight Adaptive Instance Normalization

By Shengjie Huang, Yanyan Xu*, Dengfeng Ke*, Mingjie Chen, Thomas Hain.

This repo is the official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Audio samples are available at here.

Dependencies

python 3.6.0
pytorch 1.4.0
pyyaml 5.4.1
numpy 1.19.5
librosa 0.8.0
soundfile 0.10.2
tensorboardX 2.1

Preprocess

What you need to prepare first before running this project and how to prepare them

We use the ParallelWaveGAN as our vocoder, and VCTK as our data set.
If you wanna run our project, please install as the description of ParallelWaveGAN project first.
And then prepare all the mel-spectrogram data as ParallelWaveGAN do.
Prepare the speaker_used.json file by yourself, as ./data/80_train_speaker_used.json and ./data/fine_tune_speaker_used.json show.
Prepare the feats.scp file by runing ./convert_decode/convert_mel/get_scp.py .

Assume that your prepared mel-spectrograms are sorted in the files tree like:

├── p225
│   ├── p225_001-feats.npy
│   ├── p225_004-feats.npy
│   ├── p225_005-feats.npy
│   ......
├── p226
│   ├── p226_001-feats.npy
│   ├── p226_003-feats.npy
│   ├── p226_004-feats.npy
│   ......
├── p227
│   ......
├── p228
│   ......
│   ...
│   ...

Training

Run the pretrain stage by bash run_main.sh. We use 80 speakers of VCTK data set, and all utterances for each person.

Fine Tuning

Run the fine tune stage by bash run_fine_tune.sh. We use the other 10 speakers of VCTK data set, and only 1 utterance for each person used.

Inference

$ cd convert_decode/convert_mel
$ bash run_convert.sh

We generate one-shot voice conversion utterances between the 10 one-shot speakers , and use their other unseen utterances to perform one-shot voice conversion!

Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Related tags

Overview

One-Shot Voice Conversion with Weight Adaptive Instance Normalization

Dependencies

Preprocess

What you need to prepare first before running this project and how to prepare them

Assume that your prepared mel-spectrograms are sorted in the files tree like:

Training

Fine Tuning

Inference

Owner

Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

M2MRF: Many-to-Many Reassembly of Features for Tiny Lesion Segmentation in Fundus Images

Zsseg.baseline - Zero-Shot Semantic Segmentation

The datasets and code of ACL 2021 paper "Aspect-Category-Opinion-Sentiment Quadruple Extraction with Implicit Aspects and Opinions".

PyTorch implementation of residual gated graph ConvNets, ICLR’18

Code for the ICCV 2021 paper "Pixel Difference Networks for Efficient Edge Detection" (Oral).

🛰️ Awesome Satellite Imagery Datasets

PyTorch implementation of normalizing flow models

Useful materials and tutorials for 110-1 NTU DBME5028 (Application of Deep Learning in Medical Imaging)

Extreme Lightwegith Portrait Segmentation

This is an implementation of PIFuhd based on Pytorch

Face Recognition & AI Based Smart Attendance Monitoring System.

Deformable DETR is an efficient and fast-converging end-to-end object detector.

Pytorch implementation of BRECQ, ICLR 2021

A Python library created to assist programmers with complex mathematical functions

A minimalist tool to display a network graph.

Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)

Real-Time High-Resolution Background Matting

Source code for Task-Aware Variational Adversarial Active Learning

harmonic-percussive-residual separation algorithm wrapped as a VST3 plugin (iPlug2)