Phonetic PosteriorGram (PPG)-Based Voice Conversion (VC)

Last update: Dec 28, 2022

ppg-vc

This repo implements different kinds of PPG-based VC models. Pretrained models. More models are on the way.

Notes:

The PPG model provided in conformer_ppg_model is based on a hybrid CTC-attention phoneme recognizer trained on LibriSpeech (960 hours). The PPGs have a frame shift of 10 ms and a dimensionality of 144. This model is very similar to the one used in this paper.
This repo uses HiFi-GAN V1 as the vocoder; synthesized audio has a sampling rate of 24 kHz.
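
The notes above fix the time resolution of the features: a 10 ms frame shift corresponds to 160 samples per frame for 16 kHz input (LibriSpeech's sampling rate) and 240 samples per frame at the vocoder's 24 kHz output. The following is a small arithmetic sketch. It assumes that source audio is fed to the recognizer at 16 kHz and that the vocoder's acoustic frames share the 10 ms shift; the README does not state either. The constant and function names are hypothetical, and the real front end's windowing and padding may change the frame count by a frame or so.

```python
import numpy as np

PPG_DIM = 144          # PPG dimensionality stated in the notes
FRAME_SHIFT_MS = 10    # PPG frame shift stated in the notes
INPUT_SR = 16000       # assumption: source audio at LibriSpeech's 16 kHz
OUTPUT_SR = 24000      # vocoder output rate stated in the notes


def hop_samples(sample_rate: int, frame_shift_ms: int = FRAME_SHIFT_MS) -> int:
    """Number of audio samples spanned by one frame."""
    return sample_rate * frame_shift_ms // 1000


def num_frames(num_samples: int, sample_rate: int = INPUT_SR) -> int:
    """Approximate frame count, assuming one frame per hop and no padding."""
    return num_samples // hop_samples(sample_rate)


# A 3-second source utterance at 16 kHz
frames = num_frames(3 * INPUT_SR)
ppg = np.zeros((frames, PPG_DIM), dtype=np.float32)  # placeholder PPG matrix
print(ppg.shape)                          # (300, 144)

# Assuming the vocoder's acoustic frames also use a 10 ms shift
print(hop_samples(OUTPUT_SR))             # 240
print(frames * hop_samples(OUTPUT_SR))    # 72000 samples = 3.0 s at 24 kHz
```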

Highlights

Any-to-many VC
Any-to-any VC (a.k.a. few/one-shot VC)
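
In the any-to-any (few/one-shot) setting, the target speaker is typically described by a speaker embedding computed from one or a few reference utterances, such as the d-vectors produced during data preprocessing. The following is a minimal sketch that assumes per-utterance d-vectors are already available as NumPy arrays. Averaging and L2-normalizing is a common convention for d-vectors, and concatenating the embedding to every PPG frame is one simple way to condition a frame-level model. Neither is necessarily what this repo's models do, and `target_speaker_embedding` and `condition_frames` are hypothetical helpers.

```python
import numpy as np


def target_speaker_embedding(utterance_dvecs: np.ndarray) -> np.ndarray:
    """Average per-utterance d-vectors and L2-normalize the result.

    utterance_dvecs has shape (num_utterances, dim): one row per reference
    utterance (a single row for one-shot VC).
    """
    if utterance_dvecs.ndim != 2 or utterance_dvecs.shape[0] == 0:
        raise ValueError("expected a non-empty (num_utterances, dim) array")
    mean = utterance_dvecs.mean(axis=0)
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else mean


def condition_frames(ppg: np.ndarray, spk_emb: np.ndarray) -> np.ndarray:
    """Hypothetical conditioning: append the same speaker vector to every frame."""
    tiled = np.broadcast_to(spk_emb, (ppg.shape[0], spk_emb.shape[0]))
    return np.concatenate([ppg, tiled], axis=1)


# Two toy 2-dim d-vectors: their mean [1.5, 2.0] normalizes to [0.6, 0.8]
emb = target_speaker_embedding(np.array([[3.0, 0.0], [0.0, 4.0]]))
features = condition_frames(np.zeros((5, 144)), emb)  # shape (5, 146)
```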

How to use

Data preprocessing

Please run 1_compute_ctc_att_bnf.py to compute PPG features.
Please run 2_compute_f0.py to compute fundamental frequency.
Please run 3_compute_spk_dvecs.py to compute speaker d-vectors.
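
Step 2 produces a frame-level F0 track. In PPG-based VC, F0 is commonly converted to log-F0 plus a separate voiced/unvoiced flag, and unvoiced gaps are interpolated so that the model sees a continuous contour. The following is a minimal sketch of that convention. It assumes unvoiced frames are encoded as 0 Hz, as many pitch trackers do. `lf0_with_uv` is a hypothetical helper, and 2_compute_f0.py may use a different pitch tracker or post-processing.

```python
import numpy as np


def lf0_with_uv(f0):
    """Return (continuous log-F0, voiced/unvoiced flag) for a frame-level F0 track.

    Unvoiced frames are assumed to have F0 == 0. Their log-F0 values are
    filled by linear interpolation between neighbouring voiced frames;
    leading and trailing unvoiced frames hold the nearest voiced value.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    uv = (f0 > 0).astype(np.float32)       # 1 = voiced, 0 = unvoiced
    if not uv.any():
        return np.zeros_like(f0), uv       # fully unvoiced: nothing to interpolate
    idx = np.arange(len(f0))
    voiced = idx[f0 > 0]
    lf0 = np.interp(idx, voiced, np.log(f0[voiced]))
    return lf0, uv


lf0, uv = lf0_with_uv([0.0, 100.0, 0.0, 400.0, 0.0])
# np.exp(lf0) is [100, 100, 200, 400, 400]; uv is [0, 1, 0, 1, 0]
```

Interpolating in the log domain means the filled frame between 100 Hz and 400 Hz lands at their geometric mean (200 Hz) rather than the arithmetic mean.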

Training

Please refer to run.sh

Conversion

Please refer to test.sh

TODO

Upload pretrained models.

Citations

@ARTICLE{liu2021any,
  author={Liu, Songxiang and Cao, Yuewen and Wang, Disong and Wu, Xixin and Liu, Xunying and Meng, Helen},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title={Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling},
  year={2021},
  volume={29},
  pages={1717--1728},
  doi={10.1109/TASLP.2021.3076867}
}

@inproceedings{Liu2018,
  author={Songxiang Liu and Jinghua Zhong and Lifa Sun and Xixin Wu and Xunying Liu and Helen Meng},
  title={Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={496--500},
  doi={10.21437/Interspeech.2018-1504},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1504}
}

Owner

Liu Songxiang