Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features

Overview

MediumVC

MediumVC is an utterance-level method towards any-to-any VC. Before that, we propose SingleVC to perform A2O tasks(Xi → Ŷi) , Xi means utterance i spoken by X). The Ŷi are considered as SSIF. To build SingleVC, we employ a novel data augment strategy: pitch-shifted and duration-remained(PSDR) to produce paired asymmetrical training data. Then, based on pre-trained SingleVC, MediumVC performs an asymmetrical reconstruction task(Ŷi → X̂i). Due to the asymmetrical reconstruction mode, MediumVC achieves more efficient feature decoupling and fusion. Experiments demonstrate MediumVC performs strong robustness for unseen speakers across multiple public datasets. Here is the official implementation of the paper, MediumVC.

The following are the overall model architecture.

Model architecture

For the audio samples, please refer to our demo page. The more converted speeches can be found in "Demo/ConvertedSpeeches/".

Envs

You can install the dependencies with

pip install -r requirements.txt

Speaker Encoder

Dvector is a robust speaker verification (SV) system pre-trained on VoxCeleb1 using GE2E loss, and it produces 256-dim speaker embedding. In our evaluation on multiple datasets(VCTK with 30000 pairs, Librispeech with 30000 pairs and VCC2020 with 10000 pairs), the equal error rates(EERs)and thresholds(THRs) are recorded in Table. Then Dvector with THRs is also employed to calculate SV accuracy(ACC) of pairs produced by MediumVC and other contrast methods for objective evaluation. The more details can access paper.

Dataset VCTK LibriSpeech VCC2020
EER(%)/THR 7.71/0.462 7.95/0.337 1.06/0.432

Vocoder

The HiFi-GAN vocoder is employed to convert log mel-spectrograms to waveforms. The model is trained on universal datasets with 13.93M parameters. Through our evaluation, it can synthesize 22.05 kHz high-fidelity speeches over 4.0 MOS, even in cross-language or noisy environments.

Infer

You can download the pretrained model, and then edit "Any2Any/infer/infer_config.yaml".Test Samples could be organized as "wav22050/$figure$/*.wav".

python Any2Any/infer/infer.py

Train from scratch

Preprocessing

The corpus should be organized as "VCTK22050/$figure$/*.wav", and then edit the config file "Any2Any/pre_feature/preprocess_config.yaml".The output "spk_emb_mel_label.pkl" will be used for training.

python Any2Any/pre_feature/figure_spkemb_mel.py

Training

Please edit the paths of pretrained hifi-model,wav2mel,dvector,SingleVC in config file "Any2Any/config.yaml" at first.

python Any2Any/solver.py
Owner
谷下雨
美中不足
谷下雨
[NeurIPS'21 Spotlight] PyTorch code for our paper "Aligned Structured Sparsity Learning for Efficient Image Super-Resolution"

ASSL This repository is for a new network pruning method (Aligned Structured Sparsity Learning, ASSL) for efficient single image super-resolution (SR)

Huan Wang 47 Nov 28, 2022
PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and reinforcement learning

safe-control-gym Physics-based CartPole and Quadrotor Gym environments (using PyBullet) with symbolic a priori dynamics (using CasADi) for learning-ba

Dynamic Systems Lab 300 Dec 28, 2022
Pytorch implementation of paper "Learning Co-segmentation by Segment Swapping for Retrieval and Discovery"

SegSwap Pytorch implementation of paper "Learning Co-segmentation by Segment Swapping for Retrieval and Discovery" [PDF] [Project page] If our project

xshen 41 Dec 10, 2022
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Learning the Best Pooling Strategy for Visual Semantic Embedding Official PyTorch implementation of the paper Learning the Best Pooling Strategy for V

Jiacheng Chen 106 Jan 06, 2023
The Wearables Development Toolkit - a development environment for activity recognition applications with sensor signals

Wearables Development Toolkit (WDK) The Wearables Development Toolkit (WDK) is a framework and set of tools to facilitate the iterative development of

Juan Haladjian 114 Nov 27, 2022
TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification

TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification [NeurIPS 2021] Abstract Multiple instance learn

132 Dec 30, 2022
iBOT: Image BERT Pre-Training with Online Tokenizer

Image BERT Pre-Training with iBOT Official PyTorch implementation and pretrained models for paper iBOT: Image BERT Pre-Training with Online Tokenizer.

Bytedance Inc. 435 Jan 06, 2023
This framework implements the data poisoning method found in the paper Adversarial Examples Make Strong Poisons

Adversarial poison generation and evaluation. This framework implements the data poisoning method found in the paper Adversarial Examples Make Strong

31 Nov 01, 2022
Virtual hand gesture mouse using a webcam

NonMouse 日本語のREADMEはこちら This is an application that allows you to use your hand itself as a mouse. The program uses a web camera to recognize your han

Yuki Takeyama 55 Jan 01, 2023
Generalized and Efficient Blackbox Optimization System.

OpenBox Doc | OpenBox中文文档 OpenBox: Generalized and Efficient Blackbox Optimization System OpenBox is an efficient and generalized blackbox optimizatio

DAIR Lab 238 Dec 29, 2022
Scenarios, tutorials and demos for Autonomous Driving

The Autonomous Driving Cookbook (Preview) NOTE: This project is developed and being maintained by Project Road Runner at Microsoft Garage. This is cur

Microsoft 2.1k Jan 02, 2023
A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

張致強 14 Dec 02, 2022
Self-Supervised CNN-GCN Autoencoder

GCNDepth Self-Supervised CNN-GCN Autoencoder GCNDepth: Self-supervised monocular depth estimation based on graph convolutional network To be published

53 Dec 14, 2022
Repository for the paper titled: "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"

When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer This repository contains code for our paper titled "When is BERT M

Princeton Natural Language Processing 9 Dec 23, 2022
Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [Paper] [Colab is coming soon] Approach Example Usage To r

170 Jan 03, 2023
Pytorch implementation of the unsupervised object discovery method LOST.

LOST Pytorch implementation of the unsupervised object discovery method LOST. More details can be found in the paper: Localizing Objects with Self-Sup

Valeo.ai 189 Dec 25, 2022
Generative Art Using Neural Visual Grammars and Dual Encoders

Generative Art Using Neural Visual Grammars and Dual Encoders Arnheim 1 The original algorithm from the paper Generative Art Using Neural Visual Gramm

DeepMind 231 Jan 05, 2023
Benchmarks for the Optimal Power Flow Problem

Power Grid Lib - Optimal Power Flow This benchmark library is curated and maintained by the IEEE PES Task Force on Benchmarks for Validation of Emergi

A Library of IEEE PES Power Grid Benchmarks 207 Dec 08, 2022
Classify the disease status of a plant given an image of a passion fruit

Passion Fruit Disease Detection I tried to create an accurate machine learning models capable of localizing and identifying multiple Passion Fruits in

3 Nov 09, 2021
Tensorflow 2.x implementation of Panoramic BlitzNet for object detection and semantic segmentation on indoor panoramic images.

Deep neural network for object detection and semantic segmentation on indoor panoramic images. The implementation is based on the papers:

Alejandro de Nova Guerrero 9 Nov 24, 2022