Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features

Overview

MediumVC

MediumVC is an utterance-level method towards any-to-any VC. Before that, we propose SingleVC to perform A2O tasks(Xi → Ŷi) , Xi means utterance i spoken by X). The Ŷi are considered as SSIF. To build SingleVC, we employ a novel data augment strategy: pitch-shifted and duration-remained(PSDR) to produce paired asymmetrical training data. Then, based on pre-trained SingleVC, MediumVC performs an asymmetrical reconstruction task(Ŷi → X̂i). Due to the asymmetrical reconstruction mode, MediumVC achieves more efficient feature decoupling and fusion. Experiments demonstrate MediumVC performs strong robustness for unseen speakers across multiple public datasets. Here is the official implementation of the paper, MediumVC.

The following are the overall model architecture.

Model architecture

For the audio samples, please refer to our demo page. The more converted speeches can be found in "Demo/ConvertedSpeeches/".

Envs

You can install the dependencies with

pip install -r requirements.txt

Speaker Encoder

Dvector is a robust speaker verification (SV) system pre-trained on VoxCeleb1 using GE2E loss, and it produces 256-dim speaker embedding. In our evaluation on multiple datasets(VCTK with 30000 pairs, Librispeech with 30000 pairs and VCC2020 with 10000 pairs), the equal error rates(EERs)and thresholds(THRs) are recorded in Table. Then Dvector with THRs is also employed to calculate SV accuracy(ACC) of pairs produced by MediumVC and other contrast methods for objective evaluation. The more details can access paper.

Dataset VCTK LibriSpeech VCC2020
EER(%)/THR 7.71/0.462 7.95/0.337 1.06/0.432

Vocoder

The HiFi-GAN vocoder is employed to convert log mel-spectrograms to waveforms. The model is trained on universal datasets with 13.93M parameters. Through our evaluation, it can synthesize 22.05 kHz high-fidelity speeches over 4.0 MOS, even in cross-language or noisy environments.

Infer

You can download the pretrained model, and then edit "Any2Any/infer/infer_config.yaml".Test Samples could be organized as "wav22050/$figure$/*.wav".

python Any2Any/infer/infer.py

Train from scratch

Preprocessing

The corpus should be organized as "VCTK22050/$figure$/*.wav", and then edit the config file "Any2Any/pre_feature/preprocess_config.yaml".The output "spk_emb_mel_label.pkl" will be used for training.

python Any2Any/pre_feature/figure_spkemb_mel.py

Training

Please edit the paths of pretrained hifi-model,wav2mel,dvector,SingleVC in config file "Any2Any/config.yaml" at first.

python Any2Any/solver.py
Owner
谷下雨
美中不足
谷下雨
Anki vector Music ❤ is the best and only Telegram VC player with playlists, Multi Playback, Channel play and more

Anki Vector Music 🎵 A bot that can play music on Telegram Group and Channel Voice Chats Available on telegram as @Anki Vector Music Features 🔥 Thumb

Damantha Jasinghe 12 Nov 12, 2022
GiantMIDI-Piano is a classical piano MIDI dataset contains 10,854 MIDI files of 2,786 composers

GiantMIDI-Piano is a classical piano MIDI dataset contains 10,854 MIDI files of 2,786 composers

Bytedance Inc. 1.3k Jan 04, 2023
A tool for retrieving audio in the past

Rewinder A tool for retrieving audio in the past. Ever felt like, I need to remember that discussion which happened 10 min back. Now you can! Rewind a

Bharat 1 Jan 24, 2022
Automatically move or copy files based on metadata associated with the files. For example, file your photos based on EXIF metadata or use MP3 tags to file your music files.

Automatically move or copy files based on metadata associated with the files. For example, file your photos based on EXIF metadata or use MP3 tags to file your music files.

Rhet Turnbull 14 Nov 02, 2022
Python interface to the WebRTC Voice Activity Detector

py-webrtcvad This is a python interface to the WebRTC Voice Activity Detector (VAD). It is compatible with Python 2 and Python 3. A VAD classifies a p

John Wiseman 1.5k Dec 22, 2022
This is an AI that runs in the terminal. It is a voice assistant that can do common activities and can also help in your coding doubts like

This is an AI that runs in the terminal. It is a voice assistant that can do common activities and can also help in your coding doubts like

OneBit 1 Nov 05, 2021
Python module for handling audio metadata

Mutagen is a Python module to handle audio metadata. It supports ASF, FLAC, MP4, Monkey's Audio, MP3, Musepack, Ogg Opus, Ogg FLAC, Ogg Speex, Ogg The

Quod Libet 1.1k Dec 31, 2022
A Python 3 script for capturing and recording a SDR stream to a WAV file (or serving it to a HTTP audio stream).

rfsoapyfile A Python 3 script for capturing and recording a SDR stream to a WAV file (or serving it to a HTTP audio stream). The script is threaded fo

4 Dec 19, 2022
Learn chords with your MIDI keyboard !

miditeach miditeach is a music learning tool that can be used to practice your chords skills with a midi keyboard 🎹 ! Features Midi keyboard input se

Alexis LOUIS 3 Oct 20, 2021
convert-to-opus-cli is a Python CLI program for converting audio files to opus audio format.

convert-to-opus-cli convert-to-opus-cli is a Python CLI program for converting audio files to opus audio format. Installation Must have installed ffmp

4 Dec 21, 2022
Deep learning transformer model that generates unique music sequences.

music-ai Deep learning transformer model that generates unique music sequences. Abstract In 2017, a new state-of-the-art was published for natural lan

xacer 6 Nov 19, 2022
Improved Python UI to convert Youtube URL to .mp3 file.

YT-MP3 Improved Python UI to convert Youtube URL to .mp3 file. How to use? Just run python3 main.py Enter the URL of the video Enter the PATH of where

8 Jun 19, 2022
Music generation using ml / dl

Data analysis Document here the project: deep_music Description: Project Description Data Source: Type of analysis: Please document the project the be

0 Jul 03, 2022
digital audio workstation, instrument and effect plugins, wave editor

digital audio workstation, instrument and effect plugins, wave editor

306 Jan 05, 2023
Stream Music 🎵 𝘼 𝙗𝙤𝙩 𝙩𝙝𝙖𝙩 𝙘𝙖𝙣 𝙥𝙡𝙖𝙮 𝙢𝙪𝙨𝙞𝙘 𝙤𝙣 𝙏𝙚𝙡𝙚𝙜𝙧𝙖𝙢 𝙂𝙧𝙤𝙪𝙥 𝙖𝙣𝙙 𝘾𝙝𝙖𝙣𝙣𝙚𝙡 𝙑𝙤𝙞𝙘𝙚 𝘾𝙝𝙖𝙩𝙨 𝘼𝙫𝙖𝙞𝙡?

Stream Music 🎵 𝘼 𝙗𝙤𝙩 𝙩𝙝𝙖𝙩 𝙘𝙖𝙣 𝙥𝙡𝙖𝙮 𝙢𝙪𝙨𝙞𝙘 𝙤𝙣 𝙏𝙚𝙡𝙚𝙜𝙧𝙖𝙢 𝙂𝙧𝙤𝙪𝙥 𝙖𝙣𝙙 𝘾𝙝𝙖𝙣𝙣𝙚𝙡 𝙑𝙤𝙞𝙘𝙚 𝘾𝙝𝙖𝙩𝙨 𝘼𝙫𝙖𝙞𝙡?

Sadew Jayasekara 15 Nov 12, 2022
A GUI-based audio player with support for a large variety of formats

Miza-Player A GUI-based audio player with support for a large variety of formats, able to play from web-hosted media platforms such as YouTube, includ

Thomas Xin 3 Dec 14, 2022
MelGAN test on audio decoding

Official repository for the paper MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis The original work URL: https://github.com

Jurio 1 Apr 29, 2022
Graphical interface to control granular sound synthesis.

Granular sound synthesis interface SoundGrain is a graphical interface where users can draw and edit trajectories to control granular sound synthesis

Olivier Bélanger 122 Dec 10, 2022
This is a realtime voice translator program which gets input from user at any language and converts it to the desired language that the user asks

This is a realtime voice translator program which gets input from user at any language and converts it to the desired language that the user asks ...

Mohan Ram S 1 Dec 30, 2021
📺Headless全自动B站直播录播、切片、上传一体工具

DDRecorder Headless全自动B站直播录播、切片、上传一体工具 感谢 FortuneDayssss/BilibiliUploader 安装指南(Windows) 在Release下载zip包解压。 修改配置文件config.json 双击运行DDRecorder.exe (这将使用co

322 Dec 27, 2022