SingleVC performs any-to-one VC, which is an important component of MediumVC project.

Overview

SingleVC

SingleVC performs any-to-one VC, which is an important component of MediumVC project. Here is the official implementation of the paper, MediumVC.

The following are the overall model architecture. Model architecture

For the audio samples, please refer to our demo page. The more details can be found in "any2one/demo_page/ConvertedSpeeches/".

Envs

You can install the dependencies with

pip install -r requirements.txt

PSDR

PSDR means scaling F0 and correlative harmonics with duration remained, which intuitively modifying the speaker-related information while maintaining linguistic content and prosodic information. PSDR can be used as a data augment strategy for VC by producing fake parallel corpus. To verify its feasibility that slight pitch shifts don't affect content information, we measure the word error rate(WER) between source speeches and pitch-shifted speeches through Wav2Vec2-based ASR System. The speeches of p249(female) from VCTK Corupsis selected, and pyrubberband is utilized to execute PSDR. Table indicates that when S in -6~4, the strategy applies to VC with acceptable WERs.

S -7 -6 -5 0 3 4 5
WER(%) 40.51 25.79 17.25 0 17.27 25.21 48.14

Vocoder

The HiFi-GAN vocoder is employed to convert log mel-spectrograms to waveforms. The model is trained on universal datasets with 13.93M parameters. Through our evaluation, it can synthesize 22.05 kHz high-fidelity speeches over 4.0 MOS, even in cross-language or noisy environments.

pretrained models

You can download the pretrained model as well as the vocoder following the link, and then edit the config file any2one/infer/infer_config.yaml. Infer corpus should be organized as test22050/*.wav You can convert an list of utterances, e.g.

python any2one/infer/infer.py

Train from scratch

select acceptable pitch shifts

If you want to reconstruct someone's voice, you need to calculate the acceptable pitch shifts of that person first. Edit the "any2one/tools/wav2vec_asr.py" and config the "wave_dir" as "speech16000_dir". The ASR model provided in "any2one/tools/wav2vec_asr.py" only supports English speech recognition currently. You can replace it for other languages. In our test, the acceptable pitch shifts of p249 in VCTK-Corups are [-6,4].

python any2one/tools.wav2vec_asr.py

tips: In practice, it performers a higher probability of success to build female voices than male voices . Compared to males, the periodic patterns of females perform more stable due to the higher frequency resolution.

The train corpus should be organized as vctk22050/p249/*.wav

python any2one/solver.py

Preprocessing

  1. The model is trained with random pitch shifted speeches processed in real-time. If you want to speed up the training, please refer the code in "any2one/meldataset.py" to have data preprocessed.
  2. If use preprocess method in HiFi-GAN vocoder, the training will take about one day with TITAN Xp, and the performances will be more robust. However, using preprocess method in WaveRNN, the training will just spend three hours.
Owner
谷下雨
美中不足
谷下雨
Urban mobility simulations with Python3, RLlib (Deep Reinforcement Learning) and Mesa (Agent-based modeling)

Deep Reinforcement Learning for Smart Cities Documentation RLlib: https://docs.ray.io/en/master/rllib.html Mesa: https://mesa.readthedocs.io/en/stable

1 May 15, 2022
Pretrained Cost Model for Distributed Constraint Optimization Problems

Pretrained Cost Model for Distributed Constraint Optimization Problems Requirements PyTorch 1.9.0 PyTorch Geometric 1.7.1 Directory structure baseline

2 Aug 28, 2022
Pytorch Implementation of LNSNet for Superpixel Segmentation

LNSNet Overview Official implementation of Learning the Superpixel in a Non-iterative and Lifelong Manner (CVPR'21) Learning Strategy The proposed LNS

42 Oct 11, 2022
DROPO: Sim-to-Real Transfer with Offline Domain Randomization

DROPO: Sim-to-Real Transfer with Offline Domain Randomization Gabriele Tiboni, Karol Arndt, Ville Kyrki. This repository contains the code for the pap

Gabriele Tiboni 8 Dec 19, 2022
Official implementation for paper Knowledge Bridging for Empathetic Dialogue Generation (AAAI 2021).

Knowledge Bridging for Empathetic Dialogue Generation This is the official implementation for paper Knowledge Bridging for Empathetic Dialogue Generat

Qintong Li 50 Dec 20, 2022
pytorch, hand(object) detect ,yolo v5,手检测

YOLO V5 物体检测,包括手部检测。 项目介绍 手部检测 手部检测示例如下 : 视频示例: 项目配置 作者开发环境: Python 3.7 PyTorch = 1.5.1 数据集 手部检测数据集 该项目数据集采用 TV-Hand 和 COCO-Hand (COCO-Hand-Big 部分) 进

Eric.Lee 11 Dec 20, 2022
Distributed DataLoader For Pytorch Based On Ray

Dpex——用户无感知分布式数据预处理组件 一、前言 随着GPU与CPU的算力差距越来越大以及模型训练时的预处理Pipeline变得越来越复杂,CPU部分的数据预处理已经逐渐成为了模型训练的瓶颈所在,这导致单机的GPU配置的提升并不能带来期望的线性加速。预处理性能瓶颈的本质在于每个GPU能够使用的C

Dalong 23 Nov 02, 2022
PyTorch CZSL framework containing GQA, the open-world setting, and the CGE and CompCos methods.

Compositional Zero-Shot Learning This is the official PyTorch code of the CVPR 2021 works Learning Graph Embeddings for Compositional Zero-shot Learni

EML Tübingen 70 Dec 27, 2022
Predict halo masses from simulations via graph neural networks

HaloGraphNet Predict halo masses from simulations via Graph Neural Networks. Given a dark matter halo and its galaxies, creates a graph with informati

Pablo Villanueva Domingo 20 Nov 15, 2022
code and data for paper "GIANT: Scalable Creation of a Web-scale Ontology"

GIANT Code and data for paper "GIANT: Scalable Creation of a Web-scale Ontology" https://arxiv.org/pdf/2004.02118.pdf Please cite our paper if this pr

Excalibur 39 Dec 29, 2022
Predict stock movement with Machine Learning and Deep Learning algorithms

Project Overview Stock market movement prediction using LSTM Deep Neural Networks and machine learning algorithms Software and Library Requirements Th

Naz Delam 46 Sep 13, 2022
Implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

PRP Introduction This is the implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

yuanyao366 39 Dec 29, 2022
A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN.

Ryan Murdock has done it again, combining OpenAI's CLIP and the generator from a BigGAN! This repository wraps up his work so it is easily accessible to anyone who owns a GPU.

Phil Wang 2.3k Jan 09, 2023
Official PyTorch implementation and pretrained models of the paper Self-Supervised Classification Network

Self-Classifier: Self-Supervised Classification Network Official PyTorch implementation and pretrained models of the paper Self-Supervised Classificat

Elad Amrani 24 Dec 21, 2022
AAAI 2022: Stationary diffusion state neural estimation

Stationary Diffusion State Neural Estimation Although many graph-based clustering methods attempt to model the stationary diffusion state in their obj

绽琨 33 Nov 24, 2022
The implementation of 'Image synthesis via semantic composition'.

Image synthesis via semantic synthesis [Project Page] by Yi Wang, Lu Qi, Ying-Cong Chen, Xiangyu Zhang, Jiaya Jia. Introduction This repository gives

DV Lab 71 Jan 06, 2023
A python library to artfully visualize Factorio Blueprints and an interactive web demo for using it.

Factorio Blueprint Visualizer I love the game Factorio and I really like the look of factories after growing for many hours or blueprints after tweaki

Piet Brömmel 124 Jan 07, 2023
A Python library that provides a simplified alternative to DBAPI 2

A Python library that provides a simplified alternative to DBAPI 2. It provides a facade in front of DBAPI 2 drivers.

Tony Locke 44 Nov 17, 2021
Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Learning Pixel-level Semantic Affinity with Image-level Supervision This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead. Int

Jiwoon Ahn 337 Dec 15, 2022
Self-Supervised Speech Pre-training and Representation Learning Toolkit.

What's New Sep 2021: We host a challenge in AAAI workshop: The 2nd Self-supervised Learning for Audio and Speech Processing! See SUPERB official site

s3prl 1.6k Jan 08, 2023