Use VITS and Opencpop to develop singing voice synthesis; maybe it will become VISinger.

Overview

Init

This project is based on:

https://github.com/jaywalnut310/vits
https://github.com/SJTMusicTeam/Muskits/
https://wenet.org.cn/opencpop/ (singing voice data)

Use the Muskits data preprocessing to obtain the initial data:

cd egs/opencpop/svs1/
./local/data.sh

VISinger_data
--lable
--midi_dump
--wav_dump

Sample-rate conversion

python wave_16k.py
--wav_dump
--wav_dump_16k
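The contents of wave_16k.py are not shown here. As a rough sketch of what a 16 kHz conversion involves, the code below reads a mono 16-bit WAV file with Python's standard `wave` module and resamples it by linear interpolation. The helper names and the mono/16-bit assumption are hypothetical, and linear interpolation skips anti-aliasing filtering, so a real pipeline would more likely use a dedicated resampler such as sox or librosa.

```python
import array
import sys
import wave


def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sample sequence by linear interpolation.

    Hypothetical helper: no anti-aliasing low-pass filter is applied,
    so this only illustrates the change of sample rate.
    """
    if src_rate == dst_rate or len(samples) == 0:
        return [float(x) for x in samples]
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def convert_wav(src_path, dst_path, dst_rate=16000):
    """Hypothetical helper: rewrite a mono 16-bit WAV file at dst_rate."""
    with wave.open(src_path, "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("sketch assumes mono 16-bit PCM input")
        src_rate = f.getframerate()
        pcm = array.array("h", f.readframes(f.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    out = resample_linear(pcm, src_rate, dst_rate)
    # Round and clip back into the int16 range before writing.
    pcm16 = array.array("h", (max(-32768, min(32767, round(x))) for x in out))
    if sys.byteorder == "big":
        pcm16.byteswap()
    with wave.open(dst_path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(dst_rate)
        f.writeframes(pcm16.tobytes())
```

Applied to every file in wav_dump, `convert_wav` would fill wav_dump_16k with same-named files at 16 kHz.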

Use Muskits to convert the data into the VITS format

1. Split the labels
python muskit/data_label_single.py

label_dump, midi_dump, wav_dump: one annotation per file

Note: "label" and "lable" are used interchangeably; both spellings are correct as written.

VISinger_data
--label_dump
--midi_dump
--wav_dump
--wav_dump_16k
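The label format consumed by muskit/data_label_single.py is not described here. Assuming a Muskits/Kaldi-style combined label file in which each line reads `utt_id start end phone start end phone ...` (an assumption), splitting it into one file per utterance could look like the sketch below. The `.lab` extension, output layout, and helper names are hypothetical, and the real script may differ.

```python
import os


def parse_label_line(line):
    """Parse one assumed 'utt_id start end phone start end phone ...' line.

    Returns (utt_id, [(start_sec, end_sec, phone), ...]).
    """
    fields = line.split()
    utt_id, rest = fields[0], fields[1:]
    if len(rest) % 3 != 0:
        raise ValueError(f"{utt_id}: expected start/end/phone triples")
    segments = [
        (float(rest[i]), float(rest[i + 1]), rest[i + 2])
        for i in range(0, len(rest), 3)
    ]
    return utt_id, segments


def split_label_file(label_path, out_dir):
    """Write each utterance to its own <out_dir>/<utt_id>.lab file."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with open(label_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            utt_id, segments = parse_label_line(line)
            out_path = os.path.join(out_dir, utt_id + ".lab")
            with open(out_path, "w", encoding="utf-8") as out:
                for start, end, phone in segments:
                    out.write(f"{start} {end} {phone}\n")
            written.append(out_path)
    return written
```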

2. Convert the labels and MIDI into frame-level phoneme units and notes (fundamental pitch)
python muskit/data_format_vits.py
VISinger_data
--label_vits
--label_dump
--midi_dump
--wav_dump
--wav_dump_16k
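Step 2 turns time-stamped segments into frame-level sequences. A minimal sketch of that alignment, assuming segments given as `(start_sec, end_sec, unit)` and a 16 kHz sample rate with a hop length of 256 samples (both assumed, not read from the project's config), assigns each frame the unit whose segment covers the frame's start time. The same function works for phonemes and for note numbers; the function name is hypothetical.

```python
def segments_to_frames(segments, sample_rate=16000, hop_length=256):
    """Expand sorted, contiguous (start_sec, end_sec, unit) segments into
    one unit per frame.

    Frame i starts at time i * hop_length / sample_rate and takes the unit
    of the segment covering that time. sample_rate and hop_length are
    assumed values, not read from configs/singing_base.json.
    """
    if not segments:
        return []
    n_frames = int(segments[-1][1] * sample_rate / hop_length)
    frames = []
    seg_idx = 0
    for i in range(n_frames):
        t = i * hop_length / sample_rate
        # Move forward while the current segment has already ended at time t.
        while seg_idx < len(segments) - 1 and t >= segments[seg_idx][1]:
            seg_idx += 1
        frames.append(segments[seg_idx][2])
    return frames
```

Calling it once with phoneme segments and once with `(start, end, midi_note)` segments gives two sequences of matching length when both cover the same duration, which is the kind of frame-aligned label/pitch pair this step produces.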

3. Generate the file list VITS needs and split it into train and dev; a test set is not required (you can design one manually)
python muskit/data_format_vits.py

Format of each line in vits_file.txt: wave path|label path|pitch path
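A small sketch for writing and reading lines in this `wave path|label path|pitch path` format. The `VitsEntry` type and helper names are hypothetical; the only thing taken from the text above is the three-field, `|`-separated layout.

```python
from typing import NamedTuple


class VitsEntry(NamedTuple):
    """One line of vits_file.txt (field names are hypothetical)."""
    wave_path: str
    label_path: str
    pitch_path: str


def format_entry(entry):
    """Join an entry into a 'wave path|label path|pitch path' line."""
    for field in entry:
        if "|" in field:
            raise ValueError(f"path contains the '|' separator: {field}")
    return "|".join(entry)


def parse_entry(line):
    """Split one line back into its three paths."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(fields)}")
    return VitsEntry(*fields)
```

Rejecting paths that contain `|` keeps every written line parseable, since the separator has no escaping.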

cp vits_file.txt VISinger/filelists/
cd VISinger/

python preprocess.py (splits the list into train and dev)
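How preprocess.py chooses the dev set is not stated. One plausible approach, sketched here under that uncertainty, is a seeded shuffle followed by holding out a fixed number of lines. `dev_size`, `seed`, and the helper names are hypothetical.

```python
import random


def split_filelist(lines, dev_size=10, seed=1234):
    """Shuffle filelist lines reproducibly and hold out dev_size of them.

    Returns (train_lines, dev_lines). dev_size and seed are hypothetical
    defaults, not values taken from preprocess.py.
    """
    entries = [line for line in lines if line.strip()]
    if not 0 < dev_size < len(entries):
        raise ValueError("dev_size must be between 1 and len(lines) - 1")
    rng = random.Random(seed)  # local RNG so the split is reproducible
    shuffled = entries[:]
    rng.shuffle(shuffled)
    return shuffled[dev_size:], shuffled[:dev_size]


def write_split(src_path, train_path, dev_path, dev_size=10, seed=1234):
    """Read a filelist such as vits_file.txt and write train/dev lists."""
    with open(src_path, encoding="utf-8") as f:
        train, dev = split_filelist(f.read().splitlines(), dev_size, seed)
    for path, subset in ((train_path, train), (dev_path, dev)):
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(subset) + "\n")
    return len(train), len(dev)
```

Any output file names passed to `write_split` are placeholders; they would need to match whatever configs/singing_base.json actually references.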

VITS training

cd VISinger
CUDA_VISIBLE_DEVICES=0 python train.py -c configs/singing_base.json -m singing_base 2>exit_error.log;cat exit_error.log
python vsinging_infer.py

16 kHz audio is used to save memory and make the model easier to modify.
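To make the memory saving concrete, the arithmetic below compares one 5-second clip at 44.1 kHz with the same clip at 16 kHz. It assumes float32 waveforms and a hop length of 256 samples (both assumptions), and approximates the spectrogram frame count as samples divided by the hop length.

```python
def audio_budget(duration_sec, sample_rate, hop_length=256, bytes_per_sample=4):
    """Return (samples, approx. spectrogram frames, float32 waveform bytes)."""
    samples = int(duration_sec * sample_rate)
    frames = samples // hop_length  # approximate; exact count depends on padding
    return samples, frames, samples * bytes_per_sample


for rate in (44100, 16000):
    samples, frames, nbytes = audio_budget(5, rate)
    print(f"{rate} Hz: {samples} samples, {frames} frames, {nbytes} bytes")
```

This gives 220,500 samples and 861 frames at 44.1 kHz versus 80,000 samples and 312 frames at 16 kHz, roughly 2.76 times less data per utterance.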

Edit the MIDI, then test:

cd ../;python muskit/infer_midi.py;cd -;python vsinging_edit.py
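The note format that muskit/infer_midi.py reads is not documented here. As an illustration of the kind of MIDI edit one might test, the sketch below transposes a note sequence and converts MIDI note numbers to frequency with the standard equal-temperament formula f = 440 * 2^((n - 69) / 12). The `(start, end, note)` representation, the use of 0 for rests, and the helper names are assumptions.

```python
def midi_to_hz(note):
    """Equal-temperament conversion with A4 (MIDI 69) = 440 Hz; 0 means rest."""
    return 0.0 if note == 0 else 440.0 * 2.0 ** ((note - 69) / 12.0)


def transpose(notes, semitones):
    """Shift every non-rest note by a number of semitones (rests stay 0)."""
    out = []
    for start, end, note in notes:
        if note != 0:
            note += semitones
            # 0 is reserved for rests in this sketch, so valid notes are 1..127.
            if not 0 < note <= 127:
                raise ValueError(f"note {note} out of MIDI range")
        out.append((start, end, note))
    return out
```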

Loss values and mel spectrogram

Sample audio

vits_singing_样例.wav

Comments
  • couple of questions

    Hello, how are you! Very cool stuff you have here; I can clearly see you love singing voice synthesis (SVS) from your forks and repos! I wanted to ask: is this a fully working VISinger, or is it your attempt at making it sing? Can it be tested on custom English data and give results the same as, or close to, the demo in the paper? Also, do you have other samples I can hear? I know you tested it on Opencpop, which has about 5.2 hours of singing data, and in the paper they trained VISinger for 600k iterations, right? How many iterations did you train on Opencpop to get the result linked below (vits_singing_样例.wav)? To be honest, I thought VITS was data-hungry like Tacotron 2 or FastSpeech (i.e., it needs a lot of data to get great results), so your Opencpop result is very impressive for 5.2 hours of data. I also wonder whether you lowered the sample rate of Opencpop from 44.1 kHz to 22 kHz, as I heard training at 44.1 kHz takes a lot of time (about 10x).

    Can't wait to hear from you :)

    opened by dutchsing009
  • Question

    python prepare/data_vits.py outputs:
    1. ../VISinger_data/label_vits/XXX._label.npy|XXX_score.npy|XXX_pitch.npy|XXX_slurs.npy
    2. filelists/vits_file.txt, with line format: wave path|label path|score path|pitch path|slurs path

    How are steps 1 and 2 carried out?

    opened by baipeng0110
  • Training results

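    The model currently lacks a duration predictor and a pitch predictor. Below is the effect of changing the lyrics of a sentence from the training corpus: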

    Original lyrics: 雨淋湿了天空灰得更讲究

    https://user-images.githubusercontent.com/16432329/164953151-4c2513cb-f336-416b-8f04-604f13e63368.MP4

    Modified lyrics: 你闹够了没有让我更难受

    https://user-images.githubusercontent.com/16432329/164953155-16c72670-cc89-40bc-99fe-42781c9dcdc0.MP4

    help wanted 
    opened by MaxMax2016
  • About release models and VISinger

    Hi

    This is the most fantastic project I have ever seen.

    Could you please share the released model? The inference step says "using the released model".

    Also, is there any plan to implement the VISinger model?

    Thank you!

    opened by shiyanpei0826
Owner: AmorTX (Speech)