MLP Singer

Overview

This code is an implementation of singing voice synthesis (singing TTS). The algorithm is based on the following papers:

Tae, J., Kim, H., & Lee, Y. (2021). MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis. arXiv preprint arXiv:2106.07886.
Tolstikhin, I., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., ... & Dosovitskiy, A. (2021). MLP-Mixer: An all-MLP architecture for vision. arXiv preprint arXiv:2105.01601.

Structure

  • The structure is based on MLP Singer (a minimal Mixer block sketch follows this list).
  • I changed several hyper parameters and the data handling:
    • Either mel spectrogram or linear spectrogram can be selected as the feature type.
    • The token type is changed from phoneme to grapheme.
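
For readers unfamiliar with the MLP-Mixer paper cited above, the following is a minimal PyTorch sketch of a single Mixer block: a token-mixing MLP over the time axis followed by a channel-mixing MLP over the feature axis. It is only an illustration of the cited architecture with assumed layer names and sizes; it is not the code used in this repository, whose actual Mixer settings live under the Mixer section of Hyper_Parameters.yaml.

import torch

class MixerBlock(torch.nn.Module):
    # Illustrative MLP-Mixer block (Tolstikhin et al., 2021); not this repository's code.
    # x has shape [batch, num_tokens, num_channels]; num_tokens is the (fixed) time axis.
    def __init__(self, num_tokens, num_channels, token_hidden, channel_hidden):
        super().__init__()
        self.token_norm = torch.nn.LayerNorm(num_channels)
        self.token_mlp = torch.nn.Sequential(
            torch.nn.Linear(num_tokens, token_hidden),
            torch.nn.GELU(),
            torch.nn.Linear(token_hidden, num_tokens))
        self.channel_norm = torch.nn.LayerNorm(num_channels)
        self.channel_mlp = torch.nn.Sequential(
            torch.nn.Linear(num_channels, channel_hidden),
            torch.nn.GELU(),
            torch.nn.Linear(channel_hidden, num_channels))

    def forward(self, x):
        # Token mixing: transpose so the MLP mixes information across time steps.
        x = x + self.token_mlp(self.token_norm(x).transpose(1, 2)).transpose(1, 2)
        # Channel mixing: the MLP is applied independently at each time step.
        x = x + self.channel_mlp(self.channel_norm(x))
        return x

Because the token-mixing layers are plain Linear layers over the time axis, the number of time steps must be fixed, which is why the Max duration hyper parameter below determines the model's maximum time step.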

Used dataset

Hyper parameters

Before proceeding, please set the pattern, inference, and checkpoint paths in Hyper_Parameters.yaml according to your environment.

  • Sound

    • Setting basic sound parameters.
  • Tokens

    • The number of lyric tokens.
  • Max_Note

    • The highest note value for embedding.
  • Duration

    • Min duration is used only during pattern generation.
    • Max duration determines the maximum time step of the model. The MLP mixer always uses the maximum time step.
    • Equality sets the strategy for splitting a syllable's duration over its graphemes (see the sketch after this list).
      • When True, onset, nucleus, and coda have the same length, or differ by at most 1.
      • When False, onset and coda each have length Consonant_Duration, and the nucleus has duration - 2 * Consonant_Duration.
  • Feature_Type

    • Setting the feature type (Mel or Spectrogram).
  • Encoder

    • Setting the encoder (embedding).
  • Mixer

    • Setting the MLP mixer.
  • Train

    • Setting the parameters of training.
  • Inference_Batch_Size

    • Setting the batch size used during inference.
  • Inference_Path

    • Setting the inference path.
  • Checkpoint_Path

    • Setting the checkpoint path.
  • Log_Path

    • Setting the TensorBoard log path.
  • Use_Mixed_Precision

    • Setting whether to use mixed precision.
  • Use_Multi_GPU

    • Setting whether to use multiple GPUs.
    • Because of an nvcc issue, only Linux supports this option.
    • If this is True, the Device parameter must also list multiple devices, e.g. '0,1,2,3'.
    • The training command must also be changed; please check multi_gpu.sh.
  • Device

    • Setting which GPU devices are used in a multi-GPU environment.
    • To use only the CPU, set '-1'. (Not recommended for training.)
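
To make the Duration options above concrete, here is a minimal sketch of how one syllable's duration could be split into onset, nucleus, and coda lengths under the two Equality strategies. The function name and the exact way leftover steps are distributed in the True case are assumptions for illustration; they are not taken from this repository's pattern-generation code.

def split_syllable_duration(duration, equality, consonant_duration):
    # Returns (onset, nucleus, coda) lengths in time steps for one syllable.
    # Illustrative sketch only; the real pattern generator may differ in details.
    if equality:
        # Equality == True: the three graphemes get equal lengths, or lengths
        # that differ by at most 1 when the duration is not divisible by 3.
        base, remainder = divmod(duration, 3)
        lengths = [base, base, base]
        for index in range(remainder):  # hand out the leftover steps
            lengths[index] += 1
        onset, nucleus, coda = lengths
    else:
        # Equality == False: onset and coda are fixed to Consonant_Duration,
        # and the nucleus takes duration - 2 * Consonant_Duration.
        onset = coda = consonant_duration
        nucleus = duration - 2 * consonant_duration
    return onset, nucleus, coda

# Example: a 10-step syllable with Consonant_Duration = 2
print(split_syllable_duration(10, equality=True, consonant_duration=2))   # (4, 3, 3)
print(split_syllable_duration(10, equality=False, consonant_duration=2))  # (2, 6, 2)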

Generate pattern

  • The current version does not support any open-source dataset.

Inference file paths for verification during training

  • Inference_for_Training
    • There are three examples for inference.
    • They are MIDI-file-based scripts.

Run

Command

Single GPU

python Train.py -hp <hyper_parameters_file> -s <steps>
  • -hp

    • The hyper parameter file path.
    • This is required.
  • -s

    • The resume step parameter.
    • The default is 0.
    • If the value is 0, the model tries to find the latest checkpoint.
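
For example, assuming Hyper_Parameters.yaml is in the repository root, a run that starts from step 0, or resumes from the latest checkpoint if one exists (since -s is 0), would look like:

python Train.py -hp Hyper_Parameters.yaml -s 0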

Multi GPU

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 OMP_NUM_THREADS=32 python -m torch.distributed.launch --nproc_per_node=8 Train.py --hyper_parameters Hyper_Parameters.yaml --port 54322