MLP Singer

Overview

This code is an implementation of singing voice synthesis (singing TTS). The algorithm is based on the following papers:

Tae, J., Kim, H., & Lee, Y. (2021). MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis. arXiv preprint arXiv:2106.07886.
Tolstikhin, I., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., ... & Dosovitskiy, A. (2021). MLP-Mixer: An all-MLP architecture for vision. arXiv preprint arXiv:2105.01601.

Structure

  • The structure is based on MLP Singer (a minimal Mixer block sketch follows this list).
  • I changed several hyper parameters and the data handling:
    • Either mel spectrogram or linear spectrogram can be selected as the feature type.
    • The token type is changed from phoneme to grapheme.
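
For readers unfamiliar with the MLP-Mixer paper cited above, the following is a minimal PyTorch sketch of a single Mixer block: a token-mixing MLP over the time axis followed by a channel-mixing MLP over the feature axis. It is only an illustration of the cited architecture with assumed layer names and sizes; it is not the code used in this repository, whose actual Mixer settings live under the Mixer section of Hyper_Parameters.yaml.

import torch

class MixerBlock(torch.nn.Module):
    # Illustrative MLP-Mixer block (Tolstikhin et al., 2021); not this repository's code.
    # x has shape [batch, num_tokens, num_channels]; num_tokens is the (fixed) time axis.
    def __init__(self, num_tokens, num_channels, token_hidden, channel_hidden):
        super().__init__()
        self.token_norm = torch.nn.LayerNorm(num_channels)
        self.token_mlp = torch.nn.Sequential(
            torch.nn.Linear(num_tokens, token_hidden),
            torch.nn.GELU(),
            torch.nn.Linear(token_hidden, num_tokens))
        self.channel_norm = torch.nn.LayerNorm(num_channels)
        self.channel_mlp = torch.nn.Sequential(
            torch.nn.Linear(num_channels, channel_hidden),
            torch.nn.GELU(),
            torch.nn.Linear(channel_hidden, num_channels))

    def forward(self, x):
        # Token mixing: transpose so the MLP mixes information across time steps.
        x = x + self.token_mlp(self.token_norm(x).transpose(1, 2)).transpose(1, 2)
        # Channel mixing: the MLP is applied independently at each time step.
        x = x + self.channel_mlp(self.channel_norm(x))
        return x

Because the token-mixing layers are plain Linear layers over the time axis, the number of time steps must be fixed, which is why the Max duration hyper parameter below determines the model's maximum time step.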

Used dataset

Hyper parameters

Before proceeding, please set the pattern, inference, and checkpoint paths in Hyper_Parameters.yaml according to your environment.

  • Sound

    • Setting basic sound parameters.
  • Tokens

    • The number of lyric tokens.
  • Max_Note

    • The highest note value for embedding.
  • Duration

    • Min duration is used only during pattern generation.
    • Max duration determines the maximum time step of the model. The MLP mixer always uses the maximum time step.
    • Equality sets the strategy for splitting a syllable's duration over its graphemes (see the sketch after this list).
      • When True, onset, nucleus, and coda have the same length, or differ by at most 1.
      • When False, onset and coda each have length Consonant_Duration, and the nucleus has duration - 2 * Consonant_Duration.
  • Feature_Type

    • Setting the feature type (Mel or Spectrogram).
  • Encoder

    • Setting the encoder (embedding).
  • Mixer

    • Setting the MLP mixer.
  • Train

    • Setting the parameters of training.
  • Inference_Batch_Size

    • Setting the batch size used during inference.
  • Inference_Path

    • Setting the inference path.
  • Checkpoint_Path

    • Setting the checkpoint path.
  • Log_Path

    • Setting the TensorBoard log path.
  • Use_Mixed_Precision

    • Setting whether to use mixed precision.
  • Use_Multi_GPU

    • Setting whether to use multiple GPUs.
    • Because of an nvcc issue, only Linux supports this option.
    • If this is True, the Device parameter must also list multiple devices, e.g. '0,1,2,3'.
    • The training command must also be changed; please check multi_gpu.sh.
  • Device

    • Setting which GPU devices are used in a multi-GPU environment.
    • To use only the CPU, set '-1'. (Not recommended for training.)
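
To make the Duration options above concrete, here is a minimal sketch of how one syllable's duration could be split into onset, nucleus, and coda lengths under the two Equality strategies. The function name and the exact way leftover steps are distributed in the True case are assumptions for illustration; they are not taken from this repository's pattern-generation code.

def split_syllable_duration(duration, equality, consonant_duration):
    # Returns (onset, nucleus, coda) lengths in time steps for one syllable.
    # Illustrative sketch only; the real pattern generator may differ in details.
    if equality:
        # Equality == True: the three graphemes get equal lengths, or lengths
        # that differ by at most 1 when the duration is not divisible by 3.
        base, remainder = divmod(duration, 3)
        lengths = [base, base, base]
        for index in range(remainder):  # hand out the leftover steps
            lengths[index] += 1
        onset, nucleus, coda = lengths
    else:
        # Equality == False: onset and coda are fixed to Consonant_Duration,
        # and the nucleus takes duration - 2 * Consonant_Duration.
        onset = coda = consonant_duration
        nucleus = duration - 2 * consonant_duration
    return onset, nucleus, coda

# Example: a 10-step syllable with Consonant_Duration = 2
print(split_syllable_duration(10, equality=True, consonant_duration=2))   # (4, 3, 3)
print(split_syllable_duration(10, equality=False, consonant_duration=2))  # (2, 6, 2)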

Generate pattern

  • The current version does not support any open-source dataset.

Inference file paths for verification during training

  • Inference_for_Training
    • There are three examples for inference.
    • They are MIDI-file-based scripts.

Run

Command

Single GPU

python Train.py -hp <hyper_parameters_file> -s <steps>
  • -hp

    • The hyper parameter file path.
    • This is required.
  • -s

    • The resume step parameter.
    • The default is 0.
    • If the value is 0, the model tries to find the latest checkpoint.
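
For example, assuming Hyper_Parameters.yaml is in the repository root, a run that starts from step 0, or resumes from the latest checkpoint if one exists (since -s is 0), would look like:

python Train.py -hp Hyper_Parameters.yaml -s 0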

Multi GPU

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 OMP_NUM_THREADS=32 python -m torch.distributed.launch --nproc_per_node=8 Train.py --hyper_parameters Hyper_Parameters.yaml --port 54322