SinGlow: Generative Flow for SVS tasks in Tensorflow 2

Related tags

AudioSinGlow
Overview

SinGlow: Generative Flow for SVS tasks in Tensorflow 2

python 3 tensorflow 2

See more in the paper: SinGlow: Singing Voice Synthesis with Glow ---- Help Virtual Singers More Human-like

SinGlow is a part of my Singing voice synthesis system. It can extract features of sound, particularly songs and musics. Then we can use these features (or perfect encoding) for feature migrating tasks. For example migrate features of real singers' song to those virtual singers' songs.

This project is developed above the project GLOW-tf2 under MIT licence, and the following words are from its developers.

My implementation of GLOW from the paper https://arxiv.org/pdf/1807.03039 in Tensorflow 2. GLOW is an interesting generative model as it uses invertible neural network to transform images to normal distribution and vice versa. Additionally, it is strongly based on RealNVP, so knowing it would be helpful to understand GLOW's contribution.

Table of Contents


Abstract

Singing voice synthesis (SVS) is a task using the computer to generate songs with lyrics. So far, researchers are focusing on tunning the pre-recorded sound pieces according to rigid rules. For example, in Vocaloid, one of the commercial SVS systems, there are 8 principal parameters modifiable by song creators. The system uses these parameters to synthesise sound pieces pre-recorded from professional voice actors. We notice a common difference between computer-generated songs and real singers' songs. This difference can be addressed to help the generated ones become more like the real-singer ones.

In this paper, we propose SinGlow, as a solution to minimise this difference. SinGlow is one of the Normalising Flow that directly uses the calculated Negative Log-Likelihood value to optimise the trainable parameters. This feature gives SinGlow the ability to perfectly encode inputs into feature vectors, which allows us to manipulate the feature space to minimise the difference we discussed before. To our best knowledge, we are the first to propose an application of Normalising Flow in SVS fields.

In our experiments, SinGlow shows the ability to encode sound and make the input virtual-singer songs more human-like.

Structure

SinGlow
│   train.py //need modification, replace data dirs with yours
│   common_definitions.py //model configurations are located here
│   data_loarder.py //construct tfrecord dataset from wav or mp3 data, and load it
│   model.py //Glow Model / SinGlow Model
│   pipeline.py //training pipeline
│   README.md
├───utils
│       utils.py //originate from Glow-OpenAI
│       weightnorm.py //originate from Tensorflow
├───checkpoints
│       weights.h5 //the model weights file
├───runs //outputs and rerecords
├───logs //tensorboard logdir
├───design //model architecture information
└───notebooks
        run.ipynb //dataset construction and applying model
        experiment.ipynb //evaluate model
        README.md //some user guide

Requirements

pip3 install -r requirements.txt

Training

After every epoch, the network's weights will be stored in the checkpoints directory defined in common_definitions.py.

There are also some sampling of the network (image generation mode) that are going to be stored in results directory. Additionally, TensorBoard is used to track z's mean and variance, as well as the negative log-likelihood.

In optimal state, z should have zero mean and one variance. Additionally, the TensorBoard stores sampling with temperature of 0.7.

python3 train.py [-h] [--dataset [DATASET]] [--k_glow [K_GLOW]] [--l_glow [L_GLOW]]
       [--img_size [IMG_SIZE]] [--channel_size [CHANNEL_SIZE]]

optional arguments:
  -h, --help            show this help message and exit
  --dataset [DATASET]   The dataset to train on ("mnist", "cifar10", "cifar100")
  --k_glow [K_GLOW]     The amount of blocks per layer
  --l_glow [L_GLOW]     The amount of layers
  --img_size [IMG_SIZE] The width and height of the input images
  --channel_size [CHANNEL_SIZE]
                        The channel size of the input images

CONTRIBUTING

To contribute to the project, these steps can be followed. Anyone that contributes will surely be recognized and mentioned here!

Contributions to the project are made using the "Fork & Pull" model. The typical steps would be:

  1. create an account on github
  2. fork this repository
  3. make a local clone
  4. make changes on the local copy
  5. commit changes git commit -m "my message"
  6. push to your GitHub account: git push origin
  7. create a Pull Request (PR) from your GitHub fork (go to your fork's webpage and click on "Pull Request." You can then add a message to describe your proposal.)

LICENSE

This open-source project is licensed under MIT License.

Reference

TODO the reference information

中文注释

这是一个基于流模型的歌曲特征提取,并进行风格迁移的项目。我们一定程度上实现了将真实人声歌曲的特征迁移到虚拟歌手的歌曲上。

我们接下来的计划是继续优化模型,并在歌曲切割上取得进展,向着研究落地努力。


我们有一个堆满创意点子的秘密基地,里面有很多有意思的小伙伴。生活什么的、技术什么的、二次元什么的都可以聊得开。

欢迎加入我们的小群:兔叽的魔术工房。群内会经常发布各种各样的企划,总会遇上你感兴趣的。

Owner
Haobo Yang
A 3rd-year undergraduate student, hope to be an AI Architect in the future.
Haobo Yang
📺Headless全自动B站直播录播、切片、上传一体工具

DDRecorder Headless全自动B站直播录播、切片、上传一体工具 感谢 FortuneDayssss/BilibiliUploader 安装指南(Windows) 在Release下载zip包解压。 修改配置文件config.json 双击运行DDRecorder.exe (这将使用co

322 Dec 27, 2022
Powerful, simple, audio tag editor for GNU/Linux

puddletag puddletag is an audio tag editor (primarily created) for GNU/Linux similar to the Windows program, Mp3tag. Unlike most taggers for GNU/Linux

341 Dec 26, 2022
PyAbsorp is a python module that has the main focus to help estimate the Sound Absorption Coefficient.

This is a package developed to be use to find the Sound Absorption Coefficient through some implemented models, like Biot-Allard, Johnson-Champoux and

Michael Markus Ackermann 8 Oct 19, 2022
Basically Play Pauses the song when it is safe to do so. when you die in a round

Basically Play Pauses the song when it is safe to do so. when you die in a round

AG_1436 1 Feb 13, 2022
Read music meta data and length of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and Wave files with python 2 or 3

tinytag tinytag is a library for reading music meta data of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and Wave files with python Install pip install tinytag

Tom Wallroth 577 Dec 26, 2022
Hide Your Secret Message in any Wave Audio File.

HiddenWave Embedding secret messages in wave audio file What is HiddenWave Hiddenwave is a python based program for simple audio steganography. You ca

TechChip 99 Dec 28, 2022
Port Hitsuboku Kumi Chinese CVVC voicebank to deepvocal. / 筆墨クミDeepvocal中文音源

Hitsuboku Kumi (筆墨クミ) is a UTAU virtual singer developed by Cubialpha. This project ports Hitsuboku Kumi Chinese CVVC voicebank to deepvocal. This is the first open-source deepvocal voicebank on Gith

8 Apr 26, 2022
A rofi-blocks script that searches youtube and plays the selected audio on mpv.

rofi-ytm A rofi-blocks script that searches youtube and plays the selected audio on mpv. To use the script, run the following command rofi -modi block

Cliford 26 Dec 21, 2022
This is my voice assistant Patric!

voice-assistant This is my voice assistant Patric! You can add can add commands and even modify his name Indice How to use Installation guide How to u

Norbert Gabos 1 Jun 28, 2022
A Python port and library-fication of the midicsv tool by John Walker.

A Python port and library-fication of the midicsv tool by John Walker. If you need to convert MIDI files to human-readable text files and back, this is the library for you.

Tim Wedde 52 Dec 29, 2022
Python audio and music signal processing library

madmom Madmom is an audio signal processing library written in Python with a strong focus on music information retrieval (MIR) tasks. The library is i

Institute of Computational Perception 1k Dec 26, 2022
Jarvis From Basic to Advance - make a voice assistant similar to JARVIS (in iron man movie)

JARVIS (Basic to Advance) This was my attempt to make a voice assistant similar to JARVIS (in iron man movie) Let's be honest, it's not as intelligent

codesempai 17 Dec 25, 2022
Mopidy is an extensible music server written in Python

Mopidy Mopidy is an extensible music server written in Python. Mopidy plays music from local disk, Spotify, SoundCloud, Google Play Music, and more. Y

Mopidy 7.6k Jan 05, 2023
Audio spatialization over WebRTC and JACK Audio Connection Kit

Audio spatialization over WebRTC Spatify provides a framework for building multichannel installations using WebRTC.

Bruno Gola 34 Jun 29, 2022
Code for "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose"

Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose We provide PyTorch implementations for our arxiv paper "Audio-dr

Ran Yi 497 Jan 09, 2023
DCL - An easy to use diacritic library used for diacritic and accent manipulation.

Diacritics Library This library is used for adding, and removing diacritics from strings. Getting started Start by importing the module: import dcl DC

Kreus Amredes 6 Jun 03, 2022
Music bot of # Owner

Pokimane-Music Music bot of # Owner How To Host The easiest way to deploy this Bot Support Channel :- TeamDlt Support Group :- TeamDlt Please fork thi

5 Dec 23, 2022
Anaphones are like anagrams, but for sounds.

Anaphones Anaphones are like anagrams but for sounds (phonemes). Examples include: salami-awesomely, atari-tiara, and beefy-phoebe. Anaphones can be a

James Murphy 18 Nov 02, 2022
Spotipy - Player de música simples em Python

Spotipy Player de música simples em Python, utilizando a biblioteca Pysimplegui para a interface gráfica. Este tocador é bastante simples em si, mas p

Adelino Almeida 4 Feb 28, 2022
An Amazon Music client for Linux (unpretentious)

Amusiz An Amazon Music client for Linux (unpretentious) ↗️ Install You can install Amusiz in multiple ways, choose your favorite. 🚀 AppImage Here you

Mirko Brombin 25 Nov 08, 2022