Code for "Unsupervised Source Separation via Bayesian inference in the latent domain"

Overview

LQVAE-separation

Paper

Samples

         GT Compressed          Separated
Drums    GT Compressed Drums    Separated Drums
Bass     GT Compressed Bass     Separated Bass
Mix      GT Compressed Mix      Separated Mix

The separation is performed in a latent domain compressed by a factor of 64. The results can be upsampled with the Jukebox upsamplers to improve perceptual quality (WIP).
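As a rough illustration of what the x64 compression means, the arithmetic below counts the latent tokens that cover a 3-second chunk. It assumes the 22050 Hz sample rate and the factor of 64 (--raw_to_tokens=64) that appear elsewhere in this README. The function name is illustrative and not part of the repository, and how the scripts round or pad the chunk length is not stated here.

```python
def latent_length(seconds, sample_rate=22050, raw_to_tokens=64):
    """Number of whole latent tokens covering `seconds` of audio.

    Assumption: one token per `raw_to_tokens` raw samples, rounded down.
    """
    n_samples = int(seconds * sample_rate)
    return n_samples // raw_to_tokens


print(latent_length(3))  # 1033 tokens for 66150 raw samples
```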

Install

Install the conda package manager from https://docs.conda.io/en/latest/miniconda.html

conda create --name lqvae-separation python=3.7.5
conda activate lqvae-separation
pip install mpi4py==3.0.3
pip install ffmpeg-python==0.2.0
pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2
pip install -r requirements.txt
pip install -e .

Checkpoints

  • Go into the script/ folder and create the checkpoints/ and results/ folders there.
  • Download the checkpoints contained in this Google Drive folder and put them inside checkpoints/

Separation with checkpoints

  • Run the following to perform bs separations of 3 seconds each, starting at second shift of the mixture created from the sources in path_1 and path_2. The sources must be WAV files sampled at 22.05 kHz (22050 Hz).
    PYTHONPATH=.. python bayesian_inference.py --shift=shift --path_1=path_1 --path_2=path_2 --bs=bs
    
  • The default value for bs is 64, which fits on an RTX 3080 with 16 GB of VRAM. Lower the value if you get a CUDA out-of-memory error.
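Before launching a separation, it can help to confirm that both sources have the expected sample rate. The sketch below uses only the standard-library wave module, so it works for uncompressed PCM WAV files. The helper names are hypothetical, and 22050 Hz is assumed to be what "22kHz" means here, based on the --sample_rate=22050 flag used for codebook_precalc.py.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_source(path, expected_rate=22050):
    """Raise ValueError if the WAV file is not sampled at expected_rate."""
    rate = wav_sample_rate(path)
    if rate != expected_rate:
        raise ValueError(
            f"{path}: sample rate is {rate} Hz, expected {expected_rate} Hz"
        )
    return rate
```

Calling check_source on path_1 and path_2 before running bayesian_inference.py would surface a mismatched file early, instead of partway through a run.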

Training

LQ-VAE

  • The vqvae/vqvae.py file of Jukebox has been modified to include the linearization loss of the LQ-VAE. The loss is computed at all levels of the hierarchical VQ-VAE, but only the topmost level matters, since separation is performed there. To train a new LQ-VAE on custom data (here data/train for training and data/test for testing), run the following from the root of the project:
PYTHONPATH=. mpiexec -n 1 python jukebox/train.py --hps=vqvae --sample_length=131072 --bs=8 \
--audio_files_dir=data/train/ --labels=False --train --test --aug_shift --aug_blend --name=lq_vae --test_audio_files_dir=data/test
  • The trained model uses the vqvae hyperparameters in hparams.py so if you want to change the levels / downsampling factors you have to modify them there.
  • The only constraint for training the LQ-VAE is to use an even number for the batch size, given its use of pairs in the loss.
  • Given that L_lin enforces the sum operation on the latent domain, you can use the data of both sources together (or any other audio data).
  • Checkpoints are saved in logs/lq_vae (lq_vae is the name parameter).
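The even-batch-size constraint follows from the loss operating on pairs of items. The sketch below shows one way a batch could be split into pairs. It is only an illustration: the pairing scheme (first half against second half) and the function name are assumptions, and the actual logic in vqvae/vqvae.py may differ.

```python
def split_into_pairs(batch):
    """Pair the first half of a batch with the second half.

    Hypothetical pairing scheme, used here only to show why an odd
    batch size cannot work when the loss consumes items two at a time.
    """
    if len(batch) % 2 != 0:
        raise ValueError(f"batch size must be even, got {len(batch)}")
    half = len(batch) // 2
    return list(zip(batch[:half], batch[half:]))


print(split_into_pairs(["a", "b", "c", "d"]))
# [('a', 'c'), ('b', 'd')]
```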

Priors

  • After training the LQ-VAE, train two priors on two different classes by calling
PYTHONPATH=. mpiexec -n 1 python jukebox/train.py --hps=vqvae,small_prior,all_fp16,cpu_ema --name=pior_source \
--audio_files_dir=data/source/train --test_audio_files_dir=data/source/test --labels=False --train --test --aug_shift \
--aug_blend --prior --levels=3 --level=2 --weight_decay=0.01 --save_iters=1000 --min_duration=24 --sample_length=1048576 \
--bs=16 --n_ctx=8192 --sample=True --sample_iters=1000 --restore_vqvae=logs/lq_vae/checkpoint_lq_vae.pth.tar
  • Here the data of the source is located in data/source/train and data/source/test and we assume the LQ-VAE has 3 levels (topmost level = 2).
  • The Transformer model is defined by the parameters of small_prior in hparams.py and uses a context of n_ctx=8192 codes.
  • The checkpoint path of the LQ-VAE trained in the previous step must be passed to --restore_vqvae
  • Checkpoints are saved in logs/pior_source (pior_source is the name parameter).

Codebook sums

  • Before separation, the sums between all codes must be computed using the LQ-VAE. This can be done using the codebook_precalc.py in the script folder:
PYTHONPATH=.. python codebook_precalc.py --save_path=checkpoints/codebook_sum_precalc.pt \
--restore_vqvae=../logs/lq_vae/checkpoint_lq_vae.pth.tar --raw_to_tokens=64 --l_bins=2048 \
--sample_rate=22050 --alpha="[0.5, 0.5]" --downs_t="(2, 2, 2)" --commit=1.0 --emb_width=64
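To give a sense of scale for "all codes", the sketch below counts the code pairs for --l_bins=2048 and estimates the size of a dense table of their sums. The storage layout is an assumption made for illustration (one float32 vector of width --emb_width=64 per ordered pair); the file written by codebook_precalc.py may store something different, and the function name is hypothetical.

```python
def sum_table_stats(l_bins=2048, emb_width=64, bytes_per_value=4):
    """Count ordered code pairs and estimate a dense sum table's size.

    Assumption: one emb_width-dimensional vector per ordered pair,
    with bytes_per_value bytes per entry (4 for float32).
    """
    n_pairs = l_bins * l_bins
    n_bytes = n_pairs * emb_width * bytes_per_value
    return n_pairs, n_bytes


n_pairs, n_bytes = sum_table_stats()
print(n_pairs)          # 4194304
print(n_bytes / 2**30)  # 1.0 (GiB)
```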

Separation with trained checkpoints

  • Trained checkpoints can be passed to bayesian_inference.py as follows:
    PYTHONPATH=.. python bayesian_inference.py --shift=shift --path_1=path_1 --path_2=path_2 --bs=bs --restore_vqvae=checkpoints/checkpoint_step_60001_latent.pth.tar \
    --restore_priors 'checkpoints/checkpoint_drums_22050_latent_78_19k.pth.tar' 'checkpoints/checkpoint_latest.pth.tar' --sum_codebook=checkpoints/codebook_precalc_22050_latent.pt
    
  • restore_priors accepts two paths to the first and second prior checkpoints.
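When scripting several separations, the command line above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository. It uses only the flags shown in this README and assumes --restore_priors takes its two paths as separate, space-separated arguments, as in the command above.

```python
def build_separation_command(shift, path_1, path_2, bs=64,
                             restore_vqvae=None, restore_priors=None,
                             sum_codebook=None):
    """Return the argument list for bayesian_inference.py."""
    cmd = [
        "python", "bayesian_inference.py",
        f"--shift={shift}", f"--path_1={path_1}",
        f"--path_2={path_2}", f"--bs={bs}",
    ]
    if restore_vqvae is not None:
        cmd.append(f"--restore_vqvae={restore_vqvae}")
    if restore_priors is not None:
        if len(restore_priors) != 2:
            raise ValueError("restore_priors needs exactly two checkpoint paths")
        # The two prior checkpoints follow the flag as separate arguments.
        cmd += ["--restore_priors", *restore_priors]
    if sum_codebook is not None:
        cmd.append(f"--sum_codebook={sum_codebook}")
    return cmd


# Possible use from the project root (not executed here):
# import os, subprocess
# env = {**os.environ, "PYTHONPATH": ".."}
# subprocess.run(build_separation_command(0, "a.wav", "b.wav"),
#                cwd="script", env=env, check=True)
```

Passing the arguments as a list avoids shell quoting issues such as an unbalanced quote around a checkpoint path.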

Evaluation

  • To evaluate the pre-trained checkpoints, place the full Slakh validation split for bass and drums in data/bass/validation and data/drums/validation, respectively, then run bayesian_test.py.
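A quick check that the validation folders are in place can save a failed evaluation run. The sketch below counts audio files under the two folders named above. The helper name, the .wav extension, and the recursive search are assumptions, since the exact layout expected by bayesian_test.py is not described here.

```python
from pathlib import Path


def count_validation_wavs(data_root="data", sources=("bass", "drums")):
    """Map each source to the number of .wav files under
    <data_root>/<source>/validation (searched recursively)."""
    counts = {}
    for source in sources:
        folder = Path(data_root) / source / "validation"
        if not folder.is_dir():
            raise FileNotFoundError(f"missing folder: {folder}")
        counts[source] = sum(1 for _ in folder.rglob("*.wav"))
    return counts
```

A count of zero for either source suggests the split was copied to the wrong place or uses a different file extension.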

Future work

  • training of upsamplers for increasing the quality of the separations
  • better rejection sampling method (maybe use verifiers as in https://arxiv.org/abs/2110.14168)

Citations

If you find the code useful for your research, please consider citing

@article{mancusi2021unsupervised,
  title={Unsupervised Source Separation via Bayesian Inference in the Latent Domain},
  author={Mancusi, Michele and Postolache, Emilian and Fumero, Marco and Santilli, Andrea and Cosmo, Luca and Rodol{\`a}, Emanuele},
  journal={arXiv preprint arXiv:2110.05313},
  year={2021}
}

as well as the Jukebox baseline:

  • Dhariwal, P., Jun, H., Payne, C., Kim, J. W., Radford, A., & Sutskever, I. (2020). Jukebox: A generative model for music. arXiv preprint arXiv:2005.00341.
Owner
Michele Mancusi
PhD student in Computer Science @ La Sapienza University of Rome, MSc in Quantum Information @ La Sapienza University of Rome