Music Source Separation; Train & Eval & Inference piplines and pretrained models we used for 2021 ISMIR MDX Challenge.

Overview

Open In Colab

Update on 2021.09

Here is the package torchsubband I wrote for subband decomposition.

https://github.com/haoheliu/torchsubband

Music Source Separation with Channel-wise Subband Phase Aware ResUnet (CWS-PResUNet)

ranking

Introduction

This repo contains the pretrained Music Source Separation models I submitted to the 2021 ISMIR MSS Challenge. We only participate the Leaderboard A, so these models are solely trained on MUSDB18HQ.

You can use this repo to separate 'bass', 'drums', 'vocals', and 'other' tracks from a music mixture. Also we provides our vocals and other models' training pipline. You can train your own model easily.

As is shown in the following picture, in leaderboard A, we(ByteMSS) achieved the 2nd on Vocal score and 5th on average score. For bass and drums separation, we directly use the open-sourced demucs model. It's trained with only MUSDB18HQ data, thus is qualified for LeaderBoard A.

ranking

1. Usage (For MSS)

1.1 Prepare running environment

First you need to clone this repo:

git clone https://github.com/haoheliu/2021-ISMIR-MSS-Challenge-CWS-PResUNet.git

Install the required packages

cd 2021-ISMIR-MSS-Challenge-CWS-PResUNet
pip3 install --upgrade virtualenv==16.7.9 # this version virtualenv support the --no-site-packages option
virtualenv --no-site-packages env_mss # create new environment
source env_mss/bin/activate # activate environment
pip3 install -r requirements.txt # install requirements

You'd better have wget and unzip command installed so that the scripts can automatically download pretrained models and unzip them.

1.2 Use pretrained model

To use the pretrained model to conduct music source separation. You can run the following demos. If it's the first time you run this program, it will automatically download the pretrained models.

python3 main -i <input-wav-file-path/folder> 
             -o <output-path-dir> 
             -s <sources-to-separate>  # vocals bass drums other (all four stems by default)
             --cuda  # if wanna use GPU, use this flag
             # --wiener  # if wanna use wiener filtering, use this flag. 
             # '--wiener' can take effect only when separation of all four tracks are done or you separate four tracks at the same time.
             
# <input-wav-file-path> is the .wav file to be separated or a folder containing all .wav mixtures.
# <output-path-dir> is the folder to store the separation results 
# python3 main.py -i <input-wav-file-path> -o <output-path-dir>
# Separate a single file to four sources
python3 main.py -i example/test/zeno_sign_stereo.wav -o example/results -s vocals bass drums other
# Separate all the files in a folder
python3 main.py -i example/test/ -o example/results
# Use GPU Acceleration
python3 main.py -i example/test/zeno_sign_stereo.wav -o example/results --cuda
# Separate all the files in a folder using GPU and wiener filtering post processing (may introduce new distortions, make the results even worse.)
python3 main.py -i example/test -o example/results --cuda # --wiener

Each pretrained model in this repo take us approximately two days on 8 V100 GPUs to train.

1.3 Train new MSS models from scratch

1.3.1 How to train

For the training data:

  • If you havn't download musdb18hq, we will automatically download the dataset for you by running the following command.
  • If you have already download musdb18hq, you can put musdb18hq.zip or musdb18hq folder into the data folder and run init.sh to prepare this dataset.
source init.sh

Finally run either of these two commands to start training.

# For track 'vocals', we use a 4 subbands resunet to perform separation. 
# The input of model is mixture and its output is vocals waveform.
# Note: Batchsize is set to 16 by default. Check your hard ware configurations to avoid GPU OOM.
source models/resunet_conv8_vocals/run.sh

# For track 'other', we also use a 4 subbands resunet to perform separation.
# But for this track, we did a little modification.
# The input of model is mixture, and its output are bass, other and drums waveforms. (bass and drums are only used during training) 
# We calculate the losses for "bass","other", and "drums" these three sources together.
# Result shows that joint training is beneficial for 'other' track.
# Note: Batchsize is set to 16 by default. Check your hard ware configurations to avoid GPU OOM.
source models/resunet_joint_training_other/run.sh
  • By default, we use batchsize 8 and 8 gpus for vocal and batchsize 16 and 8 gpus for other. You can custom your own by modifying parameters in the above run.sh files.

  • Training logs will be presented in the mss_challenge_log folder. System will perform validations every two epoches.

Here we provide the result of a test run: 'source models/resunet_conv8_vocals/run.sh'.

ranking

1.3.2 Use the model you trained

To use the the vocals and the other model you trained by your own. You need to modify the following two variables in the predictor.py to the path of your models.

41 ...
42  v_model_path = <path-to-your-vocals-model>
43  o_model_path = <path-to-your-other-model>
44 ...

1.4 Model Evaluation

Since the evaluation process is slow, we separate the evaluation process out as a single task. It's conducted on the validation results generated during training.

Steps:

  1. Locate the path of the validation result. After training, you will get a validation folder inside your loging directory (mss_challenge_log by default).

  2. Determine which kind of source you wanna evaluate (bass, vocals, others or drums). Make sure its results present in the validation folder.

  3. Run eval.sh with two arguments: the source type and the validation results folder (automatic generated after training in the logging folder).

For example:

# source eval.sh <source-type> <your-validation-results-folder-after-training> 

# evaluate vocal score
source eval.sh vocals mss_challenge_log/2021-08-11-subband_four_resunet_for_vocals-vocals/version_0/validations
# evaluate bass score
source eval.sh bass mss_challenge_log/2021-08-11-subband_four_resunet_for_vocals-vocals/version_0/validations
# evaluate drums score
source eval.sh drums mss_challenge_log/2021-08-11-subband_four_resunet_for_vocals-vocals/version_0/validations
# evaluate other score
source eval.sh other mss_challenge_log/2021-08-11-subband_four_resunet_for_vocals-vocals/version_0/validations

The system will save the overall score and the score for each song in the result folder.

For faster evalution, you can adjust the parameter MAX_THREAD insides the evaluator/eval.py to determine how many threads you gonna use. It's value should fit your computer resources. You can start with MAX_THREAD=3 and then try 6, 10 or 16.

2. Usage (For customizing sound source)

This feature allows you to separate an arbitrary sound source as long as you got enough training data.

This colab demonstrates the following procedure.

Step1: Prepare running environment.

! git clone https://github.com/haoheliu/2021-ISMIR-MSS-Challenge-CWS-PResUNet.git
# MAKE SURE SOX IS INSTALLED
#!apt-get install libsox-fmt-all libsox-dev sox > /dev/null
%cd 2021-ISMIR-MSS-Challenge-CWS-PResUNet
! pip3 install -r requirements.txt

Step2: Organize your data

I assume that you have already got the following two disjoint kinds of data (there are sample datas in this repo when you clone it):

  1. the_source_you_want_to_get (for example, speech data)
  2. the_source_you_want_to_remove (for example, noise data)
  • Split and put these data into data/your_data folder:
    • train(about 90%~99%): training data (used during training)
      • the_source_you_want_to_get: put your target source (the source you'd like to separate out) audios into this folder
      • the_source_you_want_to_remove: put undesired sources audios into this folder
    • test(about 1%~10%): testing data (used during validation, every two epoches)
      • the_source_you_want_to_get
      • the_source_you_want_to_remove
  • Then run:
# Automatic parsing your data
source init_your_data.sh

Step3: Start training!

  • Use the same MSS model
source models/resunet_conv8_vocals/run.sh

This script use 8 gpus with 8 batchsize by default. You may need to modify this run.sh to fit in your machine.

  • Use a smaller model (1/8)
source models/resunet_conv1_vocals/run.sh

Log file will be automatic generated. You can check validation results during training, which update every two epoches.

Hints:

  • To perform separation on real test data, you can upload validation data as real_mixture + silent.
  • To make an epoch shorter, you can modify the parameter HOURS_FOR_A_EPOCH inside models/dataloader/loaders/individual_loader.py.

3. Reference

If you find our code useful for your research, please consider citing:

@misc{liu2021cwspresunet,
    title={CWS-PResUNet: Music Source Separation with Channel-wise Subband Phase-aware ResUNet},
    author={Haohe Liu and Qiuqiang Kong and Jiafeng Liu},
    year={2021},
    eprint={2112.04685},
    archivePrefix={arXiv},
    primaryClass={cs.SD}
}
@inproceedings{Liu2020,   
  author={Haohe Liu and Lei Xie and Jian Wu and Geng Yang},   
  title={{Channel-Wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music}},   
  year=2020,   
  booktitle={Proc. Interspeech 2020},   
  pages={1241--1245},   
  doi={10.21437/Interspeech.2020-2555},   
  url={http://dx.doi.org/10.21437/Interspeech.2020-2555}   
}.

4. Change log

2021-11-20: Update the demucs version. Now I directly use the mdx version demucs in this repo to separate bass and drums.

Owner
Leo
Speech Quality Enhancement | Music Source Separation | Speech Synthesis
Leo
LSTM-VAE Implementation and Relevant Evaluations

LSTM-VAE Implementation and Relevant Evaluations Before using any file in this repository, please create two directories under the root directory name

Lan Zhang 5 Oct 08, 2022
FaceQgen: Semi-Supervised Deep Learning for Face Image Quality Assessment

FaceQgen FaceQgen: Semi-Supervised Deep Learning for Face Image Quality Assessment This repository is based on the paper: "FaceQgen: Semi-Supervised D

Javier Hernandez-Ortega 3 Aug 04, 2022
[peer review] An Arbitrary Scale Super-Resolution Approach for 3D MR Images using Implicit Neural Representation

ArSSR This repository is the pytorch implementation of our manuscript "An Arbitrary Scale Super-Resolution Approach for 3-Dimensional Magnetic Resonan

Qing Wu 19 Dec 12, 2022
"MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction" (CVPRW 2022) & (Winner of NTIRE 2022 Challenge on Spectral Reconstruction from RGB)

MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction (CVPRW 2022) Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Z

Yuanhao Cai 274 Jan 05, 2023
Repository of 3D Object Detection with Pointformer (CVPR2021)

3D Object Detection with Pointformer This repository contains the code for the paper 3D Object Detection with Pointformer (CVPR 2021) [arXiv]. This wo

Zhuofan Xia 117 Jan 06, 2023
A different spin on dataclasses.

dataklasses Dataklasses is a library that allows you to quickly define data classes using Python type hints. Here's an example of how you use it: from

David Beazley 752 Nov 18, 2022
Learning High-Speed Flight in the Wild

Learning High-Speed Flight in the Wild This repo contains the code associated to the paper Learning Agile Flight in the Wild. For more information, pl

Robotics and Perception Group 391 Dec 29, 2022
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language This repository contains UA-GEC data and an accompanying Python lib

Grammarly 226 Dec 29, 2022
:fire: 2D and 3D Face alignment library build using pytorch

Face Recognition Detect facial landmarks from Python using the world's most accurate face alignment network, capable of detecting points in both 2D an

Adrian Bulat 6k Dec 31, 2022
Implementation of CSRL from the AAAI2022 paper: Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning

CSRL Implementation of CSRL from the AAAI2022 paper: Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning Python: 3

4 Apr 14, 2022
UFPR-ADMR-v2 Dataset

UFPR-ADMR-v2 Dataset The UFPR-ADMRv2 dataset contains 5,000 dial meter images obtained on-site by employees of the Energy Company of Paraná (Copel), w

Gabriel Salomon 8 Sep 29, 2022
Fader Networks: Manipulating Images by Sliding Attributes - NIPS 2017

FaderNetworks PyTorch implementation of Fader Networks (NIPS 2017). Fader Networks can generate different realistic versions of images by modifying at

Facebook Research 753 Dec 23, 2022
CyTran: Cycle-Consistent Transformers for Non-Contrast to Contrast CT Translation

CyTran: Cycle-Consistent Transformers for Non-Contrast to Contrast CT Translation We propose a novel approach to translate unpaired contrast computed

Nicolae Catalin Ristea 13 Jan 02, 2023
Using LSTM write Tang poetry

本教程将通过一个示例对LSTM进行介绍。通过搭建训练LSTM网络,我们将训练一个模型来生成唐诗。本文将对该实现进行详尽的解释,并阐明此模型的工作方式和原因。并不需要过多专业知识,但是可能需要新手花一些时间来理解的模型训练的实际情况。为了节省时间,请尽量选择GPU进行训练。

56 Dec 15, 2022
Autoregressive Predictive Coding: An unsupervised autoregressive model for speech representation learning

Autoregressive Predictive Coding This repository contains the official implementation (in PyTorch) of Autoregressive Predictive Coding (APC) proposed

iamyuanchung 173 Dec 18, 2022
Video Representation Learning by Recognizing Temporal Transformations. In ECCV, 2020.

Video Representation Learning by Recognizing Temporal Transformations [Project Page] Simon Jenni, Givi Meishvili, and Paolo Favaro. In ECCV, 2020. Thi

Simon Jenni 46 Nov 14, 2022
A python library for highly configurable transformers - easing model architecture search and experimentation.

A python library for highly configurable transformers - easing model architecture search and experimentation.

Anthony Fuller 51 Nov 20, 2022
《Single Image Reflection Removal Beyond Linearity》(CVPR 2019)

Single-Image-Reflection-Removal-Beyond-Linearity Paper Single Image Reflection Removal Beyond Linearity. Qiang Wen, Yinjie Tan, Jing Qin, Wenxi Liu, G

Qiang Wen 51 Jun 24, 2022
LaneAF: Robust Multi-Lane Detection with Affinity Fields

LaneAF: Robust Multi-Lane Detection with Affinity Fields This repository contains Pytorch code for training and testing LaneAF lane detection models i

155 Dec 17, 2022
[SIGGRAPH 2020] Attribute2Font: Creating Fonts You Want From Attributes

Attr2Font Introduction This is the official PyTorch implementation of the Attribute2Font: Creating Fonts You Want From Attributes. Paper: arXiv | Rese

Yue Gao 200 Dec 15, 2022