Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Oral)

Last update: Dec 27, 2022

Overview

CMT

Code for paper Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Best Paper Award)

[Paper] [Site]

Directory Structure

src/: code for the whole pipeline
- train.py: training script; takes an npz file of music data as input and trains the model
- model.py: model definition
- gen_midi_conditional.py: inference script; takes an npz file (representing a video) as input and generates several songs
- src/video2npz/: converts a video into npz by extracting motion saliency and motion speed
dataset/: processed training dataset, in npz format
logs/: logs generated automatically during training; can be used to track training progress
exp/: checkpoints, named after the validation loss (e.g. loss_13_params.pt)
inference/: processed videos for inference (.npz) and generated music (.mid)
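
Since checkpoints are named after their validation loss, the best one can be picked by parsing file names. The following is a minimal sketch, not part of the repository: the helper name `best_checkpoint` is hypothetical, and it assumes every checkpoint follows the `loss_<N>_params.pt` pattern shown above and that experiments may live in subdirectories of exp/.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: checkpoint files follow the loss_<N>_params.pt pattern from the
# README example, where <N> is the validation loss used for naming.
CKPT_PATTERN = re.compile(r"^loss_(\d+)_params\.pt$")


def best_checkpoint(exp_dir: str) -> Optional[Path]:
    """Return the checkpoint with the smallest loss in its name, or None."""
    candidates = []
    # Search recursively, since each experiment may have its own subdirectory.
    for path in Path(exp_dir).rglob("*.pt"):
        match = CKPT_PATTERN.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare on the parsed loss value only.
    return min(candidates, key=lambda item: item[0])[1]
```

Calling `best_checkpoint("exp")` from the repository root would then return a path that can be passed to the inference script with -c.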

Preparation

clone this repo
download lpd_5_prcem_mix_v8_10000.npz from HERE and put it under dataset/
download pretrained model loss_8_params.pt from HERE and put it under exp/
install ffmpeg=3.2.4
prepare a Python3 conda environment
```
pip install -r py3_requirements.txt
```
prepare a Python2 conda environment (for extracting visbeat)
```
pip install -r py2_requirements.txt
```
- open the visbeat package directory (e.g. anaconda3/envs/XXXX/lib/python2.7/site-packages/visbeat) and replace the original Video_CV.py with src/video2npz/Video_CV.py
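
Before training or inference, it can help to confirm that the downloads above ended up in the right place. The following is a rough sketch under stated assumptions: the function name `missing_prerequisites` is hypothetical, the script is assumed to run from the repository root, and it only checks that files exist and that an `ffmpeg` executable is on PATH (it does not verify the ffmpeg version).

```python
import shutil
from pathlib import Path
from typing import List

# Paths taken from the preparation steps above, relative to the repo root.
REQUIRED_FILES = [
    "dataset/lpd_5_prcem_mix_v8_10000.npz",
    "exp/loss_8_params.pt",
]


def missing_prerequisites(repo_root: str = ".") -> List[str]:
    """Return human-readable descriptions of whatever is still missing."""
    root = Path(repo_root)
    problems = [
        f"missing file: {rel}"
        for rel in REQUIRED_FILES
        if not (root / rel).is_file()
    ]
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

An empty list means the dataset, the pretrained model, and ffmpeg were all found.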

Training

to use another training set, first convert the training data from MIDI into npz under dataset/
```
python midi2numpy_mix.py --midi_dir /PATH/TO/MIDIS/ --out_name data.npz 
```
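
To check what a converted npz file contains before training on it, the arrays can be listed with NumPy. This is only an inspection sketch: the helper name `summarize_npz` is hypothetical, and it makes no assumption about which array names the conversion script writes.

```python
import numpy as np


def summarize_npz(path):
    """Map each array name in the archive to its (shape, dtype)."""
    summary = {}
    # allow_pickle=False is NumPy's default; archives that store object
    # arrays would raise an error here and need allow_pickle=True instead.
    with np.load(path, allow_pickle=False) as data:
        for key in data.files:
            arr = data[key]
            summary[key] = (arr.shape, str(arr.dtype))
    return summary
```

For example, `summarize_npz("dataset/data.npz")` returns a dictionary that can be printed to compare a new dataset against the provided one.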

train the model

python train.py -n XXX -g 0 1 2 3

# -n XXX: name of the experiment; used as the name of the log file and of the checkpoint directory. If XXX is 'debug', checkpoints are not saved
# -l (--lr): initial learning rate
# -b (--batch_size): batch size
# -p (--path): if given, load the model checkpoint from this path
# -e (--epochs): number of training epochs
# -t (--train_data): path to the training data (.npz file)
# -g (--gpus): GPU ids
# other model hyperparameters: modify the source .py files
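
When launching several experiments from a script, the options above can be assembled programmatically. The following is a sketch only: `build_train_command` is a hypothetical helper, it uses just the short flags listed above, and it assumes the command is run from src/ where train.py lives.

```python
import sys
from typing import List, Optional, Sequence


def build_train_command(
    name: str,
    gpus: Sequence[int],
    lr: Optional[float] = None,
    batch_size: Optional[int] = None,
    epochs: Optional[int] = None,
    train_data: Optional[str] = None,
    checkpoint: Optional[str] = None,
) -> List[str]:
    """Build the argument list for train.py from the documented flags."""
    cmd = [sys.executable, "train.py", "-n", name]
    # Only pass options that were set, so train.py keeps its own defaults.
    if lr is not None:
        cmd += ["-l", str(lr)]
    if batch_size is not None:
        cmd += ["-b", str(batch_size)]
    if epochs is not None:
        cmd += ["-e", str(epochs)]
    if train_data is not None:
        cmd += ["-t", train_data]
    if checkpoint is not None:
        cmd += ["-p", checkpoint]
    # -g takes one or more GPU ids, e.g. -g 0 1 2 3.
    cmd += ["-g", *[str(g) for g in gpus]]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`.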

Inference

convert input video (MP4 format) into npz (use the Python2 environment)
```
cd src/video2npz
sh video2npz.sh ../../videos/xxx.mp4
```
- try resizing the video if this takes a long time
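
One way to resize the video is ffmpeg's `scale` filter. The sketch below builds such a command from Python; the helper name `build_resize_command` and the default height of 360 pixels are illustrative assumptions, not values recommended by the authors.

```python
from typing import List


def build_resize_command(src: str, dst: str, height: int = 360) -> List[str]:
    """Build an ffmpeg command that scales a video to the given height."""
    return [
        "ffmpeg",
        "-i", src,
        # -2 lets ffmpeg pick a width that keeps the aspect ratio and is
        # divisible by 2, which many video encoders require.
        "-vf", f"scale=-2:{height}",
        # Copy the audio stream unchanged; only the video is re-encoded.
        "-c:a", "copy",
        dst,
    ]
```

Running the list with `subprocess.run(cmd, check=True)` produces the smaller file, which can then be passed to video2npz.sh.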

run the model to generate a .mid file:

python gen_midi_conditional.py -f "../inference/xxx.npz" -c "../exp/loss_8_params.pt"

# -c: checkpoint to load
# -f: input npz file
# -g: GPU id (only one GPU is needed for inference)

if using another training set, change decoder_n_class in gen_midi_conditional.py to match the decoder_n_class in train.py

convert the MIDI file into audio with GarageBand (recommended) or midi2audio
- set the tempo to the value of tempo in video2npz/metadata.json
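
The tempo can be read from the metadata file with a few lines of Python. This sketch assumes that metadata.json is a JSON object with a top-level "tempo" key; the exact layout of the file is not documented here, so adjust the lookup if it differs. The helper name `read_tempo` is hypothetical.

```python
import json


def read_tempo(metadata_path: str) -> float:
    """Return the tempo stored in a video2npz metadata file."""
    with open(metadata_path, encoding="utf-8") as fh:
        metadata = json.load(fh)
    # Assumption: the tempo is stored under a top-level "tempo" key.
    return float(metadata["tempo"])
```

The returned value is the tempo to set in GarageBand (or whichever tool renders the MIDI file).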

combine the original video and the audio into a video with BGM

ffmpeg -i 'xxx.mp4' -i 'yyy.mp3' -c:v copy -c:a aac -strict experimental -map 0:v:0 -map 1:a:0 'zzz.mp4'

# xxx.mp4: input video
# yyy.mp3: audio file generated in the previous step
# zzz.mp4: output video
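
For batch processing, the same ffmpeg call can be issued from Python. The sketch below reproduces the flags of the command above one for one; the helper name `build_mux_command` is hypothetical.

```python
from typing import List


def build_mux_command(video: str, audio: str, output: str) -> List[str]:
    """Build the ffmpeg command that pairs a video with generated audio."""
    return [
        "ffmpeg",
        "-i", video,            # input 0: original video
        "-i", audio,            # input 1: audio rendered from the MIDI file
        "-c:v", "copy",         # keep the video stream as-is
        "-c:a", "aac",          # encode the audio as AAC
        "-strict", "experimental",
        "-map", "0:v:0",        # take the video stream from input 0
        "-map", "1:a:0",        # take the audio stream from input 1
        output,
    ]
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting issues with file names that contain spaces.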

Owner

Zhaokai Wang
