Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)

Overview

MusCaps: Generating Captions for Music Audio

Ilaria Manco1 2, Emmanouil Benetos1, Elio Quinton2, Gyorgy Fazekas1
1 Queen Mary University of London, 2 Universal Music Group

This repository is the official implementation of "MusCaps: Generating Captions for Music Audio" (IJCNN 2021). In this work, we propose an encoder-decoder model to generate natural language descriptions of music audio. We provide code to train our model on any dataset of (audio, caption) pairs, together with code to evaluate the generated descriptions on a set of automatic metrics (BLEU, METEOR, ROUGE, CIDEr, SPICE, SPIDEr).

Setup

The code was developed in Python 3.7 on Linux CentOS 7 and training was carried out on an RTX 2080 Ti GPU. Other GPUs and platforms have not been fully tested.

Clone the repo

git clone https://github.com/ilaria-manco/muscaps
cd muscaps

You'll need to have the libsndfile library installed. All other requirements, including the code package, can be installed with

pip install -r requirements.txt
pip install -e .

Project structure

root
├─ configs                      # Config files
│   ├─ datasets
│   ├─ models  
│   └─ default.yaml              
├─ data                         # Folder to save data (input data, pretrained model weights, etc.)
│   ├─ audio_encoders   
│   ├─ datasets            
│   │   └─ dataset_name     
|   └── ...             
├─ muscaps
|   ├─ caption_evaluation_tools # Translation metrics eval on audio captioning 
│   ├─ datasets                 # Dataset classes
│   ├─ models                   # Model code
│   ├─ modules                  # Model components
│   ├─ scripts                  # Python scripts for training, evaluation etc.
│   ├─ trainers                 # Trainer classes
│   └─ utils                    # Utils
└─ save                         # Saved model checkpoints, logs, configs, predictions    
    └─ experiments
        ├── experiment_id1
        └── ...                  

Dataset

The datasets used in our experiments is private and cannot be shared, but details on how to prepare an equivalent music captioning dataset are provided in the data README.

Pre-trained audio feature extractors

For the audio feature extraction component, MusCaps uses CNN-based audio tagging models like musicnn. In our experiments, we use @minzwon's implementation and pre-trained models, which you can download from the official repo. For example, to obtain the weights for the HCNN model trained on the MagnaTagATune dataset, run the following commands

mkdir data/audio_encoders
cd data/audio_encoders/
wget https://github.com/minzwon/sota-music-tagging-models/raw/master/models/mtat/hcnn/best_model.pth
mv best_model.pth mtt_hcnn.pth

Training

Dataset, model and training configurations are set in the respective yaml files in configs. Some of the fields can be overridden by arguments in the CLI (for more details on this, refer to the training script).

To train the model with the default configs, simply run

cd muscaps/scripts/
python train.py <baseline/attention> --feature_extractor <musicnn/hcnn> --pretrained_model <msd/mtt>  --device_num <gpu_number>

This will generate an experiment_id and create a new folder in save/experiments where the output will be saved.

If you wish to resume training from a saved checkpoint, run

python train.py <baseline/attention> --experiment_id <experiment_id>  --device_num <gpu_number>

Evaluation

To evaluate a model saved under <experiment_id> on the captioning task, run

cd muscaps/scripts/
python caption.py <experiment_id> --metrics True

Cite

@misc{manco2021muscaps,
      title={MusCaps: Generating Captions for Music Audio}, 
      author={Ilaria Manco and Emmanouil Benetos and Elio Quinton and Gyorgy Fazekas},
      year={2021},
      eprint={2104.11984},
      archivePrefix={arXiv}
}

Acknowledgements

This repo reuses some code from the following repos:

Contact

If you have any questions, please get in touch: [email protected].

Owner
Ilaria Manco
AI & Music PhD Researcher at the Centre for Digital Music (QMUL)
Ilaria Manco
Python library for loading and using triangular meshes.

Trimesh is a pure Python (2.7-3.4+) library for loading and using triangular meshes with an emphasis on watertight surfaces. The goal of the library i

Michael Dawson-Haggerty 2.2k Jan 07, 2023
Leveraging OpenAI's Codex to solve cornerstone problems in Music

Music-Codex Leveraging OpenAI's Codex to solve cornerstone problems in Music Please NOTE: Presented generated samples were created by OpenAI's Codex P

Alex 2 Mar 11, 2022
Novel Instances Mining with Pseudo-Margin Evaluation for Few-Shot Object Detection

Novel Instances Mining with Pseudo-Margin Evaluation for Few-Shot Object Detection (NimPme) The official implementation of Novel Instances Mining with

12 Sep 08, 2022
PyTorch Implementation of NCSOFT's FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis

FastPitchFormant - PyTorch Implementation PyTorch Implementation of FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis. Qu

Keon Lee 63 Jan 02, 2023
This is the official pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" on VQA Task

🌈 ERASOR (RA-L'21 with ICRA Option) Official page of "ERASOR: Egocentric Ratio of Pseudo Occupancy-based Dynamic Object Removal for Static 3D Point C

Hyungtae Lim 225 Dec 29, 2022
Pytorch implementation of Generative Models as Distributions of Functions 🌿

Generative Models as Distributions of Functions This repo contains code to reproduce all experiments in Generative Models as Distributions of Function

Emilien Dupont 117 Dec 29, 2022
PyTorch CZSL framework containing GQA, the open-world setting, and the CGE and CompCos methods.

Compositional Zero-Shot Learning This is the official PyTorch code of the CVPR 2021 works Learning Graph Embeddings for Compositional Zero-shot Learni

EML Tübingen 70 Dec 27, 2022
Code for CPM-2 Pre-Train

CPM-2 Pre-Train Pre-train CPM-2 此分支为110亿非 MoE 模型的预训练代码,MoE 模型的预训练代码请切换到 moe 分支 CPM-2技术报告请参考link。 0 模型下载 请在智源资源下载页面进行申请,文件介绍如下: 文件名 描述 参数大小 100000.tar

Tsinghua AI 136 Dec 28, 2022
Deep Learning Pipelines for Apache Spark

Deep Learning Pipelines for Apache Spark The repo only contains HorovodRunner code for local CI and API docs. To use HorovodRunner for distributed tra

Databricks 2k Jan 08, 2023
CR-Fill: Generative Image Inpainting with Auxiliary Contextual Reconstruction. ICCV 2021

crfill Usage | Web App | | Paper | Supplementary Material | More results | code for paper ``CR-Fill: Generative Image Inpainting with Auxiliary Contex

182 Dec 20, 2022
Code, environments, and scripts for the paper: "How Private Is Your RL Policy? An Inverse RL Based Analysis Framework"

Privacy-Aware Inverse RL (PRIL) Analysis Framework Code, environments, and scripts for the paper: "How Private Is Your RL Policy? An Inverse RL Based

1 Dec 06, 2021
Provided is code that demonstrates the training and evaluation of the work presented in the paper: "On the Detection of Digital Face Manipulation" published in CVPR 2020.

FFD Source Code Provided is code that demonstrates the training and evaluation of the work presented in the paper: "On the Detection of Digital Face M

88 Nov 22, 2022
Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"

This is the codebase for the paper: Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs Directory Structur

Peter Hase 19 Aug 21, 2022
Discord-Protect is a simple discord bot allowing you to have some security on your discord server by ordering a captcha to the user who joins your server.

Discord-Protect Discord-Protect is a simple discord bot allowing you to have some security on your discord server by ordering a captcha to the user wh

Tir Omar 2 Oct 28, 2021
Generic image compressor for machine learning. Pytorch code for our paper "Lossy compression for lossless prediction".

Lossy Compression for Lossless Prediction Using: Training: This repostiory contains our implementation of the paper: Lossy Compression for Lossless Pr

Yann Dubois 84 Jan 02, 2023
This is a TensorFlow implementation for C2-Rec

This is a TensorFlow implementation for C2-Rec We refer to the repo SASRec. Requirements requirement.txt Datasets This repo includes Amazon Beauty dat

7 Nov 14, 2022
This repository contains the official code of the paper Equivariant Subgraph Aggregation Networks (ICLR 2022)

Equivariant Subgraph Aggregation Networks (ESAN) This repository contains the official code of the paper Equivariant Subgraph Aggregation Networks (IC

Beatrice Bevilacqua 59 Dec 13, 2022
ReLoss - Official implementation for paper "Relational Surrogate Loss Learning" ICLR 2022

Relational Surrogate Loss Learning (ReLoss) Official implementation for paper "R

Tao Huang 31 Nov 22, 2022
A Loss Function for Generative Neural Networks Based on Watson’s Perceptual Model

This repository contains the similarity metrics designed and evaluated in the paper, and instructions and code to re-run the experiments. Implementation in the deep-learning framework PyTorch

Steffen 86 Dec 27, 2022
Cross-Document Coreference Resolution

Cross-Document Coreference Resolution This repository contains code and models for end-to-end cross-document coreference resolution, as decribed in ou

Arie Cattan 29 Nov 28, 2022