Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings

Last update: Dec 05, 2022

Overview

Text2Music Emotion Embedding

Reference

Emotion Embedding Spaces for Matching Music to Stories, ISMIR 2021 [paper]

-- Minz Won, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore, and Xavier Serra

@inproceedings{won2021emotion,
  title={Emotion embedding spaces for matching music to stories},
  author={Won, Minz and Salamon, Justin and Bryan, Nicholas J. and Mysore, Gautham J. and Serra, Xavier},
  booktitle={ISMIR},
  year={2021}
}

Requirements

conda create -n YOUR_ENV_NAME python=3.7
conda activate YOUR_ENV_NAME
pip install -r requirements.txt

Data

Collect the audio files of the AudioSet mood subset (link).
Read the audio files and store them in .npy format.
Other relevant data, including Alm's dataset (original link), the ISEAR dataset (original link), emotion embeddings, pretrained Word2Vec, and data splits, are all available here (link).
Unzip ttm_data.tar.gz and place the extracted data folder under text2music-emotion-embedding/.
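
The audio-to-.npy step above could be sketched as follows. This is a minimal illustration, not the preprocessing used by this repository: it assumes 16-bit PCM WAV input, mixes down to mono, and does not resample. The function names (wav_to_npy, convert_folder) and the folder layout are hypothetical, and the sample rate and array shape that the training code expects are not stated here, so check them before use.

```python
import wave
from pathlib import Path

import numpy as np


def wav_to_npy(wav_path, npy_path):
    """Read a 16-bit PCM WAV file and save it as a mono float32 .npy array."""
    with wave.open(str(wav_path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        n_channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())

    # Interleaved little-endian int16 samples -> (frames, channels)
    samples = np.frombuffer(raw, dtype="<i2").reshape(-1, n_channels)
    # Average the channels to mono and scale to [-1, 1)
    mono = samples.astype(np.float32).mean(axis=1) / 32768.0
    np.save(str(npy_path), mono)
    return mono.shape[0], sample_rate


def convert_folder(wav_dir, npy_dir):
    """Convert every .wav file in wav_dir to a .npy file in npy_dir (assumed layout)."""
    npy_dir = Path(npy_dir)
    npy_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        out_path = npy_dir / (wav_path.stem + ".npy")
        wav_to_npy(wav_path, out_path)
        converted.append(out_path)
    return converted
```

Calling convert_folder("audioset_wav", "audioset_npy") would write one .npy file per WAV clip; both folder names are placeholders.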

Training

Here is an example for training a metric learning model.

python3 src/metric_learning/main.py \
        --dataset 'isear' \
        --num_branches 3 \
        --data_path YOUR_DATA_PATH_TO_AUDIOSET

For more examples, check the bash files under the scripts folder.

Test

Here is an example of testing a pretrained model.

python3 src/metric_learning/main.py \
        --mode 'TEST' \
        --dataset 'alm' \
        --model_load_path 'data/pretrained/alm_cross.ckpt' \
        --data_path 'YOUR_DATA_PATH_TO_AUDIOSET'

Pretrained three-branch metric learning models (alm_cross.ckpt and isear_cross.ckpt) are included in ttm_data.tar.gz. To reproduce the results, place the unzipped data folder under text2music-emotion-embedding/.
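
For intuition, text-to-music retrieval in a shared embedding space can be viewed as a nearest-neighbor search. The sketch below ranks music items by cosine similarity to a text query embedding. It is an illustration under assumptions, not this repository's evaluation code: the function name retrieve is hypothetical, the embeddings are toy values, and the similarity measure actually used by the models is not stated here.

```python
import numpy as np


def retrieve(query_emb, music_embs, top_k=3):
    """Return indices and scores of the top_k music items nearest to the query."""
    query = np.asarray(query_emb, dtype=np.float64)
    music = np.asarray(music_embs, dtype=np.float64)
    # L2-normalise so that the dot product equals cosine similarity
    query = query / np.linalg.norm(query)
    music = music / np.linalg.norm(music, axis=1, keepdims=True)
    scores = music @ query
    # Sort by descending similarity and keep the best top_k
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]


# Toy 2-D embeddings standing in for model outputs
music_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_emb = [1.0, 0.1]
indices, scores = retrieve(text_emb, music_embs)
print(indices)  # the first entry is the best match
```

In practice the text and music embeddings would come from the trained branches of the model rather than from hand-written lists.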

Visualization

The embedding distribution of each model can be projected onto a 2-dimensional space. We used uniform manifold approximation and projection (UMAP) to visualize the distribution. UMAP is known to preserve more of the global structure than t-SNE.
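
A projection like the one described above might look like the following sketch. It assumes the umap-learn package is installed when no reducer is passed. The function name project_2d and the reducer argument are hypothetical conveniences, not part of this repository, and the UMAP settings are illustrative defaults that may differ from the ones used for the paper's figures.

```python
import numpy as np


def project_2d(embeddings, reducer=None):
    """Project (n_samples, dim) embeddings to 2-D points for plotting."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    if reducer is None:
        # Requires umap-learn (pip install umap-learn); imported lazily
        import umap
        reducer = umap.UMAP(n_components=2, random_state=0)
    points = np.asarray(reducer.fit_transform(embeddings))
    if points.shape != (embeddings.shape[0], 2):
        raise ValueError("reducer must return an array of shape (n_samples, 2)")
    return points
```

The returned array can be passed to any scatter-plot routine, with points colored by emotion label or modality. Accepting any object with a fit_transform method keeps the helper usable with other reducers, such as t-SNE, for comparison.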

Demo

Please try some examples produced by the three-branch metric learning model [Soundcloud].

License

Some License
