Self-Supervised Contrastive Learning of Music Spectrograms

Overview

Self-Supervised Music Analysis

Self-Supervised Contrastive Learning of Music Spectrograms

Dataset

Songs on the Billboard Year End Hot 100 were collected from the years 1960-2020. This list tracks the top songs of the US market for a given calendar year based on aggregating metrics including streaming plays, physical and digital purchases, radio plays, etc. In total the dataset includes 5737 songs, excluding some songs which could not be found and some which are duplicates across multiple years. It’s worth noting that the types of songs that are able to make it onto this sort of list represent a very narrow subset of the overall variety of the US music market, let alone the global music market. So while we can still learn some interesting things from this dataset, we shouldn’t mistake it for being representative of music in general.

Raw audio files were processed into spectrograms using a synchrosqueeze CWT algorithm from the ssqueezepy python library. Some additional cleaning and postprocessing was done and the spectrograms were saved as grayscale images. These images are structured so that the Y axis which spans 256 pixels represents a range of frequencies from 30Hz – 12kHz with a log scale. The X axis represents time with a resolution of 200 pixels per second. Pixel intensity therefore encodes the signal energy at a particular frequency at a moment in time.

The full dataset can be found here: https://www.kaggle.com/tpapp157/billboard-hot-100-19602020-spectrograms

Model and Training

A 30 layer ResNet styled CNN architecture was used as the primary feature extraction network. This was augmented with learned position embeddings along the frequency axis inserted at regular block intervals. Features were learned in a completely self-supervised fashion using Contrastive Learning. Matched pairs were taken as random 256x1024 pixel crops (corresponding to ~5 seconds of audio) from each song with no additional augmentations.

Output feature vectors have 512 channels representing a 64 pixel span (~0.3 seconds of audio).

Results

The entirety of each song was processed via the feature extractor with the resulting song matrix averaged across the song length into a single vector. UMAP is used for visualization and HDBSCAN for cluster extraction producing the following plot:

Each color represents a cluster (numbered 0-16) of similar songs based on the learned features. Immediately we can see a very clear structure in the data, showing the meaningful features have been learned. We can also color the points by year of release:

Points are colored form oldest (dark) to newest (light). As expected, the distribution of music has changed over the last 60 years. This gives us some confidence that the learned features are meaningful but let’s try a more specific test. A gradient boosting regressor model is trained on the learned features to predict the release year of a song.

The model achieves an overall mean absolute error of ~6.2 years. The violin and box plots show the distribution of predictions for songs in each year. This result is surprisingly good considering we wouldn’t expect a model get anywhere near perfect. The plot shows some interesting trends in how the predicted median and overall variance shift from year to year. Notice, for example, the high variance and rapid median shift across the years 1990 to 2000 compared to the decades before and after. This hints at some potential significant changes in the structure of music during this decade. Those with a knowledge of modern musical history probably already have some ideas in mind. Again, it’s worth noting that this dataset represents generically popular music which we would expect to lag behind specific music trends (probably by as much as 5-10 years).

Let’s bring back the 17 clusters that were identified previously and look at the distribution of release years of songs in each cluster. The black grouping labeled -1 captures songs which were not strongly allocated to any particular cluster and is simply included for completeness.

Here again we see some interesting trends of clusters emerging, peaking, and even dying out at various points in time. Aligning with out previous chart, we see four distinct clusters (7, 10, 11, 12) die off in the 90s while two brand new clusters (3, 4) emerge. Other clusters (8, 9, 15), interestingly, span most or all of the time range.

We can also look at the relative allocation of songs to clusters by year to get a better sense of the overall size of each cluster.

Cluster Samples

So what exactly are these clusters? I’ve provided links below to ten representative songs from each cluster so you can make your own qualitative evaluation. Before going further and listening to these songs I want to encourage you loosen your preconceived notions of musical genre. Popular conception of musical genres typically includes non-musical aspects like lyrics, theme, particular instruments, artist demographics, singer accent, year of release, marketing, etc. These aspects are not captured in the dataset and therefore not represented below but with an open ear you may find examples of songs that you considered to be different genres are actually quite musically similar.

Cluster 0

Cluster 1

Cluster 2

Cluster 3

Cluster 4

Cluster 5

Cluster 6

Cluster 7

Cluster 8

Cluster 9

Cluster 10

Cluster 11

Cluster 12

Cluster 13

Cluster 14

Cluster 15

Cluster 16

Customer Segmentation using RFM

Customer-Segmentation-using-RFM İş Problemi Bir e-ticaret şirketi müşterilerini segmentlere ayırıp bu segmentlere göre pazarlama stratejileri belirlem

Nazli Sener 7 Dec 26, 2021
Official implementation of Rich Semantics Improve Few-Shot Learning (BMVC, 2021)

Rich Semantics Improve Few-Shot Learning Paper Link Abstract : Human learning benefits from multi-modal inputs that often appear as rich semantics (e.

Mohamed Afham 11 Jul 26, 2022
Codes and models of NeurIPS2021 paper - DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks

DominoSearch This is repository for codes and models of NeurIPS2021 paper - DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense n

11 Sep 10, 2022
A Peer-to-peer Platform for Secure, Privacy-preserving, Decentralized Data Science

PyGrid is a peer-to-peer network of data owners and data scientists who can collectively train AI models using PySyft. PyGrid is also the central serv

OpenMined 615 Jan 03, 2023
This package contains a PyTorch Implementation of IB-GAN of the submitted paper in AAAI 2021

The PyTorch implementation of IB-GAN model of AAAI 2021 This package contains a PyTorch implementation of IB-GAN presented in the submitted paper (IB-

Insu Jeon 9 Mar 30, 2022
A selection of State Of The Art research papers (and code) on human locomotion (pose + trajectory) prediction (forecasting)

A selection of State Of The Art research papers (and code) on human trajectory prediction (forecasting). Papers marked with [W] are workshop papers.

Karttikeya Manglam 40 Nov 18, 2022
Implementation of ConvMixer-Patches Are All You Need? in TensorFlow and Keras

Patches Are All You Need? - ConvMixer ConvMixer, an extremely simple model that is similar in spirit to the ViT and the even-more-basic MLP-Mixer in t

Sayan Nath 8 Oct 03, 2022
Implementation of the famous Image Manipulation\Forgery Detector "ManTraNet" in Pytorch

Who has never met a forged picture on the web ? No one ! Everyday we are constantly facing fake pictures touched up in Photoshop but it is not always

Rony Abecidan 77 Dec 16, 2022
UpChecker is a simple opensource project to host it fast on your server and check is server up, view statistic, get messages if it is down. UpChecker - just run file and use project easy

UpChecker UpChecker is a simple opensource project to host it fast on your server and check is server up, view statistic, get messages if it is down.

Yan 4 Apr 07, 2022
Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021, official Pytorch implementatio

Microsoft 247 Dec 25, 2022
Scalable training for dense retrieval models.

Scalable implementation of dense retrieval. Training on cluster By default it trains locally: PYTHONPATH=.:$PYTHONPATH python dpr_scale/main.py traine

Facebook Research 90 Dec 28, 2022
3ds-Ghidra-Scripts - Ghidra scripts to help with 3ds reverse engineering

3ds Ghidra Scripts These are ghidra scripts to help with 3ds reverse engineering

Zak 7 May 23, 2022
InterFaceGAN - Interpreting the Latent Space of GANs for Semantic Face Editing

InterFaceGAN - Interpreting the Latent Space of GANs for Semantic Face Editing Figure: High-quality facial attributes editing results with InterFaceGA

GenForce: May Generative Force Be with You 1.3k Jan 09, 2023
This code is an unofficial implementation of HiFiSinger.

HiFiSinger This code is an unofficial implementation of HiFiSinger. The algorithm is based on the following papers: Chen, J., Tan, X., Luan, J., Qin,

Heejo You 87 Dec 23, 2022
TensorFlow 101: Introduction to Deep Learning for Python Within TensorFlow

TensorFlow 101: Introduction to Deep Learning I have worked all my life in Machine Learning, and I've never seen one algorithm knock over its benchmar

Sefik Ilkin Serengil 896 Jan 04, 2023
An easy-to-use app to visualise attentions of various VQA models.

Ask Me Anything: A tool for visualising Visual Question Answering (AMA) An easy-to-use app to visualise attentions of various VQA models. Please click

Apoorve 37 Nov 13, 2022
Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy.

Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy. Now with tensorflow 1.0 support. Evaluation usa

Marcel R. 349 Aug 06, 2022
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations

NL-Augmenter 🦎 → 🐍 The NL-Augmenter is a collaborative effort intended to add transformations of datasets dealing with natural language. Transformat

684 Jan 09, 2023
Enabling dynamic analysis of Legacy Embedded Systems in full emulated environment

PENecro This project is based on "Enabling dynamic analysis of Legacy Embedded Systems in full emulated environment", published on hardwear.io USA 202

Ta-Lun Yen 10 May 17, 2022
Deep Q-network learning to play flappybird.

AI Plays Flappy Bird I've trained a DQN that learns to play flappy bird on it's own. Try the pre-trained model First install the pip requirements and

Anish Shrestha 3 Mar 01, 2022