Music Classification: Beyond Supervised Learning, Towards Real-world Applications

Related tags

Deep Learningtutorial
Overview

Music Classification: Beyond Supervised Learning, Towards Real-world Applications

Jupyter Book Badge

About the book

This is a web book written for a tutorial session of the 22nd International Society for Music Information Retrieval Conference, Nov 8-12, 2021, in an online format. The ISMIR conference is the world’s leading research forum on processing, searching, organising and accessing music-related data.

Motivation

Lower the barrier: As deep learning emerges, music classification research has entered a new phase, and many data-driven approaches have been proposed to solve the problem. However, researchers sometimes use jargon in various ways. Also, some implementation details and evaluation methods are ambiguously described in the papers, blocking access to the information without personal contact. These are tremendous obstacles when new researchers want to dive into this fascinating research area. Through this book, we would like to lower the barrier for newcomers and reduce miscommunication between researchers by sharing the secrets.

Cope with data issue: Another issue that we are facing under the deep learning era is the exhaustion of labeled data. Labeling musical attributes requires strong domain knowledge and a significant amount of time for listening; hence expensive. Because of this, deep learning researchers started actively utilizing large-scale unlabeled data. This book introduces the recent advances in semi- and self-supervised learning that enables music classification models to step further beyond supervised learning.

Narrow the gap: Music classification has been applied to solve real-world problems successfully. However, some important procedures and considerations for real-world applications are rarely discussed as research topics. In this book, based on the various industry experiences of the authors, we try our best to raise the awareness of these questions and provide answers and perspectives. We hope this helps academia and industries harmonize better together.

About the authors

Minz Won is a Ph.D candidate at the Music Technology Group (MTG) of Universitat Pompeu Fabra in Barcelona, Spain. His research focus is music representation learning. Along with his academic career, he has put his knowledge into practice with industry internships at Kakao Corp., Naver Corp., Pandora, Adobe, and he recently joined ByteDance as a research scientist. He contributed to the winning entry in the WWW 2018 Challenge: Learning to Recognize Musical Genre.

Janne Spijkervet graduated from the University of Amsterdam in 2021 with her Master's thesis titled "Contrastive Learning of Musical Representations". The paper with the same title was published in 2020 on self-supervised learning on raw audio in music tagging. She has started at ByteDance as a research scientist (2020 - present), developing generative models for music creation. She is also a songwriter and music producer, and explores the design and use of machine learning technology in her music.

Keunwoo Choi is a senior research scientist at ByteDance, developing machine learning products for music recommendation and discovery. He received a Ph.D degree from Queen Mary University of London (c4dm) in 2018. As a researcher, he also has been working at Spotify (2018 - 2020) and several other music companies as well as open-source projects such as Kapre, librosa, and torchaudio. He also writes some music.

Citing this book

@book{musicclassification:book,
	Author = {Minz Won, Janne Spijkervet, and Keunwoo Choi},
	Month = Nov.,
	Publisher = {https://music-classification.github.io/tutorial},
	Title = {Music Classification: Beyond Supervised Learning, Towards Real-world Applications},
	Year = 2021,
	Url = {https://music-classification.github.io/tutorial}
}
PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

Wenwen Yu 498 Dec 24, 2022
Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [pdf] The official repository for Self-Supervised Pre-Training for Transfo

Hao Luo 116 Jan 04, 2023
My freqtrade strategies

My freqtrade-strategies Hi there! This is repo for my freqtrade-strategies. My name is Ilya Zelenchuk, I'm a lecturer at the SPbU university (https://

171 Dec 05, 2022
Parameter Efficient Deep Probabilistic Forecasting

PEDPF Parameter Efficient Deep Probabilistic Forecasting (PEDPF) is a repository containing code to run experiments for several deep learning based pr

Olivier Sprangers 10 Jun 13, 2022
A port of muP to JAX/Haiku

MUP for Haiku This is a (very preliminary) port of Yang and Hu et al.'s μP repo to Haiku and JAX. It's not feature complete, and I'm very open to sugg

18 Dec 30, 2022
Course materials for Fall 2021 "CIS6930 Topics in Computing for Data Science" at New College of Florida

Fall 2021 CIS6930 Topics in Computing for Data Science This repository hosts course materials used for a 13-week course "CIS6930 Topics in Computing f

Yoshi Suhara 101 Nov 30, 2022
EgGateWayGetShell py脚本

EgGateWayGetShell_py 免责声明 由于传播、利用此文所提供的信息而造成的任何直接或者间接的后果及损失,均由使用者本人负责,作者不为此承担任何责任。 使用 python3 eg.py urls.txt 目标 title:锐捷网络-EWEB网管系统 port:4430 漏洞成因 ?p

榆木 61 Nov 09, 2022
AFL binary instrumentation

E9AFL --- Binary AFL E9AFL inserts American Fuzzy Lop (AFL) instrumentation into x86_64 Linux binaries. This allows binaries to be fuzzed without the

242 Dec 12, 2022
Blind visual quality assessment on 360° Video based on progressive learning

Blind visual quality assessment on omnidirectional or 360 video (ProVQA) Blind VQA for 360° Video via Progressively Learning from Pixels, Frames and V

5 Jan 06, 2023
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

Jittor: a Just-in-time(JIT) deep learning framework Quickstart | Install | Tutorial | Chinese Jittor is a high-performance deep learning framework bas

2.7k Jan 03, 2023
Revisting Open World Object Detection

Revisting Open World Object Detection Installation See INSTALL.md. Dataset Our n

58 Dec 23, 2022
NeRF Meta-Learning with PyTorch

NeRF Meta Learning With PyTorch nerf-meta is a PyTorch re-implementation of NeRF experiments from the paper "Learned Initializations for Optimizing Co

Sanowar Raihan 78 Dec 18, 2022
An implementation of quantum convolutional neural network with MindQuantum. Huawei, classifying MNIST dataset

关于实现的一点说明 山东大学 2020级 苏博南 www.subonan.com 文件说明 tools.py 这里面主要有两个函数: resize(a, lenb) 这其实是我找同学写的一个小算法hhh。给出一个$28\times 28$的方阵a,返回一个$lenb\times lenb$的方阵。因

ぼっけなす 2 Aug 29, 2022
The trained model and denoising example for paper : Cardiopulmonary Auscultation Enhancement with a Two-Stage Noise Cancellation Approach

The trained model and denoising example for paper : Cardiopulmonary Auscultation Enhancement with a Two-Stage Noise Cancellation Approach

ycj_project 1 Jan 18, 2022
Multi-Person Extreme Motion Prediction

Multi-Person Extreme Motion Prediction Implementation for paper Wen Guo, Xiaoyu Bie, Xavier Alameda-Pineda, Francesc Moreno-Noguer, Multi-Person Extre

GUO-W 38 Nov 15, 2022
Self-Supervised Generative Style Transfer for One-Shot Medical Image Segmentation

Self-Supervised Generative Style Transfer for One-Shot Medical Image Segmentation This repository contains the Pytorch implementation of the proposed

Devavrat Tomar 19 Nov 10, 2022
A Python parser that takes the content of a text file and then reads it into variables.

Text-File-Parser A Python parser that takes the content of a text file and then reads into variables. Input.text File 1. What is your ***? 1. 18 -

Kelvin 0 Jul 26, 2021
3.8% and 18.3% on CIFAR-10 and CIFAR-100

Wide Residual Networks This code was used for experiments with Wide Residual Networks (BMVC 2016) http://arxiv.org/abs/1605.07146 by Sergey Zagoruyko

Sergey Zagoruyko 1.2k Dec 29, 2022
GLODISMO: Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery

GLODISMO: Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery This is the code to the paper: Gradient-Based Learn

3 Feb 15, 2022
An unreferenced image captioning metric (ACL-21)

UMIC This repository provides an unferenced image captioning metric from our ACL 2021 paper UMIC: An Unreferenced Metric for Image Captioning via Cont

hwanheelee 14 Nov 20, 2022