A lightweight yet powerful audio-to-MIDI converter with pitch bend detection

Overview

Basic Pitch Logo

License PyPI - Python Version Supported Platforms

Basic Pitch is a Python library for Automatic Music Transcription (AMT), using lightweight neural network developed by Spotify's Audio Intelligence Lab. It's small, easy-to-use, and pip install-able.

Basic Pitch may be simple, but it's is far from "basic"! basic-pitch is efficient and easy to use, and its multipitch support, its ability to generalize across instruments, and its note accuracy competes with much larger and more resource-hungry AMT systems.

Provide a compatible audio file and basic-pitch will generate a MIDI file, complete with pitch bends. Basic pitch is instrument-agnostic and supports polyphonic instruments, so you can freely enjoy transcription of all your favorite music, no matter what instrument is used. Basic pitch works best on one instrument at a time.

Research Paper

This library was released in conjunction with Spotify's publication at ICASSP 2022. You can read more about this research in the paper, A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation.

If you use this library in academic research, consider citing it:

@inproceedings{2022_BittnerBRME_LightweightNoteTranscription_ICASSP,
  author= {Bittner, Rachel M. and Bosch, Juan Jos\'e and Rubinstein, David and Meseguer-Brocal, Gabriel and Ewert, Sebastian},
  title= {A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation},
  booktitle= {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  address= {Singapore},
  year= 2022,
}

Demo

If, for whatever reason, you're not yet completely inspired, or you're just like so totally over the general vibe and stuff, checkout our snappy demo website, basicpitch.io, to experiment with our model on whatever music audio you provide!

Installation

basic-pitch is available via PyPI. To install the current release:

pip install basic-pitch

To update Basic Pitch to the latest version, add --upgrade to the above command.

Compatible Environments:

  • MacOS, Windows and Ubuntu operating systems
  • Python versions 3.7, 3.8, 3.9

Usage

Model Prediction

Command Line Tool

This library offers a command line tool interface. A basic prediction command will generate and save a MIDI file transcription of audio at the <input-audio-path> to the <output-directory>:

basic-pitch <output-directory> <input-audio-path>

To process more than one audio file at a time:

basic-pitch <output-directory> <input-audio-path-1> <input-audio-path-2> <input-audio-path-3>

Optionally, you may append any of the following flags to your prediction command to save additional formats of the prediction output to the <output-directory>:

  • --sonify-midi to additionally save a .wav audio rendering of the MIDI file
  • --save-model-outputs to additionally save raw model outputs as an NPZ file
  • --save-note-events to additionally save the predicted note events as a CSV file

To discover more parameter control, run:

basic-pitch --help

Programmatic

predict()

Import basic-pitch into your own Python code and run the predict functions directly, providing an <input-audio-path> and returning the model's prediction results:

from basic_pitch.inference import predict
from basic_pitch import ICASSP_2022_MODEL_PATH

model_output, midi_data, note_activations = predict(<input-audio-path>)
  • <minimum-frequency> & <maximum-frequency> (floats) set the maximum and minimum allowed note frequency, in Hz, returned by the model. Pitch events with frequencies outside of this range will be excluded from the prediction results.
  • model_output is the raw model inference output
  • midi_data is the transcribed MIDI data derived from the model_output
  • note_events is a list of note events derived from the model_output

predict() in a loop

To run prediction within a loop, you'll want to load the model yourself and provide predict() with the loaded model object itself to be used for repeated prediction calls, in order to avoid redundant and sluggish model loading.

import tensorflow as tf

from basic_pitch.inference import predict
from basic_pitch import ICASSP_2022_MODEL_PATH

basic_pitch_model = tf.saved_model.load(str(ICASSP_2022_MODEL_PATH))

for x in range():
    ...
    model_output, midi_data, note_activations = predict(
        <loop-x-input-audio-path>,
        basic_pitch_model,
    )
    ...

predict_and_save()

If you would like basic-pitch orchestrate the generation and saving of our various supported output file types, you may use predict_and_save instead of using predict directly:

from basic_pitch.inference import predict_and_save

predict_and_save(
    <input-audio-path-list>,
    <output-directory>,
    <save-midi>,
    <sonify-midi>,
    <save-model-outputs>,
    <save-note-events>,
)

where:

  • <input-audio-path-list> & <output-directory>
    • directory paths for basic-pitch to read from/write to.
  • <save-midi>
    • bool to control generating and saving a MIDI file to the <output-directory>
  • <sonify-midi>
    • bool to control saving a WAV audio rendering of the MIDI file to the <output-directory>
  • <save-model-outputs>
    • bool to control saving the raw model output as a NPZ file to the <output-directory>
  • <save-note-events>
    • bool to control saving predicted note events as a CSV file <output-directory>

Model Input

Supported Audio Codecs

basic-pitch accepts all sound files that are compatible with its version of librosa, including:

  • .mp3
  • .ogg
  • .wav
  • .flac
  • .m4a

Mono Channel Audio Only

While you may use stereo audio as an input to our model, at prediction time, the channels of the input will be down-mixed to mono, and then analyzed and transcribed.

File Size/Audio Length

This model can process any size or length of audio, but processing of larger/longer audio files could be limited by your machine's available disk space. To process these files, we recommend streaming the audio of the file, processing windows of audio at a time.

Sample Rate

Input audio maybe be of any sample rate, however, all audio will be resampled to 22050 Hz before processing.

Contributing

Contributions to basic-pitch are welcomed! See CONTRIBUTING.md for details.

Copyright and License

basic-pitch is Copyright 2022 Spotify AB.

This software is licensed under the Apache License, Version 2.0 (the "Apache License"). You may choose either license to govern your use of this software only upon the condition that you accept all of the terms of either the Apache License.

You may obtain a copy of the Apache License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the Apache License or the GPL License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Apache License for the specific language governing permissions and limitations under the Apache License.

Audio pitch-shifting & re-sampling utility, based on the EMU SP-1200

Pitcher.py Free & OS emulation of the SP-12 & SP-1200 signal chain (now with GUI) Pitch shift / bitcrush / resample audio files Written and tested in

morgan 13 Oct 03, 2022
Audio2midi - Automatic Audio-to-symbolic Arrangement

Automatic Audio-to-symbolic Arrangement This is the repository of the project "Audio-to-symbolic Arrangement via Cross-modal Music Representation Lear

Ziyu Wang 24 Dec 05, 2022
A music player designed for a University Project.

A music player designed for a University Project. Very flexibe and easy to use, a real life working application with user friendly controls. Hope u enjoy!!

Aditya Johorey 1 Nov 19, 2021
Stream Music ๐ŸŽต ๐˜ผ ๐™—๐™ค๐™ฉ ๐™ฉ๐™๐™–๐™ฉ ๐™˜๐™–๐™ฃ ๐™ฅ๐™ก๐™–๐™ฎ ๐™ข๐™ช๐™จ๐™ž๐™˜ ๐™ค๐™ฃ ๐™๐™š๐™ก๐™š๐™œ๐™ง๐™–๐™ข ๐™‚๐™ง๐™ค๐™ช๐™ฅ ๐™–๐™ฃ๐™™ ๐˜พ๐™๐™–๐™ฃ๐™ฃ๐™š๐™ก ๐™‘๐™ค๐™ž๐™˜๐™š ๐˜พ๐™๐™–๐™ฉ๐™จ ๐˜ผ๐™ซ๐™–๐™ž๐™ก?

Stream Music ๐ŸŽต ๐˜ผ ๐™—๐™ค๐™ฉ ๐™ฉ๐™๐™–๐™ฉ ๐™˜๐™–๐™ฃ ๐™ฅ๐™ก๐™–๐™ฎ ๐™ข๐™ช๐™จ๐™ž๐™˜ ๐™ค๐™ฃ ๐™๐™š๐™ก๐™š๐™œ๐™ง๐™–๐™ข ๐™‚๐™ง๐™ค๐™ช๐™ฅ ๐™–๐™ฃ๐™™ ๐˜พ๐™๐™–๐™ฃ๐™ฃ๐™š๐™ก ๐™‘๐™ค๐™ž๐™˜๐™š ๐˜พ๐™๐™–๐™ฉ๐™จ ๐˜ผ๐™ซ๐™–๐™ž๐™ก?

Sadew Jayasekara 15 Nov 12, 2022
Implicit neural differentiable FM synthesizer

Implicit neural differentiable FM synthesizer The purpose of this project is to emulate arbitrary sounds with FM synthesis, where the parameters of th

Andreas Jansson 34 Nov 06, 2022
A Python wrapper for the high-quality vocoder "World"

PyWORLD - A Python wrapper of WORLD Vocoder Linux Windows WORLD Vocoder is a fast and high-quality vocoder which parameterizes speech into three compo

Jeremy Hsu 583 Dec 15, 2022
MusicBrainz Picard

MusicBrainz Picard MusicBrainz Picard is a cross-platform (Linux/Mac OS X/Windows) application written in Python and is the official MusicBrainz tagge

MetaBrainz Foundation 3k Dec 31, 2022
Royal Music You can play music and video at a time in vc

Royals-Music Royal Music You can play music and video at a time in vc Commands SOON String STRING_SESSION Deployment ๐ŸŽ– Credits โ€ข ๐Ÿ‡ธแดแดสแด€โƒ๐Ÿ‡ฏแด‡แด‡แด› โ€ข ๐Ÿ‡ดา“า“ษช

2 Nov 23, 2021
Python interface to the WebRTC Voice Activity Detector

py-webrtcvad This is a python interface to the WebRTC Voice Activity Detector (VAD). It is compatible with Python 2 and Python 3. A VAD classifies a p

John Wiseman 1.5k Dec 22, 2022
Audio book player for senior visually impaired.

PI Zero W Audio Book Motivation and requirements My dad is practically blind and at 80 years has trouble hearing and operating tiny or more complicate

Andrej Hosna 29 Dec 25, 2022
cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding for Python

audioread Decode audio files using whichever backend is available. The library currently supports: Gstreamer via PyGObject. Core Audio on Mac OS X via

beetbox 419 Dec 26, 2022
FPGA based USB 2.0 high speed audio interface featuring multiple optical ADAT inputs and outputs

ADAT USB Audio Interface FPGA based USB 2.0 High Speed audio interface featuring multiple optical ADAT inputs and outputs Status / current limitations

Hans Baier 78 Dec 31, 2022
GNU Radio โ€“ the Free and Open Software Radio Ecosystem

GNU Radio is a free & open-source software development toolkit that provides signal processing blocks to implement software radios. It can be used wit

GNU Radio 4.1k Jan 06, 2023
An 8D music player made to enjoy Halloween this year!๐Ÿค˜

HAPPY HALLOWEEN buddy! Split Player Hello There! Welcome to SplitPlayer... Supposed To Be A 8DPlayer.... You Decide.... It can play the ordinary audio

Akshat Kumar Singh 1 Nov 04, 2021
PianoPlayer - Automatic fingering generator for piano scores

PianoPlayer - Automatic fingering generator for piano scores

Marco Musy 571 Jan 02, 2023
Tune in is a Collaborative Music Playing Systems where multiple guests can join a room and enjoy the song being played

โœจA collaborative music playing systems๐ŸŽถ where multiple guests can join a room โžก๐Ÿšช and enjoy the song๐ŸŽง being played.

Vedansh Vijaywargiya 8 Nov 05, 2022
The project aims to develop a personal-assistant for Windows & Linux-based systems

The project aims to develop a personal-assistant for Windows & Linux-based systems. Samiksha draws its inspiration from virtual assistants like Cortana for Windows, and Siri for iOS. It has been desi

SHUBHANSHU RAI 1 Jan 16, 2022
Using python to generate a bat script of repetitive lines of code that differ in some way but can sort out a group of audio files according to their common names

Batch Sorting Using python to generate a bat script of repetitive lines of code that differ in some way but can sort out a group of audio files accord

David Mainoo 1 Oct 29, 2021
Voice helper on russian

Voice helper on russian

KreO 1 Jun 30, 2022
Gateware for the Terasic/Arrow DECA board, to become a USB2 high speed audio interface

DECA USB Audio Interface DECA based USB 2.0 High Speed audio interface Status / current limitations enumerates as class compliant audio device on Linu

Hans Baier 16 Mar 21, 2022