Audio augmentations library for PyTorch for audio in the time-domain

Audio Augmentations

An audio augmentations library for PyTorch that operates on audio in the time domain, with support for the stochastic data augmentations often used in self-supervised and contrastive learning.

Usage

We can define several audio augmentations, which will be applied sequentially to a raw audio waveform:

import torchaudio
from torchaudio_augmentations import *

# Load a waveform tensor and its sample rate.
audio, sr = torchaudio.load("tests/classical.00002.wav")

# Crop to 5 seconds of audio.
num_samples = sr * 5
transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion()], p=0.8),
    RandomApply([Noise(min_snr=0.3, max_snr=0.5)], p=0.3),
    RandomApply([Gain()], p=0.2),
    RandomApply([HighLowPass(sample_rate=sr)], p=0.8),
    RandomApply([Delay(sample_rate=sr)], p=0.5),
    RandomApply([PitchShift(
        n_samples=num_samples,
        sample_rate=sr
    )], p=0.4),
    RandomApply([Reverb(sample_rate=sr)], p=0.3)
]
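
To illustrate what "applied with probability p" means in the list above, here is a minimal, dependency-free sketch. `RandomApplySketch` and `invert_polarity` are hypothetical stand-ins written for this example, not the library's own classes, and plain Python lists stand in for waveform tensors.

```python
import random


class RandomApplySketch:
    """Hypothetical stand-in: applies a group of transforms with probability p."""

    def __init__(self, transforms, p=0.5, rng=None):
        self.transforms = transforms
        self.p = p
        self.rng = rng or random.Random()

    def __call__(self, x):
        # With probability 1 - p the input is returned unchanged.
        if self.rng.random() >= self.p:
            return x
        # Otherwise every transform in the group is applied in order.
        for transform in self.transforms:
            x = transform(x)
        return x


def invert_polarity(samples):
    """Flip the sign of every sample (a list stands in for a waveform)."""
    return [-s for s in samples]


always = RandomApplySketch([invert_polarity], p=1.0)
never = RandomApplySketch([invert_polarity], p=0.0)

print(always([0.1, 0.5, -1.0]))  # [-0.1, -0.5, 1.0]
print(never([0.1, 0.5, -1.0]))   # [0.1, 0.5, -1.0]
```

With intermediate values such as p=0.8, the group is applied on roughly 80% of calls, so each pass over the same audio file can yield a different augmented version.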

We can return either one or many versions of the same audio example:

transform = Compose(transforms=transforms)
transformed_audio = transform(audio)
>> transformed_audio.shape[0] = 1

audio, sr = torchaudio.load("tests/classical.00002.wav")
transform = ComposeMany(transforms=transforms, num_augmented_samples=4)
transformed_audio = transform(audio)
>> transformed_audio.shape[0] = 4
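
The difference between the two can be sketched without any dependencies. The classes below are hypothetical stand-ins that only mirror the behavior described here: one chain produces one output, while the "many" variant runs the same chain several times and collects the results. The list returned by the second class plays the role of the leading dimension reported by `shape[0]`.

```python
class ComposeSketch:
    """Hypothetical stand-in: runs transforms one after another."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for transform in self.transforms:
            x = transform(x)
        return x


class ComposeManySketch(ComposeSketch):
    """Hypothetical stand-in: runs the same chain several times."""

    def __init__(self, transforms, num_augmented_samples):
        super().__init__(transforms)
        self.num_augmented_samples = num_augmented_samples

    def __call__(self, x):
        # One independent pass through the chain per requested version.
        return [
            ComposeSketch.__call__(self, x)
            for _ in range(self.num_augmented_samples)
        ]


def halve_gain(samples):
    """Deterministic example transform: scale every sample by 0.5."""
    return [0.5 * s for s in samples]


many = ComposeManySketch([halve_gain], num_augmented_samples=4)
versions = many([1.0, 2.0])
print(len(versions))  # 4
```

With stochastic transforms in the chain, the four versions would generally differ from one another, which is what contrastive training relies on.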

Similar to the torchvision.datasets interface, an instance of the Compose or ComposeMany class can be supplied to torchaudio dataloaders that accept transform=.

Optional

Install WavAugment for reverberation / pitch shifting:

pip install git+https://github.com/facebookresearch/WavAugment

Cite

You can cite this work with the following BibTeX:

@misc{spijkervet_torchaudio_augmentations,
  doi = {10.5281/ZENODO.4748582},
  url = {https://zenodo.org/record/4748582},
  author = {Spijkervet,  Janne},
  title = {Spijkervet/torchaudio-augmentations},
  publisher = {Zenodo},
  year = {2021},
  copyright = {MIT License}
}
Comments
  • Delay augmentation on cuda

    Hi. Currently the Delay augmentation doesn't work on GPU because part of the signal is created on the CPU. I think creating the beginning tensor on the same device as the audio tensor should fix it. Thanks. https://github.com/Spijkervet/torchaudio-augmentations/blob/d044f9d020e12032ab9280acf5f34a337e72d212/torchaudio_augmentations/augmentations/delay.py#L31

    opened by sidml 2
  • Correctness unit test would be great

    For some transforms, we can test whether the values are actually correct by manually computing the expected value. For example, PolarityInversion could be tested with some tiny tensors like [[0.1, 0.5, -1.0]]. Reverse as well. Probably only those two? Still, it'd be better than not having any.
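
A sketch of what such correctness checks could look like, using the tiny input suggested in this comment. The two functions are hypothetical pure-Python stand-ins operating on nested lists of shape (channels, time); a real test would call the library's transforms on tensors and compare with a tolerance-based check such as `torch.allclose`.

```python
def invert_polarity_reference(channels):
    """Reference behavior: flip the sign of every sample."""
    return [[-s for s in channel] for channel in channels]


def reverse_reference(channels):
    """Reference behavior: reverse each channel along the time axis."""
    return [channel[::-1] for channel in channels]


x = [[0.1, 0.5, -1.0]]

# Expected values computed by hand.
assert invert_polarity_reference(x) == [[-0.1, -0.5, 1.0]]
assert reverse_reference(x) == [[-1.0, 0.5, 0.1]]

# Both operations undo themselves when applied twice.
assert invert_polarity_reference(invert_polarity_reference(x)) == x
assert reverse_reference(reverse_reference(x)) == x
```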

    opened by keunwoochoi 2
  • Default value of `max_snr` in `Noise`

    An SNR of 1.0 between the signal and white noise gives a very heavily corrupted signal. Could we set it to a more reasonable value?

    Related: it'd be great if one could hear some examples of the augmented results.
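
To make the concern concrete: if the SNR is read as a linear power ratio (an assumption, since the thread does not say how Noise defines it), a value of 1.0 means the noise carries exactly as much power as the signal. The helper names below are hypothetical, and lists stand in for waveforms.

```python
import math
import random


def mean_power(samples):
    """Average squared amplitude of a signal."""
    return sum(s * s for s in samples) / len(samples)


def scale_noise_to_snr(signal, noise, snr):
    """Scale noise so that mean_power(signal) / mean_power(result) == snr.

    Assumes snr is a linear power ratio, not a value in dB.
    """
    factor = math.sqrt(mean_power(signal) / (snr * mean_power(noise)))
    return [n * factor for n in noise]


rng = random.Random(0)
signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

scaled_noise = scale_noise_to_snr(signal, noise, snr=1.0)
ratio = mean_power(signal) / mean_power(scaled_noise)
noisy = [s + n for s, n in zip(signal, scaled_noise)]
```

Here `ratio` comes out at 1.0 up to floating-point error, so `noisy` contains equal parts signal and noise power.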

    opened by keunwoochoi 2
  • End-to-end PitchShift transform tests

    This merge request adds an end-to-end pitch transformation test using librosa's pYIN pitch detection, to check whether the applied transformation yields the expected pitch transposition.

    opened by Spijkervet 0
  • Unittests

    This adds various unittests and fixes to multi-channel input for Reverb, Pitch, Reverse and HighLowPass filter augmentations. It also removes Essentia as a dependency, and instead uses julius for IRR filtering.

    opened by Spijkervet 0
  • import error

    When I import torchaudio_augmentations, I get the following error:

    RuntimeError: torchaudio.sox_effects.sox_effects.effect_names requires module: torchaudio._torchaudio

    How can I deal with it?

    opened by EavnJeong 0
  • Snr db

    Hi, Thanks for the interesting work. Allow me to suggest this change for two reasons:

    • Expressing SNR in dB removes the ambiguity between power SNR and RMS SNR.
    • When sampling an SNR, it makes more sense to me to sample uniformly on the logarithmic dB scale than on the linear range. This way, low SNRs are as likely to be drawn as high SNRs.

    I hope that makes sense. I'd be glad to discuss this further.
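
The proposal can be sketched as follows: draw the SNR uniformly in dB, then convert it to a linear power ratio only when the noise is scaled. The function names and the 0 to 20 dB range are illustrative assumptions, not part of the library.

```python
import math
import random


def db_to_power_ratio(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def power_ratio_to_db(ratio):
    """Convert a linear power ratio to dB."""
    return 10.0 * math.log10(ratio)


def sample_snr_db(min_snr_db, max_snr_db, rng):
    """Draw an SNR uniformly on the dB (logarithmic) scale."""
    return rng.uniform(min_snr_db, max_snr_db)


rng = random.Random(0)
snr_db = sample_snr_db(0.0, 20.0, rng)   # illustrative range
snr_linear = db_to_power_ratio(snr_db)
```

Sampling uniformly between the equivalent linear ratios 1 and 100 would instead place about 90% of draws above 10 dB, which is the imbalance this comment describes.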

    opened by wesbz 0
  • sanity check for duration

    In transforms where the duration may change, if the input audio is shorter than n_samples, the error message is not intuitive. I don't remember the details, but in some cases the multiprocessing-based dataloader died silently. Maybe it's worth checking this somewhere?
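
One possible shape for such a check, as a sketch: fail early with a message that names both lengths. `check_min_length` is a hypothetical helper, not an existing function in the library; a transform could call it with the size of the input's time dimension before cropping.

```python
def check_min_length(num_input_samples, n_samples):
    """Raise a descriptive error when the input is too short to crop."""
    if num_input_samples < n_samples:
        raise ValueError(
            f"Input audio has {num_input_samples} samples, "
            f"but at least {n_samples} are required."
        )


# Passes silently: a 5 second clip at 16 kHz is long enough for a 3 second crop.
check_min_length(num_input_samples=16000 * 5, n_samples=16000 * 3)

try:
    check_min_length(num_input_samples=16000, n_samples=16000 * 3)
except ValueError as error:
    message = str(error)
```

Raising inside the transform means the error surfaces with a readable message even when it originates in a dataloader worker process.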

    opened by keunwoochoi 0
  • Shapes are still a bit confusing

    In ComposeMany.__call__(), is x also a 2-dim tensor of shape (ch, time)? And I'm not sure what the expected behavior of this function is, especially the shape of the output.

    opened by keunwoochoi 3
Releases(v0.2.3)
Owner
Janne
Music producer, machine learning in MIR & occasional ethical hacker