Audio augmentations library for PyTorch for audio in the time-domain

Audio Augmentations

An audio augmentations library for PyTorch that operates on audio in the time domain, with support for stochastic data augmentations as often used in self-supervised / contrastive learning.

Usage

We can define several audio augmentations, which will be applied sequentially to a raw audio waveform:

import torchaudio
from torchaudio_augmentations import *

audio, sr = torchaudio.load("tests/classical.00002.wav")

num_samples = sr * 5
transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion()], p=0.8),
    RandomApply([Noise(min_snr=0.3, max_snr=0.5)], p=0.3),
    RandomApply([Gain()], p=0.2),
    RandomApply([HighLowPass(sample_rate=sr)], p=0.8),
    RandomApply([Delay(sample_rate=sr)], p=0.5),
    RandomApply([PitchShift(
        n_samples=num_samples,
        sample_rate=sr
    )], p=0.4),
    RandomApply([Reverb(sample_rate=sr)], p=0.3)
]

We can return either one or many versions of the same audio example:

transform = Compose(transforms=transforms)
transformed_audio = transform(audio)
# transformed_audio.shape[0] == 1

audio, sr = torchaudio.load("tests/classical.00002.wav")
transform = ComposeMany(transforms=transforms, num_augmented_samples=4)
transformed_audio = transform(audio)
# transformed_audio.shape[0] == 4
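
The difference between the two can be sketched without any dependencies. The classes below (`SketchRandomApply`, `SketchCompose`, `SketchComposeMany`) are hypothetical stand-ins written for illustration, not the library's implementation, and plain Python lists stand in for tensors. The sketch assumes that the "many" variant simply runs the same stochastic pipeline several times and collects the results.

```python
import random


class SketchRandomApply:
    """Hypothetical stand-in: apply `fn` with probability p, else pass through."""

    def __init__(self, fn, p, rng):
        self.fn, self.p, self.rng = fn, p, rng

    def __call__(self, x):
        return self.fn(x) if self.rng.random() < self.p else x


class SketchCompose:
    """Hypothetical stand-in: run the transforms one after another."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for transform in self.transforms:
            x = transform(x)
        return x


class SketchComposeMany(SketchCompose):
    """Hypothetical stand-in: run the whole pipeline several times on one input."""

    def __init__(self, transforms, num_augmented_samples):
        super().__init__(transforms)
        self.num_augmented_samples = num_augmented_samples

    def __call__(self, x):
        versions = []
        for _ in range(self.num_augmented_samples):
            # Each pass draws its own random decisions.
            versions.append(SketchCompose.__call__(self, x))
        return versions


def invert(wave):
    """Flip the sign of every sample."""
    return [-s for s in wave]


rng = random.Random(0)
transforms = [SketchRandomApply(invert, p=0.8, rng=rng)]
wave = [0.1, 0.5, -1.0]

one = SketchCompose(transforms)(wave)
many = SketchComposeMany(transforms, num_augmented_samples=4)(wave)
print(len(many))
```

Because every pass is random, the collected versions may differ from each other, which is what contrastive setups rely on.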

Similar to the torchvision.datasets interface, an instance of the Compose or ComposeMany class can be supplied to any torchaudio dataset that accepts a transform= argument.
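
As a rough illustration of that pattern, the sketch below shows a dataset-like class that applies a transform each time an item is read. `SketchAudioDataset` is a hypothetical name and the class is plain Python; a real dataset would typically subclass `torch.utils.data.Dataset` and hold tensors rather than lists.

```python
class SketchAudioDataset:
    """Hypothetical dataset: stores (waveform, label) pairs and applies
    `transform` to the waveform each time an item is accessed."""

    def __init__(self, items, transform=None):
        self.items = items
        self.transform = transform

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        waveform, label = self.items[idx]
        if self.transform is not None:
            waveform = self.transform(waveform)
        return waveform, label


def reverse(wave):
    """Stand-in transform: play the waveform backwards."""
    return wave[::-1]


dataset = SketchAudioDataset([([0.1, 0.5, -1.0], "classical")], transform=reverse)
waveform, label = dataset[0]
print(waveform, label)
```

Any callable that maps a waveform to a waveform can be passed as `transform`, which is why a composed pipeline fits this slot.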

Optional

Install WavAugment for reverberation / pitch shifting:

pip install git+https://github.com/facebookresearch/WavAugment

Cite

You can cite this work with the following BibTeX:

@misc{spijkervet_torchaudio_augmentations,
  doi = {10.5281/ZENODO.4748582},
  url = {https://zenodo.org/record/4748582},
  author = {Spijkervet,  Janne},
  title = {Spijkervet/torchaudio-augmentations},
  publisher = {Zenodo},
  year = {2021},
  copyright = {MIT License}
}
Comments
  • Delay augmentation on cuda

    Hi. Currently the Delay augmentation doesn't work on GPU, since part of the signal is on the CPU. I think creating the beginning tensor on the same device as the audio tensor should fix it. Thanks. https://github.com/Spijkervet/torchaudio-augmentations/blob/d044f9d020e12032ab9280acf5f34a337e72d212/torchaudio_augmentations/augmentations/delay.py#L31

    opened by sidml 2
  • Correctness unit test would be great

    For some transforms, we can test whether the values are actually correct by manually computing the expected value. For example, PolarityInversion could be tested with some tiny tensors like [[0.1, 0.5, -1.0]]. Reverse as well. Probably only those two? Still, it'd be better than not having any.

    opened by keunwoochoi 2
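
The kind of correctness test suggested in this issue could look roughly like the sketch below. The functions `polarity_inversion` and `reverse` are hypothetical reference implementations on nested lists, written only for illustration; an actual test would call the library's PolarityInversion and Reverse on tensors and compare the result against the hand-computed values.

```python
def polarity_inversion(audio):
    """Reference behaviour (assumed): flip the sign of every sample."""
    return [[-s for s in channel] for channel in audio]


def reverse(audio):
    """Reference behaviour (assumed): reverse each channel along time."""
    return [channel[::-1] for channel in audio]


def test_polarity_inversion():
    audio = [[0.1, 0.5, -1.0]]
    assert polarity_inversion(audio) == [[-0.1, -0.5, 1.0]]
    # Inverting twice must give back the original signal.
    assert polarity_inversion(polarity_inversion(audio)) == audio


def test_reverse():
    audio = [[0.1, 0.5, -1.0]]
    assert reverse(audio) == [[-1.0, 0.5, 0.1]]
    # Reversing twice must give back the original signal.
    assert reverse(reverse(audio)) == audio


test_polarity_inversion()
test_reverse()
```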
  • Default value of `max_snr` in `Noise`

    An SNR of 1.0 between the signal and white noise would be a really heavily corrupted signal. Could we set it to a slightly more reasonable value?

    Related: it'd be great if one could hear some examples of the augmented result.

    opened by keunwoochoi 2
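
To see why an SNR of 1.0 is severe, the sketch below scales white noise to a target SNR and measures the result. It assumes the SNR is a linear ratio of signal power to noise power; the source does not say which convention Noise uses. The helpers `power` and `mix_at_snr` are hypothetical and not part of the library.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(signal, noise, snr):
    """Scale `noise` so that power(signal) / power(scaled noise) == snr
    (linear power ratio, an assumption), then add it to `signal`."""
    scale = math.sqrt(power(signal) / (snr * power(noise)))
    scaled = [scale * n for n in noise]
    mixed = [s + n for s, n in zip(signal, scaled)]
    return mixed, scaled


rng = random.Random(0)
sample_rate = 16000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

mixed, scaled_noise = mix_at_snr(signal, noise, snr=1.0)
ratio = power(signal) / power(scaled_noise)
print(ratio, 10 * math.log10(ratio))
```

Under this convention an SNR of 1.0 corresponds to 0 dB, meaning the noise carries as much power as the signal itself.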
  • End-to-end PitchShift transform tests

    This merge request adds an end-to-end pitch transformation test using librosa's pYIN pitch detection, to check that the applied transformation yields the expected pitch transposition.

    opened by Spijkervet 0
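
The check at the heart of such a test can be sketched as follows. In a real test the two fundamental frequencies would come from a pitch tracker such as pYIN; here they are hard-coded. The function names and the 50-cent tolerance are assumptions made for illustration only.

```python
import math


def expected_frequency(f0_hz, n_semitones):
    """Frequency after transposing f0 by n semitones (equal temperament)."""
    return f0_hz * 2.0 ** (n_semitones / 12.0)


def cents_between(f_ref_hz, f_hz):
    """Signed distance from f_ref to f in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_hz / f_ref_hz)


def shift_matches(f0_before_hz, f0_after_hz, n_semitones, tolerance_cents=50.0):
    """True if the detected pitch after shifting is within the tolerance
    of the pitch expected from the requested transposition."""
    target = expected_frequency(f0_before_hz, n_semitones)
    return abs(cents_between(target, f0_after_hz)) <= tolerance_cents


# A 440 Hz tone shifted up one octave should be detected near 880 Hz.
print(shift_matches(440.0, 880.0, n_semitones=12))
# An unshifted tone should fail the same check.
print(shift_matches(440.0, 440.0, n_semitones=12))
```

Comparing in cents rather than in Hz keeps the tolerance meaningful across the whole frequency range.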
  • Unittests

    This adds various unit tests and fixes multi-channel input for the Reverb, Pitch, Reverse and HighLowPass filter augmentations. It also removes Essentia as a dependency, and instead uses julius for IRR filtering.

    opened by Spijkervet 0
  • import error

    When I import torchaudio_augmentations, I get this error:

    RuntimeError : torchaudio.sox_effects.sox_effects.effect_names requires module: torchaudio._torchaudio

    How can I deal with it?

    opened by EavnJeong 0
  • Snr db

    Hi, thanks for the interesting work. Allow me to suggest this change for two reasons:

    • Expressing the SNR in dB removes any doubt about whether a power SNR or an RMS SNR is meant.
    • When sampling an SNR, it makes more sense to me to sample uniformly on the log (dB) scale than on the linear scale. This way, low SNRs are as likely to be drawn as high SNRs.

    I hope this makes sense. I'd be glad to discuss it further.

    opened by wesbz 0
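
The proposal in this issue can be sketched in a few lines. The function names and the 0 to 20 dB range are hypothetical, and the conversion assumes a power ratio (an RMS ratio would use a factor of 20 instead of 10).

```python
import math
import random


def db_to_power_ratio(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def power_ratio_to_db(ratio):
    """Convert a linear power ratio to dB."""
    return 10.0 * math.log10(ratio)


def sample_snr_db(min_snr_db, max_snr_db, rng):
    """Draw an SNR uniformly on the dB (log) scale."""
    return rng.uniform(min_snr_db, max_snr_db)


rng = random.Random(0)
snr_db = sample_snr_db(0.0, 20.0, rng)
snr_linear = db_to_power_ratio(snr_db)
print(snr_db, snr_linear)
```

For comparison, 0 to 20 dB corresponds to linear ratios from 1 to 100. Sampling uniformly on that linear range would put about 91% of the draws above 10 dB, whereas sampling uniformly in dB splits them evenly around 10 dB.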
  • sanity check for duration

    In transforms where the duration may change, the error message is not intuitive if the input audio is shorter than n_samples. I forget the details, but in some cases the multiprocessing-based dataloader died silently. Maybe it's worth checking this somewhere?

    opened by keunwoochoi 0
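
One possible shape for such a check is sketched below. `check_min_length` is a hypothetical helper, not something the library provides, and the sample rate is an arbitrary example value.

```python
def check_min_length(num_frames, n_samples):
    """Raise a descriptive error if the input is too short for the transform."""
    if num_frames < n_samples:
        raise ValueError(
            f"Input audio has {num_frames} samples, but the transform needs "
            f"at least n_samples={n_samples}. Pad the audio or lower n_samples."
        )


sr = 22050
n_samples = sr * 5

# A 30-second clip is long enough, so this passes silently.
check_min_length(num_frames=sr * 30, n_samples=n_samples)

# A 2-second clip is too short and produces a readable message.
try:
    check_min_length(num_frames=sr * 2, n_samples=n_samples)
except ValueError as err:
    message = str(err)
    print(message)
```

Raising early in the main process, before the data reaches a worker, makes the failure visible instead of leaving it to surface inside a dataloader subprocess.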
  • Shapes are still a bit confusing

    In ComposeMany.__call__(), is x also a 2-dim tensor of shape (ch, time)? And I'm not sure what the expected behavior of this function is, especially the shape of the output.

    opened by keunwoochoi 3
Releases (v0.2.3)
Owner
Janne
Music producer, machine learning in MIR & occasional ethical hacker