Audio augmentations library for PyTorch for audio in the time-domain

Last update: Jan 08, 2023

Overview

Audio Augmentations

A PyTorch library of audio augmentations that operate on time-domain waveforms, with support for the stochastic data augmentations often used in self-supervised and contrastive learning.

Usage

We can define several audio augmentations, which will be applied sequentially to a raw audio waveform:

import torchaudio
from torchaudio_augmentations import *

# Load the waveform tensor and its sample rate.
audio, sr = torchaudio.load("tests/classical.00002.wav")

num_samples = sr * 5
transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion()], p=0.8),
    RandomApply([Noise(min_snr=0.3, max_snr=0.5)], p=0.3),
    RandomApply([Gain()], p=0.2),
    RandomApply([HighLowPass(sample_rate=sr)], p=0.8),
    RandomApply([Delay(sample_rate=sr)], p=0.5),
    RandomApply([PitchShift(
        n_samples=num_samples,
        sample_rate=sr
    )], p=0.4),
    RandomApply([Reverb(sample_rate=sr)], p=0.3)
]

We can return either one or many versions of the same audio example:

transform = Compose(transforms=transforms)
transformed_audio = transform(audio)
>> transformed_audio.shape[0] = 1

audio, sr = torchaudio.load("tests/classical.00002.wav")
transform = ComposeMany(transforms=transforms, num_augmented_samples=4)
transformed_audio = transform(audio)
>> transformed_audio.shape[0] = 4
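
To make the difference between the two wrappers concrete, here is a minimal sketch of the idea. It is not the library's implementation: SketchCompose, SketchComposeMany and the two toy transforms are hypothetical stand-ins, and stacking the augmented versions along a new leading dimension is an assumption inferred from the shape[0] values shown above.

```python
import torch


class SketchCompose:
    """Hypothetical stand-in: apply each transform in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x


class SketchComposeMany(SketchCompose):
    """Hypothetical stand-in: run the chain several times and stack the results."""

    def __init__(self, transforms, num_augmented_samples):
        super().__init__(transforms)
        self.num_augmented_samples = num_augmented_samples

    def __call__(self, x):
        versions = []
        for _ in range(self.num_augmented_samples):
            # Each pass re-draws the random parameters of the transforms.
            versions.append(SketchCompose.__call__(self, x))
        # Assumption: outputs are stacked along a new leading dimension.
        return torch.stack(versions)


def invert_polarity(x):
    return -x


def random_gain(x):
    # Scale by a random factor drawn uniformly from [0.5, 1.0).
    return x * torch.empty(1).uniform_(0.5, 1.0)


audio = torch.randn(1, 16000)  # (channel, time)
chain = [invert_polarity, random_gain]
one = SketchCompose(chain)(audio)
many = SketchComposeMany(chain, num_augmented_samples=4)(audio)
print(tuple(one.shape), tuple(many.shape))
```

Under these assumptions a single pass keeps the (channel, time) shape, while the many-version wrapper adds a leading dimension of size num_augmented_samples.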

Similar to the torchvision.datasets interface, an instance of the Compose or ComposeMany class can be supplied to any torchaudio dataset that accepts a transform= argument.
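
The transform= pattern can be sketched with a small custom dataset. ToyAudioDataset and invert_polarity are hypothetical names used only for illustration; any callable, including a Compose instance, could be passed in the same way.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyAudioDataset(Dataset):
    """Hypothetical in-memory dataset that applies `transform` on access."""

    def __init__(self, clips, transform=None):
        self.clips = clips
        self.transform = transform

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        audio = self.clips[idx]
        if self.transform is not None:
            # The augmentation runs every time an item is fetched.
            audio = self.transform(audio)
        return audio


def invert_polarity(x):
    return -x


clips = [torch.randn(1, 8000) for _ in range(6)]  # six mono clips
dataset = ToyAudioDataset(clips, transform=invert_polarity)
loader = DataLoader(dataset, batch_size=3)
batch = next(iter(loader))  # shape: (batch, channel, time)
```

Because the transform is applied inside __getitem__, stochastic augmentations are re-sampled on every epoch rather than fixed once.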

Optional

Install WavAugment for reverberation / pitch shifting:

pip install git+https://github.com/facebookresearch/WavAugment

Cite

You can cite this work with the following BibTeX:

@misc{spijkervet_torchaudio_augmentations,
  doi = {10.5281/ZENODO.4748582},
  url = {https://zenodo.org/record/4748582},
  author = {Spijkervet,  Janne},
  title = {Spijkervet/torchaudio-augmentations},
  publisher = {Zenodo},
  year = {2021},
  copyright = {MIT License}
}

Comments

Delay augmentation on cuda

Hi. Currently the delay augmentation doesn't work on GPU since part of the signal is on CPU. I think creating the beginning tensor on the same device as the audio tensor should fix it. Thanks. https://github.com/Spijkervet/torchaudio-augmentations/blob/d044f9d020e12032ab9280acf5f34a337e72d212/torchaudio_augmentations/augmentations/delay.py#L31

opened by sidml 2
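
The fix suggested in the delay issue above can be sketched as follows. This is not the code from delay.py: delay_mix and its parameters are hypothetical, and the only point is that the leading silence is allocated on the same device and dtype as the input. The sketch assumes delay_samples is no larger than the number of input samples.

```python
import torch


def delay_mix(audio, delay_samples, volume_factor=0.5):
    """Mix a delayed, attenuated copy into `audio` (hypothetical helper)."""
    # Allocate the leading silence on the input's device and dtype so that
    # torch.cat never has to combine CPU and GPU tensors.
    beginning = torch.zeros(
        *audio.shape[:-1], delay_samples, device=audio.device, dtype=audio.dtype
    )
    # Drop the tail so the delayed copy keeps the original length.
    kept = audio[..., : audio.shape[-1] - delay_samples]
    delayed = torch.cat((beginning, kept), dim=-1)
    return audio + volume_factor * delayed


device = "cuda" if torch.cuda.is_available() else "cpu"
audio = torch.ones(1, 8, device=device)
out = delay_mix(audio, delay_samples=3)
```
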
Correctness unit test would be great

For some transforms, we can test whether the values are actually correct by manually computing the expected value. For example, PolarityInversion could be tested with some tiny tensors like [[0.1, 0.5, -1.0]]. Reverse as well. Probably only those two? Still, it'd be better than not having any.

opened by keunwoochoi 2
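
A value-level test of the kind proposed above might look like the sketch below. The two functions are stand-ins that encode the expected behaviour (sign flip and time reversal); in a real test suite they would be replaced by the library's PolarityInversion and Reverse transforms, whose constructor arguments are not shown here.

```python
import torch


def polarity_inversion(x):
    # Stand-in for the transform under test: flip the sign of every sample.
    return -x


def reverse(x):
    # Stand-in for the transform under test: reverse the time axis.
    return torch.flip(x, dims=[-1])


def test_polarity_inversion():
    x = torch.tensor([[0.1, 0.5, -1.0]])
    expected = torch.tensor([[-0.1, -0.5, 1.0]])
    assert torch.allclose(polarity_inversion(x), expected)


def test_reverse():
    x = torch.tensor([[0.1, 0.5, -1.0]])
    expected = torch.tensor([[-1.0, 0.5, 0.1]])
    assert torch.allclose(reverse(x), expected)
```
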
Default value of `max_snr` in `Noise`

An SNR of 1.0 between the signal and white noise would give a really heavily corrupted signal. Could we set it to a more reasonable value?

Related: it'd be great if one could hear some examples of the augmented results.

opened by keunwoochoi 2
End-to-end PitchShift transform tests

This merge request adds end-to-end pitch transformation detection with librosa's pYIN pitch detection, to test whether the applied transformation yields the expected pitch transposition.

opened by Spijkervet 0
Unittests

This adds various unittests and fixes to multi-channel input for Reverb, Pitch, Reverse and HighLowPass filter augmentations. It also removes Essentia as a dependency, and instead uses julius for IRR filtering.

opened by Spijkervet 0
import error

When I import torchaudio_augmentation, I get this error:

RuntimeError : torchaudio.sox_effects.sox_effects.effect_names requires module: torchaudio._torchaudio

How can I deal with it?

opened by EavnJeong 0
Snr db
Hi, thanks for the interesting work. Allow me to suggest this change for two reasons:

Expressing SNR in dB removes the ambiguity between power SNR and RMS SNR.

When sampling an SNR, it makes more sense to me to sample uniformly on the logarithmic dB scale than on the linear range. This way, low SNRs are as likely to be drawn as high SNRs.

I hope this makes sense. I'd be glad to discuss it further.
opened by wesbz 0
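
The dB-based sampling proposed above can be sketched with the standard power-ratio definition, SNR_dB = 10 * log10(P_signal / P_noise). The function names and the example range are hypothetical and are not taken from the library's Noise transform.

```python
import math
import random


def snr_db_to_power_ratio(snr_db):
    """Convert an SNR in dB to a linear signal-to-noise power ratio."""
    return 10.0 ** (snr_db / 10.0)


def power_ratio_to_snr_db(ratio):
    """Convert a linear power ratio to dB."""
    return 10.0 * math.log10(ratio)


def sample_snr_db(min_snr_db, max_snr_db, rng=random):
    """Draw an SNR uniformly on the dB (logarithmic) scale."""
    return rng.uniform(min_snr_db, max_snr_db)


def noise_gain(signal_power, noise_power, snr_db):
    """Amplitude gain for the noise so the mix reaches the target SNR."""
    target_noise_power = signal_power / snr_db_to_power_ratio(snr_db)
    return math.sqrt(target_noise_power / noise_power)


rng = random.Random(0)
snr_db = sample_snr_db(5.0, 30.0, rng)  # hypothetical range in dB
gain = noise_gain(signal_power=1.0, noise_power=1.0, snr_db=snr_db)
```

Under this definition a linear power ratio of 1.0 corresponds to 0 dB, meaning the noise carries as much power as the signal.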
sanity check for duration

In transforms where the duration may change, the error message is not intuitive if the input audio is shorter than n_samples. I forget exactly when, but in some cases the multiprocessing-based dataloader silently died. Maybe it's worth checking for this somewhere?

opened by keunwoochoi 0
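
One way to make that failure explicit is an early length check, sketched below. check_min_duration is a hypothetical helper, not part of the library, and it assumes that time is the last dimension of the input tensor.

```python
import torch


def check_min_duration(audio, n_samples):
    """Raise a descriptive error if `audio` has fewer than `n_samples` samples."""
    available = audio.shape[-1]  # assumption: time is the last dimension
    if available < n_samples:
        raise ValueError(
            f"Input has {available} samples along the time axis, but the "
            f"transform needs at least {n_samples}."
        )
    return audio


short_clip = torch.zeros(1, 10)
try:
    check_min_duration(short_clip, n_samples=16)
    message = None
except ValueError as err:
    message = str(err)
```

Raising a ValueError with both lengths in the message gives a dataloader worker a readable failure reason.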
Shapes are still a bit confusing

In ComposeMany.__call__(), is x also a 2-dim tensor of shape (ch, time)? And I'm not sure what the expected behavior of this function would be, especially the shape of the output.

opened by keunwoochoi 3

Releases(v0.2.3)

v0.2.3(Nov 15, 2021)
This version:

Removes WavAugment's pitch detection, and instead relies on the parallelisable torch-pitch-shift package.

Adds support for batched inputs (batch, channel, time).

It also adds more tests for every audio transformation.

0.2.0(Jun 29, 2021)

Removes the Essentia library as a dependency and adds compatibility for multi-channel audio.
1.0(May 11, 2021)

Owner

Janne

Music producer, machine learning in MIR & occasional ethical hacker
