PyTorch wrappers for using your model in Audacity!

Overview

torchaudacity

This package contains utilities for preparing PyTorch audio models for use in Audacity. More specifically, it provides abstract classes for wrapping your waveform-to-waveform and waveform-to-labels models (see the Deep Learning for Audacity website to learn more about deep learning models for Audacity).


Contributing Models to Audacity

Audacity is equipped with a wrapper framework for deep learning models written in PyTorch. Audacity contains two deep learning tools: Deep Learning Effect and Deep Learning Analyzer.
Deep Learning Effect performs waveform-to-waveform processing and is useful for audio-in, audio-out tasks (such as source separation, voice conversion, style transfer, and amplifier emulation). Deep Learning Analyzer performs waveform-to-labels processing and is useful for annotation tasks (such as sound event detection, musical instrument recognition, and automatic speech recognition).

torchaudacity provides one abstract class for serializing each of these two model types: WaveformToWaveform for waveform-to-waveform models and WaveformToLabels for waveform-to-labels models.

Choosing an Effect Type

Waveform to Waveform models

Waveform-to-waveform models receive a single multichannel audio track as input, and may write to a variable number of new audio tracks as output.

Example models for waveform-to-waveform effects include source separation, neural upsampling, guitar amplifier emulation, generative models, etc. Output tensors for waveform-to-waveform models must be multichannel waveform tensors with shape (num_output_channels, num_samples). For every audio waveform in the output tensor, a new audio track is created in the Audacity project.
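As a quick illustration of the output contract above, the following sketch checks that a tensor has the (num_output_channels, num_samples) layout and splits it into one waveform per track. The helper name `split_into_tracks` is hypothetical and is not part of torchaudacity; it only mimics the one-track-per-waveform behavior described above.

```python
import torch
from typing import List


def split_into_tracks(output: torch.Tensor) -> List[torch.Tensor]:
    """Hypothetical helper: split a (num_output_channels, num_samples)
    tensor into one 1-D waveform per output track."""
    if output.dim() != 2:
        raise ValueError(
            "expected shape (num_output_channels, num_samples), "
            f"got {tuple(output.shape)}"
        )
    # unbind along the channel dimension: one waveform per new track
    return list(output.unbind(dim=0))


# a fake 2-source separation result: 2 output channels, 48000 samples each
fake_output = torch.zeros(2, 48000)
tracks = split_into_tracks(fake_output)
print(len(tracks), tuple(tracks[0].shape))
```

Running a check like this on your model's output before exporting can catch a missing or extra batch dimension early.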

Waveform to Labels models

Waveform-to-labels models receive a single multichannel audio track as input and may write to a label track as output. The waveform-to-labels effect can be used for many audio analysis applications, such as voice activity detection, sound event detection, musical instrument recognition, and automatic speech recognition. The output for waveform-to-labels models must be a tuple of two tensors. The first tensor contains the class probabilities for each label present in the waveform, with shape (num_timesteps, num_classes). The second tensor must contain the start and stop timestamps for each label, with shape (num_timesteps, 2).
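To make the two-tensor output format concrete, here is a minimal sketch of how such an output could be turned into labeled regions. It assumes the label for each timestep is chosen by argmax over the class dimension, as the metadata spec in this README describes. The function `decode_labels` is hypothetical and only illustrates the format; it is not Audacity's or torchaudacity's actual code.

```python
import torch
from typing import List, Tuple


def decode_labels(
    probs: torch.Tensor,       # (num_timesteps, num_classes)
    timestamps: torch.Tensor,  # (num_timesteps, 2): start and stop per label
    labels: List[str],
) -> List[Tuple[float, float, str]]:
    """Hypothetical helper: pair each timestep's most likely class name
    with its start/stop timestamps."""
    if probs.dim() != 2 or tuple(timestamps.shape) != (probs.shape[0], 2):
        raise ValueError("probs must be (T, C) and timestamps must be (T, 2)")
    best = probs.argmax(dim=1)  # index of the most likely class per timestep
    decoded = []
    for i in range(probs.shape[0]):
        start, stop = timestamps[i].tolist()
        decoded.append((start, stop, labels[int(best[i])]))
    return decoded


probs = torch.tensor([[0.9, 0.1], [0.2, 0.8]])
timestamps = torch.tensor([[0.0, 0.5], [0.5, 1.0]])
decoded = decode_labels(probs, timestamps, ["speech", "music"])
print(decoded)
```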

Model Metadata

Certain details about the model, such as its sample rate, tool type (e.g. waveform-to-waveform or waveform-to-labels), list of labels, etc. must be provided by the model contributor in a separate metadata.json file. In order to help users choose the correct model for their required task, model contributors are asked to provide a short and long description of the model, the target domain of the model (e.g. speech, music, environmental, etc.), as well as a list of tags or keywords as part of the metadata. For waveform-to-labels models, the model contributor may include an optional confidence threshold, where predictions with a probability lower than the confidence threshold are labeled as "uncertain".

Metadata Spec

required fields:

  • sample_rate (int)
    • range (0, 396000)
    • Model sample rate. Input tracks will be resampled to this value.
  • domains (List[str])
    • List of data domains for the model. The list should contain any of the following strings (any others will be ignored): ["music", "speech", "environmental", "other"]
  • short_description (str)
    • max 60 chars
    • Short description of the model. Should contain a brief message with the model's purpose, e.g. "Use me for separating vocals from the background!".
  • long_description (str)
    • max 280 chars
    • long description of the model. Shown in the detailed view of the model UI.
  • tags (List[str])
    • list of tags (to be shown in the detailed view)
    • each tag should be 15 characters max
    • max 5 tags per model.
  • labels (List[str])
    • Output labels for the model. This field means different things depending on the effect type:
    • waveform-to-waveform:
      • Name of each output source (e.g. drums, bass, vocal). To create the track name for each output source, each label is appended to the mixture track's name.
    • waveform-to-labels:
      • Labeler models should output a list of class probabilities with shape (n_timesteps, n_class) and a list of start/stop timestamps for each label with shape (n_timesteps, 2). The labeler effect creates new labels by taking the argmax of each class probability and indexing into the metadata's labels.
  • effect_type (str)
    • Target effect for this model. Must be one of ["waveform-to-waveform", "waveform-to-labels"].
  • multichannel (bool)
    • If multichannel is set to true, stereo tracks are passed to the model as multichannel audio tensors, with shape (2, n). Note that this means that the input could either be a mono track with shape (1, n) or stereo track with shape (2, n).
    • If multichannel is set to false, stereo tracks are downmixed, meaning that the input audio tensor will always be shape (1, n).
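The constraints in the list above can be checked before exporting. The sketch below is one way to do that with the standard library only. The function `check_metadata` is hypothetical and not part of torchaudacity, the field names are taken from the spec list above, and treating the sample rate range (0, 396000) as exclusive at both ends is an assumption. Unknown domain strings are not flagged, since the spec says they are ignored.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = [
    "sample_rate", "domains", "short_description", "long_description",
    "tags", "labels", "effect_type", "multichannel",
]
EFFECT_TYPES = ["waveform-to-waveform", "waveform-to-labels"]


def _is_str_list(value: Any) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def check_metadata(md: Dict[str, Any]) -> List[str]:
    """Hypothetical checker: return a list of problems (empty if none found)."""
    problems = [f"missing field: {k}" for k in REQUIRED_FIELDS if k not in md]
    if problems:
        return problems

    sr = md["sample_rate"]
    # bool is a subclass of int in Python, so exclude it explicitly;
    # the exclusive range bounds are an assumption
    if isinstance(sr, bool) or not isinstance(sr, int) or not 0 < sr < 396000:
        problems.append("sample_rate must be an int in (0, 396000)")
    if not _is_str_list(md["domains"]):
        problems.append("domains must be a list of strings")
    short = md["short_description"]
    if not isinstance(short, str) or len(short) > 60:
        problems.append("short_description must be a string of at most 60 chars")
    long_ = md["long_description"]
    if not isinstance(long_, str) or len(long_) > 280:
        problems.append("long_description must be a string of at most 280 chars")
    tags = md["tags"]
    if not _is_str_list(tags) or len(tags) > 5 or any(len(t) > 15 for t in tags):
        problems.append("tags must be at most 5 strings of at most 15 chars each")
    if not _is_str_list(md["labels"]):
        problems.append("labels must be a list of strings")
    if md["effect_type"] not in EFFECT_TYPES:
        problems.append("effect_type must be one of " + ", ".join(EFFECT_TYPES))
    if not isinstance(md["multichannel"], bool):
        problems.append("multichannel must be a bool")
    return problems


example = {
    "sample_rate": 48000,
    "domains": ["music", "speech", "environmental"],
    "short_description": "Use me to boost volume!",
    "long_description": "Doubles the amplitude of the incoming audio.",
    "tags": ["volume boost"],
    "labels": ["boosted"],
    "effect_type": "waveform-to-waveform",
    "multichannel": False,
}
problems = check_metadata(example)
print(problems)
```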

Example - Waveform-to-Waveform model

Here's a minimal example for a model that simply boosts volume by multiplying the incoming audio by a factor of 2.

We can sum up the whole process into 4 steps:

  1. Developing your model
  2. Wrapping your model using torchaudacity
  3. Creating a metadata document
  4. Exporting to HuggingFace

Developing your model

First, we create our model. There are no constraints on the internal model architecture, as long as you can use torch.jit.script or torch.jit.trace to serialize it and it meets the input-output constraints specified for waveform-to-waveform and waveform-to-labels models.

import torch
import torch.nn as nn

class MyVolumeModel(nn.Module):

    def __init__(self):
        super().__init__()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # do the neural net magic!
        x = x * 2

        return x

Making sure your model is compatible with torchscript

PyTorch makes it really easy to deploy your Python models in C++ by using torchscript, an intermediate representation format for torch models that can be called in C++. Many of Python's built-in functions are supported by torchscript. However, not all Python operations are supported by the torchscript environment, meaning that you are only allowed to use a subset of Python operations in your model code. See the torch.jit docs to learn more about writing torchscript-compatible code.

If your model computes spectrograms (or requires any kind of preprocessing/postprocessing), make sure those operations are compatible with torchscript, like torchaudio's operation set.
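One way to find out early whether a preprocessing step is torchscript-compatible is to script it on its own and compare it against the eager version. The sketch below does this for a small module with a data-dependent branch. `PeakNormalize` is a hypothetical example module, not something torchaudacity or torchaudio provides.

```python
import torch
import torch.nn as nn


class PeakNormalize(nn.Module):
    """Hypothetical preprocessing step: scale a waveform so its peak is 1.0."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        peak = x.abs().max()
        # data-dependent branch: torch.jit.script preserves it, while
        # torch.jit.trace would record only the path taken by the example input
        if peak > 0:
            x = x / peak
        return x


module = PeakNormalize().eval()
# scripting raises an error here if the module uses unsupported Python
scripted = torch.jit.script(module)

x = torch.tensor([[0.0, 0.25, -0.5]])
eager_out = module(x)
scripted_out = scripted(x)
print(torch.allclose(eager_out, scripted_out))
```

If scripting fails, the error message usually points at the unsupported line, which makes it easier to fix problems one component at a time rather than on the fully wrapped model.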

Wrapping your model using torchaudacity

Now, we create a wrapper class for our model. Because our model returns an audio waveform as output, we'll use WaveformToWaveform as our parent class. For both WaveformToWaveform and WaveformToLabels, we need to implement the do_forward_pass method with our processing code. See the docstrings for more details.

import torch
from torchaudacity import WaveformToWaveform

class MyVolumeModelWrapper(WaveformToWaveform):

    def __init__(self, model):
        model.eval()
        self.model = model
    
    def do_forward_pass(self, x: torch.Tensor) -> torch.Tensor:
        
        # do any preprocessing here! 
        # expect x to be a waveform tensor with shape (n_channels, n_samples)

        output = self.model(x)

        # do any postprocessing here!
        # the return value should be a multichannel waveform tensor with shape (n_channels, n_samples)
    
        return output

Creating a metadata document

Audacity models need a metadata file. See the metadata spec to learn about the required fields.

metadata = {
    'sample_rate': 48000, 
    'domain_tags': ['music', 'speech', 'environmental'],
    'short_description': 'Use me to boost volume by 6dB :).',
    'long_description':  'This description can be a max of 280 characters aaaaaaaaaaaaaaaaaaaa.',
    'tags': ['volume boost'],
    'labels': ['boosted'],
    'effect_type': 'waveform-to-waveform',
    'multichannel': False,
}

All set! We can now proceed to serialize the model to torchscript and save the model, along with its metadata.

import torch
from pathlib import Path
from torchaudacity import save_model

# create a root dir for our model
root = Path('booster-net')
root.mkdir(exist_ok=True, parents=True)

# get our model
model = MyVolumeModel()

# wrap it
wrapper = MyVolumeModelWrapper(model)

# serialize it
# an alternative is to use torch.jit.trace
serialized_model = torch.jit.script(wrapper)

# save!
save_model(serialized_model, metadata, root)
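After saving, it can be worth reloading the export and running it on a dummy input. The sketch below shows one way to do that. It assumes the export directory contains model.pt and metadata.json, matching the directory listing in the next section. The `load_exported` helper and the `Doubler` stand-in are hypothetical, and the files are written by hand here (using torch.jit.trace for brevity) only so the sketch runs without torchaudacity installed.

```python
import json
import tempfile
from pathlib import Path

import torch
import torch.nn as nn


def load_exported(root: Path):
    """Hypothetical helper: load a serialized model and its metadata.
    Assumes the files are named model.pt and metadata.json."""
    model = torch.jit.load(str(root / "model.pt"))
    with open(root / "metadata.json") as f:
        metadata = json.load(f)
    return model, metadata


class Doubler(nn.Module):
    """Stand-in for the wrapped volume model."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # stand-in for save_model(...): write the two files by hand
    traced = torch.jit.trace(Doubler(), torch.ones(1, 4))
    torch.jit.save(traced, str(root / "model.pt"))
    (root / "metadata.json").write_text(json.dumps({"sample_rate": 48000}))

    model, metadata = load_exported(root)
    # dummy mono input with shape (n_channels, n_samples)
    out = model(torch.ones(1, 4))

print(tuple(out.shape), metadata["sample_rate"])
```

With a real export, point `load_exported` at your model directory (e.g. booster-net) instead of the temporary one.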

Exporting to HuggingFace

You should now have a directory structure that looks like this:

/booster-net/
/booster-net/model.pt
/booster-net/metadata.json

This will be the repository for your Audacity model. Make sure to add a README with the audacity tag in the YAML metadata, so that it shows up on the explore tab of Audacity's Deep Learning Tools.

Create a README.md inside booster-net/, and add the following header:

in README.md

---
tags: audacity
---

Awesome! It's time to push to HuggingFace. See their documentation for adding a model to the HuggingFace model hub.

Example - Exporting a Pretrained Asteroid model

See this example notebook, where we serialize a pretrained ConvTasNet model for speech separation using the Asteroid source separation library.

