PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Overview

samplernn-pytorch

A PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model.

A visual representation of the SampleRNN architecture

It's based on the reference implementation in Theano: https://github.com/soroushmehr/sampleRNN_ICLR2017. Unlike the Theano version, our code allows training models with arbitrary number of tiers, whereas the original implementation allows maximum 3 tiers. However it doesn't allow using LSTM units (only GRU). For more details and motivation behind rewriting this model to PyTorch, see our blog post: http://deepsound.io/samplernn_pytorch.html.

Dependencies

This code requires Python 3.5+ and PyTorch 0.1.12+. Installation instructions for PyTorch are available on their website: http://pytorch.org/. You can install the rest of the dependencies by running pip install -r requirements.txt.

Datasets

We provide a script for creating datasets from YouTube single-video mixes. It downloads a mix, converts it to wav and splits it into equal-length chunks. To run it you need youtube-dl (a recent version; the latest version from pip should be okay) and ffmpeg. To create an example dataset - 4 hours of piano music split into 8 second chunks, run:

cd datasets
./download-from-youtube.sh "https://www.youtube.com/watch?v=EhO_MrRfftU" 8 piano

You can also prepare a dataset yourself. It should be a directory in datasets/ filled with equal-length wav files. Or you can create your own dataset format by subclassing torch.utils.data.Dataset. It's easy, take a look at dataset.FolderDataset in this repo for an example.

Training

To train the model you need to run train.py. All model hyperparameters are settable in the command line. Most hyperparameters have sensible default values, so you don't need to provide all of them. Run python train.py -h for details. To train on the piano dataset using the best hyperparameters we've found, run:

python train.py --exp TEST --frame_sizes 16 4 --n_rnn 2 --dataset piano

The results - training log, loss plots, model checkpoints and generated samples will be saved in results/.

We also have an option to monitor the metrics using CometML. To use it, just pass your API key as --comet_key parameter to train.py.

Owner
DeepSound
DeepSound is a project of creating a system capable of generating music using deep learning methods, led by two students pursuing their Master's degree.
DeepSound
Hierarchical Memory Matching Network for Video Object Segmentation (ICCV 2021)

Hierarchical Memory Matching Network for Video Object Segmentation Hongje Seong, Seoung Wug Oh, Joon-Young Lee, Seongwon Lee, Suhyeon Lee, Euntai Kim

Hongje Seong 72 Dec 14, 2022
Code for Multimodal Neural SLAM for Interactive Instruction Following

Code for Multimodal Neural SLAM for Interactive Instruction Following Code structure The code is adapted from E.T. and most training as well as data p

7 Dec 07, 2022
Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Non-Rigid Neural Radiance Fields This is the official repository for the project "Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synt

Facebook Research 296 Dec 29, 2022
Official PyTorch implementation of BlobGAN: Spatially Disentangled Scene Representations

BlobGAN: Spatially Disentangled Scene Representations Official PyTorch Implementation Paper | Project Page | Video | Interactive Demo BlobGAN.mp4 This

148 Dec 29, 2022
Multiwavelets-based operator model

Multiwavelet model for Operator maps Gaurav Gupta, Xiongye Xiao, and Paul Bogdan Multiwavelet-based Operator Learning for Differential Equations In Ne

Gaurav 33 Dec 04, 2022
An efficient and easy-to-use deep learning model compression framework

TinyNeuralNetwork 简体中文 TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework, which contains features like neura

Alibaba 441 Dec 25, 2022
Video lie detector using xgboost - A video lie detector using OpenFace and xgboost

video_lie_detector_using_xgboost a video lie detector using OpenFace and xgboost

2 Jan 11, 2022
an implementation of 3D Ken Burns Effect from a Single Image using PyTorch

3d-ken-burns This is a reference implementation of 3D Ken Burns Effect from a Single Image [1] using PyTorch. Given a single input image, it animates

Simon Niklaus 1.4k Dec 28, 2022
Orbivator AI - To Determine which features of data (measurements) are most important for diagnosing breast cancer and find out if breast cancer occurs or not.

Orbivator_AI Breast Cancer Wisconsin (Diagnostic) GOAL To Determine which features of data (measurements) are most important for diagnosing breast can

anurag kumar singh 1 Jan 02, 2022
[CVPR2021 Oral] End-to-End Video Instance Segmentation with Transformers

VisTR: End-to-End Video Instance Segmentation with Transformers This is the official implementation of the VisTR paper: Installation We provide instru

Yuqing Wang 687 Jan 07, 2023
Web-interface + rest API for classification and regression (https://jeff1evesque.github.io/machine-learning.docs)

Machine Learning This project provides a web-interface, as well as a programmatic-api for various machine learning algorithms. Supported algorithms: S

Jeff Levesque 252 Dec 11, 2022
This project generates news headlines using a Long Short-Term Memory (LSTM) neural network.

News Headlines Generator bunnysaini/Generate-Headlines Goal This project aims to generate news headlines using a Long Short-Term Memory (LSTM) neural

Bunny Saini 1 Jan 24, 2022
Patch Rotation: A Self-Supervised Auxiliary Task for Robustness and Accuracy of Supervised Models

Patch-Rotation(PatchRot) Patch Rotation: A Self-Supervised Auxiliary Task for Robustness and Accuracy of Supervised Models Submitted to Neurips2021 To

4 Jul 12, 2021
Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021)

Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021)

Jiaxi Jiang 282 Jan 02, 2023
Automates Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning :rocket:

MLJAR Automated Machine Learning Documentation: https://supervised.mljar.com/ Source Code: https://github.com/mljar/mljar-supervised Table of Contents

MLJAR 2.4k Dec 31, 2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers Authors: Jaemin Cho, Abhay Zala, and Mohit Bansal (

Jaemin Cho 98 Dec 15, 2022
Unoffical reMarkable AddOn for Firefox.

reMarkable for Firefox (Download) This repo converts the offical reMarkable Chrome Extension into a Firefox AddOn published here under the name "Unoff

Jelle Schutter 45 Nov 28, 2022
The repository for freeCodeCamp's YouTube course, Algorithmic Trading in Python

Algorithmic Trading in Python This repository Course Outline Section 1: Algorithmic Trading Fundamentals What is Algorithmic Trading? The Differences

Nick McCullum 1.8k Jan 02, 2023
Code for paper "Context-self contrastive pretraining for crop type semantic segmentation"

Code for paper "Context-self contrastive pretraining for crop type semantic segmentation" Setting up a python environment Follow the instruction in ht

Michael Tarasiou 11 Oct 09, 2022
PyTorch Implementation of ECCV 2020 Spotlight TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images

TuiGAN-PyTorch Official PyTorch Implementation of "TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images" (ECCV 2020 Spotligh

181 Dec 09, 2022