Official implementation of A cappella: Audio-visual Singing VoiceSeparation, from BMVC21

Overview

Y-Net

Official implementation of A cappella: Audio-visual Singing VoiceSeparation, British Machine Vision Conference 2021

Project page: ipcv.github.io/Acappella/
Paper: Arxiv, BMVC (not available yet)

Running a demo / Y-Net Inference

We provide simple functions to load models with pre-trained weights. Steps:

  1. Clone the repo or download y-net>VnBSS>models (models can run as a standalone package)
  2. Load a model:
from VnBSS import y_net_gr # or from models import y_net_gr 
model = y_net_gr(n=1)

Check a demo fully working:
Open In Colab

Citation

@inproceedings{acappella,
    author    = {Juan F. Montesinos and
                 Venkatesh S. Kadandale and
                 Gloria Haro},
    title     = {A cappella: Audio-visual Singing VoiceSeparation},
    booktitle = {British Machine Vision Conference (BMVC)},
    year      = {2021},

}

Repository under construction .
.
.
.
.
.
.
.

Training / Using DEV code

###Training The most difficult part is to prepare the dataset as everything is builded upon a very specific format.
To run training:
python run.py -m model_name --workname experiment_name --arxiv_path directory_of_experiments --pretrained_from path_pret_weights
You can inspect the argparse at default.py>argparse_default.
Possible model names are: y_net_g, y_net_gr, y_net_m,y_net_r,u_net,llcp

Testing

  1. Go to manuscript_scripts and replace checkpoint paths by yours in the testing scripts.
  2. Run: bash manuscript_scripts/test_gr_r.sh
  3. Replace the paths of manuscript_scripts/auto_metrics.py by your experiment_directory path.
  4. Run: python manuscript_scripts/auto_metrics.py to visualise results.

It's a complicated framework. HELP!

The best option to run the framework is to debug! Having a runable code helps to see input shapes, dataflow and to run line by line. Download The circle of life demo with the files already processed. It will act like a dataset of 6 samples. You can download it from Google Drive 1.1 Gb.

  1. Unzip the file
  2. run python run.py -m y_net_gr (for example)

Everything has been configured to run by default this way.

The model

Each effective model is wrapped by a nn.Module which takes care of computing the STFT, the mask, returning the waveform etcetera... This wrapper can be found at VnBSS>models>y_net.py>YNet. To get rid of this you can simply inherit the class, take minimum layers and keep the core_forward method, which is the inference step without the miscelanea.

FAQs

  1. How to change the optimizer's hyperparameters?
    Go to config>optimizer.json
  2. How to change clip duration, video framerate, STFT parameters or audio samplerate?
    Go to config>__init__.py
  3. How to change the batch size or the amount of epochs?
    Go to config>hyptrs.json
  4. How to dump predictions from the training and test set
    Go to default.py. Modify DUMP_FILES (can be controlled at a subset level). force argument skips the iteration-wise conditions and dumps for every single network prediction.
  5. Is tensorboard enabled?
    Yes, you will find tensorboard records at your_experiment_directory/used_workname/tensorboard
  6. Can I resume an experiment?
    Yes, if you set exactly the same experiment folder and workname, the system will detect it and will resume from there.
  7. I'm trying to resume but found AssertionError If there is an exception before running the model
  8. How to change the amount of layers of U-Net
    U-net is build dynamically given a list of layers per block as shown in models>__init__.py from outer to inner blocks.
  9. How to modify the default network values?
    The json file config>net_cfg.json overwrites any default configuration from the model.
Owner
Juan F. Montesinos
PhD student at Pompeu Fabra university Barcelona
Juan F. Montesinos
Scrap electronic music charts into CSV files

musiccharts A small python script to scrap (electronic) music charts into directories with csv files. Installation Download MusicCharts.exe Run MusicC

Dustin Scharf 1 May 11, 2022
Voice to Text using Raspberry Pi

This module will help to convert your voice (speech) into text using Speech Recognition Library. You can control the devices or you can perform the desired tasks by the word recognition

Raspberry_Pi Pakistan 2 Dec 15, 2021
Omniscient Mozart, being able to transcribe everything in the music, including vocal, drum, chord, beat, instruments, and more.

OMNIZART Omnizart is a Python library that aims for democratizing automatic music transcription. Given polyphonic music, it is able to transcribe pitc

MCTLab 1.3k Jan 08, 2023
An 8D music player made to enjoy Halloween this year!🤘

HAPPY HALLOWEEN buddy! Split Player Hello There! Welcome to SplitPlayer... Supposed To Be A 8DPlayer.... You Decide.... It can play the ordinary audio

Akshat Kumar Singh 1 Nov 04, 2021
Official implementation of A cappella: Audio-visual Singing VoiceSeparation, from BMVC21

Y-Net Official implementation of A cappella: Audio-visual Singing VoiceSeparation, British Machine Vision Conference 2021 Project page: ipcv.github.io

Juan F. Montesinos 12 Oct 22, 2022
Generating a structured library of .wav samples with Python.

sample-library Scripts for generating a structured sample library with Python Requires Docker about Samples are written to wave files in lib/. Differe

Ben Mangold 1 Nov 11, 2021
A voice based calculator by using termux api in Android

termux_voice_calculator This is. A voice based calculator by using termux api in Android Instagram account 👉 👈 Requirements and installation Downloa

ʕ´•ᴥ•`ʔ╠ŞĦỮβĦa̷m̷╣ʕ´•ᴥ•`ʔ 2 Apr 29, 2022
A Youtube audio player for your terminal

AudioLine A lightweight Youtube audio player for your terminal Explore the docs » View Demo · Report Bug · Request Feature · Send a Pull Request About

Haseeb Khalid 26 Jan 04, 2023
GiantMIDI-Piano is a classical piano MIDI dataset contains 10,854 MIDI files of 2,786 composers

GiantMIDI-Piano is a classical piano MIDI dataset contains 10,854 MIDI files of 2,786 composers

Bytedance Inc. 1.3k Jan 04, 2023
Just-Music - Spotify API Driven Music Web app, that allows to listen and control and share songs

Just Music... Just Music Is A Web APP That Allows Users To Play Song Using Spoti

Ayush Mishra 3 May 01, 2022
A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

Audiomentations A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio a

Iver Jordal 1.2k Jan 07, 2023
Some utils for auto speech recognition

About Some utils for auto speech recognition. Utils Util Description Script Reset audio Reset sample rate, sample width, etc of audios.

1 Jan 24, 2022
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

Summary Pyroomacoustics is a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the pack

Audiovisual Communications Laboratory 1k Jan 09, 2023
Algorithmic Multi-Instrumental MIDI Continuation Implementation

Matchmaker Algorithmic Multi-Instrumental MIDI Continuation Implementation Taming large-scale MIDI datasets with algorithms This is a WIP so please ch

Alex 2 Mar 11, 2022
Hide Your Secret Message in any Wave Audio File.

HiddenWave Embedding secret messages in wave audio file What is HiddenWave Hiddenwave is a python based program for simple audio steganography. You ca

TechChip 99 Dec 28, 2022
TwitterMusicBot - A Twitter bot with Spotify integration.

A Twitter Music Bot 🤖 🎵 🎶 I created this project to learn more about APIs, so it only works for student purposes. Initially, delving into the Spoti

Gustavo Oliveira 2 Jan 02, 2022
Codes for "Efficient Long-Range Attention Network for Image Super-resolution"

ELAN Codes for "Efficient Long-Range Attention Network for Image Super-resolution", arxiv link. Dependencies & Installation Please refer to the follow

xindong zhang 124 Dec 22, 2022
This is an OverPowered Vc Music Player! Will work for you and play music in Voice Chatz

VcPlayer This is an OverPowered Vc Music Player! Will work for you and play music in Voice Chatz Telegram Voice-Chat Bot [PyTGCalls] ⇝ Requirements ⇜

1 Dec 20, 2021
Simple, hackable offline speech to text - using the VOSK-API.

Nerd Dictation Offline Speech to Text for Desktop Linux. This is a utility that provides simple access speech to text for using in Linux without being

Campbell Barton 844 Jan 07, 2023
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

⚠️ Checkout develop branch to see what is coming in pyannote.audio 2.0: a much smaller and cleaner codebase Python-first API (the good old pyannote-au

pyannote 2.1k Dec 31, 2022