A simple project that separates a mixture of two clean voices into the two individual voices.

Last update: Oct 30, 2022

Related tags

Text Data & NLP speech_separation_PIT

Overview

Speech Separation


Result example (click to hear the voices): mix || predicted voice 1 || predicted voice 2

Figures: mixture spectrogram | predicted spectrogram of voice 1 | predicted spectrogram of voice 2

1. Quick train

Step 1:

Download LibriMixSmall, extract it and move it to the root of the project.

Step 2:

./train.sh

Training takes only about 2-3 hours on a typical GPU. After each epoch, the predictions are written to the ./viz_outout folder.

2. Quick inference

./inference.sh

The results are written to the ./viz_outout folder.

3. More detail

Input: the complex spectrogram, computed from the raw mixed audio signal with a short-time Fourier transform (STFT).
Output: complex ratio masks (cRM) ---> complex spectrograms ---> separated voices.
Model: a simplified version of this implementation of the model defined in the paper Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation.
Loss function: Permutation Invariant Training (PIT) loss and pairwise negative SI-SDR (scale-invariant signal-to-distortion ratio) loss, which is closer to the current state of the art.
Dataset: a small version of the LibriMix dataset, obtained from LibriMixSmall.
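The Input/Output pipeline above hinges on one operation: multiplying the mixture's complex spectrogram by a complex mask. Below is a minimal NumPy sketch, not the project's code. It assumes a naive Hann-window STFT (the hypothetical `stft` helper, with n_fft=512 and hop=128) and uses the *ideal* cRM computed from a known clean voice; a trained model would instead predict the real and imaginary mask channels. Converting the masked spectrogram back to audio with an inverse STFT is omitted.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hypothetical minimal STFT: Hann-windowed frames -> one-sided FFT, shape (freq, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def ideal_crm(target, mixture, eps=1e-8):
    """Ideal complex ratio mask M = S / Y, computed as S * conj(Y) / |Y|^2."""
    return target * np.conj(mixture) / (np.abs(mixture) ** 2 + eps)


def apply_crm(mask_real, mask_imag, mix_real, mix_imag):
    """Complex product M * Y written out on separate real/imaginary channels,
    the form a network emitting two real-valued mask channels would use."""
    est_real = mask_real * mix_real - mask_imag * mix_imag
    est_imag = mask_real * mix_imag + mask_imag * mix_real
    return est_real, est_imag


# Usage: two synthetic "voices" (white-noise stand-ins) and their mixture.
rng = np.random.default_rng(0)
voice1 = rng.standard_normal(4000)
voice2 = rng.standard_normal(4000)
mix_spec = stft(voice1 + voice2)
voice1_spec = stft(voice1)

# Mask for voice 1, then recover its spectrogram from the mixture.
mask = ideal_crm(voice1_spec, mix_spec)
est_real, est_imag = apply_crm(mask.real, mask.imag, mix_spec.real, mix_spec.imag)
est_voice1_spec = est_real + 1j * est_imag
```

Because the mask is complex, it can correct the phase of each time-frequency bin as well as its magnitude, which a real-valued magnitude mask cannot.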
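The PIT loss can be sketched as follows. The model's two outputs have no fixed order, so the loss is evaluated for every assignment of outputs to reference voices and the cheapest assignment is kept. This is a hedged NumPy sketch with hypothetical helper names (`neg_si_sdr`, `pit_loss`). The "PairWise Neg SisDr" naming suggests the project uses a library implementation such as Asteroid's `pairwise_neg_sisdr` with `PITLossWrapper`, but that is an assumption.

```python
import itertools

import numpy as np


def neg_si_sdr(est, ref, eps=1e-8):
    """Negative scale-invariant SDR (in dB) between one estimate and one reference."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to get the scaled target component.
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref
    noise = est - target
    si_sdr = 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))
    return -si_sdr


def pit_loss(ests, refs):
    """Permutation invariant training: score every estimate/reference pairing
    and keep the permutation with the lowest mean loss."""
    n = len(refs)
    # pairwise[i, j] = loss of estimate i scored against reference j
    pairwise = np.array([[neg_si_sdr(ests[i], refs[j]) for j in range(n)] for i in range(n)])
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        loss = np.mean([pairwise[i, perm[i]] for i in range(n)])
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Usage: the model's outputs come out in swapped order.
rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal(8000), rng.standard_normal(8000)
ests = [s2 + 0.01 * rng.standard_normal(8000), s1 + 0.01 * rng.standard_normal(8000)]
loss, perm = pit_loss(ests, [s1, s2])
```

Brute-force search over permutations is cheap for two speakers (only two permutations); with many sources, the Hungarian algorithm is the usual replacement.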

4. Current problem

Because the dataset was kept small for fast training, the model overfits the training set somewhat. A bigger dataset would likely help. Some suggestions:

Use the original LibriMix dataset, which is much bigger (around 60 times the size of the set I trained on).
Use this work to download much more in-the-wild data, and use datasets/VoiceMixtureDataset.py instead of the LibriMix dataset I am using. P.S. I have trained with it, and it works too.
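If you build your own two-voice mixtures from downloaded clips, the core step is summing two clean utterances. The sketch below is hypothetical and is not the logic of datasets/VoiceMixtureDataset.py (which is not shown here): the helper `make_two_speaker_mixture` truncates both clips to the shorter length, scales the second voice to a chosen level relative to the first, and rescales everything together if the sum would clip.

```python
import numpy as np


def make_two_speaker_mixture(s1, s2, snr_db=0.0):
    """Hypothetical helper: mix two clean utterances at a given relative level (dB)."""
    n = min(len(s1), len(s2))  # truncate both to the shorter utterance
    s1, s2 = s1[:n], s2[:n]
    # Scale s2 so that 10 * log10(power(s1) / power(s2)) == snr_db.
    p1 = np.mean(s1 ** 2)
    p2 = np.mean(s2 ** 2)
    s2 = s2 * np.sqrt(p1 / (p2 * 10 ** (snr_db / 10)))
    mix = s1 + s2
    # Rescale mixture and sources together if the mixture would clip.
    peak = np.max(np.abs(mix))
    if peak > 1.0:
        s1, s2, mix = s1 / peak, s2 / peak, mix / peak
    return mix, s1, s2


# Usage: synthetic stand-ins for two clean clips of different lengths.
rng = np.random.default_rng(0)
clean1 = 0.1 * rng.standard_normal(16000)
clean2 = 0.3 * rng.standard_normal(12000)
mix, src1, src2 = make_two_speaker_mixture(clean1, clean2, snr_db=5.0)
```

Returning the scaled sources alongside the mixture keeps the training targets consistent with what is actually in the mixture.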

Owner

vuthede
