VoiceFixer VoiceFixer is a framework for general speech restoration.

Overview

Open In Colab PyPI version

VoiceFixer

VoiceFixer is a framework for general speech restoration. We aim at the restoration of severly degraded speech and historical speech.

46dPxJ.png

Paper

⚠️ We submit this paper to ICLR2022. Preprint on arxiv will be available before Oct.03 2021!

Usage

⚠️ Still working on it, stay tuned! Expect to be available before 2021.09.30.

Environment

# Download dataset and prepare running environment
source init.sh 

Train from scratch

Let's take VF_UNet(voicefixer with unet as analysis module) as an example. Other model have the similar training and evaluation logic.

cd general_speech_restoration/voicefixer/unet
source run.sh

After that, you will get a log directory that look like this

├── unet
│   └── log
│       └── 2021-09-27-xxx
│           └── version_0
│               └── checkpoints
                    └──epoch=1.ckpt
│               └── code

Evaluation

Automatic evaluation and generate .csv file for the results.

cd general_speech_restoration/voicefixer/unet
# Basic usage
python3 handler.py  -c <str, path-to-checkpoint> \
                    -t <str, testset> \ 
                    -l <int, limit-utterance-number> \ 
                    -d <str, description of this evaluation> \ 

For example, if you like to evaluate on all testset. And each testset you intend to limit the number to 10 utterance.

python3 handler.py  -c  log/2021-09-27-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  base \ 
                    -l  10 \ 
                    -d  ten_utterance_for_each_testset \ 

There are generally seven testsets:

  • base: all testset
  • clip: testset with speech that have clipping threshold of 0.1, 0.25, and 0.5
  • reverb: testset with reverberate speech
  • general_speech_restoration: testset with speech that contain all kinds of random distortions
  • enhancement: testset with noisy speech
  • speech_super_resolution: testset with low resolution speech that have sampling rate of 2kHz, 4kHz, 8kHz, 16kHz, and 24kHz.

Demo

Demo page

Demo page contains comparison between single task speech restoration, general speech restoration, and voicefixer.

Pip package

We wrote a pip package for voicefixer.

Colab

You can try voicefixer using your own voice on colab!

real-life-example real-life-example real-life-example

Project Structure

.
├── dataloaders 
│   ├── augmentation # code for speech data augmentation.
│   └── dataloader # code for different kinds of dataloaders.
├── datasets 
│   ├── datasetParser # code for preparing each dataset
│   └── se # Dataset for speech enhancement (source init.sh)
│       ├── RIR_44k # Room Impulse Response 44.1kHz
│       │   ├── test
│       │   └── train
│       ├── TestSets # Evaluation datasets
│       │   ├── ALL_GSR # General speech restoration testset
│       │   │   ├── simulated
│       │   │   └── target
│       │   ├── DECLI # Speech declipping testset
│       │   │   ├── 0.1 # Different clipping threshold
│       │   │   ├── 0.25
│       │   │   ├── 0.5
│       │   │   └── GroundTruth
│       │   ├── DENOISE # Speech enhancement testset
│       │   │   └── vd_test
│       │   │       ├── clean_testset_wav
│       │   │       └── noisy_testset_wav
│       │   ├── DEREV # Speech dereverberation testset
│       │   │   ├── GroundTruth
│       │   │   └── Reverb_Speech
│       │   └── SR # Speech super resolution testset
│       │       ├── GroundTruth
│       │       └── cheby1
│       │           ├── 1000 # Different cutoff frequencies
│       │           ├── 12000
│       │           ├── 2000
│       │           ├── 4000
│       │           └── 8000
│       ├── vd_noise # Noise training dataset
│       └── wav48 # Speech training dataset
│           ├── test # Not used, included for completeness
│           └── train 
├── evaluation # The code for model evaluation
├── exp_results # The Folder that store evaluation result (in handler.py).
├── general_speech_restoration # GSR 
│   ├── unet # GSR_UNet
│   │   └── model_kqq_lstm_mask_gan
│   └── voicefixer # Each folder contains the training entry for each model.
│       ├── dnn # VF_DNN
│       ├── lstm # VF_LSTM
│       ├── unet # VF_UNet
│       └── unet_small # VF_UNet_S
├── resources 
├── single_task_speech_restoration # SSR
│   ├── declip_unet # Declip_UNet
│   ├── derev_unet # Derev_UNet
│   ├── enh_unet # Enh_UNet
│   └── sr_unet # SR_UNet
├── tools
└── callbacks

Citation

⚠️ Will be available once the paper is ready.

A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

Reddit text to speech generator A basic reddit tts video generator Current functionality Generate videos for subs based on comments,(askreddit) so rea

Aadvik 17 Dec 19, 2022
Chinese Pre-Trained Language Models (CPM-LM) Version-I

CPM-Generate 为了促进中文自然语言处理研究的发展,本项目提供了 CPM-LM (2.6B) 模型的文本生成代码,可用于文本生成的本地测试,并以此为基础进一步研究零次学习/少次学习等场景。[项目首页] [模型下载] [技术报告] 若您想使用CPM-1进行推理,我们建议使用高效推理工具BMI

Tsinghua AI 1.4k Jan 03, 2023
Graph Coloring - Weighted Vertex Coloring Problem

Graph Coloring - Weighted Vertex Coloring Problem This project proposes several local searches and an MCTS algorithm for the weighted vertex coloring

Cyril 1 Jul 08, 2022
PyTorch Implementation of the paper Single Image Texture Translation for Data Augmentation

SITT The repo contains official PyTorch Implementation of the paper Single Image Texture Translation for Data Augmentation. Authors: Boyi Li Yin Cui T

Boyi Li 52 Jan 05, 2023
A list of NLP(Natural Language Processing) tutorials

NLP Tutorial A list of NLP(Natural Language Processing) tutorials built on PyTorch. Table of Contents A step-by-step tutorial on how to implement and

Allen Lee 1.3k Dec 25, 2022
PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop

molten A minimal, extensible, fast and productive API framework for Python 3. Changelog: https://moltenframework.com/changelog.html Community: https:/

3.2k Dec 28, 2022
A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk.

Simple-Vosk A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk. Check out the official Vosk G

2 Jun 19, 2022
Open-World Entity Segmentation

Open-World Entity Segmentation Project Website Lu Qi*, Jason Kuen*, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia This projec

DV Lab 408 Dec 29, 2022
Vad-sli-asr - A Python scripts for a speech processing pipeline with Voice Activity Detection (VAD)

VAD-SLI-ASR Python scripts for a speech processing pipeline with Voice Activity

Dynamics of Language 14 Dec 09, 2022
Outreachy TFX custom component project

Schema Curation Custom Component Outreachy TFX custom component project This repo contains the code for Schema Curation Custom Component made as a par

Robert Crowe 5 Jul 16, 2021
Main repository for the chatbot Bobotinho.

Bobotinho Bot Main repository for the chatbot Bobotinho. ℹ️ Introduction Twitch chatbot with entertainment commands. ‎ 💻 Technologies Concurrent code

Bobotinho 14 Nov 29, 2022
IndoBERTweet is the first large-scale pretrained model for Indonesian Twitter. Published at EMNLP 2021 (main conference)

IndoBERTweet 🐦 🇮🇩 1. Paper Fajri Koto, Jey Han Lau, and Timothy Baldwin. IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effe

IndoLEM 40 Nov 30, 2022
This is a MD5 password/passphrase brute force tool

CROWES-PASS-CRACK-TOOl This is a MD5 password/passphrase brute force tool How to install: Do 'git clone https://github.com/CROW31/CROWES-PASS-CRACK-TO

9 Mar 02, 2022
Built for cleaning purposes in military institutions

Ferramenta do AL Construído para fins de limpeza em instituições militares. Instalação Requer python = 3.2 pip install -r requirements.txt Usagem Exe

0 Aug 13, 2022
Gold standard corpus annotated with verb-preverb connections for Hungarian.

Hungarian Preverb Corpus A gold standard corpus manually annotated with verb-preverb connections for Hungarian. corpus The corpus consist of the follo

RIL Lexical Knowledge Representation Research Group 3 Jan 27, 2022
Spooky Skelly For Python

_____ _ _____ _ _ _ | __| ___ ___ ___ | |_ _ _ | __|| |_ ___ | || | _ _ |__ || . || . || . || '

Kur0R1uka 1 Dec 23, 2021
Rank-One Model Editing for Locating and Editing Factual Knowledge in GPT

Rank-One Model Editing (ROME) This repository provides an implementation of Rank-One Model Editing (ROME) on auto-regressive transformers (GPU-only).

Kevin Meng 130 Dec 21, 2022
🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained mo

Hugging Face 77.2k Jan 03, 2023
Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement

MTFAA-Net Unofficial PyTorch implementation of Baidu's MTFAA-Net: "Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speec

Shimin Zhang 87 Dec 19, 2022
Japanese NLP Library

Japanese NLP Library Back to Home Contents 1 Requirements 1.1 Links 1.2 Install 1.3 History 2 Libraries and Modules 2.1 Tokenize jTokenize.py 2.2 Cabo

Pulkit Kathuria 144 Dec 27, 2022