wav2vec_finetune

Test finetuning of XLSR (multilingual wav2vec 2.0) for other speech classification tasks

Initial test: gender recognition on the CaFE dataset (downloaded in the setup steps below).
Finetune for autism detection
[] Clean up directory
[] Make training and evaluation scripts runnable with cmd line / shell scripts
[] Add random noise on training samples
[] Make baseline models
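
One possible way to approach the "add random noise on training samples" item is to mix white Gaussian noise into each waveform at a chosen signal-to-noise ratio. The sketch below is not part of the repository: the function name `add_gaussian_noise` and the `snr_db` parameter are hypothetical, and it works on a plain Python list of float samples rather than on whatever tensor type `train.py` actually uses.

```python
import math
import random


def add_gaussian_noise(samples, snr_db=20.0, rng=None):
    """Return a copy of `samples` with white Gaussian noise mixed in.

    `snr_db` is the target signal-to-noise ratio in decibels (hypothetical
    parameter, not taken from the repository). Lower values mean more noise.
    """
    rng = rng or random.Random()
    if len(samples) == 0:
        return []
    # Average power of the clean signal.
    signal_power = sum(s * s for s in samples) / len(samples)
    if signal_power == 0.0:
        # Pure silence: there is no reference power to scale the noise against.
        return list(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, noise_std) for s in samples]


# Toy usage: one second of a 440 Hz sine at 16 kHz, noised at 20 dB SNR.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_gaussian_noise(clean, snr_db=20.0, rng=rng)
```

Drawing a different `snr_db` for each training sample (for example uniformly between 10 and 30 dB) would probably give more varied augmentation than a fixed value, but that choice is an assumption to validate on the data.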

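# create and activate a virtual environment (the name "venv" is arbitrary)
python -m venv venv
source venv/bin/activate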
pip install -r requirements.txt

mkdir data
mkdir preproc_data
mkdir model
cd data
wget -O CaFE_48k.zip "https://zenodo.org/record/1219621/files/CaFE_48k.zip?download=1"
unzip CaFE_48k.zip
cd ..

python preproc.py
python train.py
python evaluate.py

Updates

11/9: Success! Trained a sex classifier on a small dataset. Performance is so-so, but everything seems to work.

TODO

Chunk audio files - make predictions in batches of e.g. 5 seconds
Set up benchmark models
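
The chunking item above could be sketched roughly as follows: split each waveform into fixed-length windows, run the classifier on every window, then average the per-chunk class probabilities into one prediction per file. The function names, the 16 kHz sample rate, and the rule for dropping a very short final chunk are all assumptions made for illustration, not something the repository defines.

```python
def chunk_audio(samples, sample_rate=16000, chunk_seconds=5.0, min_last_seconds=1.0):
    """Split a 1-D sequence of samples into consecutive chunks.

    The final chunk is usually shorter than `chunk_seconds`. It is dropped
    when it is shorter than `min_last_seconds`, unless it is the only chunk.
    """
    chunk_len = int(sample_rate * chunk_seconds)
    min_len = int(sample_rate * min_last_seconds)
    chunks = [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]
    if len(chunks) > 1 and len(chunks[-1]) < min_len:
        chunks.pop()
    return chunks


def average_probabilities(chunk_probs):
    """Average per-chunk class probabilities into one file-level prediction."""
    n_chunks = len(chunk_probs)
    n_classes = len(chunk_probs[0])
    return [sum(p[c] for p in chunk_probs) / n_chunks for c in range(n_classes)]


# Toy usage: 12 s of silence at 16 kHz gives chunks of 5 s, 5 s and 2 s.
audio = [0.0] * (12 * 16000)
chunks = chunk_audio(audio)
lengths = [len(c) / 16000 for c in chunks]

# Hypothetical per-chunk outputs of a two-class model.
file_probs = average_probabilities([[0.8, 0.2], [0.6, 0.4]])
```

Averaging probabilities is only one aggregation choice. Majority voting over chunk-level labels, or weighting chunks by their duration, are alternatives that may be worth comparing.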

Resources:

https://github.com/pytorch/fairseq/blob/master/examples/xlmr/README.md
https://arxiv.org/abs/2006.13979
https://huggingface.co/transformers/training.html
https://huggingface.co/blog/fine-tune-xlsr-wav2vec2
https://discuss.huggingface.co/t/german-asr-fine-tuning-wav2vec2/4558/5
https://huggingface.co/docs/datasets/loading_datasets.html#from-local-files
https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/FINE_TUNE_XLSR_WAV2VEC2.md
https://github.com/m3hrdadfi/soxan
https://www.zhaw.ch/storage/engineering/institute-zentren/cai/BA21_Speech_Classification_Reiser_Fivian.pdf
https://github.com/DReiser7/w2v_did
https://github.com/ARBML/klaam
https://github.com/talhanai/speech-nlp-datasets

Notes:

Look into SpecAugment for finetuning: https://arxiv.org/abs/1904.08779 (already on by default: apply_spec_augment is True in the Hugging Face Wav2Vec2 config)
How to make the prediction?
- Better way than a small feedforward projection? (LSTM or something?)
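
For reference, the "small feedforward projection" baseline in the question above presumably amounts to pooling the frame-level encoder outputs over time and passing the pooled vector through a linear layer. The sketch below spells that out in plain Python on toy numbers. It is an illustration under that assumption, not the code in `train.py`, and a real implementation would use PyTorch tensors and learned weights.

```python
import math


def mean_pool(frames):
    """Average a list of frame vectors (time x dim) into a single vector."""
    n_frames = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n_frames for d in range(dim)]


def linear(x, weights, bias):
    """Compute weights @ x + bias, with `weights` given as one row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(frames, weights, bias):
    """Pool over time, project to class logits, and normalize."""
    return softmax(linear(mean_pool(frames), weights, bias))


# Toy usage: 2 frames of 2-dimensional features and an identity projection.
frames = [[1.0, 0.0], [3.0, 2.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
probs = classify(frames, weights, bias)
```

Mean pooling discards the order of the frames. An LSTM or an attention-based pooling layer, as the note suggests, would replace `mean_pool` with something that can weight or sequence the frames, at the cost of extra parameters to train on a small dataset.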
