GSoC'2021 | TensorFlow implementation of Wav2Vec2

Overview

GSoC

This repository presents an implementation of the Wav2Vec2 model [1] in TensorFlow 2.0 as a part of Google Summer of Code.

For a quick demo, please check out this. Final report of the project can be found here.

Notebooks

The repository comes with shiny Colab Notebooks. Below you can find a list of them. Spin them up and don't forget to have fun!

Notebook Description
Open In Colab This notebook gives you a template to fine-tune a pre-trained Wav2Vec2 SavedModel
Open In Colab This notebook demonstrates conversion of TF Wav2Vec2 model to ONNX and compares the latency of ONNX exported model & TF model on CPU
Open In Colab This notebook demonstrates Wav2Vec2 evaluation (without any padding) on LibriSpeech data
Open In Colab This notebook demonstrates Wav2Vec2 SavedModel evaluation (with constant padding upto 246000 length) on LibriSpeech data
Open In Colab This notebook shows a small demo of how to use Wav2Vec2 for inference for ASR task

Checkpoints

Below is a summary of checkpoints obtained during the project:

🤗 Hub Checkpoint TFHub SavedModel Description
gsoc-wav2vec2 wav2vec2 This checkpoint is TensorFlow's equivalent of pre-trained Wav2Vec2 by Facebook. PyTorch weights are converted into TensorFlow using convert_torch_to_tf.py
gsoc-wav2vec2-960h wav2vec2-960h This checkpoint is TensorFlow's equivalent of fine-tuned Wav2Vec2 by Facebook. PyTorch weights are converted into TensorFlow using convert_torch_to_tf.py
finetuned-wav2vec2-960h - This checkpoint is obtained by fine-tuning Wav2Vec2 model on 960h of LibriSpeech dataset during my GSoC tenure. You can reproduce training by running main.py on TPU v3-8

To know more about the process of obtaining the first two checkpoints, please check out this section and to know about the process of obtaining the last checkpoint, please check out this section.

Using this Repository

Wav2Vec2 model from this repository can be installed using the pip command:

# this will install the wav2vec2 package
pip3 install git+https://github.com/vasudevgupta7/[email protected]

You can use the fine-tuned checkpoints (from 🤗 Hub) like this:

from wav2vec2 import Wav2Vec2ForCTC, Wav2Vec2Config

config = Wav2Vec2Config()
model = Wav2Vec2ForCTC(config)
# now use this model like any other TF model

# incase you are interested in already trained model, use `.from_pretrained` method
model_id = "finetuned-wav2vec2-960h"
model = Wav2Vec2ForCTC.from_pretrained(model_id)

Additionally, you can use the SavedModel from TFHub like this:

import tensorflow_hub as hub

model_url = "https://tfhub.dev/vasudevgupta7/wav2vec2-960h/1"
model = hub.KerasLayer(model_url)

# use this `model`, just like any other TF SavedModel

Please checkout the notebooks referred to in this repository for more information on how to use the Wav2Vec2 model.

Reproducing this project

Setting Up

# install & setup TensorFlow first
pip3 install tensorflow

# install other requirements of this project using the following command:
pip3 install -qr requirements.txt
sudo apt-get install libsndfile1-dev

# switch to code directory for further steps
cd src

For using TPUs, it's important to store model weights and datasets in the GCS bucket so that TPU can access them directly from there. Hence we will create 2 GCS buckets - one for checkpointing and the other for storing LibriSpeech tfrecords.

# these bucket names will be required to run the training script later
export DATA_BUCKET_NAME="gsoc-librispeech-us"
export CKPT_BUCKET_NAME="gsoc-checkpoints-us"

# create GCS buckets
gsutil mb gs://${DATA_BUCKET_NAME}
gsutil mb gs://${CKPT_BUCKET_NAME}

Preparing dataset

Now we will download the LibriSpeech dataset from the official website & convert them into tfrecords using make_tfrecords.py. Finally, we will export all the tfrecords to the GCS bucket.

# possible values are `dev-clean`, `train-clean-100`, `train-clean-360`, `train-other-500`, `test-clean`
# you will have to follow same steps for all the configurations (specified above).
export DATA_SPLIT=dev-clean

wget https://www.openslr.org/resources/12/${DATA_SPLIT}.tar.gz
tar -xf ${DATA_SPLIT}.tar.gz

python3 make_tfrecords.py --data_dir LibriSpeech/${DATA_SPLIT} -d ${DATA_SPLIT} -n 50

# transfer tfrecords to GCS bucket
gsutil cp -r ${DATA_SPLIT} gs://<DATA_BUCKET_NAME>/${DATA_SPLIT}

Now your GCS bucket (DATA_BUCKET_NAME) should look like this:

.
|- ${DATA_SPLIT}
    |- ${DATA_SPLIT}-0.tfrecord
    |- ${DATA_SPLIT}-1.tfrecord
    .
    .

Follow the above steps for all other data splits. You just need to change the DATA_SPLIT environment variable.

Model training

Now since everything is installed and GCS buckets are configured, we just need to run one command to initiate training.

Note: Following commands assumes that you have exported DATA_BUCKET_NAME & CKPT_BUCKET_NAME environment variables already.

The following command will fine-tune the wav2vec2 model on single/multiple GPUs or Colab/Kaggle TPUs:

python3 main.py

For training on Cloud TPUs, run the following command:

# export `TPU_NAME` environment variable first
# this flag will ensure that your VM connects to the specified TPUs & TPUs become visible to TensorFlow
TPU_NAME=<tpu-name> python3 main.py

Running Conversion script

Original PyTorch checkpoints (from Facebook) can be converted using the conversion script available in this repository.

python3 convert_torch_to_tf.py \
--hf_model_id facebook/wav2vec2-base \ # HuggingFace Hub ID of the model you want to convert
--with_lm_head # Whether to use `Wav2Vec2ForCTC` or `Wav2Vec2Model` from this repository

Running tests

# first install `torch` & `transformers`
pip3 install torch transformers

# run this from the root of this repository
pytest -sv tests

Acknowledgement

References

[1] Baevski, Alexei, et al. “Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations.” ArXiv:2006.11477 [Cs, Eess], Oct. 2020. arXiv.org, http://arxiv.org/abs/2006.11477.

End Notes

Please create an issue in case you encountered any issues while using this project. Don't forget to 🌟 this repository if you liked my work.

Owner
Vasudev Gupta
Open Source @huggingface, @tensorflow | Interested in Speech & Text
Vasudev Gupta
Behavioral Testing of Clinical NLP Models

Behavioral Testing of Clinical NLP Models This repository contains code for testing the behavior of clinical prediction models based on patient letter

Betty van Aken 2 Sep 20, 2022
Code for Discovering Topics in Long-tailed Corpora with Causal Intervention.

Code for Discovering Topics in Long-tailed Corpora with Causal Intervention ACL2021 Findings Usage 0. Prepare environment Requirements: python==3.6 te

Xiaobao Wu 8 Dec 16, 2022
Russian words synonyms and antonyms

ru_synonyms Russian words synonyms and antonyms. Install pip install git+https://github.com/ahmados/rusynonyms.git Usage from ru_synonyms import Anto

sumekenov 7 Dec 14, 2022
An implementation of WaveNet with fast generation

pytorch-wavenet This is an implementation of the WaveNet architecture, as described in the original paper. Features Automatic creation of a dataset (t

Vincent Herrmann 858 Dec 27, 2022
GrammarTagger — A Neural Multilingual Grammar Profiler for Language Learning

GrammarTagger — A Neural Multilingual Grammar Profiler for Language Learning GrammarTagger is an open-source toolkit for grammatical profiling for lan

Octanove Labs 27 Jan 05, 2023
A linter to manage all your python exceptions and try/except blocks (limited only for those who like dinosaurs).

Manage your exceptions in Python like a PRO Currently in BETA. Inspired by this blog post. I shared the building process of this tool here. “For those

Guilherme Latrova 353 Dec 31, 2022
The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models

Graformer The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models Graformer (also named BridgeTransformer in t

22 Dec 14, 2022
An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently.

An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently.

Khalid Saifullah 37 Sep 05, 2022
A crowdsourced dataset of dialogues grounded in social contexts involving utilization of commonsense.

A crowdsourced dataset of dialogues grounded in social contexts involving utilization of commonsense.

Alexa 62 Dec 20, 2022
Nystromformer: A Nystrom-based Algorithm for Approximating Self-Attention

Nystromformer: A Nystrom-based Algorithm for Approximating Self-Attention April 6, 2021 We extended segment-means to compute landmarks without requiri

Zhanpeng Zeng 322 Jan 01, 2023
ChatBotProyect - This is an unfinished project about a simple chatbot.

chatBotProyect This is an unfinished project about a simple chatbot. (union_todo.ipynb) Reminders for the project: Find why one of the vectorizers fai

Tomás 0 Jul 24, 2022
🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained mo

Hugging Face 77.2k Jan 03, 2023
Pre-Training with Whole Word Masking for Chinese BERT

Pre-Training with Whole Word Masking for Chinese BERT

Yiming Cui 7.7k Dec 31, 2022
a CTF web challenge about making screenshots

screenshotter (web) A CTF web challenge about making screenshots. It is inspired by a bug found in real life. The challenge was created by @LiveOverfl

219 Jan 02, 2023
Research code for "What to Pre-Train on? Efficient Intermediate Task Selection", EMNLP 2021

efficient-task-transfer This repository contains code for the experiments in our paper "What to Pre-Train on? Efficient Intermediate Task Selection".

AdapterHub 26 Dec 24, 2022
Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages

Coreferee Author: Richard Paul Hudson, Explosion AI 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 French 1.2.3 German 1.2

Explosion 70 Dec 12, 2022
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

The implementation of paper CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval. CLIP4Clip is a video-text retrieval model based

ArrowLuo 456 Jan 06, 2023
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

Natural Language Processing @UCLA 464 Jan 04, 2023
Machine learning models from Singapore's NLP research community

SG-NLP Machine learning models from Singapore's natural language processing (NLP) research community. sgnlp is a Python package that allows you to eas

AI Singapore | AI Makerspace 21 Dec 17, 2022
ChatterBot is a machine learning, conversational dialog engine for creating chat bots

ChatterBot ChatterBot is a machine-learning based conversational dialog engine build in Python which makes it possible to generate responses based on

Gunther Cox 12.8k Jan 03, 2023