GSoC'2021 | TensorFlow implementation of Wav2Vec2

Overview

GSoC

This repository presents an implementation of the Wav2Vec2 model [1] in TensorFlow 2.0 as a part of Google Summer of Code.

For a quick demo, please check out this. Final report of the project can be found here.

Notebooks

The repository comes with shiny Colab Notebooks. Below you can find a list of them. Spin them up and don't forget to have fun!

Notebook Description
Open In Colab This notebook gives you a template to fine-tune a pre-trained Wav2Vec2 SavedModel
Open In Colab This notebook demonstrates conversion of TF Wav2Vec2 model to ONNX and compares the latency of ONNX exported model & TF model on CPU
Open In Colab This notebook demonstrates Wav2Vec2 evaluation (without any padding) on LibriSpeech data
Open In Colab This notebook demonstrates Wav2Vec2 SavedModel evaluation (with constant padding upto 246000 length) on LibriSpeech data
Open In Colab This notebook shows a small demo of how to use Wav2Vec2 for inference for ASR task

Checkpoints

Below is a summary of checkpoints obtained during the project:

ЁЯдЧ Hub Checkpoint TFHub SavedModel Description
gsoc-wav2vec2 wav2vec2 This checkpoint is TensorFlow's equivalent of pre-trained Wav2Vec2 by Facebook. PyTorch weights are converted into TensorFlow using convert_torch_to_tf.py
gsoc-wav2vec2-960h wav2vec2-960h This checkpoint is TensorFlow's equivalent of fine-tuned Wav2Vec2 by Facebook. PyTorch weights are converted into TensorFlow using convert_torch_to_tf.py
finetuned-wav2vec2-960h - This checkpoint is obtained by fine-tuning Wav2Vec2 model on 960h of LibriSpeech dataset during my GSoC tenure. You can reproduce training by running main.py on TPU v3-8

To know more about the process of obtaining the first two checkpoints, please check out this section and to know about the process of obtaining the last checkpoint, please check out this section.

Using this Repository

Wav2Vec2 model from this repository can be installed using the pip command:

# this will install the wav2vec2 package
pip3 install git+https://github.com/vasudevgupta7/[email protected]

You can use the fine-tuned checkpoints (from ЁЯдЧ Hub) like this:

from wav2vec2 import Wav2Vec2ForCTC, Wav2Vec2Config

config = Wav2Vec2Config()
model = Wav2Vec2ForCTC(config)
# now use this model like any other TF model

# incase you are interested in already trained model, use `.from_pretrained` method
model_id = "finetuned-wav2vec2-960h"
model = Wav2Vec2ForCTC.from_pretrained(model_id)

Additionally, you can use the SavedModel from TFHub like this:

import tensorflow_hub as hub

model_url = "https://tfhub.dev/vasudevgupta7/wav2vec2-960h/1"
model = hub.KerasLayer(model_url)

# use this `model`, just like any other TF SavedModel

Please checkout the notebooks referred to in this repository for more information on how to use the Wav2Vec2 model.

Reproducing this project

Setting Up

# install & setup TensorFlow first
pip3 install tensorflow

# install other requirements of this project using the following command:
pip3 install -qr requirements.txt
sudo apt-get install libsndfile1-dev

# switch to code directory for further steps
cd src

For using TPUs, it's important to store model weights and datasets in the GCS bucket so that TPU can access them directly from there. Hence we will create 2 GCS buckets - one for checkpointing and the other for storing LibriSpeech tfrecords.

# these bucket names will be required to run the training script later
export DATA_BUCKET_NAME="gsoc-librispeech-us"
export CKPT_BUCKET_NAME="gsoc-checkpoints-us"

# create GCS buckets
gsutil mb gs://${DATA_BUCKET_NAME}
gsutil mb gs://${CKPT_BUCKET_NAME}

Preparing dataset

Now we will download the LibriSpeech dataset from the official website & convert them into tfrecords using make_tfrecords.py. Finally, we will export all the tfrecords to the GCS bucket.

# possible values are `dev-clean`, `train-clean-100`, `train-clean-360`, `train-other-500`, `test-clean`
# you will have to follow same steps for all the configurations (specified above).
export DATA_SPLIT=dev-clean

wget https://www.openslr.org/resources/12/${DATA_SPLIT}.tar.gz
tar -xf ${DATA_SPLIT}.tar.gz

python3 make_tfrecords.py --data_dir LibriSpeech/${DATA_SPLIT} -d ${DATA_SPLIT} -n 50

# transfer tfrecords to GCS bucket
gsutil cp -r ${DATA_SPLIT} gs://<DATA_BUCKET_NAME>/${DATA_SPLIT}

Now your GCS bucket (DATA_BUCKET_NAME) should look like this:

.
|- ${DATA_SPLIT}
    |- ${DATA_SPLIT}-0.tfrecord
    |- ${DATA_SPLIT}-1.tfrecord
    .
    .

Follow the above steps for all other data splits. You just need to change the DATA_SPLIT environment variable.

Model training

Now since everything is installed and GCS buckets are configured, we just need to run one command to initiate training.

Note: Following commands assumes that you have exported DATA_BUCKET_NAME & CKPT_BUCKET_NAME environment variables already.

The following command will fine-tune the wav2vec2 model on single/multiple GPUs or Colab/Kaggle TPUs:

python3 main.py

For training on Cloud TPUs, run the following command:

# export `TPU_NAME` environment variable first
# this flag will ensure that your VM connects to the specified TPUs & TPUs become visible to TensorFlow
TPU_NAME=<tpu-name> python3 main.py

Running Conversion script

Original PyTorch checkpoints (from Facebook) can be converted using the conversion script available in this repository.

python3 convert_torch_to_tf.py \
--hf_model_id facebook/wav2vec2-base \ # HuggingFace Hub ID of the model you want to convert
--with_lm_head # Whether to use `Wav2Vec2ForCTC` or `Wav2Vec2Model` from this repository

Running tests

# first install `torch` & `transformers`
pip3 install torch transformers

# run this from the root of this repository
pytest -sv tests

Acknowledgement

References

[1] Baevski, Alexei, et al. тАЬWav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations.тАЭ ArXiv:2006.11477 [Cs, Eess], Oct. 2020. arXiv.org, http://arxiv.org/abs/2006.11477.

End Notes

Please create an issue in case you encountered any issues while using this project. Don't forget to ЁЯМЯ this repository if you liked my work.

Owner
Vasudev Gupta
Open Source @huggingface, @tensorflow | Interested in Speech & Text
Vasudev Gupta
BookNLP, a natural language processing pipeline for books

BookNLP BookNLP is a natural language processing pipeline that scales to books and other long documents (in English), including: Part-of-speech taggin

654 Jan 02, 2023
189 Jan 02, 2023
Samantha, A covid-19 information bot which will provide basic information about this pandemic in form of conversation.

Covid-19-BOT Samantha, A covid-19 information bot which will provide basic information about this pandemic in form of conversation. This bot uses torc

Neeraj Majhi 2 Nov 05, 2021
LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)

LV-BERT Introduction In this repo, we introduce LV-BERT by exploiting layer variety for BERT. For detailed description and experimental results, pleas

Weihao Yu 14 Aug 24, 2022
Semi-automated vocabulary generation from semantic vector models

vec2word Semi-automated vocabulary generation from semantic vector models This script generates a list of potential conlang word forms along with asso

9 Nov 25, 2022
A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion

List Of English Words A text file containing over 466k English words. While searching for a list of english words (for an auto-complete tutorial) I fo

dwyl 8.5k Jan 03, 2023
Open source annotation tool for machine learning practitioners.

doccano doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequ

7.1k Jan 01, 2023
Auto-researching tool generating word documents.

About ResearchTE automates researching by generating document with answers to given questions. Supports getting results from: Google DuckDuckGo (with

1 Feb 14, 2022
BPEmb is a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia.

BPEmb is a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia. Its intended use is as input for neural models in natural languag

Benjamin Heinzerling 1.1k Jan 03, 2023
Text-to-Speech for Belarusian language

title emoji colorFrom colorTo sdk app_file pinned Belarusian TTS ЁЯР╕ green green gradio app.py false Belarusian TTS ЁЯУв ЁЯдЦ Belarusian TTS (text-to-speec

Yurii Paniv 1 Nov 27, 2021
:mag: Transformers at scale for question answering & neural search. Using NLP via a modular Retriever-Reader-Pipeline. Supporting DPR, Elasticsearch, HuggingFace's Modelhub...

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want

deepset 6.4k Jan 09, 2023
рдорд░рд╛рдареА рднрд╛рд╖рд╛ рд╡рд╛рдЪрд╡рд┐рдгреНрдпрд╛рдЪрд╛ рдПрдХ рдкреНрд░рдпрд╛рд╕. рдЗрдВрдЧреНрд░рдЬреА рддреЗ рдорд░рд╛рдареАрдЪрд╛ рд╢рдмреНрджрдХреЛрд╢. An attempt to preserve the Marathi language. A lightweight and ad free English to Marathi thesaurus.

For English, scroll down рдорд░рд╛рдареА рд╢рдмреНрдж рдорд░рд╛рдареА рднрд╛рд╖рд╛ рд╡рд╛рдЪрд╡рдгреНрдпрд╛рд╕рд╛рдареА рдореА рд╣рд╛ рдУрдкрди рд╕реЛрд░реНрд╕ рдкреНрд░реЛрдЬреЗрдХреНрдЯ рд╕реБрд░реВ рдХреЗрд▓рд╛ рдЖрд╣реЗ. рдорд╛рдЭреНрдпрд╛ рдорддреЗ, рдЖрдкрд▓реА рднрд╛рд╖рд╛ рд╣рд│реВрд╣рд│реВ рдЖрдгрд┐ рдХреЛрдгрд╛рдЪрд╛рд╣реА рд▓рдХреНрд╖рд╛рдд

рдореБрдХреНрдд рд╕реНрддреНрд░реЛрдд 20 Oct 11, 2022
Snowball compiler and stemming algorithms

Snowball is a small string processing language for creating stemming algorithms for use in Information Retrieval, plus a collection of stemming algori

Snowball Stemming language and algorithms 613 Jan 07, 2023
A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. Flair is: A powerful NLP library. Flair allo

flair 12.3k Jan 02, 2023
A programming language with logic of Python, and syntax of all languages.

Pytov The idea was to take all well known syntaxes, and combine them into one programming language with many posabilities. Installation Install using

Yuval Rosen 14 Dec 07, 2022
SAINT PyTorch implementation

SAINT-pytorch A Simple pyTorch implementation of "Towards an Appropriate Query, Key, and Value Computation for Knowledge Tracing" based on https://arx

Arshad Shaikh 63 Dec 25, 2022
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application with a focus on embedded systems.

Welcome to Spokestack Python! This library is intended for developing voice interfaces in Python. This can include anything from Raspberry Pi applicat

Spokestack 133 Sep 20, 2022
DiY Oxygen Concentrator based on the OxiKit

M19O2 DiY Oxygen Concentrator based on / inspired by the OxiKit, OpenOx, Marut, RepRap and Project Apollo platforms. About Read about the project on H

Maker's Asylum 62 Dec 22, 2022
Pytorch NLP library based on FastAI

Quick NLP Quick NLP is a deep learning nlp library inspired by the fast.ai library It follows the same api as fastai and extends it allowing for quick

Agis pof 283 Nov 21, 2022
The ibet-Prime security token management system for ibet network.

ibet-Prime The ibet-Prime security token management system for ibet network. Features ibet-Prime is an API service that enables the issuance and manag

BOOSTRY 8 Dec 22, 2022