Transformers Wav2Vec2 + Parlance's CTCDecode

Last update: Jul 21, 2022

Overview

🤗 Transformers Wav2Vec2 + Parlance's CTCDecode

Introduction

This repo shows how 🤗 Transformers can be used in combination with Parlance's ctcdecode and a KenLM ngram as a simple way to improve (i.e. lower) the word error rate (WER).

Included is a file to create an ngram with KenLM as well as a simple evaluation script to compare the results of using Wav2Vec2 with ctcdecode + KenLM vs. without using any language model.

Note: The scripts are written to be used on GPU. If you want to use a CPU instead, simply remove all .to("cuda") occurrences in eval.py.
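Instead of deleting the .to("cuda") calls by hand, the device could also be chosen at runtime. The following is only a minimal sketch, assuming PyTorch is the framework in use; pick_device is a hypothetical helper and is not part of eval.py:

```python
def pick_device() -> str:
    """Return "cuda" when a GPU is usable, otherwise "cpu"."""
    try:
        import torch  # assumed to be installed (see Installation)
    except ImportError:
        # Without PyTorch there is nothing to move to a GPU anyway.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

With such a helper, a call like x.to("cuda") would become x.to(device), so the same script runs on both kinds of machines.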

Installation

As a first step, one should install KenLM. For Ubuntu, it should be enough to follow the installation steps described here. The installed kenlm folder should be moved into this repo for ./create_ngram.py to function correctly. Alternatively, one can link the lmplz binary to an lmplz shell command, so that lmplz can be run directly instead of ./kenlm/build/bin/lmplz.

Next, some Python dependencies should be installed. Assuming PyTorch is installed, it should be sufficient to run pip install -r requirements.txt.

Run evaluation

Create ngram

As a first step, one should create an ngram. E.g., for Polish the command would be:

./create_ngram.py --language polish --path_to_ngram polish.arpa
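For orientation, building an ngram with KenLM boils down to piping a plain-text corpus through lmplz. The sketch below is an assumption about what such a step could look like, not a copy of create_ngram.py: the helper names and the file paths in the usage note are hypothetical, and the order of 5 is taken from the 5-gram counts shown in the ARPA header further down.

```python
import subprocess
from typing import List


def build_lmplz_command(order: int = 5, binary: str = "./kenlm/build/bin/lmplz") -> List[str]:
    """Build the lmplz argument list; -o sets the n-gram order."""
    return [binary, "-o", str(order)]


def train_ngram(text_path: str, arpa_path: str, order: int = 5) -> None:
    """Feed a plain-text corpus to lmplz on stdin and write its ARPA output to a file."""
    with open(text_path, "rb") as corpus, open(arpa_path, "wb") as arpa:
        subprocess.run(build_lmplz_command(order), stdin=corpus, stdout=arpa, check=True)
```

A call such as train_ngram("corpus.txt", "polish.arpa") would then produce the ARPA file, provided the lmplz binary exists at the given path.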

After the language model is created, one should open the file. The file should have a structure which looks more or less as follows:

\data\
ngram 1=86586
ngram 2=546387
ngram 3=796581
ngram 4=843999
ngram 5=850874

\1-grams:
-5.7532206      <unk>   0
0       <s>     -0.06677356
-3.4645514      drugi   -0.2088903
...

Now it is very important to also add a </s> token to the n-gram so that it can be loaded correctly. You can simply copy the line:

0       <s>     -0.06677356

and change <s> to </s>. When doing this you should also increase the ngram 1 count by 1. The new ngram should look as follows:

\data\
ngram 1=86587
ngram 2=546387
ngram 3=796581
ngram 4=843999
ngram 5=850874

\1-grams:
-5.7532206      <unk>   0
0       <s>     -0.06677356
0       </s>    -0.06677356
-3.4645514      drugi   -0.2088903
...

Now the ngram can be correctly used with ctcdecode.
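The manual edit described above can also be scripted. The following is a minimal sketch under the assumption that the file follows the usual ARPA layout (a \data\ header followed by a \1-grams: section); add_eos_token is a hypothetical helper, not something shipped with this repo:

```python
def add_eos_token(arpa_text: str) -> str:
    """Copy the <s> unigram line as </s> and increase the `ngram 1` count by one."""
    out = []
    in_unigrams = False
    added = False
    for line in arpa_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("\\"):
            # Section markers such as \data\, \1-grams:, \2-grams:, \end\
            in_unigrams = stripped == "\\1-grams:"
        fields = stripped.split()
        if in_unigrams and len(fields) >= 2 and fields[1] == "</s>":
            return arpa_text  # token already present, leave the file untouched
        if not in_unigrams and stripped.startswith("ngram 1="):
            count = int(stripped.split("=")[1])
            out.append(f"ngram 1={count + 1}")
            continue
        out.append(line)
        if in_unigrams and len(fields) >= 2 and fields[1] == "<s>":
            out.append(line.replace("<s>", "</s>", 1))
            added = True
    if not added:
        raise ValueError("no <s> unigram found in the \\1-grams: section")
    return "\n".join(out) + "\n"
```

To apply it, read polish.arpa into a string, pass it through add_eos_token, and write the result back. Only the unigram section is modified; higher-order entries that contain <s> stay as they are.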

Run eval

Having created the ngram, one can run:

./eval.py --language polish --path_to_ngram polish.arpa

This compares Wav2Vec2 + LM vs. Wav2Vec2 + No LM on Polish.

Results

==================================================polish==================================================
polish - No LM - | WER: 0.3069742867206763 | CER: 0.06054530156286364 | Time: 32.37423086166382
polish - With LM - | WER: 0.39526828695550076 | CER: 0.17596985266474516 | Time: 62.017329692840576
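For reading these numbers: WER and CER are edit-distance rates over words and characters respectively, so lower is better. The sketch below shows the textbook definition for a single reference/hypothesis pair. Whether eval.py computes the metrics exactly this way (or through a metrics library, or aggregated over the whole test set) is an assumption.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, wer("to jest drugi test", "to jest drugi") counts one deleted word out of four reference words.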

I didn't obtain any good results even when trying out a variety of different settings for alpha and beta. Sadly there aren't many examples, tutorials or docs on parlance/ctcdecode so it's hard to find the reason for the problem.

I also tried it out for other languages such as Portuguese and Spanish, but had no luck there either.
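As background on the alpha and beta settings mentioned above: in CTC beam search with an external language model, alpha conventionally weights the LM log-probability and beta adds a bonus per word. The sketch below illustrates that scoring rule on made-up numbers. It is not ctcdecode's implementation, and the candidate texts and scores are invented purely for illustration.

```python
from typing import List, Tuple

# (text, acoustic log-prob, LM log-prob); all values are made up
Candidate = Tuple[str, float, float]


def fused_score(cand: Candidate, alpha: float, beta: float) -> float:
    """Acoustic score plus alpha-weighted LM score plus beta per word."""
    text, acoustic_logp, lm_logp = cand
    return acoustic_logp + alpha * lm_logp + beta * len(text.split())


def best(cands: List[Candidate], alpha: float, beta: float) -> str:
    """Return the text of the highest-scoring candidate."""
    return max(cands, key=lambda c: fused_score(c, alpha, beta))[0]


candidates = [
    ("to jest drugy", -1.0, -9.0),  # best acoustically, unlikely under the LM
    ("to jest drugi", -1.5, -4.0),  # slightly worse acoustically, likelier under the LM
]
print(best(candidates, alpha=0.0, beta=0.0))  # LM ignored
print(best(candidates, alpha=0.5, beta=0.0))  # LM weighted in
```

With alpha set to 0 the acoustic score alone decides; raising alpha lets the LM override it. An alpha that is too large, or an LM that does not match the model's vocabulary, can therefore make results worse rather than better.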


Owner

Patrick von Platen
