A minimal Conformer ASR implementation adapted from ESPnet.

Last update: Jan 24, 2022

Conformer ASR

Introduction

I want to use the pre-trained English ASR model provided by ESPnet. However, ESPnet is a relatively heavy dependency for my needs, so here I extract only the Conformer ASR part from ESPnet to make customization easier. Let's do it.

There are a bunch of ASR models available, listed here. I chose the one named:

kamo-naoyuki/librispeech_asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_bpe5000_scheduler_confwarmup_steps40000_optim_conflr0.0025_sp_valid.acc.ave
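The model tag spells out several front-end and training settings. As a small sketch (the regular expressions and field names below are my own, not part of ESPnet or this repository), they can be pulled out of the tag and converted to window and hop durations, assuming the analysis window spans `n_fft` samples and the 16 kHz sample rate used in step 3:

```python
import re

# Model tag copied from the line above.
MODEL_TAG = (
    "kamo-naoyuki/librispeech_asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_bpe5000"
    "_scheduler_confwarmup_steps40000_optim_conflr0.0025_sp_valid.acc.ave"
)

# Field name -> (regex capturing the value, type to convert it to).
# These patterns are illustrative and only cover settings spelled out in the tag.
TAG_FIELDS = {
    "n_fft": (r"n_fft(\d+)", int),
    "hop_length": (r"hop_length(\d+)", int),
    "bpe_vocab_size": (r"bpe(\d+)", int),
    "warmup_steps": (r"warmup_steps(\d+)", int),
    "learning_rate": (r"conflr(\d+(?:\.\d+)?)", float),
}


def parse_tag(tag: str) -> dict:
    """Return the settings from TAG_FIELDS that appear in an ESPnet-style model tag."""
    settings = {}
    for name, (pattern, cast) in TAG_FIELDS.items():
        match = re.search(pattern, tag)
        if match:
            settings[name] = cast(match.group(1))
    return settings


def frame_timing_ms(n_fft: int, hop_length: int, sample_rate: int = 16000) -> tuple:
    """Convert an STFT window (assumed to be n_fft samples) and hop to milliseconds."""
    return 1000 * n_fft / sample_rate, 1000 * hop_length / sample_rate


settings = parse_tag(MODEL_TAG)
print(settings)
print(frame_timing_ms(settings["n_fft"], settings["hop_length"]))
```

At 16 kHz, 512 samples is a 32 ms window and 256 samples is a 16 ms hop, which matches the `win_length=512` and `hop_length=256` passed to `MelSpectrogram` in step 3.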

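Its performance is reported [here](https://zenodo.org/record/4604066#.YbxsX5FByV4) and reproduced below. In each table, Snt is the number of sentences and Wrd the number of reference units (words, characters, or BPE tokens); the other columns are percentages, and rows containing `lm_train_lm_transformer2` were decoded with a Transformer language model.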

Word error rate (WER)

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_asr_asr_model_valid.acc.ave/dev_clean | 2703 | 54402 | 97.9 | 1.9 | 0.2 | 0.2 | 2.3 | 28.6 |
| decode_asr_asr_model_valid.acc.ave/dev_other | 2864 | 50948 | 94.5 | 5.1 | 0.5 | 0.6 | 6.1 | 48.3 |
| decode_asr_asr_model_valid.acc.ave/test_clean | 2620 | 52576 | 97.7 | 2.1 | 0.2 | 0.3 | 2.6 | 31.4 |
| decode_asr_asr_model_valid.acc.ave/test_other | 2939 | 52343 | 94.7 | 4.9 | 0.5 | 0.7 | 6.0 | 49.0 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_clean | 2703 | 54402 | 98.3 | 1.5 | 0.2 | 0.2 | 1.9 | 25.2 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_other | 2864 | 50948 | 95.8 | 3.7 | 0.4 | 0.5 | 4.6 | 40.0 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_clean | 2620 | 52576 | 98.1 | 1.7 | 0.2 | 0.3 | 2.1 | 26.2 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_other | 2939 | 52343 | 95.8 | 3.7 | 0.5 | 0.5 | 4.7 | 42.4 |

Character error rate (CER)

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_asr_asr_model_valid.acc.ave/dev_clean | 2703 | 288456 | 99.4 | 0.3 | 0.2 | 0.2 | 0.8 | 28.6 |
| decode_asr_asr_model_valid.acc.ave/dev_other | 2864 | 265951 | 98.0 | 1.2 | 0.8 | 0.7 | 2.7 | 48.3 |
| decode_asr_asr_model_valid.acc.ave/test_clean | 2620 | 281530 | 99.4 | 0.3 | 0.3 | 0.3 | 0.9 | 31.4 |
| decode_asr_asr_model_valid.acc.ave/test_other | 2939 | 272758 | 98.2 | 1.0 | 0.7 | 0.7 | 2.5 | 49.0 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_clean | 2703 | 288456 | 99.5 | 0.3 | 0.2 | 0.2 | 0.7 | 25.2 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_other | 2864 | 265951 | 98.3 | 1.0 | 0.7 | 0.5 | 2.2 | 40.0 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_clean | 2620 | 281530 | 99.5 | 0.3 | 0.3 | 0.2 | 0.7 | 26.2 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_other | 2939 | 272758 | 98.5 | 0.8 | 0.7 | 0.5 | 2.1 | 42.4 |

Token error rate (TER)

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_asr_asr_model_valid.acc.ave/dev_clean | 2703 | 68010 | 97.5 | 1.9 | 0.7 | 0.4 | 2.9 | 28.6 |
| decode_asr_asr_model_valid.acc.ave/dev_other | 2864 | 63110 | 93.4 | 5.0 | 1.6 | 1.0 | 7.6 | 48.3 |
| decode_asr_asr_model_valid.acc.ave/test_clean | 2620 | 65818 | 97.2 | 2.0 | 0.8 | 0.4 | 3.3 | 31.4 |
| decode_asr_asr_model_valid.acc.ave/test_other | 2939 | 65101 | 93.7 | 4.5 | 1.8 | 0.9 | 7.2 | 49.0 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_clean | 2703 | 68010 | 97.8 | 1.5 | 0.7 | 0.3 | 2.5 | 25.2 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_other | 2864 | 63110 | 94.6 | 3.8 | 1.6 | 0.7 | 6.1 | 40.0 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_clean | 2620 | 65818 | 97.6 | 1.6 | 0.8 | 0.3 | 2.7 | 26.2 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_other | 2939 | 65101 | 94.7 | 3.5 | 1.8 | 0.7 | 6.0 | 42.4 |
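When reading these tables, Err is the sum Sub + Del + Ins (each a percentage of the reference units), so with every column rounded to one decimal the sum can be off by about 0.1. The sketch below types in the Sub, Del, Ins and Err values of the first (word-level) table, checks that relation up to rounding, and computes how much the Transformer LM reduces WER in relative terms. The helper names are illustrative.

```python
# (Sub, Del, Ins, Err) in percent, typed in from the first (word-level) table above.
WER_NO_LM = {
    "dev_clean": (1.9, 0.2, 0.2, 2.3),
    "dev_other": (5.1, 0.5, 0.6, 6.1),
    "test_clean": (2.1, 0.2, 0.3, 2.6),
    "test_other": (4.9, 0.5, 0.7, 6.0),
}
WER_WITH_LM = {
    "dev_clean": (1.5, 0.2, 0.2, 1.9),
    "dev_other": (3.7, 0.4, 0.5, 4.6),
    "test_clean": (1.7, 0.2, 0.3, 2.1),
    "test_other": (3.7, 0.5, 0.5, 4.7),
}


def err_matches(sub, dele, ins, err, tol=0.2):
    """Err should equal Sub + Del + Ins; each column is rounded to 0.1, so allow some slack."""
    return abs(sub + dele + ins - err) <= tol


def relative_reduction(before, after):
    """Relative improvement in percent, e.g. 6.0 -> 4.7 is about 21.7%."""
    return 100.0 * (before - after) / before


for subset in WER_NO_LM:
    before, after = WER_NO_LM[subset][3], WER_WITH_LM[subset][3]
    assert err_matches(*WER_NO_LM[subset]) and err_matches(*WER_WITH_LM[subset])
    print(f"{subset}: WER {before} -> {after} with LM ({relative_reduction(before, after):.1f}% relative)")
```

On these numbers the LM cuts WER by roughly 17% to 25% relative, with the largest gain on dev_other.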

ASR step by step

1. Set up the code

```bash
pip install .
```
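After installing, a quick way to confirm that the modules imported in step 3 can be found is `importlib.util.find_spec`. This is a sketch: whether `pip install .` also installs `mmds` depends on the project's dependency list, which is not shown here.

```python
import importlib.util

# Top-level modules imported by the example in step 3.
REQUIRED_MODULES = ["torch", "librosa", "mmds", "conformer_asr"]


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```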

2. Download the model and unzip it

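```bash
# Quote the URL (it contains "?") and use -O for the output file; lowercase -o would write wget's log instead
wget "https://zenodo.org/record/4604066/files/asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_bpe5000_scheduler_confwarmup_steps40000_optim_conflr0.0025_sp_valid.acc.ave.zip?download=1" -O conformer.zip
unzip conformer.zip
```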
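If `wget` is unavailable, the download can also be done from Python with the standard library. The sketch below additionally searches the extracted files for the config, BPE model, and checkpoint, since the directory names inside the archive may not match the `exp_unnorm/...` paths used in step 3. The function names and the `asr_train` filter are assumptions based on those paths.

```python
import urllib.request
import zipfile
from pathlib import Path

# Same Zenodo URL as the wget command above.
MODEL_URL = (
    "https://zenodo.org/record/4604066/files/"
    "asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_bpe5000_scheduler_confwarmup_steps40000"
    "_optim_conflr0.0025_sp_valid.acc.ave.zip?download=1"
)


def download_and_extract(url, zip_path="conformer.zip", out_dir="."):
    """Download the archive to zip_path and unpack it into out_dir."""
    urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)


def find_model_files(root="."):
    """Find config.yaml, bpe.model and a *.pth checkpoint under root.

    Config and checkpoint are restricted to paths containing "asr_train" (the ASR
    experiment directory naming seen in step 3), which skips any LM files that may
    ship in the same archive. Returns None for anything not found.
    """
    root = Path(root)

    def first(pattern, must_contain=""):
        hits = sorted(p for p in root.rglob(pattern) if must_contain in str(p.relative_to(root)))
        return hits[0] if hits else None

    return {
        "cfg_path": first("config.yaml", must_contain="asr_train"),
        "bpe_path": first("bpe.model"),
        "ckpt_path": first("*.pth", must_contain="asr_train"),
    }


# Example usage (downloads a large file, so it is left commented out):
# download_and_extract(MODEL_URL)
# print(find_model_files("."))
```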

3. Run an example

```python
import torch
import librosa
from mmds.utils.spectrogram import MelSpectrogram
from conformer_asr import Conformer, Tokenizer

sample_rate = 16000
# Paths inside the unzipped model archive; adjust them if your directory layout differs
cfg_path = "./exp_unnorm/asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_unnorm_bpe5000/config.yaml"
bpe_path = "./data/en_unnorm_token_list/bpe_unigram5000/bpe.model"
ckpt_path = "./exp_unnorm/asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_unnorm_bpe5000/valid.acc.ave_10best.pth"

tokenizer = Tokenizer(cfg_path, bpe_path)
conformer = Conformer(tokenizer, ckpt_path=ckpt_path)
conformer.eval()  # switch dropout/batch-norm layers to inference behaviour

# Mel spectrogram with a 512-sample window and 256-sample hop at 16 kHz, as in the model tag
spec_fn = MelSpectrogram(
    sample_rate,
    hop_length=256,
    f_min=0,
    f_max=8000,
    win_length=512,
    power=2,
)

# librosa >= 0.10 requires sr as a keyword argument;
# .m4a decoding falls back to audioread, which needs a backend such as ffmpeg
w0, _ = librosa.load("./example.m4a", sr=sample_rate)
w0 = torch.from_numpy(w0)
m0 = spec_fn(w0).t()  # transpose to (frames, mel bins)

n_frames = len(m0)

# Create a batch of audio with different lengths (supported by decode)
x = [m0, m0[: n_frames // 2], m0[: n_frames // 4]]

ref = "This is a test video for youtube-dl. For more information, contact [email protected]".lower()
with torch.no_grad():  # gradients are not needed for decoding
    hyps = conformer.decode(x, beam_width=20)

print("REF", ref)
for hyp in hyps:
    print("HYP", hyp.lower())
```
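The example batches utterances of different lengths. A common way to handle such inputs, and only an assumption about what `Conformer.decode` does internally, is to pad every feature matrix to the longest one and keep the true lengths so the encoder can mask the padding. A NumPy sketch of that idea (`pad_batch` and `padding_mask` are hypothetical helpers):

```python
import numpy as np


def pad_batch(features, pad_value=0.0):
    """Pad a list of (frames_i, n_mels) arrays into one (batch, max_frames, n_mels) array.

    Returns the padded array and the true number of frames of each item.
    """
    lengths = np.array([f.shape[0] for f in features], dtype=np.int64)
    n_mels = features[0].shape[1]
    batch = np.full((len(features), int(lengths.max()), n_mels), pad_value, dtype=features[0].dtype)
    for i, f in enumerate(features):
        batch[i, : f.shape[0]] = f  # copy real frames; the rest stays as padding
    return batch, lengths


def padding_mask(lengths):
    """Boolean (batch, max_frames) mask that is True on real frames, False on padding."""
    return np.arange(int(lengths.max()))[None, :] < lengths[:, None]


# Mirrors the example: a full utterance plus its first half and first quarter.
# 80 mel bins is an arbitrary choice for this illustration.
m0 = np.random.rand(100, 80).astype(np.float32)
x = [m0, m0[: len(m0) // 2], m0[: len(m0) // 4]]
batch, lengths = pad_batch(x)
print(batch.shape, lengths.tolist(), padding_mask(lengths).sum(axis=1).tolist())
```

In PyTorch, `torch.nn.utils.rnn.pad_sequence(x, batch_first=True)` performs the same padding step on a list of tensors.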

Results

The three HYP lines correspond to the three entries of `x`: the full clip, its first half, and its first quarter.

```
REF this is a test video for youtube-dl. for more information, contact [email protected]
HYP this is a test video for you do bl for more information -- contact the hih aging at the hihaging, not the
HYP this is a test for you d bl for more information
HYP this is a testim for you to
```
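The hypotheses are unpunctuated, so comparing them with REF needs some normalization before counting errors. The sketch below lowercases, replaces punctuation with spaces, and counts substitutions (S), deletions (D) and insertions (I) with a plain Levenshtein alignment, giving WER = 100 × (S + D + I) / N for N reference words. The helpers are illustrative; the tables above follow sclite's column layout, and sclite may weight error types differently during alignment, so its S/D/I split can differ slightly.

```python
import re


def normalize(text):
    """Lowercase, turn anything other than letters, digits and apostrophes into spaces, split into words."""
    return re.sub(r"[^a-z0-9']+", " ", text.lower()).split()


def align_counts(ref, hyp):
    """(substitutions, deletions, insertions) of a minimum-edit-distance alignment of hyp to ref."""
    # table[i][j] = (errors, subs, dels, ins) for ref[:i] vs hyp[:j]; tuples compare
    # by total errors first, so min() keeps the cheapest alignment.
    table = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            err, sub, dele, ins = table[i - 1][j - 1]
            miss = int(ref[i - 1] != hyp[j - 1])
            diagonal = (err + miss, sub + miss, dele, ins)  # match or substitution
            err, sub, dele, ins = table[i - 1][j]
            deletion = (err + 1, sub, dele + 1, ins)
            err, sub, dele, ins = table[i][j - 1]
            insertion = (err + 1, sub, dele, ins + 1)
            table[i][j] = min(diagonal, deletion, insertion)
    _, subs, dels, ins = table[-1][-1]
    return subs, dels, ins


def word_error_rate(ref, hyp):
    """WER in percent: 100 * (S + D + I) / number of reference words."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * sum(align_counts(ref, hyp)) / len(ref)


ref = normalize("This is a test video for youtube-dl.")
hyp = normalize("this is a testim for you to")
print(align_counts(ref, hyp), word_error_rate(ref, hyp))
```

For the pair in the last lines (the first sentence of REF against the quarter-length hypothesis, chosen only for illustration since the truncated audio need not cover exactly that sentence), the alignment finds 3 substitutions and 1 deletion over 8 reference words, a WER of 50%.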

Owner

Niu Zhe
