In this repository, I have developed an end to end Automatic speech recognition project. I have developed the neural network model for automatic speech recognition with PyTorch and used MLflow to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

Overview

End to End Automatic Speech Recognition

architecture

In this repository, I have developed an end to end Automatic speech recognition project. I have developed the neural network model for automatic speech recognition with PyTorch and used MLflow to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. The Neural Acoustic model is built with reference to the DeepSpeech2 model, but not the exact DeepSpeach2 model or the DeepSpeech model as mentioned in their respective research papers.

Technologies used:

  1. MLflow.mlflow
    • to manage the ML lifecycle.
    • to track and compare model performance in the ml lifecyle.
    • experimentation, reproducibility, deployment, and a central model registry.
  2. Pytorch.pytorch
    • The Acoustic Neural Network is implemented with pytorch.
    • torchaudio for feature extraction and data pre-processing.

Speech Recognition Pipeline

architecture1

Dataset

In this project, the LibriSpeech dataset has been used to train and validate the model. It has audio data for input and text speech for the respective audio to be predicted by our model. Also, I have used a subset of 2000 files from the training and test set of the LibriSpeech dataset for faster training and validation over limited GPU power and usage limit.

Pre-Processing

In this process Torchaudio has been used to extract the waveform and sampling rate from the audiofile. Then have been used MFCC(Mel-frequency cepstrum coefficients) for feature extraction from the waveform. MelSpectogram, Spectogram and Frequency Masking could also be used in this case for feature exxtraction.

Acoustic Model architecture.

The Neural Network architecture consist of Residul-CNN blocks, BidirectionalGRU blocks, and fully connected Linear layers for final classification. From the input layer, we have two Residual CNN blocks(in sequential) with batch normalization, followed by a fully connected layer, hence connecting it to three bi-directional GRU blocks(in sequential) and finally fully connected linear layers for classification.

CTC(Connectionist Temporal Classification) Loss as the base loss function for our model and AdamW as the optimizer.

Decoding

We have used Greedy Decoder which argmax's the output of the Neural Network and transforms it into text through character mapping.

ML Lifecycle Pipeline

architecture2
We start by initializing the mlflow server where we need to specify backend storage, artifact uri, host and the port. Then we create an experiment, start a run within the experiment which inturn tracks the training and validation loss. Then we save the model followed by registring it and further use the registered model for deployment over the production.

Implementation

First we need to initialize the mlflow server.

mlflow run -e server . 

To start the server in a non-conda environmnet

mlflow run -e server . --no-conda

the server could also be initialized directly from the terminal by the following command. But for this the tracking uri need to be set manually.

mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root ./mlruns \
--host 127.0.0.1

Then we need to start the model training.

mlflow run -e train --experiment-name "SpeechRecognition" . -P epoch=20 -P batch=32

To train in non-conda environment.

mlflow run -e train --experiment-name "SpeechRecognition" . -P epoch=20 -P batch=32 --no-conda

To train the model through python command.

python main.py --epoch=20 --batch=20

This command functions the same as the above mlflow commands. It's just that I was facing some issues or bugs while running with mlflow command which worked prefectly fine while running with the python command.

Trained model performance

trainloss
testloss
lr
wer
cer
Now its time to validate the registered model. Enter the registered model name with respective model stage and version and file_id of the LibriSpeech dataset Test file.

mlflow run -e validate . -P train=False -P registered_model=SpeechRecognitionModel -P model_stage=Production file_id=1089-134686-0000
python main.py --train=False --registered_model=SpeechRecognitionModel --model_stage=Production --file_id=1089-134686-0000

Dashboard

dashboard

Registered model

regmodel

Artifacts

artifacts

Results

testresult

Target: she spoke with a sudden energy which partook of fear and passion and flushed her thin cheek and made her languid eyes flash
Predicted: she spot with a sudn inderge which pert huopk obeer an pasion amd hust her sting cheek and mad herlang wld ise flush
Target: we look for that reward which eye hath not seen nor ear heard neither hath entered into the heart of man
Predicted: we look forthat rewrd which i havt notse mor iear herd meter hat entere incs the hard oftmon
Target: there was a grim smile of amusement on his shrewd face
Predicted: there was a grim smiriel of a mise men puisoreud face
Target: if this matter is not to become public we must give ourselves certain powers and resolve ourselves into a small private court martial
Predicted: if this motere is not to mecome pubotk we mestgoeourselv certan pouors and resal orselveent a srmall pribut court nmatheld
Taarget: no good my dear watson
Predicted: no good my deare otsen 
Target: well she was better though she had had a bad night
Predicted: all she ws bhatter thu shu oid hahabaut night 
Target: the air is heavy the sea is calm
Predicted: the ar is haavyd the see is coomd 
Target: i left you on a continent and here i have the honor of finding you on an island
Predicted: i left you n a contonent and herei hafe the aner a find de youw on an ihalnd 
Target: the young man is in bondage and much i fear his death is decreed
Predicted: th young manis an bondage end much iffeer his dethis de creed 
Target: hay fever a heart trouble caused by falling in love with a grass widow
Predicted: hay fever ahar trbrl cawaese buy fallling itlelov wit the gressh wideo
Target: bravely and generously has he battled in my behalf and this and more will i dare in his service
Predicted: bravly ansjenereusly has he btaoled and miy ba hah andthis en morera welig darind his serves 

Future Scopes

  • There are other Neural Network models like Wav2Vec, Jasper which also be used and tested against for better model performance.
  • This is not a real-time automatic speech recognition project, where human speech would be decoded to text in real-time like in Amazon Alexa and Google Assistant. It takes the audio file as input and returns predicted speech. So, this could be taken to further limits by developing it into real-time automatic speech recognition.
  • The entire project has been done for local deployment. For the Productionisation of the model and datasets AWS s3 bucket and Microsoft Azure could be used, Kubernetes would also serve as a better option for the Productionisation of the model.
Owner
Victor Basu
Hello! I am Data Scientist and I love to do research on Data Science and Machine Learning
Victor Basu
A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

wav2vec-toolkit A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models This repository accompanies the

Anton Lozhkov 29 Oct 23, 2022
Toy example of an applied ML pipeline for me to experiment with MLOps tools.

Toy Machine Learning Pipeline Table of Contents About Getting Started ML task description and evaluation procedure Dataset description Repository stru

Shreya Shankar 190 Dec 21, 2022
A spaCy wrapper of OpenTapioca for named entity linking on Wikidata

spaCyOpenTapioca A spaCy wrapper of OpenTapioca for named entity linking on Wikidata. Table of contents Installation How to use Local OpenTapioca Vizu

Universitätsbibliothek Mannheim 80 Jan 03, 2023
Tool to check whether a GCP bucket is public or not.

Tool to check publicly accessible GCP bucket. Blog https://justm0rph3u5.medium.com/gcp-inspector-auditing-publicly-exposed-gcp-bucket-ac6cad55618c Wha

DIVYANSHU SHUKLA 7 Nov 24, 2022
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism This repository is the official PyTorch implementation of our AAAI-2022 paper, in

Jinglin Liu 829 Jan 07, 2023
My implementation of Safaricom Machine Learning Codility test. The code has bugs, logical I guess I made errors and any correction will be appreciated.

Safaricom_Codility Machine Learning 2022 The test entails two questions. Question 1 was on Machine Learning. Question 2 was on SQL I ran out of time.

Lawrence M. 1 Mar 03, 2022
Natural Language Processing Tasks and Examples.

Natural Language Processing Tasks and Examples With the advancement of A.I. technology in recent years, natural language processing technology has bee

Soohwan Kim 53 Dec 20, 2022
AI-powered literature discovery and review engine for medical/scientific papers

AI-powered literature discovery and review engine for medical/scientific papers paperai is an AI-powered literature discovery and review engine for me

NeuML 819 Dec 30, 2022
Final Project Bootcamp Zero

The Quest (Pygame) Descripción Este es el repositorio de código The-Quest para el proyecto final Bootcamp Zero de KeepCoding. El juego consiste en la

Seven-z01 1 Mar 02, 2022
A BERT-based reverse dictionary of Korean proverbs

Wisdomify A BERT-based reverse-dictionary of Korean proverbs. 김유빈 : 모델링 / 데이터 수집 / 프로젝트 설계 / back-end 김종윤 : 데이터 수집 / 프로젝트 설계 / front-end / back-end 임용

94 Dec 08, 2022
Chatbot with Pytorch, Python & Nextjs

Installation Instructions Make sure that you have Python 3, gcc, venv, and pip installed. Clone the repository $ git clone https://github.com/sahr

Rohit Sah 0 Dec 11, 2022
Understand Text Summarization and create your own summarizer in python

Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent

Sreekanth M 1 Oct 18, 2022
A simple Flask site that allows users to create, update, and delete posts in a database, as well as perform basic NLP tasks on the posts.

A simple Flask site that allows users to create, update, and delete posts in a database, as well as perform basic NLP tasks on the posts.

Ian 1 Jan 15, 2022
This is a general repo that helps you develop fast/effective NLP classifiers using Huggingface

NLP Classifier Introduction This project trains a bert model on any NLP classifcation model. And uses the model in make predictions on new data using

Abdullah Tarek 3 Mar 11, 2022
This repository will contain the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields Project Page | Paper | Supplementary | Video | Slides | Blog | Talk If

1.1k Dec 27, 2022
A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk.

Simple-Vosk A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk. Check out the official Vosk G

2 Jun 19, 2022
Chinese Grammatical Error Diagnosis

nlp-CGED Chinese Grammatical Error Diagnosis 中文语法纠错研究 基于序列标注的方法 所需环境 Python==3.6 tensorflow==1.14.0 keras==2.3.1 bert4keras==0.10.6 笔者使用了开源的bert4keras

12 Nov 25, 2022
Watson Natural Language Understanding and Knowledge Studio

Material de demonstração dos serviços: Watson Natural Language Understanding e Knowledge Studio Visão Geral: https://www.ibm.com/br-pt/cloud/watson-na

Vanderlei Munhoz 4 Oct 24, 2021
A python package to fine-tune transformer-based models for named entity recognition (NER).

nerblackbox A python package to fine-tune transformer-based language models for named entity recognition (NER). Resources Source Code: https://github.

Felix Stollenwerk 13 Jul 30, 2022
Deep Learning Topics with Computer Vision & NLP

Deep learning Udacity Course Deep Learning Topics with Computer Vision & NLP for the AWS Machine Learning Engineer Nanodegree Program Tasks are mostly

Simona Mircheva 1 Jan 20, 2022