Romanian Automatic Speech Recognition from the ROBIN project

Last update: Jan 01, 2023

Overview

RobinASR

This repository contains Robin's Automatic Speech Recognition system (RobinASR) for the Romanian language. It is based on the DeepSpeech2 architecture and comes with a KenLM language model that improves the transcriptions.

The pretrained speech-to-text (ASR) model can be downloaded from here and the pretrained KenLM model can be downloaded from here.

Also, make sure to visit:

A demo of the ASR system, available on the RELATE platform: https://relate.racai.ro/index.php?path=robin/asr
A post-processing web service allowing hyphenation and basic capitalization restoration: https://github.com/racai-ai/RobinASRHyphenationCorrection

Installation

Docker

Download the pretrained speech-to-text model and the pretrained KenLM model from the links above, and copy them into a models directory inside this repository.
Build the docker image using the Dockerfile. Make sure that deepspeech_pytorch/configs/inference_config.py has the desired configuration.

docker build --tag robinasr .

Run the docker image.

docker run --gpus all -p 8888:8888 --net=host --ipc=host robinasr

From Source

You must have Python 3.6+ and PyTorch 1.5.1+ installed on your system. CUDA 10.1+ is also required if you want to use the (recommended) GPU version.
Clone the repository and install its dependencies:

git clone https://github.com/racai-ai/RobinASR.git
cd RobinASR
pip3 install -r requirements.txt
pip3 install -e .

Install NVIDIA Apex:

git clone --recursive https://github.com/NVIDIA/apex.git
cd apex && pip install .

If you want to use Beam Search and the KenLM language model, you must install CTCDecode:

git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install .
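Once everything is installed, the requirements listed above can be checked with a short script. This is a minimal sketch that is not part of the repository: the version thresholds mirror the ones stated in this README, and it assumes that NVIDIA Apex and CTCDecode are importable under the module names apex and ctcdecode. It only reports what it finds and installs nothing.

```python
"""Hypothetical environment check (not shipped with RobinASR).

Thresholds mirror this README: Python 3.6+, PyTorch 1.5.1+, CUDA 10.1+.
Assumes Apex and CTCDecode import as "apex" and "ctcdecode".
"""
import importlib.util
import re
import sys


def parse_version(text):
    """Turn a version string such as '1.5.1+cu101' into a tuple like (1, 5, 1)."""
    match = re.match(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version, minimum):
    """Compare two version strings numerically, padding the shorter one with zeros."""
    got, want = parse_version(version), parse_version(minimum)
    width = max(len(got), len(want))
    got += (0,) * (width - len(got))
    want += (0,) * (width - len(want))
    return got >= want


def check_environment():
    """Collect pass/fail findings for the current interpreter."""
    report = {"python_ok": sys.version_info >= (3, 6)}
    # Extras installed in the steps above; find_spec returns None when absent.
    for name in ("apex", "ctcdecode"):
        report[name + "_installed"] = importlib.util.find_spec(name) is not None
    if importlib.util.find_spec("torch") is None:
        report["torch_installed"] = False
        return report
    import torch  # imported lazily so the check also runs where PyTorch is missing

    report["torch_installed"] = True
    report["torch_ok"] = meets_minimum(torch.__version__, "1.5.1")
    cuda = torch.version.cuda  # None on CPU-only builds of PyTorch
    report["cuda_ok"] = cuda is not None and meets_minimum(cuda, "10.1")
    report["gpu_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, "=", value)
```

Run it with python3 and read the printed flags: a False for cuda_ok or gpu_available means that only the CPU path is usable in the current environment.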

Inference Server

First, check the configuration file deepspeech_pytorch/configs/inference_config.py and make sure that it meets your requirements. Then, run the following command:

python3 server.py

Train a New Model

You must create three CSV manifest files (train, valid and test). Each line of a manifest contains the path to a wav file and the path to its corresponding transcription, separated by a comma:

path_to_wav1,path_to_txt1
path_to_wav2,path_to_txt2
path_to_wav3,path_to_txt3
...
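Manifests in this format can be generated with a short script. The sketch below is hypothetical and not part of RobinASR: it assumes that each transcription is a .txt file with the same name as its .wav file in the same folder, writes absolute paths, uses an 80/10/10 split with a fixed seed, and names the outputs train_manifest.csv, valid_manifest.csv and test_manifest.csv. Adjust these choices to your setup.

```python
"""Hypothetical manifest builder (not part of RobinASR).

Assumptions: each transcription is a .txt file next to the .wav file with the
same stem, absolute paths are acceptable, and an 80/10/10 split is wanted.
"""
import os
import random


def find_pairs(data_dir):
    """Return (wav, txt) absolute-path pairs in a deterministic order.

    .wav files without a matching .txt file are skipped.
    """
    pairs = []
    for root, dirs, files in os.walk(data_dir):
        dirs.sort()  # visit subdirectories in a stable order
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            if ext.lower() != ".wav":
                continue
            txt = os.path.join(root, stem + ".txt")
            if os.path.isfile(txt):
                pairs.append((os.path.abspath(os.path.join(root, name)),
                              os.path.abspath(txt)))
    return pairs


def write_manifest(pairs, path):
    """Write one 'wav_path,txt_path' line per pair, with no header row."""
    with open(path, "w", encoding="utf-8") as out:
        for wav, txt in pairs:
            # A comma inside a path would make the line ambiguous.
            if "," in wav or "," in txt:
                raise ValueError("comma in path breaks the manifest format: " + wav)
            out.write(wav + "," + txt + "\n")


def build_manifests(data_dir, out_dir, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle all pairs with a fixed seed, split them, and write three manifests."""
    pairs = find_pairs(data_dir)
    random.Random(seed).shuffle(pairs)
    n_valid = int(len(pairs) * valid_frac)
    n_test = int(len(pairs) * test_frac)
    splits = {
        "valid": pairs[:n_valid],
        "test": pairs[n_valid:n_valid + n_test],
        "train": pairs[n_valid + n_test:],
    }
    os.makedirs(out_dir, exist_ok=True)
    for split, items in splits.items():
        write_manifest(items, os.path.join(out_dir, split + "_manifest.csv"))
    return {split: len(items) for split, items in splits.items()}
```

For example, build_manifests("data", "manifests") returns the number of pairs written to each split; the generated files can then be referenced wherever your training configuration expects the manifests. A fixed seed keeps the split reproducible across runs.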

Then, edit deepspeech_pytorch/configs/train_config.py so that it matches your configuration, and start training with:

python train.py

Acknowledgments

We would like to thank Sean Naren for making his DeepSpeech2 implementation publicly available. We used a lot of his code in our implementation.

Cite

If you are using this repository, please cite the following paper as a thank you to the authors:

Avram, A.M., Păiș, V. and Tufiș, D., 2020, October. Towards a Romanian end-to-end automatic speech recognition based on Deepspeech2. In Proc. Rom. Acad. Ser. A (Vol. 21, pp. 395-402).

or in BibTeX format:

@inproceedings{avram2020towards,
  title={Towards a Romanian end-to-end automatic speech recognition based on Deepspeech2},
  author={Avram, Andrei-Marius and Păiș, Vasile and Tufiș, Dan},
  booktitle={Proceedings of the Romanian Academy, Series A},
  pages={395--402},
  year={2020}
}

Owner

RACAI
