PyTorch Lightning implementation of Automatic Speech Recognition

Last update: Sep 19, 2022

Overview

lasr

Lightening Automatic Speech Recognition

An MIT License ASR research library, built on PyTorch-Lightning, for developing end-to-end ASR models.

Introduction

PyTorch Lightning is the lightweight PyTorch wrapper for high-performance AI research. PyTorch is extremely easy to use to build complex AI models. But once the research gets complicated and things like multi-GPU training, 16-bit precision and TPU training get mixed in, users are likely to introduce bugs. PyTorch Lightning solves exactly this problem. Lightning structures your PyTorch code so it can abstract the details of training. This makes AI research scalable and fast to iterate on.

This project is an example that implements the asr project with PyTorch Lightning. In this project, I trained a model consisting of a conformer encoder + LSTM decoder with Joint CTC-Attention. The lasr means lighthning automatic speech recognition. I hope this could be a guideline for those who research speech recognition.

Installation

This project recommends Python 3.7 or higher.
I recommend creating a new virtual environment for this project (using virtual env or conda).

Prerequisites

Numpy: pip install numpy (Refer here for problem installing Numpy).
Pytorch: Refer to PyTorch website to install the version w.r.t. your environment.
librosa: conda install -c conda-forge librosa (Refer here for problem installing librosa)
torchaudio: pip install torchaudio==0.6.0 (Refer here for problem installing torchaudio)
sentencepiece: pip install sentencepiece (Refer here for problem installing sentencepiece)
pytorch-lightning: pip install pytorch-lightning (Refer here for problem installing pytorch-lightning)
hydra: pip install hydra-core --upgrade (Refer here for problem installing hydra)

Install from source

Currently we only support installation from source code using setuptools. Checkout the source code and run the
following commands:

pip install -e .

Install Apex (for 16-bit training)

For faster training install NVIDIA's apex library:

$ git clone https://github.com/NVIDIA/apex
$ cd apex

# ------------------------
# OPTIONAL: on your cluster you might need to load CUDA 10 or 9
# depending on how you installed PyTorch

# see available modules
module avail

# load correct CUDA before install
module load cuda-10.0
# ------------------------

# make sure you've loaded a cuda version > 4.0 and < 7.0
module load gcc-6.1.0

$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Get Started

I use Hydra to control all the training configurations. If you are not familiar with Hydra we recommend visiting the Hydra website. Generally, Hydra is an open-source framework that simplifies the development of research applications by providing the ability to create a hierarchical configuration dynamically.

Training Speech Recognizer

You can simply train with LibriSpeech dataset like below:

$ python ./bin/main.py --dataset_path $DATASET_PATH --dataset_download True

Check configuraions at [link]

Troubleshoots and Contributing

If you have any questions, bug reports, and feature requests, please open an issue on Github.

I appreciate any kind of feedback or contribution. Feel free to proceed with small issues like bug fixes, documentation improvement. For major contributions and new features, please discuss with the collaborators in corresponding issues.

Code Style

I follow PEP-8 for code style. Especially the style of docstrings is important to generate documentation.

License

This project is licensed under the MIT LICENSE - see the LICENSE.md file for details

Author

Soohwan Kim @sooftware
Contacts: [email protected]

Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

17 Sep 23, 2021

RGBD-Net - This repository contains a pytorch lightning implementation for the 3DV 2021 RGBD-Net paper.

[3DV 2021] We propose a new cascaded architecture for novel view synthesis, called RGBD-Net, which consists of two core components: a hierarchical depth regression network and a depth-aware generator network.

4 May 26, 2022

A simple, unofficial implementation of MAE using pytorch-lightning

Masked Autoencoders in PyTorch A simple, unofficial implementation of MAE (Masked Autoencoders are Scalable Vision Learners) using pytorch-lightning.

20 Dec 3, 2022

Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition"

Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition" Pre-trained Deep Convo

5 Nov 11, 2022

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech Keon Lee, Ky

114 Dec 12, 2022

ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library ERISHA is a multilingual multispeaker expressive speech synthesis framework. It ca

43 Nov 27, 2022

Pytorch Lightning code guideline for conferences

Deep learning project seed Use this seed to start new deep learning / ML projects. Built in setup.py Built in requirements Examples with MNIST Badges

1k Jan 2, 2023

Pytorch Lightning Distributed Accelerators using Ray

Distributed PyTorch Lightning Training on Ray This library adds new PyTorch Lightning accelerators for distributed training using the Ray distributed

166 Dec 27, 2022

Pytorch Lightning Distributed Accelerators using Ray

Distributed PyTorch Lightning Training on Ray This library adds new PyTorch Lightning plugins for distributed training using the Ray distributed compu

167 Jan 2, 2023

Comments

incorrect spm params

python prepare_libri.py --dataset_path ../../data/lasr/libri/LibriSpeech --vocab_size 5000
sentencepiece_trainer.cc(177) LOG(INFO) Running command: --input=spm_input.txt --model_prefix=tokenizer --vocab_size=5000 --model_type=unigram --pad_id=0 --bos_id=1 --eos_id=2
sentencepiece_trainer.cc(77) LOG(INFO) Starts training with :
trainer_spec {
  input: spm_input.txt
  input_format:
  model_prefix: tokenizer
  model_type: UNIGRAM
  vocab_size: 5000
  self_test_sample_size: 0
  character_coverage: 0.9995
  input_sentence_size: 0
  shuffle_input_sentence: 1
  seed_sentencepiece_size: 1000000
  shrinking_factor: 0.75
  max_sentence_length: 4192
  num_threads: 16
  num_sub_iterations: 2
  max_sentencepiece_length: 16
  split_by_unicode_script: 1
  split_by_number: 1
  split_by_whitespace: 1
  split_digits: 0
  treat_whitespace_as_suffix: 0
  required_chars:
  byte_fallback: 0
  vocabulary_output_piece_score: 1
  train_extremely_large_corpus: 0
  hard_vocab_limit: 1
  use_all_vocab: 0
  unk_id: 0
  bos_id: 1
  eos_id: 2
  pad_id: 0
  unk_piece: <unk>
  bos_piece: <s>
  eos_piece: </s>
  pad_piece: <pad>
  unk_surface:  ⁇
}
normalizer_spec {
  name: nmt_nfkc
  add_dummy_prefix: 1
  remove_extra_whitespaces: 1
  escape_whitespaces: 1
  normalization_rule_tsv:
}
denormalizer_spec {}
Traceback (most recent call last):
  File "prepare_libri.py", line 65, in <module>
    main()
  File "prepare_libri.py", line 58, in main
    prepare_tokenizer(transcripts_collection[0], opt.vocab_size)
  File "lasr/dataset/preprocess.py", line 71, in prepare_tokenizer
    spm.SentencePieceTrainer.Train(cmd)
  File "anaconda3/envs/lasr/lib/python3.7/site-packages/sentencepiece/__init__.py", line 407, in Train
    return SentencePieceTrainer._TrainFromString(arg)
  File "anaconda3/envs/lasr/lib/python3.7/site-packages/sentencepiece/__init__.py", line 385, in _TrainFromString
    return _sentencepiece.SentencePieceTrainer__TrainFromString(arg)
RuntimeError: Internal: /home/conda/feedstock_root/build_artifacts/sentencepiece_1612846348604/work/src/trainer_interface.cc(666) [insert_id(trainer_spec_.pad_id(), trainer_spec_.pad_piece())]

opened by szalata 3

Releases(v0.1)

v0.1(May 9, 2021)
Fix several bugs in v0.0

pull request 28~53

Apply structured configuraions (using Hydra)

Source code(tar.gz)
Source code(zip)
v0.0(May 8, 2021)
First release

Data download & processing test complete

Source code(tar.gz)
Source code(zip)

Owner

Soohwan Kim

Toward human-like AI

GitHub Repository https://sooftware.github.io/lasr/

Metadata-Extractor - Metadata Extractor Script can be used to read in exif metadata

Metadata Extractor The exifextract script can be used to read in exif metadata f

1 Feb 16, 2022

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation, NeurIPS 2021 Spotlight

PCAN for Multiple Object Tracking and Segmentation This is the offical implementation of paper PCAN for MOTS. We also present a trailer that consists

328 Dec 29, 2022

Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution

FAU Implementation of the paper: Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution. Yingruo

78 Nov 29, 2022

A Decentralized Omnidirectional Visual-Inertial-UWB State Estimation System for Aerial Swar.

Omni-swarm A Decentralized Omnidirectional Visual-Inertial-UWB State Estimation System for Aerial Swarm Introduction Omni-swarm is a decentralized omn

99 Dec 23, 2022

The implementation code for "DAGAN: Deep De-Aliasing Generative Adversarial Networks for Fast Compressed Sensing MRI Reconstruction"

DAGAN This is the official implementation code for DAGAN: Deep De-Aliasing Generative Adversarial Networks for Fast Compressed Sensing MRI Reconstruct

159 Nov 22, 2022

Gluon CV Toolkit

Gluon CV Toolkit | Installation | Documentation | Tutorials | GluonCV provides implementations of the state-of-the-art (SOTA) deep learning models in

5.4k Jan 06, 2023

GAT - Graph Attention Network (PyTorch) 💻 + graphs + 📣 = ❤️

GAT - Graph Attention Network (PyTorch) 💻 + graphs + 📣 = ❤️ This repo contains a PyTorch implementation of the original GAT paper ( 🔗 Veličković et

1.9k Jan 09, 2023

SOLOv2 on onnx & tensorRT

SOLOv2.tensorRT: NOTE: code based on WXinlong/SOLO add support to TensorRT inference onnxruntime tensorRT full_dims and dynamic shape postprocess with

47 Nov 26, 2022

Customer-Transaction-Analysis - This analysis is based on a synthesised transaction dataset containing 3 months worth of transactions for 100 hypothetical customers.

Customer-Transaction-Analysis - This analysis is based on a synthesised transaction dataset containing 3 months worth of transactions for 100 hypothetical customers. It contains purchases, recurring

1 Jan 01, 2022

ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation

ClevrTex This repository contains dataset generation code for ClevrTex benchmark from paper: ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi

26 Dec 21, 2022

Analysis code and Latex source of the manuscript describing the conditional permutation test of confounding bias in predictive modelling.

Git repositoty of the manuscript entitled Statistical quantification of confounding bias in predictive modelling by Tamas Spisak The manuscript descri

0 Nov 22, 2021

PyTorch Lightning implementation of Automatic Speech Recognition

Related tags

Overview

lasr

Introduction

Installation

Prerequisites

Install from source

Install Apex (for 16-bit training)

Get Started

Training Speech Recognizer

Troubleshoots and Contributing

Code Style

License

Author

You might also like...

Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

RGBD-Net - This repository contains a pytorch lightning implementation for the 3DV 2021 RGBD-Net paper.

A simple, unofficial implementation of MAE using pytorch-lightning

Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition"

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

Pytorch Lightning code guideline for conferences

Pytorch Lightning Distributed Accelerators using Ray

Pytorch Lightning Distributed Accelerators using Ray

Comments

incorrect spm params

Releases(v0.1)

v0.1(May 9, 2021)

v0.0(May 8, 2021)

Owner

Soohwan Kim

Metadata-Extractor - Metadata Extractor Script can be used to read in exif metadata

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation, NeurIPS 2021 Spotlight

Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution

A Decentralized Omnidirectional Visual-Inertial-UWB State Estimation System for Aerial Swar.

The implementation code for "DAGAN: Deep De-Aliasing Generative Adversarial Networks for Fast Compressed Sensing MRI Reconstruction"

Gluon CV Toolkit

GAT - Graph Attention Network (PyTorch) 💻 + graphs + 📣 = ❤️

SOLOv2 on onnx & tensorRT

Customer-Transaction-Analysis - This analysis is based on a synthesised transaction dataset containing 3 months worth of transactions for 100 hypothetical customers.

ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation

Analysis code and Latex source of the manuscript describing the conditional permutation test of confounding bias in predictive modelling.

A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.

Official PyTorch implementation of the Fishr regularization for out-of-distribution generalization

Hyperopt for solving CIFAR-100 with a convolutional neural network (CNN) built with Keras and TensorFlow, GPU backend

A knowledge base construction engine for richly formatted data

PyTorch 1.0 inference in C++ on Windows10 platforms

Official Pytorch implementation of 6DRepNet: 6D Rotation representation for unconstrained head pose estimation.

Semi-Supervised Learning, Object Detection, ICCV2021

Supervised multi-SNE (S-multi-SNE): Multi-view visualisation and classification

Api for getting bin info and getting encrypted card details for adyen.