PyKaldi GOP-DNN on Epa-DB

Last update: Dec 14, 2022

This repository contains the tools to run a PyKaldi-based goodness of pronunciation (GOP) DNN algorithm on Epa-DB, a database of non-native English speech by Spanish speakers from Argentina. It uses a PyTorch acoustic model based on Kaldi's TDNN-F acoustic model. A script is provided to convert the Kaldi model to PyTorch. The Kaldi model must be downloaded separately from the Kaldi website.

If you use this code or the Epa database, please cite the following paper:

J. Vidal, L. Ferrer, L. Brambilla, "EpaDB: a database for development of pronunciation assessment systems", Proc. Interspeech 2019, pp. 589-593.

@article{vidal2019epadb,
  title={EpaDB: a database for development of pronunciation assessment systems},
  author={Vidal, Jazmin and Ferrer, Luciana and Brambilla, Leonardo},
  journal={Proc. Interspeech 2019},
  pages={589--593},
  year={2019}
}

Introduction
Prerequisites
How to install
How to run
Notes on Kaldi-DNN-GOP
References

Introduction

This toolkit is meant to facilitate experimentation with Epa-DB by allowing users to run a state-of-the-art baseline system on it. Epa-DB is a database of non-native English speech by Argentinian speakers of Spanish. It is intended for research on mispronunciation detection and for the development of pronunciation assessment systems. The database includes recordings from 30 non-native speakers of English, 15 male and 15 female, whose first language (L1) is Spanish from Argentina (mainly the Rio de la Plata dialect). Each speaker recorded 64 short, phonetically balanced English phrases, designed so that together they contain all the sounds that are difficult for the target population to pronounce. All recordings were annotated at the phone level by expert raters.

For more information on the database, please refer to the documentation or the publication.

If you are only looking for the EpaDB corpus, you can download it from this link.

Prerequisites

Kaldi installed.
A TextGrid management library, installed using pip. Instructions at this link.
The EpaDB database downloaded. Alternative link.
The LibriSpeech ASR acoustic model from Kaldi.
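
Before installing, it can help to confirm that the command-line tools used in the steps of this README are reachable. The following is a minimal sketch, not part of the repository: the function name `missing_executables` is hypothetical, and it assumes that finding Kaldi's `nnet3-copy` binary on `PATH` is a reasonable sign that Kaldi is installed and set up in the current shell.

```python
import shutil


def missing_executables(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # These are the commands invoked in the installation steps of this README.
    # nnet3-copy ships with Kaldi; the others are standard command-line tools.
    required = ["nnet3-copy", "git", "wget", "tar", "pip"]
    missing = missing_executables(required)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required executables were found.")
```

If `nnet3-copy` is reported as missing even though Kaldi is built, the Kaldi binary directories are probably not on `PATH` in the current shell.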

How to install

To install this repository, follow these steps:

Clone this repository:

git clone https://github.com/MarceloSancinetti/epa-gop-pykaldi.git

Download the LibriSpeech ASR acoustic model from Kaldi and move or link it into the top directory of the repository:

wget https://kaldi-asr.org/models/13/0013_librispeech_v1_chain.tar.gz
tar -zxvf 0013_librispeech_v1_chain.tar.gz

Convert the acoustic model to text format:

nnet3-copy --binary=false exp/chain_cleaned/tdnn_1d_sp/final.mdl exp/chain_cleaned/tdnn_1d_sp/final.txt
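
To check that the conversion produced a text-format model, one option is to look at the first two bytes of each file: Kaldi writes a NUL byte followed by `B` at the start of files saved in binary mode, and text-mode files lack that marker. The sketch below is illustrative only; `is_kaldi_binary` is a hypothetical helper, and the paths assume the script is run from the top directory of the repository.

```python
from pathlib import Path


def is_kaldi_binary(path):
    """Return True if the file starts with Kaldi's binary-mode marker (NUL + 'B')."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\0B"


if __name__ == "__main__":
    # Assumed location, taken from the nnet3-copy command above.
    model_dir = Path("exp/chain_cleaned/tdnn_1d_sp")
    for name in ("final.mdl", "final.txt"):
        path = model_dir / name
        if path.is_file():
            kind = "binary" if is_kaldi_binary(path) else "text"
            print(f"{path}: {kind}")
        else:
            print(f"{path}: not found")
```

After a successful conversion, `final.txt` should be reported as text.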

Install the requirements:

pip install -r requirements.txt

Install PyKaldi:

Follow the instructions at https://github.com/pykaldi/pykaldi#installation

Convert the acoustic model to PyTorch:

python convert_chain_to_pytorch.py
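
As a quick sanity check after the steps above, the files they mention can be verified to exist relative to the repository root. This is a hedged sketch rather than a script shipped with the repository: `missing_files` and `EXPECTED_FILES` are hypothetical names, and the list only covers paths that appear in this README, since the output location of `convert_chain_to_pytorch.py` is not stated here.

```python
from pathlib import Path

# Paths named in the installation steps, assumed relative to the repository root.
EXPECTED_FILES = [
    "requirements.txt",
    "convert_chain_to_pytorch.py",
    "exp/chain_cleaned/tdnn_1d_sp/final.mdl",
    "exp/chain_cleaned/tdnn_1d_sp/final.txt",
]


def missing_files(repo_root, relative_paths=EXPECTED_FILES):
    """Return the relative paths that are not regular files under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present.")
```

Because `Path.is_file()` follows symbolic links, the check also passes when the model directory was linked into the repository instead of moved.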
