Vad-sli-asr - Python scripts for a speech processing pipeline with Voice Activity Detection (VAD)

Last update: Dec 09, 2022

Overview

VAD-SLI-ASR

Python scripts for a speech processing pipeline with Voice Activity Detection (VAD), Spoken Language Identification (SLI), and Automatic Speech Recognition (ASR). In our use case, VAD detects the time regions in a language documentation recording where someone is speaking, SLI classifies each region as either English (eng) or Muruwari (zmu), and an English ASR model transcribes the regions detected as English. The pipeline outputs an ELAN .eaf file with three tiers: _vad, _sli, and _asr.
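The flow described above can be sketched in a few lines of Python. This is an illustration only, not the code in this repository: the Region class, run_pipeline, to_tiers, and the classify and transcribe callables are hypothetical names, and the real scripts work on audio files and write an .eaf file rather than returning dictionaries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Region:
    """One speech region found by VAD (hypothetical container)."""
    start_ms: int
    end_ms: int
    lang: Optional[str] = None  # filled in by SLI: "eng" or "zmu"
    text: Optional[str] = None  # filled in by ASR, English regions only


def run_pipeline(
    speech_spans: List[Tuple[int, int]],
    classify: Callable[[int, int], str],
    transcribe: Callable[[int, int], str],
) -> List[Region]:
    """Label each VAD span with a language, then transcribe English spans."""
    regions = []
    for start_ms, end_ms in speech_spans:
        region = Region(start_ms, end_ms)
        region.lang = classify(start_ms, end_ms)
        if region.lang == "eng":
            # Only regions detected as English go to the English ASR model.
            region.text = transcribe(start_ms, end_ms)
        regions.append(region)
    return regions


def to_tiers(regions: List[Region]) -> Dict[str, List[Tuple[int, int, str]]]:
    """Group results into the three tier names used in the output file."""
    tiers: Dict[str, List[Tuple[int, int, str]]] = {"_vad": [], "_sli": [], "_asr": []}
    for r in regions:
        tiers["_vad"].append((r.start_ms, r.end_ms, ""))
        if r.lang is not None:
            tiers["_sli"].append((r.start_ms, r.end_ms, r.lang))
        if r.text is not None:
            tiers["_asr"].append((r.start_ms, r.end_ms, r.text))
    return tiers
```

Each tier entry is a (start, end, value) triple in milliseconds, so every _sli and _asr annotation lines up with the _vad region it was derived from.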

Set up

pip install -r requirements.txt

Data

├── data
│   ├── sli-train      <- Training data for SLI (one folder per language)
│   │   ├── eng/       <- .wav files (English utterances)
│   │   ├── zmu/       <- .wav files (Muruwari utterances)
│   ├── asr-train      <- Training data for ASR (transcriptions plus audio)
│   │   ├── eng.tsv    <- transcriptions
│   │   ├── eng/       <- .wav files (English utterances)
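Before training the SLI classifier, it can help to confirm that the training folder follows the one-folder-per-language layout shown above. The helper below is a minimal sketch under that assumption; list_sli_clips is a hypothetical name and is not part of this repository.

```python
from pathlib import Path
from typing import Dict, List


def list_sli_clips(train_dir: str) -> Dict[str, List[Path]]:
    """Map each language folder name (e.g. "eng", "zmu") to its .wav files."""
    root = Path(train_dir)
    clips: Dict[str, List[Path]] = {}
    for lang_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        wavs = sorted(lang_dir.glob("*.wav"))
        if wavs:
            # Folders without any .wav files are skipped.
            clips[lang_dir.name] = wavs
    return clips


if __name__ == "__main__":
    # Print a clip count per language for the layout described above.
    train_root = Path("data/sli-train")
    if train_root.is_dir():
        for lang, wavs in list_sli_clips(str(train_root)).items():
            print(f"{lang}: {len(wavs)} clips")
```

A large imbalance between the per-language counts is worth noticing before training, since the classifier only sees the clips placed in these folders.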

Usage

VAD

# Detect speech regions in a recording using Silero VAD
python scripts/run_vad-by-silero.py myrecording.wav

SLI

# To train a classifier using your own clips and then save it:
python scripts/train_sli-by-sblr.py data/sli-train models/zmu-eng_sli_k10.pkl

# Use trained model to classify VAD-detected regions as eng or zmu
python scripts/run_sli-by-sblr.py models/zmu-eng_sli_k10.pkl myrecording.wav

ASR

# To fine-tune a wav2vec 2.0 model and save the checkpoint:
python scripts/train_asr-by-w2v2.py data/asr-train data/checkpoints/no-lm_b10

# Transcribe a recording using the fine-tuned model
python scripts/run_asr-by-w2v2.py data/checkpoints/no-lm_b10 myrecording.wav
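When processing several recordings, the three run_ commands above can be assembled programmatically. The sketch below only builds the argument lists, using the script names and argument order shown above. The pipeline_commands helper is a hypothetical name, and running the steps in this order on the same recording is an assumption based on the pipeline description, not something these usage notes state.

```python
import shlex
from typing import List


def pipeline_commands(
    recording: str, sli_model: str, asr_checkpoint: str
) -> List[List[str]]:
    """Return the VAD, SLI, and ASR commands for one recording, in order."""
    return [
        ["python", "scripts/run_vad-by-silero.py", recording],
        ["python", "scripts/run_sli-by-sblr.py", sli_model, recording],
        ["python", "scripts/run_asr-by-w2v2.py", asr_checkpoint, recording],
    ]


if __name__ == "__main__":
    commands = pipeline_commands(
        "myrecording.wav",
        "models/zmu-eng_sli_k10.pkl",
        "data/checkpoints/no-lm_b10",
    )
    for cmd in commands:
        # Print each command as a shell-safe string for inspection.
        print(shlex.join(cmd))
```

To actually execute a step, each list can be passed to subprocess.run(cmd, check=True) from the repository root.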

Releases

1.1.0 (Apr 23, 2022): Switched to using pre-existing vocabulary from pre-trained model (see Appendix A in paper).
1.0.0 (Apr 18, 2022)
0.9.0 (Apr 14, 2022): Pre-release to check Zenodo sync.

Owner

Dynamics of Language
