Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition

Last update: Dec 29, 2022

Overview

Wav2Vec2 STT Python

Beta Software

Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition.

Requirements:

Python 3.7+
Platform: Linux x64 (Windows is a work in progress; MacOS may work; PRs welcome)
Python package requirements: cffi, numpy
Wav2Vec2 2.0 Model (must be converted to compatible format)
- Several are available ready-to-go on this project's releases page and below.
- You can convert your own models by following the instructions here.

Models:

Model	Download Size
Facebook Wav2Vec2 2.0 Base (960h)	360 MB
Facebook Wav2Vec2 2.0 Large (960h)	1.18 GB
Facebook Wav2Vec2 2.0 Large LV60 (960h)	1.18 GB
Facebook Wav2Vec2 2.0 Large LV60 Self (960h)	1.18 GB

Usage

from wav2vec2_stt import Wav2Vec2STT
decoder = Wav2Vec2STT('model_dir')

import wave
wav_file = wave.open('tests/test.wav', 'rb')
wav_samples = wav_file.readframes(wav_file.getnframes())

assert decoder.decode(wav_samples).strip().lower() == 'it depends on the context'

Also contains a simple CLI interface for recognizing wav files:

$ python -m wav2vec2_stt decode model test.wav
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt decode model test.wav test.wav
IT DEPENDS ON THE CONTEXT
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt -h
usage: python -m wav2vec2_stt [-h] {decode} ...

positional arguments:
  {decode}    sub-command
    decode    decode one or more WAV files

optional arguments:
  -h, --help  show this help message and exit

Installation/Building

Recommended installation via wheel from pip (requires a recent version of pip):

python -m pip install wav2vec2_stt

See setup.py for more details on building it yourself.

Author

David Zurow (@daanzu)

License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE file for details. If this license is problematic for you, please contact me.

Acknowledgments

Contains and uses code from PyTorch and torchaudio, licensed under the BSD 2-Clause License.

Comments

provide API for returning output from intermediate layers

It would be very helpful to have an API for returning output from intermediate layers, for example, the one before the final layers. This output can be used in other speech tasks other than speech recognition.

opened by zhouyong64 1

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

13.6k Jan 5, 2023

Simple telegram bot to convert files into direct download link.you can use telegram as a file server 🪁

TGCLOUD 🪁 Simple telegram bot to convert files into direct download link.you can use telegram as a file server 🪁 Features Easy to Deploy Heroku Supp

6 Oct 18, 2022

Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Depenencies

PyStanfordDependencies Python interface for converting Penn Treebank trees to Universal Dependencies and Stanford Dependencies. Example usage Start by

64 May 8, 2022

Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.

Lightning ASR Modular and extensible speech recognition library leveraging pytorch-lightning and hydra What is Lightning ASR • Installation • Get Star

40 Sep 19, 2022

This repository details the steps in creating a Part of Speech tagger using Trigram Hidden Markov Models and the Viterbi Algorithm without using external libraries.

POS-Tagger This repository details the creation of a Part-of-Speech tagger using Trigram Hidden Markov Models to predict word tags in a word sequence.

1 Dec 9, 2021

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

⚠️ Checkout develop branch to see what is coming in pyannote.audio 2.0: a much smaller and cleaner codebase Python-first API (the good old pyannote-au

2.2k Jan 9, 2023

PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

1k Dec 30, 2022

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

29 Oct 16, 2022

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

GenSen Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning Sandeep Subramanian, Adam Trischler, Yoshua B

309 Oct 19, 2022

Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition

Related tags

Overview

Wav2Vec2 STT Python

Usage

Installation/Building

Author

License

Acknowledgments

You might also like...

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

Simple telegram bot to convert files into direct download link.you can use telegram as a file server 🪁

Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Depenencies

Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.

This repository details the steps in creating a Part of Speech tagger using Trigram Hidden Markov Models and the Viterbi Algorithm without using external libraries.

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

Comments

provide API for returning output from intermediate layers

Releases(v0.2.0)

v0.2.0(Aug 16, 2021)

models(Aug 2, 2021)

Owner

David Zurow

Malware-Related Sentence Classification

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Using BERT-based models for toxic span detection

💫 Industrial-strength Natural Language Processing (NLP) in Python

Stuff related to Ben Eater's 8bit breadboard computer

Neural network models for joint POS tagging and dependency parsing (CoNLL 2017-2018)

Codename generator using WordNet parts of speech database

The swas programming language

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.

In this Notebook I've build some machine-learning and deep-learning to classify corona virus tweets, in both multi class classification and binary classification.

Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph",

Analyse japanese ebooks using MeCab to determine the difficulty level for japanese learners

justCTF [*] 2020 challenges sources

Final Project Bootcamp Zero

AI-powered literature discovery and review engine for medical/scientific papers

小布助手对话短文本语义匹配的一个baseline

Statistics and Mathematics for Machine Learning, Deep Learning , Deep NLP

Use Tensorflow2.7.0 Build OpenAI'GPT-2

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

NumPy String-Indexed is a NumPy extension that allows arrays to be indexed using descriptive string labels