Asr abc - Automatic speech recognition(ASR),中文语音识别

Overview

语音识别的简单示例,主要在课堂演示使用

创建python虚拟环境

在linux 和macos 上验证通过

# 如果已经有pyhon3.6 环境,跳过该步骤,使用现有环境也可以
virtualenv ~/env/asr_abc --python=python3.8
. ~/env/asr_abc/bin/activate

安装本项目

python setup.py install
or 
pip install .

识别wav音频

Note: 输入音频采样率必须是16k,如果待识别音频不是16k,可以采用以下命令重采样为16k.

ffmpeg -i ks1_48k.acc -ar 16000 ks1_16k.wav
python decode.py
# 或debug 模式
python decode.py -d

#预期输出:
2021-12-24 17:08:31,736 INFO [decode.py:91] All files seem exist.
2021-12-24 17:08:31,736 INFO [decode.py:96] Start loading model.
2021-12-24 17:08:41,321 INFO [decode.py:113] Start loading dict.
2021-12-24 17:08:41,336 INFO [decode.py:119] Start recognize data/wavs/BAC009S0764W0143.wav.
2021-12-24 17:08:41,527 INFO [decode.py:134] Result: 在市场整体从高速增长进入中高速增长区间的同时
2021-12-24 17:08:41,527 INFO [decode.py:135] done.

# 也可以指定输入音频
python decode.py --input-wav=data/wavs/ks1_16k.wav
或者
python decode.py -i=data/wavs/ks1_16k.wav  # ks is short for "Kantanzhe Song"
# 预期输出:
2021-12-27 19:16:02,911 INFO [decode.py:91] All files seem exist.
2021-12-27 19:16:02,911 INFO [decode.py:96] Start loading model.
2021-12-27 19:16:08,405 INFO [decode.py:113] Start loading dict.
2021-12-27 19:16:08,409 INFO [decode.py:119] Start recognize data/wavs/ks1_16k.wav.
2021-12-27 19:16:08,449 INFO [decode.py:137] Result: 我们有火焰般的热情
2021-12-27 19:16:08,450 INFO [decode.py:138] done.

相关项目链接:

https://github.com/k2-fsa/icefall
https://github.com/k2-fsa/k2
https://github.com/lhotse-speech/lhotse

手动模型下载

如果上述python decode.py 已识别出预期结果,说明模型已自动从下载源1成功下载模型,无需关注以下内容。

下载源1:(decode.py 代码会自动访问这个源下载): https://huggingface.co/GuoLiyong/cn_conformer_encoder_aishell/tree/main/data/lang_char

下载源2: 百度网盘

链接: https://pan.baidu.com/s/17tPOJM_Sm49q1kZrE3jfUQ
提取码: qa4p

对于访问下载源1有困难或者访问速度过慢的同学,可以手动从百度网盘下载. 下载完毕后按以下文件结构放置下载所得的"tokens.txt"和"conformer_encoder.pt"两个文件:

.
|-- README.md
|-- build
|   `-- bdist.linux-x86_64
|-- conformer.py
|-- data
|   |-- lang_char
|   |   |-- tokens.txt
|   `-- wavs
|       |-- BAC009S0764W0143.wav
|       |-- README.md
|       `-- transcript
|-- decode.py
|-- exp
|   `-- conformer_encoder.pt
|-- requirements.txt
|-- setup.py
`-- utils.py
Owner
LIyong.Guo
LIyong.Guo
AutoGluon: AutoML for Text, Image, and Tabular Data

AutoML for Text, Image, and Tabular Data AutoGluon automates machine learning tasks enabling you to easily achieve strong predictive performance in yo

Amazon Web Services - Labs 5.2k Dec 29, 2022
Mlcode - Continuous ML API Integrations

mlcode Basic APIs for ML applications. Django REST Application Contains REST API

Sujith S 1 Jan 01, 2022
SEJE is a prototype for the paper Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering.

SEJE is a prototype for the paper Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering. Contents Inst

0 Oct 21, 2021
Generate a cool README/About me page for your Github Profile

Github Profile README/ About Me Generator 💯 This webapp lets you build a cool README for your profile. A few inputs + ~15 mins = Your Github Profile

Rahul Banerjee 179 Jan 07, 2023
COVID-19 Related NLP Papers

COVID-19 outbreak has become a global pandemic. NLP researchers are fighting the epidemic in their own way.

xcfeng 28 Oct 30, 2022
Source code and dataset for ACL 2019 paper "ERNIE: Enhanced Language Representation with Informative Entities"

ERNIE Source code and dataset for "ERNIE: Enhanced Language Representation with Informative Entities" Reqirements: Pytorch=0.4.1 Python3 tqdm boto3 r

THUNLP 1.3k Dec 30, 2022
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

18 Nov 28, 2022
American Sign Language (ASL) to Text Converter

Signterpreter American Sign Language (ASL) to Text Converter Recommendations Although there is grayscale and gaussian blur, we recommend that you use

0 Feb 20, 2022
LSTC: Boosting Atomic Action Detection with Long-Short-Term Context

LSTC: Boosting Atomic Action Detection with Long-Short-Term Context This Repository contains the code on AVA of our ACM MM 2021 paper: LSTC: Boosting

Tencent YouTu Research 9 Oct 11, 2022
Official Pytorch implementation of Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision.

This repository is the official Pytorch implementation of Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision.

vanint 101 Dec 30, 2022
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Soohwan Kim 26 Dec 14, 2022
skweak: A software toolkit for weak supervision applied to NLP tasks

Labelled data remains a scarce resource in many practical NLP scenarios. This is especially the case when working with resource-poor languages (or text domains), or when using task-specific labels wi

Norsk Regnesentral (Norwegian Computing Center) 850 Dec 28, 2022
Blazing fast language detection using fastText model

Luga A blazing fast language detection using fastText's language models Luga is a Swahili word for language. fastText provides a blazing fast language

Prayson Wilfred Daniel 18 Dec 20, 2022
IndoBERTweet is the first large-scale pretrained model for Indonesian Twitter. Published at EMNLP 2021 (main conference)

IndoBERTweet 🐦 🇮🇩 1. Paper Fajri Koto, Jey Han Lau, and Timothy Baldwin. IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effe

IndoLEM 40 Nov 30, 2022
This is a really simple text-to-speech app made with python and tkinter.

Tkinter Text-to-Speech App by Souvik Roy This is a really simple tkinter app which converts the text you have entered into a speech. It is created wit

Souvik Roy 1 Dec 21, 2021
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

MMF is a modular framework for vision and language multimodal research from Facebook AI Research. MMF contains reference implementations of state-of-t

Facebook Research 5.1k Dec 26, 2022
Full Spectrum Bioinformatics - a free online text designed to introduce key topics in Bioinformatics using the Python

Full Spectrum Bioinformatics is a free online text designed to introduce key topics in Bioinformatics using the Python programming language. The text is written in interactive Jupyter Notebooks, whic

Jesse Zaneveld 33 Dec 28, 2022
A library for end-to-end learning of embedding index and retrieval model

Poeem Poeem is a library for efficient approximate nearest neighbor (ANN) search, which has been widely adopted in industrial recommendation, advertis

54 Dec 21, 2022
Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch

N-Grammer - Pytorch Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch Install $ pip install n-grammer-pytorch Usage

Phil Wang 66 Dec 29, 2022
Multi Task Vision and Language

12-in-1: Multi-Task Vision and Language Representation Learning Please cite the following if you use this code. Code and pre-trained models for 12-in-

Meta Research 711 Jan 08, 2023