Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

Overview

CTC Decoding Algorithms

Update 2021: installable Python package

Python implementation of some common Connectionist Temporal Classification (CTC) decoding algorithms. A minimalistic language model is provided.

Installation

  • Go to the root level of the repository
  • Execute pip install .
  • Go to tests/ and execute pytest to check if installation worked

Usage

Basic usage

Here is a minimalistic executable example:

import numpy as np
from ctc_decoder import best_path, beam_search

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
chars = 'ab'

print(f'Best path: "{best_path(mat, chars)}"')
print(f'Beam search: "{beam_search(mat, chars)}"')

The output mat (numpy array, softmax already applied) of the CTC-trained neural network is expected to have shape TxC and is passed as the first argument to the decoders. T is the number of time-steps, and C the number of characters (the CTC-blank is the last element). The characters that can be predicted by the neural network are passed as the chars string to the decoder. Decoders return the decoded string.
Running the code outputs:

Best path: ""
Beam search: "a"

To see more examples on how to use the decoders, please have a look at the scripts in the tests/ folder.

Language model and BK-tree

Beam search can optionally integrate a character-level language model. Text statistics (bigrams) are used by beam search to improve reading accuracy.

from ctc_decoder import beam_search, LanguageModel

# create language model instance from a (large) text
lm = LanguageModel('this is some text', chars)

# and use it in the beam search decoder
res = beam_search(mat, chars, lm=lm)

The lexicon search decoder computes a first approximation with best path decoding. Then, it uses a BK-tree to retrieve similar words, scores them and finally returns the best scoring word. The BK-tree is created by providing a list of dictionary words. A tolerance parameter defines the maximum edit distance from the query word to the returned dictionary words.

from ctc_decoder import lexicon_search, BKTree

# create BK-tree from a list of words
bk_tree = BKTree(['words', 'from', 'a', 'dictionary'])

# and use the tree in the lexicon search
res = lexicon_search(mat, chars, bk_tree, tolerance=2)

Usage with deep learning frameworks

Some notes:

  • No adapter for TensorFlow or PyTorch is provided
  • Apply softmax already in the model
  • Convert to numpy array
  • Usually, the output of an RNN layer rnn_output has shape TxBxC, with B the batch dimension
    • Decoders work on single batch elements of shape TxC
    • Therefore, iterate over all batch elements and apply the decoder to each of them separately
    • Example: extract matrix of batch element 0 mat = rnn_output[:, 0, :]
  • The CTC-blank is expected to be the last element along the character dimension
    • TensorFlow has the CTC-blank as last element, so nothing to do here
    • PyTorch, however, has the CTC-blank as first element by default, so you have to move it to the end, or change the default setting

List of provided decoders

Recommended decoders:

  • best_path: best path (or greedy) decoder, the fastest of all algorithms, however, other decoders often perform better
  • beam_search: beam search decoder, optionally integrates a character-level language model, can be tuned via the beam width parameter
  • lexicon_search: lexicon search decoder, returns the best scoring word from a dictionary

Other decoders, from my experience not really suited for practical purposes, but might be used for experiments or research:

  • prefix_search: prefix search decoder
  • token_passing: token passing algorithm
  • Best path decoder implementation in OpenCL (see extras/ folder)

This paper gives suggestions when to use best path decoding, beam search decoding and token passing.

Documentation of test cases and data

References

Owner
Harald Scheidl
Interested in computer vision, deep learning, C++ and Python.
Harald Scheidl
Code for PED: DETR For (Crowd) Pedestrian Detection

Code for PED: DETR For (Crowd) Pedestrian Detection

36 Sep 13, 2022
TTS is a library for advanced Text-to-Speech generation.

TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed and quality. TTS comes with pretra

Mozilla 6.5k Jan 08, 2023
Control the classic General Instrument SP0256-AL2 speech chip and AY-3-8910 sound generator with a Raspberry Pi and this Python library.

GI-Pi Control the classic General Instrument SP0256-AL2 speech chip and AY-3-8910 sound generator with a Raspberry Pi and this Python library. The SP0

Nick Bild 8 Dec 15, 2021
Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch

N-Grammer - Pytorch Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch Install $ pip install n-grammer-pytorch Usage

Phil Wang 66 Dec 29, 2022
texlive expressions for documents

tex2nix Generate Texlive environment containing all dependencies for your document rather than downloading gigabytes of texlive packages. Installation

Jörg Thalheim 70 Dec 26, 2022
justCTF [*] 2020 challenges sources

justCTF [*] 2020 This repo contains sources for justCTF [*] 2020 challenges hosted by justCatTheFish. TLDR: Run a challenge with ./run.sh (requires Do

justCatTheFish 25 Dec 27, 2022
Maix Speech AI lib, including ASR, chat, TTS etc.

Maix-Speech 中文 | English Brief Now only support Chinese, See 中文 Build Clone code by: git clone https://github.com/sipeed/Maix-Speech Compile x86x64 c

Sipeed 267 Dec 25, 2022
Code Generation using a large neural network called GPT-J

CodeGenX is a Code Generation system powered by Artificial Intelligence! It is delivered to you in the form of a Visual Studio Code Extension and is Free and Open-source!

DeepGenX 389 Dec 31, 2022
Code for CVPR 2021 paper: Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning This is the PyTorch companion code for the paper: A

Amazon 69 Jan 03, 2023
The official repository of the ISBI 2022 KNIGHT Challenge

KNIGHT The official repository holding the data for the ISBI 2022 KNIGHT Challenge About The KNIGHT Challenge asks teams to develop models to classify

Nicholas Heller 4 Jan 22, 2022
Contains the code and data for our #ICSE2022 paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences"

CodeFill This repository contains the code for our paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Namin

Software Analytics Lab 11 Oct 31, 2022
Unsupervised text tokenizer for Neural Network-based text generation.

SentencePiece SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabu

Google 6.4k Jan 01, 2023
"Investigating the Limitations of Transformers with Simple Arithmetic Tasks", 2021

transformers-arithmetic This repository contains the code to reproduce the experiments from the paper: Nogueira, Jiang, Lin "Investigating the Limitat

Castorini 33 Nov 16, 2022
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

GPT Neo 🎉 1T or bust my dudes 🎉 An implementation of model & data parallel GPT3-like models using the mesh-tensorflow library. If you're just here t

EleutherAI 6.7k Dec 28, 2022
Python module (C extension and plain python) implementing Aho-Corasick algorithm

pyahocorasick pyahocorasick is a fast and memory efficient library for exact or approximate multi-pattern string search meaning that you can find mult

Wojciech Muła 763 Dec 27, 2022
SimpleChinese2 集成了许多基本的中文NLP功能,使基于 Python 的中文文字处理和信息提取变得简单方便。

SimpleChinese2 SimpleChinese2 集成了许多基本的中文NLP功能,使基于 Python 的中文文字处理和信息提取变得简单方便。 声明 本项目是为方便个人工作所创建的,仅有部分代码原创。

Ming 30 Dec 02, 2022
A fast and lightweight python-based CTC beam search decoder for speech recognition.

pyctcdecode A fast and feature-rich CTC beam search decoder for speech recognition written in Python, providing n-gram (kenlm) language model support

Kensho 315 Dec 21, 2022
Sentence Embeddings with BERT & XLNet

Sentence Transformers: Multilingual Sentence Embeddings using BERT / RoBERTa / XLM-RoBERTa & Co. with PyTorch This framework provides an easy method t

Ubiquitous Knowledge Processing Lab 9.1k Jan 02, 2023
This project converts your human voice input to its text transcript and to an automated voice too.

Human Voice to Automated Voice & Text Introduction: In this project, whenever you'll speak, it will turn your voice into a robot voice and furthermore

Hassan Shahzad 3 Oct 15, 2021
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

A Deep Learning NLP/NLU library by Intel® AI Lab Overview | Models | Installation | Examples | Documentation | Tutorials | Contributing NLP Architect

Intel Labs 2.9k Dec 31, 2022