Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

CTC Decoding Algorithms

Update 2021: installable Python package

Python implementation of some common Connectionist Temporal Classification (CTC) decoding algorithms. A minimalistic language model is provided.

Installation

  • Go to the root level of the repository
  • Execute pip install .
  • Go to tests/ and execute pytest to check that the installation worked

Usage

Basic usage

Here is a minimalistic executable example:

import numpy as np
from ctc_decoder import best_path, beam_search

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
chars = 'ab'

print(f'Best path: "{best_path(mat, chars)}"')
print(f'Beam search: "{beam_search(mat, chars)}"')

The output mat of the CTC-trained neural network (a numpy array with softmax already applied) is expected to have shape TxC and is passed as the first argument to the decoders. T is the number of time-steps and C the number of characters, where the CTC-blank is the last element. The characters that the neural network can predict are passed as the chars string to the decoder. The decoders return the decoded string.
Running the code outputs:

Best path: ""
Beam search: "a"
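
The two decoders disagree on this input: best path takes the most probable character at each time-step, which is the blank (probability 0.6) both times, and therefore collapses to the empty string. Beam search instead sums the probabilities of all alignments that map to the same text: the paths "aa", "a-" and "-a" together give 0.4·0.4 + 0.4·0.6 + 0.6·0.4 = 0.64 for "a", beating the 0.36 of "". The following is a minimal sketch of the idea behind best path decoding, not the package's implementation:

import numpy as np

def naive_best_path(mat: np.ndarray, chars: str) -> str:
    """Take the argmax at each time-step, collapse repeats, then remove blanks."""
    blank_idx = len(chars)  # the CTC-blank is the last element
    best = np.argmax(mat, axis=1)  # most probable index per time-step
    out = []
    prev = -1
    for idx in best:
        if idx != prev and idx != blank_idx:
            out.append(chars[idx])
        prev = idx
    return ''.join(out)

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
print(naive_best_path(mat, 'ab'))  # -> "" (the blank wins at both time-steps)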

For more examples of how to use the decoders, have a look at the scripts in the tests/ folder.

Language model and BK-tree

Beam search can optionally integrate a character-level language model. Text statistics (bigrams) estimated from a training text are used by beam search to score the candidate texts, which improves the accuracy of the recognized text.

from ctc_decoder import beam_search, LanguageModel

# create language model instance from a (large) text
lm = LanguageModel('this is some text', chars)

# and use it in the beam search decoder
res = beam_search(mat, chars, lm=lm)
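
The language model is based on character bigram statistics estimated from the training text. As a rough illustration of the underlying idea (this is not the package's LanguageModel class):

from collections import defaultdict

def bigram_probs(text: str) -> dict:
    """Estimate P(second char | first char) from character-pair counts in the text."""
    pair_counts = defaultdict(int)
    first_counts = defaultdict(int)
    for a, b in zip(text, text[1:]):
        pair_counts[(a, b)] += 1
        first_counts[a] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

probs = bigram_probs('this is some text')
print(probs[('i', 's')])  # -> 1.0, since 'i' is always followed by 's' in this text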

The lexicon search decoder computes a first approximation with best path decoding. Then it uses a BK-tree to retrieve similar dictionary words, scores them, and returns the best-scoring word. The BK-tree is created from a list of dictionary words. A tolerance parameter defines the maximum edit distance between the query word and the returned dictionary words.

from ctc_decoder import lexicon_search, BKTree

# create BK-tree from a list of words
bk_tree = BKTree(['words', 'from', 'a', 'dictionary'])

# and use the tree in the lexicon search
res = lexicon_search(mat, chars, bk_tree, tolerance=2)
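
The BK-tree exploits the triangle inequality of the edit distance to prune the search for similar words. The following is a simplified stand-in for the package's BKTree, assuming a non-empty word list:

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed over a single dynamic-programming row."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            prev, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, prev + (ca != cb))
    return row[-1]

class NaiveBKTree:
    """Each edge of the tree is labeled with the edit distance between parent and child."""

    def __init__(self, words):
        words = iter(words)
        self.root = [next(words), {}]  # assumes a non-empty word list
        for word in words:
            node = self.root
            while True:
                d = edit_distance(word, node[0])
                if d in node[1]:
                    node = node[1][d]
                else:
                    node[1][d] = [word, {}]
                    break

    def query(self, word, tolerance):
        """Return all stored words within the given edit distance of the query word."""
        results, stack = [], [self.root]
        while stack:
            stored, children = stack.pop()
            d = edit_distance(word, stored)
            if d <= tolerance:
                results.append(stored)
            # triangle inequality: only edges labeled within [d - tolerance, d + tolerance] can match
            for label, child in children.items():
                if d - tolerance <= label <= d + tolerance:
                    stack.append(child)
        return results

tree = NaiveBKTree(['words', 'from', 'a', 'dictionary'])
print(tree.query('worts', tolerance=2))  # -> ['words']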

Usage with deep learning frameworks

Some notes:

  • No adapter for TensorFlow or PyTorch is provided
  • Apply the softmax already in the model
  • Convert the output to a numpy array
  • Usually, the output of an RNN layer rnn_output has shape TxBxC, with B the batch dimension
    • The decoders work on single batch elements of shape TxC
    • Therefore, iterate over all batch elements and apply the decoder to each of them separately
    • Example: extract the matrix of batch element 0 via mat = rnn_output[:, 0, :]
  • The CTC-blank is expected to be the last element along the character dimension
    • TensorFlow puts the CTC-blank last, so nothing needs to be done here
    • PyTorch, however, puts the CTC-blank first by default, so you have to move it to the end or change the default setting (see the sketch after this list)
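
A minimal sketch of these steps for PyTorch, with random numbers standing in for a real model output and an example character set:

import torch
from ctc_decoder import best_path

chars = 'ab'                         # example character set
T, B, C = 100, 16, len(chars) + 1    # time-steps, batch size, characters incl. CTC-blank

logits = torch.randn(T, B, C)            # stand-in for the raw output of the RNN layer
probs = torch.softmax(logits, dim=2)     # apply the softmax already in the model

# PyTorch puts the CTC-blank at index 0 by default -> move it to the last position
probs = torch.roll(probs, shifts=-1, dims=2)

rnn_output = probs.detach().cpu().numpy()  # convert to a numpy array of shape TxBxC

# decode each batch element of shape TxC separately
texts = [best_path(rnn_output[:, b, :], chars) for b in range(B)]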

List of provided decoders

Recommended decoders:

  • best_path: best path (or greedy) decoder; the fastest of all algorithms, but the other decoders often perform better
  • beam_search: beam search decoder, optionally integrates a character-level language model and can be tuned via the beam width parameter (see the example after this list)
  • lexicon_search: lexicon search decoder, returns the best scoring word from a dictionary
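
For example, assuming the beam width is exposed as a beam_width keyword argument (the exact name and default value may differ):

import numpy as np
from ctc_decoder import beam_search

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
# a larger beam width keeps more text hypotheses per time-step, trading speed for accuracy
res = beam_search(mat, 'ab', beam_width=25)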

Other decoders that are, in my experience, not really suited for practical use, but which might be interesting for experiments or research:

  • prefix_search: prefix search decoder
  • token_passing: token passing algorithm
  • Best path decoder implementation in OpenCL (see extras/ folder)

This paper gives suggestions on when to use best path decoding, beam search decoding and token passing.

Documentation of the test cases and data can be found in the tests/ folder.
