Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

Overview

CTC Decoding Algorithms

Update 2021: installable Python package

Python implementation of some common Connectionist Temporal Classification (CTC) decoding algorithms. A minimalistic language model is provided.

Installation

  • Go to the root level of the repository
  • Execute pip install .
  • Go to tests/ and execute pytest to check if installation worked

Usage

Basic usage

Here is a minimalistic executable example:

import numpy as np
from ctc_decoder import best_path, beam_search

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
chars = 'ab'

print(f'Best path: "{best_path(mat, chars)}"')
print(f'Beam search: "{beam_search(mat, chars)}"')

The output mat (numpy array, softmax already applied) of the CTC-trained neural network is expected to have shape TxC and is passed as the first argument to the decoders. T is the number of time-steps, and C the number of characters (the CTC-blank is the last element). The characters that can be predicted by the neural network are passed as the chars string to the decoder. Decoders return the decoded string.
Running the code outputs:

Best path: ""
Beam search: "a"

To see more examples on how to use the decoders, please have a look at the scripts in the tests/ folder.

Language model and BK-tree

Beam search can optionally integrate a character-level language model. Text statistics (bigrams) are used by beam search to improve reading accuracy.

from ctc_decoder import beam_search, LanguageModel

# create language model instance from a (large) text
lm = LanguageModel('this is some text', chars)

# and use it in the beam search decoder
res = beam_search(mat, chars, lm=lm)

The lexicon search decoder computes a first approximation with best path decoding. Then, it uses a BK-tree to retrieve similar words, scores them and finally returns the best scoring word. The BK-tree is created by providing a list of dictionary words. A tolerance parameter defines the maximum edit distance from the query word to the returned dictionary words.

from ctc_decoder import lexicon_search, BKTree

# create BK-tree from a list of words
bk_tree = BKTree(['words', 'from', 'a', 'dictionary'])

# and use the tree in the lexicon search
res = lexicon_search(mat, chars, bk_tree, tolerance=2)

Usage with deep learning frameworks

Some notes:

  • No adapter for TensorFlow or PyTorch is provided
  • Apply softmax already in the model
  • Convert to numpy array
  • Usually, the output of an RNN layer rnn_output has shape TxBxC, with B the batch dimension
    • Decoders work on single batch elements of shape TxC
    • Therefore, iterate over all batch elements and apply the decoder to each of them separately
    • Example: extract matrix of batch element 0 mat = rnn_output[:, 0, :]
  • The CTC-blank is expected to be the last element along the character dimension
    • TensorFlow has the CTC-blank as last element, so nothing to do here
    • PyTorch, however, has the CTC-blank as first element by default, so you have to move it to the end, or change the default setting

List of provided decoders

Recommended decoders:

  • best_path: best path (or greedy) decoder, the fastest of all algorithms, however, other decoders often perform better
  • beam_search: beam search decoder, optionally integrates a character-level language model, can be tuned via the beam width parameter
  • lexicon_search: lexicon search decoder, returns the best scoring word from a dictionary

Other decoders, from my experience not really suited for practical purposes, but might be used for experiments or research:

  • prefix_search: prefix search decoder
  • token_passing: token passing algorithm
  • Best path decoder implementation in OpenCL (see extras/ folder)

This paper gives suggestions when to use best path decoding, beam search decoding and token passing.

Documentation of test cases and data

References

Owner
Harald Scheidl
Interested in computer vision, deep learning, C++ and Python.
Harald Scheidl
official ( API ) for the zAmericanEnglish app in [ Google play ] and [ App store ]

official ( API ) for the zAmericanEnglish app in [ Google play ] and [ App store ]

Plugin 3 Jan 12, 2022
End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.0.1 1.1.0 1.2.0 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 ubuntu18/python3.8/pip ubuntu18

ESPnet 5.9k Jan 03, 2023
PRAnCER is a web platform that enables the rapid annotation of medical terms within clinical notes.

PRAnCER (Platform enabling Rapid Annotation for Clinical Entity Recognition) is a web platform that enables the rapid annotation of medical terms within clinical notes. A user can highlight spans of

Sontag Lab 39 Nov 14, 2022
Visual Automata is a Python 3 library built as a wrapper for Caleb Evans' Automata library to add more visualization features.

Visual Automata Copyright 2021 Lewi Lie Uberg Released under the MIT license Visual Automata is a Python 3 library built as a wrapper for Caleb Evans'

Lewi Uberg 55 Nov 17, 2022
DeBERTa: Decoding-enhanced BERT with Disentangled Attention

DeBERTa: Decoding-enhanced BERT with Disentangled Attention This repository is the official implementation of DeBERTa: Decoding-enhanced BERT with Dis

Microsoft 1.2k Jan 03, 2023
A high-level yet extensible library for fast language model tuning via automatic prompt search

ruPrompts ruPrompts is a high-level yet extensible library for fast language model tuning via automatic prompt search, featuring integration with Hugg

Sber AI 37 Dec 07, 2022
A flask application to predict the speech emotion of any .wav file.

This is a speech emotion recognition app. It will allow you to train a modular MLP model with the RAVDESS dataset, and then use that model with a flask application to predict the speech emotion of an

Aryan Vijaywargia 2 Dec 15, 2021
AIDynamicTextReader - A simple dynamic text reader based on Artificial intelligence

AI Dynamic Text Reader: This is a simple dynamic text reader based on Artificial

Md. Rakibul Islam 1 Jan 18, 2022
Python library to make development of portfolio analysis faster and easier

Trafalgar Python library to make development of portfolio analysis faster and easier Installation 🔥 For the moment, Trafalgar is still in beta develo

Santosh Passoubady 641 Jan 01, 2023
A single model that parses Universal Dependencies across 75 languages.

A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology tags, lemmas, and dependency trees.

Dan Kondratyuk 189 Nov 29, 2022
Constituency Tree Labeling Tool

Constituency Tree Labeling Tool The purpose of this package is to solve the constituency tree labeling problem. Look from the dataset labeled by NLTK,

张宇 6 Dec 20, 2022
Code for lyric-section-to-comment generation based on huggingface transformers.

CommentGeneration Code for lyric-section-to-comment generation based on huggingface transformers. Migrate Guyu model and code (both 12-layers and 24-l

Yawei Sun 8 Sep 04, 2021
DeepSpeech - Easy-to-use Speech Toolkit including SOTA ASR pipeline, influential TTS with text frontend and End-to-End Speech Simultaneous Translation.

(简体中文|English) Quick Start | Documents | Models List PaddleSpeech is an open-source toolkit on PaddlePaddle platform for a variety of critical tasks i

5.6k Jan 03, 2023
Mycroft Core, the Mycroft Artificial Intelligence platform.

Mycroft Mycroft is a hackable open source voice assistant. Table of Contents Getting Started Running Mycroft Using Mycroft Home Device and Account Man

Mycroft 6.1k Jan 09, 2023
Persian Bert For Long-Range Sequences

ParsBigBird: Persian Bert For Long-Range Sequences The Bert and ParsBert algorithms can handle texts with token lengths of up to 512, however, many ta

Sajjad Ayoubi 63 Dec 14, 2022
Ask for weather information like a human

weather-nlp About Ask for weather information like a human. Goals Understand typical questions like: Hourly temperatures in Potsdam on 2020-09-15. Rai

5 Oct 29, 2022
Large-scale Knowledge Graph Construction with Prompting

Large-scale Knowledge Graph Construction with Prompting across tasks (predictive and generative), and modalities (language, image, vision + language, etc.)

ZJUNLP 161 Dec 28, 2022
ACL'2021: Learning Dense Representations of Phrases at Scale

DensePhrases DensePhrases is an extractive phrase search tool based on your natural language inputs. From 5 million Wikipedia articles, it can search

Princeton Natural Language Processing 540 Dec 30, 2022
NL. The natural language programming language.

NL A Natural-Language programming language. Built using Codex. A few examples are inside the nl_projects directory. How it works Write any code in pur

2 Jan 17, 2022
This is a simple item2vec implementation using gensim for recbole

recbole-item2vec-model This is a simple item2vec implementation using gensim for recbole( https://recbole.io ) Usage When you want to run experiment f

Yusuke Fukasawa 2 Oct 06, 2022