Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Overview

Handwritten Text Recognition with TensorFlow

  • Update 2021: more robust model, faster dataloader, word beam search decoder also available for Windows
  • Update 2020: code is compatible with TF2

Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words as shown in the illustration below. 3/4 of the words from the validation-set are correctly recognized, and the character error rate is around 10%.

htr

Run demo

Download the model trained on the IAM dataset. Put the contents of the downloaded file model.zip into the model directory of the repository. Afterwards, go to the src directory and run python main.py. The input image and the expected output is shown below.

test

> python main.py
Init with stored values from ../model/snapshot-39
Recognized: "Hello"
Probability: 0.42098119854927063

Command line arguments

  • --train: train the NN on 95% of the dataset samples and validate on the remaining 5%
  • --validate: validate the trained NN
  • --decoder: select from CTC decoders "bestpath", "beamsearch", and "wordbeamsearch". Defaults to "bestpath". For option "wordbeamsearch" see details below
  • --batch_size: batch size
  • --data_dir: directory containing IAM dataset (with subdirectories img and gt)
  • --fast: use LMDB to load images (faster than loading image files from disk)
  • --dump: dumps the output of the NN to CSV file(s) saved in the dump folder. Can be used as input for the CTCDecoder

If neither --train nor --validate is specified, the NN infers the text from the test image (data/test.png).

Integrate word beam search decoding

The word beam search decoder can be used instead of the two decoders shipped with TF. Words are constrained to those contained in a dictionary, but arbitrary non-word character strings (numbers, punctuation marks) can still be recognized. The following illustration shows a sample for which word beam search is able to recognize the correct text, while the other decoders fail.

decoder_comparison

Follow these instructions to integrate word beam search decoding:

  1. Clone repository CTCWordBeamSearch
  2. Compile and install by running pip install . at the root level of the CTCWordBeamSearch repository
  3. Specify the command line option --decoder wordbeamsearch when executing main.py to actually use the decoder

The dictionary is automatically created in training and validation mode by using all words contained in the IAM dataset (i.e. also including words from validation set) and is saved into the file data/corpus.txt. Further, the manually created list of word-characters can be found in the file model/wordCharList.txt. Beam width is set to 50 to conform with the beam width of vanilla beam search decoding.

Train model with IAM dataset

Follow these instructions to get the IAM dataset:

  • Register for free at this website
  • Download words/words.tgz
  • Download ascii/words.txt
  • Create a directory for the dataset on your disk, and create two subdirectories: img and gt
  • Put words.txt into the gt directory
  • Put the content (directories a01, a02, ...) of words.tgz into the img directory

Start the training

  • Delete files from model directory if you want to train from scratch
  • Go to the src directory and execute python main.py --train --data_dir path/to/IAM
  • Training stops after a fixed number of epochs without improvement

Fast image loading

Loading and decoding the png image files from the disk is the bottleneck even when using only a small GPU. The database LMDB is used to speed up image loading:

  • Go to the src directory and run createLMDB.py --data_dir path/to/IAM with the IAM data directory specified
  • A subfolder lmdb is created in the IAM data directory containing the LMDB files
  • When training the model, add the command line option --fast

The dataset should be located on an SSD drive. Using the --fast option and a GTX 1050 Ti training takes around 3h with a batch size of 500.

Information about model

The model is a stripped-down version of the HTR system I implemented for my thesis. What remains is what I think is the bare minimum to recognize text with an acceptable accuracy. It consists of 5 CNN layers, 2 RNN (LSTM) layers and the CTC loss and decoding layer. The illustration below gives an overview of the NN (green: operations, pink: data flowing through NN) and here follows a short description:

  • The input image is a gray-value image and has a size of 128x32
  • 5 CNN layers map the input image to a feature sequence of size 32x256
  • 2 LSTM layers with 256 units propagate information through the sequence and map the sequence to a matrix of size 32x80. Each matrix-element represents a score for one of the 80 characters at one of the 32 time-steps
  • The CTC layer either calculates the loss value given the matrix and the ground-truth text (when training), or it decodes the matrix to the final text with best path decoding or beam search decoding (when inferring)

nn_overview

References

Owner
Harald Scheidl
Interested in computer vision, deep learning, C++ and Python.
Harald Scheidl
Face Anonymizer - FaceAnonApp v1.0

Face Anonymizer - FaceAnonApp v1.0 Blur faces from image and video files in /data/files folder. Contents Repo of the source files for the FaceAnonApp.

6 Apr 18, 2022
Markup for note taking

Subtext: markup for note-taking Subtext is a text-based, block-oriented hypertext format. It is designed with note-taking in mind. It has a simple, pe

Gordon Brander 224 Jan 01, 2023
Source Code for AAAI 2022 paper "Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching"

Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching This repository is an official implementation of

HKUST-KnowComp 13 Sep 08, 2022
CRAFT-Pyotorch:Character Region Awareness for Text Detection Reimplementation for Pytorch

CRAFT-Reimplementation Note:If you have any problems, please comment. Or you can join us weChat group. The QR code will update in issues #49 . Reimple

453 Dec 28, 2022
A novel region proposal network for more general object detection ( including scene text detection ).

DeRPN: Taking a further step toward more general object detection DeRPN is a novel region proposal network which concentrates on improving the adaptiv

Deep Learning and Vision Computing Lab, SCUT 151 Dec 12, 2022
Opencv-image-filters - A camera to capture videos in real time by placing filters using Python with the help of the Tkinter and OpenCV libraries

Opencv-image-filters - A camera to capture videos in real time by placing filters using Python with the help of the Tkinter and OpenCV libraries

Sergio Díaz Fernández 1 Jan 13, 2022
A python script based on opencv and paddleocr, which can automatically pick up tasks, make cookies, and receive rewards in the Destiny 2 Dawning Oven

A python script based on opencv and paddleocr, which can automatically pick up tasks, make cookies, and receive rewards in the Destiny 2 Dawning Oven

1 Dec 22, 2021
Corner-based Region Proposal Network

Corner-based Region Proposal Network CRPN is a two-stage detection framework for multi-oriented scene text. It employs corners to estimate the possibl

xhzdeng 140 Nov 04, 2022
textspotter - An End-to-End TextSpotter with Explicit Alignment and Attention

An End-to-End TextSpotter with Explicit Alignment and Attention This is initially described in our CVPR 2018 paper. Getting Started Installation Clone

Tong He 323 Nov 10, 2022
Contextual speed detection for python

Speed Prediction using Optical Flow and 2D CNN About the challenge: Comma.AI Speed Challenge This challenge was developed by Comma.AI to predict the s

Mahimana Bhatt 2 Dec 16, 2021
Document Layout Analysis Projects

Layout_Analysis Introduction This is an implementation of RLSA and X-Y Cut with OpenCV Dependencies OpenCV 3.0+ How to use Compile with g++ : g++ -std

22 Dec 08, 2022
利用Paddle框架复现CRAFT

CRAFT-Paddle 利用Paddle框架复现CRAFT CRAFT 本项目基于paddlepaddle框架复现CRAFT,并参加百度第三届论文复现赛,将在2021年5月15日比赛完后提供AIStudio链接~敬请期待 参考项目: CRAFT: Character-Region Awarenes

QuanHao Guo 2 Mar 07, 2022
Open Source Computer Vision Library

OpenCV: Open Source Computer Vision Library Resources Homepage: https://opencv.org Courses: https://opencv.org/courses Docs: https://docs.opencv.org/m

OpenCV 65.7k Jan 03, 2023
Total Text Dataset. It consists of 1555 images with more than 3 different text orientations: Horizontal, Multi-Oriented, and Curved, one of a kind.

Total-Text-Dataset (Official site) Updated on April 29, 2020 (Detection leaderboard is updated - highlighted E2E methods. Thank you shine-lcy.) Update

Chee Seng Chan 671 Dec 27, 2022
Course material for the Multi-agents and computer graphics course

TC2008B Course material for the Multi-agents and computer graphics course. Setup instructions Strongly recommend using a custom conda environment. Ins

16 Dec 13, 2022
scantailor - Scan Tailor is an interactive post-processing tool for scanned pages.

Scan Tailor - scantailor.org This project is no longer maintained, and has not been maintained for a while. About Scan Tailor is an interactive post-p

1.5k Dec 28, 2022
This is a implementation of CRAFT OCR method

This is a implementation of CRAFT OCR method

Esaka 0 Nov 01, 2021
This repository summarized computer vision theories.

This repository summarized computer vision theories.

3 Feb 04, 2022
An Optical Character Recognition system using Pytesseract/Extracting data from Blood Pressure Reports.

Optical_Character_Recognition An Optical Character Recognition system using Pytesseract/Extracting data from Blood Pressure Reports. As an IOT/Compute

Ramsis Hammadi 1 Feb 12, 2022
Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform sign language recognition.

Sign Language Recognition Service This is a Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform s

Martin Lønne 1 Jan 08, 2022