Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train.

Overview

Handwritten Line Text Recognition using Deep Learning with Tensorflow

Description

Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train. More read this Medium Post

Why Deep Learning?

Why Deep Learning

Deep Learning self extracts features with a deep neural networks and classify itself. Compare to traditional Algorithms it performance increase with Amount of Data.

Basic Intuition on How it Works.

Step_wise_detail

  • First Use Convolutional Recurrent Neural Network to extract the important features from the handwritten line text Image.
  • The output before CNN FC layer (512x100x8) is passed to the BLSTM which is for sequence dependency and time-sequence operations.
  • Then CTC LOSS Alex Graves is used to train the RNN which eliminate the Alignment problem in Handwritten, since handwritten have different alignment of every writers. We just gave the what is written in the image (Ground Truth Text) and BLSTM output, then it calculates loss simply as -log("gtText"); aim to minimize negative maximum likelihood path.
  • Finally CTC finds out the possible paths from the given labels. Loss is given by for (X,Y) pair is: Ctc_Loss
  • Finally CTC Decode is used to decode the output during Prediction.

Detail Project Workflow

Architecture of Model

  • Project consists of Three steps:
    1. Multi-scale feature Extraction --> Convolutional Neural Network 7 Layers
    2. Sequence Labeling (BLSTM-CTC) --> Recurrent Neural Network (2 layers of LSTM) with CTC
    3. Transcription --> Decoding the output of the RNN (CTC decode) DetailModelArchitecture

Requirements

  1. Tensorflow 1.8.0
  2. Flask
  3. Numpy
  4. OpenCv 3
  5. Spell Checker autocorrect >=0.3.0 pip install autocorrect

Dataset Used

  • IAM dataset download from here
  • Only needed the lines images and lines.txt (ASCII).
  • Place the downloaded files inside data directory
The Trained model is available and download from this link. The trained model CER=8.32% and trained on IAM dataset with some additional created dataset.

To Train the model from scratch

$ python main.py --train

To validate the model

$ python main.py --validate

To Prediction

$ python main.py

Run in Web with Flask

$ python upload.py
Validation character error rate of saved model: 8.654728%
Python: 3.6.4 
Tensorflow: 1.8.0
Init with stored values from ../model/snapshot-24
Without Correction clothed leaf by leaf with the dioappoistmest
With Correction clothed leaf by leaf with the dioappoistmest

Prediction output on IAM Test Data PredictionOutput

Prediction output on Self Test Data PredictionOutput

See the project Devnagari Handwritten Word Recognition with Deep Learning for more insights.

Further Improvement

  • Using MDLSTM to recognize whole paragraph at once Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention
  • Line segementation can be added for full paragraph text recognition. For line segmentation you can use A* path planning algorithm or CNN model to seperate paragraph into lines.
  • Better Image preprocessing such as: reduce backgoround noise to handle real time image more accurately.
  • Better Decoding approach to improve accuracy. Some of the CTC Decoder found here

Feel Free to improve this project with pull Request.

This is part of my last semester project of Computer Engineering From Tribhuvan University. July 2019

Owner
sushant097
Machine Learning Engineer | Computer Vision Developer. Working in the field of Research, development of Machine learning and Computer Vision .
sushant097
Python Computer Vision from Scratch

This repository explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both f

Milaan Parmar / Милан пармар / _米兰 帕尔马 221 Dec 26, 2022
Sort By Face

Sort-By-Face This is an application with which you can either sort all the pictures by faces from a corpus of photos or retrieve all your photos from

0 Nov 29, 2021
Msos searcher - A half-hearted attempt at finding a magic square of squares

MSOS searcher A half-hearted attempt at finding (or rather searching) a MSOS (Magic Square of Squares) in the spirit of the Parker Square. Running I r

Niels Mündler 1 Jan 02, 2022
Polaris is a Face recognition attendance system .

Support Me 🚀 About Polaris 📄 Polaris is a system based on facial recognition with a futuristic GUI design, Can easily find people informations store

XN3UR0N 215 Dec 26, 2022
OCR system for Arabic language that converts images of typed text to machine-encoded text.

Arabic OCR OCR system for Arabic language that converts images of typed text to machine-encoded text. The system currently supports only letters (29 l

Hussein Youssef 144 Jan 05, 2023
Document Layout Analysis Projects

Layout_Analysis Introduction This is an implementation of RLSA and X-Y Cut with OpenCV Dependencies OpenCV 3.0+ How to use Compile with g++ : g++ -std

22 Dec 08, 2022
Code for the "Sensing leg movement enhances wearable monitoring of energy expenditure" paper.

EnergyExpenditure Code for the "Sensing leg movement enhances wearable monitoring of energy expenditure" paper. Additional data for replicating this s

Patrick S 42 Oct 26, 2022
Face Recognizer using Opencv Python

Face Recognizer using Opencv Python The first step create your own dataset with file open-cv-create_dataset second step You can put the photo accordin

Han Izza 2 Nov 16, 2021
Distilling Knowledge via Knowledge Review, CVPR 2021

ReviewKD Distilling Knowledge via Knowledge Review Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia This project provides an implementation for the

DV Lab 194 Dec 28, 2022
A python program to block out your face

Readme This is a small program I threw together in about 6 hours to block out your face. It probably doesn't work very well, so be warned. By default,

1 Oct 17, 2021
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

ocr-fileformat Validate and transform between OCR file formats (hOCR, ALTO, PAGE, FineReader) Installation Docker System-wide Usage CLI GUI API Transf

Universitätsbibliothek Mannheim 152 Dec 20, 2022
TextField: Learning A Deep Direction Field for Irregular Scene Text Detection (TIP 2019)

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection Introduction The code and trained models of: TextField: Learning A Deep

Yukang Wang 101 Dec 12, 2022
Give a solution to recognize MaoYan font.

猫眼字体识别 该 github repo 在于帮助xjtlu的同学们识别猫眼的扭曲字体。已经打包上传至 pypi ,可以使用 pip 直接安装。 猫眼字体的识别不出来的原理与解决思路在采茶上 使用方法: import MaoYanFontRecognize

Aruix 4 Jun 30, 2022
Semantic-based Patch Detection for Binary Programs

PMatch Semantic-based Patch Detection for Binary Programs Requirement tensorflow-gpu 1.13.1 numpy 1.16.2 scikit-learn 0.20.3 ssdeep 3.4 Usage tar -xvz

Mr.Curiosity 3 Sep 02, 2022
Play the Namibian game of Owela against a terrible AI. Built using Django and htmx.

Owela Club A Django project for playing the Namibian game of Owela against a dumb AI. Built following the rules described on the Mancala World wiki pa

Adam Johnson 18 Jun 01, 2022
SRA's seminar on Introduction to Computer Vision Fundamentals

Introduction to Computer Vision This repository includes basics to : Python Numpy: A python library Git Computer Vision. The aim of this repository is

Society of Robotics and Automation 147 Dec 04, 2022
PianoVisuals - Create background videos synced with piano music using opencv

Steps Record piano video Use Neural Network to do body segmentation (video matti

Solbiati Alessandro 4 Jan 24, 2022
Learn computer graphics by writing GPU shaders!

This repo contains a selection of projects designed to help you learn the basics of computer graphics. We'll be writing shaders to render interactive two-dimensional and three-dimensional scenes.

Eric Zhang 1.9k Jan 02, 2023
Repository relating to the CVPR21 paper TimeLens: Event-based Video Frame Interpolation

TimeLens: Event-based Video Frame Interpolation This repository is about the High Speed Event and RGB (HS-ERGB) dataset, used in the 2021 CVPR paper T

Robotics and Perception Group 544 Dec 19, 2022
This is an API written in python that uses FastAPI. It is a simple API that can detect discord tokens in Images.

Welcome This is an API written in python that uses FastAPI. It is a simple API that can detect discord tokens in Images. Installation There are curren

8 Jul 29, 2022