Handwritten_Text_Recognition

Overview

Deep Learning framework for Line-level Handwritten Text Recognition

Short presentation of our project

  1. Introduction

  2. Installation
    2.a Install conda environment
    2.b Download databases

    • IAM dataset
    • ICFHR 2014 dataset
  3. How to use
    3.a Make predictions on unlabelled data using our best networks
    3.b Train and test a network from scratch
    3.c Test a model without retraining it

  4. References

  5. Contact

1. Introduction

This work was an internship project under Mathieu Aubry's supervision, at the LIGM lab, located in Paris.

In HTR, the task is to predict a transcript from an image of a handwritten text. A commonly used structure for this task is Convolutional Recurrent Neural Networks (CRNN). One CRNN network consists of a feature extractor (often with convolutional layers), followed by a recurrent network (LSTM).

This github provides a framework to train and test CRNN networks on handwritten grayscale line-level datasets. This github also provides code to generate predictions on an unlabelled, line-level, grayscale line-level dataset. There are several options for the structure of the CRNN used, image preprocessing, dataset used, data augmentation.

alt text

2. Installation

Prerequisites

Make sure you have Anaconda installed (version >= to 4.7.10, you may not be able to install correct dependencies if older). If not, follow the installation instructions provided at https://docs.anaconda.com/anaconda/install/.

Also pull the git.

2.a Download and activate conda environment

Once in the git folder on your machine, run the command lines :

conda env create -f HTR_environment.yml
conda activate HTR 

2.b Download databases

You will only need to download these databases if you want to train your own network from scratch. The framework is built to train a network on one of these 2 datasets : IAM and ICFHR2014 HTR competition. [ADD REF TO SLIDES]

  • Before downloading IAM dataset, you need to register on this website. Once that's done, you need to download :

    • The 'lines' folder at this link.
    • The 'split' folder at this link.
    • The 'lines.txt' file at this link.
  • For ICFHR2014 dataset, you need to download the 'BenthamDatasetR0-GT' folder at this link.

Make sure to download the two databases in the same folder. Structure must be

Your data folder / 
    IAM/
        lines.txt
        lines/
        split/
            trainset.txt
            testset.txt
            validationset1.txt
            validationset2.txt
            
    ICFHR2014/
        BenthamDatasetR0-GT/ 

    Your own dataset/

3. How to use

3.a Make predictions on your own unlabelled dataset

Running this code will use model stored at model_path to make predictions on images stored in data_path. The predictions will be stored in predictions.txt in data_path folder.

python lines_predictor.py --data_path datapath  --model_path ./trained_networks/IAM_model_imgH64.pth --imgH 64

/!\ Make sure that each image in the data folder has a unique file name and all images are in .jpg form. When you use our trained model with imgH as 64 (i.e. IAM_model_imgH64.pth), you have to set the argument --imgH as 64.

3.b Train a network from scratch

python train.py --dataset dataset  --tr_data_path data_dir --save_model_path path

Before running the code, make sure that you change ROOT_PATH variable at the beginning of params.py to the path of the folder you want to save your models in. Main arguments :

  • --dataset: name of the dataset to train and test on. Supported values are ICFHR2014 and IAM.
  • --tr_data_path: location of the train dataset folder on local machine. See section [??] for downloading datasets.
  • --save_model_path: path of the folder where model will be saved if params.save is set to True.

Main learning arguments :

  • --data_aug: If set to True, will apply random affine data transformation to the training images.

  • --optimizer: Which optimizer to use. Supported values are rmsprop, adam, adadelta, and sgd. We recommend using RMSprop, which got best results in our experiments. See params.py for optimizer-specific parameters.

  • --epochs : Number of training epochs

  • --lr: Learning rate at the beginning of training.

  • --milestones: List of the epochs at which the learning rate will be divided by 10.

  • feat_extractor: Structure to use for the feature extractor. Supported values are resnet18, custom_resnet, and conv.

    • resnet18 : standard structure of resnet18.
    • custom_resnet: variant of resnet18 that we tuned for our experiments.
    • conv: Use this option if you want to use a purely convolutional feature extractor and not a residual one. See conv parameters in params.py to choose conv structure.

3.c Test a model without retraining it

Running this code will compute the average CER and WER of model stored at pretrained_model path on the testing set of chosen dataset.

python train.py --train '' --save '' --pretrained_model model_path --dataset dataset --tr_data_path data_path 

Main arguments :

  • --pretrained_model: path to state_dict of pretrained model.
  • --dataset: Which dataset to test on. Supported values are ICFHR2014 and IAM.
  • --tr_data_path: path to the dataset folder (see section [??])

4. References

Graves et al. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
Sánchez et al. A set of benchmarks for Handwritten Text Recognition on historical documents
Dutta et al. Improving CNN-RNN Hybrid Networks for Handwriting Recognition

U.-V. Marti, H. Bunke The IAM-database: an English sentence database for offline handwriting recognition

https://github.com/Holmeyoung/crnn-pytorch
https://github.com/georgeretsi/HTR-ctc
Synthetic line generator : https://github.com/monniert/docExtractor (see paper for more information)

5. Contact

If you have questions or remarks about this project, please email us at [email protected] and [email protected].

Machine Leaning applied to denoise images to improve OCR Accuracy

Machine Learning to Denoise Images for Better OCR Accuracy This project is an adaptation of this tutorial and used only for learning purposes: https:/

Antonio Bri Pérez 2 Nov 16, 2022
Introduction to image processing, most used and popular functions of OpenCV

👀 OpenCV 101 Introduction to image processing, most used and popular functions of OpenCV go here.

Vusal Ismayilov 3 Jul 02, 2022
Code for the paper: Fusformer: A Transformer-based Fusion Approach for Hyperspectral Image Super-resolution

Fusformer Code for the paper: "Fusformer: A Transformer-based Fusion Approach for Hyperspectral Image Super-resolution" Plateform Python 3.8.5 + Pytor

Jin-Fan Hu (胡锦帆) 11 Dec 12, 2022
3点クリックで円を指定し、極座標変換を行うサンプルプログラム

click-warpPolar 3点クリックで円を指定し、極座標変換を行うサンプルプログラムです。 Requirements OpenCV 3.4.2 or Later Usage 実行方法は以下です。 起動後、マウスで3点をクリックし円を指定してください。 python click-warpPol

KazuhitoTakahashi 17 Dec 30, 2022
Markup for note taking

Subtext: markup for note-taking Subtext is a text-based, block-oriented hypertext format. It is designed with note-taking in mind. It has a simple, pe

Gordon Brander 224 Jan 01, 2023
1st place solution for SIIM-FISABIO-RSNA COVID-19 Detection Challenge

SIIM-COVID19-Detection Source code of the 1st place solution for SIIM-FISABIO-RSNA COVID-19 Detection Challenge. 1.INSTALLATION Ubuntu 18.04.5 LTS CUD

Nguyen Ba Dung 170 Dec 21, 2022
🖺 OCR using tensorflow with attention

tensorflow-ocr 🖺 OCR using tensorflow with attention, batteries included Installation git clone --recursive http://github.com/pannous/tensorflow-ocr

646 Nov 11, 2022
ERQA - Edge Restoration Quality Assessment

ERQA - a full-reference quality metric designed to analyze how good image and video restoration methods (SR, deblurring, denoising, etc) are restoring real details.

MSU Video Group 27 Dec 17, 2022
Binarize document images

Binarization Binarization for document images Examples Introduction This tool performs document image binarization (i.e. transform colour/grayscale to

QURATOR-SPK 48 Jan 02, 2023
Deep LearningImage Captcha 2

滑动验证码深度学习识别 本项目使用深度学习 YOLOV3 模型来识别滑动验证码缺口,基于 https://github.com/eriklindernoren/PyTorch-YOLOv3 修改。 只需要几百张缺口标注图片即可训练出精度高的识别模型,识别效果样例: 克隆项目 运行命令: git cl

Python3WebSpider 117 Dec 28, 2022
Course material for the Multi-agents and computer graphics course

TC2008B Course material for the Multi-agents and computer graphics course. Setup instructions Strongly recommend using a custom conda environment. Ins

16 Dec 13, 2022
Document blur detection based on Laplacian operator and text detection.

Document Blur Detection For general blurred image, using the variance of Laplacian operator is a good solution. But as for the blur detection of docum

JoeyLr 5 Oct 20, 2022
An official PyTorch implementation of the paper "Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences", ICCV 2021.

PyTorch implementation of Learning by Aligning (ICCV 2021) This is an official PyTorch implementation of the paper "Learning by Aligning: Visible-Infr

CV Lab @ Yonsei University 30 Nov 05, 2022
M-LSDを用いて四角形を検出し、射影変換を行うサンプルプログラム

M-LSD-warpPerspective-Example M-LSDを用いて四角形を検出し、射影変換を行うサンプルプログラムです。 Requirements OpenCV 3.4.2 or Later tensorflow 2.4.1 or Later Usage 実行方法は以下です。 pytho

KazuhitoTakahashi 9 Oct 14, 2022
Indonesian ID Card OCR using tesseract OCR

KTP OCR Indonesian ID Card OCR using tesseract OCR KTP OCR is python-flask with tesseract web application to convert Indonesian ID Card to text / JSON

Revan Muhammad Dafa 5 Dec 06, 2021
Pixie - A full-featured 2D graphics library for Python

Pixie - A full-featured 2D graphics library for Python Pixie is a 2D graphics library similar to Cairo and Skia. pip install pixie-python Features: Ty

treeform 65 Dec 30, 2022
天池2021"全球人工智能技术创新大赛"【赛道一】:医学影像报告异常检测 - 第三名解决方案

天池2021"全球人工智能技术创新大赛"【赛道一】:医学影像报告异常检测 比赛链接 个人博客记录 目录结构 ├── final------------------------------------决赛方案PPT ├── preliminary_contest--------------------

19 Aug 17, 2022
Geometric Augmentation for Text Image

Text Image Augmentation A general geometric augmentation tool for text images in the CVPR 2020 paper "Learn to Augment: Joint Data Augmentation and Ne

Canjie Luo 440 Jan 05, 2023
ISI's Optical Character Recognition (OCR) software for machine-print and handwriting data

VistaOCR ISI's Optical Character Recognition (OCR) software for machine-print and handwriting data Publications "How to Efficiently Increase Resolutio

ISI Center for Vision, Image, Speech, and Text Analytics 21 Dec 08, 2021
轻量级公式 OCR 小工具:一键识别各类公式图片,并转换为 LaTeX 格式

QC-Formula | 青尘公式 OCR 介绍 轻量级开源公式 OCR 小工具:一键识别公式图片,并转换为 LaTeX 格式。 支持从 电脑本地 导入公式图片;(后续版本将支持直接从网页导入图片) 公式图片支持 .png / .jpg / .bmp,大小为 4M 以内均可; 支持印刷体及手写体,前

青尘工作室 26 Jan 07, 2023