Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition.

Overview

Convolutional Recurrent Neural Network

This software implements the Convolutional Recurrent Neural Network (CRNN), a combination of CNN, RNN and CTC loss for image-based sequence recognition tasks, such as scene text recognition and OCR. For details, please refer to our paper http://arxiv.org/abs/1507.05717.

UPDATE Mar 14, 2017 A Dockerfile has been added to the project. Thanks to @varun-suresh.

UPDATE May 1, 2017 A PyTorch port has been made by @meijieru.

UPDATE Jun 19, 2017 For an end-to-end text detector+recognizer, check out the CTPN+CRNN implementation by @AKSHAYUBHAT.

Build

The software has only been tested on Ubuntu 14.04 (x64). CUDA-enabled GPUs are required. Before building, install the latest versions of Torch7, fblualib and LMDB, following each project's own installation instructions. On Ubuntu, LMDB can be installed with apt-get install liblmdb-dev.

To build the C++ code, go to src/ and execute sh build_cpp.sh. If the build succeeds, a file named libcrnn.so is produced in the src/ directory.

Run demo

A demo program can be found in src/demo.lua. Before running the demo, download a pretrained model from here. Put the downloaded model file crnn_demo_model.t7 into the directory model/crnn_demo/. Then launch the demo from src/ with:

th demo.lua

The demo reads an example image and recognizes its text content.

Example image: Example Image

Expected output:

Loading model...
Model loaded from ../model/crnn_demo/model.t7
Recognized text: available (raw: a-----v--a-i-l-a-bb-l-e---)

Another example: Example Image2

Recognized text: shakeshack (raw: ss-h-a--k-e-ssh--aa-c--k--)
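
The raw strings in the outputs above are per-frame predictions in which - stands for the CTC blank. The recognized text follows from the usual CTC collapsing rule: merge consecutive repeated symbols, then drop the blanks. A minimal Python sketch of that rule is shown here; the function name ctc_collapse is hypothetical and is not part of this repository.

```python
from itertools import groupby


def ctc_collapse(raw, blank="-"):
    """Merge runs of identical symbols, then remove the blank symbol."""
    # groupby yields one entry per run of consecutive equal symbols.
    merged = (symbol for symbol, _ in groupby(raw))
    return "".join(symbol for symbol in merged if symbol != blank)


# The two raw strings from the demo output above.
print(ctc_collapse("a-----v--a-i-l-a-bb-l-e---"))
print(ctc_collapse("ss-h-a--k-e-ssh--aa-c--k--"))
```

Note that the blank between the two b frames in "bb-l" is what would allow a genuine double letter to survive: "b-b" collapses to "bb", while "bb" collapses to "b".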

Use pretrained model

The pretrained model can be used for lexicon-free and lexicon-based recognition tasks. Refer to the functions recognizeImageLexiconFree and recognizeImageWithLexicion in file utilities.lua for details.
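
To illustrate the difference between the two modes, one simple stand-in for lexicon-based recognition is to take the lexicon-free output and pick the lexicon word closest to it in edit distance. This is a simplification chosen for illustration, not a description of what recognizeImageWithLexicion does; the names edit_distance and nearest_lexicon_word are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def nearest_lexicon_word(prediction, lexicon):
    """Return the lexicon entry with the smallest edit distance to the prediction."""
    return min(lexicon, key=lambda word: edit_distance(prediction, word))


# A misrecognized lexicon-free result is snapped to the closest lexicon word.
print(nearest_lexicon_word("avaiable", ["available", "variable", "shakeshack"]))
```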

Train a new model

Follow these steps to train a new model on your own dataset.

  1. Create a new LMDB dataset. A Python program is provided in tool/create_dataset.py; refer to the function createDataset for details (run pip install lmdb first).
  2. Create a model directory under model/, for example model/foo_model. Then create a configuration file config.lua in that directory. You can copy model/crnn_demo/config.lua and modify it.
  3. Go to src/ and execute th main_train.lua ../model/foo_model/. Model snapshots and the log file will be saved into the model directory.
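
Step 1 can be sketched in Python as follows. This is a rough outline of writing image/label pairs into LMDB, not a copy of createDataset. The key layout (image-%09d, label-%09d, num-samples) and the helper names build_cache and write_lmdb are assumptions made for illustration; check tool/create_dataset.py for the layout the training code actually expects.

```python
def build_cache(image_paths, labels):
    """Turn samples into LMDB key/value byte pairs (key layout is an assumption)."""
    if len(image_paths) != len(labels):
        raise ValueError("image_paths and labels must have the same length")
    cache = {}
    for index, (path, label) in enumerate(zip(image_paths, labels), start=1):
        with open(path, "rb") as f:
            # Store the encoded image file bytes as-is.
            cache[("image-%09d" % index).encode()] = f.read()
        cache[("label-%09d" % index).encode()] = label.encode("utf-8")
    cache[b"num-samples"] = str(len(labels)).encode()
    return cache


def write_lmdb(output_path, cache, map_size=1 << 30):
    """Write the cache into an LMDB environment at output_path."""
    import lmdb  # third-party package: pip install lmdb

    env = lmdb.open(output_path, map_size=map_size)
    with env.begin(write=True) as txn:
        for key, value in cache.items():
            txn.put(key, value)
    env.close()
```

A call such as write_lmdb("my_dataset", build_cache(paths, labels)) would then produce the dataset directory; map_size is an upper bound on the database size and may need to be raised for large datasets.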

Build using docker

  1. Install docker. Follow the instructions here.
  2. Install nvidia-docker. Follow the instructions here.
  3. Clone this repo. From the repository directory, run docker build -t crnn_docker .
  4. Once the image is built, start a container with nvidia-docker run -it crnn_docker.

Citation

Please cite the following paper if you use the code or model in your research.

@article{ShiBY17,
  author    = {Baoguang Shi and
               Xiang Bai and
               Cong Yao},
  title     = {An End-to-End Trainable Neural Network for Image-Based Sequence Recognition
               and Its Application to Scene Text Recognition},
  journal   = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
  volume    = {39},
  number    = {11},
  pages     = {2298--2304},
  year      = {2017}
}

Acknowledgements

The authors would like to thank the developers of Torch7, TH++, lmdb-lua-ffi and char-rnn.

Please let me know if you encounter any issues.
