This is a c++ project deploying a deep scene text reading pipeline with tensorflow. It reads text from natural scene images. It uses frozen tensorflow graphs. The detector detect scene text locations. The recognizer reads word from each detected bounding box.

Overview

DeepSceneTextReader

This is a c++ project deploying a deep scene text reading pipeline. It reads text from natural scene images.

Prerequsites

The project is written in c++ using tensorflow computational framework. It is tested using tensorflow 1.4. Newer version should be ok too, but not tested. Please install:

Please check this project on how to build project using tensorflow with cmake: https://github.com/cjweeks/tensorflow-cmake It greatly helped the progress of building this project. When building tensorflow library, please be careful since we need to use opencv. Looks like there is still problem when including tensorflow and opencv together. It will make opencv unable to read image. Check out this issue: https://github.com/tensorflow/tensorflow/issues/14267 The answer by allenlavoie solved my problem, so I paste it here:

"In the meantime, as long as you're not using any custom ops you can build libtensorflow_cc.so with bazel build --config=monolithic, which will condense everything together into one shared object (no libtensorflow_framework dependence) and seal off non-TensorFlow symbols. That shared object will have protocol buffer symbols."

Status

Currently two pretrained model is provided. One for scene text detection, and one for scene text recognition. More model will be provided. Note that the current model is not so robust. U can easily change to ur trained model. The models will be continuously updated.

build process

cd build

cmake ..

make

It will create an excutable named DetectText in bin folder.

Usage:

The excutable could be excuted in three modes: (1) Detect (2) Recognize (3) Detect and Recognize

Detect

Download the pretrained detector model and put it in model/

./DetectText --detector_graph='model/Detector_model.pb'
--image_filename='test_images/test_img1.jpg' --mode='detect' --output_filename='results/output_image.jpg'

Recognize

Download the pretrained recognizer model and put it in model/ Download the dictionary file and put it in model

./DetectText --recognizer_graph='model/Recognizer_model.pb'
--image_filename='test_images/recognize_image1.jpg' --mode='recognize'
--im_height=32 --im_width=128

Detect and Recognize

Download the pretrained detector and recognizer model and put it in model/ as described previously.

./DetectText --recognizer_graph=$recognizer_graph --detector_graph='model/Detector_model.pb'
--image_filename='model/Recognizer_model.pb' --mode='detect_and_read' --output_filename='results/output_image.jpg'

Model Description

Detector

  1. Faster RCNN Detector Model The detector is trained with modified tensorflow [object detector api]: (https://github.com/tensorflow/models/tree/master/research/object_detection) I modify it by changing the proposal scheme to regress to the 4 coordinates of the oriented bounding box rather than regular rectangular bounding box. Check out this repo for the training code. Pretrained model: FasterRCNN_detector_model.pb

  2. R2CNN will be updated. See R2CNN for details. The code is also modified with tnesorflow [object detector api]: (https://github.com/tensorflow/models/tree/master/research/object_detection) The training code will be released soon.

Recognizer

  1. CTC scene text recognizer. The recognizer model follows the famous scene text recognition CRNN model

  2. Spatial Attention OCR will be updated soon. It is based on GoogleOCR

Detect and Recognize

The whole scene text reading pipeline detects the text and rotate it horizontally and read it with recognizer. The pipeline is here:

Pretrained Models

You can play with the code with provided pretrained models.
They are not fully optimized yet, but could be used for being familiar with the code.
Check them out here: models

You will find two detection models called: (1) FasterRCNN_detector_model.pb (2) R2CNN_detector_model.pb
Two recognition models with their charset: (1) Recognizer_model.pb + charset_full.txt and (2)Recognizer_model_case_insen.pb + charset_case_insen.txt.
Full charset means English letters + digit and case insen means case insensitive English letters + digit. Let me know if u have any problens using them.

Reference and Related Projects

Contact:

Owner
Dafang He
Ph.D student at the Penn State University. Focusing on machine learning and computer vision.
Dafang He
Isearch (OSINT) 🔎 Face recognition reverse image search on Instagram profile feed photos.

isearch is an OSINT tool on Instagram. Offers a face recognition reverse image search on Instagram profile feed photos.

Malek salem 20 Oct 25, 2022
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:

Multi-Type-TD-TSR Check it out on Source Code of our Paper: Multi-Type-TD-TSR Extracting Tables from Document Images using a Multi-stage Pipeline for

Pascal Fischer 178 Dec 27, 2022
Code for CVPR'2022 paper ✨ "Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model"

PPE ✨ Repository for our CVPR'2022 paper: Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-

Zipeng Xu 34 Nov 28, 2022
YOLOv5 in DOTA with CSL_label.(Oriented Object Detection)(Rotation Detection)(Rotated BBox)

YOLOv5_DOTA_OBB YOLOv5 in DOTA_OBB dataset with CSL_label.(Oriented Object Detection) Datasets and pretrained checkpoint Datasets : DOTA Pretrained Ch

1.1k Dec 30, 2022
Introduction to Augmented Reality (AR) with Python 3 and OpenCV 4.2.

Introduction to Augmented Reality (AR) with Python 3 and OpenCV 4.2.

fernanda rodríguez 85 Jan 02, 2023
利用Paddle框架复现CRAFT

CRAFT-Paddle 利用Paddle框架复现CRAFT CRAFT 本项目基于paddlepaddle框架复现CRAFT,并参加百度第三届论文复现赛,将在2021年5月15日比赛完后提供AIStudio链接~敬请期待 参考项目: CRAFT: Character-Region Awarenes

QuanHao Guo 2 Mar 07, 2022
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

ocr-fileformat Validate and transform between OCR file formats (hOCR, ALTO, PAGE, FineReader) Installation Docker System-wide Usage CLI GUI API Transf

Universitätsbibliothek Mannheim 152 Dec 20, 2022
A python program to block out your face

Readme This is a small program I threw together in about 6 hours to block out your face. It probably doesn't work very well, so be warned. By default,

1 Oct 17, 2021
governance proposal to make fei redeemable for eth

Feil Proposal 🌲 Abstract Migrate all ETH from Fei protocol-controlled value into Yearn ETH Vault. Allow redemptions of outstanding FEI for yvETH. At

13 Mar 31, 2022
This can be use to convert text in a file to handwritten text.

TextToHandwriting This can be used to convert text to handwriting. Clone this project or download the code. Run TextToImage.py give the filename of th

Ashutosh Mahapatra 2 Feb 06, 2022
A general list of resources to image text localization and recognition 场景文本位置感知与识别的论文资源与实现合集 シーンテキストの位置認識と識別のための論文リソースの要約

Scene Text Localization & Recognition Resources Read this institute-wise: English, 简体中文. Read this year-wise: English, 简体中文. Tags: [STL] (Scene Text L

Karl Lok (Zhaokai Luo) 901 Dec 11, 2022
CNN+LSTM+CTC based OCR implemented using tensorflow.

CNN_LSTM_CTC_Tensorflow CNN+LSTM+CTC based OCR(Optical Character Recognition) implemented using tensorflow. Note: there is No restriction on the numbe

Watson Yang 356 Dec 08, 2022
Text layer for bio-image annotation.

napari-text-layer Napari text layer for bio-image annotation. Installation You can install using pip: pip install napari-text-layer Keybindings and m

6 Sep 29, 2022
Generating .npy dataset and labels out of given image, containing numbers from 0 to 9, using opencv

basic-dataset-generator-from-image-of-numbers generating .npy dataset and labels out of given image, containing numbers from 0 to 9, using opencv inpu

1 Jan 01, 2022
A tool to make dumpy among us GIFS

Among Us Dumpy Gif Maker Made by ThatOneCalculator & Pixer415 With help from Telk, karl-police, and auguwu! Please credit this repository when you use

Kainoa Kanter 535 Jan 07, 2023
Provides OCR (Optical Character Recognition) services through web applications

OCR4all As suggested by the name one of the main goals of OCR4all is to allow basically any given user to independently perform OCR on a wide variety

174 Dec 31, 2022
Captcha Recognition

The objective of this project is to recognize the target numbers in the captcha images correctly which would tell us how good or bad a captcha system has been built.

Mohit Kaushik 5 Feb 20, 2022
SemTorch

SemTorch This repository contains different deep learning architectures definitions that can be applied to image segmentation. All the architectures a

David Lacalle Castillo 154 Dec 07, 2022
Code release for Hu et al., Learning to Segment Every Thing. in CVPR, 2018.

Learning to Segment Every Thing This repository contains the code for the following paper: R. Hu, P. Dollár, K. He, T. Darrell, R. Girshick, Learning

Ronghang Hu 417 Oct 03, 2022
pyntcloud is a Python library for working with 3D point clouds.

pyntcloud is a Python library for working with 3D point clouds.

David de la Iglesia Castro 1.2k Jan 07, 2023