This project modify tensorflow object detection api code to predict oriented bounding boxes. It can be used for scene text detection.

Overview

This is an oriented object detector based on tensorflow object detection API. Most of the code is not changed except for those related to the need of predicinting oriented bounding boxes rather than regular horizontal bounding boxes.

Many tasks need to predict an oriented bounding box, e.g: Scene Text Detection. Check out the detection results: (Note that this code doesn't train model to recognize text. Only the bounding boxes are predicted)

Goals

For each predicted bounding boxes, in addition to the regular horizontal bounding box, we need to predict one oriented bounding box. Basically it means that we need to regress to an oriented bounding box. In this project, we simply regress to the encoded 4 corners of the oriented bounding boxes(8 values). See below equation for the encoding function. j is the index for each corner. g represents ground truth oriented bounding boxes. w_a and h_a is the anchor width and height, respectively.

The reason of adopting this Faster RCNN/SSD framework:

There are many object detection framework to be used. We adopt this one as the basis for the following reasons:

Highly modular designed code

It's easy to change the encoding scheme in the code. Simply changing the code in box_coders folder. The encoding using [R2CNN] (https://arxiv.org/abs/1706.09579) will be released soon. Training model with faster rcnn or ssd is easy to modify.

Natural integration with slim nets

It's easy to change feature extraction CNN backbone by using slim nets.

Easy and clear configuration setting with google protobuf

Changing the network configuration setting is easy. For example, to change the different aspect ratios of the anchors used, simply changing the grid_anchor_generator in the configuration file.

Many supporting codes have been provided.

It provides many supporting code such as exporting the trained model to a frozen graph that can be used in production(For example, in your c++ project). Check out my another project DeepSceneTextReader which used the frozen graph trained with this code.

Code Changed compared to the original object detection implementation

Import path for each python file

You do not need to use blaze build to build the code. Simply run the code from the root directory for fast experiment.

proto files

added oriented related filed to the proto files. Please build them with

protoc protos/*.proto --python_out=.

Box encoding scheme

added code for encode and decode oriented bounding boxes

Added code in meta architecture for supporting oriented bounding box prediction

Add code to predict the oriented bounding boxes for each proposal. At the same time the add code to calculate the oriented bounding boxes regression loss.

Other changes regarding data reading, data decoding and others

Usage:

Create the tfrecord data

Use the code create_text_dataset.py to create the tfexample data files used for training. You can create ICDAR 2015 and ICDAR 2013 data for training.

Download the pretrained weight

If you are training faster rcnn inception resnet v2 model, you can download the pretrained weight from tensorflow model zoo.

change the specific configuration setting.

See data/faster_rcnn_inception_resnet_v2_atrous_text.config for example configuration The parameter: second_stage_localization_loss_weight_oriented is the weight for the oriented bounding box prediction.

Train the model

Example running script is provided: train_faster_rcnn_inception_resnet_v2.sh

Evaluation

Trained with default configuration with ResNet Inception V2 or ResNet 101 backbone on ICDAR 2013 + ICDAR 2015 training set. The performance on ICDAR 2015 dataset.

Backbone Recall Precision F-1
ResNet Inception V2 0.7371 0.8057 0.7699
ResNet 101 0.6861 0.8213 0.7476

To improve the performance, try changing the configuration settings. Many scene text detectors have more aspect ratios anchors for each location than that was used for regular object detection.

TODO

  1. Provide support for R2CNN training.

Reference and Related Projects

Contact:

Owner
Dafang He
Ph.D student at the Penn State University. Focusing on machine learning and computer vision.
Dafang He
A PyTorch implementation of ECCV2018 Paper: TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes A PyTorch implement of TextSnake: A Flexible Representation for Detecting

Prince Wang 417 Dec 12, 2022
Detect textlines in document images

Textline Detection Detect textlines in document images Introduction This tool performs border, region and textline detection from document image data

QURATOR-SPK 70 Jun 30, 2022
TableBank: A Benchmark Dataset for Table Detection and Recognition

TableBank TableBank is a new image-based table detection and recognition dataset built with novel weak supervision from Word and Latex documents on th

844 Jan 04, 2023
Character Segmentation using TensorFlow

Character Segmentation Segment characters and spaces in one text line,from this paper Chinese English mixed Character Segmentation as Semantic Segment

26 Aug 25, 2022
7th place solution

SIIM-FISABIO-RSNA-COVID-19-Detection 7th place solution Validation: We used iterative-stratification with 5 folds (https://github.com/trent-b/iterativ

11 Jul 17, 2022
A curated list of papers and resources for scene text detection and recognition

Awesome Scene Text A curated list of papers and resources for scene text detection and recognition The year when a paper was first published, includin

Jan Zdenek 43 Mar 15, 2022
SemTorch

SemTorch This repository contains different deep learning architectures definitions that can be applied to image segmentation. All the architectures a

David Lacalle Castillo 154 Dec 07, 2022
Image Smoothing and Blurring Using OpenCV

Image-Smoothing-and-Blurring-Using-OpenCV This repository contains codes for performing image smoothing and blurring using OpenCV. There are different

Happy N. Monday 3 Feb 15, 2022
Zoom , GoogleMeets에서 Vtuber 데뷔하기

EasyVtuber Facial landmark와 GAN을 이용한 Character Face Generation Google Meets, Zoom 등에서 자신만의 웹툰, 만화 캐릭터로 대화해보세요! 악세사리는 어느정도 추가해도 잘 작동해요! 안타깝게도 RTX 2070

Gunwoo Han 140 Dec 23, 2022
Perspective recovery of text using transformed ellipses

unproject_text Perspective recovery of text using transformed ellipses. See full writeup at https://mzucker.github.io/2016/10/11/unprojecting-text-wit

Matt Zucker 111 Nov 13, 2022
Learning Camera Localization via Dense Scene Matching, CVPR2021

This repository contains code of our CVPR 2021 paper - "Learning Camera Localization via Dense Scene Matching" by Shitao Tang, Chengzhou Tang, Rui Hua

tangshitao 65 Dec 01, 2022
Volume Control using OpenCV

Gesture-Volume-Control Volume Control using OpenCV Here i made volume control using Python and OpenCV in which we can control the volume of our laptop

Mudit Sinha 3 Oct 10, 2021
PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector

Description This is a PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector. Only RBOX part is implemented. Using dice loss

365 Dec 20, 2022
Application that instantly translates sign-language to letters.

Sign Language Translator Project Description The main purpose of project is translating sign-language to letters. In accordance with this purpose we d

3 Sep 29, 2022
OCR engine for all the languages

Description kraken is a turn-key OCR system optimized for historical and non-Latin script material. kraken's main features are: Fully trainable layout

431 Jan 04, 2023
Pixie - A full-featured 2D graphics library for Python

Pixie - A full-featured 2D graphics library for Python Pixie is a 2D graphics library similar to Cairo and Skia. pip install pixie-python Features: Ty

treeform 65 Dec 30, 2022
Hand gesture detection project with aweome UI implementation.

an awesome hand gesture detection project for you to be creative! Imagination is the limit to do with this project.

AR Ashraf 39 Sep 26, 2022
Convert PDF/Image to TXT using EasyOcr - the best OCR engine available!

PDFImage2TXT - DOWNLOAD INSTALLER HERE What can you do with it? Convert scanned PDFs to TXT. Convert scanned Documents to TXT. No coding required!! In

Hans Alemão 2 Feb 22, 2022
A simple demo program for using OpenCV on Android

Kivy OpenCV Demo A simple demo program for using OpenCV on Android Build with: buildozer android debug deploy run Run (on desktop) with: python main.p

Andrea Ranieri 13 Dec 29, 2022