An Implementation of the seglink alogrithm in paper Detecting Oriented Text in Natural Images by Linking Segments

Overview

Tips: A more recent scene text detection algorithm: PixelLink, has been implemented here: https://github.com/ZJULearning/pixel_link

Contents:

  1. Introduction
  2. Installation&requirements
  3. Datasets
  4. Problems
  5. Models
  6. Test Your own images
  7. Models
  8. Some Comments

Introduction

This is a re-implementation of the SegLink text detection algorithm described in the paper Detecting Oriented Text in Natural Images by Linking Segments, Baoguang Shi, Xiang Bai, Serge Belongie

Installation&requirements

  1. tensorflow-gpu 1.1.0

  2. cv2. I'm using 2.4.9.1, but some other versions less than 3 should be OK too. If not, try to switch to the version as mine.

  3. download the project pylib and add the src folder to your PYTHONPATH

If any other requirements unmet, just install them following the error msg.

Datasets

  1. SynthText

  2. ICDAR2015

Convert them into tfrecords format using the scripts in datasets if you wanna train your own model.

Problems

The convergence speed of my seglink is quite slow compared with that described in the paper. For example, the authors of SegLink paper said that a good result can be obtained by training on Synthtext for less than 10W iterations and on IC15-train for less than 1W iterations. However, using my implementation, I have to train on SynthText for about 20W iterations and another more than 10W iterations on IC15-train, to get a competitive result.

Several reasons may contribute to the slow convergency of my model:

  1. Batch size. I don't have 4 12G-Titans for training, as described in the paper. Instead, I trained my model on two 8G GeForce GTX 1080 or two Titans.
  2. Learning Rate. In the paper, 10^-3 and 10^-4 have been used. But I adopted a fixed learning rate of 10^-4.
  3. Different initialization model. I used the pretrained VGG model from SSD-caffe on coco , because I thought it better than VGG trained on ImageNet. However, it seems that my point of view does not hold. 4.Some other differences exists maybe, I am not sure.

Models

Two models trained on SynthText and IC15 train can be downloaded.

  1. seglink-384. Trained using image size of 384x384, the same image size as the paper. The Hmean is comparable to the result reported in the paper.

The hust_orientedText is the result of paper.

  1. seglink-512. Trainied using image size of 512x512, and one pointer better than 384x384.

They have been trained:

  • on Synthtext for about 20W iterations, and on IC15-train for 10w~20W iterations.

  • learning_rate = 10e-4

  • two gpus

  • 384: GTX 1080, batch_size = 24; 512: Titan, batch_size = 20

Both models perform best at seg_conf_threshold=0.8 and link_conf_threshold=0.5, well, another difference from paper, which takes 0.9 and 0.7 respectively.

Test Your own images

Use the script test_seglink.py, and a shortcut has been created in script test.sh:

Go to the seglink root directory and execute the command:


./scripts/test.sh 0 GPU_ID CKPT_PATH DATASET_DIR

For example:


./scripts/test.sh 0 ~/models/seglink/model.ckpt-217867  ~/dataset/ICDAR2015/Challenge4/ch4_training_images

I have only tested my models on IC15-test, but any other images can be used for test: just put your images into a directory, and config the path in the command as DATASET_DIR.

A bunch of txt files and a zip file is created after test. If you are using IC15-test for testing, you can upload this zip file to the icdar evaluation server directly.

The text files and placed in a subdir of the checkpoint directory, and contain the bounding boxes as the detection results, and can visualized using the script visualize_detection_result.py.

The command looks like:


python visualize_detection_result.py \

    --image=where your images are put

    --det=the directory of the text files output by test_seglink.py

    --output=the output directory of detection results drawn on images.

For example:


python visualize_detection_result.py \

    --image=~/dataset/ICDAR2015/Challenge4/ch4_training_images/ \

    --det=~/models/seglink/seglink_icdar2015_without_ignored/eval/icdar2015_train/model.ckpt-72885/seg_link_conf_th_0.900000_0.700000/txt \
    --output=~/temp/no-use/seglink_result_512_train

Training and evaluation

The training processing requires data processing, i.e. converting data into tfrecords. The converting scripts are put in the datasets directory. The scrips:train_seglink.py and eval_seglink.py are the training and evaluation scripts respectively. Especially, I have implemented an offline evaluation function, which calculates the Recall/Precision/Hmean as the ICDAR test server, and can be used for cross validation and grid search. However, the resulting scores may have slight differences from those of test sever, but it does not matter that much. Sorry for the imcomplete documentation here. Read and modify them if you want to train your own model.

Some Comments

Thanks should be given to the authors of the Seglink paper, i.e., Baoguang Shi1 Xiang Bai1, Serge Belongie.

EAST is another paper on text detection accepted by CVPR 2017, and its reported result is better than that of SegLink. But if they both use same VGG16, their performances are quite similar.

Contact me if you have any problems, through github issues.

Some Notes On Implementation Detail

How the groundtruth is calculated, in Chinese: http://fromwiz.com/share/s/34GeEW1RFx7x2iIM0z1ZXVvc2yLl5t2fTkEg2ZVhJR2n50xg

Owner
dengdan
Master on CS, from Zhejiang University; Now, perception algorithm R&D in FABU.ai, on automous driving
dengdan
Machine Leaning applied to denoise images to improve OCR Accuracy

Machine Learning to Denoise Images for Better OCR Accuracy This project is an adaptation of this tutorial and used only for learning purposes: https:/

Antonio Bri Pérez 2 Nov 16, 2022
Deep Learning Chinese Word Segment

引用 本项目模型BiLSTM+CRF参考论文:http://www.aclweb.org/anthology/N16-1030 ,IDCNN+CRF参考论文:https://arxiv.org/abs/1702.02098 构建 安装好bazel代码构建工具,安装好tensorflow(目前本项目需

2.1k Dec 23, 2022
Rotational region detection based on Faster-RCNN.

R2CNN_Faster_RCNN_Tensorflow Abstract This is a tensorflow re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detecti

UCAS-Det 581 Nov 22, 2022
This is a real life mario project using python and mediapipe

real-life-mario This is a real life mario project using python and mediapipe How to run to run this just run - realMario.py file requirements This req

Programminghut 42 Dec 22, 2022
Total Text Dataset. It consists of 1555 images with more than 3 different text orientations: Horizontal, Multi-Oriented, and Curved, one of a kind.

Total-Text-Dataset (Official site) Updated on April 29, 2020 (Detection leaderboard is updated - highlighted E2E methods. Thank you shine-lcy.) Update

Chee Seng Chan 671 Dec 27, 2022
Single Shot Text Detector with Regional Attention

Single Shot Text Detector with Regional Attention Introduction SSTD is initially described in our ICCV 2017 spotlight paper. A third-party implementat

Pan He 215 Dec 07, 2022
The code for CVPR2022 paper "Likert Scoring with Grade Decoupling for Long-term Action Assessment".

Likert Scoring with Grade Decoupling for Long-term Action Assessment This is the code for CVPR2022 paper "Likert Scoring with Grade Decoupling for Lon

10 Oct 21, 2022
Official implementation of Character Region Awareness for Text Detection (CRAFT)

CRAFT: Character-Region Awareness For Text detection Official Pytorch implementation of CRAFT text detector | Paper | Pretrained Model | Supplementary

Clova AI Research 2.5k Jan 03, 2023
How to detect objects in real time by using Jupyter Notebook and Neural Networks , by using Yolo3

Real Time Object Recognition From your Screen Desktop . In this post, I will explain how to build a simply program to detect objects from you desktop

Ruslan Magana Vsevolodovna 2 Sep 28, 2022
PAGE XML format collection for document image page content and more

PAGE-XML PAGE XML format collection for document image page content and more For an introduction, please see the following publication: http://www.pri

PRImA Research Lab 46 Nov 14, 2022
Python library to extract tabular data from images and scanned PDFs

Overview ExtractTable - API to extract tabular data from images and scanned PDFs The motivation is to make it easy for developers to extract tabular d

Org. Account 165 Dec 31, 2022
Recognizing cropped text in natural images.

ASTER: Attentional Scene Text Recognizer with Flexible Rectification ASTER is an accurate scene text recognizer with flexible rectification mechanism.

Baoguang Shi 681 Jan 02, 2023
CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering" official PyTorch implementation.

LED2-Net This is PyTorch implementation of our CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering". Y

Fu-En Wang 83 Jan 04, 2023
Augmenting Anchors by the Detector Itself

Augmenting Anchors by the Detector Itself Introduction It is difficult to determine the scale and aspect ratio of anchors for anchor-based object dete

4 Nov 06, 2022
Optical character recognition for Japanese text, with the main focus being Japanese manga

Manga OCR Optical character recognition for Japanese text, with the main focus being Japanese manga. It uses a custom end-to-end model built with Tran

Maciej Budyś 327 Jan 01, 2023
deployment of a hybrid model for automatic weapon detection/ anomaly detection for surveillance applications

Automatic Weapon Detection Deployment of a hybrid model for automatic weapon detection/ anomaly detection for surveillance applications. Loved the pro

Janhavi 4 Mar 04, 2022
BNF Globalization Code (CVPR 2016)

Boundary Neural Fields Globalization This is the code for Boundary Neural Fields globalization method. The technical report of the method can be found

25 Apr 15, 2022
caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection Abstract This is a caffe re-implementation of R2CNN: Rotational Region CNN fo

candler 80 Dec 28, 2021
Steve Tu 71 Dec 30, 2022
Python bindings for JIGSAW: a Delaunay-based unstructured mesh generator.

JIGSAW: An unstructured mesh generator JIGSAW is an unstructured mesh generator and tessellation library; designed to generate high-quality triangulat

Darren Engwirda 26 Dec 13, 2022