EAST for ICPR MTWI 2018 Challenge II (Text detection of network images)

Overview

EAST_ICPR2018: EAST for ICPR MTWI 2018 Challenge II (Text detection of network images)

Introduction

This is a repository forked from argman/EAST for the ICPR MTWI 2018 Challenge II.
Original repository: argman/EAST, a TensorFlow re-implementation of EAST: An Efficient and Accurate Scene Text Detector.
Original author: argman

This repository also draws on HaozhengLi/EAST_ICPR.
Original repository: HaozhengLi/EAST_ICPR
Original author: HaozhengLi

Author: Qichao Wu
Email: [email protected] or [email protected]

Contents

  1. Dataset and Transform
  2. Models
  3. Demo
  4. Train
  5. Test
  6. Results

Dataset and Transform

The datasets used for model training include ICDAR 2017 MLT (train + val), RCTW-17 (train) and ICPR MTWI 2018. ICPR MTWI 2018 includes 9000 training samples <ICPR_text_train_part2_20180313> and 1000 validation samples <(update)ICPR_text_train_part1_20180316>.

Some of the data is abnormal for argman/EAST, for example in ICPR_text_train_part2_20180313 and (update)ICPR_text_train_part1_20180316. "Abnormal" means that the ground-truth polygons are listed anticlockwise, or that the images do not have 3 channels. Such data causes errors like 'poly in wrong direction' when using argman/EAST.

When using argman/EAST to train or test, the images and ground-truth label files must be renamed to <img_1>, <img_2>, ..., <img_xxx> and <txt_1>, <txt_2>, ..., <txt_xxx>, because the file names in ICPR MTWI 2018 are irregular, e.g. <T1cMkaFMFcXXXXXXXX_!!0-item_pic.jpg> instead of <img_***.jpg>. Otherwise errors occur in argman/EAST#test.

So I wrote a Python program to check and transform the dataset. The program, <getTxt.py>, is in the folder 'script/' and its parameters are described below:

#input
gt_text_dir = "./txt_9000"                  #original ground-truth labels
image_dir = "./image_9000/*.jpg"            #original images, which must have 3 channels (assumed to be JPG; if your images use another format, change the suffix)
#output
revised_text_dir = "./trainData"            #renamed txt files for EAST, with the text-box coordinates reordered clockwise
imgs_save_dir = "./trainData"               #renamed images for EAST
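The orientation fix can be sketched as follows. This is not the author's getTxt.py: it is a minimal sketch assuming each label line has the form `x1,y1,x2,y2,x3,y3,x4,y4,text` with exactly four vertices, and the function names are hypothetical. It uses a signed edge sum over the quadrilateral in image coordinates (y pointing down), which is negative when the vertices run clockwise on screen and positive when they run anticlockwise, the case that argman/EAST reports as 'poly in wrong direction'.

```python
def signed_edge_sum(coords):
    """Sum of (x2 - x1) * (y2 + y1) over the four edges of a quadrilateral.

    coords is a flat sequence x1, y1, ..., x4, y4 in image coordinates
    (y grows downwards). The result is negative when the vertices run
    clockwise on screen and positive when they run anticlockwise.
    """
    pts = [(float(coords[2 * i]), float(coords[2 * i + 1])) for i in range(4)]
    total = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += (x2 - x1) * (y2 + y1)
    return total


def fix_label_line(line):
    """Return the label line with its quadrilateral listed clockwise.

    Assumed format: x1,y1,x2,y2,x3,y3,x4,y4,text (text may contain commas).
    Coordinates are reordered as strings, so their original formatting is kept.
    """
    parts = line.rstrip("\r\n").split(",", 8)  # at most 8 splits: the text stays whole
    xy = parts[:8]
    if signed_edge_sum(xy) > 0:
        # Anticlockwise: keep vertex 0 and reverse the traversal order (0, 3, 2, 1).
        pairs = [xy[2 * i:2 * i + 2] for i in range(4)]
        xy = [c for i in (0, 3, 2, 1) for c in pairs[i]]
    return ",".join(xy + parts[8:])
```

For example, `fix_label_line("0,0,0,10,10,10,10,0,abc")` returns `"0,0,10,0,10,10,0,10,abc"`. Reordering to (0, 3, 2, 1) keeps the first vertex and only reverses the traversal direction, so the box itself is unchanged.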

Before running getTxt.py to transform the dataset for argman/EAST, make sure that all original images have 3 channels. I wrote a C++ program to select the abnormal images (not in 3 channels) from the dataset. The program, <change_three_channels.cpp>, is in the folder 'script/' and its parameters are described below:

string dir_path = "./image_9000/";             //original images, including the abnormal ones
string output_path = "./output/";              //abnormal images (not in 3 channels) selected from dir_path

When you get the abnormal images output by change_three_channels.cpp, convert them into normal 3-channel images with another tool such as Format Factory (e.g. convert them to JPG format in Format Factory).
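If Format Factory is not available, the select-and-convert step could also be done in Python. This is a sketch assuming Pillow is installed, that "not in 3 channels" means any Pillow mode other than 'RGB' (e.g. 'L', 'RGBA', 'P', 'CMYK'), and that re-saving as JPEG is acceptable. fix_directory and its helpers are hypothetical names, not scripts from 'script/'.

```python
import glob
import os

from PIL import Image


def needs_conversion(img):
    """True if the image is not plain 3-channel RGB (e.g. mode 'L', 'RGBA', 'P' or 'CMYK')."""
    return img.mode != "RGB"


def to_three_channels(img):
    """Return an RGB copy: grayscale/palette images are expanded, alpha is dropped."""
    return img.convert("RGB")


def fix_directory(src_glob, out_dir):
    """Re-save every non-RGB image matched by src_glob as an RGB JPEG in out_dir.

    Returns the list of converted source paths. Images that are already RGB
    are left alone, mirroring the select-then-convert workflow described above.
    """
    os.makedirs(out_dir, exist_ok=True)
    converted = []
    for path in sorted(glob.glob(src_glob)):
        with Image.open(path) as img:
            if needs_conversion(img):
                rgb = to_three_channels(img)
                rgb.save(os.path.join(out_dir, os.path.basename(path)), format="JPEG")
                converted.append(path)
    return converted
```

A call such as `fix_directory("./image_9000/*.jpg", "./output/")` keeps the original file names, so the converted images can be copied back over the originals before running getTxt.py.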

I have already converted ICPR MTWI 2018 for EAST: ICPR2018_training contains 9000 training images + txt files, and ICPR2018_validation contains 1000 validation images + txt files.
I have also converted ICDAR 2017 MLT (train + val) for EAST: ICDAR2017_training contains 1600 training images + txt files, and ICDAR2017_validation contains 400 images + txt files.
I have converted RCTW-17 (train) as well, but it is too large to upload, so you may need to convert it yourself.

Models

  1. Use ICPR2018_training and a learning rate of 0.0001 to train the Resnet_V1_50 model pre-trained on ICDAR 2013 (train) + ICDAR 2015 (train). This pre-trained model is provided by argman/EAST and was trained for 50k iterations.
    The 100k-iteration model is 50net-100k, the 270k-iteration model is 50net-270k, and the 900k-iteration model is 50net-900k.
  2. Use ICPR2018_training, ICDAR2017_training, ICDAR2017_validation, RCTW-17 (train) and a learning rate of 0.0001 to train the Resnet_V1_101 model. The pre-trained model is slim_resnet_v1_101, provided by TensorFlow-Slim.
    The 230k-iteration model is 101net-mix-230k.
  3. Use ICPR2018_training, ICDAR2017_training, ICDAR2017_validation, RCTW-17 (train) and a learning rate of 0.001 to train the Resnet_V1_101 model. The pre-trained model is 101net-mix-230k.
    The 330k-iteration model is 101net-mix-10*lr-330k.
  4. Use ICPR2018_training and a learning rate of 0.0001 to train the Resnet_V1_101 model. The pre-trained model is mix-10lr-330k.
    The 460k-iteration model is 101net-460k.
  5. Use ICPR2018_training and a learning rate of 0.0001 to train the Resnet_V1_101 model. The pre-trained model is 101net-mix-230k.
    The 300k-, 400k-, 500k- and 550k-iteration models are 101net-300k, 101net-400k, 101net-500k and 101net-550k.
  6. Use ICPR2018_training and a learning rate of 0.0001 with data augmentation to train the Resnet_V1_101 model. The pre-trained model is 101net-550k.
    The 700k-iteration model is 101net-arg-700k, and the 1000k-iteration model is 101net-arg-1000k.

Demo

Download the pre-trained models and run:

python run_demo_server.py --checkpoint-path models/east_icpr2018_resnet_v1_50_rbox_100k/

Then open http://localhost:8769 to use the web demo, or find the results in 'static/results/'.
Note: See argman/EAST#demo for more details.

Train

Prepare the training set and run:

python multigpu_train.py --gpu_list=0 --input_size=512 --batch_size_per_gpu=14 --checkpoint_path=/tmp/east_icdar2015_resnet_v1_50_rbox/ \
--text_scale=512 --training_data_path=/data/ocr/icdar2015/ --geometry=RBOX --learning_rate=0.0001 --num_readers=24 \
--pretrained_model_path=/tmp/resnet_v1_50.ckpt

Note 1: The images and ground-truth label files must be renamed to <img_1>, <img_2>, ..., <img_xxx> when using argman/EAST. See the examples in the folder 'training_samples/'.
Note 2: If --restore=True, training restores from the checkpoint and ignores --pretrained_model_path. If --restore=False, training deletes the existing checkpoint and initializes from --pretrained_model_path (if it exists).
Note 3: To change the learning rate when resuming training, set --learning_rate on the command line to (the learning rate you want at the current step) / (the actual learning rate at the current step) × (the original --learning_rate).
Note 4: See argman/EAST#train for more details.
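Note 3 follows from the learning-rate schedule. Assuming multigpu_train.py in argman/EAST builds the rate with tf.train.exponential_decay(FLAGS.learning_rate, global_step, decay_steps=10000, decay_rate=0.94, staircase=True) (please verify these constants in your copy), the actual rate at step t is lr_cmd × 0.94^⌊t/10000⌋. Since global_step is restored from the checkpoint, the decay factor keeps its value, so the new command-line rate has to be the target rate divided by that factor, which is the formula in Note 3. The function names below are hypothetical.

```python
def effective_lr(base_lr, step, decay_steps=10000, decay_rate=0.94):
    """Staircase exponential decay: base_lr * decay_rate ** floor(step / decay_steps)."""
    return base_lr * decay_rate ** (step // decay_steps)


def restart_lr(target_lr, original_lr, step, decay_steps=10000, decay_rate=0.94):
    """Value to pass as --learning_rate so that the rate at `step` becomes target_lr.

    Implements Note 3: target / (actual rate at step) * original --learning_rate.
    The default decay constants are assumptions of this sketch.
    """
    current = effective_lr(original_lr, step, decay_steps, decay_rate)
    return target_lr / current * original_lr
```

For instance, restart_lr(1e-5, 1e-4, 250000) gives the value to pass when resuming at step 250k with an original --learning_rate of 0.0001 and a desired rate of 1e-5.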

When you use the Resnet_V1_101 model, you need to modify three parts of the code in argman/EAST:

1. model.py

with slim.arg_scope(resnet_v1.resnet_arg_scope(weight_decay=weight_decay)):
    # logits, end_points = resnet_v1.resnet_v1_50(images, is_training=is_training, scope='resnet_v1_50')
    logits, end_points = resnet_v1.resnet_v1_101(images, is_training=is_training, scope='resnet_v1_101')

2. nets/resnet_v1.py

if __name__ == '__main__':
    input = tf.placeholder(tf.float32, shape=(None, 224, 224, 3), name='input')
    with slim.arg_scope(resnet_arg_scope()) as sc:
        # logits = resnet_v1_50(input)
        logits = resnet_v1_101(input)

3. nets/resnet_v1.py

try:
    # end_points['pool3'] = end_points['resnet_v1_50/block1']
    # end_points['pool4'] = end_points['resnet_v1_50/block2']
    end_points['pool3'] = end_points['resnet_v1_101/block1']
    end_points['pool4'] = end_points['resnet_v1_101/block2']
except:
    #end_points['pool3'] = end_points['Detection/resnet_v1_50/block1']
    #end_points['pool4'] = end_points['Detection/resnet_v1_50/block2']
    end_points['pool3'] = end_points['Detection/resnet_v1_101/block1']
    end_points['pool4'] = end_points['Detection/resnet_v1_101/block2']
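Instead of commenting lines in and out whenever the depth changes, the lookup could be parameterized by depth. The helper below is hypothetical (it is not part of argman/EAST); it assumes end_points is a plain dict keyed by scope names such as 'resnet_v1_101/block1', as in the snippet above, and it raises KeyError instead of relying on a bare except.

```python
def attach_pool_endpoints(end_points, depth=101):
    """Hypothetical helper: alias 'pool3'/'pool4' to block1/block2 of resnet_v1_<depth>.

    Tries the plain scope first and then the 'Detection/' prefix, like the
    try/except above, but works for any depth.
    """
    for prefix in ("", "Detection/"):
        base = "{}resnet_v1_{}".format(prefix, depth)
        if base + "/block1" in end_points:
            end_points["pool3"] = end_points[base + "/block1"]
            end_points["pool4"] = end_points[base + "/block2"]
            return end_points
    raise KeyError("no block endpoints found for resnet_v1_{}".format(depth))
```

Inside resnet_v1, the try/except would then become a single call such as attach_pool_endpoints(end_points, depth=101).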

When you use data augmentation, add two pieces of code to argman/EAST:

1. nets/resnet_v1.py

#add before resnet_v1 function
def gaussian_noise_layer(input_layer, std):
    noise = tf.random_normal(shape=tf.shape(input_layer), mean=0.0, stddev=std, dtype=tf.float32)
    return input_layer + noise/250

2. nets/resnet_v1.py

with slim.arg_scope([slim.batch_norm], is_training=is_training):
    if is_training:                                                      # apply augmentation only during training, not at test time
        inputs = gaussian_noise_layer(inputs, 1)                         # add Gaussian-noise data augmentation
        inputs = tf.image.random_brightness(inputs, 32. / 255)           # add brightness data augmentation
        inputs = tf.image.random_contrast(inputs, lower=0.5, upper=1.5)  # add contrast data augmentation
    net = inputs
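The strength of these augmentations depends on the scale of inputs. To inspect them outside TensorFlow, the NumPy sketch below approximates them under stated assumptions: tf.image.random_brightness is modeled as adding one uniform delta in [-max_delta, max_delta), and tf.image.random_contrast as scaling each channel around its per-image mean by a factor drawn from [lower, upper]. augment_preview is a hypothetical name, and the result is not bit-identical to TensorFlow.

```python
import numpy as np


def augment_preview(img, rng, noise_std=1.0, max_delta=32.0 / 255,
                    lower=0.5, upper=1.5):
    """Approximate the three augmentations above on an H x W x C (or N x H x W x C) array."""
    x = img.astype(np.float32)
    # gaussian_noise_layer(inputs, 1): N(0, noise_std) noise scaled down by 250
    x = x + rng.normal(0.0, noise_std, size=x.shape).astype(np.float32) / 250
    # random_brightness: a single delta from [-max_delta, max_delta) added everywhere
    x = x + rng.uniform(-max_delta, max_delta)
    # random_contrast: scale each channel around its per-image spatial mean
    factor = rng.uniform(lower, upper)
    mean = x.mean(axis=(-3, -2), keepdims=True)
    return (x - mean) * factor + mean
```

Calling augment_preview(image, np.random.default_rng(0)) on a uint8 image returns a float32 array; clip it to [0, 255] and cast back to uint8 before saving it for a visual check.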

Test

When you use argman/EAST for testing, the image names in ICPR MTWI 2018 are irregular, e.g. <T1cMkaFMFcXXXXXXXX_!!0-item_pic.jpg> instead of <img_***.jpg>, which causes errors in argman/EAST#test.
So I wrote Python programs to rename the dataset and to reverse the renaming. Before evaluating, run <changeImageName.py> to give the images regular names. This program is in the folder 'script/' and its parameters are described below:

#input
image_dir = "./image_test/*.jpg"                         #original image names (possibly irregular, e.g. <T1cMkaFMFcXXXXXXXX_!!0-item_pic.jpg>)
#output
imgs_save_dir = "./image_test_change"                    #renamed images (e.g. <img_1.jpg>)

After evaluating, the output folder contains both images with bounding boxes and txt files. To get txt files with the original names, delete the images in the output folder and then inversely rename the txt files.
I wrote two Python programs for this. First, run <deleteImage.py> to delete the images in the folder. This program is in the folder 'script/' and its parameters are described below:

#input 
output_dir = "./output/"        #original output file folder(txt and images)
#output 
output_dir = "./output/"        #processed output file folder(only txt)

Second, run <rechangeTxtName.py> to inversely rename the txt files in the output folder. This program is in the folder 'script/' and its parameters are described below:

#input
image_dir = "./image_test/*.jpg"     #original images
gt_text_dir = "./txt_test"           #folder containing the renamed txt files, e.g. <txt_1>
#output
gt_text_dir = "./txt_test"           #folder containing the inversely renamed txt files, named after the original images (e.g. <T1cMkaFMFcXXXXXXXX_!!0-item_pic.jpg>) rather than <img_1.jpg>
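The rename, delete and inverse-rename round trip can be sketched as follows. It is not the author's code: it assumes the short names are img_<i> numbered in the sorted order of the original file names, that detection results are named <short stem>.txt, and that each result file should be named after the original image stem; check the naming the ICPR MTWI submission expects before relying on the last point. All function names are hypothetical.

```python
import os


def build_mapping(image_names):
    """Map each original image file name to img_<i><ext>, numbering from 1.

    Sorting makes the mapping reproducible, so the inverse step can rebuild it
    from the same image folder instead of storing it.
    """
    mapping = {}
    for i, name in enumerate(sorted(image_names), start=1):
        mapping[name] = "img_{}{}".format(i, os.path.splitext(name)[1])
    return mapping


def inverse_txt_name(short_txt, mapping):
    """Turn e.g. 'img_3.txt' back into '<original image stem>.txt'."""
    stem_to_original = {os.path.splitext(short)[0]: os.path.splitext(orig)[0]
                        for orig, short in mapping.items()}
    return stem_to_original[os.path.splitext(short_txt)[0]] + ".txt"


def keep_only_txt(output_dir):
    """Delete every non-.txt file in output_dir (the deleteImage.py step)."""
    for name in os.listdir(output_dir):
        path = os.path.join(output_dir, name)
        if os.path.isfile(path) and not name.lower().endswith(".txt"):
            os.remove(path)
```

Because build_mapping depends only on the list of original names, running it again on the same image folder after evaluation reproduces the mapping needed by inverse_txt_name.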

If you want to see the output results on the images, you can draw the output bounding boxes on the original images.
So I wrote a Python program that reads images and txt files in a way compatible with Chinese, then draws and saves the images with the output bounding boxes. The program, <check.py>, is in the folder 'script/' and its parameters are described below:

#input
gt_text_dir = "./txt_test"                 #output labels (bounding boxes) folder
image_dir = "./image_test/*.jpg"           #original images folder
#output
imgs_save_dir = "./processImageTest"       #where to save the images with output bounding boxes

I wrote a Python program to evaluate the detection performance. The program, <getACC.py>, is in the folder 'script/' and its parameters are described below:

#input
gt_text_dir = "./traintxt9000/"      # ground-truth directory
#output
test_text_dir = "./output/"          # directory of detection results produced by the model
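The exact metric computed by getACC.py is not described here. As a generic stand-in, the sketch below scores detections against ground truth with one-to-one matching at IoU ≥ 0.5, in the spirit of ICDAR-style evaluation; it is not necessarily the official ICPR MTWI metric. It assumes every box is a convex quadrilateral given as eight coordinates, computes intersections with Sutherland-Hodgman clipping to avoid a Shapely dependency, and does not handle don't-care regions. All names are hypothetical.

```python
def polygon_area(pts):
    """Absolute polygon area by the shoelace formula (0 for an empty polygon)."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def _cross(a, b, p):
    """z-component of (b - a) x (p - a); >= 0 means p is left of a->b or on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _line_hit(p, q, a, b):
    """Intersection of segment p-q with the infinite line through a and b."""
    den = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
    t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / den
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip_convex(subject, clipper):
    """Sutherland-Hodgman clipping of polygon `subject` by convex polygon `clipper`."""
    signed = sum(x1 * y2 - x2 * y1
                 for (x1, y1), (x2, y2) in zip(clipper, clipper[1:] + clipper[:1]))
    if signed < 0:
        clipper = clipper[::-1]  # orient the clipper so its interior is on the left
    out = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, out = out, []
        for j, cur in enumerate(inp):
            prev = inp[j - 1]
            if _cross(a, b, cur) >= 0:
                if _cross(a, b, prev) < 0:
                    out.append(_line_hit(prev, cur, a, b))
                out.append(cur)
            elif _cross(a, b, prev) >= 0:
                out.append(_line_hit(prev, cur, a, b))
    return out


def quad_iou(a, b):
    """IoU of two convex quadrilaterals given as flat x1,y1,...,x4,y4 sequences."""
    pa = [(float(a[2 * i]), float(a[2 * i + 1])) for i in range(4)]
    pb = [(float(b[2 * i]), float(b[2 * i + 1])) for i in range(4)]
    inter = polygon_area(clip_convex(pa, pb))
    union = polygon_area(pa) + polygon_area(pb) - inter
    return inter / union if union > 0 else 0.0


def evaluate(gts, dets, thresh=0.5):
    """Greedy one-to-one matching at IoU >= thresh; returns (precision, recall, f1)."""
    used, matched = set(), 0
    for d in dets:
        best, best_j = 0.0, None
        for j, g in enumerate(gts):
            if j not in used:
                v = quad_iou(d, g)
                if v > best:
                    best, best_j = v, j
        if best_j is not None and best >= thresh:
            used.add(best_j)
            matched += 1
    precision = matched / len(dets) if dets else 0.0
    recall = matched / len(gts) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

To score a whole test set, one would accumulate the matched, detection and ground-truth counts over all image pairs before computing precision and recall, rather than averaging per-image F1 scores.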

Finally, if you want to compress the output txt files for submission, run 'zip -r sample_task2.zip sample_task2' to get the .zip file.

Results

Here are some results on ICPR MTWI 2018:

[Figure: example detection results on ICPR MTWI 2018]

Hope this helps you
