Implementation of our paper 'PixelLink: Detecting Scene Text via Instance Segmentation' in AAAI2018

Overview

Code for the AAAI18 paper PixelLink: Detecting Scene Text via Instance Segmentation, by Dan Deng, Haifeng Liu, Xuelong Li, and Deng Cai.

Contributions to this repo are welcome, e.g., some other backbone networks (including the model definition and pretrained models).

PLEASE CHECK EXSITING ISSUES BEFORE OPENNING YOUR OWN ONE. IF A SAME OR SIMILAR ISSUE HAD BEEN POSTED BEFORE, JUST REFER TO IT, AND DO NO OPEN A NEW ONE.

Installation

Clone the repo

git clone --recursive [email protected]:ZJULearning/pixel_link.git

Denote the root directory path of pixel_link by ${pixel_link_root}.

Add the path of ${pixel_link_root}/pylib/src to your PYTHONPATH:

export PYTHONPATH=${pixel_link_root}/pylib/src:$PYTHONPATH

Prerequisites

(Only tested on) Ubuntu14.04 and 16.04 with:

  • Python 2.7
  • Tensorflow-gpu >= 1.1
  • opencv2
  • setproctitle
  • matplotlib

Anaconda is recommended to for an easier installation:

  1. Install Anaconda
  2. Create and activate the required virtual environment by:
conda env create --file pixel_link_env.txt
source activate pixel_link

Testing

Download the pretrained model

Unzip the downloaded model. It contains 4 files:

  • config.py
  • model.ckpt-xxx.data-00000-of-00001
  • model.ckpt-xxx.index
  • model.ckpt-xxx.meta

Denote their parent directory as ${model_path}.

Test on ICDAR2015

The reported results on ICDAR2015 are:

Model Recall Precision F-mean
PixelLink+VGG16 2s 82.0 85.5 83.7
PixelLink+VGG16 4s 81.7 82.9 82.3

Suppose you have downloaded the ICDAR2015 dataset, execute the following commands to test the model on ICDAR2015:

cd ${pixel_link_root}
./scripts/test.sh ${GPU_ID} ${model_path}/model.ckpt-xxx ${path_to_icdar2015}/ch4_test_images

For example:

./scripts/test.sh 3 ~/temp/conv3_3/model.ckpt-38055 ~/dataset/ICDAR2015/Challenge4/ch4_test_images

The program will create a zip file of detection results, which can be submitted to the ICDAR2015 server directly. The detection results can be visualized via scripts/vis.sh.

Here are some samples: ./samples/img_333_pred.jpg ./samples/img_249_pred.jpg

Test on any images

Put the images to be tested in a single directory, i.e., ${image_dir}. Then:

cd ${pixel_link_root}
./scripts/test_any.sh ${GPU_ID} ${model_path}/model.ckpt-xxx ${image_dir}

For example:

 ./scripts/test_any.sh 3 ~/temp/conv3_3/model.ckpt-38055 ~/dataset/ICDAR2015/Challenge4/ch4_training_images

The program will visualize the detection results directly on images. If the detection result is not satisfying, try to:

  1. Adjust the inference parameters like eval_image_width, eval_image_height, pixel_conf_threshold, link_conf_threshold.
  2. Or train your own model.

Training

Converting the dataset to tfrecords files

Scripts for converting ICDAR2015 and SynthText datasets have been provided in the datasets directory. It not hard to write a converting script for your own dataset.

Train your own model

  • Modify scripts/train.sh to configure your dataset name and dataset path like:
DATASET=icdar2015
DATASET_DIR=$HOME/dataset/pixel_link/icdar2015
  • Start training
./scripts/train.sh ${GPU_IDs} ${IMG_PER_GPU}

For example, ./scripts/train.sh 0,1,2 8.

The existing training strategy in scripts/train.sh is configured for icdar2015, modify it if necessary. A lot of training or model options are available in config.py, try it yourself if you are interested.

Acknowlegement

Binarize document images

Binarization Binarization for document images Examples Introduction This tool performs document image binarization (i.e. transform colour/grayscale to

QURATOR-SPK 48 Jan 02, 2023
Python Computer Vision Aim Bot for Roblox's Phantom Forces

Python-Phantom-Forces-Aim-Bot Python Computer Vision Aim Bot for Roblox's Phanto

drag0ngam3s 2 Jul 11, 2022
A curated list of awesome synthetic data for text location and recognition

awesome-SynthText A curated list of awesome synthetic data for text location and recognition and OCR datasets. Text location SynthText SynthText_Chine

Tianzhong 283 Jan 05, 2023
This tool will help you convert your text to handwriting xD

So your teacher asked you to upload written assignments? Hate writing assigments? This tool will help you convert your text to handwriting xD

Saurabh Daware 4.2k Jan 07, 2023
A bot that plays TFT using OCR. Keeps track of bench, board, items, and plays the user defined team comp.

NOTES: To ensure best results, make sure you are running this on a computer that has decent specs. 1920x1080 fullscreen is required in League, game mu

francis 125 Dec 30, 2022
Super Mario Game With Python

Super_Mario Hello all this is a simple python program which tries to use our body as a controller for the super mario game Here I have used media pipe

Adarsh Badagala 219 Nov 25, 2022
Application that instantly translates sign-language to letters.

Sign Language Translator Project Description The main purpose of project is translating sign-language to letters. In accordance with this purpose we d

3 Sep 29, 2022
Motion Detection Squid Game with OpenCV Python

*Motion Detection Squid Game with OpenCV Python i am newbie in python. In this project I made a simple game to follow the trend about the red light gr

Nayan 17 Nov 22, 2022
Unofficial implementation of "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images"

TableNet Unofficial implementation of ICDAR 2019 paper : TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from

Jainam Shah 243 Dec 30, 2022
A simple document layout analysis using Python-OpenCV

Run the application: python main.py *Note: For first time running the application, create a folder named "output". The application is a simple documen

Roinand Aguila 109 Dec 12, 2022
第一届西安交通大学人工智能实践大赛(2018AI实践大赛--图片文字识别)第一名;仅采用densenet识别图中文字

OCR 第一届西安交通大学人工智能实践大赛(2018AI实践大赛--图片文字识别)冠军 模型结果 该比赛计算每一个条目的f1score,取所有条目的平均,具体计算方式在这里。这里的计算方式不对一句话里的相同文字重复计算,故f1score比提交的最终结果低: - train val f1score 0

尹畅 441 Dec 22, 2022
A synthetic data generator for text recognition

TextRecognitionDataGenerator A synthetic data generator for text recognition What is it for? Generating text image samples to train an OCR software. N

Edouard Belval 2.5k Jan 04, 2023
Document Layout Analysis

Eynollah Document Layout Analysis Introduction This tool performs document layout analysis (segmentation) from image data and returns the results as P

QURATOR-SPK 198 Dec 29, 2022
Semantic-based Patch Detection for Binary Programs

PMatch Semantic-based Patch Detection for Binary Programs Requirement tensorflow-gpu 1.13.1 numpy 1.16.2 scikit-learn 0.20.3 ssdeep 3.4 Usage tar -xvz

Mr.Curiosity 3 Sep 02, 2022
Pixel art search engine for opengameart

Pixel Art Reverse Image Search for OpenGameArt What does the final search look like? The final search with an example can be found here. It looks like

Eivind Magnus Hvidevold 92 Nov 06, 2022
Face Detection with DLIB

Face Detection with DLIB In this project, we have detected our face with dlib and opencv libraries. Setup This Project Install DLIB & OpenCV You can i

Can 2 Jan 16, 2022
Creating a virtual tv using opencv in python3.

Virtual-TV Creating a virtual tv using opencv in python3. In order to run the code follow the below given steps: Make sure the desired videos which ar

Vamsi 1 Jan 01, 2022
This project proposes a camera vision based cursor control system, using hand moment captured from a webcam through a landmarks of hand by using Mideapipe module

This project proposes a camera vision based cursor control system, using hand moment captured from a webcam through a landmarks of hand by using Mideapipe module

Chandru 2 Feb 20, 2022
An interactive interface for using OpenCV's GrabCut algorithm for image segmentation.

Interactive GrabCut An interactive interface for using OpenCV's GrabCut algorithm for image segmentation. Setup Install dependencies: pip install nump

Jason Y. Zhang 16 Oct 10, 2022
Using computer vision method to recognize and calcutate the features of the architecture.

building-feature-recognition In this repository, we accomplished building feature recognition using traditional/dl-assisted computer vision method. Th

4 Aug 11, 2022