Implementation of our paper 'PixelLink: Detecting Scene Text via Instance Segmentation' in AAAI2018

Overview

Code for the AAAI18 paper PixelLink: Detecting Scene Text via Instance Segmentation, by Dan Deng, Haifeng Liu, Xuelong Li, and Deng Cai.

Contributions to this repo are welcome, e.g., some other backbone networks (including the model definition and pretrained models).

PLEASE CHECK EXSITING ISSUES BEFORE OPENNING YOUR OWN ONE. IF A SAME OR SIMILAR ISSUE HAD BEEN POSTED BEFORE, JUST REFER TO IT, AND DO NO OPEN A NEW ONE.

Installation

Clone the repo

git clone --recursive [email protected]:ZJULearning/pixel_link.git

Denote the root directory path of pixel_link by ${pixel_link_root}.

Add the path of ${pixel_link_root}/pylib/src to your PYTHONPATH:

export PYTHONPATH=${pixel_link_root}/pylib/src:$PYTHONPATH

Prerequisites

(Only tested on) Ubuntu14.04 and 16.04 with:

  • Python 2.7
  • Tensorflow-gpu >= 1.1
  • opencv2
  • setproctitle
  • matplotlib

Anaconda is recommended to for an easier installation:

  1. Install Anaconda
  2. Create and activate the required virtual environment by:
conda env create --file pixel_link_env.txt
source activate pixel_link

Testing

Download the pretrained model

Unzip the downloaded model. It contains 4 files:

  • config.py
  • model.ckpt-xxx.data-00000-of-00001
  • model.ckpt-xxx.index
  • model.ckpt-xxx.meta

Denote their parent directory as ${model_path}.

Test on ICDAR2015

The reported results on ICDAR2015 are:

Model Recall Precision F-mean
PixelLink+VGG16 2s 82.0 85.5 83.7
PixelLink+VGG16 4s 81.7 82.9 82.3

Suppose you have downloaded the ICDAR2015 dataset, execute the following commands to test the model on ICDAR2015:

cd ${pixel_link_root}
./scripts/test.sh ${GPU_ID} ${model_path}/model.ckpt-xxx ${path_to_icdar2015}/ch4_test_images

For example:

./scripts/test.sh 3 ~/temp/conv3_3/model.ckpt-38055 ~/dataset/ICDAR2015/Challenge4/ch4_test_images

The program will create a zip file of detection results, which can be submitted to the ICDAR2015 server directly. The detection results can be visualized via scripts/vis.sh.

Here are some samples: ./samples/img_333_pred.jpg ./samples/img_249_pred.jpg

Test on any images

Put the images to be tested in a single directory, i.e., ${image_dir}. Then:

cd ${pixel_link_root}
./scripts/test_any.sh ${GPU_ID} ${model_path}/model.ckpt-xxx ${image_dir}

For example:

 ./scripts/test_any.sh 3 ~/temp/conv3_3/model.ckpt-38055 ~/dataset/ICDAR2015/Challenge4/ch4_training_images

The program will visualize the detection results directly on images. If the detection result is not satisfying, try to:

  1. Adjust the inference parameters like eval_image_width, eval_image_height, pixel_conf_threshold, link_conf_threshold.
  2. Or train your own model.

Training

Converting the dataset to tfrecords files

Scripts for converting ICDAR2015 and SynthText datasets have been provided in the datasets directory. It not hard to write a converting script for your own dataset.

Train your own model

  • Modify scripts/train.sh to configure your dataset name and dataset path like:
DATASET=icdar2015
DATASET_DIR=$HOME/dataset/pixel_link/icdar2015
  • Start training
./scripts/train.sh ${GPU_IDs} ${IMG_PER_GPU}

For example, ./scripts/train.sh 0,1,2 8.

The existing training strategy in scripts/train.sh is configured for icdar2015, modify it if necessary. A lot of training or model options are available in config.py, try it yourself if you are interested.

Acknowlegement

Usando o Amazon Textract como OCR para Extração de Dados no DynamoDB

dio-live-textract2 Repositório de código para o live coding do dia 05/10/2021 sobre extração de dados estruturados e gravação em banco de dados a part

hugoportela 0 Jan 19, 2022
一款基于Qt与OpenCV的仿真数字示波器

一款基于Qt与OpenCV的仿真数字示波器

郭赟 4 Nov 02, 2022
An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

GeorgeJoe 115 Dec 12, 2022
Official code for :rocket: Unsupervised Change Detection of Extreme Events Using ML On-Board :rocket:

RaVAEn The RaVÆn system We introduce the RaVÆn system, a lightweight, unsupervised approach for change detection in satellite data based on Variationa

SpaceML 35 Jan 05, 2023
Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform sign language recognition.

Sign Language Recognition Service This is a Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform s

Martin Lønne 1 Jan 08, 2022
[EMNLP 2021] Improving and Simplifying Pattern Exploiting Training

ADAPET This repository contains the official code for the paper: "Improving and Simplifying Pattern Exploiting Training". The model improves and simpl

Rakesh R Menon 138 Dec 26, 2022
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted. ocrmypdf # it's a scriptable c

jbarlow83 7.9k Jan 03, 2023
A facial recognition device is a device that takes an image or a video of a human face and compares it to another image faces in a database.

A facial recognition device is a device that takes an image or a video of a human face and compares it to another image faces in a database. The structure, shape and proportions of the faces are comp

Pavankumar Khot 4 Mar 19, 2022
Assignment work with webcam

work with webcam : Press key 1 to use emojy on your face Press key 2 to use lip and eye on your face Press key 3 to checkered your face Press key 4 to

Hanane Kheirandish 2 May 31, 2022
This is the code for our paper DAAIN: Detection of Anomalous and AdversarialInput using Normalizing Flows

Merantix-Labs: DAAIN This is the code for our paper DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows which can be found at

Merantix 14 Oct 12, 2022
Single Shot Text Detector with Regional Attention

Single Shot Text Detector with Regional Attention Introduction SSTD is initially described in our ICCV 2017 spotlight paper. A third-party implementat

Pan He 215 Dec 07, 2022
利用Paddle框架复现CRAFT

CRAFT-Paddle 利用Paddle框架复现CRAFT CRAFT 本项目基于paddlepaddle框架复现CRAFT,并参加百度第三届论文复现赛,将在2021年5月15日比赛完后提供AIStudio链接~敬请期待 参考项目: CRAFT: Character-Region Awarenes

QuanHao Guo 2 Mar 07, 2022
Programa que viabiliza a OCR (Optical Character Reading - leitura óptica de caracteres) de um PDF.

Este programa tem o intuito de ser um modificador de arquivos PDF. Os arquivos PDFs podem ser 3: PDFs verdadeiros - em que podem ser selecionados o ti

Daniel Soares Saldanha 2 Oct 11, 2021
The virtual calculator will be above the live streaming from your camera

The virtual calculator is above the live streaming from my camera usb , the program first detect my hand and in each frame calculate the distance between two finger ,if the distance is lower than the

gasbaoui mohammed al amine 5 Jul 01, 2022
Amazing 3D explosion animation using Pygame module.

3D Explosion Animation 💣 💥 🔥 Amazing explosion animation with Pygame. 💣 Explosion physics An Explosion instance is made of a set of Particle objec

Dylan Tintenfich 12 Mar 11, 2022
Textboxes : Image Text Detection Model : python package (tensorflow)

shinTB Abstract A python package for use Textboxes : Image Text Detection Model implemented by tensorflow, cv2 Textboxes Paper Review in Korean (My Bl

Jayne Shin (신재인) 91 Dec 15, 2022
OCR of Chicago 1909 Renumbering Plan

Requirements: Python 3 (probably at least 3.4) pipenv (pip3 install pipenv) tesseract (brew install tesseract, at least if you have a mac and homebrew

ted whalen 2 Nov 21, 2021
Convert Text-to Handwriting Using Python

Convert Text-to Handwriting Using Python Description In this project we'll use python library that's "pywhatkit" for converting text to handwriting. t

8 Nov 19, 2022
Implementation of our paper 'PixelLink: Detecting Scene Text via Instance Segmentation' in AAAI2018

Code for the AAAI18 paper PixelLink: Detecting Scene Text via Instance Segmentation, by Dan Deng, Haifeng Liu, Xuelong Li, and Deng Cai. Contributions

758 Dec 22, 2022
Detect textlines in document images

Textline Detection Detect textlines in document images Introduction This tool performs border, region and textline detection from document image data

QURATOR-SPK 70 Jun 30, 2022