A PyTorch implementation of ECCV2018 Paper: TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Overview

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

A PyTorch implement of TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes (ECCV 2018) by Megvii

Paper

Comparison of different representations for text instances. (a) Axis-aligned rectangle. (b) Rotated rectangle. (c) Quadrangle. (d) TextSnake. Obviously, the proposed TextSnake representation is able to effectively and precisely describe the geometric properties, such as location, scale, and bending of curved text with perspective distortion, while the other representations (axis-aligned rectangle, rotated rectangle or quadrangle) struggle with giving accurate predictions in such cases.

Textsnake elements:

  • center point
  • tangent line
  • text region

Description

Generally, this code has following features:

  1. include complete training and inference code
  2. pure python version without extra compiling
  3. compatible with laste PyTorch version (write with pytroch 0.4.0)
  4. support TotalText and SynthText dataset

Getting Started

This repo includes the training code and inference demo of TextSnake, training and infercence can be simplely run with a few code.

Prerequisites

To run this repo successfully, it is highly recommanded with:

  • Linux (Ubuntu 16.04)
  • Python3.6
  • Anaconda3
  • NVIDIA GPU(with 8G or larger GPU memory for training, 2G for inference)

(I haven't test it on other Python version.)

  1. clone this repository
git clone https://github.com/princewang1994/TextSnake.pytorch.git
  1. python package can be installed with pip
$ cd $TEXTSNAKE_ROOT
$ pip install -r requirements.txt

Data preparation

Pretraining with SynthText

$ CUDA_VISIBLE_DEVICES=$GPUID python train.py synthtext_pretrain --dataset synth-text --viz --max_epoch 1 --batch_size 8

Training

Training model with given experiment name $EXPNAME

training from scratch:

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python train.py $EXPNAME --viz

training with pretrained model(improved performance much)

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python train.py example --viz --batch_size 8 --resume save/synthtext_pretrain/textsnake_vgg_0.pth

options:

  • exp_name: experiment name, used to identify different training processes
  • --viz: visualization toggle, output pictures are saved to ./vis by default

other options can be show by run python train.py -h

Running tests

Runing following command can generate demo on TotalText dataset (300 pictures), the result are save to ./vis by default

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python eval_textsnake.py $EXPNAME --checkepoch 190

options:

  • exp_name: experiment name, used to identify different training process

other options can be show by run python train.py -h

Evaluation

Total-Text metric is included in dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py, you should first modify the input_dir in Deteval.py and run following command for computing DetEval:

$ python dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py $EXPNAME --tr 0.8 --tp 0.4

or

$ python dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py $EXPNAME --tr 0.7 --tp 0.6

it will output metrics reports.

Pretrained Models

Download from links above and place pth file to the corresponding path(save/XXX/textsnake_vgg_XX.pth).

Performance

DetEval reporting

Following table reports DetEval metrics when we set vgg as the backbone(can be reproduced by using pertained model in Pretrained Model section):

tr=0.7 / tp=0.6(P|R|F1) tr=0.8 / tp=0.4(P|R|F1) FPS(On single 1080Ti)
expand / no merge 0.652 | 0.549 | 0.596 0.874 | 0.711 | 0.784 12.07
expand / merge 0.698 | 0.578 | 0.633 0.859 | 0.660 | 0.746 8.38
no expand / no merge 0.753 | 0.693 | 0.722 0.695 | 0.628 | 0.660 9.94
no expand / merge 0.747 | 0.677 | 0.710 0.691 | 0.602 | 0.643 11.05
reported on paper - 0.827 | 0.745 | 0.784

* expand denotes expanding radius by 0.3 times while post-processing

* merge denotes that merging overlapped instance while post-processing

Pure Inference

You can also run prediction on your own dataset without annotations:

  1. Download pretrained model and place .pth file to save/pretrained/textsnake_vgg_180.pth
  2. Run pure inference script as following:
$ EXPNAME=pretrained
$ CUDA_VISIBLE_DEVICES=$GPUID python demo.py $EXPNAME --checkepoch 180 --img_root /path/to/image

predicted result will be saved in output/$EXPNAME and visualization in vis/${EXPNAME}_deploy

Qualitative results

  • left: prediction/ground true
  • middle: text region(TR)
  • right: text center line(TCL)

What is comming

  • Pretraining with SynthText
  • Metric computing
  • Pretrained model upload
  • Pure inference script
  • More dataset suport: [ICDAR15, CTW1500]
  • Various backbone experiments

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgement

Owner
Prince Wang
I'm a CS graduate student from Zhejiang University
Prince Wang
Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.

SynthText Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Ved

Ankush Gupta 1.8k Dec 28, 2022
Go package for OCR (Optical Character Recognition), by using Tesseract C++ library

gosseract OCR Golang OCR package, by using Tesseract C++ library. OCR Server Do you just want OCR server, or see the working example of this package?

Hiromu OCHIAI 1.9k Dec 28, 2022
This is an API written in python that uses FastAPI. It is a simple API that can detect discord tokens in Images.

Welcome This is an API written in python that uses FastAPI. It is a simple API that can detect discord tokens in Images. Installation There are curren

8 Jul 29, 2022
This project is basically to draw lines with your hand, using python, opencv, mediapipe.

Paint Opencv 📷 This project is basically to draw lines with your hand, using python, opencv, mediapipe. Screenshoots 📱 Tools ⚙️ Python Opencv Mediap

Williams Ismael Bobadilla Torres 3 Nov 17, 2021
ISI's Optical Character Recognition (OCR) software for machine-print and handwriting data

VistaOCR ISI's Optical Character Recognition (OCR) software for machine-print and handwriting data Publications "How to Efficiently Increase Resolutio

ISI Center for Vision, Image, Speech, and Text Analytics 21 Dec 08, 2021
Read Japanese manga inside browser with selectable text.

mokuro Read Japanese manga with selectable text inside a browser. See demo: https://kha-white.github.io/manga-demo mokuro_demo.mp4 Demo contains excer

Maciej Budyś 170 Dec 27, 2022
Open Source Differentiable Computer Vision Library for PyTorch

Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer

kornia 7.6k Jan 04, 2023
An advanced 2D image manipulation with features such as edge detection and image segmentation built using OpenCV

OpenCV-ToothPaint3-Advanced-Digital-Image-Editor This application named ‘Tooth Paint’ version TP_2020.3 (64-bit) or version 3 was developed within a w

JunHong 1 Nov 05, 2021
Handwritten Character Recognition using CNN

Handwritten Character Recognition using CNN Problem Definition The main objective of this project is to solve the problem of handwritten character rec

Mohit Kaushik 4 Mar 02, 2022
Ackermann Line Follower Robot Simulation.

Ackermann Line Follower Robot This is a simulation of a line follower robot that works with steering control based on Stanley: The Robot That Won the

Lucas Mazzetto 2 Apr 16, 2022
Generic framework for historical document processing

dhSegment dhSegment is a tool for Historical Document Processing. Its generic approach allows to segment regions and extract content from different ty

Digital Humanities Laboratory 343 Dec 24, 2022
Creating of virtual elements of the graphical interface using opencv and mediapipe.

Virtual GUI Creating of virtual elements of the graphical interface using opencv and mediapipe. Element GUI Output Description Button By default the b

Aleksei 4 Jun 16, 2022
Kornia is a open source differentiable computer vision library for PyTorch.

Open Source Differentiable Computer Vision Library

kornia 7.6k Jan 06, 2023
基于Paddle框架的PSENet复现

PSENet-Paddle 基于Paddle框架的PSENet复现 本项目基于paddlepaddle框架复现PSENet,并参加百度第三届论文复现赛,将在2021年5月15日比赛完后提供AIStudio链接~敬请期待 AIStudio链接 参考项目: whai362-PSENet 环境配置 本项目

QuanHao Guo 4 Apr 24, 2022
APS 6º Semestre - UNIP (2021)

UNIP - Universidade Paulista Ciência da Computação (CC) DESENVOLVIMENTO DE UM SISTEMA COMPUTACIONAL PARA ANÁLISE E CLASSIFICAÇÃO DE FORMAS Link do git

Eduardo Talarico 5 Mar 09, 2022
Code for the paper "DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks" (ICCV '19)

DewarpNet This repository contains the codes for DewarpNet training. Recent Updates [May, 2020] Added evaluation images and an important note about Ma

<a href=[email protected]"> 354 Jan 01, 2023
WACV 2022 Paper - Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching Code based on our WACV 2022 Accepted Paper: https://arxiv.org/pdf/

Andres 13 Dec 17, 2022
Demo for the paper "Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation"

Streaming speaker diarization Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation by Juan Manuel Coria, Hervé

Juanma Coria 185 Jan 01, 2023
CNN+Attention+Seq2Seq

Attention_OCR CNN+Attention+Seq2Seq The model and its tensor transformation are shown in the figure below It is necessary ch_ train and ch_ test the p

Tsukinousag1 2 Jul 14, 2022
Code for paper "Role-based network embedding via structural features reconstruction with degree-regularized constraint"

Role-based network embedding via structural features reconstruction with degree-regularized constraint Train python main.py --dataset brazil-flights

wang zhang 1 Jun 28, 2022