Unconstrained Text Detection with Box Supervisionand Dynamic Self-Training

Overview

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervisionand Dynamic Self-Training

Alt text

Introduction

This is a PyTorch implementation of "SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervisionand Dynamic Self-Training"

The paper propose a novel text detection system termed SelfText Beyond Polygon(SBP) with Bounding Box Supervision(BBS) and Dynamic Self Training~(DST), where training a polygon-based text detector with only a limited set of upright bounding box annotations. As shown in the Figure, SBP achieves the same performance as strong supervision while saving huge data annotation costs.

From more details,please refer to our arXiv paper

Environments

  • python 3
  • torch = 1.1.0
  • torchvision
  • Pillow
  • numpy

ToDo List

  • Release code(BBS)
  • Release code(DST)
  • Document for Installation
  • Document for testing and training
  • Evaluation
  • Demo script
  • re-organize and clean the parameters

Dataset

Supported:

  • ICDAR15
  • ICDAR17MLI
  • sythtext800K
  • TotalText
  • MSRA-TD500
  • CTW1500

model zoo

Supported text detection:

Bounding Box Supervision(BBS)

Train

The training strategy includes three steps: (1) training SASN with synthetic data (2) generating pseudo label on real data based on bounding box annotation with SASN (3) training the detectors(EAST and PSENet) with the pseudo label

training SASN with synthtext or curved synthtext

(TDB)

generating pseudo label on real data with SASN

(TDB)

training EAST or PSENet with the pseudo label

(TDB)

Eval

for example (batchsize=2)

(TDB)

Visualization

Dynamic Self Training

Train

(TDB)

Eval

for example (batchsize=2)

(TDB)

Visualization

Experiments

Bounding Box Supervision

The performance of EAST on ICDAR15

Method Dataset Pretrain precision recall f-score
EAST_box ICDAR15 - 65.8 63.8 64.8
EAST ICDAR15 - 76.9 77.1 77.0
EAST_pseudo(SynthText) ICDAR15 - 77.8 78.2 78.0
EAST_box ICDAR15 SynthText 70.8 72.0 71.4
EAST ICDAR15 SynthText 82.0 82.4 82.2
EAST_pseudo(SynthText) ICDAR15 SynthText 81.3 82.2 81.8

The performance of EAST on MSRA-TD500

Method Dataset Pretrain precision recall f-score
EAST_box MSRA-TD500 - 40.49 31.05 35.15
EAST MSRA-TD500 - 71.76 69.05 70.38
EAST_pseudo(SynthText) MSRA-TD500 - 71.27 67.54 69.36
EAST_box MSRA-TD500 SynthText 48.34 42.37 45.16
EAST MSRA-TD500 SynthText 77.91 76.45 77.17
EAST_pseudo(SynthText) MSRA-TD500 SynthText 77.42 73.85 75.59

The performance of PSENet on ICDAR15

Method Dataset Pretrain precision recall f-score
PSENet_box ICDAR15 - 70.17 69.09 69.63
PSENet ICDAR15 - 81.6 79.5 80.5
PSENet_pseudo(SynthText) ICDAR15 - 82.9 77.6 80.2
PSENet_box ICDAR15 SynthText 72.65 74.29 73.46
PSENet ICDAR15 SynthText 86.42 83.54 84.96
PSENet_pseudo(SynthText) ICDAR15 SynthText 86.77 83.34 85.02

The performance of PSENet on MSRA-TD500

Method Dataset Pretrain precision recall f-score
PSENet_box MSRA-TD500 - 47.17 36.90 41.41
PSENet MSRA-TD500 - 80.86 77.72 79.13
PSENet_pseudo(SynthText) MSRA-TD500 - 80.32 77.26 78.86
PSENet_box MSRA-TD500 SynthText 47.45 39.49 43.11
PSENet MSRA-TD500 SynthText 84.11 84.97 84.54
PSENet_pseudo(SynthText) MSRA-TD500 SynthText 84.03 84.03 84.03

The performance of PSENet on Total Text

Method Dataset Pretrain precision recall f-score
PSENet_box Total Text - 46.5 43.6 45.0
PSENet Total Text - 80.4 76.5 78.4
PSENet_pseudo(SynthText) Total Text - 80.33 73.54 76.78
PSENet_pseudo(Curved SynthText) Total Text - 81.68 74.61 78.0
PSENet_box Total Text SynthText 51.94 47.45 49.59
PSENet Total Text SynthText 83.4 78.1 80.7
PSENet_pseudo(SynthText) Total Text SynthText 81.57 75.54 78.44
PSENet_pseudo(Curved SynthText) Total Text SynthText 82.51 77.57 80.0

The visualization of bounding-box annotation and the pseudo labels generated by BBS on Total-Text The visualization of bounding-box annotation and the pseudo labels generated by BBS on Total-Text

links

https://github.com/SakuraRiven/EAST

https://github.com/WenmuZhou/PSENet.pytorch

License

For academic use, this project is licensed under the Apache License - see the LICENSE file for details. For commercial use, please contact the authors.

Citations

Please consider citing our paper in your publications if the project helps your research.

Eamil: [email protected]

Owner
weijiawu
computer version, OCR I am looking for a research intern or visiting chance.
weijiawu
Data cleaning, missing value handle, EDA use in this project

Lending Club Case Study Project Brief Solving this assignment will give you an idea about how real business problems are solved using EDA. In this cas

Dhruvil Sheth 1 Jan 05, 2022
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition The official code of ABINet (CVPR 2021, Oral).

334 Dec 31, 2022
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities.

Playground for CLIP-like models Demo Colab Link GradCAM Visualization Naive Zero-shot Detection Smarter Zero-shot Detection Captcha Solver Changelog 2

Kevin Zakka 101 Dec 30, 2022
The code for our paper Semi-Supervised Learning with Multi-Head Co-Training

Semi-Supervised Learning with Multi-Head Co-Training (PyTorch) Abstract Co-training, extended from self-training, is one of the frameworks for semi-su

cmc 6 Dec 04, 2022
Instance-Dependent Partial Label Learning

Instance-Dependent Partial Label Learning Installation pip install -r requirements.txt Run the Demo benchmark-random mnist python -u main.py --gpu 0 -

17 Dec 29, 2022
PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network"

HAN PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network" This repository is for HAN introduced in the

五维空间 140 Nov 23, 2022
ByteTrack超详细教程!训练自己的数据集&&摄像头实时检测跟踪

ByteTrack超详细教程!训练自己的数据集&&摄像头实时检测跟踪

Double-zh 45 Dec 19, 2022
A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Feature Forge This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, e

Machinalis 380 Nov 05, 2022
This repository compare a selfie with images from identity documents and response if the selfie match.

aws-rekognition-facecompare This repository compare a selfie with images from identity documents and response if the selfie match. This code was made

1 Jan 27, 2022
Underwater industrial application yolov5m6

This project wins the intelligent algorithm contest finalist award and stands out from over 2000teams in China Underwater Robot Professional Contest, entering the final of China Underwater Robot Prof

8 Nov 09, 2022
BT-Unet: A-Self-supervised-learning-framework-for-biomedical-image-segmentation-using-Barlow-Twins

BT-Unet: A-Self-supervised-learning-framework-for-biomedical-image-segmentation-using-Barlow-Twins Deep learning has brought most profound contributio

Narinder Singh Punn 12 Dec 04, 2022
Dynamic Realtime Animation Control

Our project is targeted at making an application that dynamically detects the user’s expressions and gestures and projects it onto an animation software which then renders a 2D/3D animation realtime

Harsh Avinash 10 Aug 01, 2022
An open-source, low-cost, image-based weed detection device for fallow scenarios.

Welcome to the OpenWeedLocator (OWL) project, an opensource hardware and software green-on-brown weed detector that uses entirely off-the-shelf compon

Guy Coleman 145 Jan 05, 2023
Starter Code for VALUE benchmark

StarterCode for VALUE Benchmark This is the starter code for VALUE Benchmark [website], [paper]. This repository currently supports all baseline model

VALUE Benchmark 73 Dec 09, 2022
Lucid library adapted for PyTorch

Lucent PyTorch + Lucid = Lucent The wonderful Lucid library adapted for the wonderful PyTorch! Lucent is not affiliated with Lucid or OpenAI's Clarity

Lim Swee Kiat 520 Dec 26, 2022
Discovering Dynamic Salient Regions with Spatio-Temporal Graph Neural Networks

Discovering Dynamic Salient Regions with Spatio-Temporal Graph Neural Networks This is the official code for DyReg model inroduced in Discovering Dyna

Bitdefender Machine Learning 11 Nov 08, 2022
Garbage Detection system which will detect objects based on whether it is plastic waste or plastics or just garbage.

Garbage Detection using Yolov5 on Jetson Nano 2gb Developer Kit. Garbage detection system which will detect objects based on whether it is plastic was

Rishikesh A. Bondade 2 May 13, 2022
Interacting Two-Hand 3D Pose and Shape Reconstruction from Single Color Image (ICCV 2021)

Interacting Two-Hand 3D Pose and Shape Reconstruction from Single Color Image Interacting Two-Hand 3D Pose and Shape Reconstruction from Single Color

75 Dec 02, 2022
Experiments for Operating Systems Lab (ETCS-352)

Operating Systems Lab (ETCS-352) Experiments for Operating Systems Lab (ETCS-352) performed by me in 2021 at uni. All codes are written by me except t

Deekshant Wadhwa 0 Sep 06, 2022
The datasets and code of ACL 2021 paper "Aspect-Category-Opinion-Sentiment Quadruple Extraction with Implicit Aspects and Opinions".

Aspect-Category-Opinion-Sentiment (ACOS) Quadruple Extraction This repo contains the data sets and source code of our paper: Aspect-Category-Opinion-S

NUSTM 144 Jan 02, 2023