Unconstrained Text Detection with Box Supervision and Dynamic Self-Training

Overview

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training


Introduction

This is a PyTorch implementation of "SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training".

The paper proposes a novel text detection system, SelfText Beyond Polygon (SBP), which combines Bounding Box Supervision (BBS) with Dynamic Self-Training (DST) to train a polygon-based text detector using only a limited set of upright bounding-box annotations. As the results in the Experiments section show, SBP achieves performance comparable to strong supervision while greatly reducing data annotation costs.
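To make the term "upright bounding-box annotation" concrete, the sketch below derives the axis-aligned box that encloses a polygon annotation. It is only an illustration of the weaker annotation type; the function name `polygon_to_upright_box` is hypothetical and is not taken from this repository.

```python
def polygon_to_upright_box(polygon):
    """Return the axis-aligned box (x_min, y_min, x_max, y_max) enclosing a polygon.

    `polygon` is a sequence of (x, y) vertices. This helper is hypothetical and
    only illustrates what an upright bounding-box annotation looks like.
    """
    if not polygon:
        raise ValueError("polygon must contain at least one vertex")
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


# A slanted quadrilateral and the upright box that encloses it.
quad = [(10, 20), (50, 10), (60, 40), (15, 45)]
print(polygon_to_upright_box(quad))
```

The box discards the orientation and curvature that the polygon carries, which is the information the pseudo labels are meant to recover.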

For more details, please refer to our arXiv paper.

Environments

  • python 3
  • torch = 1.1.0
  • torchvision
  • Pillow
  • numpy
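A quick way to compare an existing environment against the list above is sketched here. It assumes Python 3.8 or newer (for `importlib.metadata`); the names `PINNED`, `UNPINNED`, `installed_version` and `check_environment` are illustrative and not part of this repository.

```python
from importlib import metadata  # standard library, Python 3.8+

# Taken from the Environments list: only torch has a pinned version.
PINNED = {"torch": "1.1.0"}
UNPINNED = ["torchvision", "Pillow", "numpy"]


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(lookup=installed_version):
    """Build a {package: status} report; `lookup` maps a name to a version or None."""
    report = {}
    for name, wanted in PINNED.items():
        found = lookup(name)
        if found is None:
            report[name] = "missing"
        elif found.split("+")[0] == wanted:  # ignore local suffixes such as "+cu100"
            report[name] = "ok"
        else:
            report[name] = "found {}, expected {}".format(found, wanted)
    for name in UNPINNED:
        report[name] = "ok" if lookup(name) is not None else "missing"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print("{:12s} {}".format(package, status))
```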

ToDo List

  • Release code(BBS)
  • Release code(DST)
  • Document for Installation
  • Document for testing and training
  • Evaluation
  • Demo script
  • Re-organize and clean up the parameters

Dataset

Supported:

  • ICDAR15
  • ICDAR17 MLT
  • SynthText800K
  • TotalText
  • MSRA-TD500
  • CTW1500
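As an illustration of what a polygon annotation looks like on disk, the sketch below parses one ground-truth line in the ICDAR15 text format (eight comma-separated corner coordinates followed by the transcription, with `###` marking a "do not care" region). The function name `parse_icdar15_line` is hypothetical, and the loaders in this repository may be organized differently.

```python
def parse_icdar15_line(line):
    """Parse one ICDAR15 ground-truth line into a polygon and its transcription.

    Expected layout: x1,y1,x2,y2,x3,y3,x4,y4,transcription
    The transcription may itself contain commas, and files often start with a
    byte-order mark, so both cases are handled here.
    """
    parts = line.lstrip("\ufeff").strip().split(",")
    if len(parts) < 9:
        raise ValueError("expected 8 coordinates and a transcription")
    coords = [int(value) for value in parts[:8]]
    text = ",".join(parts[8:])  # re-join a transcription that contained commas
    polygon = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    return {"polygon": polygon, "text": text, "ignore": text == "###"}


# Illustrative sample line, including a leading byte-order mark.
sample = "\ufeff377,117,463,117,465,130,378,130,Genaxis Theatre"
print(parse_icdar15_line(sample))
```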

Model Zoo

Supported text detection:

Bounding Box Supervision (BBS)

Train

Training proceeds in three steps: (1) train SASN on synthetic data; (2) use SASN to generate pseudo labels on real data from the bounding-box annotations; (3) train the detectors (EAST and PSENet) on the pseudo labels.
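Step (2) can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the repository's implementation: it assumes a segmentation network has already produced a per-pixel text probability map, and it keeps only confident pixels that fall inside the annotated upright box. The name `box_constrained_mask` and the `threshold` parameter are hypothetical.

```python
def box_constrained_mask(prob_map, box, threshold=0.5):
    """Turn a text probability map into a binary pseudo mask limited to one box.

    prob_map  : list of rows, each a list of probabilities in [0, 1]
                (assumed to come from a segmentation network; hypothetical input).
    box       : (x_min, y_min, x_max, y_max), inclusive pixel coordinates.
    threshold : probability at or above which a pixel counts as text (assumption).
    """
    height = len(prob_map)
    width = len(prob_map[0]) if height else 0
    x_min, y_min, x_max, y_max = box
    # Clamp the box to the image so out-of-range annotations do not fail.
    x_min, y_min = max(0, x_min), max(0, y_min)
    x_max, y_max = min(width - 1, x_max), min(height - 1, y_max)

    mask = [[0] * width for _ in range(height)]
    for y in range(y_min, y_max + 1):
        for x in range(x_min, x_max + 1):
            if prob_map[y][x] >= threshold:
                mask[y][x] = 1  # confident text pixel inside the annotated box
    return mask


probabilities = [
    [0.9, 0.8, 0.2, 0.9],
    [0.1, 0.7, 0.6, 0.9],
    [0.9, 0.9, 0.9, 0.9],
]
for row in box_constrained_mask(probabilities, box=(1, 0, 2, 1)):
    print(row)
```

High-probability pixels outside the box are discarded, so the box annotation acts as a hard spatial constraint on the pseudo label.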

Training SASN with SynthText or Curved SynthText

(TBD)

Generating pseudo labels on real data with SASN

(TBD)

Training EAST or PSENet with the pseudo labels

(TBD)

Eval

For example (batch size = 2):

(TBD)

Visualization

Dynamic Self Training

Train

(TBD)

Eval

For example (batch size = 2):

(TBD)

Visualization

Experiments

Bounding Box Supervision

The performance of EAST on ICDAR15

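| Method                 | Dataset | Pretrain  | Precision (%) | Recall (%) | F-score (%) |
|------------------------|---------|-----------|---------------|------------|-------------|
| EAST_box               | ICDAR15 | -         | 65.8          | 63.8       | 64.8        |
| EAST                   | ICDAR15 | -         | 76.9          | 77.1       | 77.0        |
| EAST_pseudo(SynthText) | ICDAR15 | -         | 77.8          | 78.2       | 78.0        |
| EAST_box               | ICDAR15 | SynthText | 70.8          | 72.0       | 71.4        |
| EAST                   | ICDAR15 | SynthText | 82.0          | 82.4       | 82.2        |
| EAST_pseudo(SynthText) | ICDAR15 | SynthText | 81.3          | 82.2       | 81.8        |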
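The f-score values reported in these tables are consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes two of the rows above; the helper name `f_score` is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# EAST_box on ICDAR15, no pretraining: precision 65.8, recall 63.8
print(round(f_score(65.8, 63.8), 1))  # 64.8
# EAST on ICDAR15, no pretraining: precision 76.9, recall 77.1
print(round(f_score(76.9, 77.1), 1))  # 77.0
```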

The performance of EAST on MSRA-TD500

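| Method                 | Dataset    | Pretrain  | Precision (%) | Recall (%) | F-score (%) |
|------------------------|------------|-----------|---------------|------------|-------------|
| EAST_box               | MSRA-TD500 | -         | 40.49         | 31.05      | 35.15       |
| EAST                   | MSRA-TD500 | -         | 71.76         | 69.05      | 70.38       |
| EAST_pseudo(SynthText) | MSRA-TD500 | -         | 71.27         | 67.54      | 69.36       |
| EAST_box               | MSRA-TD500 | SynthText | 48.34         | 42.37      | 45.16       |
| EAST                   | MSRA-TD500 | SynthText | 77.91         | 76.45      | 77.17       |
| EAST_pseudo(SynthText) | MSRA-TD500 | SynthText | 77.42         | 73.85      | 75.59       |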

The performance of PSENet on ICDAR15

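| Method                   | Dataset | Pretrain  | Precision (%) | Recall (%) | F-score (%) |
|--------------------------|---------|-----------|---------------|------------|-------------|
| PSENet_box               | ICDAR15 | -         | 70.17         | 69.09      | 69.63       |
| PSENet                   | ICDAR15 | -         | 81.6          | 79.5       | 80.5        |
| PSENet_pseudo(SynthText) | ICDAR15 | -         | 82.9          | 77.6       | 80.2        |
| PSENet_box               | ICDAR15 | SynthText | 72.65         | 74.29      | 73.46       |
| PSENet                   | ICDAR15 | SynthText | 86.42         | 83.54      | 84.96       |
| PSENet_pseudo(SynthText) | ICDAR15 | SynthText | 86.77         | 83.34      | 85.02       |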

The performance of PSENet on MSRA-TD500

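| Method                   | Dataset    | Pretrain  | Precision (%) | Recall (%) | F-score (%) |
|--------------------------|------------|-----------|---------------|------------|-------------|
| PSENet_box               | MSRA-TD500 | -         | 47.17         | 36.90      | 41.41       |
| PSENet                   | MSRA-TD500 | -         | 80.86         | 77.72      | 79.13       |
| PSENet_pseudo(SynthText) | MSRA-TD500 | -         | 80.32         | 77.26      | 78.86       |
| PSENet_box               | MSRA-TD500 | SynthText | 47.45         | 39.49      | 43.11       |
| PSENet                   | MSRA-TD500 | SynthText | 84.11         | 84.97      | 84.54       |
| PSENet_pseudo(SynthText) | MSRA-TD500 | SynthText | 84.03         | 84.03      | 84.03       |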

The performance of PSENet on Total Text

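| Method                          | Dataset    | Pretrain  | Precision (%) | Recall (%) | F-score (%) |
|---------------------------------|------------|-----------|---------------|------------|-------------|
| PSENet_box                      | Total Text | -         | 46.5          | 43.6       | 45.0        |
| PSENet                          | Total Text | -         | 80.4          | 76.5       | 78.4        |
| PSENet_pseudo(SynthText)        | Total Text | -         | 80.33         | 73.54      | 76.78       |
| PSENet_pseudo(Curved SynthText) | Total Text | -         | 81.68         | 74.61      | 78.0        |
| PSENet_box                      | Total Text | SynthText | 51.94         | 47.45      | 49.59       |
| PSENet                          | Total Text | SynthText | 83.4          | 78.1       | 80.7        |
| PSENet_pseudo(SynthText)        | Total Text | SynthText | 81.57         | 75.54      | 78.44       |
| PSENet_pseudo(Curved SynthText) | Total Text | SynthText | 82.51         | 77.57      | 80.0        |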

The visualization of bounding-box annotations and the pseudo labels generated by BBS on Total-Text

Links

https://github.com/SakuraRiven/EAST

https://github.com/WenmuZhou/PSENet.pytorch

License

For academic use, this project is licensed under the Apache License - see the LICENSE file for details. For commercial use, please contact the authors.

Citations

Please consider citing our paper in your publications if the project helps your research.

Email: [email protected]

Owner
weijiawu