Unconstrained Text Detection with Box Supervision and Dynamic Self-Training

Overview

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training

Introduction

This is a PyTorch implementation of "SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training".

The paper proposes a novel text detection system termed SelfText Beyond Polygon (SBP), which combines Bounding Box Supervision (BBS) and Dynamic Self-Training (DST) to train a polygon-based text detector with only a limited set of upright bounding box annotations. As shown in the figure, SBP achieves performance comparable to strong (polygon-level) supervision while greatly reducing data annotation costs.
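To make "upright bounding box annotation" concrete, here is a minimal sketch of how a polygon label could be reduced to the axis-aligned box that encloses it. The function name and the sample coordinates are illustrative assumptions, not part of the released code.

```python
import numpy as np


def polygon_to_upright_box(polygon):
    """Return (x_min, y_min, x_max, y_max) of the axis-aligned box enclosing a polygon.

    `polygon` is any sequence of (x, y) points. This helper is hypothetical and
    only illustrates the weaker annotation that box supervision starts from.
    """
    points = np.asarray(polygon, dtype=np.float32).reshape(-1, 2)
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    return float(x_min), float(y_min), float(x_max), float(y_max)


# A rotated quadrilateral text instance (made-up coordinates).
quad = [(10, 20), (50, 10), (55, 30), (15, 40)]
box = polygon_to_upright_box(quad)
print(box)
```

The box keeps only four numbers per text instance, which is why it is cheaper to annotate than a full polygon but also loses the shape of rotated or curved text.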

For more details, please refer to our arXiv paper.

Environments

  • python 3
  • torch = 1.1.0
  • torchvision
  • Pillow
  • numpy
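A quick way to check a local environment against the list above is sketched below. It uses only the standard library (`importlib.metadata`, Python 3.8+) and assumes the packages are installed under the distribution names shown here; the `REQUIRED` list and the function name are illustrative.

```python
from importlib import metadata

# Distribution names taken from the list above (assumed to match the pip names).
REQUIRED = ["torch", "torchvision", "Pillow", "numpy"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```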

ToDo List

  • Release code(BBS)
  • Release code(DST)
  • Document for Installation
  • Document for testing and training
  • Evaluation
  • Demo script
  • Re-organize and clean up the parameters

Dataset

Supported:

  • ICDAR15
  • ICDAR17 MLT
  • SynthText 800K
  • TotalText
  • MSRA-TD500
  • CTW1500

Model Zoo

Supported text detection:

Bounding Box Supervision(BBS)

Train

The training strategy consists of three steps: (1) train SASN on synthetic data; (2) use SASN to generate pseudo labels on real data from the bounding box annotations; (3) train the detectors (EAST and PSENet) with the pseudo labels.
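Step (2) can be pictured with the following minimal sketch: each annotated box is cropped, a segmentation function predicts text pixels inside the crop, and the result is pasted back into a full-image mask. This is an illustration under assumptions, not the released code: `segment_fn` is a hypothetical stand-in for SASN, the `threshold` value is arbitrary, and converting the mask into polygons for the detector is not shown.

```python
import numpy as np


def generate_pseudo_mask(image, boxes, segment_fn, threshold=0.5):
    """Build a full-image binary text mask from upright boxes.

    image:      (H, W, 3) array.
    boxes:      iterable of integer (x_min, y_min, x_max, y_max) boxes.
    segment_fn: hypothetical stand-in for SASN; maps an (h, w, 3) crop to an
                (h, w) array of text probabilities.
    """
    height, width = image.shape[:2]
    mask = np.zeros((height, width), dtype=np.uint8)
    for x_min, y_min, x_max, y_max in boxes:
        # Clip the box to the image so that slicing stays valid.
        x0, y0 = max(0, x_min), max(0, y_min)
        x1, y1 = min(width, x_max), min(height, y_max)
        if x1 <= x0 or y1 <= y0:
            continue
        prob = segment_fn(image[y0:y1, x0:x1])
        # `region` is a view into `mask`; pixels outside every box stay background.
        region = mask[y0:y1, x0:x1]
        region[prob >= threshold] = 1
    return mask


def dummy_segment(crop):
    """Toy segmenter for the demo: marks the middle half of the crop's rows as text."""
    h, w = crop.shape[:2]
    prob = np.zeros((h, w), dtype=np.float32)
    prob[h // 4 : 3 * h // 4, :] = 1.0
    return prob


image = np.zeros((100, 100, 3), dtype=np.uint8)
mask = generate_pseudo_mask(image, [(10, 20, 50, 40)], dummy_segment)
print(mask.sum())
```

Restricting the prediction to the inside of each box is what lets the box annotation act as supervision: anything outside the boxes is treated as background regardless of what the segmenter would predict there.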

Training SASN with SynthText or Curved SynthText

(TBD)

Generating pseudo labels on real data with SASN

(TBD)

Training EAST or PSENet with the pseudo labels

(TBD)

Eval

For example (batch size = 2):

(TBD)

Visualization

Dynamic Self Training

Train

(TBD)

Eval

For example (batch size = 2):

(TBD)

Visualization

Experiments

Bounding Box Supervision

The performance of EAST on ICDAR15

Method                  Dataset  Pretrain   Precision  Recall  F-score
EAST_box                ICDAR15  -          65.8       63.8    64.8
EAST                    ICDAR15  -          76.9       77.1    77.0
EAST_pseudo(SynthText)  ICDAR15  -          77.8       78.2    78.0
EAST_box                ICDAR15  SynthText  70.8       72.0    71.4
EAST                    ICDAR15  SynthText  82.0       82.4    82.2
EAST_pseudo(SynthText)  ICDAR15  SynthText  81.3       82.2    81.8
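Assuming the f-score column is the usual harmonic mean of precision and recall, the reported values can be reproduced with a few lines. The function name is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# First row of the EAST / ICDAR15 table: precision 65.8, recall 63.8.
print(round(f_score(65.8, 63.8), 1))  # 64.8
```

Because the harmonic mean is pulled towards the smaller of the two values, a detector needs both precision and recall to be high to score well.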

The performance of EAST on MSRA-TD500

Method                  Dataset     Pretrain   Precision  Recall  F-score
EAST_box                MSRA-TD500  -          40.49      31.05   35.15
EAST                    MSRA-TD500  -          71.76      69.05   70.38
EAST_pseudo(SynthText)  MSRA-TD500  -          71.27      67.54   69.36
EAST_box                MSRA-TD500  SynthText  48.34      42.37   45.16
EAST                    MSRA-TD500  SynthText  77.91      76.45   77.17
EAST_pseudo(SynthText)  MSRA-TD500  SynthText  77.42      73.85   75.59

The performance of PSENet on ICDAR15

Method                    Dataset  Pretrain   Precision  Recall  F-score
PSENet_box                ICDAR15  -          70.17      69.09   69.63
PSENet                    ICDAR15  -          81.6       79.5    80.5
PSENet_pseudo(SynthText)  ICDAR15  -          82.9       77.6    80.2
PSENet_box                ICDAR15  SynthText  72.65      74.29   73.46
PSENet                    ICDAR15  SynthText  86.42      83.54   84.96
PSENet_pseudo(SynthText)  ICDAR15  SynthText  86.77      83.34   85.02

The performance of PSENet on MSRA-TD500

Method                    Dataset     Pretrain   Precision  Recall  F-score
PSENet_box                MSRA-TD500  -          47.17      36.90   41.41
PSENet                    MSRA-TD500  -          80.86      77.72   79.13
PSENet_pseudo(SynthText)  MSRA-TD500  -          80.32      77.26   78.86
PSENet_box                MSRA-TD500  SynthText  47.45      39.49   43.11
PSENet                    MSRA-TD500  SynthText  84.11      84.97   84.54
PSENet_pseudo(SynthText)  MSRA-TD500  SynthText  84.03      84.03   84.03

The performance of PSENet on Total Text

Method                           Dataset     Pretrain   Precision  Recall  F-score
PSENet_box                       Total Text  -          46.5       43.6    45.0
PSENet                           Total Text  -          80.4       76.5    78.4
PSENet_pseudo(SynthText)         Total Text  -          80.33      73.54   76.78
PSENet_pseudo(Curved SynthText)  Total Text  -          81.68      74.61   78.0
PSENet_box                       Total Text  SynthText  51.94      47.45   49.59
PSENet                           Total Text  SynthText  83.4       78.1    80.7
PSENet_pseudo(SynthText)         Total Text  SynthText  81.57      75.54   78.44
PSENet_pseudo(Curved SynthText)  Total Text  SynthText  82.51      77.57   80.0

The visualization of bounding-box annotation and the pseudo labels generated by BBS on Total-Text

Links

https://github.com/SakuraRiven/EAST

https://github.com/WenmuZhou/PSENet.pytorch

License

For academic use, this project is licensed under the Apache License - see the LICENSE file for details. For commercial use, please contact the authors.

Citations

Please consider citing our paper in your publications if the project helps your research.

Email: [email protected]

Owner
weijiawu