PyTorch re-implementation of the paper "SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition" (CVPR 2022)

Last update: Jan 03, 2023

SwinTextSpotter

This is the PyTorch implementation of the paper "SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition" (CVPR 2022). The paper is available at this link.

We use a Swin Transformer backbone pre-trained on ImageNet, obtained from SwinT_detectron2.

Models

SWINTS-swin-english-pretrain [config] | model_Google Drive | model_BaiduYun PW: 954t

SWINTS-swin-Total-Text [config] | model_Google Drive | model_BaiduYun PW: tf0i

SWINTS-swin-ctw [config] | model_Google Drive | model_BaiduYun PW: 4etq

SWINTS-swin-icdar2015 [config] | model_Google Drive | model_BaiduYun PW: 3n82

SWINTS-swin-ReCTS [config] | model_Google Drive | model_BaiduYun PW: a4be

SWINTS-swin-vintext [config] | model_Google Drive | model_BaiduYun PW: slmp

Installation

Python=3.8
PyTorch=1.8.0, torchvision=0.9.0, cudatoolkit=11.1
OpenCV for visualization

Steps

Install the repository (we recommend using Anaconda for installation):

conda create -n SWINTS python=3.8 -y
conda activate SWINTS
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install opencv-python
pip install scipy
pip install shapely
pip install rapidfuzz
pip install timm
pip install Polygon3
git clone https://github.com/mxin262/SwinTextSpotter.git
cd SwinTextSpotter
python setup.py build develop
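
After the steps above, it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch and not part of the repository: `PINNED`, `version_mismatches` and `installed_versions` are hypothetical names, and the check only compares version prefixes as reported by `importlib.metadata`.

```python
from importlib import metadata

# Versions pinned in the installation notes above (hypothetical helper table).
PINNED = {"torch": "1.8.0", "torchvision": "0.9.0"}


def version_mismatches(installed, pinned=PINNED):
    """Return {package: (found, wanted)} for every package that is missing
    or whose version does not start with the pinned version string."""
    problems = {}
    for name, wanted in pinned.items():
        found = installed.get(name)
        # Builds often carry a local suffix such as "1.8.0+cu111",
        # so compare on the prefix rather than for exact equality.
        if found is None or not found.startswith(wanted):
            problems[name] = (found, wanted)
    return problems


def installed_versions(names):
    """Look up installed distribution versions; missing packages map to None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # An empty dict means every pinned package was found at the expected version.
    print(version_mismatches(installed_versions(PINNED)))
```

Run inside the activated `SWINTS` environment, an empty result suggests the pinned packages were installed as intended; it does not check the CUDA toolkit.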

Dataset path

datasets
|_ totaltext
|  |_ train_images
|  |_ test_images
|  |_ totaltext_train.json
|  |_ weak_voc_new.txt
|  |_ weak_voc_pair_list.txt
|_ mlt2017
|  |_ train_images
|  |_ annotations/icdar_2017_mlt.json
.......
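
Before training, the layout above can be checked programmatically. This is an illustrative sketch only: `EXPECTED` and `missing_entries` are hypothetical names, and the list covers just the entries shown in the tree, so it would need extending for the other datasets.

```python
from pathlib import Path

# Entries copied from the directory tree above; extend for other datasets.
EXPECTED = [
    "totaltext/train_images",
    "totaltext/test_images",
    "totaltext/totaltext_train.json",
    "totaltext/weak_voc_new.txt",
    "totaltext/weak_voc_pair_list.txt",
    "mlt2017/train_images",
    "mlt2017/annotations/icdar_2017_mlt.json",
]


def missing_entries(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `missing_entries("datasets")` from the repository root returns an empty list when every listed file and folder is in place.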

Downloaded images

ICDAR2017-MLT [image]
Syntext-150k:
- Part1: 94,723 [dataset]
- Part2: 54,327 [dataset]
ICDAR2015 [image]
ICDAR2013 [image]
Total-Text_train_images [image]
Total-Text_test_images [image]
ReCTs [images&label] PW: 2b4q
LSVT [images&label] PW: 9uh1
ArT [images&label] PW: 2865
SynChinese130k [images][label]
Vintext_images [image]

Download the labels [Google Drive] [BaiduYun] PW: 46vd

Download the lexicon [Google Drive] and place it in the corresponding dataset directory.

You can also prepare your custom dataset following the example scripts. [example scripts]

Totaltext

To evaluate on Total-Text, CTW1500, ICDAR2015 and VinText, first download the zipped annotations:

cd datasets
mkdir evaluation
cd evaluation
wget -O gt_ctw1500.zip https://cloudstor.aarnet.edu.au/plus/s/xU3yeM3GnidiSTr/download
wget -O gt_totaltext.zip https://cloudstor.aarnet.edu.au/plus/s/SFHvin8BLUM4cNd/download
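# The two Google Drive links below are "view" pages: wget may save an HTML page
# instead of the archive, in which case download the file through a browser.
wget -O gt_icdar2015.zip "https://drive.google.com/file/d/1wrq_-qIyb_8dhYVlDzLZTTajQzbic82Z/view?usp=sharing"
wget -O gt_vintext.zip "https://drive.google.com/file/d/11lNH0uKfWJ7Wc74PGshWCOgSxgEnUPEV/view?usp=sharing"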
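
Because some of the URLs above point at Google Drive sharing pages, a download can end up being an HTML page with a `.zip` name. A small check along the following lines can flag that before evaluation; `invalid_archives` is a hypothetical helper, not something the repository provides.

```python
import zipfile
from pathlib import Path


def invalid_archives(directory):
    """Return the names of *.zip files in `directory` that are not real zip archives."""
    return sorted(
        path.name
        for path in Path(directory).glob("*.zip")
        if not zipfile.is_zipfile(path)
    )
```

For example, `invalid_archives("datasets/evaluation")` lists any annotation file that should be downloaded again.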

Pretrain SWINTS (e.g., with Swin-Transformer backbone)

python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-pretrain.yaml

Fine-tune model on the mixed real dataset

python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-mixtrain.yaml

Fine-tune model on a single dataset (e.g., Total-Text)

python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml

Evaluate SWINTS (e.g., with Swin-Transformer backbone)

python projects/SWINTS/train_net.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --eval-only MODEL.WEIGHTS ./output/model_final.pth
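
The training and evaluation commands above differ only in their config file and a few flags, so they can be generated from one helper when scripting runs. The sketch below is an assumption-laden convenience, not repository code: `train_command` and `STAGES` are hypothetical names, only flags that appear in the commands above are used, and the stage order simply follows the order in which the commands are listed.

```python
import shlex

TRAIN_SCRIPT = "projects/SWINTS/train_net.py"

# Config files in the order the commands above list them (assumed stage order).
STAGES = [
    "projects/SWINTS/configs/SWINTS-swin-pretrain.yaml",
    "projects/SWINTS/configs/SWINTS-swin-mixtrain.yaml",
    "projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml",
]


def train_command(config, num_gpus=None, eval_weights=None):
    """Build the argument list for a training or evaluation run."""
    cmd = ["python", TRAIN_SCRIPT]
    if num_gpus is not None:
        cmd += ["--num-gpus", str(num_gpus)]
    cmd += ["--config-file", config]
    if eval_weights is not None:
        # Mirrors the evaluation command: --eval-only MODEL.WEIGHTS <path>
        cmd += ["--eval-only", "MODEL.WEIGHTS", eval_weights]
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for config in STAGES:
        print(shlex.join(train_command(config, num_gpus=8)))
    print(shlex.join(train_command(STAGES[-1], eval_weights="./output/model_final.pth")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the printed commands look right.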

Visualize the detection and recognition results (e.g., with Swin-Transformer backbone)

python demo/demo.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --input input1.jpg \
  --output ./output \
  --confidence-threshold 0.4 \
  --opts MODEL.WEIGHTS ./output/model_final.pth

Acknowledgement

AdelaiDet, Detectron2, ISTR, SwinT_detectron2, Focal-Transformer and MaskTextSpotterV3.

Citation

If our paper helps your research, please cite it in your publications:

@article{huang2022swints,
  title = {SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition},
  author = {Mingxin Huang and Yuliang Liu and Zhenghao Peng and Chongyu Liu and Dahua Lin and Shenggao Zhu and Nicholas Yuan and Kai Ding and Lianwen Jin},
  journal = {arXiv preprint arXiv:2203.10209},
  year = {2022}
}

Copyright

For commercial use, please contact Dr. Lianwen Jin: [email protected]

Owner

mxin262
