[ICME 2021 Oral] CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning

Last update: Aug 11, 2022

Related tags

Deep Learning CORE-Text

Overview

CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning

This repository is the official PyTorch implementation of CORE-Text, and contains demo training and evaluation scripts.

Requirements

mmdetection == 2.13.0
mmcv == 1.3.5
pyclipper == 1.3.0

Training Demo

Base (Mask R-CNN)

To train Base (Mask R-CNN) on a single node with 4 gpus, run:

#!/usr/bin/env bash

GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}

CONFIG=configs/icdar2017mlt/base.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_base

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0

VRM

To train VRM on a single node with 4 gpus, run:

#!/usr/bin/env bash

GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}

CONFIG=configs/icdar2017mlt/vrm.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_vrm

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0

CORE

To train CORE (ours) on a single node with 4 gpus, run:

#!/usr/bin/env bash

GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}

# pre-training
CONFIG=configs/icdar2017mlt/core_pretrain.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_core_pretrain

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0

# training
CONFIG=configs/icdar2017mlt/core.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_core

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0

Evaluation Demo

GPUS=4
PORT=${PORT:-29500}
CONFIG=path/to/config
CHECKPOINT=path/to/checkpoint

python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
    ./tools/test.py $CONFIG $CHECKPOINT --launcher pytorch \
    --eval segm \
    --not-encode-mask \
    --eval-options "jsonfile_prefix=path/to/work_dir/results/eval" "gt_path=data/icdar2017mlt/icdar2017mlt_gt.zip"

Dataset Format

The structure of the dataset directory is shown as following, and we provide the COCO-format label (ICDAR2017_train.json and ICDAR2017_val.json) and the ground truth zipfile (icdar2017mlt_gt.zip) for training and evaluation.

data
└── icdar2017mlt
    ├── annotations
    |   ├── ICDAR2017_train.json
    |   └── ICDAR2017_val.json
    ├── icdar2017mlt_gt.zip
    └── image
         ├── train
         └── val

Results

Our model achieves the following performance on ICDAR 2017 MLT val set. Note that the results are slightly different (~0.1%) from what we reported in the paper, because we reimplement the code based on the open-source mmdetection.

Method	Backbone	Training set	Test set	Hmean	Precision	Recall	Download
Base (Mask R-CNN)	ResNet50	ICDAR 2017 MLT Train	ICDAR 2017 MLT Val	0.800	0.828	0.773	model \| log
VRM	ResNet50	ICDAR 2017 MLT Train	ICDAR 2017 MLT Val	0.812	0.853	0.774	model \| log
CORE (ours)	ResNet50	ICDAR 2017 MLT Train	ICDAR 2017 MLT Val	0.821	0.872	0.777	model \| log

Citation

@inproceedings{9428457,
  author={Lin, Jingyang and Pan, Yingwei and Lai, Rongfeng and Yang, Xuehang and Chao, Hongyang and Yao, Ting},
  booktitle={2021 IEEE International Conference on Multimedia and Expo (ICME)},
  title={Core-Text: Improving Scene Text Detection with Contrastive Relational Reasoning},
  year={2021},
  pages={1-6},
  doi={10.1109/ICME51207.2021.9428457}
}

[ICME 2021 Oral] CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning

Related tags

Overview

CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning

Requirements

Training Demo

Base (Mask R-CNN)

VRM

CORE

Evaluation Demo

Dataset Format

Results

Citation

Owner

Jingyang Lin

(NeurIPS 2020) Wasserstein Distances for Stereo Disparity Estimation

PyTorch implementation of DCT fast weight RNNs

TVNet: Temporal Voting Network for Action Localization

EMNLP 2021 Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections

Visualize Camera's Pose Using Extrinsic Parameter by Plotting Pyramid Model on 3D Space

POCO: Point Convolution for Surface Reconstruction

All materials of Cassandra Event, Udyam'22

Revisiting Temporal Alignment for Video Restoration

Jittor implementation of Recursive-NeRF: An Efficient and Dynamically Growing NeRF

Automated Melanoma Recognition in Dermoscopy Images via Very Deep Residual Networks

Code for our work "Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection".

Spherical Confidence Learning for Face Recognition, accepted to CVPR2021.

Code release for Universal Domain Adaptation(CVPR 2019)

Machine Learning Privacy Meter: A tool to quantify the privacy risks of machine learning models with respect to inference attacks, notably membership inference attacks

🔥 Cannlytics-powered artificial intelligence 🤖

Multi-Modal Machine Learning toolkit based on PaddlePaddle.

Computations and statistics on manifolds with geometric structures.

An example of time series augmentation methods with Keras

TorchGeo is a PyTorch domain library, similar to torchvision, that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data.

计算机视觉中用到的注意力模块和其他即插即用模块PyTorch Implementation Collection of Attention Module and Plug&Play Module