Rotated Box Is Back: An Accurate Box Proposal Network for Scene Text Detection

Last update: Dec 21, 2022

This repository contains the supplementary code for our paper accepted at ICDAR 2021.

We highly recommend using the Docker image, because our model contains custom operations that depend on the framework and CUDA versions.
We provide trained models: final_checkpoint_ch8 for ICDAR 2017 and ICDAR 2013, and final_checkpoint_ch4 for ICDAR 2015.
This code focuses mainly on inference. Training the model requires a high-end GPU such as the V100; please see our paper for details.

REQUIREMENT

Nvidia-docker
Tensorflow 1.14
Minimum GPU requirement: NVIDIA GTX 1080 Ti

INSTALLATION

Build the Docker image and create the container (replace /rotated-box-is-back-path with the local path of this repository):

docker build --tag rbimage ./dockerfile
docker run --runtime=nvidia --name rbcontainer -v /rotated-box-is-back-path:/rotated-box-is-back -i -t rbimage /bin/bash

Build the custom operations inside the container:

cd /rotated-box-is-back/nms 
cmake ./
make
./shell.sh

SAMPLE IMAGE INFERENCE

cd /rotated-box-is-back/
python viz.py --test_data_path=./sample --checkpoint_path=./final_checkpoint_ch8 --output_dir=./sample_result  --thres 0.6 --min_size=1600 --max_size=2000
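
The --min_size and --max_size flags are not documented in this README. A common convention in detection pipelines is to scale an image so that its shorter side reaches min_size while the longer side is capped at max_size. The sketch below assumes that convention; it has not been checked against viz.py, and the function name target_size is hypothetical.

```python
def target_size(height, width, min_size=1600, max_size=2000):
    """Return (new_height, new_width) under the ASSUMED resize convention.

    Assumption (not confirmed for viz.py): scale the shorter side up or
    down to min_size, unless that would push the longer side past max_size,
    in which case the longer side is scaled to max_size instead.
    """
    short_side = min(height, width)
    long_side = max(height, width)
    scale = min_size / short_side
    if long_side * scale > max_size:
        scale = max_size / long_side
    return round(height * scale), round(width * scale)


# Example: a 720 x 1280 image with the sample settings above.
print(target_size(720, 1280))
```

Under this assumption, a larger min_size mainly helps small text, at the cost of more GPU memory and slower inference.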

ICDAR 2017 INFERENCE

Replace icdar_testset_path with the path to your ICDAR 2017 test set folder.

python viz.py --test_data_path=icdar_testset_path --checkpoint_path=./final_checkpoint_ch8 --output_dir=./ic17  --thres 0.6 --min_size=1600 --max_size=2000

ICDAR 2015 INFERENCE

Replace icdar_testset_path with the path to your ICDAR 2015 test set folder.
The second command converts the result text files into the evaluation format.

python viz.py --test_data_path=icdar_testset_path --checkpoint_path=./final_checkpoint_ch4 --output_dir=./ic15  --thres 0.7 --min_size=1100 --max_size=2000
python text_postprocessing.py -i=./ic15/ -o=./ic15_format/ -e True

ICDAR 2013 INFERENCE

Replace icdar_testset_path with the path to your ICDAR 2013 test set folder.
The second command converts the result text files into the evaluation format.

python viz.py --test_data_path=icdar_testset_path --checkpoint_path=./final_checkpoint_ch8 --output_dir=./ic13  --thres 0.55 --min_size=700 --max_size=900
python text_postprocessing.py -i=./ic13/ -o=./ic13_format/ -e True -m rec
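
The three benchmark commands above differ only in checkpoint, threshold, and image size. As a convenience, the following sketch collects those settings in one place and rebuilds the viz.py command lines from them. The values are copied from the commands above; the names SETTINGS and build_viz_command are hypothetical and not part of this repository.

```python
import shlex

# Per-benchmark settings copied from the commands in this README.
SETTINGS = {
    "ic17": {"checkpoint": "./final_checkpoint_ch8", "thres": 0.6,
             "min_size": 1600, "max_size": 2000},
    "ic15": {"checkpoint": "./final_checkpoint_ch4", "thres": 0.7,
             "min_size": 1100, "max_size": 2000},
    "ic13": {"checkpoint": "./final_checkpoint_ch8", "thres": 0.55,
             "min_size": 700, "max_size": 900},
}


def build_viz_command(benchmark, testset_path):
    """Return the viz.py argument list for one benchmark (hypothetical helper)."""
    s = SETTINGS[benchmark]
    return [
        "python", "viz.py",
        f"--test_data_path={testset_path}",
        f"--checkpoint_path={s['checkpoint']}",
        f"--output_dir=./{benchmark}",
        "--thres", str(s["thres"]),
        f"--min_size={s['min_size']}",
        f"--max_size={s['max_size']}",
    ]


# Print the commands instead of running them; nothing is executed here.
for name in SETTINGS:
    print(shlex.join(build_viz_command(name, "icdar_testset_path")))
```

Inside the container, each argument list could be passed to subprocess.run(cmd, check=True) from /rotated-box-is-back/ to run the benchmarks in sequence.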

EVALUATION TABLE

P = precision, R = recall, F = F-measure (values in %).

Dataset	P	R	F
IC13	95.9	89.1	92.4
IC15	89.7	84.2	86.9
IC17	83.4	68.2	75.0
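
The F values can be cross-checked against P and R. The sketch below assumes F is the usual harmonic mean of precision and recall; the README does not state the formula. The function name f_measure is illustrative only.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) taken from the evaluation table.
REPORTED = {
    "IC13": (95.9, 89.1, 92.4),
    "IC15": (89.7, 84.2, 86.9),
    "IC17": (83.4, 68.2, 75.0),
}

for name, (p, r, f_reported) in REPORTED.items():
    f_computed = round(f_measure(p, r), 1)
    print(name, "computed:", f_computed, "reported:", f_reported)
```

Rounded to one decimal place, the computed values agree with the reported F column for all three benchmarks.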

TRAINING

The model can be trained with the command below:

python train_refine_estimator.py --input_size=1024 --batch_size=2 --checkpoint_path=./finetuning --training_data_path=your-image-path --training_gt_path=your-gt-path  --learning_rate=0.00001 --max_epochs=500  --save_summary_steps=1000 --warmup_path=./final_checkpoint_ch8

ACKNOWLEDGEMENT

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 1711125972, Audio-Visual Perception for Autonomous Rescue Drones).

CITATION

If you find this work helpful for your research, please cite:

Lee J., Lee J., Yang C., Lee Y., Lee J. (2021) Rotated Box Is Back: An Accurate Box Proposal Network for Scene Text Detection. In: Lladós J., Lopresti D., Uchida S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_4

Owner

NCSOFT
