Rotated Box Is Back : Accurate Box Proposal Network for Scene Text Detection

Overview

Rotated Box Is Back : Accurate Box Proposal Network for Scene Text Detection

overview

This material is supplementray code for paper accepted in ICDAR 2021

  1. We highly recommend to use docker image because our model contains custom operation which depends on framework and cuda version.
  2. We provide trained model for ICDAR 2017, 2013 which is in final_checkpoint_ch8 and for ICDAR 2015 which is in final_checkpoint_ch4
  3. This code is mainly focused on inference. To train our model, training gpu like V100 is needed. please check our paper in detail.

REQUIREMENT

  1. Nvidia-docker
  2. Tensorflow 1.14
  3. Miminum GPU requirement : NVIDIA GTX 1080TI

INSTALLATION

  • Make docker image and container
docker build --tag rbimage ./dockerfile
docker run --runtime=nvidia --name rbcontainer -v /rotated-box-is-back-path:/rotated-box-is-back -i -t rbimage /bin/bash
  • build custom operations in container
cd /rotated-box-is-back/nms 
cmake ./
make
./shell.sh

SAMPLE IMAGE INFERENCE

cd /rotated-box-is-back/
python viz.py --test_data_path=./sample --checkpoint_path=./final_checkpoint_ch8 --output_dir=./sample_result  --thres 0.6 --min_size=1600 --max_size=2000

ICDAR 2017 INFERENCE

  1. please replace icdar_testset_path to your-icdar-2017-testset-folder path.
python viz.py --test_data_path=icdar_testset_path --checkpoint_path=./final_checkpoint_ch8 --output_dir=./ic17  --thres 0.6 --min_size=1600 --max_size=2000

ICDAR 2015 INFERENCE

  1. please replace icdar_testset_path to your-icdar-2015-testset-folder path.
  2. To converting evalutation format. Convert result text file like below
python viz.py --test_data_path=icdar_testset_path --checkpoint_path=./final_checkpoint_ch4 --output_dir=./ic15  --thres 0.7 --min_size=1100 --max_size=2000
python text_postprocessing.py -i=./ic15/ -o=./ic15_format/ -e True

ICDAR 2013 INFERENCE

  1. please replace icdar_testset_path to your-icdar-2013-testset-folder path.
  2. To converting evalutation format. Convert result text file like below
python viz.py --test_data_path=icdar_testset_path --checkpoint_path=./final_checkpoint_ch8 --output_dir=./ic13  --thres 0.55 --min_size=700 --max_size=900
python text_postprocessing.py -i=./ic13/ -o=./ic13_format/ -e True -m rec

EVALUATION TABLE

IC13 IC15 IC17
P R F P R F P R F
95.9 89.1 92.4 89.7 84.2 86.9 83.4 68.2 75.0

TRAINING

  1. It can be trained below command line
python train_refine_estimator.py --input_size=1024 --batch_size=2 --checkpoint_path=./finetuning --training_data_path=your-image-path --training_gt_path=your-gt-path  --learning_rate=0.00001 --max_epochs=500  --save_summary_steps=1000 --warmup_path=./final_checkpoint_ch8

ACKNOWLEDGEMENT

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 1711125972, Audio-Visual Perception for Autonomous Rescue Drones).

CITATION

If you found it is helpfull for your research, please cite:

Lee J., Lee J., Yang C., Lee Y., Lee J. (2021) Rotated Box Is Back: An Accurate Box Proposal Network for Scene Text Detection. In: Lladós J., Lopresti D., Uchida S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_4

Owner
NCSOFT
NCSOFT Open Sources
NCSOFT
JAXMAPP: JAX-based Library for Multi-Agent Path Planning in Continuous Spaces

JAXMAPP: JAX-based Library for Multi-Agent Path Planning in Continuous Spaces JAXMAPP is a JAX-based library for multi-agent path planning (MAPP) in c

OMRON SINIC X 24 Dec 28, 2022
Predict Breast Cancer Wisconsin (Diagnostic) using Naive Bayes

Naive-Bayes Predict Breast Cancer Wisconsin (Diagnostic) using Naive Bayes Downloading Data Set Use our Breast Cancer Wisconsin Data Set Also you can

Faeze Habibi 0 Apr 06, 2022
Aircraft design optimization made fast through modern automatic differentiation

Aircraft design optimization made fast through modern automatic differentiation. Plug-and-play analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.

Peter Sharpe 394 Dec 23, 2022
Scales, Chords, and Cadences: Practical Music Theory for MIR Researchers

ISMIR-musicTheoryTutorial This repository has slides and Jupyter notebooks for the ISMIR 2021 tutorial Scales, Chords, and Cadences: Practical Music T

Johanna Devaney 58 Oct 11, 2022
An implementation of EWC with PyTorch

EWC.pytorch An implementation of Elastic Weight Consolidation (EWC), proposed in James Kirkpatrick et al. Overcoming catastrophic forgetting in neural

Ryuichiro Hataya 166 Dec 22, 2022
The official implementation of A Unified Game-Theoretic Interpretation of Adversarial Robustness.

This repository is the official implementation of A Unified Game-Theoretic Interpretation of Adversarial Robustness. Requirements pip install -r requi

Jie Ren 17 Dec 12, 2022
Code for sound field predictions in domains with impedance boundaries. Used for generating results from the paper

Code for sound field predictions in domains with impedance boundaries. Used for generating results from the paper

DTU Acoustic Technology Group 11 Dec 17, 2022
An open software package to develop BCI based brain and cognitive computing technology for recognizing user's intention using deep learning

An open software package to develop BCI based brain and cognitive computing technology for recognizing user's intention using deep learning

deepbci 272 Jan 08, 2023
Convert Mission Planner (ArduCopter) Waypoint Missions to Litchi CSV Format to execute on DJI Drones

Mission Planner to Litchi Convert Mission Planner (ArduCopter) Waypoint Surveys to Litchi CSV Format to execute on DJI Drones Litchi doesn't support S

Yaros 24 Dec 09, 2022
Repository for open research on optimizers.

Open Optimizers Repository for open research on optimizers. This is a test in sharing research/exploration as it happens. If you use anything from thi

Ariel Ekgren 6 Jun 24, 2022
Project page for the paper Semi-Supervised Raw-to-Raw Mapping 2021.

Project page for the paper Semi-Supervised Raw-to-Raw Mapping 2021.

Mahmoud Afifi 22 Nov 08, 2022
Code for Contrastive-Geometry Networks for Generalized 3D Pose Transfer

CGTransformer Code for our AAAI 2022 paper "Contrastive-Geometry Transformer network for Generalized 3D Pose Transfer" Contrastive-Geometry Transforme

18 Jun 28, 2022
Code for "Adversarial Attack Generation Empowered by Min-Max Optimization", NeurIPS 2021

Min-Max Adversarial Attacks [Paper] [arXiv] [Video] [Slide] Adversarial Attack Generation Empowered by Min-Max Optimization Jingkang Wang, Tianyun Zha

Jingkang Wang 12 Nov 23, 2022
Supplementary code for SIGGRAPH 2021 paper: Discovering Diverse Athletic Jumping Strategies

SIGGRAPH 2021: Discovering Diverse Athletic Jumping Strategies project page paper demo video Prerequisites Important Notes We suspect there are bugs i

54 Dec 06, 2022
Rainbow is all you need! A step-by-step tutorial from DQN to Rainbow

Do you want a RL agent nicely moving on Atari? Rainbow is all you need! This is a step-by-step tutorial from DQN to Rainbow. Every chapter contains bo

Jinwoo Park (Curt) 1.4k Dec 29, 2022
Embeds a story into a music playlist by sorting the playlist so that the order of the music follows a narrative arc.

playlist-story-builder This project attempts to embed a story into a music playlist by sorting the playlist so that the order of the music follows a n

Dylan R. Ashley 0 Oct 28, 2021
K Closest Points and Maximum Clique Pruning for Efficient and Effective 3D Laser Scan Matching (To appear in RA-L 2022)

KCP The official implementation of KCP: k Closest Points and Maximum Clique Pruning for Efficient and Effective 3D Laser Scan Matching, accepted for p

Yu-Kai Lin 109 Dec 14, 2022
Repository for Multimodal AutoML Benchmark

Benchmarking Multimodal AutoML for Tabular Data with Text Fields Repository for the NeurIPS 2021 Dataset Track Submission "Benchmarking Multimodal Aut

Xingjian Shi 44 Nov 24, 2022
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

DSEE Codes for [Preprint] DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models Xuxi Chen, Tianlong Chen, Yu Cheng, Weizhu Ch

VITA 4 Dec 27, 2021
Reference implementation for Deep Unsupervised Learning using Nonequilibrium Thermodynamics

Diffusion Probabilistic Models This repository provides a reference implementation of the method described in the paper: Deep Unsupervised Learning us

Jascha Sohl-Dickstein 238 Jan 02, 2023