A novel region proposal network for more general object detection (including scene text detection).

Overview

DeRPN: Taking a further step toward more general object detection

DeRPN is a novel region proposal network which concentrates on improving the adaptivity of current detectors. The paper is available here.

Recent Update

· Mar. 13, 2019: Pretrained DeRPN models have been added.

· Jan. 25, 2019: The code is released.

Contact Us

Contributions to improve DeRPN are welcome. For any questions, please feel free to contact Lele Xie ([email protected]) or Prof. Jin ([email protected]).

Citation

If you find DeRPN useful in your research, please consider citing our paper as follows:

@article{xie2019DeRPN,
  title     = {DeRPN: Taking a further step toward more general object detection},
  author    = {Lele Xie and Yuliang Liu and Lianwen Jin and Zecheng Xie},
  journal   = {AAAI},
  year      = {2019}
}

Main Results

Note: The reimplemented results differ slightly from those reported in the paper because of different training settings, but the conclusions remain consistent. For example, this code does not use multi-scale training, which would likely improve the results of both DeRPN and RPN.

COCO-Text

training data: COCO-Text train

test data: COCO-Text test

method               network      [email protected]  [email protected]  [email protected]  [email protected]
RPN+Faster R-CNN     VGG16        32.48            52.54            7.40             17.59
DeRPN+Faster R-CNN   VGG16        47.39            70.46            11.05            25.12
RPN+R-FCN            ResNet-101   37.71            54.35            13.17            22.21
DeRPN+R-FCN          ResNet-101   48.62            71.30            13.37            27.57

Pascal VOC

training data: VOC 07+12 trainval

test data: VOC 07 test

Inference time is measured on a single TITAN Xp GPU.

method               network      inference time  [email protected]  [email protected]  AP
RPN+Faster R-CNN     VGG16        64 ms           75.53            42.08            42.60
DeRPN+Faster R-CNN   VGG16        65 ms           76.17            44.97            43.84
RPN+R-FCN            ResNet-101   85 ms           78.87            54.30            50.04
DeRPN+R-FCN (900) *  ResNet-101   84 ms           79.21            54.43            50.28

("*": On Pascal VOC, we found it more suitable to train the DeRPN+R-FCN model with 900 proposals. All other experiments use the default number of training proposals, i.e., 2000 for Faster R-CNN and 300 for R-FCN.)
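The proposal-count rule in the footnote can be written as a small lookup. This is a hypothetical sketch: the function name, the `rpn`/`detector`/`dataset` labels (such as `"pascal_voc"`), and both dictionaries are illustrative and are not the repository's actual configuration keys.

```python
# Hypothetical sketch: labels and names are illustrative, not the
# repository's real config fields.

# Default number of training proposals per detector (from the footnote).
DEFAULT_PROPOSALS = {
    "faster_rcnn": 2000,
    "rfcn": 300,
}

# The single exception noted in the footnote: DeRPN+R-FCN on Pascal VOC.
OVERRIDES = {
    ("derpn", "rfcn", "pascal_voc"): 900,
}


def train_proposal_count(rpn: str, detector: str, dataset: str) -> int:
    """Return the number of training proposals for an (rpn, detector, dataset) setup."""
    if detector not in DEFAULT_PROPOSALS:
        raise ValueError(f"unknown detector: {detector!r}")
    return OVERRIDES.get((rpn, detector, dataset), DEFAULT_PROPOSALS[detector])


if __name__ == "__main__":
    for setup in [("rpn", "faster_rcnn", "coco"),
                  ("derpn", "rfcn", "pascal_voc"),
                  ("derpn", "rfcn", "coco")]:
        print(setup, train_proposal_count(*setup))
```

Keeping the exception in a separate override table makes it explicit that 900 applies only to DeRPN+R-FCN on Pascal VOC; RPN+R-FCN on the same dataset still falls back to 300.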

MS COCO

training data: COCO 2017 train

test data: COCO 2017 test/val

test set
method               network      AP    AP50  AP75  APS   APM   APL
RPN+Faster R-CNN     VGG16        24.2  45.4  23.7  7.6   26.6  37.3
DeRPN+Faster R-CNN   VGG16        25.5  47.2  25.2  10.3  27.9  36.7
RPN+R-FCN            ResNet-101   27.7  47.9  29.0  10.1  30.2  40.1
DeRPN+R-FCN          ResNet-101   28.4  49.0  29.5  11.1  31.7  40.5

val set
method               network      AP    AP50  AP75  APS   APM   APL
RPN+Faster R-CNN     VGG16        24.1  45.0  23.8  7.6   27.8  37.8
DeRPN+Faster R-CNN   VGG16        25.5  47.3  25.0  9.9   28.8  37.8
RPN+R-FCN            ResNet-101   27.8  48.1  28.8  10.4  31.2  42.5
DeRPN+R-FCN          ResNet-101   28.4  48.5  29.5  11.5  32.9  42.0
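To make the MS COCO comparison concrete, the per-metric gain of DeRPN over RPN can be computed directly from the test-set numbers above. The sketch only copies those values; the `gains` helper and the dictionary layout are illustrative.

```python
# AP values copied from the MS COCO test-set table above.
COLUMNS = ["AP", "AP50", "AP75", "APS", "APM", "APL"]
TEST_SET = {
    ("RPN", "Faster R-CNN VGG16"): [24.2, 45.4, 23.7, 7.6, 26.6, 37.3],
    ("DeRPN", "Faster R-CNN VGG16"): [25.5, 47.2, 25.2, 10.3, 27.9, 36.7],
    ("RPN", "R-FCN ResNet-101"): [27.7, 47.9, 29.0, 10.1, 30.2, 40.1],
    ("DeRPN", "R-FCN ResNet-101"): [28.4, 49.0, 29.5, 11.1, 31.7, 40.5],
}


def gains(table):
    """Return {detector: {metric: DeRPN - RPN}} rounded to one decimal."""
    result = {}
    for (rpn, detector), values in table.items():
        if rpn != "DeRPN":
            continue
        baseline = table[("RPN", detector)]
        result[detector] = {
            col: round(d - b, 1) for col, d, b in zip(COLUMNS, values, baseline)
        }
    return result


if __name__ == "__main__":
    for detector, delta in gains(TEST_SET).items():
        print(detector, delta)
```

For Faster R-CNN (VGG16), for example, this gives +1.3 AP and +2.7 APS, while APL drops by 0.6.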

Getting Started

  1. Requirements
  2. Installation
  3. Preparation for Training & Testing
  4. Usage

Requirements

  1. CUDA 8.0 and cuDNN 5.1.
  2. Some Python packages: cython, opencv-python, easydict, etc. Install any that are missing from your system.
  3. Configure Caffe according to your environment (Caffe installation instructions). Since the code requires pycaffe, Caffe must be built with Python layers. In Makefile.config, make sure this line is uncommented:

    WITH_PYTHON_LAYER := 1

  4. An NVIDIA GPU with more than 6 GB of memory is required for ResNet-101.
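A quick pre-flight check for the requirements above might look like the following sketch. It assumes the import names `Cython`, `cv2`, and `easydict` for the listed packages, assumes `Makefile.config` lives in `$DeRPN_ROOT/caffe` (the folder where Caffe is built during installation), and parses the output of `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`. The helper names are hypothetical, and "more than 6 GB" is approximated as 6144 MiB.

```python
import importlib.util
import re
from pathlib import Path

# Assumed import names for cython, opencv-python, and easydict.
REQUIRED_MODULES = ("Cython", "cv2", "easydict")
MIN_GPU_MIB = 6 * 1024  # "more than 6 GB", approximated in MiB


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on this Python path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def python_layer_enabled(makefile_config_text: str) -> bool:
    """True if an uncommented 'WITH_PYTHON_LAYER := 1' line is present."""
    pattern = re.compile(r"^[ \t]*WITH_PYTHON_LAYER[ \t]*:=[ \t]*1[ \t]*$", re.MULTILINE)
    return pattern.search(makefile_config_text) is not None


def gpus_with_enough_memory(nvidia_smi_output: str, min_mib: int = MIN_GPU_MIB):
    """Return indices of GPUs whose total memory (MiB, one value per line) exceeds min_mib."""
    totals = [int(line) for line in nvidia_smi_output.splitlines()
              if line.strip().isdigit()]
    return [i for i, mib in enumerate(totals) if mib > min_mib]


if __name__ == "__main__":
    import os
    import shutil
    import subprocess

    print("missing modules:", missing_modules() or "none")

    # Assumed location of Caffe's build configuration.
    config = Path(os.environ.get("DeRPN_ROOT", ".")) / "caffe" / "Makefile.config"
    if config.is_file():
        print("WITH_PYTHON_LAYER enabled:", python_layer_enabled(config.read_text()))
    else:
        print("Makefile.config not found at", config)

    if shutil.which("nvidia-smi"):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=False,
        ).stdout
        print("GPUs with more than 6 GB:", gpus_with_enough_memory(out))
    else:
        print("nvidia-smi not found")
```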

Installation

  1. Clone the DeRPN repository

    git clone https://github.com/HCIILAB/DeRPN.git
    
  2. Build the Cython modules

    cd $DeRPN_ROOT/lib
    make

  3. Build caffe and pycaffe

    cd $DeRPN_ROOT/caffe
    make -j8 && make pycaffe

Preparation for Training & Testing

Dataset

  1. Download the Pascal VOC 2007 & 2012, MS COCO 2017, and COCO-Text datasets.

  2. Place these datasets under the $DeRPN_ROOT/data folder (via symlinks).

  3. For COCO-Text, the folder structure is as follows:

    $DeRPN_ROOT/data/coco_text/images/train2014
    $DeRPN_ROOT/data/coco_text/images/val2014
    $DeRPN_ROOT/data/coco_text/annotations
    # train2014, val2014, and annotations are symlinks to /pth_to_coco2014/train2014,
    # /pth_to_coco2014/val2014, and /pth_to_coco2014/annotations2014/, respectively.

  4. For COCO, the folder structure is as follows:

    $DeRPN_ROOT/data/coco/images/train2017
    $DeRPN_ROOT/data/coco/images/val2017
    $DeRPN_ROOT/data/coco/images/test-dev2017
    $DeRPN_ROOT/data/coco/annotations
    # the symlinks are set up the same way as for COCO-Text

  5. For Pascal VOC, the folder structure is as follows:

    $DeRPN_ROOT/data/VOCdevkit2007
    $DeRPN_ROOT/data/VOCdevkit2012
    # VOCdevkit2007 and VOCdevkit2012 are both symlinks to $VOCdevkit, which contains VOC2007 and VOC2012.
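The folder layout above can be created with a short script. This is a sketch under assumptions: `build_layout` and `link` are hypothetical helpers, not part of the repository; the `coco2014`, `coco2017`, and `vocdevkit` arguments are placeholders for wherever you downloaded the data; and the COCO 2017 source folders (including `annotations`) are assumed to carry the same names as the links, so adjust them if your download differs.

```python
import os
from pathlib import Path


def link(target: Path, link_path: Path) -> None:
    """Create the symlink link_path -> target, making parent folders as needed.
    Anything already present at link_path (file, folder, or link) is left untouched."""
    link_path.parent.mkdir(parents=True, exist_ok=True)
    if not os.path.lexists(link_path):
        link_path.symlink_to(target, target_is_directory=True)


def build_layout(derpn_root: Path, coco2014: Path, coco2017: Path,
                 vocdevkit: Path) -> None:
    """Create the data/ symlinks described in this README (hypothetical helper)."""
    data = derpn_root / "data"

    # COCO-Text reuses the COCO 2014 images and annotations.
    link(coco2014 / "train2014", data / "coco_text" / "images" / "train2014")
    link(coco2014 / "val2014", data / "coco_text" / "images" / "val2014")
    link(coco2014 / "annotations2014", data / "coco_text" / "annotations")

    # MS COCO 2017: source folder names are assumed to match the link names.
    for split in ("train2017", "val2017", "test-dev2017"):
        link(coco2017 / split, data / "coco" / "images" / split)
    link(coco2017 / "annotations", data / "coco" / "annotations")

    # Pascal VOC: both names point at one VOCdevkit holding VOC2007 and VOC2012.
    link(vocdevkit, data / "VOCdevkit2007")
    link(vocdevkit, data / "VOCdevkit2012")
```

A call such as `build_layout(Path("/path/to/DeRPN"), Path("/pth_to_coco2014"), Path("/pth_to_coco2017"), Path("/path/to/VOCdevkit"))` (all placeholder paths) can be repeated safely, because existing links are skipped.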

Pretrained models

Please download the ImageNet pretrained models (VGG16 and ResNet-101, password: k4z1), and put them under

$DeRPN_ROOT/data/imagenet_models

We also provide the DeRPN pretrained models here (password: fsd8).

Usage

cd $DeRPN_ROOT
./experiments/scripts/faster_rcnn_derpn_end2end.sh [GPU_ID] [NET] [DATASET]

# e.g., ./experiments/scripts/faster_rcnn_derpn_end2end.sh 0 VGG16 coco_text
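If you prefer to launch training from Python, a thin wrapper around the script above could look like this. It is a sketch: `build_command` and `run` are hypothetical helpers, and the accepted `NET` and `DATASET` strings are whatever the shell script itself accepts (only `VGG16` and `coco_text` appear in the example above), so the wrapper checks only that they are non-empty.

```python
from __future__ import annotations

import subprocess

# Script path relative to $DeRPN_ROOT, as in the usage line above.
SCRIPT = "./experiments/scripts/faster_rcnn_derpn_end2end.sh"


def build_command(gpu_id: int, net: str, dataset: str) -> list[str]:
    """Return the argument list for faster_rcnn_derpn_end2end.sh."""
    if isinstance(gpu_id, bool) or not isinstance(gpu_id, int) or gpu_id < 0:
        raise ValueError("gpu_id must be a non-negative integer")
    if not net or not dataset:
        raise ValueError("net and dataset must be non-empty strings")
    return [SCRIPT, str(gpu_id), net, dataset]


def run(derpn_root: str, gpu_id: int, net: str, dataset: str) -> int:
    """Run the training script from derpn_root and return its exit code."""
    cmd = build_command(gpu_id, net, dataset)
    # On POSIX, a relative program path is resolved against cwd.
    return subprocess.run(cmd, cwd=derpn_root, check=False).returncode
```

`run("/path/to/DeRPN", 0, "VGG16", "coco_text")` (placeholder path) is equivalent to the example command above.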

Copyright

This code is free for the academic community for research purposes only. For commercial use, please contact Dr. Lianwen Jin: [email protected].

Owner
Deep Learning and Vision Computing Lab, SCUT