A novel region proposal network for more general object detection ( including scene text detection ).

Overview

DeRPN: Taking a further step toward more general object detection

DeRPN is a novel region proposal network which concentrates on improving the adaptivity of current detectors. The paper is available here.

Recent Update

· Mar. 13, 2019: The DeRPN pretrained models are added.

· Jan. 25, 2019: The code is released.

Contact Us

Welcome to improve DeRPN together. For any questions, please feel free to contact Lele Xie ([email protected]) or Prof. Jin ([email protected]).

Citation

If you find DeRPN useful to your research, please consider citing our paper as follow:

@article{xie2019DeRPN,
  title     = {DeRPN: Taking a further step toward more general object detection},
  author    = {Lele Xie, Yuliang Liu, Lianwen Jin*, Zecheng Xie}
  joural    = {AAAI}
  year      = {2019}
}

Main Results

Note: The reimplemented results are slightly different from those presented in the paper for different training settings, but the conclusions are still consistent. For example, this code doesn't use multi-scale training which should boost the results for both DeRPN and RPN.

COCO-Text

training data: COCO-Text train

test data: COCO-Text test

network [email protected] [email protected] [email protected] [email protected]
RPN+Faster R-CNN VGG16 32.48 52.54 7.40 17.59
DeRPN+Faster R-CNN VGG16 47.39 70.46 11.05 25.12
RPN+R-FCN ResNet-101 37.71 54.35 13.17 22.21
DeRPN+R-FCN ResNet-101 48.62 71.30 13.37 27.57

Pascal VOC

training data: VOC 07+12 trainval

test data: VOC 07 test

Inference time is evaluated on one TITAN XP GPU.

network inference time [email protected] [email protected] AP
RPN+Faster R-CNN VGG16 64 ms 75.53 42.08 42.60
DeRPN+Faster R-CNN VGG16 65 ms 76.17 44.97 43.84
RPN+R-FCN ResNet-101 85 ms 78.87 54.30 50.04
DeRPN+R-FCN (900) * ResNet-101 84 ms 79.21 54.43 50.28

( "*": On Pascal VOC dataset, we found that it is more suitable to train the DeRPN+R-FCN model with 900 proposals. For other experiments, we use the default proposal number to train the models, i.e., 2000 proposals fro Faster R-CNN, 300 proposals for R-FCN. )

MS COCO

training data: COCO 2017 train

test data: COCO 2017 test/val

test set network AP AP50 AP75 APS APM APL
RPN+Faster R-CNN VGG16 24.2 45.4 23.7 7.6 26.6 37.3
DeRPN+Faster R-CNN VGG16 25.5 47.2 25.2 10.3 27.9 36.7
RPN+R-FCN ResNet-101 27.7 47.9 29.0 10.1 30.2 40.1
DeRPN+R-FCN ResNet-101 28.4 49.0 29.5 11.1 31.7 40.5
val set network AP AP50 AP75 APS APM APL
RPN+Faster R-CNN VGG16 24.1 45.0 23.8 7.6 27.8 37.8
DeRPN+Faster R-CNN VGG16 25.5 47.3 25.0 9.9 28.8 37.8
RPN+R-FCN ResNet-101 27.8 48.1 28.8 10.4 31.2 42.5
DeRPN+R-FCN ResNet-101 28.4 48.5 29.5 11.5 32.9 42.0

Getting Started

  1. Requirements
  2. Installation
  3. Preparation for Training & Testing
  4. Usage

Requirements

  1. Cuda 8.0 and cudnn 5.1.
  2. Some python packages: cython, opencv-python, easydict et. al. Simply install them if your system misses these packages.
  3. Configure the caffe according to your environment (Caffe installation instructions). As the code requires pycaffe, caffe should be built with python layers. In Makefile.config, make sure to uncomment this line:
WITH_PYTHON_LAYER := 1
  1. An NVIDIA GPU with more than 6GB is required for ResNet-101.

Installation

  1. Clone the DeRPN repository

    git clone https://github.com/HCIILAB/DeRPN.git
    
  2. Build the Cython modules

    cd $DeRPN_ROOT/lib
    make
  3. Build caffe and pycaffe

    cd $DeRPN_ROOT/caffe
    make -j8 && make pycaffe

Preparation for Training & Testing

Dataset

  1. Download the datasets of Pascal VOC 2007 & 2012, MS COCO 2017 and COCO-Text.

  2. You need to put these datasets under the $DeRPN_ROOT/data folder (with symlinks).

  3. For COCO-Text, the folder structure is as follow:

    $DeRPN_ROOT/data/coco_text/images/train2014
    $DeRPN_ROOT/data/coco_text/images/val2014
    $DeRPN_ROOT/data/coco_text/annotations  
    # train2014, val2014, and annotations are symlinks from /pth_to_coco2014/train2014, 
    # /pth_to_coco2014/val2014 and /pth_to_coco2014/annotations2014/, respectively.
  4. For COCO, the folder structure is as follow:

    $DeRPN_ROOT/data/coco/images/train2017
    $DeRPN_ROOT/data/coco/images/val2017
    $DeRPN_ROOT/data/coco/images/test-dev2017
    $DeRPN_ROOT/data/coco/annotations  
    # the symlinks are similar to COCO-Text
  5. For Pascal VOC, the folder structure is as follow:

    $DeRPN_ROOT/data/VOCdevkit2007
    $DeRPN_ROOT/data/VOCdevkit2012
    #VOCdevkit2007 and VOCdevkit2012 are symlinks from $VOCdevkit whcich contains VOC2007 and VOC2012.

Pretrained models

Please download the ImageNet pretrained models (VGG16 and ResNet-101, password: k4z1), and put them under

$DeRPN_ROOT/data/imagenet_models

We also provide the DeRPN pretrained models here (password: fsd8).

Usage

cd $DeRPN_ROOT
./experiments/scripts/faster_rcnn_derpn_end2end.sh [GPU_ID] [NET] [DATASET]

# e.g., ./experiments/scripts/faster_rcnn_derpn_end2end.sh 0 VGG16 coco_text

Copyright

This code is free to the academic community for research purpose only. For commercial purpose usage, please contact Dr. Lianwen Jin: [email protected].

Owner
Deep Learning and Vision Computing Lab, SCUT
Deep Learning and Vision Computing Lab, SCUT
Image processing using OpenCv

Image processing using OpenCv Write a program that opens the webcam, and the user selects one of the following on the video: ✅ If the user presses the

M.Najafi 4 Feb 18, 2022
[python3.6] 运用tf实现自然场景文字检测,keras/pytorch实现ctpn+crnn+ctc实现不定长场景文字OCR识别

本文基于tensorflow、keras/pytorch实现对自然场景的文字检测及端到端的OCR中文文字识别 update20190706 为解决本项目中对数学公式预测的准确性,做了其他的改进和尝试,效果还不错,https://github.com/xiaofengShi/Image2Katex 希

xiaofeng 2.7k Dec 25, 2022
OCR, Scene-Text-Understanding, Text Recognition

Scene-Text-Understanding Survey [2015-PAMI] Text Detection and Recognition in Imagery: A Survey paper [2014-Front.Comput.Sci] Scene Text Detection and

Alan Tang 354 Dec 12, 2022
docstrum

Docstrum Algorithm Getting Started This repo is for developing a Docstrum algorithm presented by O’Gorman (1993). Disclaimer This source code is built

Chulwoo Mike Pack 54 Dec 13, 2022
A simple demo program for using OpenCV on Android

Kivy OpenCV Demo A simple demo program for using OpenCV on Android Build with: buildozer android debug deploy run Run (on desktop) with: python main.p

Andrea Ranieri 13 Dec 29, 2022
Satoshi is a discord bot template in python using discord.py that allow you to track some live crypto prices with your own discord bot.

Satoshi ~ DiscordCryptoBot Satoshi is a simple python discord bot using discord.py that allow you to track your favorites cryptos prices with your own

Théo 2 Sep 15, 2022
An easy to use an (hopefully useful) captcha solution for pyTelegramBotAPI

pyTelegramBotCAPTCHA An easy to use and (hopefully useful) image CAPTCHA soltion for pyTelegramBotAPI. Installation: pip install pyTelegramBotCAPTCHA

29 Dec 26, 2022
([email protected]) Boosting Co-teaching with Compression Regularization for Label Noise

Nested-Co-teaching ([email protected]) Pytorch implementation of paper "Boosting Co-tea

YINGYI CHEN 41 Jan 03, 2023
Pytorch implementation of PSEnet with Pyramid Attention Network as feature extractor

Scene Text-Spotting based on PSEnet+CRNN Pytorch implementation of an end to end Text-Spotter with a PSEnet text detector and CRNN text recognizer. We

azhar shaikh 62 Oct 10, 2022
Rubik's Cube in pygame with OpenGL

Rubik Rubik's Cube in pygame with OpenGL The script show on the screen a Rubik Cube buit with OpenGL. Then I have also implemented all the possible mo

Gabro 2 Apr 15, 2022
This can be use to convert text in a file to handwritten text.

TextToHandwriting This can be used to convert text to handwriting. Clone this project or download the code. Run TextToImage.py give the filename of th

Ashutosh Mahapatra 2 Feb 06, 2022
A fastai/PyTorch package for unpaired image-to-image translation.

Unpaired image-to-image translation A fastai/PyTorch package for unpaired image-to-image translation currently with CycleGAN implementation. This is a

Tanishq Abraham 120 Dec 02, 2022
[ICCV, 2021] Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks

Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks This is an official PyTorch code repository of the paper "Cloud Transformers:

Visual Understanding Lab @ Samsung AI Center Moscow 27 Dec 15, 2022
Just a script for detecting the lanes in any car game (not just gta 5) with specific resolution and road design ( very basic and limited )

GTA-5-Lane-detection Just a script for detecting the lanes in any car game (not just gta 5) with specific resolution and road design ( very basic and

Danciu Georgian 4 Aug 01, 2021
Python Computer Vision application that allows users to draw/erase on the screen using their webcam.

CV-Virtual-WhiteBoard The Virtual WhiteBoard is a project I made using the OpenCV and Mediapipe Python libraries. Using your index and middle finger y

Stephen Wang 1 Jan 07, 2022
With the virtual keyboard, you can write on the real time images by combining the thumb and index fingers on the letter you want.

Virtual Keyboard With the virtual keyboard, you can write on the real time images by combining the thumb and index fingers on the letter you want. At

Güldeniz Bektaş 5 Jan 23, 2022
Introduction to image processing, most used and popular functions of OpenCV

👀 OpenCV 101 Introduction to image processing, most used and popular functions of OpenCV go here.

Vusal Ismayilov 3 Jul 02, 2022
Usando o Amazon Textract como OCR para Extração de Dados no DynamoDB

dio-live-textract2 Repositório de código para o live coding do dia 05/10/2021 sobre extração de dados estruturados e gravação em banco de dados a part

hugoportela 0 Jan 19, 2022
Captcha Recognition

The objective of this project is to recognize the target numbers in the captcha images correctly which would tell us how good or bad a captcha system has been built.

Mohit Kaushik 5 Feb 20, 2022
Toolbox for OCR post-correction

Ochre Ochre is a toolbox for OCR post-correction. Please note that this software is experimental and very much a work in progress! Overview of OCR pos

National Library of the Netherlands / Research 117 Nov 10, 2022