TextBoxes: A Fast Text Detector with a Single Deep Neural Network https://github.com/MhLiao/TextBoxes 基于SSD改进的文本检测算法,textBoxes_note记录了之前整理的笔记。

Overview

TextBoxes: A Fast Text Detector with a Single Deep Neural Network

Introduction

This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-process except for a standard nonmaximum suppression. For more details, please refer to our paper.

Citing TextBoxes

Please cite TextBoxes in your publications if it helps your research:

@inproceedings{LiaoSBWL17,
  author    = {Minghui Liao and
               Baoguang Shi and
               Xiang Bai and
               Xinggang Wang and
               Wenyu Liu},
  title     = {TextBoxes: {A} Fast Text Detector with a Single Deep Neural Network},
  booktitle = {AAAI},
  year      = {2017}
}

Contents

  1. Installation
  2. Download
  3. Test
  4. Train
  5. Performance

Installation

  1. Get the code. We will call the directory that you cloned Caffe into $CAFFE_ROOT
git clone https://github.com/MhLiao/TextBoxes.git

cd TextBoxes

make -j8

make py

Download

  1. Models trained on ICDAR 2013: Dropbox link BaiduYun link
  2. Fully convolutional reduced (atrous) VGGNet: Dropbox link BaiduYun link
  3. Compiled mex file for evaluation(for multi-scale test evaluation: evaluation_nms.m): Dropbox link BaiduYun link

Test

  1. Download the ICDAR 2013 DataSet
  2. Download the Models trained on ICDAR 2013
  3. Modify the related paths in the "examples/TextBoxes/test_icdar13.py"
  4. run "python examples/test_icdar13.py"
  5. To multi-scale test, you should use "test_icdar13_multi_scale.py" and "evaluation_nms.m"

Train

  1. Train about 50k iterions on Synthetic data which refered in the paper.
  2. Train about 2k iterions on corresponding training data such as ICDAR 2013 and SVT.
  3. For more information, such as learning rate setting, please refer to the paper.

Performance

  1. Using the given test code, you can achieve an F-measure of about 80% on ICDAR 2013 with a single scale.
  2. Using the given multi-scale test code, you can achieve an F-measure of about 85% on ICDAR 2013 with a non-maximum suppression.
  3. More performance information, please refer to the paper and Task1 and Task4 of Challenge2 on the ICDAR 2015 website: http://rrc.cvc.uab.es/?ch=2&com=evaluation

Please let me know if you encounter any issues.

Owner
zhangjing1
zhangjing1
Textboxes implementation with Tensorflow (python)

tb_tensorflow A python implementation of TextBoxes Dependencies TensorFlow r1.0 OpenCV2 Code from Chaoyue Wang 03/09/2017 Update: 1.Debugging optimize

Jayne Shin (신재인) 20 May 31, 2019
Unofficial implementation of "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images"

TableNet Unofficial implementation of ICDAR 2019 paper : TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from

Jainam Shah 243 Dec 30, 2022
Omdena-abuja-anpd - Automatic Number Plate Detection for the security of lives and properties using Computer Vision.

Omdena-abuja-anpd - Automatic Number Plate Detection for the security of lives and properties using Computer Vision.

Abdulazeez Jimoh 1 Jan 01, 2022
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

Microsoft 235 Dec 22, 2022
Page to PAGE Layout Analysis Tool

P2PaLA Page to PAGE Layout Analysis (P2PaLA) is a toolkit for Document Layout Analysis based on Neural Networks. 💥 Try our new DEMO for online baseli

Lorenzo Quirós Díaz 180 Nov 24, 2022
Distilling Knowledge via Knowledge Review, CVPR 2021

ReviewKD Distilling Knowledge via Knowledge Review Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia This project provides an implementation for the

DV Lab 194 Dec 28, 2022
Python package for handwriting and sketching in Jupyter cells

ipysketch A Python package for handwriting and sketching in Jupyter notebooks. Usage A movie is worth a thousand pictures is worth a million words...

Matthias Baer 16 Jan 05, 2023
https://arxiv.org/abs/1904.01941

Character-Region-Awareness-for-Text-Detection- https://arxiv.org/abs/1904.01941 Train You can train SynthText data use python source/train_SynthText.p

DayDayUp 120 Dec 28, 2022
Volume Control using OpenCV

Gesture-Volume-Control Volume Control using OpenCV Here i made volume control using Python and OpenCV in which we can control the volume of our laptop

Mudit Sinha 3 Oct 10, 2021
A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.

LAREX LAREX is a semi-automatic open-source tool for layout analysis on early printed books. It uses a rule based connected components approach which

162 Jan 05, 2023
The open source extract transaction infomation by using OCR.

Transaction OCR Mã nguồn trích xuất thông tin transaction từ file scaned pdf, ở đây tôi lựa chọn tài liệu sao kê công khai của Thuy Tien. Mã nguồn có

Nguyen Xuan Hung 18 Jun 02, 2022
Let's explore how we can extract text from forms

Form Segmentation Let's explore how we can extract text from any forms / scanned pages. Objectives The goal is to find an algorithm that can extract t

Philip Doxakis 42 Jun 05, 2022
OCR engine for all the languages

Description kraken is a turn-key OCR system optimized for historical and non-Latin script material. kraken's main features are: Fully trainable layout

431 Jan 04, 2023
⛓ marc is a small, but flexible Markov chain generator

About marc (markov chain) is a small, but flexible Markov chain generator. Usage marc is easy to use. To build a MarkovChain pass the object a sequenc

Max Humber 65 Oct 27, 2022
Optical character recognition for Japanese text, with the main focus being Japanese manga

Manga OCR Optical character recognition for Japanese text, with the main focus being Japanese manga. It uses a custom end-to-end model built with Tran

Maciej Budyś 327 Jan 01, 2023
Image Detector and Convertor App created using python's Pillow, OpenCV, cvlib, numpy and streamlit packages.

Image Detector and Convertor App created using python's Pillow, OpenCV, cvlib, numpy and streamlit packages.

Siva Prakash 11 Jan 02, 2022
Just a script for detecting the lanes in any car game (not just gta 5) with specific resolution and road design ( very basic and limited )

GTA-5-Lane-detection Just a script for detecting the lanes in any car game (not just gta 5) with specific resolution and road design ( very basic and

Danciu Georgian 4 Aug 01, 2021
Visual Attention based OCR

Attention-OCR Authours: Qi Guo and Yuntian Deng Visual Attention based OCR. The model first runs a sliding CNN on the image (images are resized to hei

Yuntian Deng 1.1k Jan 02, 2023
ISI's Optical Character Recognition (OCR) software for machine-print and handwriting data

VistaOCR ISI's Optical Character Recognition (OCR) software for machine-print and handwriting data Publications "How to Efficiently Increase Resolutio

ISI Center for Vision, Image, Speech, and Text Analytics 21 Dec 08, 2021
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:

Multi-Type-TD-TSR Check it out on Source Code of our Paper: Multi-Type-TD-TSR Extracting Tables from Document Images using a Multi-stage Pipeline for

Pascal Fischer 178 Dec 27, 2022