A curated list of resources dedicated to scene text localization and recognition

Overview

Awesome

Scene Text Localization & Recognition Resources

A curated list of resources dedicated to scene text localization and recognition. Any suggestions and pull requests are welcome.

Papers & Code

Overview

  • [2015-PAMI] Text Detection and Recognition in Imagery: A Survey paper
  • [2014-Front.Comput.Sci] Scene Text Detection and Recognition: Recent Advances and Future Trends paper

Visual Geometry Group, University of Oxford

CUHK & SIAT

  • [2016-arXiv] Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network paper
  • [2016-AAAI] Reading Scene Text in Deep Convolutional Sequences paper
  • [2016-TIP] Text-Attentional Convolutional Neural Networks for Scene Text Detection paper
  • [2014-ECCV] Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees paper

Media and Communication Lab, HUST

  • [2016-CVPR] Robust scene text recognition with automatic rectification paper
  • [2016-CVPR] Multi-oriented text detection with fully convolutional networks paper
  • [2015-CoRR] An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition paper code github

AI Lab, Stanford

  • [2012-ICPR, Wang] End-to-End Text Recognition with Convolutional Neural Networks paper code SVHN Dataset
  • [2012-PhD thesis, David Wu] End-to-End Text Recognition with Convolutional Neural Networks paper

Others

  • [2018-CVPR] FOTS: Fast Oriented Text Spotting With a Unified Network paper
  • [2018-IJCAI] IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection paper
  • [2018-AAAI] PixelLink: Detecting Scene Text via Instance Segmentation paper code
  • [2018-AAAI] SEE: Towards Semi-Supervised End-to-End Scene Text Recognition paper code
  • [2017-arXiv] Fused Text Segmentation Networks for Multi-oriented Scene Text Detection paper
  • [2017-arXiv] WeText: Scene Text Detection under Weak Supervision paper
  • [2017-ICCV] Single Shot Text Detector with Regional Attention paper
  • [2017-ICCV] WordSup: Exploiting Word Annotations for Character based Text Detection paper
  • [2017-arXiv] R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection paper
  • [2017-CVPR] EAST: An Efficient and Accurate Scene Text Detector paper code
  • [2017-arXiv] Cascaded Segmentation-Detection Networks for Word-Level Text Spottingpaper
  • [2017-arXiv] Deep Direct Regression for Multi-Oriented Scene Text Detectionpaper
  • [2017-CVPR] Detecting oriented text in natural images by linking segments paper code
  • [2017-CVPR] Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detectionpaper
  • [2017-arXiv] Arbitrary-Oriented Scene Text Detection via Rotation Proposals paper
  • [2017-AAAI] TextBoxes: A Fast Text Detector with a Single Deep Neural Network paper code
  • [2017-ICCV] Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework paper code
  • [2016-CVPR] Recursive Recurrent Nets with Attention Modeling for OCR in the Wild paper
  • [2016-arXiv] COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images paper
  • [2016-arXiv] DeepText:A Unified Framework for Text Proposal Generation and Text Detection in Natural Images paper
  • [2015 ICDAR] Object Proposals for Text Extraction in the Wild paper code
  • [2014-TPAMI] Word Spotting and Recognition with Embedded Attributes paper homepage code

Datasets

  • MLT 2017 2017

    • 7200 training, 1800 validation images
    • Bounding box, text transcription, and script annotations
    • Task: text detection, script identification
  • COCO-Text (Computer Vision Group, Cornell) 2016

    • 63,686 images, 173,589 text instances, 3 fine-grained text attributes.
    • Task: text location and recognition
    • COCO-Text API
  • Synthetic Word Dataset (Oxford, VGG) 2014

    • 9 million images covering 90k English words
    • Task: text recognition, segmentation
    • download
  • IIIT 5K-Words 2012

    • 5000 images from Scene Texts and born-digital (2k training and 3k testing images)
    • Each image is a cropped word image of scene text with case-insensitive labels
    • Task: text recognition
    • download
  • StanfordSynth(Stanford, AI Group) 2012

    • Small single-character images of 62 characters (0-9, a-z, A-Z)
    • Task: text recognition
    • download
  • MSRA Text Detection 500 Database (MSRA-TD500) 2012

    • 500 natural images(resolutions of the images vary from 1296x864 to 1920x1280)
    • Chinese, English or mixture of both
    • Task: text detection
  • Street View Text (SVT) 2010

    • 350 high resolution images (average size 1260 × 860) (100 images for training and 250 images for testing)
    • Only word level bounding boxes are provided with case-insensitive labels
    • Task: text location
  • KAIST Scene_Text Database 2010

    • 3000 images of indoor and outdoor scenes containing text
    • Korean, English (Number), and Mixed (Korean + English + Number)
    • Task: text location, segmantation and recognition
  • Chars74k 2009

    • Over 74K images from natural images, as well as a set of synthetically generated characters
    • Small single-character images of 62 characters (0-9, a-z, A-Z)
    • Task: text recognition
  • ICDAR Benchmark Datasets

Dataset Discription Competition Paper
ICDAR 2015 1000 training images and 500 testing images paper link
ICDAR 2013 229 training images and 233 testing images paper link
ICDAR 2011 229 training images and 255 testing images paper link
ICDAR 2005 1001 training images and 489 testing images paper link
ICDAR 2003 181 training images and 251 testing images(word level and character level) paper link

Blogs

Owner
CarlosTao
CarlosTao
Super Mario Game With Python

Super_Mario Hello all this is a simple python program which tries to use our body as a controller for the super mario game Here I have used media pipe

Adarsh Badagala 219 Nov 25, 2022
Binarize document images

Binarization Binarization for document images Examples Introduction This tool performs document image binarization (i.e. transform colour/grayscale to

QURATOR-SPK 48 Jan 02, 2023
Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"

Bailando Code for CVPR 2022 (oral) paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory" [Paper] | [Project Page] | [Vi

Li Siyao 237 Dec 29, 2022
A post-processing tool for scanned sheets of paper.

unpaper Originally written by Jens Gulden — see AUTHORS for more information. Licensed under GNU GPL v2 — see COPYING for more information. Overview u

27 Dec 07, 2022
Rest API Written In Python To Classify NSFW Images.

✨ NSFW Classifier API ✨ Rest API Written In Python To Classify NSFW Images. Fastest Solution If you don't want to selfhost it, there's already an inst

Akshay Rajput 23 Dec 30, 2022
PianoVisuals - Create background videos synced with piano music using opencv

Steps Record piano video Use Neural Network to do body segmentation (video matti

Solbiati Alessandro 4 Jan 24, 2022
LEARN OPENCV IN 3 HOURS USING PYTHON - INCLUDING EXAMPLE PROJECTS

LEARN OPENCV IN 3 HOURS USING PYTHON - INCLUDING EXAMPLE PROJECTS

Murtaza Hassan 815 Dec 29, 2022
CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)

CUTIE TensorFlow implementation of the paper "CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor." Xiaohu

Zhao,Xiaohui 147 Dec 20, 2022
Converts an image into funny, smaller amongus characters

SussyImage Converts an image into funny, smaller amongus characters Demo Mona Lisa | Lona Misa (Made up of AmongUs characters) API I've also added an

Dhravya Shah 14 Aug 18, 2022
Face Anonymizer - FaceAnonApp v1.0

Face Anonymizer - FaceAnonApp v1.0 Blur faces from image and video files in /data/files folder. Contents Repo of the source files for the FaceAnonApp.

6 Apr 18, 2022
Python Computer Vision Aim Bot for Roblox's Phantom Forces

Python-Phantom-Forces-Aim-Bot Python Computer Vision Aim Bot for Roblox's Phanto

drag0ngam3s 2 Jul 11, 2022
graph learning code for ogb

The final code for OGB Installation Requirements: ogb=1.3.1 torch=1.7.0 torch-geometric=1.7.0 torch-scatter=2.0.6 torch-sparse=0.6.9 Baseline models T

PierreHao 20 Nov 10, 2022
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted. ocrmypdf # it's a scriptable c

jbarlow83 7.9k Jan 03, 2023
A facial recognition program that plays a alarm (mp3 file) when a person i seen in the room. A basic theif using Python and OpenCV

Home-Security-Demo A facial recognition program that plays a alarm (mp3 file) when a person is seen in the room. A basic theif using Python and OpenCV

SysKey 4 Nov 02, 2021
Handwritten_Text_Recognition

Deep Learning framework for Line-level Handwritten Text Recognition Short presentation of our project Introduction Installation 2.a Install conda envi

24 Jul 15, 2022
Programa que viabiliza a OCR (Optical Character Reading - leitura óptica de caracteres) de um PDF.

Este programa tem o intuito de ser um modificador de arquivos PDF. Os arquivos PDFs podem ser 3: PDFs verdadeiros - em que podem ser selecionados o ti

Daniel Soares Saldanha 2 Oct 11, 2021
governance proposal to make fei redeemable for eth

Feil Proposal 🌲 Abstract Migrate all ETH from Fei protocol-controlled value into Yearn ETH Vault. Allow redemptions of outstanding FEI for yvETH. At

13 Mar 31, 2022
Creating of virtual elements of the graphical interface using opencv and mediapipe.

Virtual GUI Creating of virtual elements of the graphical interface using opencv and mediapipe. Element GUI Output Description Button By default the b

Aleksei 4 Jun 16, 2022
BoxToolBox is a simple python application built around the openCV library

BoxToolBox is a simple python application built around the openCV library. It is not a full featured application to guide you through the w

František Horínek 1 Nov 12, 2021
Document Layout Analysis Projects

Layout_Analysis Introduction This is an implementation of RLSA and X-Y Cut with OpenCV Dependencies OpenCV 3.0+ How to use Compile with g++ : g++ -std

22 Dec 08, 2022