Scene Text Localization & Recognition Resources

A curated list of resources dedicated to scene text localization and recognition. Any suggestions and pull requests are welcome.

Papers & Code

Overview

  • [2015-PAMI] Text Detection and Recognition in Imagery: A Survey paper
  • [2014-Front.Comput.Sci] Scene Text Detection and Recognition: Recent Advances and Future Trends paper

Visual Geometry Group, University of Oxford

CUHK & SIAT

  • [2016-arXiv] Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network paper
  • [2016-AAAI] Reading Scene Text in Deep Convolutional Sequences paper
  • [2016-TIP] Text-Attentional Convolutional Neural Networks for Scene Text Detection paper
  • [2014-ECCV] Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees paper

Media and Communication Lab, HUST

  • [2016-CVPR] Robust Scene Text Recognition with Automatic Rectification paper
  • [2016-CVPR] Multi-Oriented Text Detection with Fully Convolutional Networks paper
  • [2015-CoRR] An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition paper code github

AI Lab, Stanford

  • [2012-ICPR, Wang] End-to-End Text Recognition with Convolutional Neural Networks paper code SVHN Dataset
  • [2012-PhD thesis, David Wu] End-to-End Text Recognition with Convolutional Neural Networks paper

Others

  • [2018-CVPR] FOTS: Fast Oriented Text Spotting With a Unified Network paper
  • [2018-IJCAI] IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection paper
  • [2018-AAAI] PixelLink: Detecting Scene Text via Instance Segmentation paper code
  • [2018-AAAI] SEE: Towards Semi-Supervised End-to-End Scene Text Recognition paper code
  • [2017-arXiv] Fused Text Segmentation Networks for Multi-oriented Scene Text Detection paper
  • [2017-arXiv] WeText: Scene Text Detection under Weak Supervision paper
  • [2017-ICCV] Single Shot Text Detector with Regional Attention paper
  • [2017-ICCV] WordSup: Exploiting Word Annotations for Character based Text Detection paper
  • [2017-arXiv] R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection paper
  • [2017-CVPR] EAST: An Efficient and Accurate Scene Text Detector paper code
  • [2017-arXiv] Cascaded Segmentation-Detection Networks for Word-Level Text Spotting paper
  • [2017-arXiv] Deep Direct Regression for Multi-Oriented Scene Text Detection paper
  • [2017-CVPR] Detecting Oriented Text in Natural Images by Linking Segments paper code
  • [2017-CVPR] Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection paper
  • [2017-arXiv] Arbitrary-Oriented Scene Text Detection via Rotation Proposals paper
  • [2017-AAAI] TextBoxes: A Fast Text Detector with a Single Deep Neural Network paper code
  • [2017-ICCV] Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework paper code
  • [2016-CVPR] Recursive Recurrent Nets with Attention Modeling for OCR in the Wild paper
  • [2016-arXiv] COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images paper
  • [2016-arXiv] DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images paper
  • [2015-ICDAR] Object Proposals for Text Extraction in the Wild paper code
  • [2014-TPAMI] Word Spotting and Recognition with Embedded Attributes paper homepage code

Datasets

  • MLT 2017 2017

    • 7200 training, 1800 validation images
    • Bounding box, text transcription, and script annotations
    • Task: text detection, script identification
  • COCO-Text (Computer Vision Group, Cornell) 2016

    • 63,686 images, 173,589 text instances, 3 fine-grained text attributes.
    • Task: text location and recognition
    • COCO-Text API
  • Synthetic Word Dataset (Oxford, VGG) 2014

    • 9 million images covering 90k English words
    • Task: text recognition, segmentation
    • download
  • IIIT 5K-Words 2012

    • 5000 cropped word images taken from scene text and born-digital images (2k training and 3k testing images)
    • Each image is a cropped word image of scene text with case-insensitive labels
    • Task: text recognition
    • download
  • StanfordSynth (Stanford, AI Group) 2012

    • Small single-character images of 62 characters (0-9, a-z, A-Z)
    • Task: text recognition
    • download
  • MSRA Text Detection 500 Database (MSRA-TD500) 2012

    • 500 natural images (resolutions vary from 1296x864 to 1920x1280)
    • Chinese, English or mixture of both
    • Task: text detection
  • Street View Text (SVT) 2010

    • 350 high-resolution images (average size 1260 × 860): 100 for training and 250 for testing
    • Only word level bounding boxes are provided with case-insensitive labels
    • Task: text location
  • KAIST Scene_Text Database 2010

    • 3000 images of indoor and outdoor scenes containing text
    • Korean, English (Number), and Mixed (Korean + English + Number)
    • Task: text location, segmentation and recognition
  • Chars74k 2009

    • Over 74K images from natural images, as well as a set of synthetically generated characters
    • Small single-character images of 62 characters (0-9, a-z, A-Z)
    • Task: text recognition
  • ICDAR Benchmark Datasets

| Dataset | Description | Competition Paper |
|---|---|---|
| ICDAR 2015 | 1000 training images and 500 testing images | paper link |
| ICDAR 2013 | 229 training images and 233 testing images | paper link |
| ICDAR 2011 | 229 training images and 255 testing images | paper link |
| ICDAR 2005 | 1001 training images and 489 testing images | paper link |
| ICDAR 2003 | 181 training images and 251 testing images (word level and character level) | paper link |
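
Two conventions recur in the dataset notes above: the 62-class character set (0-9, a-z, A-Z) of Chars74k and StanfordSynth, and the case-insensitive word labels of IIIT 5K-Words and SVT. The following is a minimal Python sketch of how one might handle both when preparing labels or scoring a recognizer. The names `CHARSET`, `encode`, and `case_insensitive_accuracy` are illustrative assumptions, not part of any dataset's official tooling, and the class ordering is arbitrary.

```python
import string

# Assumed ordering of the 62 character classes: digits, lowercase, uppercase.
# The datasets listed above do not prescribe this particular order.
CHARSET = string.digits + string.ascii_lowercase + string.ascii_uppercase
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(CHARSET)}


def encode(word):
    """Map a word to a list of class indices.

    Raises KeyError for any character outside the 62-class set.
    """
    return [CHAR_TO_INDEX[ch] for ch in word]


def case_insensitive_accuracy(predictions, labels):
    """Return the fraction of predicted words that match their label, ignoring case."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    hits = sum(p.lower() == t.lower() for p, t in zip(predictions, labels))
    return hits / len(labels)
```

For datasets with case-insensitive labels, comparing lowercased strings avoids penalizing a recognizer for predicting "Hello" when the label is "HELLO". Official benchmark protocols may add further rules (for example, lexicon constraints), which this sketch does not attempt to cover.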

Blogs

Owner
CarlosTao