~1000 book pages + OpenCV + python = page regions identified as paragraphs, lines, images, captions, etc.

Overview

cosc428-structor

I had an open-ended Computer Vision assignment to complete, and an out-of-copyright book that I wanted to turn into an ebook. Conventional OCR engines like Tesseract weren't able to accurately recognise the page structure, which led to many transcription errors. If I could tell Tesseract to ignore certain regions (like images or repeated headers), then I could greatly reduce the number of errors in the resulting ebook. Thus: for my assignment, I wrote a program that takes an image and uses computer vision magick to determine the page's structure. So far, my program can detect and locate:

  • lines of text,
  • paragraphs,
  • section titles,
  • images and their associated captions,
  • boilerplate like page numbers, and
  • chapter titles.

Ain't it grand?

Dependencies

The project is written in Python 2.7.3 and uses the cv2 library for interacting with openCV. It also uses numpy for some of the mathematical operations. On windows, the best way to get these dependencies is to install the Python(x,y) suite (https://code.google.com/p/pythonxy/), which combines python with a customisable set of scientific computing libraries.

Program Structure

The program's root is main.py, but this simply iterates through images in a folder and constructs a Page instance from each image. Thus, the real work happens in page.py.

page.py contains a few utility methods and the Page class. The constructor calls the appropriate methods in order to determine the logical structure of the page. This structure is stored in three objects: self.margin, self.content, and self.boilerplate (which contains such non-content text objects as the page number and header).

The getBuildingBlocks method is responsible for finding words, grouping words into textual lines, discarding marginal noise, and fitting a Margin instance around the remaining lines. Most of these tasks are preformed by calling other functions.

The self.content object is found by passing the set of lines to the Content() constructor. This uses a state machine to group lines into figures, paragraphs, section titles, etc. The Content class, along with a class for each content type, is found in content.py.

The other files can generally be ignored when trying to understand the program; they are largely just convenience classes which represent page elements (such as points, geometric lines, words, text lines, and boxes), as well as supporting tools such as the Stopwatch.

How to Run the Code

Run main.py using the python interpreter. This will process each page in ./images, and for each page a series of 'snapshot' images will be displayed in order to illustrate the algorithm. To show only the final result for each image, set showSteps in main.py to False.

You might also like...
Basic functions manipulating images using the OpenCV library

OpenCV Basic functions manipulating images using the OpenCV library. Reading Ima

Some bits of javascript to transcribe scanned pages using PageXML

nashi (nasḫī) Some bits of javascript to transcribe scanned pages using PageXML. Both ltr and rtl languages are supported. Try it! But wait, there's m

scantailor - Scan Tailor is an interactive post-processing tool for scanned pages.
scantailor - Scan Tailor is an interactive post-processing tool for scanned pages.

Scan Tailor - scantailor.org This project is no longer maintained, and has not been maintained for a while. About Scan Tailor is an interactive post-p

Text page dewarping using a "cubic sheet" model

page_dewarp Page dewarping and thresholding using a "cubic sheet" model - see full writeup at https://mzucker.github.io/2016/08/15/page-dewarping.html

Deep learning based page layout analysis
Deep learning based page layout analysis

Deep Learning Based Page Layout Analyze This is a Python implementaion of page layout analyze tool. The goal of page layout analyze is to segment page

ocroseg - This is a deep learning model for page layout analysis / segmentation.
ocroseg - This is a deep learning model for page layout analysis / segmentation.

ocroseg This is a deep learning model for page layout analysis / segmentation. There are many different ways in which you can train and run it, but by

a deep learning model for page layout analysis / segmentation.
a deep learning model for page layout analysis / segmentation.

OCR Segmentation a deep learning model for page layout analysis / segmentation. dependencies tensorflow1.8 python3 dataset: uw3-framed-lines-degraded-

OCR-D-compliant page segmentation

ocrd_segment This repository aims to provide a number of OCR-D-compliant processors for layout analysis and evaluation. Installation In your virtual e

This repository lets you train neural networks models for performing end-to-end full-page handwriting recognition using the Apache MXNet deep learning frameworks on the IAM Dataset.
This repository lets you train neural networks models for performing end-to-end full-page handwriting recognition using the Apache MXNet deep learning frameworks on the IAM Dataset.

Handwritten Text Recognition (OCR) with MXNet Gluon These notebooks have been created by Jonathan Chung, as part of his internship as Applied Scientis

Comments
  • The getBuildingBlocks

    The getBuildingBlocks

    Hello, Recently, I have some task about the document layout analysis. The description in "README.md" is very consistent with my mission. But when I try to run the code as README.md: How to Run the Code, there just some red line in each dobule word and have no resault of the detect and locate of "line of text", "paragraphs", "section titles" , etc. So I want to know what has happend to the code. Very thankful

    opened by lvbohui 3
Releases(v1.0)
Owner
Chad Oliver
Chad Oliver
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

English | 简体中文 Introduction PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and a

27.5k Jan 08, 2023
The virtual calculator will be above the live streaming from your camera

The virtual calculator is above the live streaming from my camera usb , the program first detect my hand and in each frame calculate the distance between two finger ,if the distance is lower than the

gasbaoui mohammed al amine 5 Jul 01, 2022
Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector

CRAFT: Character-Region Awareness For Text detection Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector | Paper |

188 Dec 28, 2022
PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application in real time.

PyNeuro PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application

Zach Wang 45 Dec 30, 2022
(CVPR 2021) Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

BRNet Introduction This is a release of the code of our paper Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds,

86 Oct 05, 2022
TableBank: A Benchmark Dataset for Table Detection and Recognition

TableBank TableBank is a new image-based table detection and recognition dataset built with novel weak supervision from Word and Latex documents on th

844 Jan 04, 2023
A curated list of resources dedicated to scene text localization and recognition

Scene Text Localization & Recognition Resources A curated list of resources dedicated to scene text localization and recognition. Any suggestions and

CarlosTao 1.6k Dec 22, 2022
The first open-source library that detects the font of a text in a image.

Typefont Typefont is an experimental library that detects the font of a text in a image. Usage Import the main function and invoke it like in the foll

Vasile Pește 1.6k Feb 24, 2022
Read Japanese manga inside browser with selectable text.

mokuro Read Japanese manga with selectable text inside a browser. See demo: https://kha-white.github.io/manga-demo mokuro_demo.mp4 Demo contains excer

Maciej Budyś 170 Dec 27, 2022
A version of nrsc5-gui that merges the interface developed by cmnybo with the architecture developed by zefie in order to start a new baseline that is not heavily dependent upon Python processing.

NRSC5-DUI is a graphical interface for nrsc5. It makes it easy to play your favorite FM HD radio stations using an RTL-SDR dongle. It will also displa

61 Dec 22, 2022
An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

GeorgeJoe 115 Dec 12, 2022
A toolbox of scene text detection and recognition

FudanOCR This toolbox contains the implementations of the following papers: Scene Text Telescope: Text-Focused Scene Image Super-Resolution [Chen et a

FudanVIC Team 170 Dec 26, 2022
Distilling Knowledge via Knowledge Review, CVPR 2021

ReviewKD Distilling Knowledge via Knowledge Review Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia This project provides an implementation for the

DV Lab 194 Dec 28, 2022
Captcha Recognition

The objective of this project is to recognize the target numbers in the captcha images correctly which would tell us how good or bad a captcha system has been built.

Mohit Kaushik 5 Feb 20, 2022
Layout Analysis Evaluator for the ICDAR 2017 competition on Layout Analysis for Challenging Medieval Manuscripts

LayoutAnalysisEvaluator Layout Analysis Evaluator for: ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records ICD

17 Dec 08, 2022
nofacedb/faceprocessor is a face recognition engine for NoFaceDB program complex.

faceprocessor nofacedb/faceprocessor is a face recognition engine for NoFaceDB program complex. Tech faceprocessor uses a number of open source projec

NoFaceDB 3 Sep 06, 2021
A set of workflows for corpus building through OCR, post-correction and normalisation

PICCL: Philosophical Integrator of Computational and Corpus Libraries PICCL offers a workflow for corpus building and builds on a variety of tools. Th

Language Machines 41 Dec 27, 2022
Web interface for browsing arXiv papers

Currently, arxivbox considers only major computer vision and machine learning conferences

Ankan Kumar Bhunia 12 Sep 11, 2022
A program that takes in the hand gesture displayed by the user and translates ASL.

Interactive-ASL-Recognition Using the framework mediapipe made by google, OpenCV library and through self teaching, I was able to create a program tha

Riddhi Bajaj 3 Nov 22, 2021
かの有名なあの東方二次創作ソング、「bad apple!」のMVをPythonでやってみたって話

bad apple!! 内容 このプログラムは、bad apple!(feat. nomico)のPVをPythonを用いて再現しよう!という内容です。 実はYoutube並びにGithub上に似たようなプログラムがあったしなんならそっちの方が結構良かったりするんですが、一応公開しますw 使い方 こ

赤紫 8 Jan 05, 2023