Document Image Dewarping

Overview

Document image dewarping using text-lines and line segments

Abstract

Conventional text-line based document dewarping methods have problems when handling complex layouts and/or very few text-lines. When there are few aligned text-lines in the image, this usually means that photos, graphics and/or tables take up a large portion of the input instead. Hence, for robust document dewarping, we propose to use line segments in the image in addition to the aligned text-lines. Based on the assumption and observation that all transformed line segments remain straight (line-to-line mapping), and that many of them are horizontally or vertically aligned in well-rectified images, we encode these properties into the cost function in addition to the text-line based cost. By minimizing the function, we obtain the transformation parameters for camera pose and page curve (extrinsic parameters) and camera focal length (intrinsic parameter), which are used for document rectification. Since there are many outliers in line segment directions, and text-lines are missed in some cases, the overall algorithm is designed in an iterative manner. At each step, we remove text components and line segments that are not well aligned horizontally or vertically, and then minimize the cost function with the updated information. Experimental results show that the proposed method is robust to a variety of page layouts. Moreover, the proposed method extends to general curved surfaces as well as documents.

Algorithm

Two line segment properties

Straightness property

The straightness property states that line segments extracted from the curved document image, which are lines on the curved document surface, become straight in the well-rectified domain (although lines extracted in the well-rectified image may be curved on the curved document surface). In other words, rectification is a line-to-line mapping. Since the straightness property is satisfied by every plane-to-plane mapping, it is not a meaningful constraint when rectification considers only the camera view (such as a homography). However, because we consider the page curve as well as the camera view in the rectification process, this property becomes an effective constraint that prevents lines from being curved.
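As an illustration of how straightness could be measured, the sketch below fits a total-least-squares line to the transformed sample points of one segment and returns the RMS perpendicular residual. This is not the cost term from [1]; the function name `straightness_residual` and the choice of residual are assumptions made for this example.

```python
import math

def straightness_residual(points):
    """RMS perpendicular distance of 2D points to their best-fit line.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 scatter matrix of the centered points.
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # The smallest eigenvalue of the scatter matrix equals the sum of
    # squared perpendicular distances to the total-least-squares line.
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    disc = max(trace * trace / 4.0 - det, 0.0)
    lam_min = trace / 2.0 - math.sqrt(disc)
    return math.sqrt(max(lam_min, 0.0) / n)
```

A residual near zero means the transformed segment is still straight; a bent segment gives a clearly positive value, which is what a straightness term would penalize.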

Alignment property

The alignment property is based on the observation that the majority of line segments are horizontally or vertically aligned in the rectified images.
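A minimal sketch of an alignment penalty follows, assuming a segment is given by its two rectified endpoints. The penalty cos²θ·sin²θ is chosen here only because it vanishes for horizontal and vertical directions; it is an assumption for illustration, not the alignment term used in [1], and `alignment_penalty` is a hypothetical name.

```python
import math

def alignment_penalty(p, q):
    """Penalty that is 0 for horizontal/vertical segments, maximal at 45 degrees.

    p, q: (x, y) endpoints of a segment in the rectified image.
    Hypothetical helper for illustration; not taken from the paper.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("degenerate segment: endpoints coincide")
    c, s = dx / length, dy / length
    # cos^2 * sin^2 = sin^2(2 * theta) / 4, which vanishes at 0 and 90 degrees.
    return (c * s) ** 2
```

Summing such a penalty over all segments gives a quantity that decreases as more segments become axis-aligned.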

Outlier removal

Direct optimization of the cost function may yield poorly rectified results due to outliers. We handle two types of outliers: missed text-lines and line segments with arbitrary (non-horizontal/vertical) directions. For outlier removal, we design an iterative method. At each step, we refine the features (text components and line segments) by removing outliers (those that are not well aligned) and minimize the cost function with the updated inliers.
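The alternation between minimizing and removing outliers can be sketched on a toy problem: estimating a single in-plane rotation from segment directions, some of which are arbitrary. The real method optimizes camera and page-curve parameters; the one-parameter model, the 10-degree threshold, and all function names below are assumptions made for this example.

```python
import math

def fold_to_axis(angle):
    """Signed distance (radians) from `angle` to the nearest multiple of 90 degrees."""
    quarter = math.pi / 2.0
    return angle - quarter * round(angle / quarter)

def fit_rotation(angles):
    """Rotation phi minimizing sum(1 - cos(4 * (a - phi))) over `angles`."""
    s = sum(math.sin(4.0 * a) for a in angles)
    c = sum(math.cos(4.0 * a) for a in angles)
    return math.atan2(s, c) / 4.0

def iterative_fit(angles, threshold=math.radians(10.0), max_iters=10):
    """Alternate between fitting on the inliers and dropping misaligned segments."""
    inliers = list(angles)
    phi = fit_rotation(inliers)
    for _ in range(max_iters):
        # Keep only segments that are close to horizontal/vertical after
        # undoing the current rotation estimate.
        kept = [a for a in inliers if abs(fold_to_axis(a - phi)) <= threshold]
        if not kept or len(kept) == len(inliers):
            break  # nothing left to fit, or no further outliers removed
        inliers = kept
        phi = fit_rotation(inliers)  # re-minimize with the updated inliers
    return phi, inliers

# Five segments rotated by about 5 degrees, plus two arbitrary directions.
segment_angles = [math.radians(d) for d in (5.0, 95.0, 5.5, 94.5, 4.8, 40.0, 60.0)]
phi, inliers = iterative_fit(segment_angles)
```

The loop stops when a pass removes nothing, so the final estimate is computed only from segments consistent with it.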

Experimental results

CBDAR 2007 dataset

We evaluate our method on the CBDAR 2007 dewarping contest dataset [http://staffhome.ecm.uwa.edu.au/~00082689/downloads.html], which consists of binarized text images.

Figure columns: input image, Kim [2], proposed.

Our document image dataset

To cover non-conventional document images (i.e., cases that are not text-abundant), we collected 100 images with various layouts (e.g., three-column documents, documents containing large tables and/or figures, presentation slides, and so on).

Figure columns: input image, Kim [2], proposed.

Our curved image dataset

To cover images of general curved surfaces (such as bottles), we collected 74 images.

Figure columns: input image, Kim [2], proposed.

Executable program

The executable program can be downloaded from the link below:

http://ispl.synology.me:8480/sharing/uA2DTRA8U

References

[1] Taeho Kil, Wonkyo Seo, Hyung Il Koo and Nam Ik Cho, "Robust Document Image Dewarping Using Text-Line and Line Segments", ICDAR 2017.

[2] Beom Su Kim, Hyung Il Koo, and Nam Ik Cho, "Document Dewarping via Text-line based Optimization", Pattern Recognition 2015.

Owner
Taeho Kil