document image degradation

Overview

ocrodeg

The ocrodeg package is a small Python library implementing document image degradation for data augmentation for handwriting recognition and OCR applications.

The following illustrates the kinds of degradations available from ocrodeg.

%pylab inline
Populating the interactive namespace from numpy and matplotlib
rc("image", cmap="gray", interpolation="bicubic")
figsize(10, 10)
import scipy.ndimage as ndi
import ocrodeg

image = imread("testdata/W1P0.png")
imshow(image)
<matplotlib.image.AxesImage at 0x7fabcc7ab390>

png

PAGE ROTATION

This is just for illustration; for large page rotations, you can just use ndimage.

for i, angle in enumerate([0, 90, 180, 270]):
    subplot(2, 2, i+1)
    imshow(ndi.rotate(image, angle))

png

RANDOM GEOMETRIC TRANSFORMATIONS

random_transform generates random transformation parameters that work reasonably well for document image degradation. You can override the ranges used by each of these parameters by keyword arguments.

ocrodeg.random_transform()
{'angle': -0.016783842893063807,
 'aniso': 0.805280370671964,
 'scale': 0.9709145529604223,
 'translation': (0.014319657859164045, 0.03676897986267606)}

Here are four samples generated by random transforms.

for i in xrange(4):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, **ocrodeg.random_transform()))

png

You can use transform_image directly with the different parameters to get a feel for the ranges and effects of these parameters.

for i, angle in enumerate([-2, -1, 0, 1]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, angle=angle*pi/180))

png

for i, angle in enumerate([-2, -1, 0, 1]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, angle=angle*pi/180)[1000:1500, 750:1250])

png

for i, aniso in enumerate([0.5, 1.0, 1.5, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, aniso=aniso))

png

for i, aniso in enumerate([0.5, 1.0, 1.5, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, aniso=aniso)[1000:1500, 750:1250])

png

for i, scale in enumerate([0.5, 0.9, 1.0, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, scale=scale))

png

for i, scale in enumerate([0.5, 0.9, 1.0, 2.0]):
    subplot(2, 2, i+1)
    h, w = image.shape
    imshow(ocrodeg.transform_image(image, scale=scale)[h//2-200:h//2+200, w//3-200:w//3+200])

png

RANDOM DISTORTIONS

Pages often also have a small degree of warping. This can be modeled by random distortions. Very small and noisy random distortions also model ink spread, while large 1D random distortions model paper curl.

for i, sigma in enumerate([1.0, 2.0, 5.0, 20.0]):
    subplot(2, 2, i+1)
    noise = ocrodeg.bounded_gaussian_noise(image.shape, sigma, 5.0)
    distorted = ocrodeg.distort_with_noise(image, noise)
    h, w = image.shape
    imshow(distorted[h//2-200:h//2+200, w//3-200:w//3+200])

png

RULED SURFACE DISTORTIONS

for i, mag in enumerate([5.0, 20.0, 100.0, 200.0]):
    subplot(2, 2, i+1)
    noise = ocrodeg.noise_distort1d(image.shape, magnitude=mag)
    distorted = ocrodeg.distort_with_noise(image, noise)
    h, w = image.shape
    imshow(distorted[:1500])

png

BLUR, THRESHOLDING, NOISE

There are a range of utilities for modeling imaging artifacts: blurring, noise, inkspread.

patch = image[1900:2156, 1000:1256]
imshow(patch)
<matplotlib.image.AxesImage at 0x7fabc88c7e10>

png

for i, s in enumerate([0, 1, 2, 4]):
    subplot(2, 2, i+1)
    blurred = ndi.gaussian_filter(patch, s)
    imshow(blurred)

png

for i, s in enumerate([0, 1, 2, 4]):
    subplot(2, 2, i+1)
    blurred = ndi.gaussian_filter(patch, s)
    thresholded = 1.0*(blurred>0.5)
    imshow(thresholded)

png

reload(ocrodeg)
for i, s in enumerate([0.0, 1.0, 2.0, 4.0]):
    subplot(2, 2, i+1)
    blurred = ocrodeg.binary_blur(patch, s)
    imshow(blurred)

png

for i, s in enumerate([0.0, 0.1, 0.2, 0.3]):
    subplot(2, 2, i+1)
    blurred = ocrodeg.binary_blur(patch, 2.0, noise=s)
    imshow(blurred)

png

MULTISCALE NOISE

reload(ocrodeg)
for i in range(4):
    noisy = ocrodeg.make_multiscale_noise_uniform((512, 512))
    subplot(2, 2, i+1); imshow(noisy, vmin=0, vmax=1)

png

RANDOM BLOBS

for i, s in enumerate([2, 5, 10, 20]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.random_blobs(patch.shape, 3e-4, s))

png

reload(ocrodeg)
blotched = ocrodeg.random_blotches(patch, 3e-4, 1e-4)
#blotched = minimum(maximum(patch, ocrodeg.random_blobs(patch.shape, 30, 10)), 1-ocrodeg.random_blobs(patch.shape, 15, 8))
subplot(121); imshow(patch); subplot(122); imshow(blotched)
<matplotlib.image.AxesImage at 0x7fabc8a35490>

png

FIBROUS NOISE

imshow(ocrodeg.make_fibrous_image((256, 256), 700, 300, 0.01))
<matplotlib.image.AxesImage at 0x7fabc8852450>

png

FOREGROUND / BACKGROUND SELECTION

subplot(121); imshow(patch); subplot(122); imshow(ocrodeg.printlike_multiscale(patch))
<matplotlib.image.AxesImage at 0x7fabc8676d90>

png

subplot(121); imshow(patch); subplot(122); imshow(ocrodeg.printlike_fibrous(patch))
<matplotlib.image.AxesImage at 0x7fabc8d1b250>

png

Owner
NVIDIA Research Projects
NVIDIA Research Projects
Satoshi is a discord bot template in python using discord.py that allow you to track some live crypto prices with your own discord bot.

Satoshi ~ DiscordCryptoBot Satoshi is a simple python discord bot using discord.py that allow you to track your favorites cryptos prices with your own

Théo 2 Sep 15, 2022
Text modding tools for FF7R (Final Fantasy VII Remake)

FF7R_text_mod_tools Subtitle modding tools for FF7R (Final Fantasy VII Remake) There are 3 tools I made. make_dualsub_mod.exe: Merges (or swaps) subti

10 Dec 19, 2022
Camera Intrinsic Calibration and Hand-Eye Calibration in Pybullet

This repository is mainly for camera intrinsic calibration and hand-eye calibration. Synthetic experiments are conducted in PyBullet simulator. 1. Tes

CAI Junhao 7 Oct 03, 2022
RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection

RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection For more details, please refer to our paper. Citing Please cite the related works

Minghui Liao 102 Jun 29, 2022
A curated list of resources dedicated to scene text localization and recognition

Scene Text Localization & Recognition Resources A curated list of resources dedicated to scene text localization and recognition. Any suggestions and

CarlosTao 1.6k Dec 22, 2022
Python Computer Vision application that allows users to draw/erase on the screen using their webcam.

CV-Virtual-WhiteBoard The Virtual WhiteBoard is a project I made using the OpenCV and Mediapipe Python libraries. Using your index and middle finger y

Stephen Wang 1 Jan 07, 2022
Some bits of javascript to transcribe scanned pages using PageXML

nashi (nasḫī) Some bits of javascript to transcribe scanned pages using PageXML. Both ltr and rtl languages are supported. Try it! But wait, there's m

Andreas Büttner 15 Nov 09, 2022
零样本学习测评基准,中文版

ZeroCLUE 零样本学习测评基准,中文版 零样本学习是AI识别方法之一。 简单来说就是识别从未见过的数据类别,即训练的分类器不仅仅能够识别出训练集中已有的数据类别, 还可以对于来自未见过的类别的数据进行区分。 这是一个很有用的功能,使得计算机能够具有知识迁移的能力,并无需任何训练数据, 很符合现

CLUE benchmark 27 Dec 10, 2022
Motion detector, Full body detection, Upper body detection, Cat face detection, Smile detection, Face detection (haar cascade), Silverware detection, Face detection (lbp), and Sending email notifications

Security camera running OpenCV for object and motion detection. The camera will send email with image of any objects it detects. It also runs a server that provides web interface with live stream vid

Peace 10 Jun 30, 2021
Extract tables from scanned image PDFs using Optical Character Recognition.

ocr-table This project aims to extract tables from scanned image PDFs using Optical Character Recognition. Install Requirements Tesseract OCR sudo apt

Abhijeet Singh 209 Dec 06, 2022
EAST for ICPR MTWI 2018 Challenge II (Text detection of network images)

EAST_ICPR2018: EAST for ICPR MTWI 2018 Challenge II (Text detection of network images) Introduction This is a repository forked from argman/EAST for t

QichaoWu 49 Dec 24, 2022
Textboxes : Image Text Detection Model : python package (tensorflow)

shinTB Abstract A python package for use Textboxes : Image Text Detection Model implemented by tensorflow, cv2 Textboxes Paper Review in Korean (My Bl

Jayne Shin (신재인) 91 Dec 15, 2022
Basic functions manipulating images using the OpenCV library

OpenCV Basic functions manipulating images using the OpenCV library. Reading Ima

Shatha Siala 3 Feb 17, 2022
Using computer vision method to recognize and calcutate the features of the architecture.

building-feature-recognition In this repository, we accomplished building feature recognition using traditional/dl-assisted computer vision method. Th

4 Aug 11, 2022
Um RPG de texto orientado a objetos.

RPG de texto Um RPG de texto orientado a objetos, sem história. Um RPG (Role-playing game) baseado em texto em que você pode viajar para alguns locais

Vinicius 3 Oct 05, 2022
Scale-aware Automatic Augmentation for Object Detection (CVPR 2021)

SA-AutoAug Scale-aware Automatic Augmentation for Object Detection Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia [Paper] [Bi

Jia Research Lab 182 Dec 29, 2022
Hand gesture detection project with aweome UI implementation.

an awesome hand gesture detection project for you to be creative! Imagination is the limit to do with this project.

AR Ashraf 39 Sep 26, 2022
Using Opencv ,based on Augmental Reality(AR) and will show the feature matching of image and then by finding its matching

Using Opencv ,this project is based on Augmental Reality(AR) and will show the feature matching of image and then by finding its matching ,it will just mask that image . This project ,if used in cctv

1 Feb 13, 2022
📷 Face Recognition using Haar-Cascade Classifier, OpenCV, and Python

Face-Recognition-System Face Recognition using Haar-Cascade Classifier, OpenCV and Python. This project is based on face detection and face recognitio

1 Jan 10, 2022
A simple python program to record security cam footage by detecting a face and body of a person in the frame.

SecurityCam A simple python program to record security cam footage by detecting a face and body of a person in the frame. This code was created by me,

1 Nov 08, 2021