Detect handwritten words in a text-line (classic image processing method).

Overview

Word segmentation

Implementation of scale space technique for word segmentation as proposed by R. Manmatha and N. Srimal. Even though the paper is from 1999, the method still achieves good results, is fast, and is easy to implement. The algorithm takes an image of a line as input and outputs the segmented words.

example

Run demo

Go to the src/ directory and run the script python main.py. The images from the data/ directory (taken from IAM dataset) are segmented into words and the results are saved to the out/ directory.

Documentation

An anisotropic filter kernel is applied to the input image to create blobs corresponding to words. After thresholding the blob-image, connected components are extracted which correspond to words.

Parameters

Most of the parameters of the function wordSegmentation deal with the shape of the filter kernel:

  • img: grayscale uint8 image of the text-line to be segmented.
  • kernelSize: size of filter kernel, must be an odd integer.
  • sigma: standard deviation of Gaussian function used for filter kernel.
  • theta: approximated width/height ratio of words, filter function is distorted by this factor.
  • minArea: ignore word candidates smaller than specified area.

The function prepareImg can be used to convert the input image to grayscale and to resize it to a fixed height:

  • img: input image.
  • height: image will be resized to fit specified height.

Algorithm

The illustration below shows how the algorithm works:

  • top left: input image.
  • top right: filter kernel is applied.
  • bottom left: blob image after thresholding.
  • bottom right: bounding boxes around words in original image.

illustration

Results

This algorithm gives good results on datasets with large inter-word-distances and small intra-word-distances like IAM. However, for historical datasets like Bentham or Ratsprotokolle results are not very good and more complex approaches should be used instead (e.g., a neural network based approach as implemented in the WordDetectorNN repository).

Owner
Harald Scheidl
Interested in computer vision, deep learning, C++ and Python.
Harald Scheidl
Erosion and dialation using structure element in OpenCV python

Erosion and dialation using structure element in OpenCV python

Tamzid hasan 2 Nov 11, 2021

Installations for running keras-theano on GPU Upgrade pip and install opencv2 cd ~ pip install --upgrade pip pip install opencv-python Upgrade keras

Berat Kurar Barakat 14 Sep 30, 2022
An interactive document scanner built in Python using OpenCV

The scanner takes a poorly scanned image, finds the corners of the document, applies the perspective transformation to get a top-down view of the document, sharpens the image, and applies an adaptive

Kushal Shingote 1 Feb 12, 2022
A Python wrapper for the tesseract-ocr API

tesserocr A simple, Pillow-friendly, wrapper around the tesseract-ocr API for Optical Character Recognition (OCR). tesserocr integrates directly with

Fayez 1.7k Dec 31, 2022
Perspective recovery of text using transformed ellipses

unproject_text Perspective recovery of text using transformed ellipses. See full writeup at https://mzucker.github.io/2016/10/11/unprojecting-text-wit

Matt Zucker 111 Nov 13, 2022
Detect textlines in document images

Textline Detection Detect textlines in document images Introduction This tool performs border, region and textline detection from document image data

QURATOR-SPK 70 Jun 30, 2022
Connect Aseprite to Blender for painting pixelart textures in real time

Pribambase Pribambase is a small tool that connects Aseprite and Blender, to allow painting with instant viewport feedback and all functionality of ex

117 Jan 03, 2023
基于openpose和图像分类的手语识别项目

手语识别 0、使用到的模型 (1). openpose,作者:CMU-Perceptual-Computing-Lab https://github.com/CMU-Perceptual-Computing-Lab/openpose (2). 图像分类classification,作者:Bubbl

20 Dec 15, 2022
Multi-choice answer sheet correction system using computer vision with opencv & python.

Multi choice answer correction 🔴 5 answer sheet samples with a specific solution for detecting answers and sheet correction. 🔴 By running the soluti

Reza Firouzi 7 Mar 07, 2022
Machine Leaning applied to denoise images to improve OCR Accuracy

Machine Learning to Denoise Images for Better OCR Accuracy This project is an adaptation of this tutorial and used only for learning purposes: https:/

Antonio Bri Pérez 2 Nov 16, 2022
OpenGait is a flexible and extensible gait recognition project

A flexible and extensible framework for gait recognition. You can focus on designing your own models and comparing with state-of-the-arts easily with the help of OpenGait.

Shiqi Yu 335 Dec 22, 2022
Um simples projeto para fazer o reconhecimento do captcha usado pelo jogo bombcrypto

CaptchaSolver - LEIA ISSO 😓 Para iniciar o codigo: pip install -r requirements.txt python captcha_solver.py Se você deseja pegar ver o resultado das

Kawanderson 50 Mar 21, 2022
SceneCollisionNet This repo contains the code for "Object Rearrangement Using Learned Implicit Collision Functions", an ICRA 2021 paper. For more info

SceneCollisionNet This repo contains the code for "Object Rearrangement Using Learned Implicit Collision Functions", an ICRA 2021 paper. For more info

NVIDIA Research Projects 31 Nov 22, 2022
TextField: Learning A Deep Direction Field for Irregular Scene Text Detection (TIP 2019)

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection Introduction The code and trained models of: TextField: Learning A Deep

Yukang Wang 101 Dec 12, 2022
A python script based on opencv and paddleocr, which can automatically pick up tasks, make cookies, and receive rewards in the Destiny 2 Dawning Oven

A python script based on opencv and paddleocr, which can automatically pick up tasks, make cookies, and receive rewards in the Destiny 2 Dawning Oven

1 Dec 22, 2021
Image processing using OpenCv

Image processing using OpenCv Write a program that opens the webcam, and the user selects one of the following on the video: ✅ If the user presses the

M.Najafi 4 Feb 18, 2022
An unofficial implementation of the paper "AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss".

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss This is an unofficial implementation of AutoVC based on the official one. The reposi

Chien-yu Huang 27 Jun 16, 2022
A toolbox of scene text detection and recognition

FudanOCR This toolbox contains the implementations of the following papers: Scene Text Telescope: Text-Focused Scene Image Super-Resolution [Chen et a

FudanVIC Team 170 Dec 26, 2022
virtual mouse which can copy files, close tabs and many other features !

AI Virtual Mouse Controller Developed an AI-based system to control the mouse cursor using Python and OpenCV with the real-time camera. Fingertip loca

Diwas Pandey 23 Oct 05, 2021
Write-ups for the SwissHackingChallenge2021 CTF.

SwissHackingChallenge 2021 : Write-ups This repository contains a collection of my write-ups for challenges solved during the SwissHackingChallenge (S

Julien Béguin 3 Jun 07, 2021