Captcha Recognition

Overview

Captcha Recognition

Problem Definition

CAPTCHA (Completely Automated Public Turing Test to tell Computers and Humans Apart) is an automated test created to prevent websites from being repeatedly accessed by an automatic program in a short period of time and wasting network resources. Among all the CAPTCHAs, commonly used types contain low resolution, deformed characters with character adhesions and background noise, which the user must read and type correctly into an input box. This is a relatively simple task for humans, taking an average of 10 seconds to solve, but it presents a difficulty for computers, because such noise makes it difficult for a program to differentiate the characters from them. The main objective of this project is to recognize the target numbers in the captcha images correctly.

The mainstream CAPTCHA is based on visual representation, including images such as letters and text. Traditional CAPTCHA recognition includes three steps: image pre-processing, character segmentation, and character recognition. Traditional methods have generalization capabilities and robustness for different types of CAPTCHA. The stickiness is poor. As a kind of deep neural network, convolutional neural network has shown excellent performance in the field of image recognition, and it is much better than traditional machine learning methods. Compared with traditional methods, the main advantage of CNN lies in the convolutional layer in which the extracted image features have strong expressive ability, avoiding the problems of data pre-processing and artificial design features in traditional recognition technology. Although CNN has achieved certain results, the recognition effect of complex CAPTCHA is insufficient

Dataset

The dataset contains CAPTCHA images. The images are 5 letter words, and have noise applied (blur and a line). They are of size 200 x 50. The file name is same as the image letters.
Link for the dataset: https://www.kaggle.com/fournierp/captcha-version-2-images

image

Image Pre-Processing

Three transformations have been applied to the data:

  1. Adaptive Thresholding
  2. Morphological transformations
  3. Gaussian blurring

Adaptive Thresholding

Thresholding is the process of converting a grayscale image to a binary image (an image that contains only black and white pixels). This process is explained in the steps below: • A threshold value is determined according to the requirements (Say 128). • The pixels of the grayscale image with values greater than the threshold (>128) are replaced with pixels of maximum pixel value(255). • The pixels of the grayscale image with values lesser than the threshold (<128) are replaced with pixels of minimum pixel value(0). But this method doesn’t perform well on all images, especially when the image has different lighting conditions in different areas. In such cases, we go for adaptive thresholding. In adaptive thresholding the threshold value for each pixel is determined individually based on a small region around it. Thus we get different thresholds for different regions of the image and so this method performs well on images with varying illumination.

The steps involved in calculating the pixel value for each of the pixels in the thresholded image are as follows: • The threshold value T(x,y) is calculated by taking the mean of the blockSize×blockSize neighborhood of (x,y) and subtracting it by C (Constant subtracted from the mean or weighted mean). • Then depending on the threshold type passed, either one of the following operations in the below image is performed:

image

OpenCV provides us the adaptive threshold function to perform adaptive thresholding : Thres_img=cv.adaptiveThreshold ( src, maxValue, adaptiveMethod, thresholdType, blockSize, C) Image after applying adaptive thresholding :

image

Morphological Transformations

Morphological transformations are some simple operations based on the image shape. It is normally performed on binary images. Two basic morphological operators are Erosion and Dilation. Then its variant forms like Opening, Closing, Gradient etc also comes into play. For this project I have used its variant form closing, closing is a dilation followed by an erosion. As the name suggests, a closing is used to close holes inside of objects or for connecting components together. An erosion in an image “erodes” the foreground object and makes it smaller. A foreground pixel in the input image will be kept only if all pixels inside the structuring element are > 0. Otherwise, the pixels are set to 0 (i.e., background). Erosion is useful for removing small blobs in an image or disconnecting two connected objects. The opposite of an erosion is a dilation. Just like an erosion will eat away at the foreground pixels, a dilation will grow the foreground pixels. Dilations increase the size of foreground objects and are especially useful for joining broken parts of an image together. Performing the closing operation is again accomplished by making a call to cv2.morphologyEx, but this time we are going to indicate that our morphological operation is a closing by specifying the cv2.MORPH_CLOSE. Image after applying morphological transformation:

image

Gaussian Blurring

Gaussian smoothing is used to remove noise that approximately follows a Gaussian distribution. The end result is that our image is less blurred, but more “naturally blurred,” than using the average in average blurring. Furthermore, based on this weighting we’ll be able to preserve more of the edges in our image as compared to average smoothing. Gaussian blurring is similar to average blurring, but instead of using a simple mean, we are now using a weighted mean, where neighbourhood pixels that are closer to the central pixel contribute more “weight” to the average. Gaussian smoothing uses a kernel of M X N, where both M and N are odd integers. Image after applying Gaussian blurring:

image

After applying all these image pre-processing techniques, images have been converted into n-dimension array

image

Further 2 more transformations have been applied on this n-dimensional array. The pixel values initially range from 0-255. They are first brought to 0-1 range by dividing all pixel values by 255. Then, they are normalized. Then, the data is shuffled and splitted into training and validation sets. Since the number of samples is not big enough and in deep learning we need large amounts of data and in some cases it is not feasible to collect thousands or millions of images, so data augmentation comes to the rescue. Data Augmentation is a technique that can be used to artificially expand the size of a training set by creating modified data from the existing one. It is a good practice to use data augmentation if you want to prevent overfitting, or the initial dataset is too small to train on, or even if you want to squeeze better performance from your model. In general, data augmentation is frequently used when building a deep learning model. To augment images when using Keras as our deep learning framework we can use ImageDataGenerator (tf.keras.preprocessing.image.ImageDataGenerator) that generates batches of tensor images with real-time data augmentation.

image

image

Testing

A helper function has been made to test the model on test data in which image pre-processing and transformations have been applied to get the final output

image

Result

The model achieves:

  1. Accuracy = 89.13%
  2. Precision = 91%
  3. Recall = 90%
  4. F1-score= 90%

Below is the full report:

image

Owner
Mohit Kaushik
Mohit Kaushik
Run tesseract with the tesserocr bindings with @OCR-D's interfaces

ocrd_tesserocr Crop, deskew, segment into regions / tables / lines / words, or recognize with tesserocr Introduction This package offers OCR-D complia

OCR-D 38 Oct 14, 2022
A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

Ed Medvedev 933 Dec 29, 2022
Python tool that takes the OCR.space JSON output as input and draws a text overlay on top of the image.

OCR.space OCR Result Checker = Draw OCR overlay on top of image Python tool that takes the OCR.space JSON output as input, and draws an overlay on to

a9t9 4 Oct 18, 2022
text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

Shaohui Ruan 3.3k Dec 30, 2022
SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition

SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition PDF Abstract Explainable artificial intelligence has been gaining attention

87 Dec 26, 2022
Code release for Hu et al., Learning to Segment Every Thing. in CVPR, 2018.

Learning to Segment Every Thing This repository contains the code for the following paper: R. Hu, P. Dollár, K. He, T. Darrell, R. Girshick, Learning

Ronghang Hu 417 Oct 03, 2022
This repo contains several opencv projects done while learning opencv in python.

opencv-projects-python This repo contains both several opencv projects done while learning opencv by python and opencv learning resources [Basic conce

Fatin Shadab 2 Nov 03, 2022
This is the implementation of the paper "Gated Recurrent Convolution Neural Network for OCR"

Gated Recurrent Convolution Neural Network for OCR This project is an implementation of the GRCNN for OCR. For details, please refer to the paper: htt

90 Dec 22, 2022
This repository lets you train neural networks models for performing end-to-end full-page handwriting recognition using the Apache MXNet deep learning frameworks on the IAM Dataset.

Handwritten Text Recognition (OCR) with MXNet Gluon These notebooks have been created by Jonathan Chung, as part of his internship as Applied Scientis

Amazon Web Services - Labs 422 Jan 03, 2023
Natural language detection

Detect the language of text. What’s so cool about franc? franc can support more languages(†) than any other library franc is packaged with support for

Titus 3.8k Jan 02, 2023
Responsive Doc. scanner using U^2-Net, Textcleaner and Tesseract

Responsive Doc. scanner using U^2-Net, Textcleaner and Tesseract Toolset U^2-Net is used for background removal Textcleaner is used for image cleaning

3 Jul 13, 2022
Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Handwritten Text Recognition with TensorFlow Update 2021: more robust model, faster dataloader, word beam search decoder also available for Windows Up

Harald Scheidl 1.5k Jan 07, 2023
Automatically resolve RidderMaster based on TensorFlow & OpenCV

AutoRiddleMaster Automatically resolve RidderMaster based on TensorFlow & OpenCV 基于 TensorFlow 和 OpenCV 实现的全自动化解御迷士小马谜题 Demo How to use Deploy the ser

神龙章轩 5 Nov 19, 2021
Bu uygulamada Python ve Opencv kullanarak bilgisayar kamerasından yüz tespiti yapıyoruz.

opencv_yuz_bulma Bu uygulamada Python ve Opencv kullanarak bilgisayar kamerasından yüz tespiti yapıyoruz. Bilgisarın kendi kamerasını kullanmak için;

Ahmet Haydar Ornek 6 Apr 16, 2022
Crop regions in napari manually

napari-crop Crop regions in napari manually Usage Create a new shapes layer to annotate the region you would like to crop: Use the rectangle tool to a

Robert Haase 4 Sep 29, 2022
The papers published in top-tier AI conferences in recent years.

AI-conference-papers The papers published in top-tier AI conferences in recent years. Paper table AAAI ICLR CVPR ICML ICCV ECCV NIPS 2019 ✔️ ✔️ ✔️ ✔️

Jinbae Park 6 Dec 09, 2022
Lightning Fast Language Prediction 🚀

whatthelang Lightning Fast Language Prediction 🚀 Dependencies The dependencies can be installed using the requirements.txt file: $ pip install -r req

Indix 152 Oct 16, 2022
Some Boring Research About Products Recognition 、Duplicate Img Detection、Img Stitch、OCR

Products Recognition 介绍 商品识别,围绕在复杂的商场零售场景中,识别出货架图像中的商品信息。主要组成部分: 重复图像检测。【更新进度 4/10】 图像拼接。【更新进度 0/10】 目标检测。【更新进度 0/10】 商品识别。【更新进度 1/10】 OCR。【更新进度 1/10】

zhenjieWang 18 Jan 27, 2022
learn how to use Gesture Control to change the volume of a computer

Volume-Control-using-gesture In this project we are going to learn how to use Gesture Control to change the volume of a computer. We first look into h

Diwas Pandey 49 Sep 22, 2022
Comparison-of-OCR (KerasOCR, PyTesseract,EasyOCR)

Optical Character Recognition OCR (Optical Character Recognition) is a technology that enables the conversion of document types such as scanned paper

21 Dec 25, 2022