Captcha Recognition

Overview

Captcha Recognition

Problem Definition

CAPTCHA (Completely Automated Public Turing Test to tell Computers and Humans Apart) is an automated test created to prevent websites from being repeatedly accessed by an automatic program in a short period of time and wasting network resources. Among all the CAPTCHAs, commonly used types contain low resolution, deformed characters with character adhesions and background noise, which the user must read and type correctly into an input box. This is a relatively simple task for humans, taking an average of 10 seconds to solve, but it presents a difficulty for computers, because such noise makes it difficult for a program to differentiate the characters from them. The main objective of this project is to recognize the target numbers in the captcha images correctly.

The mainstream CAPTCHA is based on visual representation, including images such as letters and text. Traditional CAPTCHA recognition includes three steps: image pre-processing, character segmentation, and character recognition. Traditional methods have generalization capabilities and robustness for different types of CAPTCHA. The stickiness is poor. As a kind of deep neural network, convolutional neural network has shown excellent performance in the field of image recognition, and it is much better than traditional machine learning methods. Compared with traditional methods, the main advantage of CNN lies in the convolutional layer in which the extracted image features have strong expressive ability, avoiding the problems of data pre-processing and artificial design features in traditional recognition technology. Although CNN has achieved certain results, the recognition effect of complex CAPTCHA is insufficient

Dataset

The dataset contains CAPTCHA images. The images are 5 letter words, and have noise applied (blur and a line). They are of size 200 x 50. The file name is same as the image letters.
Link for the dataset: https://www.kaggle.com/fournierp/captcha-version-2-images

image

Image Pre-Processing

Three transformations have been applied to the data:

  1. Adaptive Thresholding
  2. Morphological transformations
  3. Gaussian blurring

Adaptive Thresholding

Thresholding is the process of converting a grayscale image to a binary image (an image that contains only black and white pixels). This process is explained in the steps below: • A threshold value is determined according to the requirements (Say 128). • The pixels of the grayscale image with values greater than the threshold (>128) are replaced with pixels of maximum pixel value(255). • The pixels of the grayscale image with values lesser than the threshold (<128) are replaced with pixels of minimum pixel value(0). But this method doesn’t perform well on all images, especially when the image has different lighting conditions in different areas. In such cases, we go for adaptive thresholding. In adaptive thresholding the threshold value for each pixel is determined individually based on a small region around it. Thus we get different thresholds for different regions of the image and so this method performs well on images with varying illumination.

The steps involved in calculating the pixel value for each of the pixels in the thresholded image are as follows: • The threshold value T(x,y) is calculated by taking the mean of the blockSize×blockSize neighborhood of (x,y) and subtracting it by C (Constant subtracted from the mean or weighted mean). • Then depending on the threshold type passed, either one of the following operations in the below image is performed:

image

OpenCV provides us the adaptive threshold function to perform adaptive thresholding : Thres_img=cv.adaptiveThreshold ( src, maxValue, adaptiveMethod, thresholdType, blockSize, C) Image after applying adaptive thresholding :

image

Morphological Transformations

Morphological transformations are some simple operations based on the image shape. It is normally performed on binary images. Two basic morphological operators are Erosion and Dilation. Then its variant forms like Opening, Closing, Gradient etc also comes into play. For this project I have used its variant form closing, closing is a dilation followed by an erosion. As the name suggests, a closing is used to close holes inside of objects or for connecting components together. An erosion in an image “erodes” the foreground object and makes it smaller. A foreground pixel in the input image will be kept only if all pixels inside the structuring element are > 0. Otherwise, the pixels are set to 0 (i.e., background). Erosion is useful for removing small blobs in an image or disconnecting two connected objects. The opposite of an erosion is a dilation. Just like an erosion will eat away at the foreground pixels, a dilation will grow the foreground pixels. Dilations increase the size of foreground objects and are especially useful for joining broken parts of an image together. Performing the closing operation is again accomplished by making a call to cv2.morphologyEx, but this time we are going to indicate that our morphological operation is a closing by specifying the cv2.MORPH_CLOSE. Image after applying morphological transformation:

image

Gaussian Blurring

Gaussian smoothing is used to remove noise that approximately follows a Gaussian distribution. The end result is that our image is less blurred, but more “naturally blurred,” than using the average in average blurring. Furthermore, based on this weighting we’ll be able to preserve more of the edges in our image as compared to average smoothing. Gaussian blurring is similar to average blurring, but instead of using a simple mean, we are now using a weighted mean, where neighbourhood pixels that are closer to the central pixel contribute more “weight” to the average. Gaussian smoothing uses a kernel of M X N, where both M and N are odd integers. Image after applying Gaussian blurring:

image

After applying all these image pre-processing techniques, images have been converted into n-dimension array

image

Further 2 more transformations have been applied on this n-dimensional array. The pixel values initially range from 0-255. They are first brought to 0-1 range by dividing all pixel values by 255. Then, they are normalized. Then, the data is shuffled and splitted into training and validation sets. Since the number of samples is not big enough and in deep learning we need large amounts of data and in some cases it is not feasible to collect thousands or millions of images, so data augmentation comes to the rescue. Data Augmentation is a technique that can be used to artificially expand the size of a training set by creating modified data from the existing one. It is a good practice to use data augmentation if you want to prevent overfitting, or the initial dataset is too small to train on, or even if you want to squeeze better performance from your model. In general, data augmentation is frequently used when building a deep learning model. To augment images when using Keras as our deep learning framework we can use ImageDataGenerator (tf.keras.preprocessing.image.ImageDataGenerator) that generates batches of tensor images with real-time data augmentation.

image

image

Testing

A helper function has been made to test the model on test data in which image pre-processing and transformations have been applied to get the final output

image

Result

The model achieves:

  1. Accuracy = 89.13%
  2. Precision = 91%
  3. Recall = 90%
  4. F1-score= 90%

Below is the full report:

image

Owner
Mohit Kaushik
Mohit Kaushik
QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021)

QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021) Yuanming Hu, Jiafeng Liu, Xuanda Yang, Mingkuan Xu, Ye Kuang, Weiwei Xu, Qiang Dai, W

Taichi Developers 119 Dec 02, 2022
SRA's seminar on Introduction to Computer Vision Fundamentals

Introduction to Computer Vision This repository includes basics to : Python Numpy: A python library Git Computer Vision. The aim of this repository is

Society of Robotics and Automation 147 Dec 04, 2022
BoxToolBox is a simple python application built around the openCV library

BoxToolBox is a simple python application built around the openCV library. It is not a full featured application to guide you through the w

František Horínek 1 Nov 12, 2021
Face_mosaic - Mosaic blur processing is applied to multiple faces appearing in the video

動機 face_recognitionを使用して得られる顔座標は長方形であり、この座標をそのまま用いてぼかし処理を行った場合得られる画像は醜い。 それに対してモ

Yoshitsugu Kesamaru 6 Feb 03, 2022
An easy to use an (hopefully useful) captcha solution for pyTelegramBotAPI

pyTelegramBotCAPTCHA An easy to use and (hopefully useful) image CAPTCHA soltion for pyTelegramBotAPI. Installation: pip install pyTelegramBotCAPTCHA

29 Dec 26, 2022
A curated list of awesome synthetic data for text location and recognition

awesome-SynthText A curated list of awesome synthetic data for text location and recognition and OCR datasets. Text location SynthText SynthText_Chine

Tianzhong 283 Jan 05, 2023
Controlling Volume by Hand Gestures

This program allows the user to control the volume of their device with specific hand gestures involving their thumb and index finger!

Riddhi Bajaj 1 Nov 11, 2021
How to detect objects in real time by using Jupyter Notebook and Neural Networks , by using Yolo3

Real Time Object Recognition From your Screen Desktop . In this post, I will explain how to build a simply program to detect objects from you desktop

Ruslan Magana Vsevolodovna 2 Sep 28, 2022
Image augmentation library in Python for machine learning.

Augmentor is an image augmentation library in Python for machine learning. It aims to be a standalone library that is platform and framework independe

Marcus D. Bloice 4.8k Jan 04, 2023
Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition.

Convolutional Recurrent Neural Network This software implements the Convolutional Recurrent Neural Network (CRNN), a combination of CNN, RNN and CTC l

Baoguang Shi 2k Dec 31, 2022
OCR-D-compliant page segmentation

ocrd_segment This repository aims to provide a number of OCR-D-compliant processors for layout analysis and evaluation. Installation In your virtual e

OCR-D 59 Sep 10, 2022
POT : Python Optimal Transport

This open source Python library provide several solvers for optimization problems related to Optimal Transport for signal, image processing and machine learning.

Python Optimal Transport 1.7k Jan 04, 2023
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

27 Jan 08, 2023
chineseocr/table_line 表格线检测模型pytorch版

table_line_pytorch chineseocr/table_detct 表格线检测模型table_line pytorch版 原项目github: https://github.com/chineseocr/table-detect 1、模型转换 下载原项目table_detect模型文

1 Oct 21, 2021
The virtual calculator will be above the live streaming from your camera

The virtual calculator is above the live streaming from my camera usb , the program first detect my hand and in each frame calculate the distance between two finger ,if the distance is lower than the

gasbaoui mohammed al amine 5 Jul 01, 2022
Controlling the computer volume with your hands // OpenCV

HandsControll-AI Controlling the computer volume with your hands // OpenCV Step 1 git clone https://github.com/Hayk-21/HandsControll-AI.git pip instal

Hayk 1 Nov 04, 2021
Python-based tools for document analysis and OCR

ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so

OCRopus 3.2k Dec 31, 2022
MeshToGeotiff - A fast Python algorithm to convert a 3D mesh into a GeoTIFF

MeshToGeotiff - A fast Python algorithm to convert a 3D mesh into a GeoTIFF Python class for converting (very fast) 3D Meshes/Surfaces to Raster DEMs

8 Sep 10, 2022
Document manipulation detection with python

image manipulation detection task: -- tianchi function image segmentation salie

JiaKui Hu 3 Aug 22, 2022
原神风花节自动弹琴辅助

GenshinAutoPlayBalladsofBreeze 原神风花节自动弹琴辅助(已适配1920*1080分辨率) 本程序基于opencv图像识别技术,不存在任何封号。 因为正确率取决于你的cpu性能,10900k都不一定全对。 由于图像识别存在误差,根本无法确定出错时间。更不用说被检测到了。

晓轩 20 Oct 27, 2022