An OCR system for the Arabic language that converts images of typed text to machine-encoded text.

Last update: Jan 05, 2023

Overview

Arabic OCR

This is an OCR system for the Arabic language that converts images of typed text to machine-encoded text.
The system currently supports letters only (29 letters): ا-ى and لا.
It targets a simplified version of the OCR problem: images that contain only Arabic characters (see the dataset link below for sample images).

Setup

Install Python, then run this command:

pip install -r requirements.txt

Run

Put the images in the src/test directory.
Go to the src directory and run the following command:
```
python OCR.py
```
An output folder will be created containing:
- a text folder with one text file per input image.
- a running_time file with the time taken to process each image.
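
The output layout described above could be produced by a loop along the following lines. This is only a sketch and not the code in OCR.py: the function `run_batch`, the `recognize` callback, the `.txt` extension, and the tab-separated format of the timing file are all assumptions made for illustration.

```python
import time
from pathlib import Path
from typing import Callable


def run_batch(image_dir: Path, out_dir: Path,
              recognize: Callable[[Path], str]) -> None:
    """Run `recognize` on every file in image_dir and write, under out_dir,
    a text folder (one file per image) and a running_time file."""
    text_dir = out_dir / "text"
    text_dir.mkdir(parents=True, exist_ok=True)

    timings = []
    for image_path in sorted(image_dir.iterdir()):
        if not image_path.is_file():
            continue
        start = time.perf_counter()
        text = recognize(image_path)  # hypothetical OCR entry point
        elapsed = time.perf_counter() - start

        # One text file per image, named after the image (assumed naming).
        (text_dir / f"{image_path.stem}.txt").write_text(text, encoding="utf-8")
        timings.append(f"{image_path.name}\t{elapsed:.3f}")

    # One line per image: file name and seconds taken (assumed format).
    (out_dir / "running_time.txt").write_text("\n".join(timings), encoding="utf-8")
```

Passing the recognizer in as a callback keeps the timing and file handling separate from the OCR logic, so the same loop can be exercised with a stub function.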

Pipeline

Dataset

Link to dataset of images and the corresponding text: here.
We used 1000 images to generate the character dataset used for training.

Examples

Line Segmentation
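
One common way to split a page into text lines is a horizontal projection profile: rows with no ink separate one line from the next. The sketch below illustrates that idea on a binarized image stored as a list of rows (1 = ink, 0 = background). It is not necessarily the method this project uses, and the function name `segment_lines` is hypothetical.

```python
from typing import List, Tuple


def segment_lines(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) pairs, end exclusive, one per run of
    consecutive rows that contain at least one ink pixel."""
    lines = []
    start = None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                  # first inked row of a new line
        elif not has_ink and start is not None:
            lines.append((start, y))   # blank row closes the current line
            start = None
    if start is not None:              # line that runs to the bottom edge
        lines.append((start, len(binary)))
    return lines


# Tiny made-up page: two text lines separated by one blank row.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
line_spans = segment_lines(page)  # [(1, 3), (4, 5)]
```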

Word Segmentation
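
Within a single line, words can be separated in a similar way using a vertical projection: runs of blank columns wider than some threshold are treated as word gaps, while narrower runs are kept inside a word. The sketch below assumes a binarized line image (1 = ink) and a hypothetical `min_gap` threshold. It is an illustration of the general technique, not the project's actual implementation.

```python
from typing import List, Tuple


def segment_words(line: List[List[int]], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split a binarized text line into (start_col, end_col) spans, end
    exclusive. Blank-column runs narrower than min_gap do not split a word.
    Spans are returned right to left, matching Arabic reading order."""
    width = len(line[0]) if line else 0
    ink_cols = [x for x in range(width) if any(row[x] for row in line)]

    spans: List[List[int]] = []
    for x in ink_cols:
        if spans and x - spans[-1][1] < min_gap:
            spans[-1][1] = x + 1       # gap too small: extend current word
        else:
            spans.append([x, x + 1])   # wide gap (or first column): new word
    return [(start, end) for start, end in reversed(spans)]


# One-row made-up line: a 1-column gap (kept inside a word) and a 3-column gap.
line_img = [[1, 1, 0, 1, 0, 0, 0, 1, 1]]
word_spans = segment_words(line_img, min_gap=2)  # [(7, 9), (0, 4)]
```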

Character Segmentation

Performance

Average accuracy: 95%.
Average time per image: 16 seconds.

NOTE

We achieved these results using only the flattened image as the feature.
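
To make "flattened image as the feature" concrete, the sketch below turns a 2-D grayscale character image into a 1-D feature vector by concatenating its rows. The function name `flatten_image` and the scaling of pixel values to [0, 1] are assumptions for illustration; the project's actual preprocessing may differ.

```python
from typing import List


def flatten_image(image: List[List[int]]) -> List[float]:
    """Flatten a 2-D grayscale image (rows of 0-255 pixel values) into a
    1-D feature vector, row by row, scaled to the range [0, 1]."""
    if not image or any(len(row) != len(image[0]) for row in image):
        raise ValueError("image must be a non-empty rectangular grid")
    return [pixel / 255.0 for row in image for pixel in row]


# Made-up 3x3 character crop: a single vertical stroke.
char_img = [
    [0, 255, 0],
    [0, 255, 0],
    [0, 255, 0],
]
features = flatten_image(char_img)  # 9 values, one per pixel
```

Because the vector length equals width times height, every character crop would need to be resized to the same dimensions before being passed to a classifier.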

Owner

Hussein Youssef
