Optical character recognition for Japanese text, with the main focus being Japanese manga

Overview

Manga OCR

Optical character recognition for Japanese text, with the main focus being Japanese manga. It uses a custom end-to-end model built with Transformers' Vision Encoder Decoder framework.

Manga OCR can be used as a general purpose printed Japanese OCR, but its main goal was to provide a high quality text recognition, robust against various scenarios specific to manga:

  • both vertical and horizontal text
  • text with furigana
  • text overlaid on images
  • wide variety of fonts and font styles
  • low quality images

Unlike many OCR models, Manga OCR supports recognizing multi-line text in a single forward pass, so that text bubbles found in manga can be processed at once, without splitting them into lines.

See also:

  • Poricom, a GUI reader, which uses manga-ocr
  • mokuro, a tool, which uses manga-ocr to generate an HTML overlay for manga
  • Xelieu's guide, a comprehensive guide on setting up a reading and mining workflow with manga-ocr/mokuro (and many other useful tips)
  • Development code, including code for training and synthetic data generation: link
  • Description of synthetic data generation pipeline + examples of generated images: link

Installation

You need Python 3.6, 3.7, 3.8 or 3.9. Unfortunately, PyTorch does not support Python 3.10 yet.

Some users have reported problems with Python installed from Microsoft Store. If you see an error: ImportError: DLL load failed while importing fugashi: The specified module could not be found., try installing Python from the official site.

If you want to run with GPU, install PyTorch as described here, otherwise this step can be skipped.

Run in command line:

pip3 install manga-ocr

Usage

Python API

from manga_ocr import MangaOcr

mocr = MangaOcr()
text = mocr('/path/to/img')

or

import PIL.Image

from manga_ocr import MangaOcr

mocr = MangaOcr()
img = PIL.Image.open('/path/to/img')
text = mocr(img)

Running in the background

Manga OCR can run in the background and process new images as they appear.

You might use a tool like ShareX to manually capture a region of the screen and let the OCR read it either from the system clipboard, or a specified directory. By default, Manga OCR will write recognized text to clipboard, from which it can be read by a dictionary like Yomichan. Reading images from clipboard works only on Windows and macOS, on Linux you should read from a directory instead.

Your full setup for reading manga in Japanese with a dictionary might look like this:

capture region with ShareX -> write image to clipboard -> Manga OCR -> write text to clipboard -> Yomichan

manga_ocr_demo.mp4
  • To read images from clipboard and write recognized texts to clipboard, run in command line:
    manga_ocr
    
  • To read images from ShareX's screenshot folder, run in command line:
    manga_ocr "/path/to/sharex/screenshot/folder"
    

Note that when running in the clipboard scanning mode, any image that you copy to clipboard will be processed by OCR and replaced by recognized text. If you want to be able to copy and paste images as usual, you should use the folder scanning mode instead and define a separate task in ShareX just for OCR, which saves screenshots to some folder without copying them to clipboard.

When running for the first time, downloading the model (~400 MB) might take a few minutes. The OCR is ready to use after OCR ready message appears in the logs.

  • To see other options, run in command line:
    manga_ocr --help
    

If manga_ocr doesn't work, you might also try replacing it with python -m manga_ocr.

Usage tips

  • OCR supports multi-line text, but the longer the text, the more likely some errors are to occur. If the recognition failed for some part of a longer text, you might try to run it on a smaller portion of the image.
  • The model was trained specifically to handle manga well, but should do a decent job on other types of printed text, such as novels or video games. It probably won't be able to handle handwritten text though.
  • The model always attempts to recognize some text on the image, even if there is none. Because it uses a transformer decoder (and therefore has some understanding of the Japanese language), it might even "dream up" some realistically looking sentences! This shouldn't be a problem for most use cases, but it might get improved in the next version.

Examples

Here are some cherry-picked examples showing the capability of the model.

image Manga OCR result
素直にあやまるしか
立川で見た〝穴〟の下の巨大な眼は:
実戦剣術も一流です
第30話重苦しい闇の奥で静かに呼吸づきながら
よかったじゃないわよ!何逃げてるのよ!!早くあいつを退治してよ!
ぎゃっ
ピンポーーン
LINK!私達7人の力でガノンの塔の結界をやぶります
ファイアパンチ
少し黙っている
わかるかな〜?
警察にも先生にも町中の人達に!!

Contact

For any inquiries, please feel free to contact me at [email protected]

Acknowledgments

This project was done with the usage of:

Owner
Maciej Budyś
Maciej Budyś
OCR, Object Detection, Number Plate, Real Time

README.md PrePareded anaconda env requirements.txt clova AI → deep text recognition → trained weights (ex, .pth) wpod-net weights (ex, .h5 , .json) ht

Kaven Lee 7 Dec 06, 2022
FOTS Pytorch Implementation

News!!! Recognition branch now is added into model. The whole project has beed optimized and refactored. ICDAR Dataset SynthText 800K Dataset detectio

Ning Lu 599 Dec 19, 2022
Zoom , GoogleMeets에서 Vtuber 데뷔하기

EasyVtuber Facial landmark와 GAN을 이용한 Character Face Generation Google Meets, Zoom 등에서 자신만의 웹툰, 만화 캐릭터로 대화해보세요! 악세사리는 어느정도 추가해도 잘 작동해요! 안타깝게도 RTX 2070

Gunwoo Han 140 Dec 23, 2022
Rest API Written In Python To Classify NSFW Images.

✨ NSFW Classifier API ✨ Rest API Written In Python To Classify NSFW Images. Fastest Solution If you don't want to selfhost it, there's already an inst

Akshay Rajput 23 Dec 30, 2022
learn how to use Gesture Control to change the volume of a computer

Volume-Control-using-gesture In this project we are going to learn how to use Gesture Control to change the volume of a computer. We first look into h

Diwas Pandey 49 Sep 22, 2022
Source Code for AAAI 2022 paper "Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching"

Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching This repository is an official implementation of

HKUST-KnowComp 13 Sep 08, 2022
An advanced 2D image manipulation with features such as edge detection and image segmentation built using OpenCV

OpenCV-ToothPaint3-Advanced-Digital-Image-Editor This application named ‘Tooth Paint’ version TP_2020.3 (64-bit) or version 3 was developed within a w

JunHong 1 Nov 05, 2021
PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application in real time.

PyNeuro PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application

Zach Wang 45 Dec 30, 2022
Localization of thoracic abnormalities model based on VinBigData (top 1%)

Repository contains the code for 2nd place solution of VinBigData Chest X-ray Abnormalities Detection competition. The goal of competition was to auto

33 May 24, 2022
A Python wrapper for Google Tesseract

Python Tesseract Python-tesseract is an optical character recognition (OCR) tool for python. That is, it will recognize and "read" the text embedded i

Matthias A Lee 4.6k Jan 06, 2023
Um simples projeto para fazer o reconhecimento do captcha usado pelo jogo bombcrypto

CaptchaSolver - LEIA ISSO 😓 Para iniciar o codigo: pip install -r requirements.txt python captcha_solver.py Se você deseja pegar ver o resultado das

Kawanderson 50 Mar 21, 2022
Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

This is the official implementation of "Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation". For more details, please

Pengyuan Lyu 309 Dec 06, 2022
MONAI Label is a server-client system that facilitates interactive medical image annotation by using AI.

MONAI Label is a server-client system that facilitates interactive medical image annotation by using AI. It is an open-source and easy-to-install ecosystem that can run locally on a machine with one

Project MONAI 344 Dec 23, 2022
Text modding tools for FF7R (Final Fantasy VII Remake)

FF7R_text_mod_tools Subtitle modding tools for FF7R (Final Fantasy VII Remake) There are 3 tools I made. make_dualsub_mod.exe: Merges (or swaps) subti

10 Dec 19, 2022
A Python script to capture images from multiple webcams at once and save them into your local machine

Capturing multiple images at once from Webcam Using OpenCV Capture multiple image by accessing the webcam of your system and save it to your machine.

Fazal ur Rehman 2 Apr 16, 2022
[ICCV, 2021] Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks

Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks This is an official PyTorch code repository of the paper "Cloud Transformers:

Visual Understanding Lab @ Samsung AI Center Moscow 27 Dec 15, 2022
Binarize document images

Binarization Binarization for document images Examples Introduction This tool performs document image binarization (i.e. transform colour/grayscale to

QURATOR-SPK 48 Jan 02, 2023
QED-C: The Quantum Economic Development Consortium provides these computer programs and software for use in the fields of quantum science and engineering.

Application-Oriented Performance Benchmarks for Quantum Computing This repository contains a collection of prototypical application- or algorithm-cent

SRI International 67 Nov 30, 2022
Convert PDF/Image to TXT using EasyOcr - the best OCR engine available!

PDFImage2TXT - DOWNLOAD INSTALLER HERE What can you do with it? Convert scanned PDFs to TXT. Convert scanned Documents to TXT. No coding required!! In

Hans Alemão 2 Feb 22, 2022
This is a passport scanning web service to help you scan, identify and validate your passport created with a simple and flexible design and ready to be integrated right into your system!

Passport-Recogniton-System This is a passport scanning web service to help you scan, identify and validate your passport created with a simple and fle

Mo'men Ashraf Muhamed 7 Jan 04, 2023