ARU-Net - Deep Learning Chinese Word Segment

Overview

ARU-Net: A Neural Pixel Labeler for Layout Analysis of Historical Documents

Contents

Introduction

This is the Tensorflow code corresponding to A Two-Stage Method for Text Line Detection in Historical Documents . This repo contains the neural pixel labeling part described in the paper. It contains the so-called ARU-Net (among others) which is basically an extended version of the well known U-Net [2]. Besides the model and the basic workflow to train and test models, different data augmentation strategies are implemented to reduce the amound of training data needed. The repo's features are summarized below:

  • Inference Demo
    • Trained and freezed tensorflow graph included
    • Easy to reuse for own inference tests
  • Workflow
    • Full training workflow to parametrize and train your own models
    • Contains different models, data augmentation strategies, loss functions
    • Training on specific GPU, this enables the training of several models on a multi GPU system in parallel
    • Easy validation for trained model either using classical or ema-shadow weights

Please cite [1] if you find this repo useful and/or use this software for own work.

Installation

  1. Use python 2.7
  2. Any version of tensorflow version > 1.0 should be ok.
  3. Python packages: matplotlib (>=1.3.1), pillow (>=2.1.0), scipy (>=1.0.0), scikit-image (>=0.13.1), click (>=5.x)
  4. Clone the Repo
  5. Done

Demo

To run the demo follow:

  1. Open a shell
  2. Make sure Tensorflow is available, e.g., go to docker environment, activate conda, ...
  3. Navigate to the repo folder YOUR_PATH/ARU-Net/
  4. Run:
python run_demo_inference.py 

The demo will load a trained model and perform inference for five sample images of the cBad test set [3], [4]. The network was trained to predict the position of baselines and separators for the begining and end of each text line. After running the python script you should see a matplot window. To go to the next image just close it.

Example

The example images are sampled from the cBad test set [3], [4]. One image along with its results are shown below.

image_1 image_2 image_3

Training

This section describes step-by-step the procedure to train your own model.

Train data:

The following describes how the training data should look like:

  • The images along with its pixel ground truth have to be in the same folder
  • For each image: X.jpg, there have to be images named X_GT0.jpg, X_GT1.jpg, X_GT2.jpg, ... (for each channel to be predicted one GT image)
  • Each ground truth image is binary and contains ones at positions where the corresponding class is present and zeros otherwise (see demo_images/demo_traindata for a sample)
  • Generate a list containing row-wise the absolute pathes to the images (just the document images not the GT ones)

Val data:

The following describes how the validation data should look like:

Train the model:

The following describes how to train a model:

  • Have a look at the pix_lab/main/train_aru.py script
  • Parametrize it like you wish (have a look at the data_provider, cost and optimizer scripts to see all parameters)
  • Setting the correct paths, adapting the number of output classes and using the default parametrization should work fine for a first training
  • Run:
python -u pix_lab/main/train_aru.py &> info.log 

Validate the model:

The following describes how to validate a trained model:

  • Train and val losses are printed in info.log
  • To validate the checkpoints using the classical weights as well as its ema-shadows, adapt and run:
pix_lab/main/validate_ckpt.py

Comments

If you are interested in a related problem, this repo could maybe help you as well. The ARU-Net can be used for each pixel labeling task, besides the baseline detection task, it can be easily used for, e.g., binarization, page segmentation, ... purposes.

References

Please cite [1] if using this code.

A Two-Stage Method for Text Line Detection in Historical Documents

[1] T. Grüning, G. Leifert, T. Strauß, R. Labahn, A Two-Stage Method for Text Line Detection in Historical Documents

@article{Gruning2018,
arxivId = {1802.03345},
author = {Gr{\"{u}}ning, Tobias and Leifert, Gundram and Strau{\ss}, Tobias and Labahn, Roger},
title = {{A Two-Stage Method for Text Line Detection in Historical Documents}},
url = {http://arxiv.org/abs/1802.03345},
year = {2018}
}

U-Net: Convolutional Networks for Biomedical Image Segmentation

[2] O. Ronneberger, P, Fischer, T, Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation

@article{Ronneberger2015,
arxivId = {1505.04597},
author = {Ronneberger, Olaf and Fischer, Philipp and Brox, Thomas},
journal = {Miccai},
pages = {234--241},
title = {{U-Net: Convolutional Networks for Biomedical Image Segmentation}},
year = {2015}
}

READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

[3] T. Grüning, R. Labahn, M. Diem, F. Kleber, S. Fiel, READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

@article{Gruning2017,
arxivId = {1705.03311},
author = {Gr{\"{u}}ning, Tobias and Labahn, Roger and Diem, Markus and Kleber, Florian and Fiel, Stefan},
title = {{READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents}},
url = {http://arxiv.org/abs/1705.03311},
year = {2017}
}

A Robust and Binarization-Free Approach for Text Line Detection in Historical Documents

[4] M. Diem, F. Kleber, S. Fiel, T. Grüning, B. Gatos, ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)

@misc{Diem2017,
author = {Diem, Markus and Kleber, Florian and Fiel, Stefan and Gr{\"{u}}ning, Tobias and Gatos, Basilis},
doi = {10.5281/zenodo.257972},
title = {ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)},
year = {2017}
}
This is a project to detect gestures to zoom in or out, using the real-time distance between the index finger and the thumb. It's based on OpenCV and Mediapipe.

Pinch-zoom This is a python project based on real-time hand-gesture detection, to zoom in or out, using the distance between the index finger and the

Harshit Bhalla 6 Jul 11, 2022
Framework for the Complete Gaze Tracking Pipeline

Framework for the Complete Gaze Tracking Pipeline The figure below shows a general representation of the camera-to-screen gaze tracking pipeline [1].

Pascal 20 Jan 06, 2023
Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

Christian Bartz 496 Jan 05, 2023
This project proposes a camera vision based cursor control system, using hand moment captured from a webcam through a landmarks of hand by using Mideapipe module

This project proposes a camera vision based cursor control system, using hand moment captured from a webcam through a landmarks of hand by using Mideapipe module

Chandru 2 Feb 20, 2022
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted. ocrmypdf # it's a scriptable c

jbarlow83 7.9k Jan 03, 2023
CNN+Attention+Seq2Seq

Attention_OCR CNN+Attention+Seq2Seq The model and its tensor transformation are shown in the figure below It is necessary ch_ train and ch_ test the p

Tsukinousag1 2 Jul 14, 2022
Generate text images for training deep learning ocr model

New version release:https://github.com/oh-my-ocr/text_renderer Text Renderer Generate text images for training deep learning OCR model (e.g. CRNN). Su

Qing 1.2k Jan 04, 2023
The first open-source library that detects the font of a text in a image.

Typefont Typefont is an experimental library that detects the font of a text in a image. Usage Import the main function and invoke it like in the foll

Vasile Pește 1.6k Feb 24, 2022
Detecting Text in Natural Image with Connectionist Text Proposal Network (ECCV'16)

Detecting Text in Natural Image with Connectionist Text Proposal Network The codes are used for implementing CTPN for scene text detection, described

Tian Zhi 1.3k Dec 22, 2022
[EMNLP 2021] Improving and Simplifying Pattern Exploiting Training

ADAPET This repository contains the official code for the paper: "Improving and Simplifying Pattern Exploiting Training". The model improves and simpl

Rakesh R Menon 138 Dec 26, 2022
✌️Using this you can control your PC/Laptop volume by Hand Gestures created with Python.

Hand Gesture Volume Controller ✋ Hand recognition 👆 Finger recognition 🔊 you can decrease and increase volume Demo Code Firstly I have created a Mod

Abbas Ataei 19 Nov 17, 2022
A general list of resources to image text localization and recognition 场景文本位置感知与识别的论文资源与实现合集 シーンテキストの位置認識と識別のための論文リソースの要約

Scene Text Localization & Recognition Resources Read this institute-wise: English, 简体中文. Read this year-wise: English, 简体中文. Tags: [STL] (Scene Text L

Karl Lok (Zhaokai Luo) 901 Dec 11, 2022
Isearch (OSINT) 🔎 Face recognition reverse image search on Instagram profile feed photos.

isearch is an OSINT tool on Instagram. Offers a face recognition reverse image search on Instagram profile feed photos.

Malek salem 20 Oct 25, 2022
Code release for our paper, "SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo"

SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo Thomas Kollar, Michael Laskey, Kevin Stone, Brijen Thananjeyan

68 Dec 14, 2022
Official code for ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

ROCA: Robust CAD Model Alignment and Retrieval from a Single Image (CVPR 2022) Code release of our paper ROCA. Check out our video, paper, and website

123 Dec 25, 2022
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework (CVPR 2021 oral)

MTLFace This repository contains the PyTorch implementation and the dataset of the paper: When Age-Invariant Face Recognition Meets Face Age Synthesis

Hzzone 120 Jan 05, 2023
Responsive Doc. scanner using U^2-Net, Textcleaner and Tesseract

Responsive Doc. scanner using U^2-Net, Textcleaner and Tesseract Toolset U^2-Net is used for background removal Textcleaner is used for image cleaning

3 Jul 13, 2022
BNF Globalization Code (CVPR 2016)

Boundary Neural Fields Globalization This is the code for Boundary Neural Fields globalization method. The technical report of the method can be found

25 Apr 15, 2022
OCR of Chicago 1909 Renumbering Plan

Requirements: Python 3 (probably at least 3.4) pipenv (pip3 install pipenv) tesseract (brew install tesseract, at least if you have a mac and homebrew

ted whalen 2 Nov 21, 2021
Document Layout Analysis Projects

Layout_Analysis Introduction This is an implementation of RLSA and X-Y Cut with OpenCV Dependencies OpenCV 3.0+ How to use Compile with g++ : g++ -std

22 Dec 08, 2022