An application of high resolution GANs to dewarp images of perturbed documents

Overview

Docuwarp

Codacy Badge Python version

This project is focused on dewarping document images through the usage of pix2pixHD, a GAN that is useful for general image to image translation. The objective is to take images of documents that are warped, folded, crumpled, etc. and convert the image to a "dewarped" state by using pix2pixHD to train and perform inference. All of the model code is borrowed directly from the pix2pixHD official repository.

Some of the intuition behind doing this is inspired by these two papers:

  1. DocUNet: Document Image Unwarping via A Stacked U-Net (Ma et.al)
  2. Document Image Dewarping using Deep Learning (Ramanna et.al)

May 8, 2020 : Important Update

  • This project does not contain a pre-trained model. I currently do not have the resources to train a model on an open source dataset, nor do I have the bandwidth at the moment to do so. If anyone would like to contribute a pretrained model and share their model checkpoints, feel free to do so, I will likely accept any PR trying to do this. Thanks!

Prerequisites

This project requires Python and the following Python libraries installed:

Getting Started

Installation

pip install dominate
  • Clone this repo:
git clone https://github.com/thomasjhuang/deep-learning-for-document-dewarping
cd deep-learning-for-document-dewarping

Training

  • Train the kaggle model with 256x256 crops:
python train.py --name kaggle --label_nc 0 --no_instance --no_flip --netG local --ngf 32 --fineSize 256
  • To view training results, please checkout intermediate results in ./checkpoints/kaggle/web/index.html. If you have tensorflow installed, you can see tensorboard logs in ./checkpoints/kaggle/logs by adding --tf_log to the training scripts.

Training with your own dataset

  • If you want to train with your own dataset, please generate label maps which are one-channel whose pixel values correspond to the object labels (i.e. 0,1,...,N-1, where N is the number of labels). This is because we need to generate one-hot vectors from the label maps. Please also specity --label_nc N during both training and testing.
  • If your input is not a label map, please just specify --label_nc 0 which will directly use the RGB colors as input. The folders should then be named train_A, train_B instead of train_label, train_img, where the goal is to translate images from A to B.
  • If you don't have instance maps or don't want to use them, please specify --no_instance.
  • The default setting for preprocessing is scale_width, which will scale the width of all training images to opt.loadSize (1024) while keeping the aspect ratio. If you want a different setting, please change it by using the --resize_or_crop option. For example, scale_width_and_crop first resizes the image to have width opt.loadSize and then does random cropping of size (opt.fineSize, opt.fineSize). crop skips the resizing step and only performs random cropping. If you don't want any preprocessing, please specify none, which will do nothing other than making sure the image is divisible by 32.

Testing

  • Test the model:
python test.py --name kaggle --label_nc 0 --netG local --ngf 32 --resize_or_crop crop --no_instance --no_flip --fineSize 256

The test results will be saved to a directory here: ./results/kaggle/test_latest/.

Dataset

  • I use the kaggle denoising dirty documents dataset. To train a model on the full dataset, please download it from the official website. After downloading, please put it under the datasets folder with warped images under the directory name train_A and unwarped images under the directory train_B. Your test images are warped images, and should be under the name test_A. Below is an example dataset directory structure.

        .
        ├── ...
        ├── datasets                  
        │   ├── train_A               # warped images
        │   ├── train_B               # unwarped, "ground truth" images
        │   └── test_A                # warped images used for testing
        └── ...
    

Multi-GPU training

  • Train a model using multiple GPUs (bash ./scripts/train_kaggle_256_multigpu.sh):
#!./scripts/train_kaggle_256_multigpu.sh
python train.py --name kaggle_256_multigpu --label_nc 0 --netG local --ngf 32 --resize_or_crop crop --no_instance --no_flip --fineSize 256 --batchSize 32 --gpu_ids 0,1,2,3,4,5,6,7

Training with Automatic Mixed Precision (AMP) for faster speed

  • To train with mixed precision support, please first install apex from: https://github.com/NVIDIA/apex
  • You can then train the model by adding --fp16. For example,
#!./scripts/train_512p_fp16.sh
python -m torch.distributed.launch train.py --name label2city_512p --fp16

In my test case, it trains about 80% faster with AMP on a Volta machine.

More Training/Test Details

  • Flags: see options/train_options.py and options/base_options.py for all the training flags; see options/test_options.py and options/base_options.py for all the test flags.
  • Instance map: we take in both label maps and instance maps as input. If you don't want to use instance maps, please specify the flag --no_instance.
Owner
Thomas Huang
I'm currently a Machine Learning Scientist @alectio. Purdue CS 2019
Thomas Huang
A python program to block out your face

Readme This is a small program I threw together in about 6 hours to block out your face. It probably doesn't work very well, so be warned. By default,

1 Oct 17, 2021
Handwriting Recognition System based on a deep Convolutional Recurrent Neural Network architecture

Handwriting Recognition System This repository is the Tensorflow implementation of the Handwriting Recognition System described in Handwriting Recogni

Edgard Chammas 346 Jan 07, 2023
An OCR evaluation tool

dinglehopper dinglehopper is an OCR evaluation tool and reads ALTO, PAGE and text files. It compares a ground truth (GT) document page with a OCR resu

QURATOR-SPK 40 Dec 20, 2022
Train custom VR face tracking parameters

Pal Buddy Guy: The anipal's best friend This is a small script to improve upon the tracking capabilities of the Vive Pro Eye and facial tracker. You c

7 Dec 12, 2021
Opencv face recognition desktop application

Opencv-Face-Recognition Opencv face recognition desktop application Program developed by Gustavo Wydler Azuaga - 2021-11-19 Screenshots of the program

Gus 1 Nov 19, 2021
A list of hyperspectral image super-solution resources collected by Junjun Jiang

A list of hyperspectral image super-resolution resources collected by Junjun Jiang. If you find that important resources are not included, please feel free to contact me.

Junjun Jiang 301 Jan 05, 2023
Some Boring Research About Products Recognition 、Duplicate Img Detection、Img Stitch、OCR

Products Recognition 介绍 商品识别,围绕在复杂的商场零售场景中,识别出货架图像中的商品信息。主要组成部分: 重复图像检测。【更新进度 4/10】 图像拼接。【更新进度 0/10】 目标检测。【更新进度 0/10】 商品识别。【更新进度 1/10】 OCR。【更新进度 1/10】

zhenjieWang 18 Jan 27, 2022
Go package for OCR (Optical Character Recognition), by using Tesseract C++ library

gosseract OCR Golang OCR package, by using Tesseract C++ library. OCR Server Do you just want OCR server, or see the working example of this package?

Hiromu OCHIAI 1.9k Dec 28, 2022
Line based ATR Engine based on OCRopy

OCR Engine based on OCRopy and Kraken using python3. It is designed to both be easy to use from the command line but also be modular to be integrated

948 Dec 23, 2022
BNF Globalization Code (CVPR 2016)

Boundary Neural Fields Globalization This is the code for Boundary Neural Fields globalization method. The technical report of the method can be found

25 Apr 15, 2022
Markup for note taking

Subtext: markup for note-taking Subtext is a text-based, block-oriented hypertext format. It is designed with note-taking in mind. It has a simple, pe

Gordon Brander 224 Jan 01, 2023
GDB python tool to pretty print and debug c++ xtensor containers

gdb_xt2np GDB python tool to pretty print, examine, and debug c++ Xtensor containers. Xtensor is a c++ library for scientific computing using multidim

Christopher Burke 4 Oct 29, 2021
huoyijie 1.2k Dec 29, 2022
Rotational region detection based on Faster-RCNN.

R2CNN_Faster_RCNN_Tensorflow Abstract This is a tensorflow re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detecti

UCAS-Det 581 Nov 22, 2022
Give a solution to recognize MaoYan font.

猫眼字体识别 该 github repo 在于帮助xjtlu的同学们识别猫眼的扭曲字体。已经打包上传至 pypi ,可以使用 pip 直接安装。 猫眼字体的识别不出来的原理与解决思路在采茶上 使用方法: import MaoYanFontRecognize

Aruix 4 Jun 30, 2022
A novel region proposal network for more general object detection ( including scene text detection ).

DeRPN: Taking a further step toward more general object detection DeRPN is a novel region proposal network which concentrates on improving the adaptiv

Deep Learning and Vision Computing Lab, SCUT 151 Dec 12, 2022
Resizing Canny Countour In Python

Resizing_Canny_Countour Install Visual Studio Code , https://code.visualstudio.com/download Select Python and install with terminal( pip install openc

Walter Ng 1 Nov 07, 2021
🔎 Like Chardet. 🚀 Package for encoding & language detection. Charset detection.

Charset Detection, for Everyone 👋 The Real First Universal Charset Detector A library that helps you read text from an unknown charset encoding. Moti

TAHRI Ahmed R. 332 Dec 31, 2022
A bot that plays TFT using OCR. Keeps track of bench, board, items, and plays the user defined team comp.

NOTES: To ensure best results, make sure you are running this on a computer that has decent specs. 1920x1080 fullscreen is required in League, game mu

francis 125 Dec 30, 2022
Character Segmentation using TensorFlow

Character Segmentation Segment characters and spaces in one text line,from this paper Chinese English mixed Character Segmentation as Semantic Segment

26 Aug 25, 2022