The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

Last update: Dec 26, 2022

Overview

Good news! Our new work, DocScanner: Robust Document Image Rectification with Progressive Learning, achieves state-of-the-art performance on the DocUNet benchmark dataset.

DocTr

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction
ACM MM 2021 Oral

Questions and discussion are welcome!

Training

For geometric unwarping, we train the GeoTr network on the Doc3D dataset.
For illumination correction, we train the IllTr network on the DRIC dataset.

Inference

Download the pretrained models here and put them in $ROOT/model_pretrained/.
Geometric unwarping:
```
python inference.py
```
Geometric unwarping and illumination rectification:
```
python inference.py --ill_rec True
```
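When running inference from another script (for example, to process several checkouts or to fail early when the weights are missing), the two commands above can be wrapped as in the minimal sketch below. The helper names `pretrained_files`, `build_inference_command`, and `run_inference` are hypothetical and not part of this repository; the sketch only assumes that `inference.py` sits in the repository root and that the pretrained models live in `model_pretrained/` as described above.

```python
import subprocess
from pathlib import Path
from typing import List


def pretrained_files(root: Path) -> List[Path]:
    """Return the files found in <root>/model_pretrained/, sorted by name."""
    model_dir = root / "model_pretrained"
    if not model_dir.is_dir():
        return []
    return sorted(p for p in model_dir.iterdir() if p.is_file())


def build_inference_command(ill_rec: bool = False) -> List[str]:
    """Build the inference command shown above as an argument list."""
    cmd = ["python", "inference.py"]
    if ill_rec:
        # Same flag and value as in the second command above.
        cmd += ["--ill_rec", "True"]
    return cmd


def run_inference(root: Path, ill_rec: bool = False) -> None:
    """Run inference from the repository root (hypothetical wrapper)."""
    if not pretrained_files(root):
        raise FileNotFoundError(
            f"no pretrained models found in {root / 'model_pretrained'}"
        )
    # check=True raises CalledProcessError if inference.py exits with an error.
    subprocess.run(build_inference_command(ill_rec), cwd=root, check=True)
```

For example, `run_inference(Path("."), ill_rec=True)` would run geometric unwarping followed by illumination rectification from the current directory.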

Evaluation

We use the same evaluation code as the DocUNet benchmark dataset, which is based on Matlab 2019a.
Scores can differ between Matlab versions, so compare against results computed with the same Matlab version.
To reproduce the quantitative results reported in the paper, or for further comparison, use the images available here.

Citation

If you find this code useful for your research, please use the following BibTeX entries.

@inproceedings{feng2021doctr,
  title={DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction},
  author={Feng, Hao and Wang, Yuechen and Zhou, Wengang and Deng, Jiajun and Li, Houqiang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={273--281},
  year={2021}
}

@article{feng2021docscanner,
  title={DocScanner: Robust Document Image Rectification with Progressive Learning},
  author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Tian, Qi and Li, Houqiang},
  journal={arXiv preprint arXiv:2110.14968},
  year={2021}
}

Owner

Hao Feng
