PyTorch implementation of CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

Last update: Jul 20, 2022

This is an unofficial implementation of CDistNet.

We have implemented all the modules described in the paper except the TPS (thin-plate spline) transformation in the visual branch. For a TPS implementation, you can refer to ASTER.

Requirements

Python 3.6.8
lmdb==0.98
torch==1.5.1
torchvision==0.6.1
Pillow==6.1.0
opencv-python==4.2.0.32
numpy==1.17.1
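The versions above are exact pins, so it can help to check the active environment before training. The sketch below is not part of the repository. It assumes that the pip distribution names match the names listed above, and it uses importlib.metadata (Python 3.8+), falling back to pkg_resources on older interpreters such as the pinned Python 3.6.8. The helper names are hypothetical.

```python
"""Hypothetical sketch: compare installed package versions against the README pins."""
from typing import Dict, Iterable, Optional, Tuple

# Exact pins copied from the Requirements list above.
PINNED_LINES = [
    "lmdb==0.98",
    "torch==1.5.1",
    "torchvision==0.6.1",
    "Pillow==6.1.0",
    "opencv-python==4.2.0.32",
    "numpy==1.17.1",
]


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or "==" not in line:
            continue  # skip blanks and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:  # Python 3.6/3.7: fall back to setuptools
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def mismatches(pins: Dict[str, str],
               installed: Dict[str, Optional[str]]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each pinned package whose installed version differs to (wanted, found)."""
    return {name: (want, installed.get(name))
            for name, want in pins.items()
            if installed.get(name) != want}


if __name__ == "__main__":
    pins = parse_pins(PINNED_LINES)
    found = {name: installed_version(name) for name in pins}
    for name, (want, got) in sorted(mismatches(pins, found).items()):
        print(f"{name}: pinned {want}, found {got or 'not installed'}")
```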

Data preparation

We provide a tool that converts a raw dataset into an LMDB dataset; for details, see tools/create_lmdb_dataset.py.

You can also download LMDB datasets from OCR_Dataset.
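For reference, here is a minimal sketch of a common LMDB layout for text-recognition data. It assumes that tools/create_lmdb_dataset.py follows the key convention of the similarly named script in clovaai's deep-text-recognition-benchmark: 'image-%09d' and 'label-%09d' with 1-based indices, plus a 'num-samples' count. Verify this against the script before relying on it. The helper names build_cache, write_lmdb and read_sample are hypothetical.

```python
"""Hypothetical sketch of an LMDB layout for text-recognition datasets.

Assumed keys: b'image-%09d', b'label-%09d' (1-based) and b'num-samples'.
"""
from typing import Dict, Iterable, Tuple


def build_cache(samples: Iterable[Tuple[bytes, str]]) -> Dict[bytes, bytes]:
    """Turn (encoded_image_bytes, label) pairs into LMDB key/value pairs."""
    cache = {}
    count = 0  # stays 0 if samples is empty
    for count, (image_bytes, label) in enumerate(samples, start=1):
        cache[b"image-%09d" % count] = image_bytes
        cache[b"label-%09d" % count] = label.encode("utf-8")
    cache[b"num-samples"] = str(count).encode("utf-8")
    return cache


def write_lmdb(cache: Dict[bytes, bytes], out_dir: str, map_size: int = 1 << 30) -> None:
    """Write the key/value pairs into an LMDB environment at out_dir."""
    import lmdb  # imported lazily so build_cache works without lmdb installed

    env = lmdb.open(out_dir, map_size=map_size)
    try:
        with env.begin(write=True) as txn:  # commits when the block exits cleanly
            for key, value in cache.items():
                txn.put(key, value)
    finally:
        env.close()


def read_sample(txn, index: int) -> Tuple[bytes, str]:
    """Fetch the index-th (1-based) sample from anything with a .get(key) method."""
    image_bytes = txn.get(b"image-%09d" % index)
    label = txn.get(b"label-%09d" % index).decode("utf-8")
    return image_bytes, label
```

Holding the whole cache in memory is only practical for small datasets. A real conversion would typically write in batches.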

Train

First, edit the following sections of configs/cdistnet.yml:

TrainReader: set the path of the training LMDB dataset.
EvalReader: set the path of the evaluation LMDB dataset.
Global: set global arguments such as image_shape, dict_file, etc.
VisualModule: set the arguments of the visual branch described in the original paper.
PositionalEmbedding: set the arguments of the positional branch.
SemanticEmbedding: set the arguments of the semantic branch.
MDCDP: set the arguments of the MDCDP (Multi-Domain Character Distance Perception) module.
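A quick sanity check before training can catch a missing section. This sketch is hypothetical. It assumes that the config is loaded with PyYAML, which is not in the requirements list, and that the section names above are top-level keys. The helper names load_config and config_problems are illustrative only.

```python
"""Hypothetical sketch: check that the config has the sections listed above."""
from typing import Any, Dict, List

REQUIRED_SECTIONS = (
    "TrainReader", "EvalReader", "Global", "VisualModule",
    "PositionalEmbedding", "SemanticEmbedding", "MDCDP",
)
REQUIRED_GLOBAL_KEYS = ("image_shape", "dict_file")


def load_config(path: str) -> Dict[str, Any]:
    """Load the YAML config (assumes PyYAML is installed)."""
    import yaml  # assumption: PyYAML, since the config is a .yml file

    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)


def config_problems(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = ["missing section: " + s for s in REQUIRED_SECTIONS if s not in cfg]
    global_cfg = cfg.get("Global") or {}
    problems += ["missing Global." + k for k in REQUIRED_GLOBAL_KEYS if k not in global_cfg]
    return problems
```

For example, config_problems(load_config("configs/cdistnet.yml")) returns an empty list when every section and both Global keys are present.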

python train.py -c configs/cdistnet.yml

Demo

Modify the following arguments in configs/cdistnet.yml:

pretrain_weights: set the path of the model file.
infer_img: set the path of the input image.
is_train: set to False.
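You can also set the demo arguments programmatically on the loaded config dictionary. This README does not say which section holds pretrain_weights, infer_img and is_train, so the hypothetical helper set_everywhere updates every existing occurrence of each key. The companion function demo_config raises KeyError when a key is missing rather than silently adding it.

```python
"""Hypothetical sketch: apply the demo settings to a loaded config dict."""
import copy
from typing import Any, Dict


def set_everywhere(node: Any, key: str, value: Any) -> int:
    """Set node[key] = value in every nested dict that already has key; return hits."""
    hits = 0
    if isinstance(node, dict):
        if key in node:
            node[key] = value
            hits += 1
        for child_key, child in node.items():
            if child_key != key:  # do not descend into the value just written
                hits += set_everywhere(child, key, value)
    elif isinstance(node, list):
        for child in node:
            hits += set_everywhere(child, key, value)
    return hits


def demo_config(cfg: Dict[str, Any], weights: str, image: str) -> Dict[str, Any]:
    """Return a copy of cfg with pretrain_weights, infer_img and is_train set."""
    out = copy.deepcopy(cfg)  # leave the caller's config untouched
    settings = {"pretrain_weights": weights, "infer_img": image, "is_train": False}
    for key, value in settings.items():
        if set_everywhere(out, key, value) == 0:
            raise KeyError(f"{key} not found in config")
    return out
```

Failing loudly on a missing key surfaces a typo or an unexpected config layout early, before predict.py runs with stale settings.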

python predict.py -c configs/cdistnet.yml
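At inference time, the model's output indices must be turned back into characters using the dictionary given by dict_file. The sketch below shows one common way to do this for greedy output. The dictionary format (one character per line) and the special-token ids PAD, BOS and EOS are hypothetical assumptions, not taken from this repository.

```python
"""Hypothetical sketch: map predicted indices to text with a character dictionary."""
from typing import List, Sequence

PAD, BOS, EOS = 0, 1, 2  # hypothetical special-token ids
NUM_SPECIAL = 3          # characters are assumed to start at index 3


def load_charset(path: str) -> List[str]:
    """Read one character per line (assumed dict_file format), skipping empty lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f if line.rstrip("\r\n")]


def decode(indices: Sequence[int], charset: Sequence[str]) -> str:
    """Skip PAD/BOS, stop at the first EOS, and map the rest to characters."""
    chars = []
    for idx in indices:
        if idx == EOS:
            break
        if idx in (PAD, BOS):
            continue
        chars.append(charset[idx - NUM_SPECIAL])
    return "".join(chars)
```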

TODO

Pretrained models
Test code
Comparison with the original paper on benchmarks (CUTE, IC13, IC15, IIIT5K, SVT, SVTP)
