Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features

Related tags

Deep LearningMATRN
Overview

Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features

| paper |

Official PyTorch implementation for Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features (MATRN).

This paper introduces a novel method, called Multi-modAl Text Recognition Network (MATRN), that enables interactions between visual and semantic features for better recognition performances.

Datasets

We use lmdb dataset for training and evaluation dataset. The datasets can be downloaded in clova (for validation and evaluation) and ABINet (for training and evaluation).

  • Training datasets
  • Validation datasets
  • Evaluation datasets
  • Tree structure of data directory
    data
    ├── charset_36.txt
    ├── evaluation
    │   ├── CUTE80
    │   ├── IC13_857
    │   ├── IC13_1015
    │   ├── IC15_1811
    │   ├── IC15_2077
    │   ├── IIIT5k_3000
    │   ├── SVT
    │   └── SVTP
    ├── training
    │   ├── MJ
    │   │   ├── MJ_test
    │   │   ├── MJ_train
    │   │   └── MJ_valid
    │   └── ST
    ├── validation
    ├── WikiText-103.csv
    └── WikiText-103_eval_d1.csv
    

Requirements

pip install torch==1.7.1 torchvision==0.8.2 fastai==1.0.60 lmdb pillow opencv-python

Pretrained Models

  • Download pretrained model of MATRN from this link. Performances of the pretrained models are:
Model IIIT SVT IC13S IC13L IC15S IC15L SVTP CUTE
MATRN 96.7 94.9 97.9 95.8 86.6 82.9 90.5 94.1

Training and Evaluation

  • Training
python main.py --config=configs/train_matrn.yaml
  • Evaluation
python main.py --config=configs/train_matrn.yaml --phase test --image_only

Additional flags:

  • --checkpoint /path/to/checkpoint set the path of evaluation model
  • --test_root /path/to/dataset set the path of evaluation dataset
  • --model_eval [alignment|vision|language] which sub-model to evaluate
  • --image_only disable dumping visualization of attention masks

Acknowledgements

This implementation has been based on ABINet.

Citation

Please cite this work in your publications if it helps your research.

@article{na2021multi,
  title={Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features},
  author={Na, Byeonghu and Kim, Yoonsik and Park, Sungrae},
  journal={arXiv preprint arXiv:2111.15263},
  year={2021}
}
Implementation of Change-Based Exploration Transfer (C-BET)

Implementation of Change-Based Exploration Transfer (C-BET), as presented in Interesting Object, Curious Agent: Learning Task-Agnostic Exploration.

Simone Parisi 29 Dec 04, 2022
Finite difference solution of 2D Poisson equation. Can handle Dirichlet, Neumann and mixed boundary conditions.

Poisson-solver-2D Finite difference solution of 2D Poisson equation Current version can handle Dirichlet, Neumann, and mixed (combination of Dirichlet

Mohammad Asif Zaman 34 Dec 23, 2022
Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training

ColossalAI An integrated large-scale model training system with efficient parallelization techniques. arXiv: Colossal-AI: A Unified Deep Learning Syst

HPC-AI Tech 7.9k Jan 08, 2023
The Official Implementation of the ICCV-2021 Paper: Semantically Coherent Out-of-Distribution Detection.

SCOOD-UDG (ICCV 2021) This repository is the official implementation of the paper: Semantically Coherent Out-of-Distribution Detection Jingkang Yang,

Jake YANG 62 Nov 21, 2022
On the Adversarial Robustness of Visual Transformer

On the Adversarial Robustness of Visual Transformer Code for our paper "On the Adversarial Robustness of Visual Transformers"

Rulin Shao 35 Dec 14, 2022
Collaborative forensic timeline analysis

Timesketch Table of Contents About Timesketch Getting started Community Contributing About Timesketch Timesketch is an open-source tool for collaborat

Google 2.1k Dec 28, 2022
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities.

Playground for CLIP-like models Demo Colab Link GradCAM Visualization Naive Zero-shot Detection Smarter Zero-shot Detection Captcha Solver Changelog 2

Kevin Zakka 101 Dec 30, 2022
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin

Chao Ma 3k Jan 03, 2023
DimReductionClustering - Dimensionality Reduction + Clustering + Unsupervised Score Metrics

Dimensionality Reduction + Clustering + Unsupervised Score Metrics Introduction

11 Nov 15, 2022
Exploration & Research into cross-domain MEV. Initial focus on ETH/POLYGON.

xMEV, an apt exploration This is a small exploration on the xMEV opportunities between Polygon and Ethereum. It's a data analysis exercise on a few pa

odyslam.eth 7 Oct 18, 2022
Styled Augmented Translation

SAT Style Augmented Translation Introduction By collecting high-quality data, we were able to train a model that outperforms Google Translate on 6 dif

139 Dec 29, 2022
A Fast and Accurate One-Stage Approach to Visual Grounding, ICCV 2019 (Oral)

One-Stage Visual Grounding ***** New: Our recent work on One-stage VG is available at ReSC.***** A Fast and Accurate One-Stage Approach to Visual Grou

Zhengyuan Yang 118 Dec 05, 2022
CoANet: Connectivity Attention Network for Road Extraction From Satellite Imagery

CoANet: Connectivity Attention Network for Road Extraction From Satellite Imagery This paper (CoANet) has been published in IEEE TIP 2021. This code i

Jie Mei 53 Dec 03, 2022
Official Implementation of "Transformers Can Do Bayesian Inference"

Official Code for the Paper "Transformers Can Do Bayesian Inference" We train Transformers to do Bayesian Prediction on novel datasets for a large var

AutoML-Freiburg-Hannover 103 Dec 25, 2022
CLIPImageClassifier wraps clip image model from transformers

CLIPImageClassifier CLIPImageClassifier wraps clip image model from transformers. CLIPImageClassifier is initialized with the argument classes, these

Jina AI 6 Sep 12, 2022
Code and Data for the paper: Molecular Contrastive Learning with Chemical Element Knowledge Graph [AAAI 2022]

Knowledge-enhanced Contrastive Learning (KCL) Molecular Contrastive Learning with Chemical Element Knowledge Graph [ AAAI 2022 ]. We construct a Chemi

Fangyin 58 Dec 26, 2022
(CVPR 2021) Lifting 2D StyleGAN for 3D-Aware Face Generation

Lifting 2D StyleGAN for 3D-Aware Face Generation Official implementation of paper "Lifting 2D StyleGAN for 3D-Aware Face Generation". Requirements You

Yichun Shi 66 Nov 29, 2022
Curvlearn, a Tensorflow based non-Euclidean deep learning framework.

English | 简体中文 Why Non-Euclidean Geometry Considering these simple graph structures shown below. Nodes with same color has 2-hop distance whereas 1-ho

Alibaba 123 Dec 12, 2022
6D Grasping Policy for Point Clouds

GA-DDPG [website, paper] Installation git clone https://github.com/liruiw/GA-DDPG.git --recursive Setup: Ubuntu 16.04 or above, CUDA 10.0 or above, py

Lirui Wang 48 Dec 21, 2022
Repository for the paper "From global to local MDI variable importances for random forests and when they are Shapley values"

From global to local MDI variable importances for random forests and when they are Shapley values Antonio Sutera ( Antonio Sutera 3 Feb 23, 2022