Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

This repository contains the code for the ssbaseline model. We also provide OCR results, features, and pretrained models. The code is built on top of M4C, whose documentation contains more detailed information.

Citation

If you use ssbaseline in your work, please cite:

@article{zhu2020simple,
  title={Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps},
  author={Zhu, Qi and Gao, Chenyu and Wang, Peng and Wu, Qi},
  journal={arXiv preprint arXiv:2012.05153},
  year={2020}
}

Installation

First, install the repository:

git clone https://github.com/ZephyrZhuQi/ssbaseline.git ~/ssbaseline
cd ~/ssbaseline
python setup.py build develop

Getting Data

We provide SBD-Trans OCR results for the TextVQA and ST-VQA datasets. We also release the corresponding OCR Faster R-CNN features and Recog-CNN features.

Datasets	ImDBs	Object Faster R-CNN Features	OCR Faster R-CNN Features	OCR Recog-CNN Features
TextVQA	TextVQA ImDB	Open Images	TextVQA SBD-Trans OCRs	TextVQA SBD-Trans OCRs
ST-VQA	ST-VQA ImDB	ST-VQA Objects	ST-VQA SBD-Trans OCRs	ST-VQA SBD-Trans OCRs

Pretrained Models

We release the following pretrained model for ssbaseline on TextVQA: ssbaseline trained with ST-VQA as additional data and SBD-Trans OCRs (our best model).

Datasets	Config Files (under `configs/vqa/`)	Pretrained Models	Metrics	Notes
TextVQA (`m4c_textvqa`)	`m4c_textvqa/m4c_with_stvqa.yml`	`ssbaseline_with_stvqa`	val accuracy: 45.53%; test accuracy: 45.66%	SBD-Trans OCRs; ST-VQA as additional data
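The accuracies above are TextVQA accuracies. TextVQA reuses the soft VQA accuracy metric. Each question has ten human answers. A prediction scores min(#matching human answers / 3, 1), and that score is averaged over the ten leave-one-out subsets of nine answers.

The sketch below illustrates that metric for intuition only. It is not the official evaluator used to produce the numbers in the table. Its answer normalization (lowercase and trim) is a deliberate simplification of the official one, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List


def normalize(answer: str) -> str:
    # Simplified normalization (lowercase + trim). The official VQA/TextVQA
    # evaluator also strips punctuation and articles and maps number words.
    return answer.strip().lower()


def vqa_soft_accuracy(prediction: str, gt_answers: List[str]) -> float:
    """Per-question soft accuracy: min(#humans agreeing / 3, 1), averaged
    over every leave-one-out subset of the human answers."""
    pred = normalize(prediction)
    gts = [normalize(a) for a in gt_answers]
    if len(gts) < 2:
        # Leave-one-out averaging is impossible; score the single set directly.
        return min(gts.count(pred) / 3.0, 1.0)
    scores = []
    # combinations() works on positions, so duplicate answers are kept and
    # this yields exactly len(gts) subsets, each missing one answer.
    for subset in combinations(gts, len(gts) - 1):
        matches = sum(1 for a in subset if a == pred)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


def dataset_accuracy(predictions: List[str], gt_answer_lists: List[List[str]]) -> float:
    """Mean per-question accuracy over a split, reported as a percentage."""
    per_question = [vqa_soft_accuracy(p, g) for p, g in zip(predictions, gt_answer_lists)]
    return 100.0 * sum(per_question) / len(per_question)


if __name__ == "__main__":
    humans = ["stop"] * 3 + ["go"] * 7
    print(round(vqa_soft_accuracy("STOP", humans), 4))  # 0.9
```

In the example, three of ten annotators wrote "stop". Dropping one of those three leaves only two matches, for a score of 2/3. The other seven subsets still contain three matches and score 1. The average is therefore 0.9 rather than 1.0.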

Training and Evaluation

Since ssbaseline is built on M4C, please follow the M4C README for training and evaluation on each dataset.

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps[AAAI2021]