TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)

Related tags

Deep LearningTAP
Overview

TAP: Text-Aware Pre-training

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

by Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, and Jiebo Luo

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021, Oral

Introduction

We propose Text-Aware Pre-training (TAP) for Text-VQA and Text-Caption tasks. For more details, please refer to our paper.

Citation

@inproceedings{yang2021tap,
  title={TAP: Text-Aware Pre-training for Text-VQA and Text-Caption},
  author={Yang, Zhengyuan and Lu, Yijuan and Wang, Jianfeng and Yin, Xi and Florencio, Dinei and Wang, Lijuan and Zhang, Cha and Zhang, Lei and Luo, Jiebo},
  booktitle={CVPR},
  year={2021}
}

Prerequisites

  • Python 3.6

  • Pytorch 1.4.0

  • Please refer to requirements.txt. Or using

    python setup.py develop
    

Installation

  1. Clone the repository

    git clone https://github.com/microsoft/TAP.git
    cd TAP
    python setup.py develop
    
  2. Data

  • Please refer to the Readme in the data folder.

Training

  1. Train the model, run the code under main folder. Using flag --pretrain to access the pre-training mode, otherwise the main QA/Captioning losses are used to optimize the model. Example yml files are in configs folder. Detailed configs are in released models.

    Pre-training:

    python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --pretrain --tasks vqa --datasets $dataset --model $model --seed $seed --config configs/vqa/$dataset/"$pretrain_yml".yml --save_dir save/$pretrain_savedir training_parameters.distributed True
    
    # for example
    python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_base_pretrain.yml --save_dir save/m4c_split_pretrain_test training_parameters.distributed True
    

    Fine-tuning:

    python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --tasks vqa --datasets $dataset --model $model --seed $seed --config configs/vqa/$dataset/"$refine_yml".yml --save_dir save/$refine_savedir --resume_file save/$pretrain_savedir/$savename/best.ckpt training_parameters.distributed True
    
    # for example
    python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --resume_file save/pretrained/textvqa_tap_base_pretrain.ckpt training_parameters.distributed True
    
  2. Evaluate the model, run the code under main folder. Set up val or test set by --run_type.

    python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --tasks vqa --datasets $dataset --model $model --config configs/vqa/$dataset/"$refine_yml".yml --save_dir save/$refine_savedir --run_type val --resume_file save/$refine_savedir/$savename/best.ckpt training_parameters.distributed True
    
    # for example
    python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --run_type val --resume_file save/finetuned/textvqa_tap_base_best.ckpt training_parameters.distributed True
    
  3. Captioning evaluation.

    python projects/M4C_Captioner/scripts/textcaps_eval.py --set val --pred_file YOUR_VAL_PREDICTION_FILE
    

Performance and Pre-trained Models

Please check the detailed experiment settings in our paper.

Model checkpoints (~17G).

path/to/azcopy copy https://tapvqacaption.blob.core.windows.net/data/save <local_path>/save --recursive

Please refer to the Readme in the data folder for the detailed instructions on azcopy downloading.

Text-VQA TAP TAP** (with extra data)
TextVQA 49.91 54.71
STVQA 45.29 50.83
Text-Captioning TAP TAP** (with extra data)
TextCaps 105.05 109.16

Credits

The project is built based on the following repository:

Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
Resources for the Ki testnet challenge

Ki Testnet Challenge This repository hosts ki-testnet-challenge. A set of scripts and resources to be used for the Ki Testnet Challenge What is the te

Ki Foundation 23 Aug 08, 2022
MohammadReza Sharifi 27 Dec 13, 2022
[CVPR 2021] 'Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator'

[CVPR2021] Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator Overview This is the entire codebase for the paper

35 Dec 01, 2022
The DL Streamer Pipeline Zoo is a catalog of optimized media and media analytics pipelines.

The DL Streamer Pipeline Zoo is a catalog of optimized media and media analytics pipelines. It includes tools for downloading pipelines and their dependencies and tools for measuring their performace

8 Dec 04, 2022
The official PyTorch code for 'DER: Dynamically Expandable Representation for Class Incremental Learning' accepted by CVPR2021

DER.ClassIL.Pytorch This repo is the official implementation of DER: Dynamically Expandable Representation for Class Incremental Learning (CVPR 2021)

rhyssiyan 108 Jan 01, 2023
NeurIPS workshop paper 'Counter-Strike Deathmatch with Large-Scale Behavioural Cloning'

Counter-Strike Deathmatch with Large-Scale Behavioural Cloning Tim Pearce, Jun Zhu Offline RL workshop, NeurIPS 2021 Paper: https://arxiv.org/abs/2104

Tim Pearce 169 Dec 26, 2022
Deployment of PyTorch chatbot with Flask

Chatbot Deployment with Flask and JavaScript In this tutorial we deploy the chatbot I created in this tutorial with Flask and JavaScript. This gives 2

Patrick Loeber (Python Engineer) 107 Dec 29, 2022
Code for our ALiBi method for transformer language models.

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation This repository contains the code and models for our paper Tra

Ofir Press 211 Dec 31, 2022
TensorFlow-based implementation of "Pyramid Scene Parsing Network".

PSPNet_tensorflow Important Code is fine for inference. However, the training code is just for reference and might be only used for fine-tuning. If yo

HsuanKung Yang 323 Dec 20, 2022
A curated list of neural rendering resources.

Awesome-of-Neural-Rendering A curated list of neural rendering and related resources. Please feel free to pull requests or open an issue to add papers

Zhiwei ZHANG 43 Dec 09, 2022
Train Yolov4 using NBX-Jobs

yolov4-trainer-nbox Train Yolov4 using NBX-Jobs. Use the powerfull functionality available in nbox-SDK repo to train a tiny-Yolo v4 model on Pascal VO

Yash Bonde 1 Jan 12, 2022
Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition

Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition How Fast Compare to Other Zero-Shot NAS Proxies on CIFAR-10/100 Pre-trained Model

190 Dec 29, 2022
CV backbones including GhostNet, TinyNet and TNT, developed by Huawei Noah's Ark Lab.

CV Backbones including GhostNet, TinyNet, TNT (Transformer in Transformer) developed by Huawei Noah's Ark Lab. GhostNet Code TinyNet Code TNT Code Pyr

HUAWEI Noah's Ark Lab 3k Jan 08, 2023
A GUI for Face Recognition, based upon Docker, Tkinter, GPU and a camera device.

Face Recognition GUI This repository is a GUI version of Face Recognition by Adam Geitgey, where e.g. Docker and Tkinter are utilized. All the materia

Kasper Henriksen 6 Dec 05, 2022
Face Mask Detection system based on computer vision and deep learning using OpenCV and Tensorflow/Keras

Face Mask Detection Face Mask Detection System built with OpenCV, Keras/TensorFlow using Deep Learning and Computer Vision concepts in order to detect

Chandrika Deb 1.4k Jan 03, 2023
i-RevNet Pytorch Code

i-RevNet: Deep Invertible Networks Pytorch implementation of i-RevNets. i-RevNets define a family of fully invertible deep networks, built from a succ

Jörn Jacobsen 378 Dec 06, 2022
A Joint Video and Image Encoder for End-to-End Retrieval

Frozen️ in Time ❄️ ️️️️ ⏳ A Joint Video and Image Encoder for End-to-End Retrieval project page | arXiv | webvid-data Repository containing the code,

225 Dec 25, 2022
Code for the CVPR2021 workshop paper "Noise Conditional Flow Model for Learning the Super-Resolution Space"

NCSR: Noise Conditional Flow Model for Learning the Super-Resolution Space Official NCSR training PyTorch Code for the CVPR2021 workshop paper "Noise

57 Oct 03, 2022
This is the repository of our article published on MDPI Entropy "Feature Selection for Recommender Systems with Quantum Computing".

Collaborative-driven Quantum Feature Selection This repository was developed by Riccardo Nembrini, PhD student at Politecnico di Milano. See the websi

Quantum Computing Lab @ Politecnico di Milano 10 Apr 21, 2022
Code for our CVPR 2021 Paper "Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes".

Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes (CVPR 2021) Project page | Paper | Colab | Colab for Drawing App Rethinking Style

CompVis Heidelberg 153 Jan 04, 2023