Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part

Last update: Dec 31, 2022

Overview

VILLA: Vision-and-Language Adversarial Training

This is the official repository of VILLA (NeurIPS 2020 Spotlight). This repository currently supports adversarial finetuning of UNITER on VQA, VCR, NLVR2, and SNLI-VE. Adversarial pre-training with in-domain data will be available soon. Both VILLA-base and VILLA-large pre-trained checkpoints are released.

Most of the code in this repo is copied or modified from UNITER.

Requirements

We provide a Docker image for easier reproduction. Please install the following:

Our scripts require the user to be a member of the docker group so that docker commands can be run without sudo. We only support Linux with NVIDIA GPUs. We test on Ubuntu 18.04 and V100 cards. We use mixed-precision training, so GPUs with Tensor Cores are recommended.

Quick Start

NOTE: Please run bash scripts/download_pretrained.sh $PATH_TO_STORAGE to get our latest pretrained VILLA checkpoints. This will download both the base and large models.

We use VQA as an end-to-end example for using this code base.

Download processed data and pretrained models with the following command.

bash scripts/download_vqa.sh $PATH_TO_STORAGE

After downloading you should see the following folder structure:

├── finetune 
├── img_db
│   ├── coco_test2015
│   ├── coco_test2015.tar
│   ├── coco_train2014
│   ├── coco_train2014.tar
│   ├── coco_val2014
│   ├── coco_val2014.tar
│   ├── vg
│   └── vg.tar
├── pretrained
│   ├── uniter-base.pt
│   └── villa-base.pt
└── txt_db
    ├── vqa_devval.db
    ├── vqa_devval.db.tar
    ├── vqa_test.db
    ├── vqa_test.db.tar
    ├── vqa_train.db
    ├── vqa_train.db.tar
    ├── vqa_trainval.db
    ├── vqa_trainval.db.tar
    ├── vqa_vg.db
    └── vqa_vg.db.tar

You can put other pre-trained checkpoints inside the /pretrained folder as needed.
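
As a quick sanity check after the download, the listing above can be compared against what is actually on disk. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` and the constant `EXPECTED_ENTRIES` are hypothetical, and the entries are simply copied from the folder structure shown above.

```python
from pathlib import Path
from typing import List, Sequence

# Paths copied from the folder listing above, relative to $PATH_TO_STORAGE.
EXPECTED_ENTRIES = (
    "finetune",
    "img_db/coco_train2014",
    "img_db/coco_val2014",
    "img_db/coco_test2015",
    "img_db/vg",
    "pretrained/uniter-base.pt",
    "pretrained/villa-base.pt",
    "txt_db/vqa_train.db",
    "txt_db/vqa_trainval.db",
    "txt_db/vqa_devval.db",
    "txt_db/vqa_test.db",
    "txt_db/vqa_vg.db",
)


def missing_entries(storage_root: str,
                    expected: Sequence[str] = EXPECTED_ENTRIES) -> List[str]:
    """Return the expected entries that do not exist under storage_root."""
    root = Path(storage_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `missing_entries` with the value of $PATH_TO_STORAGE should return an empty list when everything is in place. A non-empty result names the folders or files to re-download.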

Launch the Docker container for running the experiments.
```
# docker image should be automatically pulled
source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/img_db \
    $PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained
```
The launch script respects the $CUDA_VISIBLE_DEVICES environment variable. Note that the source code is mounted into the container under /src instead of being built into the image, so user modifications are reflected without rebuilding the image. (Data folders are mounted into the container separately to allow flexible folder structures.)
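
Because the visible devices determine how many workers can be launched, it can help to derive the GPU count from $CUDA_VISIBLE_DEVICES before choosing the value passed to horovodrun -np. The sketch below is an illustration only: the helper `visible_gpu_count` is hypothetical and not part of the repository. It assumes the variable holds a comma-separated list of device indices or UUIDs, and it does not validate the individual entries.

```python
import os
from typing import Mapping, Optional


def visible_gpu_count(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Count the devices listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, in which case every GPU on the
    host is visible and the count cannot be derived from the environment.
    """
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # An empty value hides all GPUs, so the count is zero in that case.
    return len([entry for entry in value.split(",") if entry.strip()])


count = visible_gpu_count()
print("CUDA_VISIBLE_DEVICES is unset" if count is None else f"{count} GPU(s) visible")
```

With CUDA_VISIBLE_DEVICES=0,1,2,3 this reports four visible GPUs, which matches the `-np 4` used in the examples that follow.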

Run finetuning for the VQA task.

```
# inside the container
horovodrun -np $N_GPU python train_vqa_adv.py --config $YOUR_CONFIG_JSON

# specific example
horovodrun -np 4 python train_vqa_adv.py --config config/train-vqa-base-4gpu-adv.json
```

Run inference for the VQA task and then evaluate.
```
# inference
python inf_vqa.py --txt_db /txt/vqa_test.db --img_db /img/coco_test2015 \
--output_dir $VQA_EXP --checkpoint 6000 --pin_mem --fp16
```
The result file will be written at $VQA_EXP/results_test/results_6000_all.json, which can be submitted to the evaluation server.

Customization

```
# training options
python train_vqa_adv.py --help
```
- Command-line arguments override values in the JSON config file.
- Values in the JSON config override argparse defaults.
- Use horovodrun to run multi-GPU training.
- --gradient_accumulation_steps emulates multi-GPU training.
- --checkpoint selects the UNITER or VILLA pre-trained checkpoint to start from.
- --adv_training turns adversarial training on or off.
- --adv_modality takes one of ['text'], ['image'], ['text','image'], or ['text','image','alter']; the last two add perturbations to both modalities simultaneously or alternately, respectively.
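
The precedence described in the first two items (command line over JSON config, JSON config over argparse defaults) can be sketched as follows. This is a minimal illustration of one way to obtain that ordering, not necessarily how the training scripts implement it. The default values are made up, `--num_train_steps` is a hypothetical option used only for the example, and the `config` parameter exists so the function can be exercised without a file on disk.

```python
import argparse
import json
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", help="path to a JSON config file")
    # Default values below are illustrative, not the repository's.
    parser.add_argument("--gradient_accumulation_steps", type=int, default=1)
    parser.add_argument("--num_train_steps", type=int, default=1000)  # hypothetical option
    return parser


def parse_with_config(argv: List[str],
                      config: Optional[dict] = None) -> argparse.Namespace:
    """Merge argparse defaults, a JSON config, and explicit command-line flags."""
    args = build_parser().parse_args(argv)
    if config is None and args.config is not None:
        with open(args.config) as f:
            config = json.load(f)
    # Option names the user typed explicitly, e.g. "--foo 1" or "--foo=1".
    typed = {arg[2:].split("=")[0] for arg in argv if arg.startswith("--")}
    for key, value in (config or {}).items():
        if key not in typed:
            setattr(args, key, value)  # JSON value replaces the argparse default
    return args


cfg = {"gradient_accumulation_steps": 4, "num_train_steps": 6000}
args = parse_with_config(["--gradient_accumulation_steps", "8"], cfg)
print(args.gradient_accumulation_steps, args.num_train_steps)
```

In this example the explicit flag wins for `gradient_accumulation_steps`, while `num_train_steps` comes from the JSON values. One limitation of this sketch is that abbreviated flags (which argparse accepts by default) are not recognized as explicitly typed.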

Downstream Tasks Finetuning

VCR

NOTE: training and inference should be run inside the Docker container

download data

bash scripts/download_vcr.sh $PATH_TO_STORAGE

train

horovodrun -np 4 python train_vcr_adv.py --config config/train-vcr-base-4gpu-adv.json \
    --output_dir $VCR_EXP

inference

horovodrun -np 4 python inf_vcr.py --txt_db /txt/vcr_test.db \
    --img_db "/img/vcr_gt_test/;/img/vcr_test/" \
    --split test --output_dir $VCR_EXP --checkpoint 8000 \
    --pin_mem --fp16

The result file will be written at $VCR_EXP/results_test/results_8000_all.csv, which can be submitted to the VCR leaderboard for evaluation.

NLVR2

NOTE: training and inference should be run inside the Docker container

download data

bash scripts/download_nlvr2.sh $PATH_TO_STORAGE

train

horovodrun -np 4 python train_nlvr2_adv.py --config config/train-nlvr2-base-1gpu-adv.json \
    --output_dir $NLVR2_EXP

inference

python inf_nlvr2.py --txt_db /txt/nlvr2_test1.db/ --img_db /img/nlvr2_test/ \
--train_dir /storage/nlvr-base/ --ckpt 6500 --output_dir . --fp16

Visual Entailment (SNLI-VE)

NOTE: training should be run inside the Docker container

download data

bash scripts/download_ve.sh $PATH_TO_STORAGE

train

horovodrun -np 2 python train_ve_adv.py --config config/train-ve-base-2gpu-adv.json \
    --output_dir $VE_EXP

Adversarial Training of LXMERT

To keep things simple, we provide a separate repo that can be used to reproduce our results on adversarial finetuning of LXMERT on VQA, GQA, and NLVR2.

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{gan2020large,
  title={Large-Scale Adversarial Training for Vision-and-Language Representation Learning},
  author={Gan, Zhe and Chen, Yen-Chun and Li, Linjie and Zhu, Chen and Cheng, Yu and Liu, Jingjing},
  booktitle={NeurIPS},
  year={2020}
}

@inproceedings{chen2020uniter,
  title={Uniter: Universal image-text representation learning},
  author={Chen, Yen-Chun and Li, Linjie and Yu, Licheng and Kholy, Ahmed El and Ahmed, Faisal and Gan, Zhe and Cheng, Yu and Liu, Jingjing},
  booktitle={ECCV},
  year={2020}
}

License

MIT

Owner

Zhe Gan