Visual dialog agents with pre-trained vision-and-language encoders.

Overview

Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation

Or READ-UP: Referring Expression Agent Dialog with Unified Pretraining.

This repo includes the training/testing code for our paper Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation that has been accepted by CVPR 2021.

Please cite the following paper if you use the code in this repository:

@inproceedings{tu2021learning,
  title={Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation},
  author={Tu, Tao and Ping, Qing and Thattai, Govindarajan and Tur, Gokhan and Natarajan, Prem},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5622--5631},
  year={2021}
}

Repository Setup

Environment

The following environment is recommended:

Instance storage: > 800 GB
pytorch 1.4.0
cuda 10.0

Set up virtual environment and install pytorch:

$ conda create -n read_up python=3.6
$ conda activate read_up
$ git clone https://github.com/amazon-research/read-up.git

# [IMPORTANT] pytorch 1.4.0 have no issue for parallel training
$ conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.0 -c pytorch

Install dependencies:

# Install general dependencies
$ sudo apt-get install build-essential libcap-dev
$ pip install -r requirement.txt

# Install vqa-maskrcnn-benchmark (for feature extraction only)
$ git clone https://gitlab.com/vedanuj/vqa-maskrcnn-benchmark.git
$ cd vqa-maskrcnn-benchmark
$ python setup.py build develop

Install Apex for distributed training

# Apex is used for both `faster-rcnn feature extraction` & `distributed training`
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Dataset

Meta-data

Download the GuessWhat?! dataset:

$ wget https://florian-strub.com/guesswhat.train.jsonl.gz -P data/
$ wget https://florian-strub.com//guesswhat.valid.jsonl.gz -P data/
$ wget https://florian-strub.com//guesswhat.test.jsonl.gz -P data/

Prepare dict.json:

  1. Set up repo as instructed in https://github.com/GuessWhatGame/guesswhat
  2. Generate the dict.json file:
$ python src/guesswhat/preprocess_data/create_dictionary.py -data_dir data -dict_file dict.json -min_occ 3
  1. Copy dict.json file to read-up repo:
$ cd read-up
$ mkdir tf-pretrained-model
$ cp guesswhat/data/dict.json read-up/tf-pretrained-model/

Dataset for Oracle models

1. Dataset for baseline Oracle + Faster-RCNN visual features.

Under vqa-maskrcnn-benchmark/data/, download RCNN model and COCO images:

# download RCNN model
$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/detectron_model.pth
$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/detectron_config.yaml

# download COCO data  
$ wget http://images.cocodataset.org/zips/train2014.zip
$ wget http://images.cocodataset.org/zips/val2014.zip
$ wget http://images.cocodataset.org/zips/test2014.zip
$ unzip -j train2014.zip 
$ unzip -j valid2014.zip 
$ unzip -j test2014.zip 

Copy the guesswhat.train/valid/test.jsonl to vqa-maskrcnn-benchmark/data/. Unzip the COCO images into a folder image_dir/COCO_2014/images/, and prepare a npy file for feature extraction later.


$ python bin/prepare_extract_gt_features_gw.py \
    --src vqa-maskrcnn-benchmark/data/guesswhat.train.jsonl \
    --img-dir vqa-maskrcnn-benchmark/image_dir/COCO_2014/images/ \
    --out vqa-maskrcnn-benchmark/image_dir/COCO_2014/npy_files/guesswhat.train.npy

Repeat the same process for val and test. The generated file looks like the following:

{
    {
        'file_name': 'name_of_image_file',
        'file_path': '<path_to_image_file_on_your_disk>',
        'bbox': array([
                        [ x1, y1, width1, height1],
                        [ x2, y2, width2, height2],
                        ...
                    ]),
        'num_box': 2
    },
    ....
}

Extract features from the ground-truth bounding boxes generated before:

$ python bin/extract_features_from_gt.py \
    --model_file vqa-maskrcnn-benchmark/data/detectron_model.pth \
    --config_file vqa-maskrcnn-benchmark/data/detectron_config.yaml \
    --imdb_gt_file vqa-maskrcnn-benchmark/image_dir/COCO_2014/npy_files/guesswhat.train.npy \
    --output_folder data/rcnn/from_gt_gw_xyxy_scale/train

Repeat this process for val and test data.

2. Dataset for our Oracle model.

Download the pretrained VilBERT model (both vanilla and 12-in-1 have similar performance in our experiments).

# download vanilla pretrained model
$ cd vilbert-pretrained-model
$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/pretrained_model.bin

# download 12-in-1 pretrained model
$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/multi_task_model.bin

Download the features for COCO:

$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/datasets/coco/features_100/COCO_trainval_resnext152_faster_rcnn_genome.lmdb/data.mdb && mv data.mdb COCO_trainval_resnext152_faster_rcnn_genome.lmdb/
$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/datasets/coco/features_100/COCO_test_resnext152_faster_rcnn_genome.lmdb/data.mdb && mv data.mdb COCO_test_resnext152_faster_rcnn_genome.lmdb/

Dataset for Q-Gen models

1. Dataset for baseline Q-Gen model [1]

$ wget www.florian-strub.com/github/ft_vgg_img.zip
$ unzip ft_vgg_img.zip -d img/

2. Dataset for VDST Q-Gen model [2]

$ python bin/extract_features.py \
    --model_file vqa-maskrcnn-benchmark/data/detectron_model.pth \
    --config_file vqa-maskrcnn-benchmark/data/detectron_config.yaml \
    --image_dir vqa-maskrcnn-benchmark/image_dir/COCO_2014/images/ \
    --output_folder data/rcnn/from_rcnn/ \
    --batch_size 8

Dataset for Guesser models

1. Dataset for baseline Guesser model[1]

$ cd data/vilbert-multi-task
$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/datasets.tar.gz
$ tar -I pigz -xvf datasets.tar.gz datasets/guesswhat/

Model Training & Evaluation

Oracle

To train our Oracle model:

$ python -m torch.distributed.launch \
    --nproc_per_node=4 \
    --nnodes=1 \
    --node_rank=0 \
    main.py \
    --command train-oracle-vilbert \
    --config config_files/oracle_vilbert.yaml \
    --n-jobs 8

To evaluate our Oracle model:

$ python main.py \
    --command test-oracle-vilbert \
    --config config_files/oracle_vilbert.yaml \
    --load ckpt/oracle_vilbert-sd0/epoch-3.pth

This repo also implements other Oracle models:

  • Baseline Oracle model [1]
  • Baseline Oracle model + Faster-RCNN visual features (our ablation model)

To train and evaluate this model, run the main.py with corresponding config file and command.

Guesser

To train our Guesser model:

$ python main.py \
    --command train-guesser-vilbert \
    --config config_files/guesser_vilbert.yaml \
    --n-jobs 8

To evaluate our Guesser model:

$ python main.py \
    --command test-guesser-vilbert \
    --config config_files/guesser_vilbert.yaml \
    --n-jobs 8 \
    --load ckpt/guesser_vilbert-sd0/best.pth

This repo also implements other Guesser models:

  • Baseline Guesser model [1]

To train and evaluate this model, run the main.py with corresponding config file and command.

Q-Gen

To train our Q-Gen model:

# Distributed training
$ python -m torch.distributed.launch \
    --nproc_per_node=4 \
    --nnodes=1 \
    --node_rank=0 \
    main.py \
    --command train-qgen-vilbert \
    --config config_files/qgen_vilbert.yaml \
    --n-jobs 8 

# Non-distributed training
$ python main.py \
    --command train-qgen-vilbert \
    --config config_files/qgen_vilbert.yaml \
    --n-jobs 8 

To evalaute our Q-Gen model:

$ python main.py \
    --command test-self-play-all-vilbert \
    --config config_files/self_play_all_vilbert.yaml \
    --n-jobs 8

This repo also implements other Q-Gen models:

  • Baseline Q-Gen model [1]
  • VDST Q-Gen model [2]

To train and evaluate these models, run the main.py with corresponding config file and command.

References

[1] Strub, F., De Vries, H., Mary, J., Piot, B., Courvile, A., & Pietquin, O. (2017, August). End-to-end optimization of goal-driven and visually grounded dialogue systems. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 2765-2771).

[2] Pang, W., & Wang, X. (2020, April). Visual dialogue state tracking for question generation. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 07, pp. 11831-11838).

TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL, and utterance id

TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL

3 Dec 26, 2022
Implementations of LSTM: A Search Space Odyssey variants and their training results on the PTB dataset.

An LSTM Odyssey Code for training variants of "LSTM: A Search Space Odyssey" on Fomoro. Check out the blog post. Training Install TensorFlow. Clone th

Fomoro AI 95 Apr 13, 2022
[CVPR 2022 Oral] Versatile Multi-Modal Pre-Training for Human-Centric Perception

Versatile Multi-Modal Pre-Training for Human-Centric Perception Fangzhou Hong1  Liang Pan1  Zhongang Cai1,2,3  Ziwei Liu1* 1S-Lab, Nanyang Technologic

Fangzhou Hong 96 Jan 03, 2023
Einshape: DSL-based reshaping library for JAX and other frameworks.

Einshape: DSL-based reshaping library for JAX and other frameworks. The jnp.einsum op provides a DSL-based unified interface to matmul and tensordot o

DeepMind 62 Nov 30, 2022
(ImageNet pretrained models) The official pytorch implemention of the TPAMI paper "Res2Net: A New Multi-scale Backbone Architecture"

Res2Net The official pytorch implemention of the paper "Res2Net: A New Multi-scale Backbone Architecture" Our paper is accepted by IEEE Transactions o

Res2Net Applications 928 Dec 29, 2022
Official pytorch implement for “Transformer-Based Source-Free Domain Adaptation”

Official implementation for TransDA Official pytorch implement for “Transformer-Based Source-Free Domain Adaptation”. Overview: Result: Prerequisites:

stanley 54 Dec 22, 2022
[CVPR'22] Official PyTorch Implementation of Collaborative Transformers for Grounded Situation Recognition

[CVPR'22] Collaborative Transformers for Grounded Situation Recognition Paper | Model Checkpoint This is the official PyTorch implementation of Collab

Junhyeong Cho 29 Dec 10, 2022
Reference PyTorch implementation of "End-to-end optimized image compression with competition of prior distributions"

PyTorch reference implementation of "End-to-end optimized image compression with competition of prior distributions" by Benoit Brummer and Christophe

Benoit Brummer 6 Jun 16, 2022
PyTorch implementation of neural style randomization for data augmentation

README Augment training images for deep neural networks by randomizing their visual style, as described in our paper: https://arxiv.org/abs/1809.05375

84 Nov 23, 2022
StarGAN - Official PyTorch Implementation (CVPR 2018)

StarGAN - Official PyTorch Implementation ***** New: StarGAN v2 is available at https://github.com/clovaai/stargan-v2 ***** This repository provides t

Yunjey Choi 5.1k Jan 04, 2023
Normalizing Flows with a resampled base distribution

Resampling Base Distributions of Normalizing Flows Normalizing flows are a popular class of models for approximating probability distributions. Howeve

Vincent Stimper 24 Nov 03, 2022
Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021, official Pytorch implementatio

Microsoft 247 Dec 25, 2022
Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track (SIGIR 2021 Full Paper).

Optimizing Dense Retrieval Model Training with Hard Negatives Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma This repo provi

Jingtao Zhan 99 Dec 27, 2022
A PyTorch implementation of Implicit Q-Learning

IQL-PyTorch This repository houses a minimal PyTorch implementation of Implicit Q-Learning (IQL), an offline reinforcement learning algorithm, along w

Garrett Thomas 30 Dec 12, 2022
Code for the paper "Balancing Training for Multilingual Neural Machine Translation, ACL 2020"

Balancing Training for Multilingual Neural Machine Translation Implementation of the paper Balancing Training for Multilingual Neural Machine Translat

Xinyi Wang 21 May 18, 2022
Pytorch Implementation of "Desigining Network Design Spaces", Radosavovic et al. CVPR 2020.

RegNet Pytorch Implementation of "Desigining Network Design Spaces", Radosavovic et al. CVPR 2020. Paper | Official Implementation RegNet offer a very

Vishal R 2 Feb 11, 2022
PyTorch implementation of 'Gen-LaneNet: a generalized and scalable approach for 3D lane detection'

(pytorch) Gen-LaneNet: a generalized and scalable approach for 3D lane detection Introduction This is a pytorch implementation of Gen-LaneNet, which p

Yuliang Guo 233 Jan 06, 2023
Simple (but Strong) Baselines for POMDPs

Recurrent Model-Free RL is a Strong Baseline for Many POMDPs Welcome to the POMDP world! This repo provides some simple baselines for POMDPs, specific

Tianwei V. Ni 172 Dec 29, 2022
We envision models that are pre-trained on a vast range of domain-relevant tasks to become key for molecule property prediction

We envision models that are pre-trained on a vast range of domain-relevant tasks to become key for molecule property prediction. This repository aims to give easy access to state-of-the-art pre-train

GMUM 90 Jan 08, 2023