PyTorch implementation of "Debiased Visual Question Answering from Feature and Sample Perspectives" (NeurIPS 2021)

Related tags

Deep LearningD-VQA
Overview

D-VQA

We provide the PyTorch implementation for Debiased Visual Question Answering from Feature and Sample Perspectives (NeurIPS 2021).

D-VQA

Dependencies

  • Python 3.6
  • PyTorch 1.1.0
  • dependencies in requirements.txt
  • We train and evaluate all of the models based on one TITAN Xp GPU

Getting Started

Installation

  1. Clone this repository:

     git clone https://github.com/Zhiquan-Wen/D-VQA.git
     cd D-VQA
    
  2. Install PyTorch and other dependencies:

     pip install -r requirements.txt
    

Download and preprocess the data

cd data 
bash download.sh
python preprocess_features.py --input_tsv_folder xxx.tsv --output_h5 xxx.h5
python feature_preprocess.py --input_h5 xxx.h5 --output_path trainval 
python create_dictionary.py --dataroot vqacp2/
python preprocess_text.py --dataroot vqacp2/ --version v2
cd ..

Training

  • Train our model
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --output saved_models_cp2/ --self_loss_weight 3 --self_loss_q 0.7
  • Train the model with 80% of the original training set
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --output saved_models_cp2/ --self_loss_weight 3 --self_loss_q 0.7 --ratio 0.8 

Evaluation

  • A json file of results from the test set can be produced with:
CUDA_VISIBLE_DEVICES=0 python test.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --checkpoint_path saved_models_cp2/best_model.pth --output saved_models_cp2/result/
  • Compute detailed accuracy for each answer type:
python comput_score.py --input saved_models_cp2/result/XX.json --dataroot data/vqacp2/

Pretrained model

A well-trained model can be found here. The test results file produced by it can be found here and its performance is as follows:

Overall score: 61.91
Yes/No: 88.93 Num: 52.32 other: 50.39

Reference

If you found this code is useful, please cite the following paper:

@inproceedings{D-VQA,
  title     = {Debiased Visual Question Answering from Feature and Sample Perspectives},
  author    = {Zhiquan Wen, 
               Guanghui Xu, 
               Mingkui Tan, 
               Qingyao Wu, 
               Qi Wu},
  booktitle = {NeurIPS},
  year = {2021}
}

Acknowledgements

This repository contains code modified from SSL-VQA, thank you very much!

Besides, we thank Yaofo Chen for providing MIO library to accelerate the data loading.

Comments
  • Questions about the code

    Questions about the code

    Thank you very much for providing the code, but I still have two questions that I did not understand well.

    1. A module, BDM, is used to capture negative bias, but this module only includes a multi-layer perceptron. Then how to ensure the features captured by this multi-layer perceptron are negative bias?
    2. On the left of Figure 2 of the paper, there are no backward gradient of the question-to-answer and the vision-to-answer branches. Where did it reflect in the code?
    opened by darwann 4
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • LXMERT numbers

    LXMERT numbers

    Hi, I wish to reproduce the LXMERT(LXMERT without D-VQA) numbers reported in the paper. It would be helpful if you could provide me with a way to do this using your code. I tried using the original LXMERT code, but I am not able to get the numbers reported in your paper on the VQA-CP2 dataset.

    opened by Vaidehi99 0
  • Download trainval_36.zip error

    Download trainval_36.zip error

    Hi, thank you for your work on this.

    I keep getting a download error when downloading the trainval_36.zip file. Is there another link I can use to download this?

    Thanks in advance!

    opened by chojw 0
  • 关于box和image的对齐问题

    关于box和image的对齐问题

    您好,我将box的注释解开后,重新生成特征,然后将其绘制出来,但是明显感觉有偏差,不知道您是否可以提供一份绘图的代码。 image 下面是我的代码 def plot_rect(image, boxes): img = Image.fromarray(np.uint8(image)) draw = ImageDraw.Draw(img) for k in range(2): box = boxes[k,:] print(box) drawrect(draw, box, outline='green', width=3) img = np.asarray(img) return img def drawrect(drawcontext, xy, outline=None, width=0): x1, y1, x2, y2 = xy points = (x1, y1), (x2, y1), (x2, y2), (x1, y2), (x1, y1) drawcontext.line(points, fill=outline, width=width)

    opened by LemonQC 0
Owner
Zhiquan Wen
Zhiquan Wen
Hypernetwork-Ensemble Learning of Segmentation Probability for Medical Image Segmentation with Ambiguous Labels

Hypernet-Ensemble Learning of Segmentation Probability for Medical Image Segmentation with Ambiguous Labels The implementation of Hypernet-Ensemble Le

Sungmin Hong 6 Jul 18, 2022
Repository for MDPGT

MD-PGT Repository for implementing and reproducing the results for the paper MDPGT: Momentum-based Decentralized Policy Gradient Tracking. Available E

Xian Yeow Lee 2 Dec 30, 2021
code for EMNLP 2019 paper Text Summarization with Pretrained Encoders

PreSumm This code is for EMNLP 2019 paper Text Summarization with Pretrained Encoders Updates Jan 22 2020: Now you can Summarize Raw Text Input!. Swit

Yang Liu 1.2k Dec 28, 2022
Multi-robot collaborative exploration and mapping through Voronoi partition and DRL in unknown environment

Voronoi Multi_Robot Collaborate Exploration Introduction In the unknown environment, the cooperative exploration of multiple robots is completed by Vo

PeaceWord 6 Nov 22, 2022
The Illinois repository for Climatehack (https://climatehack.ai/). We won 1st place!

Climatehack This is the repository for Illinois's Climatehack Team. We earned first place on the leaderboard with a final score of 0.87992. An overvie

Jatin Mathur 20 Jun 09, 2022
Multi-Glimpse Network With Python

Multi-Glimpse Network Our code requires Python ≥ 3.8 Installation For example, venv + pip: $ python3 -m venv env $ source env/bin/activate (env) $ pyt

9 May 10, 2022
Repository for the paper titled: "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"

When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer This repository contains code for our paper titled "When is BERT M

Princeton Natural Language Processing 9 Dec 23, 2022
A fast MoE impl for PyTorch

An easy-to-use and efficient system to support the Mixture of Experts (MoE) model for PyTorch.

Rick Ho 873 Jan 09, 2023
PyTorch implementation for paper StARformer: Transformer with State-Action-Reward Representations.

StARformer This repository contains the PyTorch implementation for our paper titled StARformer: Transformer with State-Action-Reward Representations.

Jinghuan Shang 14 Dec 09, 2022
Code and Datasets from the paper "Self-supervised contrastive learning for volcanic unrest detection from InSAR data"

Code and Datasets from the paper "Self-supervised contrastive learning for volcanic unrest detection from InSAR data" You can download the pretrained

Bountos Nikos 3 May 07, 2022
a project for 3D multi-object tracking

a project for 3D multi-object tracking

155 Jan 04, 2023
《Deep Single Portrait Image Relighting》(ICCV 2019)

Ratio Image Based Rendering for Deep Single-Image Portrait Relighting [Project Page] This is part of the Deep Portrait Relighting project. If you find

62 Dec 21, 2022
PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

Wenwen Yu 498 Dec 24, 2022
A PyTorch Implementation of Gated Graph Sequence Neural Networks (GGNN)

A PyTorch Implementation of GGNN This is a PyTorch implementation of the Gated Graph Sequence Neural Networks (GGNN) as described in the paper Gated G

Ching-Yao Chuang 427 Dec 13, 2022
face2comics by Sxela (Alex Spirin) - face2comics datasets

This is a paired face to comics dataset, which can be used to train pix2pix or similar networks.

Alex 164 Nov 13, 2022
A clean and robust Pytorch implementation of PPO on continuous action space.

PPO-Continuous-Pytorch I found the current implementation of PPO on continuous action space is whether somewhat complicated or not stable. And this is

XinJingHao 56 Dec 16, 2022
A note taker for NVDA. Allows the user to create, edit, view, manage and export notes to different formats.

Quick Notetaker add-on for NVDA The Quick Notetaker add-on is a wonderful tool which allows writing notes quickly and easily anytime and from any app

5 Dec 06, 2022
Place holder for HOPE: a human-centric and task-oriented MT evaluation framework using professional post-editing

HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professional Post-Editing Towards More Effective MT Evaluation Place holder for dat

Lifeng Han 1 Apr 25, 2022
HSC4D: Human-centered 4D Scene Capture in Large-scale Indoor-outdoor Space Using Wearable IMUs and LiDAR. CVPR 2022

HSC4D: Human-centered 4D Scene Capture in Large-scale Indoor-outdoor Space Using Wearable IMUs and LiDAR. CVPR 2022 [Project page | Video] Getting sta

51 Nov 29, 2022