PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17

Last update: Dec 11, 2022

Overview

2017 VQA Challenge Winner (CVPR'17 Workshop)

PyTorch implementation of Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge by Teney et al.

Prerequisites

python 3.6+
numpy
pytorch 0.4
tqdm
nltk
pandas
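
Before running the scripts, it can help to confirm that the packages above are importable. The following is a minimal sketch, not part of this repo: the helper name `missing_packages` is hypothetical, and the import names are assumed to match the package names, except that pytorch is imported as `torch`. It does not check installed versions.

```python
import importlib.util
import sys

# Import names assumed for the prerequisites listed above.
# Note that pytorch is imported as "torch".
REQUIRED = ["numpy", "torch", "tqdm", "nltk", "pandas"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 6):
        print("python 3.6+ is required")
    missing = missing_packages()
    print("missing packages:", ", ".join(missing) if missing else "none")
```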

Data

Preparation

To download and extract VQA v2, GloVe, and the pretrained visual features:
```
bash scripts/download_extract.sh
```
To prepare data for training:
```
python scripts/preproc.py
```

The structure of the data/ directory should look like this:

- data/
  - zips/
    - v2_XXX...zip
    - ...
    - glove...zip
    - trainval_36.zip
  - glove/
    - glove...txt
    - ...
  - v2_XXX.json
  - ...
  - trainval_resnet...tsv
  (The above are files created after executing scripts/download_extract.sh)
  - tokenizers/
    - ...
  - dict_ans.pkl
  - dict_q.pkl
  - glove_pretrained_300.npy
  - train_qa.pkl
  - val_qa.pkl
  - train_vfeats.pkl
  - val_vfeats.pkl
  (The above are files created after executing scripts/preproc.py)
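
To check that preprocessing finished, one option is to compare data/ against the file names in the listing above. This is a minimal sketch under that assumption: the helper `missing_preproc_outputs` is hypothetical and not part of the repo, and it only tests that each entry exists, not that its contents are valid.

```python
from pathlib import Path

# Names taken from the listing above (outputs of scripts/preproc.py).
# "tokenizers" is a directory; the rest are files.
PREPROC_OUTPUTS = [
    "tokenizers",
    "dict_ans.pkl",
    "dict_q.pkl",
    "glove_pretrained_300.npy",
    "train_qa.pkl",
    "val_qa.pkl",
    "train_vfeats.pkl",
    "val_vfeats.pkl",
]


def missing_preproc_outputs(data_dir="data"):
    """Return the expected preprocessing outputs that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in PREPROC_OUTPUTS if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_preproc_outputs()
    if missing:
        print("missing from data/:", ", ".join(missing))
    else:
        print("all preprocessing outputs found")
```

If anything is reported missing, rerunning scripts/preproc.py is the likely next step.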

Train

Use default parameters:

bash scripts/train.sh

Notes

Major refactor (especially of the data preprocessing), tested with pytorch 0.4.1 and python 3.6.
Training for 20 epochs reaches around 50% training accuracy (the model seems buggy in my implementation).
After all the preprocessing, the data/ directory may take up 38G or more.
Parts of preproc.py and utils.py are based on this repo.
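
Given the disk usage noted above, a quick free-space check before downloading may save a failed run. This is a minimal sketch, not part of the repo: the helper `has_enough_space` is hypothetical, and the default threshold assumes "38G" means roughly 38 * 10^9 bytes.

```python
import shutil


def has_enough_space(path=".", required_bytes=38 * 10**9):
    """Return True if the filesystem holding `path` has at least required_bytes free.

    The default threshold is an assumption based on the 38G figure in the notes.
    """
    free = shutil.disk_usage(path).free
    return free >= required_bytes


if __name__ == "__main__":
    if has_enough_space("."):
        print("enough free space for data/")
    else:
        print("less than ~38 GB free; preprocessing may run out of disk")
```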

Owner

Mark Dong
