Official code for: A Probabilistic Hard Attention Model For Sequentially Observed Scenes

Last update: Nov 19, 2022

Overview

"A Probabilistic Hard Attention Model For Sequentially Observed Scenes"

Authors: Samrudhdhi Rangrej, James Clark

Accepted to: BMVC'21

A recurrent attention model sequentially observes glimpses from an image and predicts a class label. At time t, the model actively observes a glimpse g_t and its coordinates l_t. Given g_t and l_t, the feed-forward module F extracts features f_t, and the recurrent module R updates the hidden state to h_t. Using the updated hidden state h_t, the linear classifier C predicts the class distribution p(y|h_t).

At time t+1, the model assesses various candidate locations l before attending to the optimal one. It predicts p(y|g,l,h_t) ahead of time and selects the candidate l that maximizes KL[p(y|g,l,h_t)||p(y|h_t)]. To approximate p(y|g,l,h_t) without attending to the glimpse g, the model synthesizes the features of g using a Partial VAE. The normalizing flow-based encoder S predicts the approximate posterior q(z|h_t). The decoder D uses a sample z~q(z|h_t) to synthesize a feature map f^~ containing the features of all glimpses. The model uses f^~(l) as the features of a glimpse at location l and evaluates p(y|g,l,h_t)=p(y|f^~(l),h_t). Dashed arrows show the path used to compute the lookahead class distribution p(y|f^~(l),h_t).
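The location-selection rule described above can be illustrated with a small, self-contained sketch. This is not the repository's implementation: the function names (kl_divergence, select_next_location) and the toy probabilities are assumptions made for illustration, and only one lookahead distribution per candidate location is scored here. It only shows the idea of picking the location l whose lookahead distribution p(y|f^~(l),h_t) diverges most from the current prediction p(y|h_t).

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL[p || q] for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def select_next_location(lookahead, current):
    """Pick the candidate location with the largest KL from the current prediction.

    lookahead: dict mapping a location l to a lookahead distribution p(y | f~(l), h_t)
    current:   the current class distribution p(y | h_t)
    Both names are hypothetical; they are not taken from main.py.
    """
    scores = {loc: kl_divergence(p, current) for loc, p in lookahead.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy two-class example with three candidate locations (made-up numbers).
current = [0.5, 0.5]
lookahead = {
    (0, 0): [0.5, 0.5],  # looking here is predicted to change nothing
    (0, 1): [0.9, 0.1],  # looking here is predicted to be most informative
    (1, 0): [0.6, 0.4],
}
best, scores = select_next_location(lookahead, current)
print(best)  # (0, 1)
```

In the toy example, the location whose predicted distribution moves furthest from the uniform prediction is selected, which matches the argmax-KL rule in the text.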

Requirements:

torch==1.8.1, torchvision==0.9.1, tensorboard==2.5.0, fire==0.4.0

Datasets:

SVHN (Let PyTorch download this dataset)
CIFAR-10 (Let PyTorch download this dataset)
CIFAR-100 (Let PyTorch download this dataset)
CINIC-10 (download from: https://datashare.is.ed.ac.uk/bitstream/handle/10283/3192/CINIC-10.tar.gz, visit https://github.com/BayesWatch/cinic-10)
TinyImageNet (download from: http://cs231n.stanford.edu/tiny-imagenet-200.zip)

Training a model

Use main.py to train and evaluate the model.

Arguments

dataset: one of 'svhn', 'cifar10', 'cifar100', 'cinic10', 'tinyimagenet'
datapath: path to the downloaded datasets
lr: learning rate
training_phase: one of 'first', 'second', 'third'
ccebal: coefficient for cross entropy loss
batch: batch-size for training
batchv: batch-size for evaluation
T: maximum time-step
logfolder: path to log directory
epochs: number of training epochs
pretrain_checkpoint: checkpoint for pretrained model from previous training phase

Example commands to train the model on the SVHN dataset are as follows.

Training Stage one

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='first' \
    --ccebal=1 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_first' \
    --epochs=1000 \
    --pretrain_checkpoint=None

Training Stage two

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='second' \
    --ccebal=0 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_second' \
    --epochs=100 \
    --pretrain_checkpoint='./svhn_log_first/weights_f_1000.pth'

Training Stage three

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='third' \
    --ccebal=16 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_third' \
    --epochs=100 \
    --pretrain_checkpoint='./svhn_log_second/weights_f_100.pth'
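The three stages above form a chain: each stage loads the final checkpoint written by the previous one. A minimal sketch of how the commands could be generated programmatically is given below. The helper build_commands is hypothetical and not part of the repository, and the checkpoint naming pattern weights_f_<epochs>.pth is inferred from the example commands rather than confirmed from main.py. The sketch only builds and prints the commands; actually launching them (for example with subprocess.run) is left to the reader.

```python
import shlex

# Phase-specific settings copied from the SVHN example commands above.
PHASES = [
    {"training_phase": "first", "ccebal": 1, "epochs": 1000},
    {"training_phase": "second", "ccebal": 0, "epochs": 100},
    {"training_phase": "third", "ccebal": 16, "epochs": 100},
]


def build_commands(dataset="svhn", datapath="./data/", lr=0.001,
                   batch=64, batchv=64, T=7):
    """Return one argument list per training phase, chaining checkpoints."""
    commands = []
    checkpoint = None  # the first phase starts without a pretrained checkpoint
    for phase in PHASES:
        logfolder = f"./{dataset}_log_{phase['training_phase']}"
        args = {
            "dataset": dataset,
            "datapath": datapath,
            "lr": lr,
            "training_phase": phase["training_phase"],
            "ccebal": phase["ccebal"],
            "batch": batch,
            "batchv": batchv,
            "T": T,
            "logfolder": logfolder,
            "epochs": phase["epochs"],
            "pretrain_checkpoint": checkpoint,
        }
        commands.append(
            ["python3", "main.py"] + [f"--{k}={v}" for k, v in args.items()]
        )
        # Assumed naming pattern, inferred from the example commands.
        checkpoint = f"{logfolder}/weights_f_{phase['epochs']}.pth"
    return commands


cmds = build_commands()
for cmd in cmds:
    print(shlex.join(cmd))
```

Passing a different dataset name changes the log folder and checkpoint paths consistently, but the per-phase values of ccebal and epochs shown here are the SVHN ones and may not suit other datasets.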

Visualization of attention policy for a CIFAR-10 image

The top row shows the entire image and the expected information gain (EIG) maps for t=1 to 6. The bottom row shows the glimpses attended by our model. The model observes the first glimpse at a random location. Each glimpse is of size 8x8. The glimpses overlap with a stride of 4, resulting in a 7x7 grid of glimpses. The EIG maps are therefore of size 7x7 and are upsampled for display. We display the entire image for reference only; our model never observes the whole image.
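The 7x7 grid size follows from the glimpse geometry: for a 32x32 CIFAR-10 image, 8x8 glimpses with a stride of 4 give (32 - 8) / 4 + 1 = 7 positions per axis. The short sketch below enumerates those positions. The function name glimpse_grid and the top-left-corner coordinate convention are assumptions for illustration and may differ from how the repository indexes glimpses.

```python
def glimpse_grid(image_size=32, glimpse_size=8, stride=4):
    """Return a grid of (row, col) top-left corners of all glimpses."""
    n = (image_size - glimpse_size) // stride + 1
    return [[(r * stride, c * stride) for c in range(n)] for r in range(n)]


grid = glimpse_grid()
print(len(grid), len(grid[0]))  # 7 7
print(grid[0][0], grid[6][6])   # (0, 0) (24, 24)
```

The last glimpse starts at pixel 24 and spans pixels 24 to 31, so the grid covers the image exactly with no padding.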

Acknowledgement

Major parts of the neural spline flows implementation are borrowed from Karpathy's pytorch-normalizing-flows.
