Alignment Attention Fusion framework for Few-Shot Object Detection

Last update: Dec 16, 2022

Overview

AAF framework

Framework generalities

This repository contains the code of the AAF framework proposed in this paper. The framework is designed to be flexible enough to implement various attention mechanisms for Few-Shot Object Detection. It is composed of three modules, Spatial Alignment, Global Attention and Fusion Layer, which are applied successively to combine features from query and support images.

The inputs of the framework are:

query_features List[Tensor(B, C, H, W)]: query features at different levels. For each level, the features have shape Batch x Channels x Height x Width.
support_features List[Tensor(N, C, H', W')]: support features at different levels. The first dimension corresponds to the number of support images, grouped by class: N = N_WAY * K_SHOT.
support_targets List[BoxList]: bounding boxes for the objects in each support image.
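
As a rough illustration of the N = N_WAY * K_SHOT relation, the sketch below splits a flat list of support items into per-class groups. It uses plain Python stand-ins instead of tensors, and it assumes a class-major ordering (all shots of class 0 first, then class 1, and so on), which the text above does not state explicitly. The helper `group_by_class` and the values of `N_WAY` and `K_SHOT` are hypothetical.

```python
# Hypothetical values, chosen only for illustration.
N_WAY, K_SHOT = 3, 2
N = N_WAY * K_SHOT  # total number of support images


def group_by_class(support, n_way, k_shot):
    """Split a flat list of N support items into n_way groups of k_shot items.

    Assumes class-major ordering; this is an assumption, not a documented fact.
    """
    if len(support) != n_way * k_shot:
        raise ValueError("expected n_way * k_shot support items")
    return [support[c * k_shot:(c + 1) * k_shot] for c in range(n_way)]


# Stand-ins for per-image feature maps: (class index, shot index) pairs.
support = [(c, k) for c in range(N_WAY) for k in range(K_SHOT)]
groups = group_by_class(support, N_WAY, K_SHOT)
```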

The framework can be configured using a separate config file. Examples of such files are available under /config_files/aaf_framework/. The structure of these files is simple:

ALIGN_FIRST: #True/False Run Alignment before Attention when True
OUT_CH: # Number of features output by the fusion layer
ALIGNMENT:
    MODE: # Name of the alignment module selected
ATTENTION:
    MODE: # Name of the attention module selected
FUSION:
    MODE: # Name of the fusion module selected
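
The effect of ALIGN_FIRST can be pictured with a minimal sketch. The function `run_aaf` and the toy modules below are hypothetical and are not the repository's classes. The sketch also assumes that attention runs before alignment when the flag is False, and that each module returns an updated (query, support) pair.

```python
def run_aaf(query, support, align, attend, fuse, align_first=True):
    """Apply alignment and attention in the configured order, then fuse."""
    steps = (align, attend) if align_first else (attend, align)
    for step in steps:
        query, support = step(query, support)
    return fuse(query, support)


def trace_order(align_first):
    """Run the pipeline with toy modules that only record the call order."""
    calls = []

    def make(name):
        def module(query, support):
            calls.append(name)
            return query, support
        return module

    def fuse(query, support):
        calls.append("fusion")
        return query

    run_aaf("q", "s", make("alignment"), make("attention"), fuse, align_first)
    return calls


print(trace_order(True))   # ['alignment', 'attention', 'fusion']
print(trace_order(False))  # ['attention', 'alignment', 'fusion']
```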

File name	Method	Alignment	Attention	Fusion
`identity.yaml`	Identity	IDENTITY	IDENTITY	IDENTITY
`feature_reweighting.yaml`	FSOD via feature reweighting	IDENTITY	REWEIGHTING_BATCH	IDENTITY
`meta_faster_rcnn.yaml`	Meta Faster-RCNN	SIMILARITY_ALIGN	META_FASTER	META_FASTER
`self_adapt.yaml`	Self-adaptive attention for FSOD	IDENTITY_NO_REPEAT	GRU	IDENTITY
`dynamic.yaml`	Dynamic relevance learning	IDENTITY	INTERPOLATE	DYNAMIC_R
`dana.yaml`	Dual Awareness Attention for FSOD	CISA	BGA	HADAMARD
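
For quick reference, the table above can be written as a small lookup. This is only a transcription of the table into Python; the names `AAF_CONFIGS` and `modules_for` are hypothetical, and ALIGN_FIRST and OUT_CH are left out because the table does not list them.

```python
# (alignment, attention, fusion) modes per example config file, from the table.
AAF_CONFIGS = {
    "identity.yaml": ("IDENTITY", "IDENTITY", "IDENTITY"),
    "feature_reweighting.yaml": ("IDENTITY", "REWEIGHTING_BATCH", "IDENTITY"),
    "meta_faster_rcnn.yaml": ("SIMILARITY_ALIGN", "META_FASTER", "META_FASTER"),
    "self_adapt.yaml": ("IDENTITY_NO_REPEAT", "GRU", "IDENTITY"),
    "dynamic.yaml": ("IDENTITY", "INTERPOLATE", "DYNAMIC_R"),
    "dana.yaml": ("CISA", "BGA", "HADAMARD"),
}


def modules_for(file_name):
    """Return the module modes of one example file, laid out like the config."""
    alignment, attention, fusion = AAF_CONFIGS[file_name]
    return {
        "ALIGNMENT": {"MODE": alignment},
        "ATTENTION": {"MODE": attention},
        "FUSION": {"MODE": fusion},
    }
```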

The path to the AAF config file should be specified inside the master config file (i.e. for the whole network) under FEWSHOT.AAF.CFG.

For each module, the classes implementing the available choices are grouped in a single file: /modelling/aaf/alignment.py, /modelling/aaf/attention.py and /modelling/aaf/fusion.py.

Spatial Alignment

Spatial Alignment spatially reorganizes the features of one feature map to match those of another. The idea is to align similar features in both maps so that they are easier to compare.

Name	Description
IDENTITY	Repeats the feature to match BNCHW and NBCHW dimensions
IDENTITY_NO_REPEAT	Identity without repetition
SIMILARITY_ALIGN	Compute similarity matrix between support and query and align support to query accordingly.
CISA	CISA block from this method
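
To make the idea of similarity-based alignment concrete, here is a minimal pure-Python sketch. It is not the repository's SIMILARITY_ALIGN implementation. It assumes dot-product similarity and a softmax over support positions, and it treats each feature map as a flat list of position vectors. The function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_align(query, support):
    """Align support to query.

    query: P position vectors, support: P' position vectors, all of length C.
    Returns P vectors: for each query position, a similarity-weighted
    average of the support vectors.
    """
    aligned = []
    for q in query:
        sims = [sum(a * b for a, b in zip(q, s)) for s in support]
        weights = softmax(sims)
        aligned.append(
            [sum(w * s[c] for w, s in zip(weights, support)) for c in range(len(q))]
        )
    return aligned


query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 positions, C = 2
support = [[5.0, 0.0], [0.0, 5.0]]            # 2 positions, C = 2
aligned = similarity_align(query, support)    # 3 positions, C = 2
```

After alignment, the support map has as many positions as the query map, so the two can be compared or fused position by position.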

Global Attention

Global Attention highlights some features of a map according to an attention vector computed globally on another one. The idea is to leverage global, and hopefully semantic, information.

Name	Description
IDENTITY	Simply pass features to next modules.
REWEIGHTING	Reweights query features using globally pooled vectors from support.
REWEIGHTING_BATCH	Same as above but support examples are the same for the whole batch.
SELF_ATTENTION	Same as above but attention vectors are computed from the alignment matrix between query and support.
BGA	BGA blocks from this method
META_FASTER	Attention block from this method
POOLING	Pools query and support features to the same size.
INTERPOLATE	Upsamples support features to match query size.
GRU	Computes attention vectors through a graph representation using a GRU.
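
The reweighting idea can be sketched in a few lines of plain Python. This is an illustration and not the repository's REWEIGHTING module: it assumes global average pooling and a single support map, and it uses nested lists of shape C x H x W instead of tensors. The function names are hypothetical.

```python
def global_average_pool(feature_map):
    """feature_map: C x H x W nested lists -> list of C channel means."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def reweight(query, support):
    """Scale each query channel by the pooled value of the same support channel."""
    weights = global_average_pool(support)
    return [
        [[w * value for value in row] for row in channel]
        for w, channel in zip(weights, query)
    ]


query = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]  # C=2, H=2, W=2
support = [[[2.0, 2.0, 2.0]], [[0.0, 1.0, 2.0]]]              # C=2, H'=1, W'=3
reweighted = reweight(query, support)
```

Because the support map is pooled to one value per channel, the query and support maps do not need matching spatial sizes here.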

Fusion Layer

The Fusion Layer directly combines the features from support and query. Both maps must have the same dimensions for point-wise operations, so fusion is often used together with alignment.

Name	Description
IDENTITY	Returns only adapted query features.
ADD	Point-wise sum between query and support features.
HADAMARD	Point-wise multiplication between query and support features.
SUBSTRACT	Point-wise subtraction between query and support features.
CONCAT	Channel concatenation of query and support features.
META_FASTER	Fusion layer from this method
DYNAMIC_R	Fusion layer from this method
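
The simple fusion modes in the table can be illustrated as follows. This is a sketch on nested lists of shape C x H x W and not the code in /modelling/aaf/fusion.py. The operand order for SUBSTRACT (query minus support) is an assumption, and the names `FUSION_OPS` and `fuse` are hypothetical.

```python
# Point-wise operations; the SUBSTRACT operand order is an assumption.
FUSION_OPS = {
    "ADD": lambda q, s: q + s,
    "HADAMARD": lambda q, s: q * s,
    "SUBSTRACT": lambda q, s: q - s,
}


def fuse(mode, query, support):
    """Fuse two C x H x W nested lists according to the selected mode."""
    if mode == "IDENTITY":
        return query
    if mode == "CONCAT":
        # Concatenating the outer lists stacks the maps along the channel axis.
        return query + support
    op = FUSION_OPS[mode]
    return [
        [[op(a, b) for a, b in zip(q_row, s_row)] for q_row, s_row in zip(q_ch, s_ch)]
        for q_ch, s_ch in zip(query, support)
    ]


q = [[[1.0, 2.0]]]  # C=1, H=1, W=2
s = [[[3.0, 4.0]]]
print(fuse("ADD", q, s))       # [[[4.0, 6.0]]]
print(fuse("HADAMARD", q, s))  # [[[3.0, 8.0]]]
```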

Training and evaluation

Training and evaluation scripts are available.

TODO: Give code snippet to run training with a specified config file (modify main) Basically create 2 scripts train.py and eval.py with arg config file.

DataHandler

Explain DataHandler class a bit.

Installation

The dependencies used for this project can be installed with conda create --name <env> --file requirements.txt. Please note that not all of these requirements are necessary; the list will be updated soon.

FCOS must be installed from source. Depending on the versions of the Python packages you use, some issues may appear after installation:

cpu/vision.h file not found: replace all occurrences in the FCOS source with vision.h (see this issue).
Error related to AT_CHECK with PyTorch > 1.5: replace all occurrences with TORCH_CHECK (see this issue).
Error related to torch._six.PY36: replace all occurrences of PY36 with PY37.
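
The three replacements above could be applied in bulk with a small script such as the one below. This is a hedged sketch: the set of file suffixes is an assumption about where the strings occur, the function name `patch_sources` is hypothetical, and the replacements are plain substring substitutions. It is safer to run it on a copy of the FCOS sources and to apply only the replacements matching the errors you actually see.

```python
from pathlib import Path

# Substitutions taken from the list above.
REPLACEMENTS = {
    "cpu/vision.h": "vision.h",
    "AT_CHECK": "TORCH_CHECK",
    "PY36": "PY37",
}
# Assumption: only these file types need patching.
SUFFIXES = {".h", ".cpp", ".cu", ".py"}


def patch_sources(root, replacements=REPLACEMENTS, suffixes=SUFFIXES):
    """Apply the substitutions to every matching file under root.

    Returns the list of files that were modified.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not valid UTF-8 text
        patched = text
        for old, new in replacements.items():
            patched = patched.replace(old, new)
        if patched != text:
            path.write_text(patched, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `patch_sources("path/to/FCOS")` (a placeholder path) rewrites the affected files in place; running it a second time changes nothing.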

Results

Results on pascal VOC, COCO and DOTA.

Owner

Pierre Le Jeune
