PyTorch implementation for the paper Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime

Last update: Nov 08, 2022

Overview

Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime

Created by Prarthana Bhattacharyya.

Disclaimer: This is not an official product and is meant to be a proof-of-concept and for academic/educational use only.

This repository contains the PyTorch implementation for the paper Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime, to be presented at ICASSP-2022.

Self-supervision has shown outstanding results for natural language processing and, more recently, for image recognition. Simultaneously, vision transformers and their variants have emerged as a promising and scalable alternative to convolutions on various computer vision tasks. In this paper, we are the first to ask whether self-supervised vision transformers (SSL-ViTs) can be adapted to two important computer vision tasks in the low-label, high-data regime: few-shot image classification and zero-shot image retrieval. The motivation is to reduce the number of manual annotations required to train a visual embedder, and to produce generalizable, semantically meaningful and robust embeddings.

Results

SSL-ViT + few-shot image classification:

Qualitative analysis for base-classes chosen by supervised CNN and SSL-ViT for few-shot distribution calibration:

SSL-ViT + zero-shot image retrieval:

Pretraining Self-Supervised ViT

Run DINO with ViT-small network on a single node with 4 GPUs for 100 epochs with the following command.

cd dino/

python -m torch.distributed.launch --nproc_per_node=4 main_dino.py --arch vit_small --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir

For mini-ImageNet pretraining, we use the classes listed in ssl-vit-fewshot/data/ImageNetSSLTrainingSplit_mini.txt.
For tiered-ImageNet pretraining, we use the classes listed in ssl-vit-fewshot/data/ImageNetSSLTrainingSplit_tiered.txt.
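
One way to restrict pretraining to these classes is to build a directory that contains only the listed class folders and pass it as --data_path. The sketch below assumes the split file lists one class folder name per line (for example a WordNet ID) and that the ImageNet train directory has one sub-folder per class; neither is confirmed by this repository, and read_class_list and link_class_subset are hypothetical helper names.

```python
import os
from pathlib import Path


def read_class_list(split_file):
    """Return the class names in split_file, one per non-empty line (assumed format)."""
    with open(split_file, "r", encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def link_class_subset(imagenet_train_dir, split_file, output_dir):
    """Symlink the listed class folders from imagenet_train_dir into output_dir.

    Returns (linked, missing): classes that were linked, and classes named in
    the split file that have no folder in imagenet_train_dir.
    """
    source_root = Path(imagenet_train_dir).resolve()
    target_root = Path(output_dir)
    target_root.mkdir(parents=True, exist_ok=True)

    linked, missing = [], []
    for class_name in read_class_list(split_file):
        source = source_root / class_name
        if not source.is_dir():
            missing.append(class_name)
            continue
        target = target_root / class_name
        if not target.exists():
            # Symlinks avoid copying the images; the subset stays in sync with the source.
            os.symlink(source, target, target_is_directory=True)
        linked.append(class_name)
    return linked, missing
```

The resulting output_dir could then be used in place of /path/to/imagenet/train in the DINO command above. Checking the returned missing list is worthwhile, since an empty subset would otherwise go unnoticed.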
For CUB-200, Cars-196 and SOP, we use the pretrained model from:

import torch
vits16 = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')

Visual Representation Learning with Self-Supervised ViT for Low-Label High-Data Regime

Dataset Preparation

Please follow the instructions in FRN (for few-shot image classification) and RevisitDML (for zero-shot image retrieval) to download the datasets, and place them in the ssl-vit-fewshot/data and DIML/data folders, respectively.

Training and Evaluation for few-shot image classification

The first step is to extract features for the base and novel classes with the pretrained SSL-ViT, using get_dino_miniimagenet_feats.ipynb.
Change the hyper-parameter data_path to use CUB or tiered-ImageNet.
The SSL-ViT checkpoints for each dataset are provided below (note: these models were trained without labels). We also provide the extracted features, which must be stored in ssl-vit-fewshot/dino_features_data/.

arch	dataset	download	extracted-train	extracted-test
ViT-S/16	mini-ImageNet	mini_imagenet_checkpoint.pth	train.p	test.p
ViT-S/16	tiered-ImageNet	tiered_imagenet_checkpoint.pth	train.p	test.p
ViT-S/16	CUB	cub_checkpoint.pth	train.p	test.p

For n-way-k-shot evaluation, we provide miniimagenet_evaluate_dinoDC.ipynb.
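
For readers unfamiliar with the protocol, the following is a minimal sketch of n-way k-shot episodic evaluation on pre-extracted features. It is not the notebook's method: it swaps in a plain nearest-centroid classifier instead of distribution calibration, and it assumes the features are available as a dictionary mapping each class name to a list of feature vectors. All function names and default values are hypothetical.

```python
import math
import random
import statistics


def sample_episode(features_by_class, n_way, k_shot, n_query, rng):
    """Sample one episode; returns (support, query) lists of (feature, label) pairs."""
    classes = rng.sample(sorted(features_by_class), n_way)
    support, query = [], []
    for label, class_name in enumerate(classes):
        picked = rng.sample(features_by_class[class_name], k_shot + n_query)
        support += [(feat, label) for feat in picked[:k_shot]]
        query += [(feat, label) for feat in picked[k_shot:]]
    return support, query


def class_centroids(support, n_way):
    """Average the support features of each class."""
    dim = len(support[0][0])
    sums = [[0.0] * dim for _ in range(n_way)]
    counts = [0] * n_way
    for feat, label in support:
        counts[label] += 1
        for i, value in enumerate(feat):
            sums[label][i] += value
    return [[v / counts[c] for v in sums[c]] for c in range(n_way)]


def episode_accuracy(support, query, n_way):
    """Classify each query by its nearest class centroid (Euclidean distance)."""
    centroids = class_centroids(support, n_way)
    correct = 0
    for feat, label in query:
        distances = [math.dist(feat, centroid) for centroid in centroids]
        if distances.index(min(distances)) == label:
            correct += 1
    return correct / len(query)


def evaluate(features_by_class, n_way=5, k_shot=1, n_query=15, n_episodes=100, seed=0):
    """Return (mean accuracy, 95% confidence half-width) over random episodes."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_episodes):
        support, query = sample_episode(features_by_class, n_way, k_shot, n_query, rng)
        accuracies.append(episode_accuracy(support, query, n_way))
    mean = sum(accuracies) / len(accuracies)
    if len(accuracies) > 1:
        half_width = 1.96 * statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    else:
        half_width = 0.0
    return mean, half_width
```

Each class needs at least k_shot + n_query feature vectors, otherwise the sampling step raises a ValueError. Reporting the mean together with a confidence interval over many episodes is the usual convention for few-shot results.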

Training and Evaluation for zero-shot image retrieval

To train the baseline CNN models, run the scripts in DIML/scripts/baselines. The checkpoints are saved in the Training_Results folder. For example:

cd DIML/
CUDA_VISIBLE_DEVICES=0 ./scripts/baselines/cub_runs.sh

To train the supervised ViT and self-supervised ViT:

cp -r ssl-vit-retrieval/architectures/* DIML/ssl-vit-retrieval/architectures/

CUDA_VISIBLE_DEVICES=0 ./scripts/baselines/cub_runs.sh --arch vits
CUDA_VISIBLE_DEVICES=0 ./scripts/baselines/cub_runs.sh --arch dino

To test the models, first edit the checkpoint paths in test_diml.py, then run:

CUDA_VISIBLE_DEVICES=0 ./scripts/diml/test_diml.sh cub200
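
Retrieval benchmarks of this kind are commonly scored with Recall@K. Whether test_diml.py reports exactly this metric is an assumption here, so the sketch below only illustrates the idea: every test image is used as a query, its nearest neighbours among the remaining images are found by cosine similarity, and the query counts as a hit if any of the top K shares its label. The function names are hypothetical, and the pure-Python loops are meant for clarity rather than speed.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def recall_at_k(embeddings, labels, k=1):
    """Fraction of queries whose k nearest neighbours contain a same-label item.

    Neighbours are ranked by cosine similarity and the query itself is excluded.
    """
    normalized = [l2_normalize(e) for e in embeddings]
    hits = 0
    for i, query in enumerate(normalized):
        scored = [
            (sum(a * b for a, b in zip(query, other)), j)
            for j, other in enumerate(normalized)
            if j != i
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top_labels = [labels[j] for _, j in scored[:k]]
        if labels[i] in top_labels:
            hits += 1
    return hits / len(normalized)
```

In the zero-shot setting the test classes are disjoint from the training classes, so the labels are used only for scoring and never for fitting the embedder.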

dataset	Loss	SSL-ViT-download
CUB	Margin	cub_ssl-vit-margin.pth
CUB	Proxy-NCA	cub_ssl-vit-proxynca.pth
CUB	Multi-Similarity	cub_ssl-vit-ms.pth
Cars-196	Margin	cars_ssl-vit-margin.pth
Cars-196	Proxy-NCA	cars_ssl-vit-proxynca.pth
Cars-196	Multi-Similarity	cars_ssl-vit-ms.pth

Acknowledgement

The code is based on:
