Repo for the paper Extrapolating from a Single Image to a Thousand Classes using Distillation

Overview

Extrapolating from a Single Image to a Thousand Classes using Distillation

by Yuki M. Asano* and Aaqib Saeed* (*Equal Contribution)

Our-method

Extrapolating from one image. Strongly augmented patches from a single image are used to train a student (S) to distinguish semantic classes, such as those in ImageNet. The student neural network is initialized randomly and learns from a pretrained teacher (T) via KL-divergence. Although almost none of target categories are present in the image, we find student performances of > 59% for classifying ImageNet's 1000 classes. In this paper, we develop this single datum learning framework and investigate it across datasets and domains.

Key contributions

  • A minimal framework for training neural networks with a single datum from scratch using distillation.
  • Extensive ablations of the proposed method, such as the dependency on the source image, the choice of augmentations and network architectures.
  • Large scale empirical evidence of neural networks' ability to extrapolate on > 13 image, video and audio datasets.
  • Qualitative insights on what and how neural networks trained with a single image learn.

Neuron visualizations

Neurons

We compare activation-maximization-based visualizations using the Lucent library. Even though the model has never seen an image of a panda, the model trained with a teacher and only single-image inputs has a good idea of how a panda looks like.

Running the experiments

Installation

In each folder cifar\in1k\video you will find a requirements.txt file. Install packages as follows:

pip3 install -r requirements.txt

1. Prepare Dataset:

To generate single image data, we refer to the data_generation folder

2. Run Experiments:

There is a main "distill.py" file for each experiment type: small-scale and large-scale images and video. Note: 2a uses tensorflow and 2b, 2c use pytorch.

2a. Run distillation experiments for CIFAR-10/100

e.g. with Animal single-image dataset as follows:

# in cifar folder:
python3 distill.py --dataset=cifar10 --image=/path/to/single_image_dataset/ \
                   --student=wrn_16_4 --teacher=wrn_40_4 

Note that we provide a pretrained teacher model for reproducibility.

2b. Run distillation experiments for ImageNet with single-image dataset as follows:

# in in1k folder:
python3 distill.py --dataset=in1k --testdir /ILSVRC12/val/ \
                   --traindir=/path/to/dataset/ --student_arch=resnet50 --teacher_arch=resnet18 

Note that teacher models are automatically downloaded from torchvision or timm.

2c. Run distillation experiments for Kinetics with single-image-created video dataset as follows:

# in video folder:
python3 distill.py --dataset=k400 --traindir=/dataset/with/vids --test_data_path /path/to/k400/val 

Note that teacher models are automatically downloaded from torchvideo when you distill a K400 model.

Pretrained models

Large-scale (224x224-sized) image ResNet-50 models trained for 200ep:

Dataset Teacher Student Performance Checkpoint
ImageNet-12 R18 R50 59.1% R50 weights
ImageNet-12 R50 R50 53.5% R50 weights
Places365 R18 R50 54.7% R50 weights
Flowers101 R18 R50 58.1% R50 weights
Pets37 R18 R50 83.7% R50 weights
IN100 R18 R50 74.1% R50 weights
STL-10 R18 R50 93.0% R50 weights

Video x3d_s_e (expanded) models (160x160 crop, 4frames) trained for 400ep:

Dataset Teacher Student Performance Checkpoint
K400 x3d_xs x3d_xs_e 53.57% weights
UCF101 x3d_xs x3d_xs_e 77.32% weights

Citation

@inproceedings{asano2021extrapolating,
  title={Extrapolating from a Single Image to a Thousand Classes using Distillation},
  author={Asano, Yuki M. and Saeed, Aaqib},
  journal={arXiv preprint arXiv:2112.00725},
  year={2021}
}
Owner
Yuki M. Asano
I'm an Computer Vision researcher at the University of Amsterdam. Did my PhD at the Visual Geometry Group in Oxford.
Yuki M. Asano
This repository contains code released by Google Research.

This repository contains code released by Google Research.

Google Research 26.6k Dec 31, 2022
Implementation of self-attention mechanisms for general purpose. Focused on computer vision modules. Ongoing repository.

Self-attention building blocks for computer vision applications in PyTorch Implementation of self attention mechanisms for computer vision in PyTorch

AI Summer 962 Dec 23, 2022
A CNN model to detect hand gestures.

Software Used python - programming language used, tested on v3.8 miniconda - for managing virtual environment Libraries Used opencv - pip install open

Shivanshu 6 Jul 14, 2022
Keep CALM and Improve Visual Feature Attribution

Keep CALM and Improve Visual Feature Attribution Jae Myung Kim1*, Junsuk Choe1*, Zeynep Akata2, Seong Joon Oh1† * Equal contribution † Corresponding a

NAVER AI 90 Dec 07, 2022
Code for "Learning Graph Cellular Automata"

Learning Graph Cellular Automata This code implements the experiments from the NeurIPS 2021 paper: "Learning Graph Cellular Automata" Daniele Grattaro

Daniele Grattarola 37 Oct 26, 2022
Certifiable Outlier-Robust Geometric Perception

Certifiable Outlier-Robust Geometric Perception About This repository holds the implementation for certifiably solving outlier-robust geometric percep

83 Dec 31, 2022
【Arxiv】Exploring Separable Attention for Multi-Contrast MR Image Super-Resolution

SANet Exploring Separable Attention for Multi-Contrast MR Image Super-Resolution Dependencies numpy==1.18.5 scikit_image==0.16.2 torchvision==0.8.1 to

36 Jan 05, 2023
Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation

DynaBOA Code repositoty for the paper: Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation Shanyan Guan, Jingwei Xu, Michell

197 Jan 07, 2023
I decide to sync up this repo and self-critical.pytorch. (The old master is in old master branch for archive)

An Image Captioning codebase This is a codebase for image captioning research. It supports: Self critical training from Self-critical Sequence Trainin

Ruotian(RT) Luo 1.3k Dec 31, 2022
Code and dataset for AAAI 2021 paper FixMyPose: Pose Correctional Describing and Retrieval Hyounghun Kim, Abhay Zala, Graham Burri, Mohit Bansal.

FixMyPose / फिक्समाइपोज़ Code and dataset for AAAI 2021 paper "FixMyPose: Pose Correctional Describing and Retrieval" Hyounghun Kim*, Abhay Zala*, Grah

4 Sep 19, 2022
Civsim is a basic civilisation simulation and modelling system built in Python 3.8.

Civsim Introduction Civsim is a basic civilisation simulation and modelling system built in Python 3.8. It requires the following packages: perlin_noi

17 Aug 08, 2022
Repo for code associated with Modeling the Mitral Valve.

Project Title Mitral Valve Getting Started Repo for code associated with Modeling the Mitral Valve. See https://arxiv.org/abs/1902.00018 for preprint,

Alex Kaiser 1 May 17, 2022
PointCloud Annotation Tools, support to label object bound box, ground, lane and kerb

PointCloud Annotation Tools, support to label object bound box, ground, lane and kerb

halo 368 Dec 06, 2022
Official codebase for ICLR oral paper Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling

CLIORA This is the official codebase for ICLR oral paper: Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling. We introduce

Bo Wan 32 Dec 23, 2022
SSD-based Object Detection in PyTorch

SSD-based Object Detection in PyTorch 서강대학교 현대모비스 SW 프로그램에서 진행한 인공지능 프로젝트입니다. Jetson nano를 이용해 pre-trained network를 fine tuning시켜 차량 및 신호등 인식을 구현하였습니다

Haneul Kim 1 Nov 16, 2021
State-to-Distribution (STD) Model

State-to-Distribution (STD) Model In this repository we provide exemplary code on how to construct and evaluate a state-to-distribution (STD) model fo

<a href=[email protected]"> 2 Apr 07, 2022
使用OpenCV部署全景驾驶感知网络YOLOP,可同时处理交通目标检测、可驾驶区域分割、车道线检测,三项视觉感知任务,包含C++和Python两种版本的程序实现。本套程序只依赖opencv库就可以运行, 从而彻底摆脱对任何深度学习框架的依赖。

YOLOP-opencv-dnn 使用OpenCV部署全景驾驶感知网络YOLOP,可同时处理交通目标检测、可驾驶区域分割、车道线检测,三项视觉感知任务,依然是包含C++和Python两种版本的程序实现 onnx文件从百度云盘下载,链接:https://pan.baidu.com/s/1A_9cldU

178 Jan 07, 2023
Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation

SUO-SLAM This repository hosts the code for our CVPR 2022 paper "Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation". ArXiv li

Robot Perception & Navigation Group (RPNG) 97 Jan 03, 2023
Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21)

Baleen Baleen is a state-of-the-art model for multi-hop reasoning, enabling scalable multi-hop search over massive collections for knowledge-intensive

Stanford Future Data Systems 22 Dec 05, 2022
Cognate Detection Repository

Cognate Detection Repository Details This repository contains the data for two publications: Challenge Dataset of Cognates and False Friend Pairs from

Diptesh Kanojia 1 Apr 26, 2022