Code implementation of Data Efficient Stagewise Knowledge Distillation paper.

Overview

Data Efficient Stagewise Knowledge Distillation

Stagewise Training Procedure

Table of Contents

This repository presents the code implementation for Stagewise Knowledge Distillation, a technique for improving knowledge transfer between a teacher model and student model.

Requirements

  • Install the dependencies using conda with the requirements.yml file
    conda env create -f environment.yml
    
  • Setup the stagewise-knowledge-distillation package itself
    pip install -e .
    
  • Apart from the above mentioned dependencies, it is recommended to have an Nvidia GPU (CUDA compatible) with at least 8 GB of video memory (most of the experiments will work with 6 GB also). However, the code works with CPU only machines as well.

Image Classification

Introduction

In this work, ResNet architectures are used. Particularly, we used ResNet10, 14, 18, 20 and 26 as student networks and ResNet34 as the teacher network. The datasets used are CIFAR10, Imagenette and Imagewoof. Note that Imagenette and Imagewoof are subsets of ImageNet.

Preparation

  • Before any experiments, you need to download the data and saved weights of teacher model to appropriate locations.

  • The following script

    • downloads the datasets
    • saves 10%, 20%, 30% and 40% splits of each dataset separately
    • downloads teacher model weights for all 3 datasets
    # assuming you are in the root folder of the repository
    cd image_classification/scripts
    bash setup.sh
    

Experiments

For detailed information on the various experiments, refer to the paper. In all the image classification experiments, the following common training arguments are listed with the possible values they can take:

  • dataset (-d) : imagenette, imagewoof, cifar10
  • model (-m) : resnet10, resnet14, resnet18, resnet20, resnet26, resnet34
  • number of epochs (-e) : Integer is required
  • percentage of dataset (-p) : 10, 20, 30, 40 (don't use this argument at all for full dataset experiments)
  • random seed (-s) : Give any random seed (for reproducibility purposes)
  • gpu (-g) : Don't use unless training on CPU (in which case, use -g 'cpu' as the argument). In case of multi-GPU systems, run CUDA_VISIBLE_DEVICES=id in the terminal before the experiment, where id is the ID of your GPU according to nvidia-smi output.
  • Comet ML API key (-a) (optional) : If you want to use Comet ML for tracking your experiments, then either put your API key as the argument or make it the default argument in the arguments.py file. Otherwise, no need of using this argument.
  • Comet ML workspace (-w) (optional) : If you want to use Comet ML for tracking your experiments, then either put your workspace name as the argument or make it the default argument in the arguments.py file. Otherwise, no need of using this argument.

In the following subsections, example commands for training are given for one experiment each.

No Teacher

Full Imagenette dataset, ResNet10

python3 no_teacher.py -d imagenette -m resnet10 -e 100 -s 0

Traditional KD (FitNets)

20% Imagewoof dataset, ResNet18

python3 traditional_kd.py -d imagewoof -m resnet18 -p 20 -e 100 -s 0

FSP KD

30% CIFAR10 dataset, ResNet14

python3 fsp_kd.py -d cifar10 -m resnet14 -p 30 -e 100 -s 0

Attention Transfer KD

10% Imagewoof dataset, ResNet26

python3 attention_transfer_kd.py -d imagewoof -m resnet26 -p 10 -e 100 -s 0

Hinton KD

Full CIFAR10 dataset, ResNet14

python3 hinton_kd.py -d cifar10 -m resnet14 -e 100 -s 0

Simultaneous KD (Proposed Baseline)

40% Imagenette dataset, ResNet20

python3 simultaneous_kd.py -d imagenette -m resnet20 -p 40 -e 100 -s 0

Stagewise KD (Proposed Method)

Full CIFAR10 dataset, ResNet10

python3 stagewise_kd.py -d cifar10 -m resnet10 -e 100 -s 0

Semantic Segmentation

Introduction

In this work, ResNet backbones are used to construct symmetric U-Nets for semantic segmentation. Particularly, we used ResNet10, 14, 18, 20 and 26 as the backbones for student networks and ResNet34 as the backbone for the teacher network. The dataset used is the Cambridge-driving Labeled Video Database (CamVid).

Preparation

  • The following script
    • downloads the data (and shifts it to appropriate folder)
    • saves 10%, 20%, 30% and 40% splits of each dataset separately
    • downloads the pretrained teacher weights in appropriate folder
    # assuming you are in the root folder of the repository
    cd semantic_segmentation/scripts
    bash setup.sh
    

Experiments

For detailed information on the various experiments, refer to the paper. In all the semantic segmentation experiments, the following common training arguments are listed with the possible values they can take:

  • dataset (-d) : camvid
  • model (-m) : resnet10, resnet14, resnet18, resnet20, resnet26, resnet34
  • number of epochs (-e) : Integer is required
  • percentage of dataset (-p) : 10, 20, 30, 40 (don't use this argument at all for full dataset experiments)
  • random seed (-s) : Give any random seed (for reproducibility purposes)
  • gpu (-g) : Don't use unless training on CPU (in which case, use -g 'cpu' as the argument). In case of multi-GPU systems, run CUDA_VISIBLE_DEVICES=id in the terminal before the experiment, where id is the ID of your GPU according to nvidia-smi output.
  • Comet ML API key (-a) (optional) : If you want to use Comet ML for tracking your experiments, then either put your API key as the argument or make it the default argument in the arguments.py file. Otherwise, no need of using this argument.
  • Comet ML workspace (-w) (optional) : If you want to use Comet ML for tracking your experiments, then either put your workspace name as the argument or make it the default argument in the arguments.py file. Otherwise, no need of using this argument.

Note: Currently, there are no plans for adding Attention Transfer KD and FSP KD experiments for semantic segmentation.

In the following subsections, example commands for training are given for one experiment each.

No Teacher

Full CamVid dataset, ResNet10

python3 pretrain.py -d camvid -m resnet10 -e 100 -s 0

Traditional KD (FitNets)

20% CamVid dataset, ResNet18

python3 traditional_kd.py -d camvid -m resnet18 -p 20 -e 100 -s 0

Simultaneous KD (Proposed Baseline)

40% CamVid dataset, ResNet20

python3 simultaneous_kd.py -d camvid -m resnet20 -p 40 -e 100 -s 0

Stagewise KD (Proposed Method)

10 % CamVid dataset, ResNet10

python3 stagewise_kd.py -d camvid -m resnet10 -p 10 -e 100 -s 0

Citation

If you use this code or method in your work, please cite using

@misc{kulkarni2020data,
      title={Data Efficient Stagewise Knowledge Distillation}, 
      author={Akshay Kulkarni and Navid Panchi and Sharath Chandra Raparthy and Shital Chiddarwar},
      year={2020},
      eprint={1911.06786},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Built by Akshay Kulkarni, Navid Panchi and Sharath Chandra Raparthy.

Owner
IvLabs
Robotics and AI community of VNIT
IvLabs
9th place solution

AllDataAreExt-Galixir-Kaggle-HPA-2021-Solution Team Members Qishen Ha is Master of Engineering from the University of Tokyo. Machine Learning Engineer

daishu 5 Nov 18, 2021
Model Zoo for MindSpore

Welcome to the Model Zoo for MindSpore In order to facilitate developers to enjoy the benefits of MindSpore framework, we will continue to add typical

MindSpore 226 Jan 07, 2023
Trax — Deep Learning with Clear Code and Speed

Trax — Deep Learning with Clear Code and Speed Trax is an end-to-end library for deep learning that focuses on clear code and speed. It is actively us

Google 7.3k Dec 26, 2022
KinectFusion implemented in Python with PyTorch

KinectFusion implemented in Python with PyTorch This is a lightweight Python implementation of KinectFusion. All the core functions (TSDF volume, fram

Jingwen Wang 80 Jan 03, 2023
The Rich Get Richer: Disparate Impact of Semi-Supervised Learning

The Rich Get Richer: Disparate Impact of Semi-Supervised Learning Preprocess file of the dataset used in implicit sub-populations: (Demographic groups

<a href=[email protected]"> 4 Oct 14, 2022
Multi-task head pose estimation in-the-wild

Multi-task head pose estimation in-the-wild We provide C++ code in order to replicate the head-pose experiments in our paper https://ieeexplore.ieee.o

Roberto Valle 26 Oct 06, 2022
Anomaly Detection Based on Hierarchical Clustering of Mobile Robot Data

We proposed a new approach to detect anomalies of mobile robot data. We investigate each data seperately with two clustering method hierarchical and k-means. There are two sub-method that we used for

Zekeriyya Demirci 1 Jan 09, 2022
A deep learning library that makes face recognition efficient and effective

Distributed Arcface Training in Pytorch This is a deep learning library that makes face recognition efficient, and effective, which can train tens of

Sajjad Aemmi 10 Nov 23, 2021
Cache Requests in Deta Bases and Echo them with Deta Micros

Deta Echo Cache Leverage the awesome Deta Micros and Deta Base to cache requests and echo them as needed. Stop worrying about slow public APIs or agre

Gingerbreadfork 8 Dec 07, 2021
This git repo contains the implementation of my ML project on Heart Disease Prediction

Introduction This git repo contains the implementation of my ML project on Heart Disease Prediction. This is a real-world machine learning model/proje

Aryan Dutta 1 Feb 02, 2022
Auto-updating data to assist in investment to NEPSE

Symbol Ratios Summary Sector LTP Undervalued Bonus % MEGA Strong Commercial Banks 368 5 10 JBBL Strong Development Banks 568 5 10 SIFC Strong Finance

Amit Chaudhary 16 Nov 01, 2022
Official pytorch implementation of the IrwGAN for unaligned image-to-image translation

IrwGAN (ICCV2021) Unaligned Image-to-Image Translation by Learning to Reweight [Update] 12/15/2021 All dataset are released, trained models and genera

37 Nov 09, 2022
내가 보려고 정리한 <프로그래밍 기초 Ⅰ> / organized for me

Programming-Basics 프로그래밍 기초 Ⅰ 아카이브 Do it! 점프 투 파이썬 주차 강의주제 비고 1주차 Syllabus 2주차 자료형 - 숫자형 3주차 자료형 - 문자열형 4주차 입력과 출력 5주차 제어문 - 조건문 if 6주차 제어문 - 반복문 whil

KIMMINSEO 1 Mar 07, 2022
Simple sinc interpolation in PyTorch.

Kazane: simple sinc interpolation for 1D signal in PyTorch Kazane utilize FFT based convolution to provide fast sinc interpolation for 1D signal when

Chin-Yun Yu 10 May 03, 2022
Official implementation for "Low-light Image Enhancement via Breaking Down the Darkness"

Low-light Image Enhancement via Breaking Down the Darkness by Qiming Hu, Xiaojie Guo. 1. Dependencies Python3 PyTorch=1.0 OpenCV-Python, TensorboardX

Qiming Hu 30 Jan 01, 2023
Bald-to-Hairy Translation Using CycleGAN

GANiry: Bald-to-Hairy Translation Using CycleGAN Official PyTorch implementation of GANiry. GANiry: Bald-to-Hairy Translation Using CycleGAN, Fidan Sa

Fidan Samet 10 Oct 27, 2022
UI2I via StyleGAN2 - Unsupervised image-to-image translation method via pre-trained StyleGAN2 network

We proposed an unsupervised image-to-image translation method via pre-trained StyleGAN2 network. paper: Unsupervised Image-to-Image Translation via Pr

208 Dec 30, 2022
Deep learning (neural network) based remote photoplethysmography: how to extract pulse signal from video using deep learning tools

Deep-rPPG: Camera-based pulse estimation using deep learning tools Deep learning (neural network) based remote photoplethysmography: how to extract pu

Terbe Dániel 138 Dec 17, 2022
Node-level Graph Regression with Deep Gaussian Process Models

Node-level Graph Regression with Deep Gaussian Process Models Prerequests our implementation is mainly based on tensorflow 1.x and gpflow 1.x: python

1 Jan 16, 2022