[NeurIPS-2021] Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Last update: Nov 10, 2022

MosaicKD

Code for the NeurIPS 2021 paper "Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data"

1. Motivation

Natural images share common local patterns. MosaicKD first disassembles these local patterns from out-of-domain (OOD) data and then reassembles them to synthesize in-domain data, which makes OOD knowledge distillation (OOD-KD) feasible.

2. Method

MosaicKD establishes a four-player minimax game among a generator G, a patch discriminator D, a teacher model T, and a student model S. As in prior GANs, the generator takes a random noise vector as input. It learns to mosaic synthetic in-domain samples whose distributions are locally authentic and globally legitimate, under the supervision back-propagated from the other three players.
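To make the four-player setup more concrete, the sketch below combines four generator-side terms into a single objective on toy inputs. The term definitions, the function names, and the mapping of the --local, --align, --adv, and --balance flags to weights are all assumptions made for illustration; they are not taken from kd_mosaic.py and may differ from the paper's exact losses.

```python
import math

def local_term(patch_scores):
    """Assumed patch-level GAN loss: -log D(patch), averaged over patches."""
    return sum(-math.log(s) for s in patch_scores) / len(patch_scores)

def align_term(teacher_probs):
    """Assumed alignment loss: teacher cross-entropy against its own argmax label."""
    return sum(-math.log(max(p)) for p in teacher_probs) / len(teacher_probs)

def adv_term(teacher_probs, student_probs):
    """Assumed adversarial loss: negative mean L1 gap between teacher and student.
    Minimizing it pushes the generator toward samples the student gets wrong."""
    gaps = [sum(abs(t - s) for t, s in zip(tp, sp))
            for tp, sp in zip(teacher_probs, student_probs)]
    return -sum(gaps) / len(gaps)

def balance_term(teacher_probs):
    """Assumed balance loss: negative entropy of the batch-mean prediction."""
    n = len(teacher_probs)
    mean = [sum(col) / n for col in zip(*teacher_probs)]
    return sum(m * math.log(m) for m in mean if m > 0)

def generator_loss(teacher_probs, student_probs, patch_scores,
                   local=1.0, align=1.0, adv=1.0, balance=10.0):
    """Weighted sum of the four hypothetical terms (defaults mirror the CLI values)."""
    return (local * local_term(patch_scores)
            + align * align_term(teacher_probs)
            + adv * adv_term(teacher_probs, student_probs)
            + balance * balance_term(teacher_probs))

if __name__ == "__main__":
    teacher = [[0.9, 0.1], [0.2, 0.8]]   # toy teacher predictions T(G(z))
    student = [[0.6, 0.4], [0.5, 0.5]]   # toy student predictions S(G(z))
    patches = [0.7, 0.6, 0.8]            # toy patch scores from D
    print(generator_loss(teacher, student, patches))
```

In this toy version the student would be trained separately to shrink the same teacher-student gap that the generator tries to enlarge, which is what makes the game adversarial.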

3. Reproducing our results

3.1 Prepare teachers

Please download our pre-trained models from Dropbox (266 MB) and extract them to "checkpoints/pretrained/*.pth". You can also train your own models as follows:
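Before launching distillation, it can help to confirm that the extracted checkpoints landed in the expected folder. The helper below is a minimal sketch; the function name list_pretrained is hypothetical, and only the "checkpoints/pretrained" path comes from this README.

```python
from pathlib import Path

def list_pretrained(root="checkpoints/pretrained"):
    """Return the sorted names of .pth files under root, or [] if root is missing."""
    folder = Path(root)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.pth"))

if __name__ == "__main__":
    found = list_pretrained()
    print(f"{len(found)} checkpoint(s) found")
    for name in found:
        print("  " + name)
```

Run it from the repository root; an empty result usually means the archive was extracted into a different directory.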

python train_scratch.py --lr 0.1 --batch-size 256 --model wrn40_2 --dataset cifar100

3.2 OOD-KD: CIFAR-100 (ID) + CIFAR-10 (OOD)

Vanilla KD (Blind KD)

python kd_vanilla.py --lr 0.1 --batch-size 128 --teacher wrn40_2 --student wrn16_1 --dataset cifar100 --unlabeled cifar10 --epoch 200 --gpu 0

Data-Free KD (DFQAD)

python kd_datafree.py --lr 0.1 --batch-size 256 --teacher wrn40_2 --student wrn16_1 --dataset cifar100 --unlabeled cifar10 --epoch 200 --local 1 --align 1 --adv 1 --balance 10 --gpu 0

MosaicKD (This work)

python kd_mosaic.py --lr 0.1 --batch-size 256 --teacher wrn40_2 --student wrn16_1 --dataset cifar100 --unlabeled cifar10 --epoch 200 --local 1 --align 1 --adv 1 --balance 10 --gpu 0
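The three commands in this section differ only in the script name, the batch size, and the four extra flags. The sketch below builds equivalent argument lists so the runs can be launched from one place. The script names and flag values are copied from this README; the helper build_command and the grouping of flags are hypothetical, and reading --local, --align, --adv, and --balance as loss weights is an assumption.

```python
import shlex

# Arguments shared by all three commands (values copied from the README).
COMMON = {"--lr": "0.1", "--teacher": "wrn40_2", "--student": "wrn16_1",
          "--dataset": "cifar100", "--unlabeled": "cifar10",
          "--epoch": "200", "--gpu": "0"}

# Extra flags passed to the generator-based scripts (assumed to be loss weights).
SYNTHESIS = {"--local": "1", "--align": "1", "--adv": "1", "--balance": "10"}

# method name -> (script, batch size, extra flags)
METHODS = {
    "vanilla": ("kd_vanilla.py", "128", {}),
    "datafree": ("kd_datafree.py", "256", SYNTHESIS),
    "mosaic": ("kd_mosaic.py", "256", SYNTHESIS),
}

def build_command(method):
    """Return the argv list for one method, with each flag given exactly once."""
    script, batch_size, extra = METHODS[method]
    flags = {**COMMON, "--batch-size": batch_size, **extra}
    argv = ["python", script]
    for name, value in flags.items():
        argv += [name, value]
    return argv

if __name__ == "__main__":
    for method in METHODS:
        print(shlex.join(build_command(method)))
```

Each list can be passed to subprocess.run(argv, check=True) from the repository root to run the baselines one after another.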

3.3 OOD-KD: CIFAR-100 (ID) + ImageNet/Places365 Subset (OOD)

Prepare 32x32 datasets
Please prepare the 32x32 ImageNet by following the instructions at https://patrykchrabaszcz.github.io/Imagenet32/ and extract the data to "data/ImageNet_32x32/train" and "data/ImageNet_32x32/val". You can prepare Places365 in the same way.

MosaicKD on OOD subset
As ImageNet and Places365 contain a large number of in-domain samples, we construct an OOD subset for training. Please run the scripts with "--ood_subset" to enable subset selection.

python kd_mosaic.py --lr 0.1 --batch-size 256 --teacher wrn40_2 --student wrn16_1 --dataset cifar100 --unlabeled cifar10 --epoch 200 --local 1 --align 1 --adv 1 --balance 10 --ood_subset --gpu 0
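This README does not say how the OOD subset is chosen. One plausible criterion, assumed here purely for illustration, is to keep the unlabeled samples on which the teacher is least confident, since in-domain images tend to receive confident predictions. The function names below are hypothetical and the logic is not taken from the repository.

```python
import math

def max_softmax(logits):
    """Largest softmax probability for one sample (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    return max(exps) / sum(exps)

def select_ood_subset(teacher_logits, k):
    """Return indices of the k samples with the lowest teacher confidence,
    ordered from least to most confident."""
    confidence = [max_softmax(row) for row in teacher_logits]
    order = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return order[:k]

if __name__ == "__main__":
    # Toy teacher logits for four unlabeled images over three classes.
    logits = [[5.0, 0.0, 0.0], [0.1, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(select_ood_subset(logits, k=2))
```

In practice the logits would come from a forward pass of the teacher over the whole unlabeled set, and k would set the size of the training subset.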

4. Visualization of synthetic data

5. Citation

If you find this work useful for your research, please cite our paper:

@article{fang2021mosaicking,
  title={Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data},
  author={Gongfan Fang and Yifan Bao and Jie Song and Xinchao Wang and Donglin Xie and Chengchao Shen and Mingli Song},
  journal={arXiv preprint arXiv:2110.15094},
  year={2021}
}

Owner

ZJU-VIPA
