Adaptive Multi-Teacher Multi-level Knowledge Distillation(AMTML-KD)

Paper has been accepted by Neurocomputing 415(2020): 106–113.

Authors: Yuang Liu, Wei Zhang and Jun Wang.

Links: [ pdf ] [ code ]

Requirements

PyTorch >= 1.0.0
Jupyter
visdom

Introduction

Knowledge distillation (KD) is an effective learning paradigm for improving the performance of light-weight student networks by utilizing additional supervision knowledge distilled from teacher networks. Most pioneering studies either learn from only a single teacher in their distillation learning methods, neglecting the potential that a student can learn from multiple teachers simultaneously, or simply treat each teacher to be equally important, unable to reveal the different importance of teachers for specific examples. To bridge this gap, we propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD), which consists two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights which are leveraged for acquiring integrated soft-targets (high-level knowledge) and (ii) enabling the intermediate-level hints (intermediate-level knowledge) to be gathered from multiple teachers by the proposed multi-group hint strategy. As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD. Extensive results on publicly available datasets demonstrate the proposed learning framework ensures student to achieve improved performance than strong competitors.

Citation

@article{LIU2020106,
    title = {Adaptive multi-teacher multi-level knowledge distillation},
    author = {Yuang Liu and Wei Zhang and Jun Wang},
    journal = {Neurocomputing},
    volume = {415},
    pages = {106 -- 113},
    year = {2020},
    issn = {0925 -- 2312},
}

AMTML-KD: Adaptive Multi-teacher Multi-level Knowledge Distillation

Related tags

Overview

Adaptive Multi-Teacher Multi-level Knowledge Distillation(AMTML-KD)

Requirements

Introduction

Citation

Owner

Frank Liu

The 1st Place Solution of the Facebook AI Image Similarity Challenge (ISC21) : Descriptor Track.

Code for our NeurIPS 2021 paper: Sparsely Changing Latent States for Prediction and Planning in Partially Observable Domains

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

A clean and robust Pytorch implementation of PPO on continuous action space.

code release for USENIX'22 paper `On the Security Risks of AutoML`

Optimizers-visualized - Visualization of different optimizers on local minimas and saddle points.

Scenarios, tutorials and demos for Autonomous Driving

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language (NeurIPS 2021)

FAMIE is a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction (IE)

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.

MAGMA - a GPT-style multimodal model that can understand any combination of images and language

[CVPRW 21] "BNN - BN = ? Training Binary Neural Networks without Batch Normalization", Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang

Self-Supervised Learning for Domain Adaptation on Point-Clouds

Official pytorch code for SSC-GAN: Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation(ICCV 2021)

Code for the ECIR'22 paper "Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators"

A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions.

The authors' implementation of Unsupervised Adversarial Learning of 3D Human Pose from 2D Joint Locations

Camera-caps - Examine the camera capabilities for V4l2 cameras

Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation

Simulation-based performance analysis of server-less Blockchain-enabled Federated Learning