PyTorch implementation of Patient Knowledge Distillation for BERT Model Compression

Last update: Dec 19, 2022

Overview

Patient Knowledge Distillation for BERT Model Compression

Knowledge distillation for BERT model

Installation

Run the commands below to set up the environment:

conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
pip install -r requirements.txt

Training

Objective Function

L = (1 - \alpha) * L_CE + \alpha * L_DS + \beta * L_PT,

where L_CE is the cross-entropy loss, L_DS is the usual distillation loss, and L_PT is the proposed patient loss. Please see the paper cited below for more details.
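The weighting of the three terms can be illustrated with a small framework-free sketch for a single training example. This is not the repository's code: the function names are hypothetical, and the forms of the distillation term (cross-entropy between temperature-softened teacher and student outputs) and the patient term (squared distance between L2-normalized hidden states of matched layers) are assumptions based on the paper's description. The repository itself computes these quantities on PyTorch tensors.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_loss(student_logits, label):
    """L_CE: negative log-probability the student assigns to the true label."""
    return -math.log(softmax(student_logits)[label])


def distillation_loss(student_logits, teacher_logits, temperature):
    """L_DS (assumed form): cross-entropy between softened teacher and student outputs."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


def patient_loss(student_hidden, teacher_hidden):
    """L_PT (assumed form): squared distance between normalized hidden states,
    summed over the matched (student layer, teacher layer) pairs."""
    total = 0.0
    for h_s, h_t in zip(student_hidden, teacher_hidden):
        n_s, n_t = l2_normalize(h_s), l2_normalize(h_t)
        total += sum((a - b) ** 2 for a, b in zip(n_s, n_t))
    return total


def pkd_objective(student_logits, teacher_logits, label,
                  student_hidden, teacher_hidden,
                  alpha, beta, temperature):
    """L = (1 - alpha) * L_CE + alpha * L_DS + beta * L_PT."""
    l_ce = cross_entropy_loss(student_logits, label)
    l_ds = distillation_loss(student_logits, teacher_logits, temperature)
    l_pt = patient_loss(student_hidden, teacher_hidden)
    return (1 - alpha) * l_ce + alpha * l_ds + beta * l_pt


# Toy values chosen only for illustration.
loss = pkd_objective(
    student_logits=[1.2, -0.3], teacher_logits=[2.0, -1.0], label=0,
    student_hidden=[[0.1, 0.4, -0.2]], teacher_hidden=[[0.2, 0.5, -0.1]],
    alpha=0.5, beta=10.0, temperature=5.0,
)
print(loss)
```

With alpha = 0 and beta = 0 the objective reduces to plain cross-entropy fine-tuning, and with beta = 0 alone it reduces to vanilla distillation without the patient term.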

Data Preprocess

Set HOME_DATA_FOLDER in envs.py (the default is ./data) and put all data under it. The RTE data is included in the repository for your convenience.

HOME_DATA_FOLDER should contain the following folders:
- data_raw: stores the raw data for all tasks; put the downloaded raw data here
  - MRPC
  - RTE
  - ... (other tasks)
- data_feat: stores the tokenized data (optional)
  - MRPC
  - RTE
  - ...
- models
  - pretrained: put the downloaded pretrained model (bert-base-uncased) in this folder
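
The layout above can be created in one step with a short helper. This is a minimal sketch, not part of the repository: `create_data_layout` is a hypothetical name, the task list is only an example, and it assumes that models/pretrained sits under HOME_DATA_FOLDER alongside data_raw and data_feat.

```python
from pathlib import Path


def create_data_layout(home_data_folder="./data", tasks=("MRPC", "RTE")):
    """Create the expected folder tree and return the list of created paths."""
    home = Path(home_data_folder)
    folders = []
    # One sub-folder per task for both raw and tokenized data.
    for top_level in ("data_raw", "data_feat"):
        for task in tasks:
            folders.append(home / top_level / task)
    # Location for the downloaded bert-base-uncased checkpoint.
    folders.append(home / "models" / "pretrained")
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling `create_data_layout()` with no arguments uses ./data, which matches the default HOME_DATA_FOLDER. Existing folders are left untouched, so it is safe to run on a directory that already contains the RTE data.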

Predefined Training

Run NLI_KD_training.py to start training. Set DEBUG = True to run with the predefined arguments:

Set argv = get_predefine_argv('glue', 'RTE', 'finetune_teacher') or argv = get_predefine_argv('glue', 'RTE', 'finetune_student') to start normal fine-tuning.
Run run_glue_benchmark.py to get the teacher's predictions for KD or PKD.
- Set output_all_layers = True for the patient teacher.
- Set output_all_layers = False for the normal teacher.
Set argv = get_predefine_argv('glue', 'RTE', 'kd') to start vanilla KD.
Set argv = get_predefine_argv('glue', 'RTE', 'kd.cls') to start patient KD (PKD).

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Citation

If you find this code useful for your research, please consider citing:

@article{sun2019patient,
title={Patient Knowledge Distillation for BERT Model Compression},
author={Sun, Siqi and Cheng, Yu and Gan, Zhe and Liu, Jingjing},
journal={arXiv preprint arXiv:1908.09355},
year={2019}
}

The paper is available on arXiv (arXiv:1908.09355).
