⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

Last update: Nov 25, 2022

Overview

BERT-of-Theseus

BERT-of-Theseus is a compressed BERT obtained by progressively replacing the components of the original BERT with smaller ones.
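
The replacement idea can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: it assumes a 12-layer predecessor grouped into six two-layer modules and a 6-layer successor, and at every forward pass it swaps in each successor layer with probability `replacing_rate`. The names `theseus_forward`, `make_layer`, and `chain`, as well as the toy layers, are hypothetical and exist only for this example.

```python
import random


def theseus_forward(x, predecessor_modules, successor_modules, replacing_rate, rng=random):
    """Run x through a mixed stack of modules.

    At each position, the successor module is used with probability
    `replacing_rate`; otherwise the predecessor module is used.
    """
    assert len(predecessor_modules) == len(successor_modules)
    for prd, scc in zip(predecessor_modules, successor_modules):
        module = scc if rng.random() < replacing_rate else prd
        x = module(x)
    return x


def make_layer(tag):
    # Toy "layer": appends its own tag so we can see which layers ran.
    return lambda trace: trace + [tag]


def chain(*layers):
    # Group several layers into a single module.
    def run(x):
        for layer in layers:
            x = layer(x)
        return x
    return run


# Assumed layout: 12 predecessor layers in 6 modules, 6 successor layers.
predecessor = [chain(make_layer(f"prd{2 * i}"), make_layer(f"prd{2 * i + 1}")) for i in range(6)]
successor = [make_layer(f"scc{i}") for i in range(6)]

rng = random.Random(0)
mixed_trace = theseus_forward([], predecessor, successor, 0.5, rng)
successor_only = theseus_forward([], predecessor, successor, 1.0, rng)
predecessor_only = theseus_forward([], predecessor, successor, 0.0, rng)
print(mixed_trace)
```

With a replacing rate of 1.0 only the six successor layers run, which corresponds to the compressed model used at inference time; with 0.0 the original twelve layers run unchanged.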

Citation

If you use this code in your research, please cite our paper:

@inproceedings{xu-etal-2020-bert,
    title = "{BERT}-of-Theseus: Compressing {BERT} by Progressive Module Replacing",
    author = "Xu, Canwen  and
      Zhou, Wangchunshu  and
      Ge, Tao  and
      Wei, Furu  and
      Zhou, Ming",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.633",
    pages = "7859--7869"
}

NEW: We have uploaded a script for making predictions on GLUE tasks and preparing for leaderboard submission. Check out here!

How to run BERT-of-Theseus

Requirement

Our code is built on huggingface/transformers. To use our code, you must clone and install huggingface/transformers.

Compress a BERT

If you haven't done so already, fine-tune a predecessor model following the instructions from huggingface and save it to a directory.
Then run compression following the examples below:

# For compression with a replacement scheduler
export GLUE_DIR=/path/to/glue_data
export TASK_NAME=MRPC

python ./run_glue.py \
  --model_name_or_path /path/to/saved_predecessor \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir "$GLUE_DIR/$TASK_NAME" \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --save_steps 50 \
  --num_train_epochs 15 \
  --output_dir /path/to/save_successor/ \
  --evaluate_during_training \
  --replacing_rate 0.3 \
  --scheduler_type linear \
  --scheduler_linear_k 0.0006

# For compression with a constant replacing rate
export GLUE_DIR=/path/to/glue_data
export TASK_NAME=MRPC

python ./run_glue.py \
  --model_name_or_path /path/to/saved_predecessor \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir "$GLUE_DIR/$TASK_NAME" \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --save_steps 50 \
  --num_train_epochs 15 \
  --output_dir /path/to/save_successor/ \
  --evaluate_during_training \
  --replacing_rate 0.5 \
  --steps_for_replacing 2500

For the detailed description of arguments, please refer to the source code.
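
As a rough guide to the two setups above, the sketch below shows how a replacing rate might evolve over training steps. It assumes the linear scheduler follows min(1, k * step + base_rate), with `--replacing_rate` as the base rate and `--scheduler_linear_k` as the slope, and that the constant setup holds `--replacing_rate` for `--steps_for_replacing` steps before switching to the successor alone. The function names are hypothetical, and the exact boundary behavior should be checked against the source code.

```python
def linear_replacing_rate(step, base_rate, k):
    """Assumed linear schedule: start at base_rate, grow by k per step, cap at 1.0."""
    return min(1.0, k * step + base_rate)


def constant_replacing_rate(step, base_rate, steps_for_replacing):
    """Assumed constant schedule: hold base_rate, then use only the successor (1.0)."""
    return base_rate if step < steps_for_replacing else 1.0


# Values taken from the two example commands above.
for step in (0, 500, 1000, 1500):
    print("linear", step, round(linear_replacing_rate(step, 0.3, 0.0006), 2))

for step in (0, 2499, 2500):
    print("constant", step, constant_replacing_rate(step, 0.5, 2500))
```

Under these assumptions, the linear schedule in the first command reaches a replacing rate of 1.0 after about 1,167 steps (0.7 / 0.0006), after which only the successor modules are trained.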

Load Pretrained Model on MNLI

We provide a 6-layer model pretrained on MNLI as a general-purpose model that can transfer to other sentence classification tasks. It outperforms DistilBERT (which has the same 6-layer structure) on six of the seven GLUE tasks below (dev set) and ties on MRPC.

Method	MNLI	MRPC	QNLI	QQP	RTE	SST-2	STS-B
BERT-base	83.5	89.5	91.2	89.8	71.1	91.5	88.9
DistilBERT	79.0	87.5	85.3	84.9	59.9	90.7	81.2
BERT-of-Theseus	82.1	87.5	88.8	88.8	70.1	91.8	87.8
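
The comparison in the table can be checked with a few lines of arithmetic. The script below uses only the numbers reported above. The unweighted average is a rough summary added here for convenience, not a figure from the paper, since the columns may use different metrics.

```python
tasks = ["MNLI", "MRPC", "QNLI", "QQP", "RTE", "SST-2", "STS-B"]
scores = {
    "BERT-base": [83.5, 89.5, 91.2, 89.8, 71.1, 91.5, 88.9],
    "DistilBERT": [79.0, 87.5, 85.3, 84.9, 59.9, 90.7, 81.2],
    "BERT-of-Theseus": [82.1, 87.5, 88.8, 88.8, 70.1, 91.8, 87.8],
}

# Unweighted mean over the seven tasks for each method.
averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}

# Tasks where BERT-of-Theseus scores strictly higher than DistilBERT.
wins = [
    task
    for task, ours, theirs in zip(tasks, scores["BERT-of-Theseus"], scores["DistilBERT"])
    if ours > theirs
]

# Fraction of the BERT-base average retained by the 6-layer model.
retained = averages["BERT-of-Theseus"] / averages["BERT-base"]

for name, avg in averages.items():
    print(f"{name}: {avg:.1f}")
print("wins over DistilBERT:", wins)
print(f"retained vs. BERT-base: {retained:.1%}")
```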

You can easily load our general-purpose model using huggingface/transformers.

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("canwenxu/BERT-of-Theseus-MNLI")

model = AutoModel.from_pretrained("canwenxu/BERT-of-Theseus-MNLI")

Bug Report and Contribution

If you'd like to contribute and add more tasks (only GLUE is available at the moment), please submit a pull request and contact me. If you find any problem or bug, please open an issue. Thanks!

Third-Party Implementations

We list some third-party implementations from the community here. Feel free to add your implementation to this list:

TensorFlow Implementation (tested on NER): https://github.com/qiufengyuyi/bert-of-theseus-tf
Keras Implementation (tested on text classification): https://github.com/bojone/bert-of-theseus

Owner

Kevin Canwen Xu
