Source code for "FastBERT: a Self-distilling BERT with Adaptive Inference Time".

Last update: Jan 02, 2023

Related tags

Overview

FastBERT

Source code for "FastBERT: a Self-distilling BERT with Adaptive Inference Time".

Good News

2021/10/29 - Code: Code of FastPLM is released on both Pypi and Github.

2021/09/08 - Paper: Journal version of FastBERT (FastPLM) is accepted by IEEE TNNLS. "An Empirical Study on Adaptive Inference for Pretrained Language Model".

2020/07/05 - Update: Pypi version of FastBERT has been launched. Please see fastbert-pypi.

Install fastbert with pip

$ pip install fastbert

Requirements

python >= 3.4.0, Install all the requirements with pip.

$ pip install -r requirements.txt

Quick start on the Chinese Book review dataset

Download the pre-trained Chinese BERT parameters from here, and save it to the models directory with the name of "Chinese_base_model.bin".

Run the following command to validate our FastBERT with Speed=0.5 on the Book review datasets.

$ CUDA_VISIBLE_DEVICES="0" python3 -u run_fastbert.py \
        --pretrained_model_path ./models/Chinese_base_model.bin \
        --vocab_path ./models/google_zh_vocab.txt \
        --train_path ./datasets/douban_book_review/train.tsv \
        --dev_path ./datasets/douban_book_review/dev.tsv \
        --test_path ./datasets/douban_book_review/test.tsv \
        --epochs_num 3 --batch_size 32 --distill_epochs_num 5 \
        --encoder bert --fast_mode --speed 0.5 \
        --output_model_path  ./models/douban_fastbert.bin

Meaning of each option.

usage: --pretrained_model_path Path to initialize model parameters.
       --vocab_path Path to the vocabulary.
       --train_path Path to the training dataset.
       --dev_path Path to the validating dataset.
       --test_path Path to the testing dataset.
       --epochs_num The epoch numbers of fine-tuning.
       --batch_size Batch size.
       --distill_epochs_num The epoch numbers of the self-distillation.
       --encoder The type of encoder.
       --fast_mode Whether to enable the fast mode of FastBERT.
       --speed The Speed value in the paper.
       --output_model_path Path to the output model parameters.

Test results on the Book review dataset.

Test results at fine-tuning epoch 3 (Baseline): Acc.=0.8688;  FLOPs=21785247744;
Test results at self-distillation epoch 1     : Acc.=0.8698;  FLOPs=6300902177;
Test results at self-distillation epoch 2     : Acc.=0.8691;  FLOPs=5844839008;
Test results at self-distillation epoch 3     : Acc.=0.8664;  FLOPs=5170940850;
Test results at self-distillation epoch 4     : Acc.=0.8664;  FLOPs=5170940327;
Test results at self-distillation epoch 5     : Acc.=0.8664;  FLOPs=5170940327;

Quick start on the English Ag.news dataset

Download the pre-trained English BERT parameters from here, and save it to the models directory with the name of "English_uncased_base_model.bin".

Download the ag_news.zip from here, and then unzip it to the datasets directory.

Run the following command to validate our FastBERT with Speed=0.5 on the Ag.news datasets.

$ CUDA_VISIBLE_DEVICES="0" python3 -u run_fastbert.py \
        --pretrained_model_path ./models/English_uncased_base_model.bin \
        --vocab_path ./models/google_uncased_en_vocab.txt \
        --train_path ./datasets/ag_news/train.tsv \
        --dev_path ./datasets/ag_news/test.tsv \
        --test_path ./datasets/ag_news/test.tsv \
        --epochs_num 3 --batch_size 32 --distill_epochs_num 5 \
        --encoder bert --fast_mode --speed 0.5 \
        --output_model_path  ./models/ag_news_fastbert.bin

Test results on the Ag.news dataset.

Test results at fine-tuning epoch 3 (Baseline): Acc.=0.9447;  FLOPs=21785247744;
Test results at self-distillation epoch 1     : Acc.=0.9308;  FLOPs=2172009009;
Test results at self-distillation epoch 2     : Acc.=0.9311;  FLOPs=2163471246;
Test results at self-distillation epoch 3     : Acc.=0.9314;  FLOPs=2108341649;
Test results at self-distillation epoch 4     : Acc.=0.9314;  FLOPs=2108341649;
Test results at self-distillation epoch 5     : Acc.=0.9314;  FLOPs=2108341649;

Datasets

More datasets can be downloaded from here.

Other implementations

There are some other excellent implementations of FastBERT.

BitVoyage/FastBERT (Pytorch): https://github.com/BitVoyage/FastBERT

Acknowledgement

This work is funded by 2019 Tencent Rhino-Bird Elite Training Program. Work done while this author was an intern at Tencent.

If you use this code, please cite this paper:

@inproceedings{weijie2020fastbert,
  title={{FastBERT}: a Self-distilling BERT with Adaptive Inference Time},
  author={Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju},
  booktitle={Proceedings of ACL 2020},
  year={2020}
}

Source code for "FastBERT: a Self-distilling BERT with Adaptive Inference Time".

Related tags

Overview

FastBERT

Good News

Requirements

Quick start on the Chinese Book review dataset

Quick start on the English Ag.news dataset

Datasets

Other implementations

Acknowledgement

Owner

Weijie Liu

Using fully convolutional networks for semantic segmentation with caffe for the cityscapes dataset

UAV-Networks-Routing is a Python simulator for experimenting routing algorithms and mac protocols on unmanned aerial vehicle networks.

Pytorch version of SfmLearner from Tinghui Zhou et al.

This script runs neural style transfer against the provided content image.

Face Transformer for Recognition

A denoising diffusion probabilistic model synthesises galaxies that are qualitatively and physically indistinguishable from the real thing.

Stochastic Tensor Optimization for Robot Motion - A GPU Robot Motion Toolkit

DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment

A Library for Modelling Probabilistic Hierarchical Graphical Models in PyTorch

Non-Imaging Transient Reconstruction And TEmporal Search (NITRATES)

This is the repository of our article published on MDPI Entropy "Feature Selection for Recommender Systems with Quantum Computing".

Code for the paper "Implicit Representations of Meaning in Neural Language Models"

Torchlight2 lan game server tool - A message forwarding tool for Torchlight 2 lan game

⚾🤖⚾ Automatic baseball pitching overlay in realtime

[Machine Learning Engineer Basic Guide] 부스트캠프 AI Tech - Product Serving 자료

U-Time: A Fully Convolutional Network for Time Series Segmentation

Pytorch code for "State-only Imitation with Transition Dynamics Mismatch" (ICLR 2020)

A python library to artfully visualize Factorio Blueprints and an interactive web demo for using it.

RealFormer-Pytorch Implementation of RealFormer using pytorch

Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)