🛠️ Tools for Transformers compression using Lightning ⚡

Overview

Hits

Bert-squeeze

Bert-squeeze is a repository aiming to provide code to reduce the size of Transformer-based models or decrease their latency at inference time.

It gathers a non-exhaustive list of techniques such as distillation, pruning, quantization, early-exiting. The repo is written using PyTorch Lightning and Transformers.

About the project

As a heavy user of transformer-based models (which are truly amazing from my point of view) I always struggled to put those heavy models in production while having a decent inference speed. There are of course a bunch of existing libraries to optimize and compress transformer-based models (ONNX , distiller, compressors , KD_Lib, ... ).
I started this project because of the need to reduce the latency of models integrating transformers as subcomponents. For this reason, this project aims at providing implementations to train various transformer-based models (and others) using PyTorch Lightning but also to distill, prune, and quantize models.
I chose to write this repo with Lightning because of its growing trend, its flexibility, and the very few repositories using it. It currently only handles sequence classification models, but support for other tasks and custom architectures is planned.

Installation

First download the repository:

git clone https://github.com/JulesBelveze/bert-squeeze.git

and then install dependencies using poetry:

poetry install

You are all set!

Quickstarts

You can find a bunch of already prepared configurations under the examples folder. Just choose the one you need and run the following:

python3 -m bert-squeeze.main -cp=examples -cn=wanted_config

Disclaimer: I have not extensively tested all procedures and thus do not guarantee the performance of every implemented method.

Concepts

Transformers

If you never heard of it then I can only recommend you to read this amazing blog post and if you want to dig deeper there is this awesome lecture was given by Stanford available here.

Distillation

The idea of distillation is to train a small network to mimic a big network by trying to replicate its outputs. The repository provides the ability to transfer knowledge from any model to any other (if you need a model that is not within the models folder just write your own).

The repository also provides the possibility to perform soft-distillation or hard-distillation on an unlabeled dataset. In the soft case, we use the probabilities of the teacher as a target. In the hard one, we assume that the teacher's predictions are the actual label.

You can find these implementations under the distillation/ folder.

Quantization

Neural network quantization is the process of reducing the weights precision in the neural network. The repo has two callbacks one for dynamic quantization and one for quantization-aware training (using the Lightning callback) .

You can find those implementations under the utils/callbacks/ folder.

Pruning

Pruning neural networks consist of removing weights from trained models to compress them. This repo features various pruning implementations and methods such as head-pruning, layer dropping, and weights dropping.

You can find those implementations under the utils/callbacks/ folder.

Contributions and questions

If you are missing a feature that could be relevant to this repo, or a bug that you noticed feel free to open a PR or open an issue. As you can see in the roadmap there are a bunch more features to come 😃

Also, if you have any questions or suggestions feel free to ask!

References

  1. Alammar, J (2018). The Illustrated Transformer [Blog post]. Retrieved from https://jalammar.github.io/illustrated-transformer/
  2. stanfordonline (2021) Stanford CS224N NLP with Deep Learning | Winter 2021 | Lecture 9 - Self- Attention and Transformers. [online video] Available at: https://www.youtube.com/watch?v=ptuGllU5SQQ
  3. Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Jamie Brew (2019). HuggingFace's Transformers: State-of-the-art Natural Language Processing
  4. Hassan Sajjad and Fahim Dalvi and Nadir Durrani and Preslav Nakov (2020). Poor Man's BERT Smaller and Faster Transformer Models
  5. Angela Fan and Edouard Grave and Armand Joulin (2019). Reducing Transformer Depth on Demand with Structured Dropout
  6. Paul Michel and Omer Levy and Graham Neubig (2019). Are Sixteen Heads Really Better than One?
  7. Fangxiaoyu Feng and Yinfei Yang and Daniel Cer and Naveen Arivazhagan and Wei Wang (2020). Language-agnostic BERT Sentence Embedding
Owner
Jules Belveze
AI craftsman | NLP | MLOps
Jules Belveze
R-package accompanying the paper "Dynamic Factor Model for Functional Time Series: Identification, Estimation, and Prediction"

dffm The goal of dffm is to provide functionality to apply the methods developed in the paper “Dynamic Factor Model for Functional Time Series: Identi

Sven Otto 3 Dec 09, 2022
This git repo contains the implementation of my ML project on Heart Disease Prediction

Introduction This git repo contains the implementation of my ML project on Heart Disease Prediction. This is a real-world machine learning model/proje

Aryan Dutta 1 Feb 02, 2022
A texturizer that I just made. Nothing special here.

texturizer This is a little project that I did with an hour's time. It texturizes an image given a image and a texture to texturize it with. There is

1 Nov 11, 2021
A chemical analysis of lipophilicities & molecule drawings including ML

A chemical analysis of lipophilicity & molecule drawings including a bit of ML analysis. This is a simple project that includes two Jupyter files (one

Aurimas A. Nausėdas 7 Nov 22, 2022
Over9000 optimizer

Optimizers and tests Every result is avg of 20 runs. Dataset LR Schedule Imagenette size 128, 5 epoch Imagewoof size 128, 5 epoch Adam - baseline OneC

Mikhail Grankin 405 Nov 27, 2022
This repository contains the code for Direct Molecular Conformation Generation (DMCG).

Direct Molecular Conformation Generation This repository contains the code for Direct Molecular Conformation Generation (DMCG). Dataset Download rdkit

25 Dec 20, 2022
3D Generative Adversarial Network

Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling This repository contains pre-trained models and sampling

Chengkai Zhang 791 Dec 20, 2022
Code for "Hierarchical Skills for Efficient Exploration" HSD-3 Algorithm and Baselines

Hierarchical Skills for Efficient Exploration This is the source code release for the paper Hierarchical Skills for Efficient Exploration. It contains

Facebook Research 38 Dec 06, 2022
Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset

Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset

Clay Mullis 82 Oct 13, 2022
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

880 Jan 07, 2023
PyTorch Implementation of our paper Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation

PyTorch Implementation of our paper Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation

Zechen Bai 12 Jul 08, 2022
PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network"

HAN PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network" This repository is for HAN introduced in the

五维空间 140 Nov 23, 2022
List of awesome things around semantic segmentation 🎉

Awesome Semantic Segmentation List of awesome things around semantic segmentation 🎉 Semantic segmentation is a computer vision task in which we label

Dam Minh Tien 18 Nov 26, 2022
An Unbiased Learning To Rank Algorithms (ULTRA) toolbox

Unbiased Learning to Rank Algorithms (ULTRA) This is an Unbiased Learning To Rank Algorithms (ULTRA) toolbox, which provides a codebase for experiment

back 3 Nov 18, 2022
Large scale PTM - PPI relation extraction

Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT The silver standard

1 Feb 25, 2022
Yolov5-opencv-cpp-python - Example of using ultralytics YOLO V5 with OpenCV 4.5.4, C++ and Python

yolov5-opencv-cpp-python Example of performing inference with ultralytics YOLO V

183 Jan 09, 2023
Software Platform for solving and manipulating multiparametric programs in Python

PPOPT Python Parametric OPtimization Toolbox (PPOPT) is a software platform for solving and manipulating multiparametric programs in Python. This pack

10 Sep 13, 2022
Automatic Data-Regularized Actor-Critic (Auto-DrAC)

Auto-DrAC: Automatic Data-Regularized Actor-Critic This is a PyTorch implementation of the methods proposed in Automatic Data Augmentation for General

89 Dec 13, 2022
This is the offical website for paper ''Category-consistent deep network learning for accurate vehicle logo recognition''

The Pytorch Implementation of Category-consistent deep network learning for accurate vehicle logo recognition This is the offical website for paper ''

Wanglong Lu 28 Oct 29, 2022
PyTorch reimplementation of the paper Involution: Inverting the Inherence of Convolution for Visual Recognition [CVPR 2021].

Involution: Inverting the Inherence of Convolution for Visual Recognition Unofficial PyTorch reimplementation of the paper Involution: Inverting the I

Christoph Reich 100 Dec 01, 2022