πŸ› οΈ Tools for Transformers compression using Lightning ⚑

Overview

Hits

Bert-squeeze

Bert-squeeze is a repository aiming to provide code to reduce the size of Transformer-based models or decrease their latency at inference time.

It gathers a non-exhaustive list of techniques such as distillation, pruning, quantization, early-exiting. The repo is written using PyTorch Lightning and Transformers.

About the project

As a heavy user of transformer-based models (which are truly amazing from my point of view) I always struggled to put those heavy models in production while having a decent inference speed. There are of course a bunch of existing libraries to optimize and compress transformer-based models (ONNX , distiller, compressors , KD_Lib, ... ).
I started this project because of the need to reduce the latency of models integrating transformers as subcomponents. For this reason, this project aims at providing implementations to train various transformer-based models (and others) using PyTorch Lightning but also to distill, prune, and quantize models.
I chose to write this repo with Lightning because of its growing trend, its flexibility, and the very few repositories using it. It currently only handles sequence classification models, but support for other tasks and custom architectures is planned.

Installation

First download the repository:

git clone https://github.com/JulesBelveze/bert-squeeze.git

and then install dependencies using poetry:

poetry install

You are all set!

Quickstarts

You can find a bunch of already prepared configurations under the examples folder. Just choose the one you need and run the following:

python3 -m bert-squeeze.main -cp=examples -cn=wanted_config

Disclaimer: I have not extensively tested all procedures and thus do not guarantee the performance of every implemented method.

Concepts

Transformers

If you never heard of it then I can only recommend you to read this amazing blog post and if you want to dig deeper there is this awesome lecture was given by Stanford available here.

Distillation

The idea of distillation is to train a small network to mimic a big network by trying to replicate its outputs. The repository provides the ability to transfer knowledge from any model to any other (if you need a model that is not within the models folder just write your own).

The repository also provides the possibility to perform soft-distillation or hard-distillation on an unlabeled dataset. In the soft case, we use the probabilities of the teacher as a target. In the hard one, we assume that the teacher's predictions are the actual label.

You can find these implementations under the distillation/ folder.

Quantization

Neural network quantization is the process of reducing the weights precision in the neural network. The repo has two callbacks one for dynamic quantization and one for quantization-aware training (using the Lightning callback) .

You can find those implementations under the utils/callbacks/ folder.

Pruning

Pruning neural networks consist of removing weights from trained models to compress them. This repo features various pruning implementations and methods such as head-pruning, layer dropping, and weights dropping.

You can find those implementations under the utils/callbacks/ folder.

Contributions and questions

If you are missing a feature that could be relevant to this repo, or a bug that you noticed feel free to open a PR or open an issue. As you can see in the roadmap there are a bunch more features to come πŸ˜ƒ

Also, if you have any questions or suggestions feel free to ask!

References

  1. Alammar, J (2018). The Illustrated Transformer [Blog post]. Retrieved from https://jalammar.github.io/illustrated-transformer/
  2. stanfordonline (2021) Stanford CS224N NLP with Deep Learning | Winter 2021 | Lecture 9 - Self- Attention and Transformers. [online video] Available at: https://www.youtube.com/watch?v=ptuGllU5SQQ
  3. Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and RΓ©mi Louf and Morgan Funtowicz and Jamie Brew (2019). HuggingFace's Transformers: State-of-the-art Natural Language Processing
  4. Hassan Sajjad and Fahim Dalvi and Nadir Durrani and Preslav Nakov (2020). Poor Man's BERT Smaller and Faster Transformer Models
  5. Angela Fan and Edouard Grave and Armand Joulin (2019). Reducing Transformer Depth on Demand with Structured Dropout
  6. Paul Michel and Omer Levy and Graham Neubig (2019). Are Sixteen Heads Really Better than One?
  7. Fangxiaoyu Feng and Yinfei Yang and Daniel Cer and Naveen Arivazhagan and Wei Wang (2020). Language-agnostic BERT Sentence Embedding
Owner
Jules Belveze
AI craftsman | NLP | MLOps
Jules Belveze
Official Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021)

TDEER 🦌 πŸ¦’ Official Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021) Overview TDEE

33 Dec 23, 2022
USAD - UnSupervised Anomaly Detection on multivariate time series

USAD - UnSupervised Anomaly Detection on multivariate time series Scripts and utility programs for implementing the USAD architecture. Implementation

116 Jan 04, 2023
Code for "PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation" CVPR 2019 oral

Good news! We release a clean version of PVNet: clean-pvnet, including how to train the PVNet on the custom dataset. Use PVNet with a detector. The tr

ZJU3DV 722 Dec 27, 2022
REBEL: Relation Extraction By End-to-end Language generation

REBEL: Relation Extraction By End-to-end Language generation This is the repository for the Findings of EMNLP 2021 paper REBEL: Relation Extraction By

Babelscape 222 Jan 06, 2023
Converts geometry node attributes to built-in attributes

Attribute Converter Simplifies converting attributes created by geometry nodes to built-in attributes like UVs or vertex colors, as a single click ope

Ivan Notaros 12 Dec 22, 2022
Codes and pretrained weights for winning submission of 2021 Brain Tumor Segmentation (BraTS) Challenge

Winning submission to the 2021 Brain Tumor Segmentation Challenge This repo contains the codes and pretrained weights for the winning submission to th

94 Dec 28, 2022
Code related to the manuscript "Averting A Crisis In Simulation-Based Inference"

Abstract We present extensive empirical evidence showing that current Bayesian simulation-based inference algorithms are inadequate for the falsificat

Montefiore Artificial Intelligence Research 3 Nov 14, 2022
Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift (ICCV 2021)

Ξ -NAS This repository provides the evaluation code of our submitted paper: Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training

Jiqi Zhang 18 Aug 18, 2022
Transformer model implemented with Pytorch

transformer-pytorch Transformer model implemented with Pytorch Attention is all you need-[Paper] Architecture Self-Attention self_attention.py class

Mingu Kang 12 Sep 03, 2022
minimizer-space de Bruijn graphs (mdBG) for whole genome assembly

rust-mdbg: Minimizer-space de Bruijn graphs (mdBG) for whole-genome assembly rust-mdbg is an ultra-fast minimizer-space de Bruijn graph (mdBG) impleme

Barış Ekim 148 Dec 01, 2022
Vanilla and Prototypical Networks with Random Weights for image classification on Omniglot and mini-ImageNet. Made with Python3.

vanilla-rw-protonets-project Vanilla Prototypical Networks and PNs with Random Weights for image classification on Omniglot and mini-ImageNet. Made wi

Giovani Candido 8 Aug 31, 2022
A repo for Causal Imitation Learning under Temporally Correlated Noise

CausIL A repo for Causal Imitation Learning under Temporally Correlated Noise. Running Experiments To re-train an expert, run: python experts/train_ex

Gokul Swamy 5 Nov 01, 2022
OneFlow is a performance-centered and open-source deep learning framework.

OneFlow OneFlow is a performance-centered and open-source deep learning framework. Latest News Version 0.5.0 is out! First class support for eager exe

OneFlow 4.2k Jan 07, 2023
Temporal Knowledge Graph Reasoning Triggered by Memories

MTDM Temporal Knowledge Graph Reasoning Triggered by Memories To alleviate the time dependence, we propose a memory-triggered decision-making (MTDM) n

4 Sep 25, 2022
NALSM: Neuron-Astrocyte Liquid State Machine

NALSM: Neuron-Astrocyte Liquid State Machine This package is a Tensorflow implementation of the Neuron-Astrocyte Liquid State Machine (NALSM) that int

Computational Brain Lab 4 Nov 28, 2022
Automatic deep learning for image classification.

AutoDL AutoDL automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few line

wenqi 2 Oct 12, 2022
A PyTorch implementation of "ANEMONE: Graph Anomaly Detection with Multi-Scale Contrastive Learning", CIKM-21

ANEMONE A PyTorch implementation of "ANEMONE: Graph Anomaly Detection with Multi-Scale Contrastive Learning", CIKM-21 Dependencies python==3.6.1 dgl==

Graph Analysis & Deep Learning Laboratory, GRAND 30 Dec 14, 2022
DSTC10 Track 2 - Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations

DSTC10 Track 2 - Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations This repository contains the data, scripts and baseline co

Alexa 51 Dec 17, 2022
Beginner-friendly repository for Hacktober Fest 2021. Start your contribution to open source through baby steps. πŸ’œ

Hacktober Fest 2021 πŸŽ‰ Open source is changing the world – one contribution at a time! πŸŽ‰ This repository is made for beginners who are unfamiliar wit

Abhilash M Nair 32 Dec 11, 2022
INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing

INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing Existing studies on semantic parsing focus primarily on mapping a natural-la

7 Aug 22, 2022