Pytorch library for fast transformer implementations

Overview

Fast Transformers

Transformers are very successful models that achieve state of the art performance in many natural language tasks. However, it is very difficult to scale them to long sequences due to the quadratic scaling of self-attention.

This library was developed for our research on fast attention for transformers. You can find a list of our papers in the docs as well as related papers and papers that we have implemented.

Quick-start

The following code builds a transformer with softmax attention and one with linear attention and compares the time required by each to encode a sequence with 1000 elements.

import torch
from fast_transformers.builders import TransformerEncoderBuilder

# Create the builder for our transformers
builder = TransformerEncoderBuilder.from_kwargs(
    n_layers=8,
    n_heads=8,
    query_dimensions=64,
    value_dimensions=64,
    feed_forward_dimensions=1024
)

# Build a transformer with softmax attention
builder.attention_type = "full"
softmax_model = builder.get()

# Build a transformer with linear attention
builder.attention_type = "linear"
linear_model = builder.get()

# Construct the dummy input
X = torch.rand(10, 1000, 8*64)

# Prepare everythin for CUDA
X = X.cuda()
softmax_model.cuda()
softmax_model.eval()
linear_model.cuda()
linear_model.eval()

# Warmup the GPU
with torch.no_grad():
    softmax_model(X)
    linear_model(X)
torch.cuda.synchronize()

# Measure the execution time
softmax_start = torch.cuda.Event(enable_timing=True)
softmax_end = torch.cuda.Event(enable_timing=True)
linear_start = torch.cuda.Event(enable_timing=True)
linear_end = torch.cuda.Event(enable_timing=True)

with torch.no_grad():
    softmax_start.record()
    y = softmax_model(X)
    softmax_end.record()
    torch.cuda.synchronize()
    print("Softmax: ", softmax_start.elapsed_time(softmax_end), "ms")
    # Softmax: 144 ms (on a GTX1080Ti)

with torch.no_grad():
    linear_start.record()
    y = linear_model(X)
    linear_end.record()
    torch.cuda.synchronize()
    print("Linear: ", linear_start.elapsed_time(linear_end), "ms")
    # Linear: 68 ms (on a GTX1080Ti)

Dependencies & Installation

The fast transformers library has the following dependencies:

  • PyTorch
  • C++ toolchain
  • CUDA toolchain (if you want to compile for GPUs)

For most machines installation should be as simple as:

pip install --user pytorch-fast-transformers

Note: macOS users should ensure they have llvm and libomp installed. Using the homebrew package manager, this can be accomplished by running brew install llvm libomp.

Documentation

There exists a dedicated documentation site but you are also encouraged to read the source code.

Research

Ours

To read about the theory behind some attention implementations in this library we encourage you to follow our research.

  • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (2006.16236)
  • Fast Transformers with Clustered Attention (2007.04825)

If you found our research helpful or influential please consider citing

@inproceedings{katharopoulos_et_al_2020,
    author = {Katharopoulos, A. and Vyas, A. and Pappas, N. and Fleuret, F.},
    title = {Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention},
    booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
    year = {2020}
}

@article{vyas_et_al_2020,
    author={Vyas, A. and Katharopoulos, A. and Fleuret, F.},
    title={Fast Transformers with Clustered Attention},
    booktitle = {Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS)},
    year={2020}
}

By others

  • Efficient Attention: Attention with Linear Complexities (1812.01243)
  • Linformer: Self-Attention with Linear Complexity (2006.04768)
  • Reformer: The Efficient Transformer (2001.04451)

Support, License and Copyright

This software is distributed with the MIT license which pretty much means that you can use it however you want and for whatever reason you want. All the information regarding support, copyright and the license can be found in the LICENSE file in the repository.

Owner
Idiap Research Institute
Idiap Research Institute
HiPAL: A Deep Framework for Physician Burnout Prediction Using Activity Logs in Electronic Health Records

HiPAL Code for KDD'22 Applied Data Science Track submission -- HiPAL: A Deep Framework for Physician Burnout Prediction Using Activity Logs in Electro

Hanyang Liu 4 Aug 08, 2022
EgGateWayGetShell py脚本

EgGateWayGetShell_py 免责声明 由于传播、利用此文所提供的信息而造成的任何直接或者间接的后果及损失,均由使用者本人负责,作者不为此承担任何责任。 使用 python3 eg.py urls.txt 目标 title:锐捷网络-EWEB网管系统 port:4430 漏洞成因 ?p

榆木 61 Nov 09, 2022
This repo is the official implementation for Multi-Scale Adaptive Graph Neural Network for Multivariate Time Series Forecasting

1 MAGNN This repo is the official implementation for Multi-Scale Adaptive Graph Neural Network for Multivariate Time Series Forecasting. 1.1 The frame

SZJ 12 Nov 08, 2022
buildseg is a building extraction plugin of QGIS based on PaddlePaddle.

buildseg buildseg is a Building Extraction plugin for QGIS based on PaddlePaddle. How to use Download and install QGIS and clone the repo : git clone

39 Dec 09, 2022
My take on a practical implementation of Linformer for Pytorch.

Linformer Pytorch Implementation A practical implementation of the Linformer paper. This is attention with only linear complexity in n, allowing for v

Peter 349 Dec 25, 2022
3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks

3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks Introduction This repository contains the code and models for the follo

124 Jan 06, 2023
This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"

This is an official pytorch implementation of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv] Overview Content Prerequisites Data Prep

268 Jan 09, 2023
Mmdet benchmark with python

mmdet_benchmark 本项目是为了研究 mmdet 推断性能瓶颈,并且对其进行优化。 配置与环境 机器配置 CPU:Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz GPU:NVIDIA GeForce RTX 3080 10GB 内存:64G 硬盘:1T

杨培文 (Yang Peiwen) 24 May 21, 2022
CVNets: A library for training computer vision networks

CVNets: A library for training computer vision networks This repository contains the source code for training computer vision models. Specifically, it

Apple 1.1k Jan 03, 2023
When in Doubt: Improving Classification Performance with Alternating Normalization

When in Doubt: Improving Classification Performance with Alternating Normalization Findings of EMNLP 2021 Menglin Jia, Austin Reiter, Ser-Nam Lim, Yoa

Menglin Jia 13 Nov 06, 2022
When BERT Plays the Lottery, All Tickets Are Winning

When BERT Plays the Lottery, All Tickets Are Winning Large Transformer-based models were shown to be reducible to a smaller number of self-attention h

Sai 16 Nov 10, 2022
JAXDL: JAX (Flax) Deep Learning Library

JAXDL: JAX (Flax) Deep Learning Library Simple and clean JAX/Flax deep learning algorithm implementations: Soft-Actor-Critic (arXiv:1812.05905) Transf

Patrick Hart 4 Nov 27, 2022
RoBERTa Marathi Language model trained from scratch during huggingface 🤗 x flax community week

RoBERTa base model for Marathi Language (मराठी भाषा) Pretrained model on Marathi language using a masked language modeling (MLM) objective. RoBERTa wa

Nipun Sadvilkar 23 Oct 19, 2022
Reducing Information Bottleneck for Weakly Supervised Semantic Segmentation (NeurIPS 2021)

Reducing Information Bottleneck for Weakly Supervised Semantic Segmentation (NeurIPS 2021) The implementation of Reducing Infromation Bottleneck for W

Jungbeom Lee 81 Dec 16, 2022
Official implementation of "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer"

[AAAI2022] UCTransNet This repo is the official implementation of "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspectiv

Haonan Wang 199 Jan 03, 2023
TC-GNN with Pytorch integration

TC-GNN (Running Sparse GNN on Dense Tensor Core on Ampere GPU) Cite this project and paper. @inproceedings{TC-GNN, title={TC-GNN: Accelerating Spars

YUKE WANG 19 Dec 01, 2022
tree-math: mathematical operations for JAX pytrees

tree-math: mathematical operations for JAX pytrees tree-math makes it easy to implement numerical algorithms that work on JAX pytrees, such as iterati

Google 137 Dec 28, 2022
Converts geometry node attributes to built-in attributes

Attribute Converter Simplifies converting attributes created by geometry nodes to built-in attributes like UVs or vertex colors, as a single click ope

Ivan Notaros 12 Dec 22, 2022
Creating a custom CNN hypertunned architeture for the Fashion MNIST dataset with Python, Keras and Tensorflow.

custom-cnn-fashion-mnist Creating a custom CNN hypertunned architeture for the Fashion MNIST dataset with Python, Keras and Tensorflow. The following

Danielle Almeida 1 Mar 05, 2022
A Dataset for Direct Quotation Extraction and Attribution in News Articles.

DirectQuote - A Dataset for Direct Quotation Extraction and Attribution in News Articles DirectQuote is a corpus containing 19,760 paragraphs and 10,3

THUNLP-MT 9 Sep 23, 2022