
# awesome-fast-attention

A curated list of efficient attention modules (last update: Wed, 10 Mar 2021 23:52:22 +0000)

## Table of Contents

- [Efficient Attention](#efficient-attention)
- [Articles/Surveys/Benchmarks](#articlessurveysbenchmarks)

## Efficient Attention

| Paper (citations) | Implementation | AutoRegressive | Main Idea |
|---|---|:-:|---|
| Generating Wikipedia by Summarizing Long Sequences (282) | memory-compressed-attention | ✔️ | compresses key and value + blocked attention |
| CBAM: Convolutional Block Attention Module (999+) | attention-module | | combines SE attention with a per-pixel (local) weight |
| Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks (16) | set_transformer | | uses K relay nodes |
| CCNet: Criss-Cross Attention for Semantic Segmentation (296) | CCNet | | each pixel attends to its row and column simultaneously |
| Efficient Attention: Attention with Linear Complexities (16) | efficient-attention | | computes `softmax(Q)(softmax(K^T)V)` (see the linearized-attention sketch below) |
| Star-Transformer (40) | fastNLP | | uses a relay (global) node and attends to/from that node |
| GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond (199) | GCNet | | squeeze-and-excitation with attention pooling (instead of a GAP) |
| Generating Long Sequences with Sparse Transformers (257) | DeepSpeed | ✔️ | sparse block-based attention |
| SCRAM: Spatially Coherent Randomized Attention Maps (1) | - | ✔️ | uses PatchMatch to find close keys |
| Interlaced Sparse Self-Attention for Semantic Segmentation (24) | IN_PAPER | ✔️ | combination of a short-range and then a long-range (dilated) attention |
| Permutohedral Attention Module for Efficient Non-Local Neural Networks (3) | Permutohedral_attention_module | | uses a permutohedral lattice approximation algorithm to approximate the attention output |
| Large Memory Layers with Product Keys (43) | XLM | ✔️ | searches for nearest-neighbor keys |
| Expectation-Maximization Attention Networks for Semantic Segmentation (79) | EMANet | | applies expectation maximization to cluster keys into k clusters |
| BP-Transformer: Modelling Long-Range Context via Binary Partitioning (15) | BPT | ✔️ | attends to distant tokens coarsely and to close tokens in a more fine-grained manner |
| Compressive Transformers for Long-Range Sequence Modelling (48) | compressive-transformer-pytorch | ✔️ | compresses distant tokens instead of just `stop_grad()`-ing them; a more efficient version of Transformer-XL |
| Axial Attention in Multidimensional Transformers (36) | axial-attention | ✔️ | applies attention on each axis separately (see the axial-attention sketch below) |
| Reformer: The Efficient Transformer (216) | trax | ✔️ | uses LSH to find close keys (see the bucketing sketch below) |
| Sparse Sinkhorn Attention (16) | sinkhorn-transformer | ✔️ | uses a cost matrix to limit attention between buckets |
| Transformer on a Diet (2) | transformer-on-diet | ✔️ | dilated transformer, like WaveNet |
| Time-aware Large Kernel Convolutions (9) | TaLKConvolutions | ✔️ | calculates the mean over a dynamic subsequence around each token with the help of a summed-area table |
| SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection (2) | - | ✔️ | learns the q, k connections, i.e. dynamically creates a sparse attention matrix |
| Efficient Content-Based Sparse Attention with Routing Transformers (38) | routing-transformer | ✔️ | computes attention with same-cluster tokens (clusters computed by online k-means) |
| Neural Architecture Search for Lightweight Non-Local Networks (11) | AutoNL | | computes Q(KV) and also downsamples q, k, v in both the spatial and channel dimensions |
| Longformer: The Long-Document Transformer (159) | longformer | ✔️ | global + blocked attention (see the local window + global tokens sketch below) |
| ETC: Encoding Long and Structured Inputs in Transformers (16) | - | | combines global attention (Star-Transformer with multiple global tokens) with local attention |
| Multi-scale Transformer Language Models (2) | IN_PAPER | ✔️ | UNet-like structure + retina attention, which is close to BP-Transformer |
| Synthesizer: Rethinking Self-Attention in Transformer Models (26) | Synthesizer-Rethinking-Self-Attention-Transformer-Models | ✔️ | does not compute pairwise interactions |
| Jukebox: A Generative Model for Music (45) | jukebox | ✔️ | better attention patterns from Sparse Transformer |
| Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers (0) | - | ✔️ | does not compute pairwise interactions and uses fixed mask patterns |
| GMAT: Global Memory Augmentation for Transformers (2) | gmat | | adds global tokens |
| Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (45) | fast-transformers | ✔️ | uses `phi(q)(phi(k)v)` and also improves the sequential sampling step (see the linearized-attention sketch below) |
| Linformer: Self-Attention with Linear Complexity (47) | linformer-pytorch | | projects key and value from `n x d` to `k x d` (see the low-rank projection sketch below) |
| Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers (8) | google-research | ✔️ | calculates an unbiased stochastic approximation of the attention matrix |
| Kronecker Attention Networks (1) | kronecker-attention-pytorch | | uses horizontal and lateral average matrices |
| Real-time Semantic Segmentation with Fast Attention (5) | - | | computes `l2_norm(q)(l2_norm(k)v)` |
| Fast Transformers with Clustered Attention (6) | fast-transformers | | groups queries together with LSH |
| Big Bird: Transformers for Longer Sequences (60) | DeepSpeed | | ETC with random connections |
| Tensor Low-Rank Reconstruction for Semantic Segmentation (3) | - | | decomposes the full attention tensor into rank-one tensors (CP decomposition) |
| Looking for change? Roll the Dice and demand Attention (0) | IN_PAPER | | uses the fractal Tanimoto similarity to compare queries with keys inside the attention module |
| Rethinking Attention with Performers (30) | google-research | ✔️ | unbiased approximation of the attention matrix with a softmax kernel (see the random-feature sketch below) |
| Memformer: The Memory-Augmented Transformer (0) | memformer | ✔️ | attends to memory slots + Memory-Replay BackPropagation |
| SMYRF: Efficient Attention using Asymmetric Clustering (1) | smyrf | | LSH with balanced clusters |
| Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting (0) | Informer2020 | ✔️ | sparse attention + funnel-like encoder |
| Sub-Linear Memory: How to Make Performers SLiM (0) | google-research | ✔️ | Performer, but with sublinear memory usage |
| Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention (0) | Nystromformer | | uses the Nyström method to approximate the attention matrix |
| Linear Transformers Are Secretly Fast Weight Memory Systems (0) | fast-weight-transformers | ✔️ | shows that linear transformers are basically fast-weight networks + proposes a new kernel function to linearize attention, balancing simplicity and effectiveness |
| LambdaNetworks: Modeling Long-Range Interactions Without Attention (6) | lambda-networks | ✔️ | generates a linear layer based on context + decouples position and context |
| Random Feature Attention (2) | - | ✔️ | kernel approximation; also shows that transformers are RNNs |
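
### Reference sketches

A few of the recurring mechanisms in the table are easier to see in code. The snippets below are minimal, illustrative PyTorch sketches (no masking, dropout, or multi-head handling), not the implementations linked above; all function and parameter names are ours.

**Linearized attention** (Efficient Attention, Transformers are RNNs, Real-time Semantic Segmentation with Fast Attention). These rows all reorder `softmax(QK^T)V` into a feature map applied to Q and K separately, so the key-value product can be computed first and the cost drops from O(n²·d) to O(n·d²). The `elu(x) + 1` feature map below is the one used in the linear-attention paper; the other rows use `softmax` or `l2_norm` instead.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, feature_map=lambda x: F.elu(x) + 1):
    """Computes phi(Q) @ (phi(K)^T @ V) instead of softmax(Q K^T) @ V.

    q, k: (batch, n, d_k), v: (batch, n, d_v); no n x n attention matrix is formed.
    """
    q, k = feature_map(q), feature_map(k)                 # non-negative "scores"
    kv = torch.einsum("bnd,bne->bde", k, v)               # key-value summary, (batch, d_k, d_v)
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)   # per-query normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)      # (batch, n, d_v)

# toy usage: same output shape as vanilla attention, without the 1024 x 1024 matrix
q, k, v = (torch.randn(2, 1024, 64) for _ in range(3))
out = linear_attention(q, k, v)
```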
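
**Low-rank projection of keys and values** (Linformer). The row above says it "projects key and value from `n x d` to `k x d`"; the sketch below does exactly that with a single learned n-to-k projection `E`. In the paper the projections can be separate or shared across heads and layers; shapes and names here are illustrative.

```python
import torch
import torch.nn as nn

class LowRankAttention(nn.Module):
    """Linformer-style attention: keys/values are projected from length n to length k."""
    def __init__(self, n, d, k=256):
        super().__init__()
        self.to_q, self.to_k, self.to_v = (nn.Linear(d, d) for _ in range(3))
        self.E = nn.Parameter(torch.randn(n, k) / n ** 0.5)   # learned n -> k projection
        self.scale = d ** -0.5

    def forward(self, x):                                      # x: (batch, n, d)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        k = torch.einsum("bnd,nk->bkd", k, self.E)             # (batch, k, d)
        v = torch.einsum("bnd,nk->bkd", v, self.E)             # (batch, k, d)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)   # (batch, n, k)
        return attn @ v                                        # (batch, n, d)

out = LowRankAttention(n=1024, d=64)(torch.randn(2, 1024, 64))
```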
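
**Axial attention** (Axial Attention in Multidimensional Transformers; CCNet's criss-cross attention is similar in spirit). Instead of attending over all H·W positions at once, attention runs along each axis of the grid in turn. A bare-bones sketch with no projections or heads:

```python
import torch

def full_attention(x):
    """Plain softmax self-attention (queries = keys = values = x) over (..., n, d)."""
    scores = x @ x.transpose(-1, -2) / x.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ x

def axial_attention(x):
    """x: (batch, H, W, d). Attend along W (within each row), then along H (within each column)."""
    b, h, w, d = x.shape
    x = full_attention(x.reshape(b * h, w, d)).reshape(b, h, w, d)   # row-wise pass
    x = x.transpose(1, 2).reshape(b * w, h, d)                       # bring the H axis forward
    x = full_attention(x).reshape(b, w, h, d).transpose(1, 2)        # column-wise pass
    return x                                                         # cost O(HW(H+W)) vs O((HW)^2)

out = axial_attention(torch.randn(2, 32, 32, 64))
```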
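
**Local window + global tokens** (Longformer, ETC, GMAT, Big Bird). The pattern is easiest to see as a boolean mask: a band around the diagonal plus a few rows/columns that are fully connected (Big Bird additionally adds random connections). Efficient implementations compute the banded part directly rather than materializing an n×n matrix, so the snippet below only illustrates the pattern.

```python
import torch

def local_global_mask(n, window=4, n_global=2):
    """Boolean (n, n) mask: True where attention is allowed.

    Every token sees `window` neighbours on each side; the first `n_global`
    tokens attend everywhere and are attended to by everyone.
    """
    idx = torch.arange(n)
    mask = (idx[None, :] - idx[:, None]).abs() <= window   # banded / sliding window
    mask[:n_global, :] = True                               # global tokens attend everywhere
    mask[:, :n_global] = True                               # everyone attends to global tokens
    return mask

def masked_attention(q, k, v, mask):
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

n, d = 16, 8
q = k = v = torch.randn(1, n, d)
out = masked_attention(q, k, v, local_global_mask(n))
```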
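
**Random-feature (softmax kernel) attention** (the Performer rows, Masked Language Modeling for Proteins, Random Feature Attention). The softmax kernel `exp(q.k)` is approximated by positive random features `phi(x) = exp(w.x - ||x||^2 / 2)` with `w ~ N(0, I)`, after which the linearized-attention reordering above applies. This sketch draws a single Gaussian projection; FAVOR+ additionally orthogonalizes and periodically redraws it.

```python
import torch

def softmax_kernel_features(x, projection):
    """Positive random features so that phi(q) @ phi(k)^T approximates exp(q @ k^T).

    x: (batch, n, d), projection: (m, d) with rows drawn from N(0, I).
    """
    m = projection.shape[0]
    proj = x @ projection.t()                             # (batch, n, m)
    sq_norm = (x ** 2).sum(dim=-1, keepdim=True) / 2      # ||x||^2 / 2
    return torch.exp(proj - sq_norm) / m ** 0.5

def performer_attention(q, k, v, n_features=256):
    d = q.shape[-1]
    omega = torch.randn(n_features, d)                    # shared random projection
    q, k = q * d ** -0.25, k * d ** -0.25                 # fold in the 1/sqrt(d) scaling
    qp, kp = softmax_kernel_features(q, omega), softmax_kernel_features(k, omega)
    kv = torch.einsum("bnm,bnd->bmd", kp, v)              # feature-value summary
    z = 1.0 / (torch.einsum("bnm,bm->bn", qp, kp.sum(dim=1)) + 1e-6)
    return torch.einsum("bnm,bmd,bn->bnd", qp, kv, z)

out = performer_attention(torch.randn(2, 512, 64), torch.randn(2, 512, 64), torch.randn(2, 512, 64))
```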
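
**Content-based bucketing** (Reformer, SMYRF, Routing Transformer, Fast Transformers with Clustered Attention). Attention is restricted to tokens that fall in the same bucket or cluster. The sketch shows only a Reformer-style bucket assignment via random rotations; the full methods also sort by bucket, chunk, and attend within (and across adjacent) chunks, or use k-means instead of hashing.

```python
import torch

def lsh_buckets(x, n_buckets=8):
    """Assign each vector to a bucket via a random rotation (angular LSH).

    x: (batch, n, d) -> bucket ids (batch, n) with values in [0, n_buckets).
    """
    d = x.shape[-1]
    proj = torch.randn(d, n_buckets // 2)                 # random projection directions
    rotated = x @ proj                                     # (batch, n, n_buckets / 2)
    # bucket = argmax over the concatenation [xR, -xR], as in Reformer's hashing scheme
    return torch.cat([rotated, -rotated], dim=-1).argmax(dim=-1)

buckets = lsh_buckets(torch.randn(2, 1024, 64))
# attention would then be computed only among tokens sharing a bucket id
```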

## Articles/Surveys/Benchmarks

## Owner

Sepehr Sameni, PhD Candidate at the University of Bern, Computer Vision Group