You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

Related tags

Deep LearningYOSO
Overview

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

Transformer-based models are widely used in natural language processing (NLP). Central to the transformer model is the self-attention mechanism, which captures the interactions of token pairs in the input sequences and depends quadratically on the sequence length. Training such models on longer sequences is expensive. In this paper, we show that a Bernoulli sampling attention mechanism based on Locality Sensitive Hash- ing (LSH), decreases the quadratic complexity of such models to linear. We bypass the quadratic cost by considering self-attention as a sum of individual tokens associated with Bernoulli random variables that can, in principle, be sampled at once by a single hash (although in practice, this number may be a small constant). This leads to an efficient sampling scheme to estimate self-attention which relies on specific modifications of LSH (to enable deployment on GPU architectures).

Requirements

docker, nvidia-docker

Start Docker Container

Under YOSO folder, run

docker run --ipc=host --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES= -v "$PWD:/workspace" -it mlpen/transformers:4

For Nvidia's 30 series GPU, run

docker run --ipc=host --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES= -v "$PWD:/workspace" -it mlpen/transformers:5

Then, the YOSO folder is mapped to /workspace in the container.

BERT

Datasets

To be updated

Pre-training

To start pre-training of a specific configuration: create a folder YOSO/BERT/models/ (for example, bert-small) and write YOSO/BERT/models/ /config.json to specify model and training configuration, then under YOSO/BERT folder, run

python3 run_pretrain.py --model 
   

   

The command will create a YOSO/BERT/models/ /model folder holding all checkpoints and log file.

Pre-training from Different Model's Checkpoint

Copy a checkpoint (one of .model or .cp file) from YOSO/BERT/models/ /model folder to YOSO/BERT/models/ folder and add a key-value pair in YOSO/BERT/models/ /config.json : "from_cp": " " . One example is shown in YOSO/BERT/models/bert-small-4096/config.json. This procedure also works for extending the max sequence length of a model (For example, use bert-small pre-trained weights as initialization for bert-small-4096).

GLUE Fine-tuning

Under YOSO/BERT folder, run

python3 run_glue.py --model 
   
     --batch_size 
    
      --lr 
     
       --task 
      
        --checkpoint 
        
       
      
     
    
   

For example,

python3 run_glue.py --model bert-small --batch_size 32 --lr 3e-5 --task MRPC --checkpoint cp-0249.model

The command will create a log file in YOSO/BERT/models/ /model .

Long Range Arena Benchmark

Datasets

To be updated

Run Evaluations

To start evaluation of a specific model on a task in LRA benchmark:

  • Create a folder YOSO/LRA/models/ (for example, softmax)
  • Write YOSO/LRA/models/ /config.json to specify model and training configuration

Under YOSO/LRA folder, run

python3 run_task.py --model 
   
     --task 
    

    
   

For example, run

python3 run_task.py --model softmax --task listops

The command will create a YOSO/LRA/models/ /model folder holding the best validation checkpoint and log file. After completion, the test set accuracy can be found in the last line of the log file.

RoBERTa

Datasets

To be updated

Pre-training

To start pretraining of a specific configuration:

  • Create a folder YOSO/RoBERTa/models/ (for example, bert-small)
  • Write YOSO/RoBERTa/models/ /config.json to specify model and training configuration

Under YOSO/RoBERTa folder, run

python3 run_pretrain.py --model 
   

   

For example, run

python3 run_pretrain.py --model bert-small

The command will create a YOSO/RoBERTa/models/ /model folder holding all checkpoints and log file.

GLUE Fine-tuning

To fine-tune model on GLUE tasks:

Under YOSO/RoBERTa folder, run

python3 run_glue.py --model 
   
     --batch_size 
    
      --lr 
     
       --task 
      
        --checkpoint 
        
       
      
     
    
   

For example,

python3 run_glue.py --model bert-small --batch_size 32 --lr 3e-5 --task MRPC --checkpoint 249

The command will create a log file in YOSO/RoBERTa/models/ /model .

Citation

@article{zeng2021yoso,
  title={You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling},
  author={Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh},
  booktitle={Proceedings of the International Conference on Machine Learning},
  year={2021}
}
Owner
Zhanpeng Zeng
Zhanpeng Zeng
Semantically Contrastive Learning for Low-light Image Enhancement

Semantically Contrastive Learning for Low-light Image Enhancement Here, we propose an effective semantically contrastive learning paradigm for Low-lig

48 Dec 16, 2022
TilinGNN: Learning to Tile with Self-Supervised Graph Neural Network (SIGGRAPH 2020)

TilinGNN: Learning to Tile with Self-Supervised Graph Neural Network (SIGGRAPH 2020) About The goal of our research problem is illustrated below: give

59 Dec 09, 2022
Evaluating AlexNet features at various depths

Linear Separability Evaluation This repo provides the scripts to test a learned AlexNet's feature representation performance at the five different con

Yuki M. Asano 32 Dec 30, 2022
Discover hidden deepweb pages

DeepWeb Scapper Att: Demo version An simple script to scrappe deepweb to find pages. Will return if any of those exists and will save on a file. You s

Héber Júlio 77 Oct 02, 2022
DexterRedTool - Dexter's Red Team Tool that creates cronjob/task scheduler to consistently creates users

DexterRedTool Author: Dexter Delandro CSEC 473 - Spring 2022 This tool persisten

2 Feb 16, 2022
Tensorflow Implementation of ECCV'18 paper: Multimodal Human Motion Synthesis

MT-VAE for Multimodal Human Motion Synthesis This is the code for ECCV 2018 paper MT-VAE: Learning Motion Transformations to Generate Multimodal Human

Xinchen Yan 36 Oct 02, 2022
A model that attempts to learn and benefit from data collected on card counting.

A model that attempts to learn and benefit from data collected on card counting. A decision tree like model is built to win more often than loose and increase the bet of the player appropriately to c

1 Dec 17, 2021
Tooling for GANs in TensorFlow

TensorFlow-GAN (TF-GAN) TF-GAN is a lightweight library for training and evaluating Generative Adversarial Networks (GANs). Can be installed with pip

803 Dec 24, 2022
Offline Reinforcement Learning with Implicit Q-Learning

Offline Reinforcement Learning with Implicit Q-Learning This repository contains the official implementation of Offline Reinforcement Learning with Im

Ilya Kostrikov 126 Jan 06, 2023
Ejemplo Algoritmo Viterbi - Example of a Viterbi algorithm applied to a hidden Markov model on DNA sequence

Ejemplo Algoritmo Viterbi Ejemplo de un algoritmo Viterbi aplicado a modelo ocul

Mateo Velásquez Molina 1 Jan 10, 2022
Winners of DrivenData's Overhead Geopose Challenge

Winners of DrivenData's Overhead Geopose Challenge

DrivenData 22 Aug 04, 2022
Tianshou - An elegant PyTorch deep reinforcement learning library.

Tianshou (天授) is a reinforcement learning platform based on pure PyTorch. Unlike existing reinforcement learning libraries, which are mainly based on

Tsinghua Machine Learning Group 5.5k Jan 05, 2023
⚾🤖⚾ Automatic baseball pitching overlay in realtime

⚾ Automatically overlaying pitch motion and trajectory with machine learning! This project takes your baseball pitching clips and automatically genera

Tony Chou 240 Dec 05, 2022
Compare outputs between layers written in Tensorflow and layers written in Pytorch

Compare outputs of Wasserstein GANs between TensorFlow vs Pytorch This is our testing module for the implementation of improved WGAN in Pytorch Prereq

Hung Nguyen 72 Dec 20, 2022
[NeurIPS'21 Spotlight] PyTorch code for our paper "Aligned Structured Sparsity Learning for Efficient Image Super-Resolution"

ASSL This repository is for a new network pruning method (Aligned Structured Sparsity Learning, ASSL) for efficient single image super-resolution (SR)

Huan Wang 47 Nov 28, 2022
Stable Neural ODE with Lyapunov-Stable Equilibrium Points for Defending Against Adversarial Attacks

Stable Neural ODE with Lyapunov-Stable Equilibrium Points for Defending Against Adversarial Attacks Stable Neural ODE with Lyapunov-Stable Equilibrium

Kang Qiyu 8 Dec 12, 2022
[NeurIPS-2021] Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation

Efficient Graph Similarity Computation - (EGSC) This repo contains the source code and dataset for our paper: Slow Learning and Fast Inference: Effici

23 Nov 11, 2022
Differentiable simulation for system identification and visuomotor control

gradsim gradSim: Differentiable simulation for system identification and visuomotor control gradSim is a unified differentiable rendering and multiphy

105 Dec 18, 2022
(Py)TOD: Tensor-based Outlier Detection, A General GPU-Accelerated Framework

(Py)TOD: Tensor-based Outlier Detection, A General GPU-Accelerated Framework Background: Outlier detection (OD) is a key data mining task for identify

Yue Zhao 127 Jan 05, 2023
A tensorflow implementation of an HMM layer

tensorflow_hmm Tensorflow and numpy implementations of the HMM viterbi and forward/backward algorithms. See Keras example for an example of how to use

Zach Dwiel 283 Oct 19, 2022