Big Bird: Transformers for Longer Sequences


Not an official Google product.

What is BigBird?

BigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to much longer sequences. Moreover, BigBird comes with a theoretical understanding of the capabilities of a complete transformer that the sparse model can handle.

As a consequence of the capability to handle longer context, BigBird drastically improves performance on various NLP tasks such as question answering and summarization.

More details and comparisons can be found in our presentation.

Citation

If you find this useful, please cite our NeurIPS 2020 paper:

@article{zaheer2020bigbird,
  title={Big bird: Transformers for longer sequences},
  author={Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon, Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

Code

The most important directory is core. There are three main files in core.

  • attention.py: Contains BigBird linear attention mechanism
  • encoder.py: Contains the main long sequence encoder stack
  • modeling.py: Contains packaged BERT and seq2seq transformer models with BigBird attention
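
For orientation, these three modules can be imported directly once the package is installed (see Installation and checkpoints below):

# The three core BigBird modules described above.
from bigbird.core import attention, encoder, modeling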

Colab/IPython Notebook

A quick fine-tuning demonstration for text classification is provided in imdb.ipynb.

Create GCP Instance

Please create a project first, then create an instance in a zone that has the required quota, as follows:

gcloud compute instances create \
  bigbird \
  --zone=europe-west4-a \
  --machine-type=n1-standard-16 \
  --boot-disk-size=50GB \
  --image-project=ml-images \
  --image-family=tf-2-3-1 \
  --maintenance-policy TERMINATE \
  --restart-on-failure \
  --scopes=cloud-platform

gcloud compute tpus create \
  bigbird \
  --zone=europe-west4-a \
  --accelerator-type=v3-32 \
  --version=2.3.1

gcloud compute ssh --zone "europe-west4-a" "bigbird"

For illustration we used the instance name bigbird and the zone europe-west4-a, but feel free to change them. More details about creating a Google Cloud TPU can be found in the online documentation.
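
The launch scripts in this repository handle TPU setup through flags, but as a minimal sketch (assuming TensorFlow 2.x on the VM and the example names used above), training code would resolve the TPU like this:

import tensorflow as tf

# Resolve the TPU created above by name ("bigbird" is the example name used here);
# zone and project are picked up from the VM metadata when running on the instance.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="bigbird")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
print("Number of TPU cores:", strategy.num_replicas_in_sync)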

Installation and checkpoints

git clone https://github.com/google-research/bigbird.git
cd bigbird
pip3 install -e .

You can find pretrained and fine-tuned checkpoints in our Google Cloud Storage Bucket.

Optionally, you can download them using gsutil as

mkdir -p bigbird/ckpt
gsutil cp -r gs://bigbird-transformer/ bigbird/ckpt/

The storage bucket contains:

  • pretrained BERT models in base (bigbr_base) and large (bigbr_large) sizes. They correspond to BERT/RoBERTa-like encoder-only models. Following the original BERT and RoBERTa implementations, they are transformers with post-normalization, i.e. layer norm happens after the attention layer. However, following Rothe et al., we can use them partially in encoder-decoder fashion by coupling the encoder and decoder parameters, as illustrated in the bigbird/summarization/roberta_base.sh launch script.
  • pretrained Pegasus encoder-decoder transformer in large size (bigbp_large). Again following the original Pegasus implementation, they are transformers with pre-normalization and have a full set of separate encoder and decoder weights. For the long-document summarization datasets, we have also converted the Pegasus checkpoints (model.ckpt-0) for each dataset and provided fine-tuned checkpoints (model.ckpt-300000) that work on longer documents.
  • fine-tuned tf.SavedModel for long-document summarization, which can be used directly for prediction and evaluation, as illustrated in the colab notebook and sketched below.
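
A rough sketch of loading one of those fine-tuned SavedModels for prediction; the local path and the serving signature's input structure are assumptions here, and the colab notebook remains the authoritative reference:

import tensorflow as tf

# Hypothetical local path to a downloaded long-document summarization SavedModel.
saved_model_path = "bigbird/ckpt/summarization/pubmed/saved_model"  # assumption
imported = tf.saved_model.load(saved_model_path, tags="serve")
summarize = imported.signatures["serving_default"]

# The exact input names/shapes of the signature are assumptions; inspect
# summarize.structured_input_signature to see the real ones before calling.
outputs = summarize(tf.constant(["A long input document to be summarized ..."]))
print(outputs)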

Running Classification

To get started quickly with BigBird, one can begin by running the classification experiment code in the classifier directory. To run the code, simply execute

export GCP_PROJECT_NAME=bigbird-project  # Replace by your project name
export GCP_EXP_BUCKET=gs://bigbird-transformer-training/  # Replace
sh -x bigbird/classifier/base_size.sh

Using BigBird Encoder instead of BERT/RoBERTa

To directly use the BigBird encoder instead of, say, a BERT model, we can use the following code.

from bigbird.core import modeling

bigb_encoder = modeling.BertModel(...)

It can easily replace BERT's encoder.
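
A minimal sketch of constructing the encoder with BigBird attention follows. The config keys are modeled on core/flags.py; any key not shown elsewhere in this document (e.g. max_encoder_length) is an assumption, and the full set of required hyperparameters should be checked against flags.py.

from bigbird.core import modeling

# Illustrative config; keys mirror the flags defined in core/flags.py.
bert_config = {
    "attention_type": "block_sparse",   # use BigBird sparse attention
    "block_size": 64,                   # size of each attention block
    "num_rand_blocks": 3,               # random blocks per row
    "max_encoder_length": 4096,         # assumption: flag name for max sequence length
    # ... remaining hyperparameters (hidden size, heads, layers, vocab size, etc.)
}

bigb_encoder = modeling.BertModel(bert_config)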

Alternatively, one can also try playing with the layers of the BigBird encoder:

from bigbird.core import encoder

only_layers = encoder.EncoderStack(...)

Understanding Flags & Config

All the flags and config are explained in core/flags.py. Here we explain some of the important config parameters.

attention_type is used to select the type of attention we would use. Setting it to block_sparse runs the BigBird attention module.

flags.DEFINE_enum(
    "attention_type", "block_sparse",
    ["original_full", "simulated_sparse", "block_sparse"],
    "Selecting attention implementation. "
    "'original_full': full attention from original bert. "
    "'simulated_sparse': simulated sparse attention. "
    "'block_sparse': blocked implementation of sparse attention.")

block_size is used to define the size of blocks, whereas num_rand_blocks sets the number of random blocks. The code currently uses a window size of 3 blocks and 2 global blocks. The current code only supports static tensors.
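
To make the pattern concrete, the sketch below builds a block-level attendance mask with a 3-block sliding window, 2 global blocks, and a few random blocks per row. It is only an illustration of the pattern described above, not the repository's actual implementation (see core/attention.py for that).

import numpy as np

def block_sparse_pattern(num_blocks, num_rand_blocks=2, window_blocks=3, num_global_blocks=2, seed=0):
    """Block-level illustration: entry [i, j] is 1 if query block i attends to key block j."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((num_blocks, num_blocks), dtype=np.int32)
    for i in range(num_blocks):
        # Sliding window of 3 blocks: previous, current and next block.
        lo = max(0, i - window_blocks // 2)
        hi = min(num_blocks, i + window_blocks // 2 + 1)
        mask[i, lo:hi] = 1
        # Global blocks: the first blocks attend everywhere and are attended from everywhere.
        mask[i, :num_global_blocks] = 1
        mask[:num_global_blocks, i] = 1
        # Random blocks: each query block also attends to a few random key blocks.
        rand = rng.choice(num_blocks, size=min(num_rand_blocks, num_blocks), replace=False)
        mask[i, rand] = 1
    return mask

print(block_sparse_pattern(num_blocks=8))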

Important points to note:

  • Hidden dimension should be divisible by the number of heads.
  • Currently the code only handles tensors of static shape, as it is primarily designed for TPUs, which only work with statically shaped tensors.
  • For sequence lengths less than 1024, using original_full is advised, as there is no benefit in using sparse BigBird attention; see the small helper sketched below.
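
Reflecting that last point, a small illustrative helper (the function name is hypothetical) for picking the attention type when assembling a config:

def choose_attention_type(max_encoder_length):
    # Below 1024 tokens sparse attention gives no benefit, so use full attention.
    return "original_full" if max_encoder_length < 1024 else "block_sparse"

print(choose_attention_type(512))   # -> original_full
print(choose_attention_type(4096))  # -> block_sparse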