Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"

BP-Transformer

This repo contains the code for our paper

BP-Transformer: Modelling Long-Range Context via Binary Partitioning

Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang

The code is written in DGL with PyTorch as the backend.

Requirements

  • torchtext 0.4
  • dgl 0.4 (the code on the master branch is not compatible with dgl 0.5; please check out the develop branch for a dgl 0.5 compatible version)
  • yaml
  • spacy
  • PyTorch 1.1+

Usage

For multi-GPU training, please export NCCL_LL_THRESHOLD=0 before running the scripts, to work around a PyTorch bug.

The codebase has two dependencies: graph_kernel and graph_builder. The first provides efficient graph attention on GPU, written in CUDA with a node-parallel strategy. The second provides efficient graph construction, written in Cython. To install them:

cd graph_builder
python setup.py install
cd ..
cd graph_kernel
python setup.py install
cd ..

We support the following tasks with BPT as backbone:

  • Text Classification: text_classification.py
  • Language Modeling: lm.py
  • Machine Translation: mt.py
  • Natural Language Inference: nli.py

All experiment settings mentioned in our paper are available at configs/.

python *.py --config configs/*.yml --gpu [GPUs]
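
As a rough illustration of the command shape above, the sketch below parses a --config path and a comma-separated --gpu list with argparse. It is not the repo's actual argument handling: the function name parse_args, the help strings, and the choice to convert the GPU list to integers are assumptions made for this example.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser mirroring the --config / --gpu flags used in this README."""
    parser = argparse.ArgumentParser(description="BPT experiment launcher (illustrative sketch)")
    parser.add_argument("--config", required=True, help="path to a YAML file under configs/")
    parser.add_argument(
        "--gpu",
        required=True,
        # Assumption: GPU ids are passed as a comma-separated list, e.g. "0,1,2,3".
        type=lambda s: [int(x) for x in s.split(",")],
        help="comma-separated GPU ids",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--config", "configs/imdb-4.yml", "--gpu", "0,1"])
    print(args.config, args.gpu)
```

Loading the YAML file itself would additionally need a YAML parser such as the yaml package listed under Requirements.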

Note that this repo does not contain any data files. To get the dataset required for an experiment, run . get_*.sh and the corresponding dataset will be downloaded and preprocessed.

For machine translation, we have another script mt_infer.py for decoding:

python mt_infer.py --config configs/*.yml --gpu [GPU]

Before decoding, please make sure you have finished training with mt.py using the same config file.

NOTE: Currently we do not support CPU training/inference.

Visualization

The following is a visualization of the sparse matrix of BPT's underlying graph when the sequence length is 8192 and k is 4.

[image]
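
The sparsity in the visualization above comes from each token attending to fine-grained nodes nearby and to coarser nodes farther away. The sketch below is a simplified, pure-Python approximation of that idea, not the graph_builder implementation. The function name context_spans, the rule of taking at least k spans per scale before doubling the span size, and the power-of-two length restriction are all assumptions made for illustration; the paper's exact edge construction may differ.

```python
def context_spans(i, n, k):
    """Spans (start, end) that token i sees: small near i, doubling in size farther away.

    Assumes n is a power of two so every span is a node of a balanced binary tree.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    spans = [(i, i + 1)]  # the token itself

    # Walk right from position i + 1.
    cursor, size = i + 1, 1
    while cursor < n:
        taken = 0
        # Take at least k spans of this size, then keep going until the
        # cursor is aligned to the next (doubled) span size.
        while cursor + size <= n and (taken < k or cursor % (2 * size)):
            spans.append((cursor, cursor + size))
            cursor += size
            taken += 1
        size *= 2

    # Walk left from position i with the mirrored rule.
    cursor, size = i, 1
    while cursor > 0:
        taken = 0
        while cursor - size >= 0 and (taken < k or cursor % (2 * size)):
            spans.append((cursor - size, cursor))
            cursor -= size
            taken += 1
        size *= 2

    return sorted(spans)
```

For example, context_spans(0, 8, 1) returns [(0, 1), (1, 2), (2, 4), (4, 8)]: the token itself, one neighbouring token, then spans of 2 and 4 tokens. Under these assumptions each token connects to a number of spans that grows with k times log2(n) rather than with n, which is why the matrix is sparse.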

Results

  • Character-Level Language Modeling (enwik8, metric: bpc), 12 layers.
    • BPT(context length=8192): 1.02
    • Adaptive Transformer: 1.02
    • Transformer-XL: 1.06
    • To reproduce: python lm.py --config configs/enwik8-8192.yml --gpu 0,1,2,3,4,5,6,7
  • Document-Level Machine Translation (IWSLT 2015 Zh-En, metric: BLEU), base setting.
    • BPT(context length=64): 19.84
    • HAN-NMT: 17.68
    • To reproduce: python mt.py --config configs/iwslt-4-64.yml --gpu 0
  • Text Classification (IMDB, metric: accuracy), 5 layers.
    • BPT+GloVe: 92.12(±0.11)
    • LSTM+CoVe: 91.8
    • Transformer+GloVe: 89.24(±0.20)
    • Star Transformer: 90.50
    • To reproduce: python text_classification.py --config configs/imdb-4.yml --gpu 0
      • Note that our CUDA kernel uses atomic operations, which may result in non-determinism. We report the mean and standard deviation of accuracy over multiple (10) runs.
      • The IMDB dataset has no official train/dev split. We follow the setting of Bryan et al., 2017 and hold out 10% of the samples for validation. We report the test accuracy of the model with the best validation loss.
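
The "mean(±std)" figures above can be computed from per-run accuracies along the following lines. This is a minimal sketch: the helper name summarize_runs is hypothetical, the input values are placeholders rather than results from the paper, and the use of the sample standard deviation is an assumption, since the README does not say which estimator was used.

```python
import statistics


def summarize_runs(accuracies):
    """Format repeated-run accuracies as 'mean(±std)'.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)
    return f"{mean:.2f}(±{std:.2f})"


# Placeholder values for illustration only, not results from the paper.
example_runs = [90.0, 91.0, 92.0]
print(summarize_runs(example_runs))
```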

For sentence-level modeling, we show that BPT provides a better inductive bias than the vanilla Transformer by attending to fine-grained features of neighboring tokens and coarse-grained features of far-away tokens.

  • Machine Translation (WMT14 En-De, metric: BLEU), base setting.
    • BPT(k=1): 26.9
    • BPT(k=2): 27.4
    • BPT(k=4): 27.6
    • BPT(k=8): 26.7
    • Transformer-base(our implementation): 27.2
    • To reproduce: python mt.py --config configs/wmt-*.yml --gpu 0,1,2,3,4,5,6,7
      • We report SacreBLEU results for reproducibility (setting: BLEU+c.mixed+l.en-de+#.1+s.exp+t.wmt14+tok.intl+v.1.4.1). The SacreBLEU score is usually lower than the one produced by the get_ende_bleu.sh script in tensor2tensor.
  • Natural Language Inference (SNLI, metric: accuracy), ESIM-like structure, 3 layers for self-attention and 3 layers for cross-sentence attention.
    • BPT(k=4): 88.25(±0.07)
    • Transformer: 87.89(±0.31)
    • To reproduce: python nli.py --config configs/snli.yml --gpu 0
      • As with text classification, the NLI result is not stable because of randomness in our CUDA kernel. We report the mean and standard deviation of accuracy over multiple (7) runs.
  • Text Classification (SST-5, metric: accuracy), 4 layers.
    • BPT+GloVe: 52.71(±0.32)
    • Transformer+GloVe: 50.40
    • Tree-LSTM+GloVe: 51.0
    • To reproduce: python text_classification.py --config configs/sst5-2.yml --gpu 0
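
The SacreBLEU setting string quoted in the WMT14 note packs several options into one signature. The sketch below unpacks it using only the standard library. The helper name parse_sacrebleu_signature is hypothetical, and the key meanings in KEY_NAMES reflect our understanding of SacreBLEU 1.x signatures rather than anything stated in this README.

```python
def parse_sacrebleu_signature(signature):
    """Split a SacreBLEU 1.x signature string into (metric, settings)."""
    metric, *fields = signature.split("+")
    settings = {}
    for field in fields:
        key, _, value = field.partition(".")  # split on the first dot only
        settings[key] = value
    return metric, settings


# Assumed meanings of the short keys in SacreBLEU 1.x signatures.
KEY_NAMES = {
    "c": "case", "l": "language pair", "#": "number of references",
    "s": "smoothing", "t": "test set", "tok": "tokenizer", "v": "version",
}

metric, settings = parse_sacrebleu_signature(
    "BLEU+c.mixed+l.en-de+#.1+s.exp+t.wmt14+tok.intl+v.1.4.1"
)
for key, value in settings.items():
    print(f"{KEY_NAMES.get(key, key)}: {value}")
```

Read this way, the reported scores use mixed-case BLEU on the WMT14 En-De test set with one reference, exponential smoothing, and the international tokenizer, computed with SacreBLEU 1.4.1.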

TODOs

  • FP16 support (mixed-precision training/inference)
  • Integrate kernels with dgl 0.5
  • CPU support