(ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models.

Overview

BERT Convolutions

Code for the paper Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models. Contains experiments for integrating convolutions and self-attention in BERT models. Code is adapted from Huggingface Transformers. Model code is in src/transformers/modeling_bert.py. Run on Python 3.6.9 and Pytorch 1.7.1 (see requirements.txt).

Training

To train tokenizer, use custom_scripts/train_spm_tokenizer.py. To pre-train BERT with a plain text dataset:

python3 run_language_modeling.py \
--model_type=bert \
--tokenizer_name="./data/sentencepiece/spm.model" \
--config_name="./data/bert_base_config.json" \
--do_train --mlm --line_by_line \
--train_data_file="./data/training_text.txt" \
--per_device_train_batch_size=32 \
--save_steps=25000 \
--block_size=128 \
--max_steps=1000000 \
--warmup_steps=10000 \
--learning_rate=0.0001 --adam_epsilon=1e-6 --weight_decay=0.01 \
--output_dir="./bert-experiments/bert"

The code above produces a cached file of examples (a list of lists of token indices). Each example is an un-truncated and un-padded sentence pair (but includes [CLS] and [SEP] tokens). Convert these lists to an iterable text file using custom_scripts/shuffle_cached_dataset.py. Then, you can pre-train BERT using an iterable dataset (saving memory):

python3 run_language_modeling.py \
--model_type=bert \
--tokenizer_name="./data/sentencepiece/spm.model" \
--config_name="./data/bert_base_config.json" \
--do_train --mlm --train_iterable --line_by_line \
--train_data_file="./data/iterable_pairs_train.txt" \
--per_device_train_batch_size=32 \
--save_steps=25000 \
--block_size=128 \
--max_steps=1000000 \
--warmup_steps=10000 \
--learning_rate=0.0001 --adam_epsilon=1e-6 --weight_decay=0.01 \
--output_dir="./bert-experiments/bert"

Optional flags to change BERT architecture when pre-training from scratch:
In the following, qk uses query/key self-attention, convfixed is a fixed lightweight convolution, convq is query-based dynamic lightweight convolution (relative embeddings), convk is a key-based dynamic lightweight convolution, and convolution is a fixed depthwise convolution.

--attention_kernel="qk_convfixed_convq_convk [num_positions_each_dir]"

Remove absolute position embeddings:

--remove_position_embeddings

Convolutional values, using depthwise-separable (depth) convolutions for half of heads (mixed), and using no activation function (no_act) between the depthwise and pointwise convolutions:

--value_forward="convolution_depth_mixed_no_act [num_positions_each_dir] [num_convolution_groups]"

Convolutional queries/keys for half of heads:

--qk="convolution_depth_mixed_no_act [num_positions_each_dir] [num_convolution_groups]"

Fine-tuning

Training and evaluation for downstream GLUE tasks (note: batch size represents max batch size, because batch size is adjusted for each task):

python3 run_glue.py \
--data_dir="./glue-data/data-tsv" \
--task_name=ALL \
--save_steps=9999999 \
--max_seq_length 128 \
--per_device_train_batch_size 99999 \
--tokenizer_name="./data/sentencepiece/spm.model" \
--model_name_or_path="./bert-experiments/bert" \
--output_dir="./bert-experiments/bert-glue" \
--hyperparams="electra_base" \
--do_eval \
--do_train

Prediction

Run the fine-tuned models on the GLUE test set:
This adds a file with test set predictions to each GLUE task directory.

python3 run_glue.py \
--data_dir="./glue-data/data-tsv" \
--task_name=ALL \
--save_steps=9999999 \
--max_seq_length 128 \
--per_device_train_batch_size 99999 \
--tokenizer_name="./data/sentencepiece/spm.model" \
--model_name_or_path="./bert-experiments/placeholder" \
--output_dir="./bert-experiments/bert-glue" \
--hyperparams="electra_base" \
--do_predict

Then, test results can be compiled into one directory. The test_results directory will contain test predictions, using the fine-tuned model with the highest dev set score for each task. The files in test_results can be zipped and submitted to the GLUE benchmark site for evaluation.

python3 custom_scripts/parse_glue.py \
--input="./bert-experiments/bert-glue" \
--test_dir="./bert-experiments/bert-glue/test_results"

Citation

@inproceedings{chang-etal-2021-convolutions,
  title={Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models},
  author={Tyler Chang and Yifan Xu and Weijian Xu and Zhuowen Tu},
  booktitle={ACL-IJCNLP 2021},
  year={2021},
}
Owner
mlpc-ucsd
mlpc-ucsd
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

Natural Language Processing @UCLA 464 Jan 04, 2023
Just Another Telegram Ai Chat Bot Written In Python With Pyrogram.

OkaeriChatBot Just another Telegram AI chat bot written in Python using Pyrogram. Requirements Python 3.7 or higher.

Wahyusaputra 2 Dec 23, 2021
Accurately generate all possible forms of an English word e.g "election" --> "elect", "electoral", "electorate" etc.

Accurately generate all possible forms of an English word Word forms can accurately generate all possible forms of an English word. It can conjugate v

Dibya Chakravorty 570 Dec 31, 2022
This repository has a implementations of data augmentation for NLP for Japanese.

daaja This repository has a implementations of data augmentation for NLP for Japanese: EDA: Easy Data Augmentation Techniques for Boosting Performance

Koga Kobayashi 60 Nov 11, 2022
Learn meanings behind words is a key element in NLP. This project concentrates on the disambiguation of preposition senses. Therefore, we train a bert-transformer model and surpass the state-of-the-art.

New State-of-the-Art in Preposition Sense Disambiguation Supervisor: Prof. Dr. Alexander Mehler Alexander Henlein Institutions: Goethe University TTLa

Dirk Neuhäuser 4 Apr 06, 2022
Text to speech for Vietnamese, ez to use, ez to update

Chào mọi người, đây là dự án mở nhằm giúp việc đọc được trở nên dễ dàng hơn. Rất cảm ơn đội ngũ Zalo đã cung cấp hạ tầng để mình có thể tạo ra app này

Trần Cao Minh Bách 32 Jul 29, 2022
Gold standard corpus annotated with verb-preverb connections for Hungarian.

Hungarian Preverb Corpus A gold standard corpus manually annotated with verb-preverb connections for Hungarian. corpus The corpus consist of the follo

RIL Lexical Knowledge Representation Research Group 3 Jan 27, 2022
A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. IMPORTANT: (30.08.2020) We moved our models

flair 12.3k Dec 31, 2022
This repository is home to the Optimus data transformation plugins for various data processing needs.

Transformers Optimus's transformation plugins are implementations of Task and Hook interfaces that allows execution of arbitrary jobs in optimus. To i

Open Data Platform 37 Dec 14, 2022
Repo for Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization

ESACL: Enhanced Seq2Seq Autoencoder via Contrastive Learning for AbstractiveText Summarization This repo is for our paper "Enhanced Seq2Seq Autoencode

Rachel Zheng 14 Nov 01, 2022
Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor

Michael Petrochuk 2.1k Jan 01, 2023
A paper list of pre-trained language models (PLMs).

Large-scale pre-trained language models (PLMs) such as BERT and GPT have achieved great success and become a milestone in NLP.

RUCAIBox 124 Jan 02, 2023
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

T5: Text-To-Text Transfer Transformer The t5 library serves primarily as code for reproducing the experiments in Exploring the Limits of Transfer Lear

Google Research 4.6k Jan 01, 2023
BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions

BERTopic BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable

Maarten Grootendorst 3.6k Jan 07, 2023
Download videos from YouTube/Twitch/Twitter right in the Windows Explorer, without installing any shady shareware apps

youtube-dl and ffmpeg Windows Explorer Integration Download videos from YouTube/Twitch/Twitter and more (any platform that is supported by youtube-dl)

Wolfgang 226 Dec 30, 2022
Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark.

The KLEJ Benchmark Baselines The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for the Polish language und

Allegro Tech 17 Oct 18, 2022
Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

English|简体中文 ERNIE是百度开创性提出的基于知识增强的持续学习语义理解框架,该框架将大数据预训练与多源丰富知识相结合,通过持续学习技术,不断吸收海量文本数据中词汇、结构、语义等方面的知识,实现模型效果不断进化。ERNIE在累积 40 余个典型 NLP 任务取得 SOTA 效果,并在 G

5.4k Jan 03, 2023
Twewy-discord-chatbot - Build a Discord AI Chatbot that Speaks like Your Favorite Character

Build a Discord AI Chatbot that Speaks like Your Favorite Character! This is a Discord AI Chatbot that uses the Microsoft DialoGPT conversational mode

Lynn Zheng 231 Dec 30, 2022
Exploring dimension-reduced embeddings

sleepwalk Exploring dimension-reduced embeddings This is the code repository. See here for the Sleepwalk web page. License and disclaimer This program

S. Anders's research group at ZMBH 91 Nov 29, 2022
Graphical user interface for Argos Translate

Argos Translate GUI Website | GitHub | PyPI Graphical user interface for Argos Translate. Install pip3 install argostranslategui

Argos Open Tech 16 Dec 07, 2022