BERTAC (BERT-style transformer-based language model with Adversarially pretrained Convolutional neural network)

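BERTAC is a framework that combines a Transformer-based Language Model (TLM) such as BERT with an adversarially pretrained CNN (Convolutional Neural Network). It was proposed in our ACL-IJCNLP 2021 paper, "BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks" (see the Citation section).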

We showed in our experiments that BERTAC can improve the performance of TLMs on GLUE and open-domain QA tasks when using ALBERT or RoBERTa as the base TLM.

This repository provides the source code for BERTAC and adversarially pretrained CNN models described in the ACL-IJCNLP 2021 paper.

You can download the code and CNN models by following the procedure described in the "Try BERTAC" section. The procedure covers downloading the BERTAC code, installing the libraries required to run it, and downloading the pretrained fastText word embedding vectors, the ALBERT xxlarge model, and our adversarially pretrained CNNs. The CNNs provided here were pretrained using the settings described in our ACL-IJCNLP 2021 paper. They can be downloaded automatically by running the script download_pretrained_model.sh, as described in the "Try BERTAC" section, or manually from the following page: cnn_models/README.md.

After this is done, you can run the GLUE and open-domain QA experiments from the ACL-IJCNLP 2021 paper by following the procedures described in examples/GLUE/README.md and examples/QA/README.md. Each procedure starts with downloading the datasets (GLUE, or Quasar-T and SearchQA for open-domain QA), then covers preprocessing the data and training and evaluating BERTAC models.

Overview of BERTAC

BERTAC is designed to improve Transformer-based Language Models such as ALBERT and BERT by integrating a simple CNN into them. The CNN is pretrained in a GAN (Generative Adversarial Network) style using Wikipedia data. Its training data consists of sentences in which an entity has been masked in a cloze-test style, so the CNN learns to generate alternative entity representations from sentences. BERTAC aims to improve TLMs on a variety of downstream tasks by using multiple text representations computed from different perspectives, i.e., those of TLMs trained by masked language modeling and those of CNNs trained in a GAN style to generate entity representations.
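To make the cloze-style masking concrete, the following is a minimal sketch of how one such training example could be built from a sentence and an entity mention. It is an illustration only, not the preprocessing used in the paper: the function name make_cloze_example and the mask token "[EM]" are hypothetical and do not come from the BERTAC code.

```python
def make_cloze_example(sentence, entity, mask_token="[EM]"):
    """Replace the first mention of `entity` in `sentence` with a mask token.

    Returns (masked_sentence, entity). The mask token "[EM]" is a
    hypothetical placeholder chosen for this sketch.
    """
    if entity not in sentence:
        raise ValueError("entity does not occur in the sentence")
    # Mask only the first occurrence so the target entity is unambiguous.
    masked = sentence.replace(entity, mask_token, 1)
    return masked, entity


if __name__ == "__main__":
    masked, target = make_cloze_example(
        "Marie Curie discovered polonium.", "Marie Curie"
    )
    print(masked)   # the sentence with the entity masked out
    print(target)   # the entity whose representation is to be generated
```

In this picture, the masked sentence is the CNN's input and the masked-out entity is what the generated representation should stand in for.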

For a technical description of BERTAC, see our paper listed in the Citation section.

Try BERTAC

Prerequisites

BERTAC requires the following libraries and tools at runtime.

  • CUDA: A CUDA runtime must be available in the runtime environment. Currently, BERTAC has been tested with CUDA 10.1 and 10.2.
  • Python and PyTorch: BERTAC has been tested with Python 3.6 and 3.8, and PyTorch 1.5.1 and 1.8.1.
  • Perl: BERTAC has been tested with Perl 5.16.1 and 5.26.2.

Installation

You can install BERTAC by following the procedure described below.

  • Create a new conda environment named bertac using the following command. If needed, pin cudatoolkit to a CUDA version available in your environment (for example, cudatoolkit=10.2).
conda create -n bertac python=3.8 tqdm requests scikit-learn cudatoolkit cudnn lz4
  • Install PyTorch into the conda environment.
conda activate bertac
conda install -n bertac pytorch=1.8 -c pytorch
  • Clone the BERTAC repository and run pip install -r requirements.txt in its root directory.
# git clone the code
git clone https://github.com/nict-wisdom/bertac
cd bertac

# Install requirements
pip install -r requirements.txt
  • Download the spaCy model en_core_web_md.
# Download the spaCy model 'en_core_web_md' 
python -m spacy download en_core_web_md
  • Install Perl and its JSON module into the conda environment.
# Install Perl and its JSON module
conda install -c anaconda perl -n bertac
cpan install JSON
  • Download the pretrained CNN models, the fastText word embedding vectors, and the ALBERT xxlarge model (albert-xxlarge-v2).
# Download pretrained CNN models, the fastText word embedding vectors, and
# the ALBERT xxlarge model (albert-xxlarge-v2)
sh download_pretrained_model.sh

Note: the BERTAC code is built on HuggingFace Transformers v2.4.1 and, like that library, requires NVIDIA apex. Please install NVIDIA apex by following the procedure described on the NVIDIA apex page.
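Once the steps above are done, a quick sanity check can confirm that the main pieces are visible from the bertac environment. The script below is a minimal sketch, not part of the BERTAC repository: the function check_environment is hypothetical, and the module names in REQUIRED_MODULES are assumptions based on the packages named in this section. It only tests whether each module can be located and whether perl is on the PATH; it does not verify versions or CUDA.

```python
import importlib.util
import shutil
import sys

# Assumed importable names for the packages mentioned in this section.
REQUIRED_MODULES = ["torch", "spacy", "en_core_web_md", "apex"]


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each requirement to True (found) or False."""
    # BERTAC is reported as tested with Python 3.6 and 3.8.
    report = {"python>=3.6": sys.version_info >= (3, 6)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    # Perl is needed at runtime; check that an executable is on the PATH.
    report["perl"] = shutil.which("perl") is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'found' if ok else 'MISSING'}")
```

Run it with the bertac environment activated. Any item reported as MISSING points to the installation step that should be repeated.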

To run the GLUE or open-domain QA experiments, change into the examples/GLUE or examples/QA folder and run the bash commands provided there (see examples/GLUE/README.md and examples/QA/README.md for details on the experimental procedures).

GLUE experiments

You can run GLUE experiments by following the procedure described in examples/GLUE/README.md.

Results

The performance of BERTAC and the baseline models on the GLUE development set is shown below.

Models MNLI QNLI QQP RTE SST MRPC CoLA STS Avg.
RoBERTa-large 90.2/90.2 94.7 92.2 86.6 96.4 90.9 68.0 92.4 88.9
ELECTRA-large 90.9/- 95.0 92.4 88.0 96.9 90.8 69.1 92.6 89.5
ALBERT-xxlarge 90.8/- 95.3 92.2 89.2 96.9 90.9 71.4 93.0 90.0
DeBERTa-large 91.1/91.1 95.3 92.3 88.3 96.8 91.9 70.5 92.8 90.0
BERTAC(ALBERT-xxlarge) 91.3/91.1 95.7 92.3 89.9 97.2 92.4 73.7 93.1 90.7

BERTAC(ALBERT-xxlarge), i.e., BERTAC using ALBERT-xxlarge as its base TLM, achieved a higher average score (the Avg. column, last in the table) than both (1) ALBERT-xxlarge (the base TLM) and (2) DeBERTa-large (the state-of-the-art method for the GLUE development set).
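As a worked check of the Avg. column, the sketch below recomputes the average for two rows of the table. It assumes that Avg. is the unweighted mean over the eight tasks and that the first MNLI value is the one used; this assumption is not stated in this README. Because the table entries are already rounded to one decimal, recomputed averages for other rows may differ from the reported ones by 0.1.

```python
# Development-set scores copied from the table above, in the order
# MNLI (first value), QNLI, QQP, RTE, SST, MRPC, CoLA, STS.
SCORES = {
    "ALBERT-xxlarge": [90.8, 95.3, 92.2, 89.2, 96.9, 90.9, 71.4, 93.0],
    "BERTAC(ALBERT-xxlarge)": [91.3, 95.7, 92.3, 89.9, 97.2, 92.4, 73.7, 93.1],
}


def glue_average(task_scores):
    """Unweighted mean over the tasks, rounded to one decimal place."""
    return round(sum(task_scores) / len(task_scores), 1)


if __name__ == "__main__":
    for model, task_scores in SCORES.items():
        print(model, glue_average(task_scores))
    # ALBERT-xxlarge 90.0
    # BERTAC(ALBERT-xxlarge) 90.7
```

Under these assumptions, the 0.7-point gap between the two rows matches the Avg. column.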

Open-domain QA experiments

You can run open-domain QA experiments by following the procedure described in examples/QA/README.md.

Results

The performance of BERTAC and the baseline methods on the Quasar-T and SearchQA benchmarks is as follows.

Model Quasar-T (EM/F1) SearchQA (EM/F1)
OpenQA 42.2/49.3 58.8/64.5
OpenQA+ARG 43.2/49.7 59.6/65.3
WKLM(BERT-base) 45.8/52.2 61.7/66.7
MBERT(BERT-large) 51.1/59.1 65.1/70.7
CFormer(RoBERTa-large) 54.0/63.9 68.0/75.1
BERTAC(RoBERTa-large) 55.8/63.7 71.9/77.1
BERTAC(ALBERT-xxlarge) 58.0/65.8 74.0/79.2

Here, BERTAC(RoBERTa-large) and BERTAC(ALBERT-xxlarge) denote BERTAC using RoBERTa-large and ALBERT-xxlarge as the base TLM, respectively. With either base TLM, BERTAC achieved a higher EM (exact match with the gold-standard answers) than the state-of-the-art method, CFormer(RoBERTa-large), on both benchmarks (Quasar-T and SearchQA).
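For readers unfamiliar with the two metrics, the following sketch shows how EM and token-level F1 are commonly computed for extractive QA, in the style of SQuAD evaluation. It is an assumption that the evaluation scripts in this repository use the same answer normalization (lowercasing, removing punctuation and English articles); see examples/QA/README.md for the actual procedure. The function names are chosen for this example.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For instance, the prediction "Eiffel Tower in Paris" against the gold answer "Eiffel Tower" gets EM 0 but a non-zero F1, which is why a method can lead on one metric and trail on the other. When a question has several gold answers, a common convention is to take the maximum score over them and then average over all questions.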

Citation

If you use this source code, we would appreciate it if you cited the following paper:

@inproceedings{ohetal2021bertac,
  title={BERTAC: Enhancing Transformer-based Language Models 
         with Adversarially Pretrained Convolutional Neural Networks},
  author={Jong-Hoon Oh and Ryu Iida and 
          Julien Kloetzer and Kentaro Torisawa},
  booktitle={The Joint Conference of the 59th Annual Meeting  
             of the Association for Computational Linguistics  
             and the 11th International Joint Conference 
             on Natural Language Processing (ACL-IJCNLP 2021)},
  year={2021}
}

Acknowledgements

Parts of the source code are borrowed from HuggingFace Transformers v2.4.1 (licensed under Apache 2.0), DrQA (licensed under BSD), and Open-QA (licensed under MIT).
