LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)

Related tags

Deep LearningLV-BERT
Overview

LV-BERT

Introduction

In this repo, we introduce LV-BERT by exploiting layer variety for BERT. For detailed description and experimental results, please refer to our paper LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021).

Requirements

  • Python 3.6
  • TensorFlow 1.15
  • numpy
  • scikit-learn

Experiments

Firstly, set your data dir (absolute) to place datasets and models by

DATA_DIR=/path/to/data/dir

Fine-tining

We give the instruction to fine-tune a pre-trained LV-BERT-small (13M parameters) on GLUE. You can refer to this Google Colab notebook for a quick example. All models of different are provided this Google Drive folder. The models are pre-trained 1M steps with sequence length 128 to save compute. *_seq512 named models are trained for more 100K steps with sequence length 512 whichs are used for long-sequence tasks like SQuAD. See our paper for more details on model performance.

  1. Create your data directory.
mkdir -p $DATA_DIR/models && cp vocab.txt $DATA_DIR/

Put the pre-trained model in the corresponding directory

mv lv-bert_small $DATA_DIR/models/
  1. Download the GLUE data by running
python3 download_glue_data.py
  1. Set up the data by running
cd glue_data && mv CoLA cola && mv MNLI mnli && mv MRPC mrpc && mv QNLI qnli && mv QQP qqp && mv RTE rte && mv SST-2 sst && mv STS-B sts && mv diagnostic/diagnostic.tsv mnli && mkdir -p $DATA_DIR/finetuning_data && mv * $DATA_DIR/finetuning_data && cd ..
  1. Fine-tune the model by running
bash finetune.sh $DATA_DIR

PS: (a) You can test different tasks by changing configs in finetune.sh. (b) Some of the datasets on GLUE are small, causing that the results may vary substantially for different random seeds. The same as ELECTRA, we report the median of 10 fine-tuning runs from the same pre-trained model for each result.

Pre-training

We give the instruction to pre-train LV-BERT-small (13M parameters) using the OpenWebText corpus.

  1. First download the OpenWebText pre-traing corpus (12G).

  2. After downloading the pre-training corpus, build the pre-training dataset tf-record by running

bash build_data.sh $DATA_DIR
  1. Then, pre-train the model by running
bash pretrain.sh $DATA_DIR

Bibtex

@inproceedings{yu2021lv-bert,
        author = {Yu, Weihao and Jiang, Zihang and Chen, Fei, Hou, Qibin and Feng, Jiashi},
        title = {LV-BERT: Exploiting Layer Variety for BERT},
        booktitle = {Findings of ACL},
        month = {August},
        year = {2021}
}

Reference

This repo is based on the repo ELECTRA.

Owner
Weihao Yu
PhD student at NUS
Weihao Yu
Large-Scale Unsupervised Object Discovery

Large-Scale Unsupervised Object Discovery Huy V. Vo, Elena Sizikova, Cordelia Schmid, Patrick Pérez, Jean Ponce [PDF] We propose a novel ranking-based

17 Sep 19, 2022
SigOpt wrappers for scikit-learn methods

SigOpt + scikit-learn Interfacing This package implements useful interfaces and wrappers for using SigOpt and scikit-learn together Getting Started In

SigOpt 73 Sep 30, 2022
EgGateWayGetShell py脚本

EgGateWayGetShell_py 免责声明 由于传播、利用此文所提供的信息而造成的任何直接或者间接的后果及损失,均由使用者本人负责,作者不为此承担任何责任。 使用 python3 eg.py urls.txt 目标 title:锐捷网络-EWEB网管系统 port:4430 漏洞成因 ?p

榆木 61 Nov 09, 2022
Portfolio asset allocation strategies: from Markowitz to RNNs

Portfolio asset allocation strategies: from Markowitz to RNNs Research project to explore different approaches for optimal portfolio allocation starti

Luigi Filippo Chiara 1 Feb 05, 2022
8-week curriculum for AI Builders

curriculum 8-week curriculum for AI Builders สารบัญ บทที่ 1 - Machine Learning คืออะไร บทที่ 2 - ชุดข้อมูลมหัศจรรย์และถิ่นที่อยู่ บทที่ 3 - Stochastic

AI Builders 134 Jan 03, 2023
Differentiable Surface Triangulation

Differentiable Surface Triangulation This is our implementation of the paper Differentiable Surface Triangulation that enables optimization for any pe

61 Dec 07, 2022
Conversational text Analysis using various NLP techniques

PyConverse Let me try first Installation pip install pyconverse Usage Please try this notebook that demos the core functionalities: basic usage noteb

Rita Anjana 158 Dec 25, 2022
Selfplay In MultiPlayer Environments

This project allows you to train AI agents on custom-built multiplayer environments, through self-play reinforcement learning.

200 Jan 08, 2023
Transport Mode detection - can detect the mode of transport with the help of features such as acceeration,jerk etc

title emoji colorFrom colorTo sdk app_file pinned Transport_Mode_Detector 🚀 purple yellow gradio app.py false Configuration title: string Display tit

Nishant Rajadhyaksha 3 Jan 16, 2022
Codes and models of NeurIPS2021 paper - DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks

DominoSearch This is repository for codes and models of NeurIPS2021 paper - DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense n

11 Sep 10, 2022
PyTorch-based framework for Deep Hedging

PFHedge: Deep Hedging in PyTorch PFHedge is a PyTorch-based framework for Deep Hedging. PFHedge Documentation Neural Network Architecture for Efficien

139 Dec 30, 2022
PyTorch Code of "Memory In Memory: A Predictive Neural Network for Learning Higher-Order Non-Stationarity from Spatiotemporal Dynamics"

Memory In Memory Networks It is based on the paper Memory In Memory: A Predictive Neural Network for Learning Higher-Order Non-Stationarity from Spati

Yang Li 12 May 30, 2022
Natural Intelligence is still a pretty good idea.

Human Learn Machine Learning models should play by the rules, literally. Project Goal Back in the old days, it was common to write rule-based systems.

vincent d warmerdam 641 Dec 26, 2022
A Keras implementation of YOLOv3 (Tensorflow backend)

keras-yolo3 Introduction A Keras implementation of YOLOv3 (Tensorflow backend) inspired by allanzelener/YAD2K. Quick Start Download YOLOv3 weights fro

7.1k Jan 03, 2023
Replication of Pix2Seq with Pretrained Model

Pretrained-Pix2Seq We provide the pre-trained model of Pix2Seq. This version contains new data augmentation. The model is trained for 300 epochs and c

peng gao 51 Nov 22, 2022
A Deep Learning Framework for Neural Derivative Hedging

NNHedge NNHedge is a PyTorch based framework for Neural Derivative Hedging. The following repository was implemented to ease the experiments of our pa

GUIJIN SON 17 Nov 14, 2022
A stable algorithm for GAN training

DRAGAN (Deep Regret Analytic Generative Adversarial Networks) Link to our paper - https://arxiv.org/abs/1705.07215 Pytorch implementation (thanks!) -

195 Oct 10, 2022
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".

Project This repo has been populated by an initial template to help get you started. Please make sure to update the content to build a great experienc

Microsoft 674 Dec 26, 2022
Self-Supervised Deep Blind Video Super-Resolution

Self-Blind-VSR Paper | Discussion Self-Supervised Deep Blind Video Super-Resolution By Haoran Bai and Jinshan Pan Abstract Existing deep learning-base

Haoran Bai 35 Dec 09, 2022
Official Code for "Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning"

CMSF Official Code for "Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning" Requirements Python = 3.7.6 PyTorch

4 Nov 25, 2022