NAACL 2021: Factual Probing Is [MASK]: Learning vs. Learning to Recall

OptiPrompt

This is the PyTorch implementation of the paper Factual Probing Is [MASK]: Learning vs. Learning to Recall.

We propose OptiPrompt, a simple and effective approach for factual probing. OptiPrompt optimizes prompts directly in the input embedding space and outperforms previous prompting methods on the LAMA benchmark. Furthermore, to better interpret probing results, we propose control experiments based on probing randomly initialized models. Please check our paper for details.
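
To make the idea concrete, here is a minimal, self-contained sketch of optimizing continuous prompt vectors in the input embedding space of a frozen masked LM. It is illustrative only: the template layout, hyperparameters, and function names are simplified assumptions, not the interface of code/run_optiprompt.py.

# Minimal sketch of the OptiPrompt idea: train dense prompt vectors in the input
# embedding space of a frozen masked LM (illustrative assumptions, not the repo's API).
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.eval()  # the language model is used as a fixed scorer
for p in model.parameters():
    p.requires_grad = False  # only the prompt vectors are trained

embed = model.bert.embeddings.word_embeddings
num_vectors = 5
prompt = torch.nn.Parameter(0.02 * torch.randn(num_vectors, embed.weight.shape[1]))
optimizer = torch.optim.Adam([prompt], lr=3e-3)

def prompt_loss(subject, obj):
    # Build "[CLS] <subject> <v_1> ... <v_k> [MASK] [SEP]" directly in embedding space.
    sub_ids = tokenizer(subject, add_special_tokens=False, return_tensors="pt")["input_ids"][0]
    ids = torch.cat([
        torch.tensor([tokenizer.cls_token_id]),
        sub_ids,
        torch.tensor([tokenizer.mask_token_id, tokenizer.sep_token_id]),
    ])
    token_embeds = embed(ids)
    # Insert the trainable vectors between the subject and the [MASK] token.
    inputs_embeds = torch.cat([token_embeds[:-2], prompt, token_embeds[-2:]]).unsqueeze(0)
    mask_pos = inputs_embeds.shape[1] - 2
    logits = model(inputs_embeds=inputs_embeds).logits[0, mask_pos]
    target = torch.tensor([tokenizer.convert_tokens_to_ids(obj)])
    return torch.nn.functional.cross_entropy(logits.unsqueeze(0), target)

loss = prompt_loss("Noam Chomsky", "linguistics")  # toy fact, e.g. for relation P101
loss.backward()
optimizer.step()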

Setup

Install dependencies

Our code is based on Python 3.7. All experiments are run on a single GPU.

Please install all the dependency packages using the following command:

pip install -r requirements.txt

Download the data

We pack all the datasets used in our experiments here. Please download and extract the files to ./data, or run the following command to download and extract them automatically.

bash script/download_data.sh

The datasets are structured as follows.

data
├── LAMA-TREx                         # The original LAMA-TREx test set (34,039 examples)
│   ├── P17.jsonl                     # Testing file for the relation `P17`
│   └── ...
├── LAMA-TREx_UHN                     # The LAMA-TREx_UHN test set (27,102 examples)
│   ├── P17.jsonl                     # Testing file for the relation `P17`
│   └── ...
├── LAMA-TREx-easy-hard               # The easy and hard partitions of the LAMA-TREx dataset (check the paper for details)
│   ├── Easy                          # The LAMA-easy partition (10,546 examples)
│   │   ├── P17.jsonl                 # Testing file for the relation `P17`
│   │   └── ...
│   └── Hard                          # The LAMA-hard partition (23,493 examples)
│       ├── P17.jsonl                 # Testing file for the relation `P17`
│       └── ...
├── autoprompt_data                   # Training data collected by AutoPrompt
│   ├── P17                           # Train/dev/test files for the relation `P17`
│   │   ├── train.jsonl               # Training examples
│   │   ├── dev.jsonl                 # Development examples
│   │   └── test.jsonl                # Test examples (the same as LAMA-TREx test set)
│   └── ...
└── cmp_lms_data                      # Training data collected by us, which can be used for BERT, RoBERTa, and ALBERT (we only use this dataset in Table 6 of the paper)
    ├── P17                           # Train/dev/test files for the relation `P17`
    │   ├── train.jsonl               # Training examples
    │   ├── dev.jsonl                 # Development examples
    │   └── test.jsonl                # Test examples (a subset of the LAMA-TREx test set, filtered using the common vocab of the three models)
    └── ...
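
Each relation file is in JSON Lines format. The snippet below shows one way to load a test file; the field names (e.g., sub_label, obj_label) follow the LAMA format but are assumptions here, so please check the downloaded files for the exact schema.

# Hedged example: load one relation's test set and print the first fact.
import json

with open("data/LAMA-TREx/P17.jsonl") as f:
    facts = [json.loads(line) for line in f]

print(len(facts))
print(facts[0]["sub_label"], "->", facts[0]["obj_label"])  # field names assumed from LAMA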

Run OptiPrompt

Train/evaluate OptiPrompt

You can use code/run_optiprompt.py to train or evaluate the prompts on a specific relation. A command template is as follows:

rel=P101
dir=outputs/${rel}
mkdir -p ${dir}

python code/run_optiprompt.py \
    --relation_profile relation_metainfo/LAMA_relations.jsonl \
    --relation ${rel} \
    --common_vocab_filename common_vocabs/common_vocab_cased.txt \
    --model_name bert-base-cased \
    --do_train \
    --train_data data/autoprompt_data/${rel}/train.jsonl \
    --dev_data data/autoprompt_data/${rel}/dev.jsonl \
    --do_eval \
    --test_data data/LAMA-TREx/${rel}.jsonl \
    --output_dir ${dir} \
    --random_init none \
    --output_predictions \
    [--init_manual_template] [--num_vectors 5 | 10]

Arguments:

  • relation_profile: the meta information for each relation, containing the manual templates.
  • relation: the relation type (e.g., P101) considered in this experiment.
  • common_vocab_filename: the vocabulary used to filter out facts; for a fair comparison, it should be the intersection of the vocabularies of the models being compared.
  • model_name: the pre-trained model used in this experiment, e.g., bert-base-cased, albert-xxlarge-v1.
  • do_train: whether to train the prompts on a training and development set.
  • do_eval: whether to test the trained prompts on a testing set.
  • {train|dev|test}_data: the file path of training/development/testing dataset.
  • random_init: how to randomly initialize the model before training; there are three settings (see the sketch after this list):
    • none: use the pre-trained model as-is; no random initialization is used;
    • embedding: the Rand E control setting, where we randomly initialize the embedding layer of the model;
    • all: the Rand M control setting, where we randomly initialize all the parameters of the model.
  • init_manual_template: whether to initialize the dense vectors in OptiPrompt using the manual prompts.
  • num_vectors: how many dense vectors are added in OptiPrompt (this argument is valid only when init_manual_template is not set).
  • output_predictions: whether to output top-k predictions for each testing fact (k is specified by --k).
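
As a rough illustration of the two control settings (not the repository's exact implementation), Rand E re-initializes only the input embeddings of a pre-trained model, while Rand M starts from an entirely freshly initialized model:

# Hedged sketch of the Rand E / Rand M controls; the actual code may re-initialize
# more (or different) components than shown here.
import torch
from transformers import BertConfig, BertForMaskedLM

# Rand M: same architecture, all parameters freshly initialized
rand_m = BertForMaskedLM(BertConfig.from_pretrained("bert-base-cased"))

# Rand E: pre-trained weights, but the word embeddings are re-initialized
rand_e = BertForMaskedLM.from_pretrained("bert-base-cased")
with torch.no_grad():
    rand_e.bert.embeddings.word_embeddings.weight.normal_(mean=0.0, std=0.02)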

Run experiments on all relations

We provide an example script (scripts/run_optiprompt.sh) to run OptiPrompt on all 41 relations on the LAMA benchmark. Run the following command to use it:

bash scripts/run_optiprompt.sh

The default setting of this script is to run OptiPrompt, initialized with manual prompts, on the pre-trained bert-base-cased model (no random initialization is used). The results will be stored in the outputs directory.

Please modify the shell variables (i.e., OUTPUTS_DIR, MODEL, RAND) in scripts/run_optiprompt.sh if you want to run experiments with other settings.

Run Fine-tuning

We release the code that we used in our fine-tuning experiments (see Section 4 of the paper).

Fine-tuning language models on factual probing

You can use code/run_finetune.py to fine-tune a language model on a specific relation. A command template is as follows:

rel=P101
dir=outputs/${rel}
mkdir -p ${dir}

python code/run_finetune.py \
    --relation_profile relation_metainfo/LAMA_relations.jsonl \
    --relation ${rel} \
    --common_vocab_filename common_vocabs/common_vocab_cased.txt \
    --model_name bert-base-cased \
    --do_train \
    --train_data data/autoprompt_data/${rel}/train.jsonl \
    --dev_data data/autoprompt_data/${rel}/dev.jsonl \
    --do_eval \
    --test_data data/LAMA-TREx/${rel}.jsonl \
    --output_dir ${dir} \
    --random_init none \
    --output_predictions

Arguments:

  • relation_profile: the meta information for each relation, containing the manual templates.
  • relation: the relation type (e.g., P101) considered in this experiment.
  • common_vocab_filename: the vocabulary used to filter out facts; for a fair comparison, it should be the intersection of the vocabularies of the models being compared.
  • model_name: the pre-trained model used in this experiment, e.g., bert-base-cased, albert-xxlarge-v1.
  • do_train: whether to fine-tune the model on a training and development set.
  • do_eval: whether to evaluate the fine-tuned model on a test set.
  • {train|dev|test}_data: the file path of training/development/testing dataset.
  • random_init: how to randomly initialize the model before training; there are three settings:
    • none: use the pre-trained model as-is; no random initialization is used;
    • embedding: the Rand E control setting, where we randomly initialize the embedding layer of the model;
    • all: the Rand M control setting, where we randomly initialize all the parameters of the model.
  • output_predictions: whether to output top-k predictions for each testing fact (k is specified by --k).
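
The core of this setting is standard masked-language-model fine-tuning: fill a manual template with the subject, put [MASK] in the object slot, and update all model parameters to predict the object token. A simplified sketch follows; the template, learning rate, and data handling are assumptions, not code/run_finetune.py itself.

# Hedged sketch of fine-tuning on one fact; the repository handles batching,
# the common vocab filter, and evaluation, which are omitted here.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-6)

# One (subject, object) fact with a manual template (toy example for P101).
template = "[X] works in the field of [Y] ."
subject, obj = "Noam Chomsky", "linguistics"

text = template.replace("[X]", subject).replace("[Y]", tokenizer.mask_token)
inputs = tokenizer(text, return_tensors="pt")

# Supervise only the [MASK] position; all other positions are ignored (-100).
labels = torch.full_like(inputs["input_ids"], -100)
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
labels[0, mask_pos] = tokenizer.convert_tokens_to_ids(obj)

loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()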

Run experiments on all relations

We provide an example script (scripts/run_finetune.sh) to run fine-tuning on all 41 relations on the LAMA benchmark. Run the following command to use it:

bash scripts/run_finetune.sh

Please modify the shell variables (i.e., OUTPUTS_DIR, MODEL, RAND) in scripts/run_finetune.sh if you want to run experiments with other settings.

Evaluate LAMA/LPAQA/AutoPrompt prompts

We provide a script to evaluate prompts released in previous works (based on code/run_finetune.py with only --do_eval). Please use the following command:

bash scripts/run_eval_prompts {lama | lpaqa | autoprompt}

Questions?

If you have any questions related to the code or the paper, feel free to email Zexuan Zhong ([email protected]) or Dan Friedman ([email protected]). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please describe the problem in detail so that we can help you better and more quickly!

Citation

@inproceedings{zhong2021factual,
   title={Factual Probing Is [MASK]: Learning vs. Learning to Recall},
   author={Zhong, Zexuan and Friedman, Dan and Chen, Danqi},
   booktitle={North American Association for Computational Linguistics (NAACL)},
   year={2021}
}