Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"

Overview

This is the codebase for the paper: Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

Directory Structure

data/ --> data folder including splits we use for FEVER, zsRE, Wikidata5m, and LeapOfThought
training_reports/ --> folder to be populated with individual training run reports produced by main.py
result_sheets/ --> folder to be populated with .csv's of results from experiments produced by main.py
aggregated_results/ --> contains combined experiment results produced by run_jobs.py
outputs/ --> folder to be populated with analysis results, including belief graphs and bootstrap outputs
models/ --> contains model wrappers for Huggingface models and the learned optimizer code
data_utils/ --> contains scripts for making all datasets used in paper
main.py --> main script for all individual experiments in the paper
metrics.py --> functions for calculing metrics reported in the paper
utils.py --> data loading and miscellaneous utilities
run_jobs.py --> script for running groups of experiments
statistical_analysis.py --> script for running bootstraps with the experimental results
data_analysis.Rmd --> R markdown file that makes plots using .csv's in result_sheets
requirements.txt --> contains required packages

Requirements

The code is compatible with Python 3.6+. data_analysis.Rmd is an R markdown file that makes all the plots in the paper.

The required packages can be installed by running:

pip install -r requirements.txt

If you wish to visualize belief graphs, you should also install a few packages as so:

sudo apt install python-pydot python-pydot-ng graphviz

Making Data

We include the data splits from the paper in data/ (though the train split for Wikidata5m is divided into two files that need to be locally combined.) To construct the datasets from scratch, you can follow a few steps:

  1. Set the DATA_DIR environment variable to where you'd like the data to be stored. Set the CODE_DIR to point to the directory where this code is.
  2. Run the following blocks of code

Make FEVER and ZSRE

cd $DATA_DIR
git clone https://github.com/facebookresearch/KILT.git
cd KILT
mkdir data
python scripts/download_all_kilt_data.py
mv data/* ./
cd $CODE_DIR
python data_utils/shuffle_fever_splits.py
python data_utils/shuffle_zsre_splits.py

Make Leap-Of-Thought

cd $DATA_DIR
git clone https://github.com/alontalmor/LeapOfThought.git
cd LeapOfThought
python -m LeapOfThought.run -c Hypernyms --artiset_module soft_reasoning -o build_artificial_dataset -v training_mix -out taxonomic_reasonings.jsonl.gz
gunzip taxonomic_reasonings_training_mix_train.jsonl.gz taxonomic_reasonings_training_mix_dev.jsonl.gz taxonomic_reasonings_training_mix_test.jsonl.gz taxonomic_reasonings_training_mix_meta.jsonl.gz
cd $CODE_DIR
python data_utils/shuffle_leapofthought_splits.py

Make Wikidata5m

cd $DATA_DIR
mkdir Wikidata5m
cd Wikidata5m
wget https://www.dropbox.com/s/6sbhm0rwo4l73jq/wikidata5m_transductive.tar.gz
wget https://www.dropbox.com/s/lnbhc8yuhit4wm5/wikidata5m_alias.tar.gz
tar -xvzf wikidata5m_transductive.tar.gz
tar -xvzf wikidata5m_alias.tar.gz
cd $CODE_DIR
python data_utils/filter_wikidata.py

Experiment Replication

Experiment commands require a few arguments: --data_dir points to where the data is. --save_dir points to where models should be saved. --cache_dir points to where pretrained models will be stored. --gpu indicates the GPU device number. --seeds indicates how many seeds per condition to run. We give commands below for the experiments in the paper, saving everything in $DATA_DIR.

To train the task and prepare the necessary data for training learned optimizers, run:

python run_jobs.py -e task_model --seeds 5 --dataset all --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR
python run_jobs.py -e write_LeapOfThought_preds --seeds 5 --dataset LeapOfThought --do_train false --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR

To get the main experiments in a single-update setting, run:

python run_jobs.py -e learned_opt_main --seeds 5 --dataset all --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR

For results in a sequential-update setting (with r=10) run:

python run_jobs.py -e learned_opt_r_main --seeds 5 --dataset all --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR

To get the corresponding off-the-shelf optimizer baselines for these experiments, run

python run_jobs.py -e base_optimizers --seeds 5 --do_train false  --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR
python run_jobs.py -e base_optimizers_r_main --seeds 5 --do_train false  --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR

To get ablations across values of r for the learned optimizer and baselines, run

python run_jobs.py -e base_optimizers_r_ablation --seeds 1 --do_train false  --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR

Next we give commands for for ablations across k, the choice of training labels, the choice of evaluation labels, training objective terms, and a comparison to the objective from de Cao (in order):

python run_jobs.py -e learned_opt_k_ablation --seeds 1 --dataset ZSRE  --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR
python run_jobs.py -e learned_opt_label_ablation --seeds 1 --dataset ZSRE --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR
python run_jobs.py -e learned_opt_eval_ablation --seeds 1 --dataset ZSRE  --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR
python run_jobs.py -e learned_opt_objective_ablation --seeds 1 --dataset all  --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR
python run_jobs.py -e learned_opt_de_cao --seeds 5 --dataset all --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR

Analysis

Statistical Tests

After running an experiment from above, you can compute confidence intervals and hypothesis tests using statistical_analysis.py.

To get confidence intervals for the main single-update learned optimizer experiments, run

python statistical_analysis -e learned_opt_main -n 10000

To run hypothesis tests between statistics for the learned opt experiment and its baselines, run

python statistical_analysis -e learned_opt_main -n 10000 --hypothesis_tests true

You can substitute the experiment name for results for other conditions.

Belief Graphs

Add --save_dir, --cache_dir, and --data_dir arguments to the commands below per the instructions above.

Write preds from FEVER model:
python main.py --dataset FEVER --probing_style model --probe linear --model roberta-base --seed 0 --do_train false --do_eval true --write_preds_to_file true

Write graph to file:
python main.py --dataset FEVER --probing_style model --probe linear --model roberta-base --seed 0 --do_train false --do_eval true --test_batch_size 64 --update_eval_truthfully false --fit_to_alt_labels true --update_beliefs true --optimizer adamw --lr 1e-6 --update_steps 100 --update_all_points true --write_graph_to_file true --use_dev_not_test false --num_random_other 10444

Analyze graph:
python main.py --dataset FEVER --probing_style model --probe linear --model roberta-base --seed 0 --test_batch_size 64 --update_eval_truthfully false --fit_to_alt_labels true --update_beliefs true --use_dev_not_test false --optimizer adamw --lr 1e-6 --update_steps 100 --do_train false --do_eval false --pre_eval false --do_graph_analysis true

Combine LeapOfThought Main Inputs and Entailed Data:
python data_utils/combine_leapofthought_data.py

Write LeapOfThought preds to file:
python main.py --dataset LeapOfThought --probing_style model --probe linear --model roberta-base --seed 0 --do_train false --do_eval true --write_preds_to_file true --leapofthought_main main

Write graph for LeapOfThought:
python main.py --dataset LeapOfThought --leapofthought_main main --probing_style model --probe linear --model roberta-base --seed 0 --do_train false --do_eval true --test_batch_size 64 --update_eval_truthfully false --fit_to_alt_labels true --update_beliefs true --optimizer sgd --update_steps 100 --lr 1e-2 --update_all_points true --write_graph_to_file true --use_dev_not_test false --num_random_other 8642

Analyze graph (add --num_eval_points 2000 to compute update-transitivity):
python main.py --dataset LeapOfThought --leapofthought_main main --probing_style model --probe linear --model roberta-base --seed 0 --do_train false --do_eval true --test_batch_size 64 --update_eval_truthfully false --fit_to_alt_labels true --update_beliefs true --optimizer sgd --update_steps 100 --lr 1e-2 --do_train false --do_eval false --pre_eval false --do_graph_analysis true

Plots

The data_analysis.Rmd R markdown file contains code for plots in the paper. It reads data from aggregated_results and saves plots in a ./figures directory.

Owner
Peter Hase
I am a PhD student in the UNC-NLP group at UNC Chapel Hill.
Peter Hase
The source codes for ACL 2021 paper 'BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data'

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data This repository provides the implementation details for

124 Dec 27, 2022
This is the official repository of the paper Stocastic bandits with groups of similar arms (NeurIPS 2021). It contains the code that was used to compute the figures and experiments of the paper.

Experiments How to reproduce experimental results of Stochastic bandits with groups of similar arms submitted paper ? Section 5 of the paper To reprod

Fabien 0 Oct 25, 2021
Transformer in Computer Vision

Transformer-in-Vision A paper list of some recent Transformer-based CV works. If you find some ignored papers, please open issues or pull requests. **

506 Dec 26, 2022
Code for our CVPR 2021 paper "MetaCam+DSCE"

Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification (CVPR'21) Introduction Code for our CVPR 2021

FlyingRoastDuck 59 Oct 31, 2022
A Pytorch implementation of MoveNet from Google. Include training code and pre-train model.

Movenet.Pytorch Intro MoveNet is an ultra fast and accurate model that detects 17 keypoints of a body. This is A Pytorch implementation of MoveNet fro

Mr.Fire 241 Dec 26, 2022
Mini-hmc-jax - A simple implementation of Hamiltonian Monte Carlo in JAX

mini-hmc-jax This is a simple implementation of Hamiltonian Monte Carlo in JAX t

Martin Marek 6 Mar 03, 2022
M3DSSD: Monocular 3D Single Stage Object Detector

M3DSSD: Monocular 3D Single Stage Object Detector Setup pytorch 0.4.1 Preparation Download the full KITTI detection dataset. Then place a softlink (or

mumianyuxin 64 Dec 27, 2022
Activity image-based video retrieval

Cross-modal-retrieval Our approach is focus on Activity Image-to-Video Retrieval (AIVR) task. The compared methods are state-of-the-art single modalit

BCMI 75 Oct 21, 2021
Parameter Efficient Deep Probabilistic Forecasting

PEDPF Parameter Efficient Deep Probabilistic Forecasting (PEDPF) is a repository containing code to run experiments for several deep learning based pr

Olivier Sprangers 10 Jun 13, 2022
Official code for paper "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight"

Demysitifing Local Vision Transformer, arxiv This is the official PyTorch implementation of our paper. We simply replace local self attention by (dyna

138 Dec 28, 2022
OCRA (Object-Centric Recurrent Attention) source code

OCRA (Object-Centric Recurrent Attention) source code Hossein Adeli and Seoyoung Ahn Please cite this article if you find this repository useful: For

Hossein Adeli 2 Jun 18, 2022
SkipGNN: Predicting Molecular Interactions with Skip-Graph Networks (Scientific Reports)

SkipGNN: Predicting Molecular Interactions with Skip-Graph Networks Molecular interaction networks are powerful resources for the discovery. While dee

Kexin Huang 49 Oct 15, 2022
Event queue (Equeue) dialect is an MLIR Dialect that models concurrent devices in terms of control and structure.

Event Queue Dialect Event queue (Equeue) dialect is an MLIR Dialect that models concurrent devices in terms of control and structure. Motivation The m

Cornell Capra 23 Dec 08, 2022
Where2Act: From Pixels to Actions for Articulated 3D Objects

Where2Act: From Pixels to Actions for Articulated 3D Objects The Proposed Where2Act Task. Given as input an articulated 3D object, we learn to propose

Kaichun Mo 69 Nov 28, 2022
Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021)

TDEER (WIP) Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021) Overview TDEER is an e

Alipay 6 Dec 17, 2022
Example-custom-ml-block-keras - Custom Keras ML block example for Edge Impulse

Custom Keras ML block example for Edge Impulse This repository is an example on

Edge Impulse 8 Nov 02, 2022
Physics-Informed Neural Networks (PINN) and Deep BSDE Solvers of Differential Equations for Scientific Machine Learning (SciML) accelerated simulation

NeuralPDE NeuralPDE.jl is a solver package which consists of neural network solvers for partial differential equations using scientific machine learni

SciML Open Source Scientific Machine Learning 680 Jan 02, 2023
SSD: Single Shot MultiBox Detector pytorch implementation focusing on simplicity

SSD: Single Shot MultiBox Detector Introduction Here is my pytorch implementation of 2 models: SSD-Resnet50 and SSDLite-MobilenetV2.

Viet Nguyen 149 Jan 07, 2023
Systemic Evolutionary Chemical Space Exploration for Drug Discovery

SECSE SECSE: Systemic Evolutionary Chemical Space Explorer Chemical space exploration is a major task of the hit-finding process during the pursuit of

64 Dec 16, 2022
[ICLR 2021, Spotlight] Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks, ICLR 2021 (Spotlight) Demo | Paper [NEW!] Time to play with our interac

Shengyu Zhao 373 Jan 02, 2023