Diagnostic tests for linguistic capacities in language models

Overview

LM diagnostics

This repository contains the diagnostic datasets and experimental code for What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models, by Allyson Ettinger.

Diagnostic test data

The datasets folder contains TSV files with data for each diagnostic test, along with explanatory README files for each dataset.

Code

[All code now updated to be run with Python 3.]

The code in this section can be used to process the diagnostic datasets for input to a language model, and then to run the diagnostic tests on that language model's predictions. The code should be used in three steps:

Step 1: Process datasets to produce inputs for LM

proc_datasets.py can be used to process the provided datasets into 1) <testname>-contextlist files containing contexts (one per line) on which the LM's predictions should be conditioned, and b) <testname>-targetlist files containing target words (one per line, aligned with the contexts in *-contextlist) for which you will need probabilities conditioned on the corresponding contexts. Repeats in *-contextlist are intentional, to align with the targets in *-targetlist.

Basic usage:

python proc_datasets.py \
  --outputdir <location for output files> \
  --role_stim datasets/ROLE-88/ROLE-88.tsv \
  --negnat_stim datasets/NEG-88/NEG-88-NAT.tsv \
  --negsimp_stim datasets/NEG-88/NEG-88-SIMP.tsv \
  --cprag_stim datasets/CPRAG-34/CPRAG-34.tsv \
  --add_mask_tok
  • add_mask_tok flag will append '[MASK]' to the contexts in *-contextlist, for use with BERT.
  • <testname> comes from the following list: cprag, role, negsimp, negnat for CPRAG-34, ROLE-88, NEG-88-SIMP and NEG-88-NAT, respectively.

Step 2: Get LM predictions/probabilities

You will need to produce two files: one containing top word predictions conditioned on each context, and one containing the probabilities for each target word conditioned on its corresponding context.

Predictions: Model word predictions should be written to a file with naming modelpreds-<testname>-<modelname>. Each line of this file should contain the top word predictions conditioned on the context in the corresponding line in *-contextlist. Word predictions on a given line should be separated by whitespace. Number of predictions per line should be no less than the highest k that you want to use for accuracy tests.

Probabilities Model target probabilities should be written to a file with naming modeltgtprobs-<testname>-<modelname>. Each line of this file should contain the probability of the target word on the corresponding line of *-targetlist, conditioned on the context on the corresponding line of *-contextlist.

  • <testname> list is as above. <modelname> should be the name of the model that will be input to the code in Step 3.

Step 3: Run accuracy and sensitivity tests for each diagnostic

prediction_accuracy_tests.py takes modelpreds-<testname>-<modelname> as input and runs word prediction accuracy tests.

Basic usage:

python prediction_accuracy_tests.py \
  --preddir <location of modelpreds-<testname>-<modelname>> \
  --resultsdir <location for results files> \
  --models <names of models to be tested, e.g., bert-base-uncased bert-large-uncased> \
  --k_values <list of k values to be tested, e.g., 1 5> \
  --role_stim datasets/ROLE-88/ROLE-88.tsv \
  --negnat_stim datasets/NEG-88/NEG-88-NAT.tsv \
  --negsimp_stim datasets/NEG-88/NEG-88-SIMP.tsv \
  --cprag_stim datasets/CPRAG-34/CPRAG-34.tsv

sensitivity_tests.py takes modeltgtprobs-<testname>-<modelname> as input and runs sensitivity tests.

Basic usage:

python sensitivity_tests.py \
  --probdir <location of modelpreds-<testname>-<modelname>> \
  --resultsdir <location for results files> \
  --models <names of models to be tested, e.g., bert-base-uncased bert-large-uncased> \
  --role_stim datasets/ROLE-88/ROLE-88.tsv \
  --negnat_stim datasets/NEG-88/NEG-88-NAT.tsv \
  --negsimp_stim datasets/NEG-88/NEG-88-SIMP.tsv \
  --cprag_stim datasets/CPRAG-34/CPRAG-34.tsv

Experimental code

run_diagnostics_bert.py is the code that was used for the experiments on BERTBASE and BERTLARGE reported in the paper, including perturbations.

Example usage:

python run_diagnostics_bert.py \
  --cprag_stim datasets/CPRAG-34/CPRAG-34.tsv \
  --role_stim datasets/ROLE-88/ROLE-88.tsv \
  --negnat_stim datasets/NEG-88/NEG-88-NAT.tsv \
  --negsimp_stim datasets/NEG-88/NEG-88-SIMP.tsv \
  --resultsdir <location for results files> \
  --bertbase <BERT BASE location> \
  --bertlarge <BERT LARGE location> \
  --incl_perturb
  • bertbase and bertlarge specify locations for PyTorch BERTBASE and BERTLARGE models -- each folder is expected to include vocab.txt, bert_config.json, and pytorch_model.bin for the corresponding PyTorch BERT model. (Note that experiments were run with the original pytorch-pretrained-bert version, so I can't guarantee identical results with the updated pytorch-transformers.)
  • incl_perturb runs experiments with all perturbations reported in the paper. Without this flag, only runs experiments without perturbations.
Deep learning image registration library for PyTorch

TorchIR: Pytorch Image Registration TorchIR is a image registration library for deep learning image registration (DLIR). I have integrated several ide

Bob de Vos 40 Dec 16, 2022
Simulation of self-focusing of laser beams in condensed media

What is it? Program for scientific research, which allows to simulate the phenomenon of self-focusing of different laser beams (including Gaussian, ri

Evgeny Vasilyev 13 Dec 24, 2022
Step by Step on how to create an vision recognition model using LOBE.ai, export the model and run the model in an Azure Function

Step by Step on how to create an vision recognition model using LOBE.ai, export the model and run the model in an Azure Function

El Bruno 3 Mar 30, 2022
Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks"

ON-LSTM This repository contains the code used for word-level language model and unsupervised parsing experiments in Ordered Neurons: Integrating Tree

Yikang Shen 572 Nov 21, 2022
Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

ChongjianGE 89 Dec 02, 2022
Compares various time-series feature sets on computational performance, within-set structure, and between-set relationships.

feature-set-comp Compares various time-series feature sets on computational performance, within-set structure, and between-set relationships. Reposito

Trent Henderson 7 May 25, 2022
Cognition-aware Cognate Detection

Cognition-aware Cognate Detection The repository which contains our code for our EACL 2021 paper titled, "Cognition-aware Cognate Detection". This wor

Prashant K. Sharma 1 Feb 01, 2022
Implementation of GGB color space

GGB Color Space This package is implementation of GGB color space from Development of a Robust Algorithm for Detection of Nuclei and Classification of

Resha Dwika Hefni Al-Fahsi 2 Oct 06, 2021
Official code for the publication "HyFactor: Hydrogen-count labelled graph-based defactorization Autoencoder".

HyFactor Graph-based architectures are becoming increasingly popular as a tool for structure generation. Here, we introduce a novel open-source archit

Laboratoire-de-Chemoinformatique 11 Oct 10, 2022
A certifiable defense against adversarial examples by training neural networks to be provably robust

DiffAI v3 DiffAI is a system for training neural networks to be provably robust and for proving that they are robust. The system was developed for the

SRI Lab, ETH Zurich 202 Dec 13, 2022
Modifications of the official PyTorch implementation of StyleGAN3. Let's easily generate images and videos with StyleGAN2/2-ADA/3!

Alias-Free Generative Adversarial Networks (StyleGAN3) Official PyTorch implementation of the NeurIPS 2021 paper Alias-Free Generative Adversarial Net

Diego Porres 185 Dec 24, 2022
GraphGT: Machine Learning Datasets for Graph Generation and Transformation

GraphGT: Machine Learning Datasets for Graph Generation and Transformation Dataset Website | Paper Installation Using pip To install the core environm

y6q9 50 Aug 18, 2022
Efficient Multi Collection Style Transfer Using GAN

Proposed a new model that can make style transfer from single style image, and allow to transfer into multiple different styles in a single model.

Zhaozheng Shen 2 Jan 15, 2022
implicit displacement field

Geometry-Consistent Neural Shape Representation with Implicit Displacement Fields [project page][paper][cite] Geometry-Consistent Neural Shape Represe

Yifan Wang 100 Dec 19, 2022
ICSS - Interactive Continual Semantic Segmentation

Presentation This repository contains the code of our paper: Weakly-supervised c

Alteia 9 Jul 23, 2022
Full body anonymization - Realistic Full-Body Anonymization with Surface-Guided GANs

Realistic Full-Body Anonymization with Surface-Guided GANs This is the official

Håkon Hukkelås 30 Nov 18, 2022
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

WebDataset WebDataset is a PyTorch Dataset (IterableDataset) implementation providing efficient access to datasets stored in POSIX tar archives and us

1.1k Jan 08, 2023
Tool for live presentations using manim

manim-presentation Tool for live presentations using manim Install pip install manim-presentation opencv-python Usage Use the class Slide as your sce

Federico Galatolo 146 Jan 06, 2023
Implementation of Barlow Twins paper

barlowtwins PyTorch Implementation of Barlow Twins paper: Barlow Twins: Self-Supervised Learning via Redundancy Reduction This is currently a work in

IgorSusmelj 86 Dec 20, 2022
Python package for visualizing the loss landscape of parameterized quantum algorithms.

orqviz A Python package for easily visualizing the loss landscape of Variational Quantum Algorithms by Zapata Computing Inc. orqviz provides a collect

Zapata Computing, Inc. 75 Dec 30, 2022