Code for layerwise detection of linguistic anomaly paper (ACL 2021)

Last update: Dec 07, 2022

Layerwise Anomaly

This repository contains the source code and data for our ACL 2021 paper: "How is BERT surprised? Layerwise detection of linguistic anomalies" by Bai Li, Zining Zhu, Guillaume Thomas, Yang Xu, and Frank Rudzicz.

Citation

If you use our work in your research, please cite:

Li, B., Zhu, Z., Thomas, G., Xu, Y., and Rudzicz, F. (2021) How is BERT surprised? Layerwise detection of linguistic anomalies. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL).

@inproceedings{li2021layerwise,
  author = "Li, Bai and Zhu, Zining and Thomas, Guillaume and Xu, Yang and Rudzicz, Frank",
  title = "How is BERT surprised? Layerwise detection of linguistic anomalies",
  booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL)",
  publisher = "Association for Computational Linguistics",
  year = "2021",
}

Dependencies

The project was developed with the following library versions. Running with other versions may crash or produce incorrect results.

Python 3.7.5
CUDA Version: 11.0
torch==1.7.1
transformers==4.5.1
numpy==1.19.0
pandas==0.25.3
scikit-learn==0.22
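
Since other versions may misbehave, it can help to compare the active environment against the pinned list before running anything. The following is a minimal sketch, not part of the repository; the function names `find_mismatches` and `installed_versions` are hypothetical, and the `importlib_metadata` fallback assumes that backport is installed when running on Python 3.7.

```python
# Versions pinned in the dependency list above.
PINNED = {
    "torch": "1.7.1",
    "transformers": "4.5.1",
    "numpy": "1.19.0",
    "pandas": "0.25.3",
    "scikit-learn": "0.22",
}


def find_mismatches(installed, pinned=PINNED):
    """Return {package: (expected, found)} for packages whose version differs.

    A local build tag such as "+cu110" is ignored; a missing package is
    reported with found=None.
    """
    mismatches = {}
    for name, expected in pinned.items():
        found = installed.get(name)
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches


def installed_versions(names):
    """Look up installed distribution versions (None if not installed)."""
    try:
        from importlib import metadata  # standard library from Python 3.8
    except ImportError:
        import importlib_metadata as metadata  # backport, assumed on Python 3.7
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = find_mismatches(installed_versions(PINNED))
    for name, (expected, found) in report.items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty report means every pinned package matches. The Python and CUDA versions are not checked by this sketch.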

Setup Instructions

1. Clone this repo: git clone https://github.com/SPOClab-ca/layerwise-anomaly
2. Download BNC Baby (4m word sample) from this link and extract into data/bnc/
3. Run BNC preprocessing script: python scripts/process_bnc.py --bnc_dir=data/bnc/download/Texts --to=data/bnc.pkl
4. Clone BLiMP repo: cd data && git clone https://github.com/alexwarstadt/blimp
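
To sanity-check the BLiMP clone, the pair files can be read directly. This is a rough sketch rather than the repository's own loader: it assumes the BLiMP files under data/blimp/data/ are JSON-lines with `sentence_good` and `sentence_bad` fields, and the helper names `parse_blimp_pairs` and `load_blimp_file` are hypothetical.

```python
import json


def parse_blimp_pairs(lines):
    """Parse JSON-lines records into (acceptable, unacceptable) sentence pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        pairs.append((record["sentence_good"], record["sentence_bad"]))
    return pairs


def load_blimp_file(path):
    """Read one BLiMP .jsonl file and return its sentence pairs."""
    with open(path, encoding="utf-8") as f:
        return parse_blimp_pairs(f)
```

Calling `load_blimp_file` on any .jsonl file in data/blimp/data/ should return a non-empty list of sentence pairs if the clone succeeded.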

GMM experiments on BLiMP (Figure 2 and Appendix A)

PYTHONPATH=. time python scripts/blimp_anomaly.py \
  --bnc_path=data/bnc.pkl \
  --blimp_path=data/blimp/data/ \
  --out=blimp_result
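
The general idea behind a GMM-based anomaly score can be illustrated without BERT: fit a Gaussian mixture to vectors from typical text, then score a new vector by its negative log-likelihood. The sketch below is an illustration under stated assumptions, not the code in scripts/blimp_anomaly.py. Random vectors stand in for one layer's token embeddings, the number of components is an arbitrary choice, and `fit_density` and `surprisal` are hypothetical names.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_density(train_vectors, n_components=1, seed=0):
    """Fit a full-covariance Gaussian mixture to the training vectors."""
    gmm = GaussianMixture(
        n_components=n_components, covariance_type="full", random_state=seed
    )
    gmm.fit(train_vectors)
    return gmm


def surprisal(gmm, vectors):
    """Negative log-likelihood of each row under the fitted mixture."""
    return -gmm.score_samples(np.atleast_2d(vectors))


rng = np.random.RandomState(0)
train = rng.normal(size=(500, 8))  # stand-in for embeddings of typical text
gmm = fit_density(train)

typical = np.zeros(8)       # near the centre of the training data
unusual = np.full(8, 6.0)   # far from anything seen in training
scores = surprisal(gmm, np.vstack([typical, unusual]))
print(scores)  # the second score should be much larger than the first
```

In a layerwise setting, one such density would be fitted per layer, so that the same token can be scored at each depth of the model.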

Frequency correlation (Figure 3 and Appendix B)

Run the notebooks/FreqSurprisal.ipynb notebook.

Surprisal gap experiments (Figure 4)

PYTHONPATH=. time python scripts/run_surprisal_gaps.py \
  --bnc_path=data/bnc.pkl \
  --out=surprisal_gaps

Accuracy scores (Table 2)

PYTHONPATH=. time python scripts/run_accuracy.py \
  --model_name=roberta-base \
  --anomaly_model=gmm
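
For minimal-pair data such as BLiMP, accuracy is commonly computed as the fraction of pairs in which the unacceptable sentence receives the higher anomaly score. The sketch below shows that metric only; whether scripts/run_accuracy.py computes exactly this is an assumption, and `pairwise_accuracy` is a hypothetical name.

```python
def pairwise_accuracy(good_scores, bad_scores):
    """Fraction of pairs where the unacceptable sentence scores strictly higher."""
    if len(good_scores) != len(bad_scores):
        raise ValueError("score lists must have the same length")
    if not good_scores:
        raise ValueError("at least one pair is required")
    correct = sum(bad > good for good, bad in zip(good_scores, bad_scores))
    return correct / len(good_scores)


# Four pairs; the second pair is ranked the wrong way round.
print(pairwise_accuracy([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 5.0, 6.0]))  # 0.75
```

Ties count as errors here, which is a design choice of this sketch rather than something stated in the paper or the scripts.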

Run unit tests

PYTHONPATH=. pytest tests
