An original implementation of "Noisy Channel Language Model Prompting for Few-Shot Text Classification"

Overview

Channel LM Prompting (and beyond)

This repository includes the original implementation of Sewon Min, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer. "Noisy Channel Language Model Prompting for Few-Shot Text Classification" 2021.

For any questions about the paper or the code, or to request pretrained checkpoints, please contact the first author (email) or leave issues.

If you find our code or paper useful, please cite the paper:

@article{min2021noisy,
  title={Noisy Channel Language Model Prompting for Few-Shot Text Classification},
  author={Min, Sewon and Lewis, Mike and Hajishirzi, Hannaneh and Zettlemoyer, Luke},
  journal={arXiv preprint},
  year={2021}
}

This also includes implementations of many recent papers studying prompt-based learning. Please make sure to cite the corresponding papers when you use the implementations of their methods in this repo.

Content

  1. Installation
  2. Download & Preprocess Data
  3. Demonstration-based methods
  4. Tuning methods

You can run the channel model and the direct model for each of these methods. Please see Section 3 of the paper for more details about these formulations.
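As a rough illustration of the two formulations, the sketch below scores each label either with a direct model, P(verbalizer | input), or with a channel model, P(input | verbalizer). The `log_prob` callable and the word-overlap `toy_log_prob` are hypothetical stand-ins for a language model and are not part of this repo.

```python
import math
from typing import Callable, List

# Hypothetical scorer type: returns log P(continuation | context) under some LM.
LogProb = Callable[[str, str], float]


def predict_direct(x: str, verbalizers: List[str], log_prob: LogProb) -> str:
    """Direct: pick the verbalizer that is most likely given the input."""
    return max(verbalizers, key=lambda v: log_prob(x, v))


def predict_channel(x: str, verbalizers: List[str], log_prob: LogProb) -> str:
    """Channel: pick the verbalizer under which the input is most likely."""
    return max(verbalizers, key=lambda v: log_prob(v, x))


def toy_log_prob(context: str, continuation: str) -> float:
    """Toy stand-in for an LM (assumption): rewards shared words."""
    shared = set(context.lower().split()) & set(continuation.lower().split())
    return math.log(1 + len(shared))


verbalizers = ["It was great", "It was terrible"]
print(predict_direct("a great film", verbalizers, toy_log_prob))
print(predict_channel("a great film", verbalizers, toy_log_prob))
```

The only difference between the two functions is which string is conditioned on and which one is scored.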

Installation

$ conda create -n lm-prompt python=3.8
$ conda activate lm-prompt
$ conda install pytorch=1.7.1 -c pytorch
$ pip install transformers==4.3.0

Download and Preprocess Data

We use (and modify) the data and the preprocessing script from Gao et al. ACL 2021 (paper, code) and Zhang et al. NeurIPS 2015 (paper, data).

To download the k-shot data (already preprocessed): Download the data (776MB) from this link. Please place data.zip in the same directory as the code and unzip it.

To download the original data and preprocess yourself:

pip install pandas==1.1.5 # for preprocessing script
mkdir data
cd data
wget https://nlp.cs.princeton.edu/projects/lm-bff/datasets.tar
tar xvf datasets.tar
cd ..

Also, download the data from here and place it in data/original.

Then, run python3 generative_k_shot_data.py, and you are done!

Optionally, you can specify arguments such as

  • --k: number of training examples (default is 16).
  • --balance: whether or not to guarantee the balance between labels in the training data; more precisely, whether k is the number of training examples in total or per label (default is False).
  • --data_dir: directory for the original data (default is data/original).
  • --output_dir: directory for the preprocessed data (default is data).
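To make the difference between the two meanings of k concrete, here is a minimal sketch of k-shot sampling. It assumes that with balancing k is counted per label and that without it k is the total; the function name `sample_k_shot` and the `(text, label)` format are hypothetical and do not mirror the preprocessing script.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Example = Tuple[str, str]  # (text, label); hypothetical data format


def sample_k_shot(data: List[Example], k: int = 16,
                  balance: bool = False, seed: int = 13) -> List[Example]:
    """Sample k examples in total, or k examples per label if balance is set."""
    rng = random.Random(seed)
    if not balance:
        # k is the total number of training examples.
        return rng.sample(data, k)
    # k is the number of training examples for each label.
    by_label = defaultdict(list)
    for example in data:
        by_label[example[1]].append(example)
    picked = []
    for label in sorted(by_label):
        picked.extend(rng.sample(by_label[label], k))
    rng.shuffle(picked)
    return picked
```

With two labels and k=16, the balanced variant in this sketch returns 32 examples, while the unbalanced one returns 16.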

To check the data: You can see the list of eleven datasets used in the paper by running ls data/k-shot. Each dataset consists of five different splits based on five different seeds (test sets are the same).

Demonstration-based methods

This section is for methods that do not update any of the model parameters. For details about the methods, please see Section 4.1 of the paper.

Zero-shot

python main.py \
    --task {task_name} \
    --split {dev|test} \
    --data_dir data \
    --out_dir out \
    --gpt2 gpt2-large \
    --do_zeroshot \
    --method {direct|channel}

This command will run zero-shot inference with GPT2-large, using four different templates (verbalizers) as reported in the paper.

  • For "channel", please specify --method channel.
  • For "direct", please specify --method direct.
  • For "direct++", please run the command line without --split first (this will run inference using the N/A input, following Zhao et al. ICML 2021), and then run the command line with --method direct --use_calibration.
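The idea behind the calibration step can be sketched as follows: each label's score on the real input is corrected by its score on the N/A input. This is a minimal sketch that assumes the correction is a subtraction in log space (a division of probabilities); the function names are hypothetical and the repo's code may differ.

```python
from typing import Dict


def calibrate(log_probs: Dict[str, float],
              na_log_probs: Dict[str, float]) -> Dict[str, float]:
    """Subtract each label's log-probability on the N/A input (assumed form)."""
    return {label: log_probs[label] - na_log_probs[label] for label in log_probs}


def predict(scores: Dict[str, float]) -> str:
    """Return the label with the highest score."""
    return max(scores, key=scores.get)
```

A label that the model prefers even on the N/A input is penalized, so the prediction can change after calibration.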

Useful notes:

  • Once you run inference, a cache is saved in the out directory and is re-loaded when you run the exact same command line.
  • You can adjust --batch_size if you run into an OOM issue (default is 32).
  • Please note that GPU parallelization is not implemented for inference.
  • To save a log file, please specify --log_file.
  • To use GPT2 with different sizes, please use --gpt2 {gpt2|gpt2-medium|gpt2-xl}.
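The caching behaviour in the first note can be pictured with the sketch below, which derives a cache file name from the arguments and re-uses the file if it exists. The file naming, the pickle format and the functions `cache_path` and `run_with_cache` are assumptions made for illustration, not the scheme used by main.py.

```python
import hashlib
import json
import os
import pickle
from typing import Any, Callable, Dict


def cache_path(out_dir: str, args: Dict[str, Any]) -> str:
    """Map a set of arguments to a file name (hypothetical naming scheme)."""
    key = json.dumps(args, sort_keys=True)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()[:12]
    return os.path.join(out_dir, f"cache-{digest}.pkl")


def run_with_cache(out_dir: str, args: Dict[str, Any],
                   compute: Callable[[], Any]) -> Any:
    """Load the cached result for these arguments, or compute and save it."""
    path = cache_path(out_dir, args)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    os.makedirs(out_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

In a scheme like this, changing any argument yields a different file, so only an identical command line hits the cache.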

Concat-based demonstration

python main.py \
    --task {task_name} \
    --split {dev|test} \
    --data_dir data \
    --out_dir out \
    --gpt2 gpt2-large \
    --do_zeroshot \
    --method {direct|channel} \
    --use_demonstrations \
    --k 16 \
    --seed {13|21|42|87|100}
  • You can modify k and seed to try different numbers of training examples and different seeds for the k-shot data.
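As an illustration of what concatenating demonstrations means, the sketch below builds the text that is conditioned on and the text that is scored for both methods. The ordering of inputs and verbalizers, the space separator and the function name `build_concat_prompt` are assumptions; please see Section 4.1 of the paper for the exact formulation.

```python
from typing import List, Tuple


def build_concat_prompt(demos: List[Tuple[str, str]], x: str,
                        verbalizer: str, method: str) -> Tuple[str, str]:
    """Return (context, continuation); an LM would score the continuation.

    demos is a list of (input, verbalizer) pairs from the k-shot training data.
    """
    if method == "direct":
        # Condition on (input, verbalizer) pairs and the test input.
        parts = [f"{dx} {dv}" for dx, dv in demos] + [x]
        return " ".join(parts), verbalizer
    if method == "channel":
        # Condition on (verbalizer, input) pairs and the candidate verbalizer.
        parts = [f"{dv} {dx}" for dx, dv in demos] + [verbalizer]
        return " ".join(parts), x
    raise ValueError(f"unknown method: {method}")
```

For the channel method, one such pair is built per candidate verbalizer, and the label whose context makes the test input most likely is predicted.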

Ensemble-based demonstration

Add --ensemble to the command line for the Concat-based demonstration method.
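One way to picture the ensemble variant: instead of one long prompt with all k demonstrations, each demonstration forms its own short prompt and the k scores are combined. The sketch below sums log-probabilities (a product of probabilities), which is an assumption about the combination rule; `log_prob` and `toy_log_prob` are hypothetical stand-ins for a language model.

```python
import math
from typing import Callable, Dict, List, Tuple


def ensemble_scores(demos: List[Tuple[str, str]], x: str, verbalizers: List[str],
                    log_prob: Callable[[str, str], float],
                    method: str = "direct") -> Dict[str, float]:
    """Score each verbalizer by summing log-probs over one-demonstration prompts."""
    scores = {}
    for v in verbalizers:
        total = 0.0
        for dx, dv in demos:
            if method == "direct":
                total += log_prob(f"{dx} {dv} {x}", v)
            else:  # channel
                total += log_prob(f"{dv} {dx} {v}", x)
        scores[v] = total
    return scores


def toy_log_prob(context: str, continuation: str) -> float:
    """Toy stand-in for an LM (assumption): rewards shared words."""
    shared = set(context.lower().split()) & set(continuation.lower().split())
    return math.log(1 + len(shared))
```

Because every prompt contains a single demonstration, the prompt length no longer grows with k.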

Tuning methods

This section is for methods that fully finetune the model parameters (standard finetuning), or update a very limited number of parameters (prompt tuning, head tuning and transformation tuning). For details about the methods, please see Section 4.2 of the paper.
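To give a sense of scale for "a very limited number of parameters", the arithmetic below counts the trainable values in prompt tuning, assuming each prompt token is one trainable vector of the model's hidden size as in Lester et al. 2021. The hidden sizes are those of the public GPT-2 checkpoints; the helper name is hypothetical, and the count in this repo may differ.

```python
# Hidden sizes of the public GPT-2 checkpoints.
GPT2_HIDDEN_SIZE = {
    "gpt2": 768,
    "gpt2-medium": 1024,
    "gpt2-large": 1280,
    "gpt2-xl": 1600,
}


def n_prompt_parameters(gpt2: str = "gpt2-large", n_prefix: int = 20) -> int:
    """Trainable values if each prompt token is one hidden-size vector (assumption)."""
    return n_prefix * GPT2_HIDDEN_SIZE[gpt2]


print(n_prompt_parameters("gpt2-large", 20))  # 20 * 1280 = 25600
```

Under this assumption, 20 prompt tokens on GPT2-large amount to 25,600 trainable values, a tiny fraction of the hundreds of millions of parameters updated by standard finetuning.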

Prompt tuning

python main.py \
    --task {task_name} \
    --split {dev|test} \
    --data_dir data \
    --out_dir out \
    --gpt2 gpt2-large \
    --method {direct|channel} \
    --prompt_tune \
    --do_train \
    --batch_size 32 \
    --lr {0.1|0.01|0.001}
  • Please see Appendix B of the paper to see which learning rate we used for each dataset.
  • Once you train the model, you can specify --do_check to load the existing checkpoint without retraining the model.
  • Please note that GPU parallelization is implemented for training, but is not implemented for inference.
  • Note that, by default, we use the checkpoint that is trained for 100 steps.
  • To explore different numbers of prompts, please specify --n_prefix. The default value is 20, following the original prompt tuning paper (Lester et al. 2021).
  • If you want to explore zero-shot task transfer (Section 6.4 in the paper), you can (1) first train the model on the training data, and (2) run inference by specifying --task {task_name_for_test} --train_task {task_name_for_train} --do_check.
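If you want to try all three learning rates for both methods, a small helper can generate the command lines. This is only a convenience sketch: it uses the flags shown in the command above, while the function name and the task name in the usage line are placeholders.

```python
import itertools
import shlex
from typing import List, Sequence


def prompt_tune_commands(task: str, split: str = "dev",
                         lrs: Sequence[float] = (0.1, 0.01, 0.001),
                         methods: Sequence[str] = ("direct", "channel")) -> List[str]:
    """Build one prompt-tuning command line per (method, learning rate) pair."""
    commands = []
    for method, lr in itertools.product(methods, lrs):
        argv = [
            "python", "main.py",
            "--task", task,
            "--split", split,
            "--data_dir", "data",
            "--out_dir", "out",
            "--gpt2", "gpt2-large",
            "--method", method,
            "--prompt_tune",
            "--do_train",
            "--batch_size", "32",
            "--lr", str(lr),
        ]
        commands.append(shlex.join(argv))
    return commands


# "SST-2" is a placeholder; use a name listed by ls data/k-shot.
for command in prompt_tune_commands("SST-2"):
    print(command)
```

Each printed line can be pasted into a shell or passed to a job scheduler.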

Head tuning

Use --head_tune instead of --prompt_tune in the command line for the Prompt tuning method. Note that head tuning is only for the direct baseline.

Transformation tuning

Use --transform_tune instead of --prompt_tune in the command line for the Prompt tuning method. Note that transformation tuning is only for the direct baseline.

Standard finetuning

To finetune all of the model parameters, as in typical finetuning, please do not specify any of --prompt_tune, --head_tune or --transform_tune.

Results

For all results, please check out Table 3 and Table 4 of the paper.
