Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

Overview

DART

Implementation for ICLR2022 paper Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners.

Environment

  • [email protected]
  • Use pip install -r requirements.txt to install dependencies.
  • wandb account is required if the user wants to search for best hyper-parameter combinations.

Data source

  • 16-shot GLUE dataset from LM-BFF.
  • Generated data consists of 5 random splits (13/21/42/87/100) for a task, each has 16 samples.

How to run

  • To run across each 5 splits in a task, use run.py:
    • In the arguments, encoder="inner" is the method proposed in the paper where verbalizers are other trainable tokens; encoder="manual" means verbalizers are selected fixed tokens; encoder="lstm" refers to the P-Tuning method.
$ python run.py -h
usage: run.py [-h] [--encoder {manual,lstm,inner,inner2}] [--task TASK]
              [--num_splits NUM_SPLITS] [--repeat REPEAT] [--load_manual]
              [--extra_mask_rate EXTRA_MASK_RATE]
              [--output_dir_suffix OUTPUT_DIR_SUFFIX]

optional arguments:
  -h, --help            show this help message and exit
  --encoder {manual,lstm,inner,inner2}
  --task TASK
  --num_splits NUM_SPLITS
  --repeat REPEAT
  --load_manual
  --extra_mask_rate EXTRA_MASK_RATE
  --output_dir_suffix OUTPUT_DIR_SUFFIX, -o OUTPUT_DIR_SUFFIX
  • To train and evaluate on a single split with details recorded, use inference.py.
    • Before running, [task_name, label_list, prompt_type] should be configured in the code.
    • prompt_type="none" refers to fixed verbalizer training, while "inner" refers to the method proposed in the paper. ("inner2" is deprecated 2-stage training)
  • To find optimal hyper-parameters for each task-split and reproduce our result, please use sweep.py:
    • Please refer to documentation for WandB for more details.
$ python sweep.py -h
usage: sweep.py [-h]
                [--task {SST-2,sst-5,mr,cr,mpqa,subj,trec,CoLA,MNLI,MNLI-mm,SNLI,QNLI,RTE-glue,MRPC,QQP}]
                [--encoder {none,mlp,lstm,inner,inner2}]
                [--seed_split {13,21,42,87,100} [{13,21,42,87,100} ...]]
                [--batch_size {4,8,16,24,32} [{4,8,16,24,32} ...]]
                [--sweep_id SWEEP_ID]

optional arguments:
  -h, --help            show this help message and exit
  --task {SST-2,sst-5,mr,cr,mpqa,subj,trec,CoLA,MNLI,MNLI-mm,SNLI,QNLI,RTE-glue,MRPC,QQP}
  --encoder {none,mlp,lstm,inner,inner2}
  --seed_split {13,21,42,87,100} [{13,21,42,87,100} ...]
  --batch_size {4,8,16,24,32} [{4,8,16,24,32} ...]
  --sweep_id SWEEP_ID
  • To train and evaluate with more customized configurations, use cli.py.
  • To analyze and visualize the results come from inference.py, use visualize.py and visualize_word_emb.py.

How to Cite

@article{DBLP:journals/corr/abs-2108-13161,
  author    = {Ningyu Zhang and
               Luoqiu Li and
               Xiang Chen and
               Shumin Deng and
               Zhen Bi and
               Chuanqi Tan and
               Fei Huang and
               Huajun Chen},
  title     = {Differentiable Prompt Makes Pre-trained Language Models Better Few-shot
               Learners},
  journal   = {CoRR},
  volume    = {abs/2108.13161},
  year      = {2021},
  url       = {https://arxiv.org/abs/2108.13161},
  eprinttype = {arXiv},
  eprint    = {2108.13161},
  timestamp = {Thu, 13 Jan 2022 17:33:17 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2108-13161.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Owner
ZJUNLP
NLP Group of Knowledge Engine Lab at Zhejiang University
ZJUNLP
Cosine Annealing With Warmup

CosineAnnealingWithWarmup Formulation The learning rate is annealed using a cosine schedule over the course of learning of n_total total steps with an

zhuyun 4 Apr 18, 2022
PyTorch for Semantic Segmentation

PyTorch for Semantic Segmentation This repository contains some models for semantic segmentation and the pipeline of training and testing models, impl

Zijun Deng 1.7k Jan 06, 2023
TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL, and utterance id

TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL

3 Dec 26, 2022
Official implementation of "Dynamic Anchor Learning for Arbitrary-Oriented Object Detection" (AAAI2021).

DAL This project hosts the official implementation for our AAAI 2021 paper: Dynamic Anchor Learning for Arbitrary-Oriented Object Detection [arxiv] [c

ming71 215 Nov 28, 2022
MGFN: Multi-Graph Fusion Networks for Urban Region Embedding was accepted by IJCAI-2022.

Multi-Graph Fusion Networks for Urban Region Embedding (IJCAI-22) This is the implementation of Multi-Graph Fusion Networks for Urban Region Embedding

202 Nov 18, 2022
Tensorflow-Project-Template - A best practice for tensorflow project template architecture.

Tensorflow Project Template A simple and well designed structure is essential for any Deep Learning project, so after a lot of practice and contributi

Mahmoud G. Salem 3.6k Dec 22, 2022
Transport Mode detection - can detect the mode of transport with the help of features such as acceeration,jerk etc

title emoji colorFrom colorTo sdk app_file pinned Transport_Mode_Detector 🚀 purple yellow gradio app.py false Configuration title: string Display tit

Nishant Rajadhyaksha 3 Jan 16, 2022
Python TFLite scripts for detecting objects of any class in an image without knowing their label.

Python TFLite scripts for detecting objects of any class in an image without knowing their label.

Ibai Gorordo 42 Oct 07, 2022
SPCL: A New Framework for Domain Adaptive Semantic Segmentation via Semantic Prototype-based Contrastive Learning

SPCL SPCL: A New Framework for Domain Adaptive Semantic Segmentation via Semantic Prototype-based Contrastive Learning Update on 2021/11/25: ArXiv Ver

Binhui Xie (谢斌辉) 11 Oct 29, 2022
Supervised Contrastive Learning for Downstream Optimized Sequence Representations

SupCL-Seq 📖 Supervised Contrastive Learning for Downstream Optimized Sequence representations (SupCS-Seq) accepted to be published in EMNLP 2021, ext

Hooman Sedghamiz 18 Oct 21, 2022
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking

Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking (CVPR 2021) Pytorch implementation of the ArTIST motion model. In this repo

Fatemeh 38 Dec 12, 2022
Tensorflow implementation of "Learning Deep Features for Discriminative Localization"

Weakly_detector Tensorflow implementation of "Learning Deep Features for Discriminative Localization" B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and

Taeksoo Kim 363 Jun 29, 2022
基于pytorch构建cyclegan示例

cyclegan-demo 基于Pytorch构建CycleGAN示例 如何运行 准备数据集 将数据集整理成4个文件,分别命名为 trainA, trainB:训练集,A、B代表两类图片 testA, testB:测试集,A、B代表两类图片 例如 D:\CODE\CYCLEGAN-DEMO\DATA

Koorye 3 Oct 18, 2022
Fit Fast, Explain Fast

FastExplain Fit Fast, Explain Fast Installing pip install fast-explain About FastExplain FastExplain provides an out-of-the-box tool for analysts to

8 Dec 15, 2022
System-oriented IR evaluations are limited to rather abstract understandings of real user behavior

Validating Simulations of User Query Variants This repository contains the scripts of the experiments and evaluations, simulated queries, as well as t

IR Group at Technische Hochschule Köln 2 Nov 23, 2022
codes for paper Combining Dynamic Local Context Focus and Dependency Cluster Attention for Aspect-level sentiment classification

DLCF-DCA codes for paper Combining Dynamic Local Context Focus and Dependency Cluster Attention for Aspect-level sentiment classification. submitted t

15 Aug 30, 2022
Oriented Response Networks, in CVPR 2017

Oriented Response Networks [Home] [Project] [Paper] [Supp] [Poster] Torch Implementation The torch branch contains: the official torch implementation

ZhouYanzhao 217 Dec 12, 2022
Autonomous Perception: 3D Object Detection with Complex-YOLO

Autonomous Perception: 3D Object Detection with Complex-YOLO LiDAR object detect

Thomas Dunlap 2 Feb 18, 2022
Deep learning for spiking neural networks

A deep learning library for spiking neural networks. Norse aims to exploit the advantages of bio-inspired neural components, which are sparse and even

Electronic Vision(s) Group — BrainScaleS Neuromorphic Hardware 59 Nov 28, 2022
Object detection evaluation metrics using Python.

Object detection evaluation metrics using Python.

Louis Facun 2 Sep 06, 2022