EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering

Last update: Jul 18, 2022

Overview

MADE (Multi-Adapter Dataset Experts)

This repository contains the implementation of MADE (Multi-Adapter Dataset Experts), described in the paper Single-dataset Experts for Multi-dataset Question Answering.

MADE combines a shared Transformer with a collection of adapters that are specialized to different reading comprehension datasets. See our paper for details.

Quick links

Requirements
Download the data
Download the trained models
Run the model
- Train
- Evaluate
- Transfer
Bugs or questions?
Citation

Requirements

The code uses Python 3.8, PyTorch, and the adapter-transformers library. Install the requirements with:

pip install -r requirements.txt

Download the data

You can download the datasets used in the paper from the repository for the MRQA 2019 shared task.

The datasets should be stored in directories ending with train or dev. For example, download the in-domain training datasets to a directory called data/train/ and download the in-domain development datasets to data/dev/.

For zero-shot and few-shot experiments, download the MRQA out-of-domain development datasets to a separate directory and split them into training and development splits using scripts/split_datasets.py. For example, download the datasets to data/transfer/ and run

ls -1 data/transfer/* | xargs -L 1 python scripts/split_datasets.py

Use the default random seed (13) to replicate the splits used in the paper.
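
To illustrate why a fixed seed reproduces the same split, here is a minimal sketch of a seeded shuffle-and-split. It is not the code in scripts/split_datasets.py: the helper name split_examples, its signature, and the 50/50 default ratio are assumptions made purely for illustration.

```python
import random


def split_examples(examples, seed=13, dev_fraction=0.5):
    """Shuffle a copy of `examples` with a fixed seed and split it in two.

    Hypothetical helper: the name, signature, and default ratio are
    illustrative assumptions, not the behaviour of scripts/split_datasets.py.
    """
    rng = random.Random(seed)  # private RNG, so the global random state is untouched
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]  # (train, dev)


examples = [f"question-{i}" for i in range(10)]
train_a, dev_a = split_examples(examples)
train_b, dev_b = split_examples(examples)
# Same seed and same input order give identical splits on every run.
print(train_a == train_b and dev_a == dev_b)
```

Passing a different seed would generally produce a different split, which is why the default should be kept when replicating the paper's numbers.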

Download the trained models

The trained models are stored on the HuggingFace model hub at this URL: https://huggingface.co/princeton-nlp/MADE. All of the models are based on the RoBERTa-base model. They are:

MADE Transformer
MADE adapters (with and without separately tuning the adapters on each dataset).
- SQuAD (with adapter tuning | without adapter tuning)
- HotpotQA (with adapter tuning | without adapter tuning)
- TriviaQA (with adapter tuning | without adapter tuning)
- NewsQA (with adapter tuning | without adapter tuning)
- SearchQA (with adapter tuning | without adapter tuning)
- NaturalQuestions (with adapter tuning | without adapter tuning)
Multi-dataset fine-tuning
Single-dataset fine-tuning
- SQuAD
- HotpotQA
- TriviaQA
- NewsQA
- SearchQA
- NaturalQuestions
Single-dataset adapters
- SQuAD
- HotpotQA
- TriviaQA
- NewsQA
- SearchQA
- NaturalQuestions

To download just the MADE Transformer and adapters:

mkdir made_transformer
wget https://huggingface.co/princeton-nlp/MADE/resolve/main/made_transformer/model.pt -O made_transformer/model.pt

mkdir made_tuned_adapters
for d in SQuAD HotpotQA TriviaQA SearchQA NewsQA NaturalQuestions; do
  mkdir "made_tuned_adapters/${d}"
  wget "https://huggingface.co/princeton-nlp/MADE/resolve/main/made_tuned_adapters/${d}/model.pt" -O "made_tuned_adapters/${d}/model.pt"
done;
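
If you prefer to script the download in Python, the same loop can be sketched as follows. The URL pattern and directory names are taken from the commands above; the function names download_plan and download are hypothetical, and the sketch only prints the plan unless download is called explicitly.

```python
import os
import urllib.request

BASE_URL = "https://huggingface.co/princeton-nlp/MADE/resolve/main"
DATASETS = ("SQuAD", "HotpotQA", "TriviaQA", "SearchQA", "NewsQA", "NaturalQuestions")


def download_plan(base_url=BASE_URL, datasets=DATASETS):
    """Return (url, local_path) pairs for the MADE Transformer and tuned adapters."""
    plan = [(f"{base_url}/made_transformer/model.pt", "made_transformer/model.pt")]
    for name in datasets:
        rel = f"made_tuned_adapters/{name}/model.pt"
        plan.append((f"{base_url}/{rel}", rel))
    return plan


def download(plan):
    """Fetch every (url, local_path) pair, creating directories as needed."""
    for url, path in plan:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        urllib.request.urlretrieve(url, path)


# Print the plan without touching the network.
for url, path in download_plan():
    print(url, "->", path)
```

Calling download(download_plan()) would then fetch the seven files into the same layout that the shell commands create.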

You can download all of the models at once by cloning the repository (first installing Git LFS):

git lfs install
git clone https://huggingface.co/princeton-nlp/MADE
mv MADE models

Run the model

The scripts in scripts/train/ and scripts/transfer/ provide examples of how to run the code. For more details, see the descriptions of the command line flags in run.py.

Train

You can use the scripts in scripts/train/ to train models on the MRQA datasets. For example, to train MADE:

./scripts/train/made_training.sh

And to tune the MADE adapters separately on individual datasets:

for d in SQuAD HotpotQA TriviaQA SearchQA NewsQA NaturalQuestions; do
  ./scripts/train/made_adapter_tuning.sh $d
done;

See run.py for details about the command line arguments.

Evaluate

A single fine-tuned model:

python run.py \
    --eval_on BioASQ DROP DuoRC RACE RelationExtraction TextbookQA \
    --load_from multi_dataset_ft \
    --output_dir output/zero_shot/multi_dataset_ft

An individual MADE adapter (e.g. SQuAD):

python run.py \
    --eval_on BioASQ DROP DuoRC RACE RelationExtraction TextbookQA \
    --load_from made_transformer \
    --load_adapters_from made_tuned_adapters \
    --adapter \
    --adapter_name SQuAD \
    --output_dir output/zero_shot/made_tuned_adapters/SQuAD

An individual single-dataset adapter (e.g. SQuAD):

python run.py \
    --eval_on BioASQ DROP DuoRC RACE RelationExtraction TextbookQA \
    --load_adapters_from single_dataset_adapters/ \
    --adapter \
    --adapter_name SQuAD \
    --output_dir output/zero_shot/single_dataset_adapters/SQuAD

An ensemble of MADE adapters, which runs a forward pass through every adapter in parallel:

python run.py \
    --eval_on BioASQ DROP DuoRC RACE RelationExtraction TextbookQA \
    --load_from made_transformer \
    --load_adapters_from made_tuned_adapters \
    --adapter_names SQuAD HotpotQA TriviaQA SearchQA NewsQA NaturalQuestions \
    --made \
    --parallel_adapters \
    --output_dir output/zero_shot/made_ensemble
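
As a rough picture of what an ensemble does with the per-adapter outputs, the sketch below averages the probability distributions produced by several adapters over the same candidate answer positions. Averaging probabilities is an assumed combination rule, and the function names and toy logits are made up for illustration; see run.py for how the predictions are actually combined.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probs(logits_per_adapter):
    """Average the softmax distributions produced by several adapters.

    `logits_per_adapter` maps an adapter name to its logits over the same
    candidate positions. Averaging probabilities is an assumption made for
    this sketch, not a statement about what run.py does.
    """
    dists = [softmax(logits) for logits in logits_per_adapter.values()]
    return [sum(column) / len(dists) for column in zip(*dists)]


# Toy start-position logits over three tokens (made-up numbers).
start_logits = {
    "SQuAD": [2.0, 0.5, 0.1],
    "HotpotQA": [0.3, 1.7, 0.2],
    "TriviaQA": [1.9, 0.4, 0.3],
}
probs = ensemble_probs(start_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, [round(p, 3) for p in probs])
```

Unlike parameter averaging, this needs one forward pass per adapter, so inference cost grows with the number of adapters.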

Averaging the parameters of the MADE adapters:

python run.py \
    --eval_on BioASQ DROP DuoRC RACE RelationExtraction TextbookQA \
    --load_from made_transformer \
    --load_adapters_from made_tuned_adapters \
    --adapter_names SQuAD HotpotQA TriviaQA SearchQA NewsQA NaturalQuestions \
    --adapter \
    --average_adapters \
    --output_dir output/zero_shot/made_avg
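
Parameter averaging merges the adapters into a single adapter before inference, so only one forward pass is needed. A minimal sketch of a uniform element-wise average is shown below; plain Python lists stand in for tensors, and the parameter names down.weight and up.weight are hypothetical.

```python
def average_parameters(adapters):
    """Return the element-wise mean of several adapters' parameters.

    `adapters` maps an adapter name to a dict of parameter name -> list of
    floats. All adapters are assumed to share the same parameter names and
    shapes. Lists stand in for tensors in this sketch.
    """
    names = list(adapters)
    averaged = {}
    for key in adapters[names[0]]:
        columns = zip(*(adapters[name][key] for name in names))
        averaged[key] = [sum(column) / len(names) for column in columns]
    return averaged


# Toy adapters with made-up parameter names and values.
adapters = {
    "SQuAD": {"down.weight": [1.0, 2.0], "up.weight": [0.0, 4.0]},
    "HotpotQA": {"down.weight": [3.0, 6.0], "up.weight": [2.0, 0.0]},
}
print(average_parameters(adapters))
```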

Running UnifiedQA:

python run.py \
    --eval_on BioASQ DROP DuoRC RACE RelationExtraction TextbookQA \
    --seq2seq \
    --model_name_or_path allenai/unifiedqa-t5-base \
    --output_dir output/zero_shot/unifiedqa

Transfer

The scripts in scripts/transfer/ provide examples of how to run the few-shot transfer learning experiments described in the paper. For example, the following command repeats this procedure for each of three random seeds: (1) sample 64 training examples from BioASQ, (2) calculate the zero-shot loss of each MADE adapter on those examples, (3) average the adapter parameters in proportion to zero-shot loss, (4) hold out 32 of the training examples as validation data, (5) train the averaged adapter until performance on the 32 validation examples stops improving, and (6) evaluate the adapter on the full development set.

python run.py \
    --train_on BioASQ \
    --adapter_names SQuAD HotpotQA TriviaQA NewsQA SearchQA NaturalQuestions \
    --made \
    --parallel_made \
    --weighted_average_before_training \
    --adapter_learning_rate 1e-5 \
    --steps 200 \
    --patience 10 \
    --eval_before_training \
    --full_eval_after_training \
    --max_train_examples 64 \
    --few_shot \
    --criterion "loss" \
    --negative_examples \
    --save \
    --seeds 7 19 29 \
    --load_from "made_transformer" \
    --load_adapters_from "made_tuned_adapters" \
    --name "transfer/made_preaverage/BioASQ/64"
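
Steps (2), (3), and (5) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the logic in run.py: the weighting rule (a softmax over negative loss, so that a lower loss gives a larger weight) is an assumption, all function names are hypothetical, and lists of floats stand in for tensors and for the validation losses recorded during training.

```python
import math


def loss_to_weights(losses):
    """Turn per-adapter zero-shot losses into mixing weights that sum to 1.

    Assumption: softmax over negative loss, so lower loss -> larger weight.
    The exact rule used by --weighted_average_before_training may differ.
    """
    low = min(losses.values())
    scores = {name: math.exp(-(loss - low)) for name, loss in losses.items()}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


def weighted_average(adapters, weights):
    """Element-wise weighted mean of the adapters' parameters."""
    names = list(adapters)
    merged = {}
    for key in adapters[names[0]]:
        columns = zip(*(adapters[name][key] for name in names))
        merged[key] = [
            sum(weights[name] * value for name, value in zip(names, column))
            for column in columns
        ]
    return merged


def best_step_with_patience(validation_losses, patience):
    """Return (step, loss) of the best evaluation, stopping once `patience`
    consecutive evaluations have failed to improve on it."""
    best_loss, best_step, stale = float("inf"), -1, 0
    for step, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_step, stale = loss, step, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_step, best_loss


# Toy values (made up): two adapters with one parameter vector each.
zero_shot_losses = {"SQuAD": 1.0, "HotpotQA": 2.0}
weights = loss_to_weights(zero_shot_losses)
adapters = {"SQuAD": {"w": [0.0, 2.0]}, "HotpotQA": {"w": [4.0, 2.0]}}
merged = weighted_average(adapters, weights)
print(weights, merged)
print(best_step_with_patience([0.9, 0.7, 0.8, 0.75, 0.72, 0.6], patience=2))
```

In the toy run, training stops at the fourth evaluation because two evaluations in a row failed to beat the loss of 0.7, so the later value of 0.6 is never reached. A larger patience, such as the 10 used in the command above, trades extra training steps for a lower chance of stopping too early.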

Bugs or questions?

If you have any questions related to the code or the paper, feel free to email Dan Friedman ([email protected]). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please describe the problem in detail so that we can help you better and more quickly.

Citation

@inproceedings{friedman2021single,
   title={Single-dataset Experts for Multi-dataset QA},
   author={Friedman, Dan and Dodge, Ben and Chen, Danqi},
   booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
   year={2021}
}

Owner

Princeton Natural Language Processing
