Few-shot NLP benchmark for unified, rigorous eval

Related tags

Deep Learningflex
Overview

FLEX

FLEX is a benchmark and framework for unified, rigorous few-shot NLP evaluation. FLEX enables:

  • First-class NLP support
  • Support for meta-training
  • Reproducible fewshot evaluations
  • Extensible benchmark creation (benchmarks defined using HuggingFace Datasets)
  • Advanced sampling functions for creating episodes with class imbalance, etc.

For more context, see our arXiv preprint.

Together with FLEX, we also released a simple yet strong few-shot model called UniFew. For more details, see our preprint.

Leaderboards

These instructions are geared towards users of the first benchmark created with this framework. The benchmark has two leaderboards, for the Pretraining-Only and Meta-Trained protocols described in Section 4.2 of our paper:

  • FLEX (Pretraining-Only): for models that do not use meta-training data related to the test tasks (do not follow the Model Training section below).
  • FLEX-META (Meta-Trained): for models that use only the provided meta-training and meta-validation data (please do see the Model Training section below).

Installation

  • Clone the repository: git clone [email protected]:allenai/flex.git
  • Create a Python 3 environment (3.7 or greater), eg using conda create --name flex python=3.9
  • Activate the environment: conda activate flex
  • Install the package locally with pip install -e .

Data Preparation

Creating the data for the flex challenge for the first time takes about 10 minutes (using a recent Macbook Pro on a broadband connection) and requires 3GB of disk space. You can initiate this process by running

python -c "import fewshot; fewshot.make_challenge('flex');"

You can control the location of the cached data by setting the environment variable HF_DATASETS_CACHE. If you have not set this variable, the location should default to ~/.cache/huggingface/datasets/. See the HuggingFace docs for more details.

Model Evaluation

"Challenges" are datasets of sampled tasks for evaluation. They are defined in fewshot/challenges/__init__.py.

To evaluate a model on challenge flex (our first challenge), you should write a program that produces a predictions.json, for example:

#!/usr/bin/env python3
import random
from typing import Iterable, Dict, Any, Sequence
import fewshot


class YourModel(fewshot.Model):
    def fit_and_predict(
        self,
        support_x: Iterable[Dict[str, Any]],
        support_y: Iterable[str],
        target_x: Iterable[Dict[str, Any]],
        metadata: Dict[str, Any]
    ) -> Sequence[str]:
        """Return random label predictions for a fewshot task."""
        train_x = [d['txt'] for d in support_x]
        train_y = support_y
        test_x = [d['txt'] for d in target_x]
        test_y = [random.choice(metadata['labels']) for _ in test_x]
        # >>> print(test_y)
        # ['some', 'list', 'of', 'label', 'predictions']
        return test_y


if __name__ == '__main__':
    evaluator = fewshot.make_challenge("flex")
    model = YourModel()
    evaluator.save_model_predictions(model=model, save_path='/path/to/predictions.json')

Warning: Calling fewshot.make_challenge("flex") above requires some time to prepare all the necessary data (see "Data preparation" section).

Running the above script produces /path/to/predictions.json with contents formatted as:

{
    "[QUESTION_ID]": {
        "label": "[CLASS_LABEL]",  # Currently an integer converted to a string
        "score": float  # Only used for ranking tasks
    },
    ...
}

Each [QUESTION_ID] is an ID for a test example in a few-shot problem.

[Optional] Parallelizing Evaluation

Two options are available for parallelizing evaluation.

First, one can restrict evaluation to a subset of tasks with indices from [START] to [STOP] (exclusive) via

evaluator.save_model_predictions(model=model, start_task_index=[START], stop_task_index=[STOP])

Notes:

  • You may use stop_task_index=None (or omit it) to avoid specifying an end.
  • You can find the total number of tasks in the challenge with fewshot.get_challenge_spec([CHALLENGE]).num_tasks.
  • To merge partial evaluation outputs into a complete predictions.json file, use fewshot merge partial1.json partial2.json ... predictions.json.

The second option will call your model's .fit_and_predict() method with batches of [BATCH_SIZE] tasks, via

evaluator.save_model_predictions(model=model, batched=True, batch_size=[BATCH_SIZE])

Result Validation and Scoring

To validate the contents of your predictions, run:

fewshot validate --challenge_name flex --predictions /path/to/predictions.json

This validates all the inputs and takes some time. Substitute flex for another challenge to evaluate on a different challenge.

(There is also a score CLI command which should not be used on the final challenge except when reporting final results.)

Model Training

For the meta-training protocol (e.g., the FLEX-META leaderboard), challenges come with a set of related training and validation data. This data is most easily accessible in one of two formats:

  1. Iterable from sampled episodes. fewshot.get_challenge_spec('flex').get_sampler(split='[SPLIT]') returns an iterable that samples datasets and episodes from meta-training or meta-validation datasets, via [SPLIT]='train' or [SPLIT]='val', respectively. The sampler defaults to the fewshot.samplers.Sample2WayMax8ShotCfg sampler configuration (for the fewshot.samplers.sample.Sampler class), but can be reconfigured.

  2. Raw dataset stores. This option is for directly accessing the raw data. fewshot.get_challenge_spec('flex').get_stores(split='[SPLIT']) returns a mapping from dataset names to fewshot.datasets.store.Store instances. Each Store instance has a Store.store attribute containing a raw HuggingFace Dataset instance. The Store instance has a Store.label attribute with the Dataset object key for accessing the target label (e.g., via Store.store[Store.label]) and the FLEX-formatted text available at the flex.txt key (e.g., via Store.store['flex.txt']).

Two examples of these respective approaches are available at:

  1. The UniFew model repository. For more details on Unifew, see also the FLEX Arxiv paper.
  2. The baselines/bao/ directory, for training and evaluating the approach described in the following paper:

Yujia Bao*, Menghua Wu*, Shiyu Chang, and Regina Barzilay. Few-shot Text Classification with Distributional Signatures. In International Conference on Learning Representations 2020

Benchmark Construction and Optimization

To add a new benchmark (challenge) named [NEW_CHALLENGE], you must edit fewshot/challenges/__init__.py or otherwise add it to the registry. The above usage instructions would change to substitute [NEW_CHALLENGE] in place of flex when calling fewshot.get_challenge_spec('[NEW_CHALLENGE]') and fewshot.make_challenge('[NEW_CHALLENGE]').

For an example of how to optimize the sample size of the challenge, see scripts/README-sample-size.md.

Attribution

If you make use of our framework, benchmark, or model, please cite our preprint:

@misc{bragg2021flex,
      title={FLEX: Unifying Evaluation for Few-Shot NLP},
      author={Jonathan Bragg and Arman Cohan and Kyle Lo and Iz Beltagy},
      year={2021},
      eprint={2107.07170},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Scalable machine learning based time series forecasting

mlforecast Scalable machine learning based time series forecasting. Install PyPI pip install mlforecast Optional dependencies If you want more functio

Nixtla 145 Dec 24, 2022
Rethinking Portrait Matting with Privacy Preserving

Rethinking Portrait Matting with Privacy Preserving This is the official repository of the paper Rethinking Portrait Matting with Privacy Preserving.

184 Jan 03, 2023
TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations

TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations Requirements python 3.6 torch 1.9 numpy 1.19 Quick Start The experimen

DMIRLAB 4 Oct 16, 2022
A annotation of yolov5-5.0

代码版本:0714 commit #4000 $ git clone https://github.com/ultralytics/yolov5 $ cd yolov5 $ git checkout 720aaa65c8873c0d87df09e3c1c14f3581d4ea61 这个代码只是注释版

Laughing 229 Dec 17, 2022
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning, CVPR 2021

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning By Zhenda Xie*, Yutong Lin*, Zheng Zhang, Yue Ca

Zhenda Xie 293 Dec 20, 2022
Adjust Decision Boundary for Class Imbalanced Learning

Adjusting Decision Boundary for Class Imbalanced Learning This repository is the official PyTorch implementation of WVN-RS, introduced in Adjusting De

Peyton Byungju Kim 16 Jan 04, 2023
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

WebDataset WebDataset is a PyTorch Dataset (IterableDataset) implementation providing efficient access to datasets stored in POSIX tar archives and us

1.1k Jan 08, 2023
NAACL2021 - COIL Contextualized Lexical Retriever

COIL Repo for our NAACL paper, COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List. The code covers learning

Luyu Gao 108 Dec 31, 2022
Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax

Clockwork VAEs in JAX/Flax Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax, ported

Julius Kunze 26 Oct 05, 2022
PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition"

R2Plus1D-PyTorch PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal

Irhum Shafkat 342 Dec 16, 2022
HashNeRF-pytorch - Pure PyTorch Implementation of NVIDIA paper on Instant Training of Neural Graphics primitives

HashNeRF-pytorch Instant-NGP recently introduced a Multi-resolution Hash Encodin

Yash Sanjay Bhalgat 616 Jan 06, 2023
Learning based AI for playing multi-round Koi-Koi hanafuda card games. Have fun.

Koi-Koi AI Learning based AI for playing multi-round Koi-Koi hanafuda card games. Platform Python PyTorch PySimpleGUI (for the interface playing vs AI

Sanghai Guan 10 Nov 20, 2022
Official release of MSHT: Multi-stage Hybrid Transformer for the ROSE Image Analysis of Pancreatic Cancer axriv: http://arxiv.org/abs/2112.13513

MSHT: Multi-stage Hybrid Transformer for the ROSE Image Analysis This is the official page of the MSHT with its experimental script and records. We de

Tianyi Zhang 53 Dec 27, 2022
Mall-Customers-Segmentation - Customer Segmentation Using K-Means Clustering

Overview Customer Segmentation is one the most important applications of unsupervised learning. Using clustering techniques, companies can identify th

NelakurthiSudheer 2 Jan 03, 2022
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis

PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis

Ubisoft 76 Dec 30, 2022
Style-based Neural Drum Synthesis with GAN inversion

Style-based Drum Synthesis with GAN Inversion Demo TensorFlow implementation of a style-based version of the adversarial drum synth (ADS) from the pap

Sound and Music Analysis (SoMA) Group 29 Nov 19, 2022
RoMa: A lightweight library to deal with 3D rotations in PyTorch.

RoMa: A lightweight library to deal with 3D rotations in PyTorch. RoMa (which stands for Rotation Manipulation) provides differentiable mappings betwe

NAVER 90 Dec 27, 2022
[CVPR 2022] Back To Reality: Weak-supervised 3D Object Detection with Shape-guided Label Enhancement

Back To Reality: Weak-supervised 3D Object Detection with Shape-guided Label Enhancement Announcement 🔥 We have not tested the code yet. We will fini

Xiuwei Xu 7 Oct 30, 2022
Project page for our ICCV 2021 paper "The Way to my Heart is through Contrastive Learning"

The Way to my Heart is through Contrastive Learning: Remote Photoplethysmography from Unlabelled Video This is the official project page of our ICCV 2

36 Jan 06, 2023
Python Multi-Agent Reinforcement Learning framework

- Please pay attention to the version of SC2 you are using for your experiments. - Performance is *not* always comparable between versions. - The re

whirl 1.3k Jan 05, 2023