QuALITY: Question Answering with Long Input Texts, Yes!

Authors: Richard Yuanzhe Pang,* Alicia Parrish,* Nitish Joshi,* Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, and Samuel R. Bowman (* = equal contribution)

Data link

Download QuALITY v0.9 (zip).

Paper preprint

You can read the paper here.

Data README

The fields in the jsonl file are explained below. Each line of json holds the set of validated questions written by one writer for one article.

  • article_id: String. A five-digit number uniquely identifying the article. In each split, there are exactly two lines containing the same article_id, because two writers wrote questions for the same article.
  • set_unique_id: String. The unique ID corresponding to the set of questions, which corresponds to the line of json. Each set of questions is written by the same writer.
  • batch_num: String. The batch number. Our data collection is split in two groups, and there are three batches in each group. [i][j] means the j-th batch in the i-th group. For example, 23 corresponds to the third batch in the second group.
  • writer_id: String. The anonymized ID of the writer who wrote this set of questions.
  • source: String. The source of the article.
  • title: String. The title of the article.
  • author: String. The author of the article.
  • topic: String. The topic of the article.
  • url: String. The URL of the original unprocessed source article.
  • license: String. The license information for the article.
  • article: String. The HTML of the article. A script that converts the HTML to plain text is provided.
  • questions: A list of dictionaries explained below. Each line of json has a different number of questions because some questions were removed following validation.
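As a rough illustration of the top-level layout described above, the following sketch parses a few json lines, groups question sets by article_id, and splits batch_num into its group and batch digits. The sample records and the helper names (read_jsonl_lines, parse_batch_num, group_by_article) are made up for this example and are not part of the released data or code.

```python
import json
from collections import defaultdict


def read_jsonl_lines(lines):
    """Parse an iterable of json lines into a list of question-set dicts."""
    return [json.loads(line) for line in lines if line.strip()]


def parse_batch_num(batch_num):
    """Split a two-character batch_num such as "23" into (group, batch)."""
    return int(batch_num[0]), int(batch_num[1])


def group_by_article(question_sets):
    """Map each article_id to the set_unique_id values of its question sets."""
    by_article = defaultdict(list)
    for question_set in question_sets:
        by_article[question_set["article_id"]].append(question_set["set_unique_id"])
    return dict(by_article)


# Hypothetical records: the IDs below are invented for illustration only.
sample_lines = [
    json.dumps({"article_id": "12345", "set_unique_id": "12345_AAA",
                "batch_num": "23", "questions": []}),
    json.dumps({"article_id": "12345", "set_unique_id": "12345_BBB",
                "batch_num": "11", "questions": []}),
]

question_sets = read_jsonl_lines(sample_lines)
articles = group_by_article(question_sets)
group, batch = parse_batch_num(question_sets[0]["batch_num"])
print(articles)
print(group, batch)
```

To read a real split, the same read_jsonl_lines helper can be given an open file handle in place of sample_lines.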

As discussed, the value of questions is a list of dictionaries. Each dictionary has the following fields.

  • question: The question.
  • options: A list of four answer options.
  • gold_label: The correct answer, defined by a majority vote of 3 or 5 annotators + the original writer's label. The number corresponds to the option number (1-indexed) in options.
  • writer_label: The label the writer provided. The number corresponds to the option number (1-indexed) in options.
  • validation: A list of dictionaries containing the untimed validation results. Each dictionary contains the following fields.
    • untimed_annotator_id: The anonymized annotator IDs corresponding to the untimed validation results shown in untimed_answer.
    • untimed_answer: The responses in the untimed validation. Each question in the training set is annotated by three workers in most cases, and each question in the dev/test sets is annotated by five workers in most cases (see paper for exceptions).
    • untimed_eval1_answerability: The responses (represented numerically) to the first eval question in untimed validation. We asked the raters: “Is the question answerable and unambiguous?” The values correspond to the following choices:
      • 1: Yes, there is a single answer choice that is the most correct.
      • 2: No, two or more answer choices are equally correct.
      • 3: No, it is unclear what the question is asking, or the question or answer choices are unrelated to the passage.
    • untimed_eval2_context: The responses (represented numerically) to the second eval question in untimed validation. We asked the raters: “How much of the passage/text is needed as context to answer this question correctly?” The values correspond to the following choices:
      • 1: Only a sentence or two of context.
      • 2: At least a long paragraph or two of context.
      • 3: At least a third of the passage for context.
      • 4: Most or all of the passage for context.
    • untimed_eval3_distractor: The responses to the third eval question in untimed validation. We asked the raters: “Which of the options that you did not select was the best "distractor" item (i.e., an answer choice that you might be tempted to select if you hadn't read the text very closely)?” The numbers correspond to the option numbers (1-indexed).
  • speed_validation: A list of dictionaries containing the speed validation results. Each dictionary contains the following fields.
    • speed_annotator_id: The anonymized annotator IDs corresponding to the speed annotation results shown in speed_answer.
    • speed_answer: The responses in the speed validation. Each question is annotated by five workers.
  • difficult: A binary value. 1 means that fewer than 50% of the speed annotations answer the question correctly, so we include this question in the hard subset. Otherwise, the value is 0. In our evaluations, we report one accuracy figure for the entire dataset, and a second for the difficult=1 subset.
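The two accuracy figures mentioned above (entire dataset and difficult=1 subset) could be computed along these lines. This is a minimal sketch, not the authors' evaluation code: the function names, the sample questions, and the convention that predictions are 1-indexed option numbers are assumptions made for illustration.

```python
def gold_option_text(question):
    """Return the text of the correct option; gold_label is 1-indexed."""
    return question["options"][question["gold_label"] - 1]


def accuracy(questions, predictions, hard_only=False):
    """Fraction of predictions equal to gold_label.

    predictions are assumed to be 1-indexed option numbers, one per question.
    With hard_only=True, only questions with difficult == 1 are scored.
    Returns None when no question is selected.
    """
    pairs = [
        (q, p) for q, p in zip(questions, predictions)
        if not hard_only or q["difficult"] == 1
    ]
    if not pairs:
        return None
    correct = sum(1 for q, p in pairs if p == q["gold_label"])
    return correct / len(pairs)


# Hypothetical questions containing only the fields this sketch needs.
questions = [
    {"options": ["a", "b", "c", "d"], "gold_label": 1, "difficult": 0},
    {"options": ["e", "f", "g", "h"], "gold_label": 3, "difficult": 1},
    {"options": ["i", "j", "k", "l"], "gold_label": 2, "difficult": 1},
]
predictions = [1, 3, 4]

overall = accuracy(questions, predictions)
hard = accuracy(questions, predictions, hard_only=True)
print(overall, hard)
```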

Validation criteria for the questions

  • More than 50% of annotators answer the question correctly in the untimed setting. That is, more than 50% of the untimed_answer annotations agree with gold_label (defined as the majority vote of validators' annotations together with the writer's provided label).
  • More than 50% of annotators think that the question is unambiguous and answerable. That is, more than 50% of the untimed_eval1_answerability annotations have 1's.
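The two criteria above can be expressed as a small check over a question dictionary. The sketch below assumes that each dictionary in validation holds a single annotator's untimed_answer and untimed_eval1_answerability as integers; the helper names and the sample question are hypothetical.

```python
def fraction(items, predicate):
    """Share of items for which predicate is true (0.0 for an empty list)."""
    items = list(items)
    if not items:
        return 0.0
    return sum(1 for item in items if predicate(item)) / len(items)


def passes_validation(question):
    """Apply both untimed criteria: majority correct and majority answerable."""
    votes = question["validation"]
    gold = question["gold_label"]
    correct = fraction(votes, lambda v: v["untimed_answer"] == gold)
    answerable = fraction(votes, lambda v: v["untimed_eval1_answerability"] == 1)
    return correct > 0.5 and answerable > 0.5


# Hypothetical question with three untimed annotations.
kept = {
    "gold_label": 2,
    "validation": [
        {"untimed_answer": 2, "untimed_eval1_answerability": 1},
        {"untimed_answer": 2, "untimed_eval1_answerability": 1},
        {"untimed_answer": 3, "untimed_eval1_answerability": 2},
    ],
}
# Same question, but only one of three annotators agrees with gold_label.
dropped = dict(kept, validation=[
    {"untimed_answer": 2, "untimed_eval1_answerability": 1},
    {"untimed_answer": 3, "untimed_eval1_answerability": 1},
    {"untimed_answer": 3, "untimed_eval1_answerability": 1},
])

print(passes_validation(kept), passes_validation(dropped))
```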

What are the hard questions?

  • More than 50% of annotators answer the question correctly in the untimed setting. That is, more than 50% of the untimed_answer annotations agree with gold_label.
  • More than 50% of annotators think that the question is unambiguous and answerable. That is, more than 50% of the untimed_eval1_answerability annotations have 1's.
  • More than 50% of annotators answer the question incorrectly in the speed validation setting. That is, more than 50% of the speed_answer annotations are incorrect.
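The speed-validation criterion, which is what separates hard questions from the other validated questions, might be checked as follows. The sketch assumes that each dictionary in speed_validation holds one annotator's speed_answer as a 1-indexed option number; the function name and sample data are hypothetical.

```python
def speed_majority_incorrect(question):
    """True when more than 50% of the speed_answer annotations are incorrect."""
    answers = [v["speed_answer"] for v in question["speed_validation"]]
    if not answers:
        return False
    wrong = sum(1 for answer in answers if answer != question["gold_label"])
    return wrong / len(answers) > 0.5


# Hypothetical questions with five speed annotations each.
hard_question = {
    "gold_label": 1,
    "speed_validation": [{"speed_answer": a} for a in [2, 3, 1, 4, 2]],
}
easy_question = {
    "gold_label": 1,
    "speed_validation": [{"speed_answer": a} for a in [1, 1, 2, 1, 3]],
}

print(speed_majority_incorrect(hard_question))
print(speed_majority_incorrect(easy_question))
```

For a question that also meets the two untimed criteria, this result would be expected to match the difficult field, although that has not been verified against the released files here.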

Test set

The annotations for questions in the test set will not be released. We are currently working on a leaderboard. Stay tuned for an update by early January!

Code

The code for our baseline models will be released soon. Stay tuned for an update by early January!

Citation

@article{pang2021quality,
  title={{QuALITY}: Question Answering with Long Input Texts, Yes!},
  author={Pang, Richard Yuanzhe and Parrish, Alicia and Joshi, Nitish and Nangia, Nikita and Phang, Jason and Chen, Angelica and Padmakumar, Vishakh and Ma, Johnny and Thompson, Jana and He, He and Bowman, Samuel R.},
  journal={arXiv preprint arXiv:2112.08608},
  year={2021}
}

Contact

{yzpang, alicia.v.parrish}@nyu.edu

Owner
ML² AT CILVR
The Machine Learning for Language Group at NYU CILVR