COPA-SSE contains crowdsourced explanations for the Balanced COPA dataset

Last update: Jul 31, 2022

COPA-SSE

Repository for COPA-SSE: Semi-Structured Explanations for Commonsense Reasoning.

COPA-SSE contains crowdsourced explanations for the Balanced COPA dataset, a variant of the Choice of Plausible Alternatives (COPA) benchmark. Each explanation is formatted as a set of triple-like commonsense statements that use ConceptNet relations but freely written concepts.

Data format

dev-explained.jsonl and test-explained.jsonl each contain Balanced COPA samples with added explanations in .jsonl format. The question ids match the original questions of the development and test set, respectively.

Each entry contains:

- the original question (matching format and ids)
- human-explanations: a list of explanations, each containing:
  - expl-id: the explanation id
  - text: the explanation in plain text (full sentences)
  - worker-id: anonymized worker id (the author of the explanation)
  - worker-avg: the average score the author got for their explanations
  - all-ratings: all collected ratings for the explanation
  - filtered-ratings: ratings excluding those that failed the control
  - filtered-avg-rating: the average of filtered-ratings
  - triples: the triple-form explanation (a list of ConceptNet-like triples)
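
Because the files are in .jsonl format, they can be read one line at a time with the standard library. The following is a minimal sketch rather than code from the repository: the helper names `load_explained` and `iter_explanations` are hypothetical, and it assumes each non-empty line is one JSON object with the fields listed above.

```python
import json
from pathlib import Path


def load_explained(path):
    """Read a COPA-SSE .jsonl file into a list of dicts (one per question)."""
    entries = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, if any
                entries.append(json.loads(line))
    return entries


def iter_explanations(entries):
    """Yield (question id, explanation dict) pairs across all entries."""
    for entry in entries:
        for expl in entry.get("human-explanations", []):
            yield entry["id"], expl
```

For example, `entries = load_explained("dev-explained.jsonl")` followed by `for qid, expl in iter_explanations(entries): ...` walks every explanation in the development set.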

Example entry:

{"id": 1,
 "asks-for": "cause",
 "most-plausible-alternative": 1,
 "p": "My body cast a shadow over the grass.",
 "a1": "The sun was rising.",
 "a2": "The grass was cut.",
 "human-explanations": [
    {"expl-id": "f4d9b407-681b-4340-9be1-ac044f1c2230",
     "text": "Sunrise causes casted shadows.",
     "worker-id": "3a71407b-9431-49f9-b3ca-1641f7c05f3b",
     "worker-avg": 3.5832864694635025,
     "all-ratings": [1, 3, 3, 4, 3],
     "filtered-ratings": [3, 3, 4, 3],
     "filtered-avg-rating": 3.25,
     "triples": [["sunrise", "Causes", "casted shadows"]]
    }, ...]}
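
In this example, filtered-avg-rating is the mean of filtered-ratings: (3 + 3 + 4 + 3) / 4 = 3.25. The sketch below recomputes that value and picks the top-rated explanation for a question. The function names are hypothetical, and skipping explanations that have no ratings left after filtering is an assumption made here, not documented dataset behavior.

```python
from statistics import mean


def filtered_avg(expl):
    """Mean of the ratings that passed the control, or None if there are none."""
    ratings = expl.get("filtered-ratings", [])
    return mean(ratings) if ratings else None


def best_explanation(entry):
    """Return the explanation with the highest filtered average rating."""
    rated = [(filtered_avg(e), e) for e in entry["human-explanations"]]
    rated = [(avg, e) for avg, e in rated if avg is not None]
    if not rated:
        return None
    return max(rated, key=lambda pair: pair[0])[1]


# The single explanation shown in the example entry above.
example = {
    "id": 1,
    "human-explanations": [
        {
            "text": "Sunrise causes casted shadows.",
            "all-ratings": [1, 3, 3, 4, 3],
            "filtered-ratings": [3, 3, 4, 3],
            "triples": [["sunrise", "Causes", "casted shadows"]],
        }
    ],
}

print(filtered_avg(example["human-explanations"][0]))  # 3.25
print(best_explanation(example)["text"])  # Sunrise causes casted shadows.
```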

Aggregated versions

graphs.pkl contains aggregated versions of the triples for each question, stored as a dictionary keyed by COPA question id.

Each entry is a list of edges, each a tuple of the form (u, v, {'rel': relation, 'weight': weight}). Similar nodes were either merged or connected with a relatedto edge, depending on the cosine similarity between their SentenceTransformer embeddings. An edge's weight is the average score of the explanation it originated from (summed if it originated from several), or 1.0 if the edge was generated automatically.
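
The weighting rule can be illustrated with a small sketch. This is not the authors' aggregation code: it leaves out the embedding-based merging and relatedto linking entirely. Two things are assumptions made for illustration: that the "average score" is the explanation's filtered-avg-rating, and that nodes are normalized by lowercasing and replacing spaces with underscores (inferred from the example entry). The helper names `normalize` and `triples_to_edges` are hypothetical.

```python
from collections import defaultdict


def normalize(concept):
    """Assumed normalization: lowercase, spaces replaced by underscores."""
    return "_".join(concept.lower().split())


def triples_to_edges(explanations):
    """Turn rated triple explanations into (u, v, {'rel', 'weight'}) edges.

    Each triple contributes its explanation's average rating; repeated
    (head, relation, tail) triples have their weights summed.
    """
    weights = defaultdict(float)
    for expl in explanations:
        score = expl["filtered-avg-rating"]  # assumed to be the "average score"
        for head, rel, tail in expl["triples"]:
            weights[(normalize(head), rel.lower(), normalize(tail))] += score
    return [(u, v, {"rel": rel, "weight": w}) for (u, rel, v), w in weights.items()]


explanations = [
    {"filtered-avg-rating": 3.25,
     "triples": [["sunrise", "Causes", "casted shadows"]]},
]
print(triples_to_edges(explanations))
# [('sunrise', 'casted_shadows', {'rel': 'causes', 'weight': 3.25})]
```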

Note: not all graphs are (weakly) connected.
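
A graph is weakly connected if every node can reach every other node once edge direction is ignored. A minimal standard-library check over the edge-list format described above might look like the following. `is_weakly_connected` is a hypothetical helper, it only sees nodes that appear in at least one edge, and treating an empty edge list as connected is a convention chosen here.

```python
from collections import defaultdict, deque


def is_weakly_connected(edges):
    """True if all nodes are mutually reachable when direction is ignored."""
    neighbors = defaultdict(set)
    for u, v, _attrs in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    if not neighbors:
        return True  # convention chosen for this sketch
    start = next(iter(neighbors))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over the undirected view
        node = queue.popleft()
        for nxt in neighbors[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(neighbors)
```

Graphs for which this returns False may need per-component handling before running algorithms that assume a single connected graph.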

Example entry:

1: [('sunrise', 'casted_shadows', {'rel': 'causes', 'weight': 3.25}),
  ('sunrise', 'sun', {'rel': 'relatedto', 'weight': 1.0}),
  ('casted_shadows', 'the_shadow', {'rel': 'relatedto', 'weight': 1.0}),
  ('sun_rising', 'bringing_light', {'rel': 'hasproperty', 'weight': 4.25}),
  ('sun_rising', 'a_sun_raising', {'rel': 'relatedto', 'weight': 1.0}),
 ...
]
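
Loading the file is a plain pickle read. The sketch below assumes graphs.pkl unpickles to a dictionary mapping each question id to an edge list, as described above; `load_graphs` and `edges_by_weight` are hypothetical helper names. Unpickling can execute arbitrary code, so only load a copy of the file obtained from a trusted source.

```python
import pickle


def load_graphs(path="graphs.pkl"):
    """Unpickle the {question id: edge list} dictionary."""
    with open(path, "rb") as f:
        return pickle.load(f)


def edges_by_weight(edges, skip_rel=None):
    """Sort edges by descending weight, optionally dropping one relation.

    Passing skip_rel="relatedto" removes the automatically generated links,
    but it would also remove any relatedto triples written by workers.
    """
    kept = [e for e in edges if skip_rel is None or e[2]["rel"] != skip_rel]
    return sorted(kept, key=lambda e: e[2]["weight"], reverse=True)
```

For example, `edges_by_weight(load_graphs()[1], skip_rel="relatedto")` lists the remaining edges for question 1, highest weight first.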

Citation

Thank you for your interest in our dataset! If you use it in your research, please cite:

@misc{brassard2022copasse,
    title={COPA-SSE: Semi-structured Explanations for Commonsense Reasoning},
    author={Ana Brassard and Benjamin Heinzerling and Pride Kavumba and Kentaro Inui},
    year={2022},
    eprint={2201.06777},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Owner

Ana Brassard
