Amazon Multilingual Counterfactual Dataset (AMCD)

Last update: Sep 20, 2022

Overview

This repository contains a dataset described in the paper:

I Wish I Would Have Loved This One, But I Didn’t – A Multilingual Dataset for Counterfactual Detection in Product Reviews. James O’Neill, Polina Rozenshtein, Ryuichi Kiryo, Motoko Kubota, Danushka Bollegala. EMNLP 2021. An arXiv version is also available.

The dataset contains sentences from Amazon customer reviews (sampled from the Amazon product review dataset), annotated for counterfactual detection (CFD) as a binary classification task. Counterfactual statements describe events that did not or cannot take place. They can often be identified as statements of the form "If p were true, then q would be true", that is, assertions whose antecedent (p) and consequent (q) are known or assumed to be false.

The key features of this dataset are:

The dataset is multilingual and contains sentences in English, German, and Japanese.
The labeling was done by professional linguists and high quality was ensured.
The dataset is supplemented with the annotation guidelines and definitions, which were developed by professional linguists. We also provide lists of clue words that are typical of counterfactual sentences and were used for initial data filtering. These lists were also compiled by professional linguists.
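
As a rough illustration of how clue words can serve as an initial filter, the sketch below keeps only sentences that contain at least one clue phrase. It is not the authors' pipeline: the phrases in `CLUE_PHRASES`, the helper names, and the sample sentences are hypothetical placeholders, and the actual clue word lists are the ones shipped with the dataset.

```python
import re

# Hypothetical English clue phrases, for illustration only. The real clue
# word lists are distributed with the dataset and are not reproduced here.
CLUE_PHRASES = ["wish", "would have", "should have", "if only"]


def build_clue_pattern(phrases):
    """Compile one case-insensitive regex that matches any clue phrase."""
    # Longer phrases first, so multi-word phrases are tried before shorter ones.
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def filter_candidates(sentences, pattern):
    """Keep the sentences that contain at least one clue phrase."""
    return [s for s in sentences if pattern.search(s)]


# Made-up sample sentences, not taken from the dataset.
sentences = [
    "I wish I would have loved this one, but I didn't.",
    "The battery lasts all day.",
    "If only the strap were a little longer.",
]

pattern = build_clue_pattern(CLUE_PHRASES)
candidates = filter_candidates(sentences, pattern)
print(candidates)
```

A filter like this only selects candidates. A sentence containing a clue phrase is not necessarily counterfactual, which is why the labels come from human annotation. The word-boundary matching used here also assumes space-delimited text, so it would need a different approach for Japanese.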

Please see the paper for data statistics and a detailed description of data collection and annotation.

For the dataset format, please see README.txt.

Cite

If you use this dataset in your research, please cite the paper.

License Summary

The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.
