GCRC: A Gaokao Chinese Reading Comprehension dataset for interpretable Evaluation

GCRC

GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Evaluation

Introduction

Machine reading comprehension (MRC) models have made exciting progress, driven by a large number of publicly available datasets. However, the real language comprehension capabilities of these models still fall far short of expectations, and most datasets provide only black-box evaluation that cannot diagnose whether a system reaches its answers through correct reasoning. To alleviate these problems and move machine intelligence closer to human-like intelligence, Shanxi University focuses on the more diverse and challenging reading comprehension tasks of the Chinese college entrance examination (Gaokao), and attempts to evaluate machine intelligence effectively and practically using standardized human tests. We collected Gaokao reading comprehension questions from the past 10 years and constructed GCRC (A New MRC Dataset from Gaokao Chinese for Explainable Evaluation), which contains more than 5,000 texts and more than 8,700 multiple-choice questions (about 15,000 options). The dataset is annotated with three kinds of information: sentence-level supporting facts, the error cause of each distractor option, and the reasoning skills required to answer the questions. Related experiments show that the dataset is challenging and useful for diagnosing system limitations in an interpretable manner, and we expect it to help researchers develop new machine learning and reasoning methods for these problems.

Leaderboard

GCRC Leaderboard for Explainable Evaluation

Paper

GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Evaluation. ACL 2021 Findings.

Data Size

Train: 6,994 questions; Dev: 863 questions; Test: 862 questions

Data Format

Each instance has the following fields: id (a string), title (a string), passage (a string), question (a string), options (a list holding the contents of options A, B, C, and D, in that order), evidences (a list holding, for each of A, B, C, and D, the supporting sentences from the original passage), reasoning_ability (a list holding the reasoning ability required to judge each of A, B, C, and D), error_type (a list holding the error reason for each of A, B, C, and D), and answer (a string).

Example

{
  "id": "gcrc_4916_8172", 
  "title": "我们需要怎样的科学素养", 
  "passage": "第八次中国公民科学素养调查显示,2010年,我国具备...激励科技创新、促进创新型国家建设,我们任重道远。", 
  "question": "下列对“我们需要怎样的科学素养”的概括,不正确的一项是", 
  "options":  [
    "科学素养是一项基本公民素质,公民科学素养可以从科学知识、科学方法和科学精神三个方面来衡量。",
    "不仅需要掌握足够的科学知识、科学方法,更需要具备学习、理解、表达、参与和决策科学事务的能力。",
    "应该明白科学技术需要控制,期望科学技术解决哪些问题,希望所纳的税费使用于科学技术的哪些方面。", 
    "需要具备科学的思维和科学的精神,对科学技术能持怀疑态度,对于媒体信息具有质疑精神和过滤功能。"
  ],
  "evidences": [
    ["公民科学素养可以从三个方面衡量:科学知识、科学方法和科学精神。", "在“建设创新型国家”的语境中,科学素养作为一项基本公民素质的重要性不言而喻。"],
    ["一个具备科学素养的公民,不仅应该掌握足够的科学知识、科学方法,更需要强调科学的思维、科学的精神,理性认识科技应用到社会中可能产生的影响,进而具备学习、理解、表达、参与和决策科学事务的能力。"], 
    ["西方发达国家不仅测试公众对科学技术与社会、经济、文化等各方面关系的看法,更考察公众对科学技术是否持怀疑态度,是否认为科学技术需要控制,期望科学技术解决哪些问题,希望所纳的税费使用于科学技术的哪些方面等。"], 
    ["甚至还有国家专门测试公众对于媒体信息是否具有质疑精神和过滤功能。", "西方发达国家不仅测试公众对科学技术与社会、经济、文化等各方面关系的看法,更考察公众对科学技术是否持怀疑态度,是否认为科学技术需要控制,期望科学技术解决哪些问题,希望所纳的税费使用于科学技术的哪些方面等。"]
   ],
  "error_type": ["E", "", "", ""],
  "answer": "A"
}
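As a rough illustration of how the fields line up, the sketch below pairs each option label with its text, evidence sentences, and error type. The helper name `options_with_annotations` and the toy instance are hypothetical and not part of the GCRC release; the on-disk file layout (one JSON array versus one object per line) is not specified here, so the sketch works on a single already-parsed instance.

```python
import json

# Option labels in the order used by the "options", "evidences",
# and "error_type" lists.
LABELS = ["A", "B", "C", "D"]


def options_with_annotations(instance):
    """Return one record per option, aligned by position (hypothetical helper)."""
    options = instance["options"]
    # Fall back to empty annotations when a field is absent.
    evidences = instance.get("evidences") or [[] for _ in options]
    error_types = instance.get("error_type") or ["" for _ in options]
    rows = []
    for label, text, sentences, error in zip(LABELS, options, evidences, error_types):
        rows.append({
            "label": label,
            "text": text,
            "evidences": list(sentences),
            "error_type": error,
            "is_answer": label == instance.get("answer"),
        })
    return rows


# Toy instance with placeholder strings, shaped like the example above.
raw = (
    '{"id": "demo_1", "options": ["opt A", "opt B", "opt C", "opt D"], '
    '"evidences": [["s1", "s2"], ["s3"], ["s4"], ["s5"]], '
    '"error_type": ["E", "", "", ""], "answer": "A"}'
)
instance = json.loads(raw)
for row in options_with_annotations(instance):
    print(row["label"], row["is_answer"], row["error_type"], len(row["evidences"]))
```

Note that a strict JSON parser such as Python's `json.loads` rejects a trailing comma before a closing brace, so each instance must be valid JSON before it is loaded this way.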

Evaluation Code

The prediction file must use the same format as the training set.

python eval.py prediction_file test_private_file

Participants are required to complete the following tasks:

Task 1: Output the answer to the question.

Task 2: Output the sentence-level supporting facts (SFs) that support the answer, that is, the original supporting sentences for each option.

Task 3: Output the error cause of each distractor option. This evaluation distinguishes 7 error causes: 1) Wrong details; 2) Wrong temporal properties; 3) Wrong subject-predicate-object triple relationship; 4) Wrong necessary and sufficient conditions; 5) Wrong causality; 6) Irrelevant to the question; 7) Irrelevant to the article.

The evaluation metrics are Task1_Acc, Task2_F1, and Task3_Acc (the accuracy of error reason identification). The output is in dictionary format.

return {"Task1_Acc":_, "Task2_F1":_, "Task3_Acc":_}
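The three metrics could be computed along the following lines. This is a minimal sketch and not the official `eval.py`: the function names, the exact-string matching of evidence sentences, the per-option averaging of F1, and the choice to score Task 3 only on options whose gold `error_type` is non-empty are all assumptions made for illustration.

```python
def sentence_f1(pred_sents, gold_sents):
    """F1 between two collections of sentences, compared by exact string match."""
    pred, gold = set(pred_sents), set(gold_sents)
    if not pred and not gold:
        return 1.0  # nothing to find and nothing predicted
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, references):
    """Score predictions against references; both map id -> instance dict."""
    answer_hits = 0
    f1_sum, f1_count = 0.0, 0
    err_hits, err_total = 0, 0
    for qid, gold in references.items():
        pred = predictions.get(qid, {})
        # Task 1: answer accuracy.
        answer_hits += int(pred.get("answer") == gold["answer"])
        # Task 2: evidence F1, averaged over options (assumed averaging scheme).
        pred_ev = pred.get("evidences", [])
        for i, gold_sents in enumerate(gold.get("evidences", [])):
            pred_sents = pred_ev[i] if i < len(pred_ev) else []
            f1_sum += sentence_f1(pred_sents, gold_sents)
            f1_count += 1
        # Task 3: error-type accuracy on options with a gold error label.
        pred_err = pred.get("error_type", [])
        for i, gold_err in enumerate(gold.get("error_type", [])):
            if gold_err == "":
                continue
            err_total += 1
            err_hits += int(i < len(pred_err) and pred_err[i] == gold_err)
    n = len(references)
    return {
        "Task1_Acc": answer_hits / n if n else 0.0,
        "Task2_F1": f1_sum / f1_count if f1_count else 0.0,
        "Task3_Acc": err_hits / err_total if err_total else 0.0,
    }


# Toy data with placeholder sentences.
gold = {"q1": {"answer": "A", "evidences": [["s1", "s2"], ["s3"]], "error_type": ["E", ""]}}
pred = {"q1": {"answer": "A", "evidences": [["s1"], ["s3"]], "error_type": ["E", ""]}}
scores = evaluate(pred, gold)
print(scores)
```

Treat the official script as the authority on how partial evidence matches and missing predictions are handled; the sketch only shows the general shape of the returned dictionary.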

Author List

Hongye Tan, Xiaoyue Wang, Yu Ji, Ru Li, Xiaoli Li, Zhiwei Hu, Yunxiao Zhao, Xiaoqi Han.

Institutions

Shanxi University

Citation

Please kindly cite our paper if the work is helpful.

@inproceedings{tan-etal-2021-gcrc,
    title = "{GCRC}: A New Challenging {MRC} Dataset from {G}aokao {C}hinese for Explainable Evaluation",
    author = "Tan, Hongye  and
      Wang, Xiaoyue  and
      Ji, Yu  and
      Li, Ru  and
      Li, Xiaoli  and
      Hu, Zhiwei  and
      Zhao, Yunxiao  and
      Han, Xiaoqi",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.113",
    doi = "10.18653/v1/2021.findings-acl.113",
    pages = "1319--1330",
}