INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing

Existing studies on semantic parsing focus primarily on mapping a natural-language utterance to a corresponding logical form in one turn. However, because natural language can contain a great deal of ambiguity and variability, this is a difficult challenge. In this work, we investigate an interactive semantic parsing framework that explains the predicted logical form step by step in natural language and enables the user to make corrections through natural-language feedback for individual steps. We focus on question answering over knowledge bases (KBQA) as an instantiation of our framework, aiming to increase the transparency of the parsing process and help the user appropriately trust the final answer. To do so, we construct INSPIRED, a crowdsourced dialogue dataset derived from the ComplexWebQuestions dataset.

This repository will contain the dataset and code for our paper Towards Transparent Interactive Semantic Parsing via Step-by-Step Correction.

Data

Dataset Download

The dataset can be downloaded under this path: ./data/dataset.jsonl

Data Structure

In the dataset file, each line is a dictionary with several keys:

{
    "id": "ID number",
    "cwq_question": "Original complex question in CWQ dataset",
    "rephrased_question": "Rephrased complex question by workers",
    "rephrased_question_label": " 'Replacement' or 'Alternative' ",
    "question": "If rephrased_question_label is marked as 'Replacement', set the value the same as rephrased_question; Otherwise, set it the same as cwq_question",
    "final_answer": "Final answer for the complex question",
    "gold_parse": "Gold sparql query for complex question",
    "preprocessed_gold_parse": "Preprocessed gold parse with entities and prefix replaced",
    "predicted_parse": "Predicted sparql query by initial semantic parser",
    "gold_sub_lfs": "A list of gold sub-logical forms after decomposition",
    "pred_sub_lfs": "A list of predicted sub-logical forms after decomposition",
    "gold_sub_qs": [
        {
          "sub_id": "ID of sub questions",
          "sub_question": "Rephrased sub question",
          "temp_sub_question": "Templated sub question for gold sub-logical form",
          "answer": "Answer for each sub question",
        }, "..."], 
    "pred_sub_qs": [
        {
          "sub_id": "ID of sub questions",
          "sub_question": "Rephrased sub question",
          "temp_sub_question": "Templated sub question for predicted sub-logical form",
          "answer": "Answer for each sub question",
        }, "..."], 
    "feedback": "A list of human feedback"
    
}

INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing

Related tags

Overview

INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing

Data

Dataset Download

Data Structure

Owner

SemiNAS: Semi-Supervised Neural Architecture Search

Differentiable scientific computing library

neural image generation

Source code for the BMVC-2021 paper "SimReg: Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation".

PyExplainer: A Local Rule-Based Model-Agnostic Technique (Explainable AI)

This toolkit provides codes to download and pre-process the SLUE datasets, train the baseline models, and evaluate SLUE tasks.

POPPY (Physical Optics Propagation in Python) is a Python package that simulates physical optical propagation including diffraction

Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection

Official Repository for our ICCV2021 paper: Continual Learning on Noisy Data Streams via Self-Purified Replay

More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval

Teaches a student network from the knowledge obtained via training of a larger teacher network

Reproduced Code for Image Forgery Detection papers.

ALL Snow Removed: Single Image Desnowing Algorithm Using Hierarchical Dual-tree Complex Wavelet Representation and Contradict Channel Loss (HDCWNet)

This repository contains code and data for "On the Multimodal Person Verification Using Audio-Visual-Thermal Data"

Building a real-time environment using webcam frame division in OpenCV and classify cropped images using a fine-tuned vision transformers on hybryd datasets samples for facial emotion recognition.

PyTorch Implementation of Temporal Output Discrepancy for Active Learning, ICCV 2021

Pytorch implementation of SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation

Project NII pytorch scripts

Official implementation for Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder at NeurIPS 2020

PowerGridworld: A Framework for Multi-Agent Reinforcement Learning in Power Systems