NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions

Overview

NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns. The dataset focuses on realistic settings where user preferences are extracted from the real-world Yelp Open Dataset and paraphrased into natural user responses.

Existing conversational systems are mostly agent-centric: they assume that user utterances closely follow the system ontology (for NLU or dialogue state tracking). In real-world scenarios, however, it is highly desirable for users to speak freely in their own way. It is extremely hard, if not impossible, for users to adapt to an unknown system ontology.

In this work, we attempt to build a user-centric dialogue system. As there is no clean mapping from a user’s free-form utterance to an ontology, we first model the user preferences as estimated distributions over the system ontology and map the users’ utterances to such distributions. Learning such a mapping poses new challenges for reasoning over existing knowledge, ranging from factoid and commonsense knowledge to the users’ own situations. To this end, we build a new dataset named NUANCED that focuses on such realistic settings for conversational recommendation. We believe NUANCED can serve as a valuable resource to push existing research from agent-centric systems toward user-centric systems.

For more details, please refer to the following two papers:
NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions
User Memory Reasoning for Conversational Recommendation

Examples of a traditional dataset and NUANCED: in real-world scenarios, free-form user utterances often do not match the system ontology. In NUANCED, we model the user preferences (or dialogue state) as distributions over the ontology, which allows entities unknown to the system to be mapped to multiple values and slots for efficient conversation.

Data

This data release includes both the nuanced version, where user preferences are mapped to an estimated distribution, and the coarse version, where user preferences are mapped to discrete slot labels according to the system ontology.

  • Folder data_dist: the nuanced version
  • Folder data_discrete: the coarse version with 0-1 labels
  • meta.json: ontology for this restaurant domain

Format for the dataset: a list of dictionaries, where each dictionary is one dialogue with the following important fields:

  • "dialogue": a list of dialog turns. Each turn has the following fields:
      • "role": user or assistant
      • "text": user utterance or system response
      • "dialog_acts": acts of this turn
      • "slots": slots involved in this turn
      • "dist": for a user turn, the preference distribution
      • "strategy": strategy 1 means the user utterance does not have grounded ontology terms (implicit reasoning); strategy 2 means the user utterance has grounded ontology terms
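The format described above could be read with a few lines of Python. This is a minimal sketch, not code from the release: it assumes each data file is a JSON file holding the list of dialogue dictionaries, and the helper names (load_dialogues, iter_user_turns, count_by_strategy) are hypothetical. It treats "dist" and "strategy" as opaque values, since their exact types are not specified here.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, List


def load_dialogues(path: str) -> List[Dict[str, Any]]:
    """Load one data file, assumed to be a JSON list of dialogue dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def iter_user_turns(dialogues: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every turn whose "role" is "user", in dialogue order."""
    for dialogue in dialogues:
        for turn in dialogue["dialogue"]:
            if turn.get("role") == "user":
                yield turn


def count_by_strategy(dialogues: List[Dict[str, Any]]) -> Dict[Any, int]:
    """Count user turns per "strategy" value, whatever type it has."""
    counts: Dict[Any, int] = {}
    for turn in iter_user_turns(dialogues):
        key = turn.get("strategy")
        counts[key] = counts.get(key, 0) + 1
    return counts
```

A typical use would be to call load_dialogues on a file from data_dist or data_discrete (the file names inside those folders are not listed here, so the path is left to the reader), then loop over iter_user_turns to pair each turn's "text" with its "dist".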

Citations

If you want to publish experimental results with our datasets or use the baseline models, please cite the following articles:

@article{chen2020nuanced,
  title={NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions},
  author={Chen, Zhiyu and Liu, Honglei and Xu, Hu and Moon, Seungwhan and Zhou, Hao and Liu, Bing},
  journal={arXiv preprint arXiv:2010.12758},
  year={2020}
}
@inproceedings{xu2020user,
  title={User Memory Reasoning for Conversational Recommendation},
  author={Xu, Hu and Moon, Seungwhan and Liu, Honglei and Liu, Bing and Shah, Pararth and Yu, Philip S},
  booktitle={Proceedings of the 28th International Conference on Computational Linguistics},
  pages={5288--5308},
  year={2020}
}

License

NUANCED is released under CC-BY-NC-4.0; see LICENSE for details.

Owner
Facebook Research