Findings of ACL 2021

Last update: Feb 24, 2022

Overview

Assessing Dialogue Systems with Distribution Distances

We propose to measure the performance of a dialogue system by computing the distribution-wise distance between its generated conversations and real-world conversations.

To appear in Findings of ACL 2021.

Note that this is not an officially supported Tencent product.

1. Configuration

This repository requires the following packages:

pytorch
huggingface/transformers

2. Usage

To evaluate how well a metric correlates with human judgments at the system level:

python eval_metric.py \
  --data_path ./datasets/convai2_annotation.json \
  --metric fbd \
  --sample_num 10 \
  --model_type roberta-base \
  --batch_size 32
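
As a rough illustration of what "system-level correlation" means, the sketch below averages human ratings per system and correlates those averages with one metric score per system. It is not taken from `eval_metric.py`: the dictionary layout, the system names, and the numbers are all hypothetical toy values, and the helper functions `pearson`, `ranks`, and `spearman` are written here only for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: per-sample human ratings and one metric score per system.
human_ratings = {
    "system_a": [4, 5, 4],
    "system_b": [3, 3, 4],
    "system_c": [2, 1, 2],
}
metric_scores = {"system_a": 0.8, "system_b": 1.5, "system_c": 3.1}

systems = sorted(human_ratings)
human_means = [mean(human_ratings[s]) for s in systems]
metric_values = [metric_scores[s] for s in systems]

pearson_r = pearson(human_means, metric_values)
spearman_rho = spearman(human_means, metric_values)
print(f"Pearson: {pearson_r:.3f}, Spearman: {spearman_rho:.3f}")
```

In this toy setup the metric is treated as a distance (lower is better), so a metric that agrees with the human ratings shows a negative correlation.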

Currently, the repo supports common metrics used in text generation, including bleu, meteor, rouge, greedy, average, extrema, bert_score, fbd, and prd.
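
For readers unfamiliar with Fréchet-style distances such as the one introduced with FID [1], the following is a minimal sketch of the Fréchet distance between two sets of feature vectors, each modeled as a Gaussian. It assumes the features have already been extracted (for example, sentence embeddings from an encoder such as `roberta-base`); the function name `frechet_distance` and the random example data are hypothetical, and this is not necessarily how the `fbd` option in this repository is implemented.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n_samples, dim) arrays.

    d^2 = ||mu_a - mu_b||^2 + Tr(C_a) + Tr(C_b) - 2 * Tr((C_a C_b)^(1/2))
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((C_a C_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of C_a C_b, which are real and non-negative for covariance matrices.
    # Small negative or imaginary parts from rounding are discarded.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Hypothetical example: random vectors stand in for real sentence features.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(200, 4))
shifted_feats = real_feats + 3.0  # same spread, mean moved by 3 in each dimension

same_dist = frechet_distance(real_feats, real_feats)
shifted_dist = frechet_distance(real_feats, shifted_feats)
print(same_dist, shifted_dist)
```

Comparing a set with itself gives a distance of approximately 0, while shifting every feature by 3 in 4 dimensions leaves the covariance unchanged and yields approximately 4 × 3² = 36, which comes entirely from the mean term.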

Here are some details of the six corpora compared in the main paper:

File Name	Dataset Name	Num. of Samples	Reference
`personam_annotation.json`	Persona(M)	60	Shikib/usr
`dailyh_annotation.json`	Daily(H)	150	li3cmz/GRADE
`convai2_annotation.json`	Convai2	150	li3cmz/GRADE
`empathetic_annotation.json`	Empathetic	150	li3cmz/GRADE
`dailyz_annotation.json`	Daily(Z)	100	ZHAOTING/dialog-processing
`personaz_annotation.json`	Persona(Z)	150	ZHAOTING/dialog-processing

Citation

If you use this research/codebase/dataset, please cite our paper:

@article{xiang2021assessing,
  title={Assessing Dialogue Systems with Distribution Distances},
  author={Xiang, Jiannan and Liu, Yahui and Cai, Deng and Li, Huayang and Lian, Defu and Liu, Lemao},
  journal={arXiv preprint arXiv:2105.02573},
  year={2021}
}

Other related papers:

[1] FID, GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, NIPS 2017
[2] PRD, Assessing Generative Models via Precision and Recall, NIPS 2018
[3] BERTScore, BERTScore: Evaluating Text Generation with BERT, ICLR 2020

Owner

Yahui Liu
