Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand

Overview

Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand

billboard

Introduction

We propose a generalization of leaderboards, bidimensional leaderboards (Billboards), that simultaneously drives progress in language generation tasks and their evaluation. We accept two types of submissions:

  • Generator developers submit output text. A Billboard computes all metric scores.
  • Metric developers submit an executable program. A Billboard computes correlations with the human judgments, updates the ensemble metric, and measures how much it overrates machine over human generations.

Anonymous submissions are allowed!!

Submit

Submission guides and examples are available here.

Scoring Results

Scoring results for all past public submissions are available here. We have generator-name||metric-name.csv files from the Cartesian product between the generators and metrics: each contains instance-level scores.

Citations

Bidimesional Leaderboards

@misc{kasai2021billboard,
    title   = {Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand},
    author  = {Jungo Kasai and Keisuke Sakaguchi and Ronan Le Bras and Lavinia Dunagan and Jacob Morrison and Alexander R. Fabbri and Yejin Choi and Noah A. Smith},
    year    = {2021},
    url     = {https://arxiv.org/abs/2112.04139}, 
}

MSCOCO Captioning Evaluations and THumB 1.0 Protocol

@misc{kasai2021thumb,
    title   = {Transparent Human Evaluation for Image Captioning},
    author  = {Jungo Kasai and Keisuke Sakaguchi and Lavinia Dunagan and Jacob Morrison and Ronan Le Bras and Yejin Choi and Noah A. Smith},
    year    = {2021},
    url     = {https://arxiv.org/abs/2111.08940}, 
}

CNNDM Summarization Evaluations

@article{fabbri2021summeval,
    title   = {{SummEval}: Re-evaluating Summarization Evaluation},
    author  = {Fabbri, Alexander R and Kry{\'s}ci{\'n}ski, Wojciech and McCann, Bryan and Xiong, Caiming and Socher, Richard and Radev, Dragomir},
    journal = {TACL},
    year    = {2021},
    url     = {https://arxiv.org/abs/2007.12626},
}

WMT20 ZH-EN/EN-DE Machine Translation Evaluations

@misc{freitag2021experts,
      title={Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation}, 
      author={Markus Freitag and George Foster and David Grangier and Viresh Ratnakar and Qijun Tan and Wolfgang Macherey},
      year={2021},
      url={https://arxiv.org/abs/2104.14478},
}

AI2 Logo             UWNLP Logo             Salesforce Logo

Code and data of the ACL 2021 paper: Few-Shot Text Ranking with Meta Adapted Synthetic Weak Supervision

MetaAdaptRank This repository provides the implementation of meta-learning to reweight synthetic weak supervision data described in the paper Few-Shot

THUNLP 5 Jun 16, 2022
Implementation of the Chamfer Distance as a module for pyTorch

Chamfer Distance for pyTorch This is an implementation of the Chamfer Distance as a module for pyTorch. It is written as a custom C++/CUDA extension.

Christian Diller 205 Jan 05, 2023
DaReCzech is a dataset for text relevance ranking in Czech

Dataset DaReCzech is a dataset for text relevance ranking in Czech. The dataset consists of more than 1.6M annotated query-documents pairs,

Seznam.cz a.s. 8 Jul 26, 2022
Learning Chinese Character style with conditional GAN

zi2zi: Master Chinese Calligraphy with Conditional Adversarial Networks Introduction Learning eastern asian language typefaces with GAN. zi2zi(字到字, me

Yuchen Tian 2.2k Jan 02, 2023
Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Single Optical Path

Keyhole Imaging Code & Dataset Code associated with the paper "Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Singl

Stanford Computational Imaging Lab 20 Feb 03, 2022
FSL-Mate: A collection of resources for few-shot learning (FSL).

FSL-Mate is a collection of resources for few-shot learning (FSL). In particular, FSL-Mate currently contains FewShotPapers: a paper list which tracks

Yaqing Wang 1.5k Jan 08, 2023
A Python implementation of active inference for Markov Decision Processes

A Python package for simulating Active Inference agents in Markov Decision Process environments. Please see our companion preprint on arxiv for an ove

235 Dec 21, 2022
VACA: Designing Variational Graph Autoencoders for Interventional and Counterfactual Queries

VACA Code repository for the paper "VACA: Designing Variational Graph Autoencoders for Interventional and Counterfactual Queries (arXiv)". The impleme

Pablo Sánchez-Martín 16 Oct 10, 2022
The code of NeurIPS 2021 paper "Scalable Rule-Based Representation Learning for Interpretable Classification".

Rule-based Representation Learner This is a PyTorch implementation of Rule-based Representation Learner (RRL) as described in NeurIPS 2021 paper: Scal

Zhuo Wang 53 Dec 17, 2022
A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.

collie_recs Collie is a library for preparing, training, and evaluating implicit deep learning hybrid recommender systems, named after the Border Coll

ShopRunner 97 Jan 03, 2023
SIEM Logstash parsing for more than hundred technologies

LogIndexer Pipeline Logstash Parsing Configurations for Elastisearch SIEM and OpenDistro for Elasticsearch SIEM Why this project exists The overhead o

146 Dec 29, 2022
Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator

DRL-robot-navigation Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator. Using Twin Delayed Deep Deterministic Policy Gra

87 Jan 07, 2023
Building blocks for uncertainty-aware cycle consistency presented at NeurIPS'21.

UncertaintyAwareCycleConsistency This repository provides the building blocks and the API for the work presented in the NeurIPS'21 paper Robustness vi

EML Tübingen 19 Dec 12, 2022
Fedlearn支持前沿算法研发的Python工具库 | Fedlearn algorithm toolkit for researchers

FedLearn-algo Installation Development Environment Checklist python3 (3.6 or 3.7) is required. To configure and check the development environment is c

89 Nov 14, 2022
DFFNet: An IoT-perceptive Dual Feature Fusion Network for General Real-time Semantic Segmentation

DFFNet Paper DFFNet: An IoT-perceptive Dual Feature Fusion Network for General Real-time Semantic Segmentation. Xiangyan Tang, Wenxuan Tu, Keqiu Li, J

4 Sep 23, 2022
Codes of paper "Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling"

Unseen Object Amodal Instance Segmentation (UOAIS) Seunghyeok Back, Joosoon Lee, Taewon Kim, Sangjun Noh, Raeyoung Kang, Seongho Bak, Kyoobin Lee This

GIST-AILAB 92 Dec 13, 2022
Video-based open-world segmentation

UVO_Challenge Team Alpes_runner Solutions This is an official repo for our UVO Challenge solutions for Image/Video-based open-world segmentation. Our

Yuming Du 84 Dec 22, 2022
Method for facial emotion recognition compitition of Xunfei and Datawhale .

人脸情绪识别挑战赛-第3名-W03KFgNOc-源代码、模型以及说明文档 队名:W03KFgNOc 排名:3 正确率: 0.75564 队员:yyMoming,xkwang,RichardoMu。 比赛链接:人脸情绪识别挑战赛 文章地址:link emotion 该项目分别训练八个模型并生成csv文

6 Oct 17, 2022
Meta graph convolutional neural network-assisted resilient swarm communications

Resilient UAV Swarm Communications with Graph Convolutional Neural Network This repository contains the source codes of Resilient UAV Swarm Communicat

62 Dec 06, 2022
automated systems to assist guarding corona Virus precautions for Closed Rooms (e.g. Halls, offices, etc..)

Automatic-precautionary-guard automated systems to assist guarding corona Virus precautions for Closed Rooms (e.g. Halls, offices, etc..) what is this

badra 0 Jan 06, 2022