UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation

Overview

UNION

UNION is the automatic evaluation metric described in the paper UNION: An UNreferenced MetrIc for Evaluating Open-eNded Story Generation (EMNLP 2020). Please refer to the Paper List for more work on Open-eNded Language Generation (ONLG) tasks; we hope it helps you learn more about this field.

Contents

  • Prerequisites
  • Computing Infrastructure
  • Quick Start
  • Data Instruction
  • Citation

Prerequisites

The code is written with the TensorFlow library. To run the program, the following prerequisites need to be installed:

  • Python 3.7.0
  • tensorflow-gpu 1.14.0
  • numpy 1.18.1
  • regex 2020.2.20
  • nltk 3.4.5

Computing Infrastructure

We train UNION on the following platform:

  • OS: Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-98-generic x86_64)
  • GPU: NVIDIA TITAN Xp

Quick Start

1. Constructing Negative Samples

Execute the following command:

cd ./Data
python3 ./get_vocab.py your_mode
python3 ./gen_train_data.py your_mode
  • your_mode is roc for the ROCStories corpus or wp for the WritingPrompts dataset. The vocabulary summary, with each entity's frequency and POS tag, will then be found under ROCStories/ini_data/entity_vocab.txt or WritingPrompts/ini_data/entity_vocab.txt (a small inspection sketch follows this list).
  • Negative samples and human-written stories will be constructed from the original training set. The constructed training data can be found under ROCStories/train_data or WritingPrompts/train_data.
  • Note: currently only 10 samples of the full original data and training data are provided. The full data can be downloaded from THUcloud or GoogleDrive.
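
As a quick sanity check after this step, the generated vocabulary can be inspected with a few lines of Python. This is only an illustrative sketch, not part of the repository: it assumes each line of entity_vocab.txt holds an entity, its POS tag, and its frequency separated by whitespace, so adjust the parsing if the file produced by get_vocab.py is laid out differently.

# Hedged sketch (not part of the repo): peek at the entity vocabulary produced by get_vocab.py.
# Assumption: one entry per line as "<entity> <POS> <frequency>" (whitespace-separated).
from pathlib import Path

vocab_path = Path("./Data/ROCStories/ini_data/entity_vocab.txt")

entries = []
with vocab_path.open(encoding="utf-8") as f:
    for line in f:
        parts = line.strip().split()
        if len(parts) >= 3:
            entity, pos, freq = parts[0], parts[1], int(parts[-1])
            entries.append((entity, pos, freq))

# Print the ten most frequent entities as a rough check of the construction step.
for entity, pos, freq in sorted(entries, key=lambda x: x[2], reverse=True)[:10]:
    print(f"{entity}\t{pos}\t{freq}")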

2. Training of UNION

Execute the following command:

python3 ./run_union.py --data_dir your_data_dir \
    --output_dir ./model/union \
    --task_name train \
    --init_checkpoint ./model/uncased_L-12_H-768_A-12/bert_model.ckpt
  • your_data_dir is ./Data/ROCStories or ./Data/WritingPrompts.
  • The initial checkpoint of BERT can be downloaded from bert (a quick sanity check for the downloaded checkpoint is sketched after this list). We use the uncased base version of BERT (about 110M parameters). We train the model for at most 40,000 steps; training takes about 1~2 days.
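
Before launching training, it may be worth confirming that the BERT checkpoint was downloaded and unpacked correctly. The sketch below is not part of the repository; it simply lists a few variables stored in the checkpoint with the TensorFlow 1.14 checkpoint utilities, assuming the default uncased_L-12_H-768_A-12 path from the command above.

# Hedged sanity check (not part of the repo): make sure the BERT checkpoint is readable
# before starting run_union.py.
import tensorflow as tf  # tensorflow-gpu 1.14.0

ckpt = "./model/uncased_L-12_H-768_A-12/bert_model.ckpt"

# tf.train.list_variables returns (name, shape) pairs for every variable in the checkpoint.
variables = tf.train.list_variables(ckpt)
print(f"{len(variables)} variables found in {ckpt}")
for name, shape in variables[:5]:
    print(name, shape)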

3. Prediction with UNION

Execute the following command:

python3 ./run_union.py --data_dir your_data_dir \
    --output_dir ./model/output \
    --task_name pred \
    --init_checkpoint your_model_name
  • your_data_dir is ./Data/ROCStories or ./Data/WritingPrompts. If you want to evaluate your own texts, you only need to convert your files into our format (a hypothetical conversion sketch follows this list).

  • your_model_name is ./model/union_roc/union_roc or ./model/union_wp/union_wp. The fine-tuned checkpoint can be downloaded from the following link:

| Dataset        | Fine-tuned Model      |
|----------------|-----------------------|
| ROCStories     | THUcloud; GoogleDrive |
| WritingPrompts | THUcloud; GoogleDrive |
  • The UNION scores of the stories under your_data_dir/ant_data can be found under the output_dir ./model/output.
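
If you evaluate your own generations, the only requirement noted above is matching the provided file format. The following is a hypothetical conversion sketch: it assumes prediction reads an ant_data.txt-style file with the Story ID ||| Story ||| Scores layout described in the Data Instruction section below, writes placeholder scores because human annotations are not needed just to score new stories, and uses a made-up output filename. Check the sample files under ant_data before relying on it.

# Hypothetical sketch: write custom stories into the "Story ID ||| Story ||| Scores"
# layout of the annotated data files. Assumptions (verify against the samples under
# ./Data/ROCStories/ant_data): one story per line, fields separated by " ||| ",
# and the score field ignored at prediction time, so placeholders are written.

stories = [
    "jane went to the store . she bought some milk .",     # your generated stories
    "tom lost his keys . he looked everywhere for them .",
]

# "custom_ant_data.txt" is a hypothetical name; the scripts may expect ant_data.txt exactly.
with open("./Data/ROCStories/ant_data/custom_ant_data.txt", "w", encoding="utf-8") as f:
    for idx, story in enumerate(stories):
        placeholder_scores = " ".join(["0"] * 7)  # dummy stand-ins for the seven annotations
        f.write(f"{idx} ||| {story} ||| {placeholder_scores}\n")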

4. Correlation Calculation

Execute the following command:

python3 ./correlation.py your_mode

Then the correlation between the human judgements under your_data_dir/ant_data and the metric scores under your_data_dir/metric_output will be printed. The figures under "./figure" plot metric scores against human judgments for the ROCStories corpus.
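
For reference, the core computation in this step is just linear and rank correlation between two lists of scores. The sketch below is not the repository's correlation.py; it is a minimal stand-in using scipy (not in the prerequisites list), and it assumes each score file holds one value per line, aligned so that line i in both files refers to the same story. The human_scores.txt filename is hypothetical; the actual human judgments live inside the ant_data files.

# Minimal stand-in for the correlation step (not the repo's correlation.py).
# Assumptions: one score per line in each file, line i in both files refers to the
# same story, and scipy is installed (it is not in the prerequisites list above).
from scipy.stats import pearsonr, spearmanr, kendalltau

def load_scores(path):
    with open(path, encoding="utf-8") as f:
        return [float(line.strip()) for line in f if line.strip()]

human = load_scores("./Data/ROCStories/ant_data/human_scores.txt")    # hypothetical file name
metric = load_scores("./Data/ROCStories/metric_output/union.txt")

print("Pearson :", pearsonr(human, metric)[0])
print("Spearman:", spearmanr(human, metric)[0])
print("Kendall :", kendalltau(human, metric)[0])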

Data Instruction for files under ./Data

├── Data
   └── `negation.txt`             # manually constructed negation word vocabulary.
   └── `conceptnet_antonym.txt`   # triples with antonym relations extracted from ConceptNet.
   └── `conceptnet_entity.csv`    # entities acquired from ConceptNet.
   └── `ROCStories`
       ├── `ant_data`        # sampled stories and corresponding human annotation.
               └── `ant_data.txt`        # includes only the binary annotation: reasonable (1) or unreasonable (0).
               └── `ant_data_all.txt`    # includes the annotation of specific error types: reasonable (0), repeated plots (1), bad coherence (2), conflicting logic (3), chaotic scenes (4), and others (5).
               └── `reference.txt`       # human-written stories with the same leading context as the annotated stories.
              └── `reference_ipt.txt`
              └── `reference_opt.txt`
       ├── `ini_data`        # original dataset for training/validation/testing.
              └── `train.txt`
              └── `dev.txt`
              └── `test.txt`
              └── `entity_vocab.txt`    # generated by `get_vocab.py`, consisting of all the entities and the corresponding tagged POS followed by the mention frequency in the dataset.
       ├── `train_data`      # negative samples and corresponding human-written stories for training, which are constructed by `gen_train_data.py`.
              └── `train_human.txt`
              └── `train_negative.txt`
              └── `dev_human.txt`
              └── `dev_negative.txt`
              └── `test_human.txt`
              └── `test_negative.txt`
       ├── `metric_output`   # the scores of different metrics, which can be used to replicate the correlation in Table 5 of the paper. 
              └── `bleu.txt`
              └── `bleurt.txt`
              └── `ppl.txt`             # the sign of the result of Perplexity needs to be changed to get the result for *minus* Perplexity.
              └── `union.txt`
              └── `union_recon.txt`     # the ablated model without the reconstruction task
              └── ...
   └── `WritingPrompts`
       ├── ...
 
  • The annotated data files ant_data.txt and ant_data_all.txt are formatted as Story ID ||| Story ||| Seven Annotated Scores (see the parsing sketch after this list).
  • ant_data_all.txt is only available for the ROCStories corpus; for the WritingPrompts dataset, ant_data_all.txt is the same as ant_data.txt.
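
Given the Story ID ||| Story ||| Seven Annotated Scores layout described above, a record can be parsed by splitting on the ||| separator. The sketch below rests on that assumption, and averaging the seven binary annotations into a single human score is only an illustrative aggregation, not necessarily what correlation.py does.

# Hedged sketch: parse one line of ant_data.txt ("Story ID ||| Story ||| Seven Annotated
# Scores") and average the seven binary annotations (illustrative aggregation only).

def parse_ant_line(line):
    story_id, story, scores = [field.strip() for field in line.split("|||")]
    annotations = [int(s) for s in scores.split()]
    return story_id, story, annotations

example = "42 ||| jane went to the store . she bought some milk . ||| 1 1 0 1 1 1 0"  # made-up line
story_id, story, annotations = parse_ant_line(example)
human_score = sum(annotations) / len(annotations)  # mean of the seven binary labels
print(story_id, human_score)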

Citation

Please kindly cite our paper if you find the paper and the code helpful.

@misc{guan2020union,
    title={UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation},
    author={Jian Guan and Minlie Huang},
    year={2020},
    eprint={2009.07602},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Owner
Conversational AI groups from Tsinghua University