Source code for the paper: Variance-Aware Machine Translation Test Sets (NeurIPS 2021 Datasets and Benchmarks Track)

Overview

Variance-Aware-MT-Test-Sets

Variance-Aware Machine Translation Test Sets

License

See LICENSE. We follow the data licensing plan as the same as the WMT benchmark.

VAT Data

We release 70 lightweight and discriminative test sets for machine translation evaluation, covering 35 translation directions from WMT16 to WMT20 competitions. See VAT_data folder for detailed information.

For each translation direction of a specific year, both source and reference are provided for different types of evaluation metrics. For example,

VAT_data/
├── wmt20
    ├── ...
    ├── vat_newstest2020-zhen-ref.en.txt
    └── vat_newstest2020-zhen-src.zh.txt

Meta-Information of VAT

We also provide the meta-inforamtion of reserved data. Each json file contains the IDs of retained data in the original test set. For instance, file wmt20/bert-r_filter-std60.json describes:

{
	...
	"en-de": [4, 6, 10, 13, 14, 15, ...],
	"de-en": [0, 3, 4, 5, 7, 9, ...],
	...
}

Reproduce & Create VAT

The reported results in the paper were produced by single NVIDIA GeForce 1080Ti card.

We will keep updating the code and related documentation after the response.

Requirements

  • sacreBLEU version >= 1.4.14
  • BLEURT version >= 0.0.2
  • COMET version >= 0.1.0
  • BERTScore version >= 0.3.7 (hug_trans==4.2.1)
  • PyTorch version >= 1.5.1
  • Python version >= 3.8
  • CUDA & cudatoolkit >= 10.1

Note: the minimal version is for reproducing the results

Pipeline

  1. Use score_xxx.py to generate the CSV files that stores the sentence-level scores evaluated by the corresponding metrics. For example, evaluating all the WMT20 submissions of all the language pairs using BERTScore:
    CUDA_VISIBLE_DEVICES=0 python score_bert.py -b 128 -s -r dummy -c dummy --rescale_with_baseline \
    	--hypos-dir ${WMT_DATA_PATH}/system-outputs \
    	--refs-dir ${WMT_DATA_PATH}/references \
    	--scores-dir ${WMT_DATA_PATH}/results/system-level/scores_ALL \
    	--testset-name newstest2020 --score-dump wmt20-bertscore.csv
    (Alternative Option) You can use your implementation for dumping the scores given by the metrics. But the CSV header should contain:
    ,TESTSET,LP,ID,METRIC,SYS,SCORE
    
  2. Use cal_filtering.py to filter the test set based on the score warehouse calculated in the last step. For example,
    python cal_filtering.py --score-dump wmt20-bertscore.csv --output VAT_meta/wmt20-test/ --filter-per 60
    It will produce the json files which contain the IDs of reserved sentences.

Statistics of VAT (References)

Benchmark Translation Direction # Sentences # Words # Vocabulary
wmt20 km-en 928 17170 3645
wmt20 cs-en 266 12568 3502
wmt20 en-de 567 21336 5945
wmt20 ja-en 397 10526 3063
wmt20 ps-en 1088 20296 4303
wmt20 en-zh 567 18224 5019
wmt20 en-ta 400 7809 4028
wmt20 de-en 314 16083 4046
wmt20 zh-en 800 35132 6457
wmt20 en-ja 400 12718 2969
wmt20 en-cs 567 16579 6391
wmt20 en-pl 400 8423 3834
wmt20 en-ru 801 17446 6877
wmt20 pl-en 400 7394 2399
wmt20 iu-en 1188 23494 3876
wmt20 ru-en 396 6966 2330
wmt20 ta-en 399 7427 2148
wmt19 zh-en 800 36739 6168
wmt19 en-cs 799 15433 6111
wmt19 de-en 800 15219 4222
wmt19 en-gu 399 8494 3548
wmt19 fr-de 680 12616 3698
wmt19 en-zh 799 20230 5547
wmt19 fi-en 798 13759 3555
wmt19 en-fi 799 13303 6149
wmt19 kk-en 400 9283 2584
wmt19 de-cs 799 15080 6166
wmt19 lt-en 400 10474 2874
wmt19 en-lt 399 7251 3364
wmt19 ru-en 800 14693 3817
wmt19 en-kk 399 6411 3252
wmt19 en-ru 799 16393 6125
wmt19 gu-en 406 8061 2434
wmt19 de-fr 680 16181 3517
wmt19 en-de 799 18946 5340
wmt18 en-cs 1193 19552 7926
wmt18 cs-en 1193 23439 5453
wmt18 en-fi 1200 16239 7696
wmt18 en-tr 1200 19621 8613
wmt18 en-et 800 13034 6001
wmt18 ru-en 1200 26747 6045
wmt18 et-en 800 20045 5045
wmt18 tr-en 1200 25689 5955
wmt18 fi-en 1200 24912 5834
wmt18 zh-en 1592 42983 7985
wmt18 en-zh 1592 34796 8579
wmt18 en-ru 1200 22830 8679
wmt18 de-en 1199 28275 6487
wmt18 en-de 1199 25473 7130
wmt17 en-lv 800 14453 6161
wmt17 zh-en 800 20590 5149
wmt17 en-tr 1203 17612 7714
wmt17 lv-en 800 18653 4747
wmt17 en-de 1202 22055 6463
wmt17 ru-en 1200 24807 5790
wmt17 en-fi 1201 17284 7763
wmt17 tr-en 1203 23037 5387
wmt17 en-zh 800 18001 5629
wmt17 en-ru 1200 22251 8761
wmt17 fi-en 1201 23791 5300
wmt17 en-cs 1202 21278 8256
wmt17 de-en 1202 23838 5487
wmt17 cs-en 1202 22707 5310
wmt16 tr-en 1200 19225 4823
wmt16 ru-en 1199 23010 5442
wmt16 ro-en 800 16200 3968
wmt16 de-en 1200 22612 5511
wmt16 en-ru 1199 20233 7872
wmt16 fi-en 1200 20744 5176
wmt16 cs-en 1200 23235 5324
Owner
NLP2CT Lab, University of Macau
Natural Language Processing & Portuguese - Chinese Machine Translation Laboratory
NLP2CT Lab, University of Macau
nanodet_plus,yolov5_v6.0

OAK_Detection OAK设备上适配nanodet_plus,yolov5_v6.0 Environment pytorch = 1.7.0

炼丹去了 1 Feb 18, 2022
App customer segmentation cohort rfm clustering

CUSTOMER SEGMENTATION COHORT RFM CLUSTERING TỔNG QUAN VỀ HỆ THỐNG DỮ LIỆU Nên chuyển qua theme màu dark thì sẽ nhìn đẹp hơn https://customer-segmentat

hieulmsc 3 Dec 18, 2021
A Runtime method overload decorator which should behave like a compiled language

strongtyping-pyoverload A Runtime method overload decorator which should behave like a compiled language there is a override decorator from typing whi

20 Oct 31, 2022
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting This is the origin Pytorch implementation of Informer in the followin

Haoyi 3.1k Dec 29, 2022
Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN", accepted to ACM MM 2021 BNI Track.

RecycleD Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN

Yunan Zhu 23 Nov 05, 2022
GANmouflage: 3D Object Nondetection with Texture Fields

GANmouflage: 3D Object Nondetection with Texture Fields Rui Guo1 Jasmine Collins

29 Aug 10, 2022
YOLOv2 in PyTorch

YOLOv2 in PyTorch NOTE: This project is no longer maintained and may not compatible with the newest pytorch (after 0.4.0). This is a PyTorch implement

Long Chen 1.5k Jan 02, 2023
Learning Versatile Neural Architectures by Propagating Network Codes

Learning Versatile Neural Architectures by Propagating Network Codes Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang,

Mingyu Ding 36 Dec 06, 2022
Event queue (Equeue) dialect is an MLIR Dialect that models concurrent devices in terms of control and structure.

Event Queue Dialect Event queue (Equeue) dialect is an MLIR Dialect that models concurrent devices in terms of control and structure. Motivation The m

Cornell Capra 23 Dec 08, 2022
A Keras implementation of YOLOv3 (Tensorflow backend)

keras-yolo3 Introduction A Keras implementation of YOLOv3 (Tensorflow backend) inspired by allanzelener/YAD2K. Quick Start Download YOLOv3 weights fro

7.1k Jan 03, 2023
SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data

SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data Au

14 Nov 28, 2022
Source code for CVPR 2020 paper "Learning to Forget for Meta-Learning"

L2F - Learning to Forget for Meta-Learning Sungyong Baik, Seokil Hong, Kyoung Mu Lee Source code for CVPR 2020 paper "Learning to Forget for Meta-Lear

Sungyong Baik 29 May 22, 2022
[ICCV 2021] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [arXiv] [Project Page] @inproceedings{ huang2021fapn, title={{FaPN}: Feature-alig

Shihua Huang 23 Jul 22, 2022
Code for the paper Open Sesame: Getting Inside BERT's Linguistic Knowledge.

Open Sesame This repository contains the code for the paper Open Sesame: Getting Inside BERT's Linguistic Knowledge. Credits We built the project on t

9 Jul 24, 2022
YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with ONNX, TensorRT, ncnn, and OpenVINO supported.

Introduction YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and ind

7.7k Jan 03, 2023
JupyterNotebook - C/C++, Javascript, HTML, LaTex, Shell scripts in Jupyter Notebook Also run them on remote computer

JupyterNotebook Read, write and execute C, C++, Javascript, Shell scripts, HTML, LaTex in jupyter notebook, And also execute them on remote computer R

1 Jan 09, 2022
CVNets: A library for training computer vision networks

CVNets: A library for training computer vision networks This repository contains the source code for training computer vision models. Specifically, it

Apple 1.1k Jan 03, 2023
nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures.

nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures. Here you will find the scripts necessary to produce th

Jesse Willis 0 Jan 20, 2022
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way :chestnut:

Squirrel Core Share, load, and transform data in a collaborative, flexible, and efficient way What is Squirrel? Squirrel is a Python library that enab

Merantix Momentum 249 Dec 07, 2022