This repository contains the code for the EMNLP 2021 paper "Word-Level Coreference Resolution".

Last update: Dec 27, 2022

Word-Level Coreference Resolution

This repository contains the code to reproduce the experiments described in the paper of the same name, which was accepted to EMNLP 2021. The paper is available here.

Preparation
Training
Evaluation

Preparation

The following instructions have been tested with Python 3.7 on an Ubuntu 20.04 machine.

You will need:

OntoNotes 5.0 corpus (download here, registration needed)
Python 2.7 to run conll-2012 scripts
Java runtime to run Stanford Parser
Python 3.7+ to run the model
Perl to run conll-2012 evaluation scripts
CUDA-enabled machine (48 GB to train, 4 GB to evaluate)
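
Before going further, it can help to confirm that the command-line tools from the list above are reachable. The snippet below is a minimal sketch of such a check, not part of the repository. The executable names in `REQUIRED_TOOLS` are assumptions about how the tools are named on your system. The conda environment created later in this section installs `openjdk` and `perl`, so `java` and `perl` may only be found once that environment is active.

```python
import shutil

# Assumed executable names; adjust them to match your system.
REQUIRED_TOOLS = ["tar", "conda", "java", "perl"]


def find_missing(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

The check only looks for executables on `PATH`. It does not verify versions, the OntoNotes download, or available GPU memory.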

Extract the OntoNotes 5.0 archive. If it is in the repository's root directory:
```
 tar -xzvf ontonotes-release-5.0_LDC2013T19.tgz
```
Switch to a Python 2.7 environment (one where python runs version 2.7). This is necessary for the conll scripts to run correctly. To do it with conda:
```
 conda create -y --name py27 python=2.7 && conda activate py27
```

Run the conll data preparation scripts (~30 min):

 sh get_conll_data.sh ontonotes-release-5.0 data

Download conll scorers and Stanford Parser:
```
 sh get_third_party.sh
```

Prepare your environment. To do it with conda:

 conda create -y --name wl-coref python=3.7 openjdk perl
 conda activate wl-coref
 python -m pip install -r requirements.txt

Build the corpus in jsonlines format (~20 min):

 python convert_to_jsonlines.py data/conll-2012/ --out-dir data
 python convert_to_heads.py
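
As an optional sanity check, the converted corpus can be inspected from Python. The sketch below is not part of the repository. It assumes that the conversion scripts write files with a `.jsonlines` extension into `data`, and that each non-empty line is a single JSON object. It makes no assumption about field names and simply reports whatever keys the first record contains.

```python
import json
from pathlib import Path


def summarize_jsonlines(path):
    """Return (number of records, sorted keys of the first record)."""
    count = 0
    first_keys = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            record = json.loads(line)
            if count == 0:
                first_keys = sorted(record)
            count += 1
    return count, first_keys


if __name__ == "__main__":
    # Assumption: output files live in data/ and end in .jsonlines.
    for path in sorted(Path("data").glob("*.jsonlines")):
        n_records, keys = summarize_jsonlines(path)
        print(f"{path}: {n_records} records, fields: {keys}")
```

If the script prints nothing, no matching files were found. In that case, check the output directory and the file extension before moving on.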

You're all set!

Training

If you have completed all the steps in the previous section, then just run:

python run.py train roberta

Use the -h flag to list more parameters, and set the CUDA_VISIBLE_DEVICES environment variable to limit the CUDA devices visible to the script. Refer to config.toml to modify existing model configurations or create your own.
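
If you launch training from another Python script rather than from the shell, the device restriction can be passed through the child process environment. The following is a rough sketch under that assumption. The helper names `build_env`, `train_command` and `launch` are hypothetical and do not exist in the repository.

```python
import os
import subprocess
import sys


def build_env(gpu_ids, base_env=None):
    """Copy an environment and restrict CUDA to the given device indices."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def train_command(experiment="roberta"):
    """Build the training command from this section for the current interpreter."""
    return [sys.executable, "run.py", "train", experiment]


def launch(gpu_ids, experiment="roberta"):
    """Run training in a child process that only sees the chosen GPUs."""
    return subprocess.run(train_command(experiment), env=build_env(gpu_ids), check=True)
```

Calling `launch([0])` from the repository root should behave like running `CUDA_VISIBLE_DEVICES=0 python run.py train roberta` in a shell, provided that device index 0 is the GPU you intend to use.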

Evaluation

Make sure that you have successfully completed all steps of the Preparation section.

Download and save the pretrained model to the data directory.

 https://www.dropbox.com/s/vf7zadyksgj40zu/roberta_%28e20_2021.05.02_01.16%29_release.pt?dl=0
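
The download can also be scripted. The sketch below is one possible approach and not part of the repository. It relies on the assumption that Dropbox share links serve a preview page with `dl=0` and the file itself with `dl=1`, and that keeping the file name encoded in the URL is acceptable. The helper names are hypothetical.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import parse_qsl, unquote, urlencode, urlsplit, urlunsplit
from urllib.request import urlretrieve

MODEL_URL = (
    "https://www.dropbox.com/s/vf7zadyksgj40zu/"
    "roberta_%28e20_2021.05.02_01.16%29_release.pt?dl=0"
)


def direct_download_url(url):
    """Return the same URL with the query parameter dl set to 1."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"  # assumption: dl=1 requests the file rather than a preview
    return urlunsplit(parts._replace(query=urlencode(query)))


def local_filename(url):
    """Decode the last path component of the URL into a file name."""
    return unquote(PurePosixPath(urlsplit(url).path).name)


def download_model(url=MODEL_URL, out_dir="data"):
    """Download the weights into out_dir and return the destination path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    destination = out_dir / local_filename(url)
    urlretrieve(direct_download_url(url), str(destination))
    return destination
```

Calling `download_model()` from the repository root would place the file under `data`. If the link has moved or the download is interrupted, download the file manually in a browser instead.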

Generate the conll-formatted output:

 python run.py eval roberta --data-split test

Run the conll-2012 scripts to obtain the metrics:
```
 python calculate_conll.py roberta test 20
```
