System Combination for Grammatical Error Correction Based on Integer Programming

Last update: Mar 29, 2022

This repository contains the code and scripts that implement the system combination approach for grammatical error correction described in Lin and Ng (2021).

Reference

Ruixi Lin and Hwee Tou Ng (2021). System Combination for Grammatical Error Correction Based on Integer Programming. In Proceedings of Recent Advances in Natural Language Processing, pages 829-834.

Please cite:

@inproceedings{lin2021gecip,
  author    = "Lin, Ruixi and Ng, Hwee Tou",
  title     = "System Combination for Grammatical Error Correction Based on Integer Programming",
  booktitle = "Proceedings of Recent Advances in Natural Language Processing",
  year      = "2021",
  pages     = "829-834"
}

Table of contents

Prerequisites

Example

License

Prerequisites

conda create --name comb python=3.6
conda activate comb
pip install spacy
python -m spacy download en

For the nonlinear integer programming solver, we use LINGO 10.0.

Note that educational institutions can obtain a free license to use the LINGO solver.

Example

This example combines the three GEC systems used in the paper with the IP approach: UEdin-MS (https://aclanthology.org/W19-4427), Kakao (https://aclanthology.org/W19-4423), and Tohoku (https://aclanthology.org/D19-1119). The core functions for the IP objective are implemented in model.lg4, which is located under lingo/inputs.
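The real objective is defined in model.lg4 and solved by LINGO, and its exact formulation is not reproduced here. Purely as an illustration, the sketch below assumes a simplified model: for each error type, keep the edits of at most one system, and choose the assignment that maximizes corpus-level F0.5 computed from aggregated TP/FP/FN counts. The function names, the counts layout, and the toy numbers are all hypothetical. Exhaustive enumeration stands in for the INLP solver.

```python
from itertools import product

def f_beta(tp, fp, fn, beta=0.5):
    """Corpus-level F_beta from aggregated counts; 0.0 when undefined."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def best_assignment(counts, systems, error_types):
    """Exhaustively choose one system (or None) per error type to maximize F0.5.

    Hypothetical layout: counts[(system, etype)] = (tp, fp, fn) obtained when only
    that system's edits of type etype are kept. Choosing None keeps no edits of
    that type, so all of its gold edits (tp + fn, assumed equal across systems)
    become false negatives.
    """
    best, best_score = None, -1.0
    for assign in product(list(systems) + [None], repeat=len(error_types)):
        tp = fp = fn = 0
        for etype, name in zip(error_types, assign):
            s_tp, s_fp, s_fn = counts[(name or systems[0], etype)]
            if name is None:
                fn += s_tp + s_fn  # every gold edit of this type is missed
            else:
                tp, fp, fn = tp + s_tp, fp + s_fp, fn + s_fn
        score = f_beta(tp, fp, fn)
        if score > best_score:
            best, best_score = dict(zip(error_types, assign)), score
    return best, best_score

# Made-up counts for two systems and two error types.
toy_counts = {
    ("kakao", "R:VERB"): (8, 2, 2), ("uedinms", "R:VERB"): (6, 1, 4),
    ("kakao", "M:DET"): (3, 6, 7), ("uedinms", "M:DET"): (5, 2, 5),
}
print(best_assignment(toy_counts, ["kakao", "uedinms"], ["R:VERB", "M:DET"]))
# -> ({'R:VERB': 'kakao', 'M:DET': 'uedinms'}, 0.738...)
```

The search space grows as (number of systems + 1) raised to the number of error types, so brute force only works for toy inputs. That growth is one reason a dedicated solver is preferable at realistic scale.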

1. Run python prepare_data.py -dir . -list kakao uedinms tohoku to generate the aggregated TP, FP, and FN counts. The counts files are stored under lingo/inputs.
2. Load model.lg4 into the LINGO console, set the input data path to the counts file path, select the INLP model, and run the optimization. Save the solution to lingo/outputs/sol_kakao_uedinms_tohoku.txt.
3. Run ./comb.sh . sol_kakao_uedinms_tohoku.txt to load the LINGO solution, then merge and apply the edits. The resulting blind test file is written under submissions. It can be zipped and submitted to the BEA CodaLab competition page (https://competitions.codalab.org/competitions/20228) for evaluation.
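prepare_data.py derives its TP, FP, and FN counts from the system and reference edits. Neither its internals nor the counts file format are described here. The sketch below only illustrates the general idea under simplifying assumptions: exact matching on sentence, span, and correction. TP and FP are counted under the system edit's type, and FN under the gold edit's type. count_by_type and the edit tuple layout are hypothetical.

```python
from collections import Counter, defaultdict

def count_by_type(hyp_edits, gold_edits):
    """Aggregate (tp, fp, fn) per error type by exact edit matching.

    Each edit is a hypothetical tuple (sent_id, start, end, correction, etype);
    two edits match when sent_id, start, end and correction are all identical.
    Edits within each list are assumed to be unique.
    """
    def key(e):
        return e[:4]

    gold_keys = {key(e) for e in gold_edits}
    hyp_keys = {key(e) for e in hyp_edits}
    counts = defaultdict(Counter)
    for e in hyp_edits:  # system edits: true or false positives
        counts[e[4]]["tp" if key(e) in gold_keys else "fp"] += 1
    for e in gold_edits:  # gold edits the system missed
        if key(e) not in hyp_keys:
            counts[e[4]]["fn"] += 1
    return {t: (c["tp"], c["fp"], c["fn"]) for t, c in counts.items()}

gold = [(0, 1, 2, "is", "R:VERB:SVA"), (0, 3, 3, "the", "M:DET"), (1, 0, 1, "He", "R:ORTH")]
hyp = [(0, 1, 2, "is", "R:VERB:SVA"), (0, 3, 3, "a", "M:DET"), (1, 4, 5, "", "U:PUNCT")]
print(count_by_type(hyp, gold))
```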

The data folder contains the output files of the individual GEC systems, together with the .m2 files generated with ERRANT for the listed systems. For more information, see the ERRANT GitHub page.
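In an ERRANT .m2 file, each sentence is a block. The block starts with an S line holding the tokenized source, followed by A lines of the form A start end|||type|||correction|||REQUIRED|||-NONE-|||annotator_id. Blank lines separate the blocks. The minimal reader below is an illustrative sketch and is not part of this repository. parse_m2 is a hypothetical name.

```python
import re

def parse_m2(text, annotator=0):
    """Parse M2-formatted text into (tokens, edits) pairs for one annotator.

    Each edit is returned as (start, end, etype, correction); "noop" edits
    are skipped and deletions have an empty correction string.
    """
    sentences = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if not lines:
            continue
        tokens = lines[0][2:].split()  # drop the leading "S "
        edits = []
        for line in lines[1:]:
            fields = line[2:].split("|||")  # drop the leading "A "
            start, end = map(int, fields[0].split())
            etype, correction = fields[1], fields[2]
            if etype == "noop" or int(fields[-1]) != annotator:
                continue
            edits.append((start, end, etype, correction))
        sentences.append((tokens, edits))
    return sentences

sample = (
    "S This are a sentence .\n"
    "A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0\n"
    "\n"
    "S Hello world .\n"
    "A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0\n"
)
print(parse_m2(sample))
```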

The IP-combined .m2 files are included under merged_m2, and the corresponding text files are under submissions.
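Converting a .m2 file back into corrected text amounts to applying each sentence's edits to its source tokens. This is a minimal sketch. It assumes that edits are (start, end, etype, correction) tuples with token offsets into the source, and that edits within a sentence do not overlap. apply_edits is a hypothetical helper, and comb.sh may resolve conflicts between systems differently.

```python
def apply_edits(tokens, edits):
    """Apply non-overlapping (start, end, etype, correction) edits to a token list."""
    out = list(tokens)
    # Apply right-to-left so that earlier token offsets remain valid.
    for start, end, _etype, correction in sorted(edits, key=lambda e: (e[0], e[1]), reverse=True):
        out[start:end] = correction.split()  # an empty correction deletes the span
    return " ".join(out)

print(apply_edits(["I", "like", "the", "music", "."], [(2, 3, "U:DET", "")]))
```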

License

The source code and models in this repository are licensed under the GNU General Public License v3.0 (see LICENSE). For other research purposes or commercial use of the code and models, please contact Ruixi Lin ([email protected]) and Prof. Hwee Tou Ng ([email protected]).

Owner

NUS NLP Group