System Combination for Grammatical Error Correction Based on Integer Programming

Related tags

Deep Learninggec_ip
Overview

System Combination for Grammatical Error Correction Based on Integer Programming


This repository contains the code and scripts that implement the system combination approach for grammatical error correction in Lin and Ng (2021).

Reference

Ruixi Lin and Hwee Tou Ng (2021). System Combination for Grammatical Error Correction Based on Integer Programming.

Please cite:

@inproceedings{lin2021gecip,
  author    = "Lin, Ruixi and Ng, Hwee Tou",
  title     = "System Combination for Grammatical Error Correction Based on Integer Programming",
  booktitle = "Proceedings of Recent Advances in Natural Language Processing",
  year      = "2021",
  pages     = "829-834"
}

Table of contents

Prerequisites

Example

License

Prerequisites

conda create --name comb python=3.6
conda activate comb
pip install spacy
python -m spacy download en

For the nonlinear integer programming solver, we use

LINGO10.0

Note that educational institutions can obtain a free license to use the LINGO solver.

Example

Combine the 3 GEC systems listed in the paper using the IP approach. The three systems are UEdin-MS (https://aclanthology.org/W19-4427), Kakao (https://aclanthology.org/W19-4423), and Tohoku (https://aclanthology.org/D19-1119). The core functions for the IP objective are implemented in model.lg4. You can find model.lg4 under lingo/inputs.

  1. Run python prepare_data.py -dir . -list kakao uedinms tohoku to generate aggregated TP, FP, and FN counts. The counts files are stored under lingo/inputs.

  2. Load model.lg4 into the LINGO console and specify the input data path with the counts file path, select the INLP model, and run optimizations. Store the solutions to lingo/outputs/sol_kakao_uedinms_tohoku.txt.

  3. Run ./comb.sh . sol_kakao_uedinms_tohoku.txt to load LINGO solutions, merge and apply edits. The resulted blind test file can be found under submissions. It can be zipped and submitted to the BEA CodeLab website (https://competitions.codalab.org/competitions/20228) for evaluations.

The data folder provides individual GEC system output files, and .m2 files generated using ERRANT for the listed systems. For more information, please visit the ERRANT github page.

We include the IP combined .m2 files under merged_m2, and the corresponding text files under submissions.

License

The source code and models in this repository are licensed under the GNU General Public License v3.0 (see LICENSE). For further research interests and commercial use of the code and models, please contact Ruixi Lin ([email protected]) and Prof. Hwee Tou Ng ([email protected]).

Owner
NUS NLP Group
National University of Singapore
NUS NLP Group
Implementation of the state of the art beat-detection, downbeat-detection and tempo-estimation model

The ISMIR 2020 Beat Detection, Downbeat Detection and Tempo Estimation Model Implementation. This is an implementation in TensorFlow to implement the

Koen van den Brink 1 Nov 12, 2021
From Fidelity to Perceptual Quality: A Semi-Supervised Approach for Low-Light Image Enhancement (CVPR'2020)

Under-exposure introduces a series of visual degradation, i.e. decreased visibility, intensive noise, and biased color, etc. To address these problems, we propose a novel semi-supervised learning app

Yang Wenhan 117 Jan 03, 2023
AITom is an open-source platform for AI driven cellular electron cryo-tomography analysis.

AITom Introduction AITom is an open-source platform for AI driven cellular electron cryo-tomography analysis. AITom is originated from the tomominer l

93 Jan 02, 2023
Automates Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning :rocket:

MLJAR Automated Machine Learning Documentation: https://supervised.mljar.com/ Source Code: https://github.com/mljar/mljar-supervised Table of Contents

MLJAR 2.4k Dec 31, 2022
zeus is a Python implementation of the Ensemble Slice Sampling method.

zeus is a Python implementation of the Ensemble Slice Sampling method. Fast & Robust Bayesian Inference, Efficient Markov Chain Monte Carlo (MCMC), Bl

Minas Karamanis 197 Dec 04, 2022
[NeurIPS 2021] Source code for the paper "Qu-ANTI-zation: Exploiting Neural Network Quantization for Achieving Adversarial Outcomes"

Qu-ANTI-zation This repository contains the code for reproducing the results of our paper: Qu-ANTI-zation: Exploiting Quantization Artifacts for Achie

Secure AI Systems Lab 8 Mar 26, 2022
High-Resolution Image Synthesis with Latent Diffusion Models

Latent Diffusion Models arXiv | BibTeX High-Resolution Image Synthesis with Latent Diffusion Models Robin Rombach*, Andreas Blattmann*, Dominik Lorenz

CompVis Heidelberg 5.6k Dec 30, 2022
A Python module for parallel optimization of expensive black-box functions

blackbox: A Python module for parallel optimization of expensive black-box functions What is this? A minimalistic and easy-to-use Python module that e

Paul Knysh 426 Dec 08, 2022
Multimodal commodity image retrieval 多模态商品图像检索

Multimodal commodity image retrieval 多模态商品图像检索 Not finished yet... introduce explain:The specific description of the project and the product image dat

hongjie 8 Nov 25, 2022
Match SafeGraph POIs with Data collected through a cultural resource survey in Washington DC.

Match SafeGraph POI data with Cultural Resource Places in Washington DC Match SafeGraph POIs with Data collected through a cultural resource survey in

Changjie Chen 1 Jan 05, 2022
LIVECell - A large-scale dataset for label-free live cell segmentation

LIVECell dataset This document contains instructions of how to access the data associated with the submitted manuscript "LIVECell - A large-scale data

Sartorius Corporate Research 112 Jan 07, 2023
Code and data for "Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning" (EMNLP 2021).

GD-VCR Code for Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning (EMNLP 2021). Research Questions and Aims: How well can a model perform o

Da Yin 24 Oct 13, 2022
ZeroVL - The official implementation of ZeroVL

This repository contains source code necessary to reproduce the results presente

31 Nov 04, 2022
EdiBERT is a generative model based on a bi-directional transformer, suited for image manipulation

EdiBERT, a generative model for image editing EdiBERT is a generative model based on a bi-directional transformer, suited for image manipulation. The

16 Dec 07, 2022
This repository contains the scripts for downloading and validating scripts for the documents

HC4: HLTCOE CLIR Common-Crawl Collection This repository contains the scripts for downloading and validating scripts for the documents. Document ids,

JHU Human Language Technology Center of Excellence 6 Jun 07, 2022
GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training @ KDD 2020

GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training Original implementation for paper GCC: Graph Contrastive Coding for Graph Neural N

THUDM 274 Dec 27, 2022
Scripts used to make and evaluate OpenAlex's concept tagging model

openalex-concept-tagging This repository contains all of the code for getting the concept tagger up and running. To learn more about where this model

OurResearch 18 Dec 09, 2022
PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop.

VoiceLoop PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a n

Meta Archive 873 Dec 15, 2022
Codes for paper "Towards Diverse Paragraph Captioning for Untrimmed Videos". CVPR 2021

Towards Diverse Paragraph Captioning for Untrimmed Videos This repository contains PyTorch implementation of our paper Towards Diverse Paragraph Capti

Yuqing Song 61 Oct 11, 2022
Implementation of Feedback Transformer in Pytorch

Feedback Transformer - Pytorch Simple implementation of Feedback Transformer in Pytorch. They improve on Transformer-XL by having each token have acce

Phil Wang 93 Oct 04, 2022