This repository contains the code for the sequence-to-sequence model for discontinuous constituent parsing described in the paper Discontinuous Grammar as a Foreign Language.

Last update: Apr 07, 2022

Overview

Discontinuous Grammar as a Foreign Language

The model uses the in-order+SWAP linearization to handle discontinuities and achieves 95.47 F1 on the English Discontinuous Penn Treebank (DPTB). This implementation is based on the system by Fernandez Astudillo et al. (2020) and reuses part of its code.
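To illustrate what a discontinuity is, the following minimal sketch finds constituents whose words are not adjacent in the sentence. It is not part of this repository: the function names are hypothetical, the example tree is made up, and it assumes the discbracket convention used by disco-dop, in which each leaf is written as `index=word`.

```python
import re

def parse_discbracket(text):
    """Parse one discbracket tree into nested (label, children) tuples.

    Leaves are stored as (index, word) pairs, assuming the `index=word` leaf notation.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                index, word = tokens[pos].split("=", 1)
                children.append((int(index), word))
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()

def leaf_indices(tree):
    """Return the sorted sentence positions covered by a constituent."""
    _, children = tree
    indices = []
    for child in children:
        if isinstance(child[0], int):  # leaf: (index, word)
            indices.append(child[0])
        else:                          # inner node: (label, children)
            indices.extend(leaf_indices(child))
    return sorted(indices)

def discontinuous_constituents(tree):
    """Return (label, indices) for every constituent whose yield has a gap."""
    label, children = tree
    found = []
    indices = leaf_indices(tree)
    if indices[-1] - indices[0] + 1 != len(indices):
        found.append((label, indices))
    for child in children:
        if not isinstance(child[0], int):
            found.extend(discontinuous_constituents(child))
    return found

# Made-up example: the VP covers positions 0 and 2, with the NP in between.
example = "(S (VP (VB 0=Wake) (PRT 2=up)) (NP (PRP 1=her)))"
print(discontinuous_constituents(parse_discbracket(example)))
# → [('VP', [0, 2])]
```

A continuous parser cannot represent the VP above as a single bracket, which is why the linearization needs an extra mechanism (here, SWAP) to reorder words.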

Requirements

This implementation was tested on Python 3.6.9, PyTorch 1.1.0 and CUDA 9.0.176. Please run the following command to proceed with the installation:

    cd Disco-Seq2seq-Parser
    pip install -r requirements.txt
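After installing, it may be useful to compare the local environment with the tested versions listed above. The snippet below is a small sketch for that purpose; `environment_report` is a hypothetical helper, not something shipped with this repository.

```python
import sys

# Versions reported as tested in this README.
TESTED = {"python": "3.6.9", "torch": "1.1.0", "cuda": "9.0.176"}

def environment_report():
    """Collect the installed Python, PyTorch and CUDA versions (None if unavailable)."""
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    try:
        import torch
        report["torch"] = torch.__version__
        report["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        report["torch"] = None
        report["cuda"] = None
    return report

if __name__ == "__main__":
    installed = environment_report()
    for name, tested in TESTED.items():
        print(f"{name}: installed={installed[name]} tested={tested}")
```

Different versions may still work; the comparison only shows where the local setup departs from the tested one.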

For evaluation, disco-dop must also be installed by following the steps described at https://github.com/andreasvc/disco-dop.

Data

To obtain shift-reduce linearizations from a discontinuous constituent treebank (for instance, the DPTB), place the train, dev and test splits in discbracket format in the disco_data folder and name them train.discbracket, dev.discbracket and test.discbracket. Then run the following script:

    ./linearization/generate.sh DPTB
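Before running the script, a quick check that the three splits are in place can save a failed run. The sketch below assumes it is executed from the repository root; the file names come from the instructions above, while the helper name `missing_splits` is hypothetical.

```python
from pathlib import Path

# Split names required by the instructions above.
EXPECTED_SPLITS = ("train.discbracket", "dev.discbracket", "test.discbracket")

def missing_splits(data_dir="disco_data"):
    """Return the expected split files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_SPLITS if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_splits()
    if missing:
        print("Missing splits:", ", ".join(missing))
    else:
        print("All splits found.")
```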

Experiments

To train a model for the DPTB treebank, just execute the following script:

    ./scripts/stack-transformer/con_experiment.sh configs/ptb_roberta.large.sh

To test the trained model on the test split, please run the following command:

    ./scripts/stack-transformer/con_test-test.sh configs/test_roberta_large.sh DATA/dep-parsing/models/DPTB_RoBERTa-large_stnp6x6-seed44/checkpoint_top3-average.pt DATA/dep-parsing/models/DPTB_RoBERTa-large_stnp6x6-seed44/epoch-tests-test/dec-checkpoint-top3-average

Citation

    @misc{fernándezgonzález2021discontinuous,
          title={Discontinuous Grammar as a Foreign Language},
          author={Daniel Fernández-González and Carlos Gómez-Rodríguez},
          year={2021},
          eprint={2110.10431},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }

Acknowledgments

We acknowledge the European Research Council (ERC), which has funded this research under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150), MINECO (ANSWER-ASAP, TIN2017-85160-C2-1-R), MICINN (SCANNER, PID2020-113230RB-C21), Xunta de Galicia (ED431C 2020/11), and Centro de Investigación de Galicia "CITIC", funded by Xunta de Galicia and the European Union (ERDF - Galicia 2014-2020 Program), by grant ED431G 2019/01.

Owner

Daniel Fernández-González
