TANL: Structured Prediction as Translation between Augmented Natural Languages

Code for the paper "Structured Prediction as Translation between Augmented Natural Languages" (ICLR 2021).

If you use this code, please cite the paper using the bibtex reference below.

@inproceedings{tanl,
    title={Structured Prediction as Translation between Augmented Natural Languages},
    author={Giovanni Paolini and Ben Athiwaratkun and Jason Krone and Jie Ma and Alessandro Achille and Rishita Anubhai and Cicero Nogueira dos Santos and Bing Xiang and Stefano Soatto},
    booktitle={9th International Conference on Learning Representations, {ICLR} 2021},
    year={2021},
}

Requirements

  • Python 3.6+
  • PyTorch (tested with version 1.7.1)
  • Transformers (tested with version 4.0.0)
  • NetworkX (tested with version 2.5, only used in coreference resolution)

You can install all required Python packages with pip install -r requirements.txt

Datasets

By default, datasets are expected to be in data/DATASET_NAME. Dataset-specific code is in datasets.py.

For example, the CoNLL04 and ADE datasets (joint entity and relation extraction) in the correct format can be downloaded using https://github.com/markus-eberts/spert/blob/master/scripts/fetch_datasets.sh. For other datasets, pre-processing and links are documented in the code.

Running the code

Use the following command: python run.py JOB

The JOB argument refers to a section of the config file, which by default is config.ini. A sample config file is provided, with settings that allow faster training and lower memory usage than the settings used to obtain the final results in the paper.

For example, to replicate the paper's results on CoNLL04, have the following section in the config file:

[conll04_final]
datasets = conll04
model_name_or_path = t5-base
num_train_epochs = 200
max_seq_length = 256
max_seq_length_eval = 512
train_split = train,dev
per_device_train_batch_size = 8
per_device_eval_batch_size = 16
do_train = True
do_eval = False
do_predict = True
episodes = 1-10
num_beams = 8

Then run python run.py conll04_final. Note that the final results will differ slightly from the ones reported in the paper, due to small code changes and randomness.

Config arguments can be overridden by command-line arguments. For example: python run.py conll04_final --num_train_epochs 50.
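The relationship between a job name, its config section, and command-line overrides can be sketched with Python's standard configparser module. This is only a minimal illustration, not the code in run.py: the helper load_job_config and the overrides dictionary are hypothetical names, and the sample config is shortened.

```python
import configparser

# Shortened copy of the section shown above (illustration only).
SAMPLE_CONFIG = """
[conll04_final]
datasets = conll04
model_name_or_path = t5-base
num_train_epochs = 200
"""


def load_job_config(config_text, job, overrides=None):
    """Return the settings of section `job`, with overrides applied on top.

    Hypothetical helper; the real run.py may handle this differently.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if job not in parser:
        raise KeyError(f"no section [{job}] in config")
    settings = dict(parser[job])      # configparser returns values as strings
    settings.update(overrides or {})  # command-line values take precedence
    return settings


# Mimics: python run.py conll04_final --num_train_epochs 50
settings = load_job_config(SAMPLE_CONFIG, "conll04_final", {"num_train_epochs": "50"})
print(settings["num_train_epochs"])  # prints 50
```

Settings that are not overridden keep the value from the config section, so model_name_or_path is still t5-base here.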

Additional details

If do_train = True, the model is trained on the given train split (e.g., 'train') of the given datasets. The final weights and intermediate checkpoints are written to a directory such as experiments/conll04_final-t5-base-ep200-len256-b8-train, with one subdirectory per episode. Results are also saved there in JSON format.

In every episode, the model is trained on a different (random) permutation of the training set. The random seed is given by the episode number, so a given episode always produces exactly the same model.
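The idea of seeding the permutation with the episode number can be sketched as follows. This uses Python's standard random module as a stand-in; the actual code relies on PyTorch and Transformers for seeding and shuffling, and the function name episode_permutation is hypothetical.

```python
import random


def episode_permutation(num_examples, episode):
    """Return a permutation of example indices seeded by the episode number."""
    rng = random.Random(episode)  # assumption: the seed equals the episode number
    indices = list(range(num_examples))
    rng.shuffle(indices)
    return indices


first = episode_permutation(10, episode=1)
again = episode_permutation(10, episode=1)
print(first == again)  # prints True: the same episode gives the same order
```

Because the generator is created from the episode number alone, rerunning an episode reproduces the same ordering, while a different episode number gives an independently shuffled order.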

Once a model is trained, it is possible to evaluate it without training again. For this, set do_train = False or (more easily) provide the -e command-line argument: python run.py conll04_final -e.

If do_eval = True, the model is evaluated on the 'dev' split. If do_predict = True, the model is evaluated on the 'test' split.

Arguments

The following are the most important command-line arguments for the run.py script. Run python run.py -h for the full list.

  • -c CONFIG_FILE: specify config file to use (default is config.ini)
  • -e: only run evaluation (overrides the setting do_train in the config file)
  • -a: also evaluate intermediate checkpoints, in addition to the final model
  • -v: print results for each evaluation run
  • -g GPU: specify which GPU to use for evaluation

The following are the most important arguments for the config file. See the sample config file to understand the format.

  • datasets (str): comma-separated list of datasets for training
  • eval_datasets (str): comma-separated list of datasets for evaluation (default is the same as for training)
  • model_name_or_path (str): path to pretrained model or model identifier from huggingface.co/models (e.g. t5-base)
  • do_train (bool): whether to run training (default is False)
  • do_eval (bool): whether to run evaluation on the dev set (default is False)
  • do_predict (bool): whether to run evaluation on the test set (default is False)
  • train_split (str): comma-separated list of data splits for training (default is train)
  • num_train_epochs (int): number of train epochs
  • learning_rate (float): initial learning rate (default is 5e-4)
  • train_subset (float > 0 and <=1): portion of training data to effectively use during training (default is 1, i.e., use all training data)
  • per_device_train_batch_size (int): batch size per GPU during training (default is 8)
  • per_device_eval_batch_size (int): batch size during evaluation (default is 8; only one GPU is used for evaluation)
  • max_seq_length (int): maximum input sequence length after tokenization; longer sequences are truncated
  • max_output_seq_length (int): maximum output sequence length (default is max_seq_length)
  • max_seq_length_eval (int): maximum input sequence length for evaluation (default is max_seq_length)
  • max_output_seq_length_eval (int): maximum output sequence length for evaluation (default is max_output_seq_length or max_seq_length_eval or max_seq_length)
  • episodes (str): episodes to run (default is 0; an interval can be specified, such as 1-4; the episode number is used as the random seed)
  • num_beams (int): number of beams for beam search during generation (default is 1)
  • multitask (bool): if True, the name of the dataset is prepended to each input sentence (default is False)
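As an illustration of the episodes format described above, the following sketch turns a value such as 0 or 1-4 into a list of episode numbers. It is not taken from the repository: the function name parse_episodes is hypothetical, and it assumes that an interval includes both endpoints.

```python
def parse_episodes(spec):
    """Parse an episodes string such as "0" or "1-4" into a list of episode numbers.

    Assumption: an interval "a-b" includes both a and b.
    """
    spec = spec.strip()
    if "-" in spec:
        start, end = spec.split("-", 1)
        return list(range(int(start), int(end) + 1))
    return [int(spec)]


print(parse_episodes("1-4"))  # prints [1, 2, 3, 4]
print(parse_episodes("0"))    # prints [0]
```

Under this reading, episodes = 1-10 in the CoNLL04 example corresponds to ten runs with seeds 1 through 10.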

See arguments.py and transformers.TrainingArguments for additional config arguments.
