Code repo for EMNLP21 paper "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation"


Zero-Shot Information Extraction as a Unified Text-to-Triple Translation

Source code for the paper "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation" (EMNLP 2021).

Installation

git clone --recursive git@github.com:cgraywang/deepex.git
cd ./deepex
conda create --name deepex python=3.7 -y
conda activate deepex
pip install -r requirements.txt
pip install -e .

Requires PyTorch 1.5.1 or above with CUDA. PyTorch 1.7.1 with CUDA 10.1 has been tested. See https://pytorch.org/get-started/locally/ for PyTorch installation instructions.
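As a quick sanity check of the requirement above, the following minimal sketch compares a version string against the stated minimum of 1.5.1. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of this repository; the script only reports on PyTorch and CUDA if PyTorch is already installed.

```python
import re

MINIMUM_TORCH = (1, 5, 1)  # minimum version stated in this README


def parse_version(version):
    """Extract the leading numeric components, ignoring suffixes such as '+cu101'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(version))
    # A missing patch component (e.g. "1.5") is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def meets_minimum(version, minimum=MINIMUM_TORCH):
    """Return True if `version` is at least `minimum`."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print("PyTorch", torch.__version__, "ok:", meets_minimum(torch.__version__))
        print("CUDA available:", torch.cuda.is_available())
```

Run it inside the activated deepex environment; a `False` for either check suggests reinstalling PyTorch with a matching CUDA build.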

Dataset Preparation

Relation Classification

FewRel

You can add the --prepare-rc-dataset argument when running the scripts in this section; the script will then prepare the FewRel dataset automatically.

Alternatively, you can download and prepare the FewRel dataset manually using the following script:

bash scripts/rc/prep_FewRel.sh

The processed data will be stored at data/FewRel/data.jsonl.
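To confirm that preparation succeeded, you can peek at the first few records of the processed file. This is a minimal sketch that assumes only that the file is in JSON Lines format (one JSON value per line), as the .jsonl extension suggests. The record fields are not documented here, so the sketch lists whatever keys it finds; `peek_jsonl` is a hypothetical helper, not part of this repository.

```python
import json
from itertools import islice
from pathlib import Path


def peek_jsonl(path, limit=3):
    """Return the first `limit` records of a JSON Lines file as Python objects."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        # Skip blank lines so a trailing newline does not cause a parse error.
        non_empty = (line for line in handle if line.strip())
        for line in islice(non_empty, limit):
            records.append(json.loads(line))
    return records


if __name__ == "__main__":
    data_path = Path("data/FewRel/data.jsonl")  # path stated above
    if data_path.exists():
        for record in peek_jsonl(data_path):
            # The schema is an unknown here, so only the keys are printed.
            print(sorted(record) if isinstance(record, dict) else type(record).__name__)
    else:
        print("{} not found; run the preparation script first".format(data_path))
```

The same helper should work for data/TACRED/data.jsonl once that dataset has been prepared.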

TACRED

TACRED is licensed by the LDC, so you must first download the TACRED dataset from the LDC. The downloaded file should be named tacred_LDC2018T24.tgz.

After downloading the TACRED .tgz file and naming it correctly, you can add the --prepare-rc-dataset argument when running the scripts in this section; the script will then prepare the TACRED dataset automatically.

Alternatively, you can prepare the TACRED dataset manually using the following script:

bash scripts/rc/prep_TACRED.sh

The processed data will be stored at data/TACRED/data.jsonl.

Scripts for Reproducing Results

This section contains the scripts for running the tasks with the default settings (e.g., the bert-large-cased model on 8 CUDA devices with a per-device batch size of 4).

To modify the settings, see the Arguments section below.

Open Information Extraction

bash tasks/OIE_2016.sh
bash tasks/PENN.sh
bash tasks/WEB.sh
bash tasks/NYT.sh

Relation Classification

bash tasks/FewRel.sh
bash tasks/TACRED.sh

Arguments

General script:

python scripts/manager.py --task=<task_name> <other_args>

The default setting is:

python scripts/manager.py --task=<task_name> --model="bert-large-cased" --beam-size=6 \
                          --max-distance=2048 --batch-size-per-device=4 --stage=0 \
                          --cuda=0,1,2,3,4,5,6,7

All tasks are already implemented as the .sh files in tasks/ listed above, using the default arguments.
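If you prefer to launch runs from Python, the default command above can be assembled programmatically. The sketch below copies the defaults from that command and lets you override individual values; `build_command` and `DEFAULTS` are hypothetical names used for illustration, and the sketch does not validate option names against what scripts/manager.py actually accepts.

```python
import shlex

# Defaults copied from the default command shown above.
DEFAULTS = {
    "model": "bert-large-cased",
    "beam-size": 6,
    "max-distance": 2048,
    "batch-size-per-device": 4,
    "stage": 0,
    "cuda": "0,1,2,3,4,5,6,7",
}


def build_command(task, **overrides):
    """Return the argv list for scripts/manager.py: defaults plus overrides."""
    options = dict(DEFAULTS)
    for name, value in overrides.items():
        # Python keyword arguments cannot contain '-', so accept '_' instead.
        options[name.replace("_", "-")] = value
    argv = ["python", "scripts/manager.py", "--task={}".format(task)]
    argv += ["--{}={}".format(name, value) for name, value in options.items()]
    return argv


if __name__ == "__main__":
    command = build_command("OIE_2016", beam_size=4, cuda="0,1")
    # Print a shell-safe rendering of the command for inspection.
    print(" ".join(shlex.quote(part) for part in command))
    # To actually launch it from the repository root:
    # import subprocess; subprocess.run(command, check=True)
```

Building an argument list rather than a single string avoids shell quoting problems when the list is passed to `subprocess.run`.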

The following are the most important command-line arguments for the scripts/manager.py script:

  • --task: The task to run. Supported tasks are OIE_2016, WEB, NYT, PENN, FewRel and TACRED.
  • --model: The pre-trained model used to generate the attention matrices on which beam search is performed. Supported models are bert-base-cased and bert-large-cased.
  • --beam-size: The beam size used during beam search.
  • --batch-size-per-device: The batch size on a single device.
  • --stage: Run the task starting from an intermediate stage:
    • --stage=0: data preparation and beam search
    • --stage=1: post-processing
    • --stage=2: ranking
    • --stage=3: evaluation
  • --prepare-rc-dataset: If true, automatically run the relation classification dataset preparation scripts. Enable this argument only for relation classification tasks (i.e., FewRel and TACRED).
  • --cuda: The CUDA GPU devices to use.

Run python scripts/manager.py -h for the full list.
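The --stage values described above can be read as an ordered pipeline. The following sketch lists which stages would run for a given starting value, assuming that starting at stage k runs stage k and every later stage, and that the outputs of earlier stages already exist on disk. This reading is an assumption based on the argument description, not a statement about the internals of scripts/manager.py; `STAGES` and `stages_from` are hypothetical names.

```python
# Stage numbers and names as listed in the argument description above.
STAGES = {
    0: "data preparation and beam search",
    1: "post-processing",
    2: "ranking",
    3: "evaluation",
}


def stages_from(start):
    """Return the names of the stages that run when starting at `start`.

    Assumes (not confirmed by the source) that every stage from `start`
    through the last one is executed in order.
    """
    if start not in STAGES:
        raise ValueError("stage must be one of {}".format(sorted(STAGES)))
    return [STAGES[index] for index in range(start, max(STAGES) + 1)]


if __name__ == "__main__":
    for start in sorted(STAGES):
        print("--stage={}: {}".format(start, " -> ".join(stages_from(start))))
```

Under this reading, --stage=2 would be useful for rerunning ranking and evaluation without repeating the more expensive beam search.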

Results

NOTE

We obtain results that match or improve on those reported in the paper. We will release the code and datasets for the factual probe task soon!

Related Work

In src/deepex/model/kgm.py, we implement an extended version of the beam search algorithm proposed in "Language Models are Open Knowledge Graphs".

Citation

@inproceedings{wang-etal-2021-deepex,
    title = "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation",
    author = "Chenguang Wang and Xiao Liu and Zui Chen and Haoyun Hong and Jie Tang and Dawn Song",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    year = "2021",
    publisher = "Association for Computational Linguistics"
}

@article{wang-etal-2020-language,
    title = "Language Models are Open Knowledge Graphs",
    author = "Chenguang Wang and Xiao Liu and Dawn Song",
    journal = "arXiv preprint arXiv:2010.11967",
    year = "2020"
}