Benchmark for Answering Existential First Order Queries with Single Free Variable

Overview

EFO-1-QA Benchmark for First Order Query Estimation on Knowledge Graphs

This repository contains an entire pipeline for the EFO-1-QA benchmark. EFO-1 stands for the Existential First Order Queries with Single Free Varibale. The related paper has been submitted to the NeurIPS 2021 track on dataset and benchmark. OpenReview Link, and appeared on arXiv

If this work helps you, please cite

@article{EFO-1-QA,
  title={Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs},
  author={Wang, Zihao and Yin, Hang and Song, Yangqiu},
  journal={arXiv preprint arXiv:2109.08925},
  year={2021}
}

The pipeline overview.

alt text

  1. Query type generation and normalization The query types are generated by the DFS iteration of the context free grammar with the bounded negation hypothesis. The generated types are also normalized to several normal forms
  2. Query grounding and answer sampling The queries are grounded on specific knowledge graphs and the answers that are non-trivial are sampled.
  3. Model training and estimation We train and evaluate the specific query structure

Query type generation and normalization

The OpsTree is represented in the nested objects of FirstOrderSetQuery class in fol/foq_v2.py. We first generate the specific OpsTree and then store then by the formula property of FirstOrderSetQuery.

The OpsTree is generated by binary_formula_iterator in fol/foq_v2.py. The overall process is managed in formula_generation.py.

To generate the formula, just run

python formula_generation.py

Then the file formula csv is generated in the outputs folder. In this paper, we use the file in outputs/test_generated_formula_anchor_node=3.csv

Query grounding and answer sampling

We first prepare the KG data and then run the sampling code

The KG data (FB15k, FB15k-237, NELL995) should be put into under 'data/' folder. We use the data provided in the KGReasoning.

The structure of the data folder should be at least

data
	|---FB15k-237-betae
	|---FB15k-betae
	|---NELL-betae	

Then we can run the benchmark sampling code on specific knowledge graph by

python benchmark_sampling.py --knowledge_graph FB15k-237 
python benchmark_sampling.py --knowledge_graph FB15k
python benchmark_sampling.py --knowledge_graph NELL

Append new forms to existing data One can append new forms to the existing dataset by

python append_new_normal_form.py --knowledge_graph FB15k-237 

Model training and estimation

Models

Examples

The detailed setting of hyper-parameters or the knowledge graph to choose are in config folder, you can modify those configurations to create your own, all the experiments are on FB15k-237 by default.

Besides, the generated benchmark, one can also use the BetaE dataset after converting to our format by running:

python transform_beta_data.py

Use one of the commands in the following, depending on the choice of models:

python main.py --config config/{data_type}_{model_name}.yaml
  • The data_type includes benchmark and beta
  • The model_name includes BetaE, LogicE, NewLook and Query2Box

If you need to evaluate on the EFO-1-QA benchmark, be sure to load from existing model checkpoint, you can train one on your own or download from here:

python main.py --config config/benchmark_beta.yaml --checkpoint_path ckpt/FB15k/Beta_full
python main.py --config config/benchmark_NewLook.yaml --checkpoint_path ckpt/FB15k/NLK_full --load_step 450000
python main.py --config config/benchmark_Logic.yaml --checkpoint_path ckpt/FB15k/Logic_full --load_step 450000

We note that the BetaE checkpoint above is trained from KGReasoning

Paper Checklist

  1. For all authors..

    (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? Yes

    (b) Have you read the ethics review guidelines and ensured that your paper conforms to them? Yes

    (c) Did you discuss any potential negative societal impacts of your work? No

    (d) Did you describe the limitations of your work? Yes

  2. If you are including theoretical results...

    (a) Did you state the full set of assumptions of all theoretical results? N/A

    (b) Did you include complete proofs of all theoretical results? N/A

  3. If you ran experiments...

    (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? Yes

    (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? Yes

    (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? No

    (d) Did you include the amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? No

  4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...

    (a) If your work uses existing assets, did you cite the creators? Yes

    (b) Did you mention the license of the assets? No

    (c) Did you include any new assets either in the supplemental material or as a URL? Yes

    (d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? N/A

    (e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? N/A

  5. If you used crowdsourcing or conducted research with human subjects...

    (a) Did you include the full text of instructions given to participants and screenshots, if applicable? N/A

    (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? N/A

    (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? N/A

Owner
HKUST-KnowComp
Knowledge Computation [email protected], led by Yangqiu Song
HKUST-KnowComp
This code is for our paper "VTGAN: Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision Transformers"

ICCV Workshop 2021 VTGAN This code is for our paper "VTGAN: Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision Transformers"

Sharif Amit Kamran 25 Dec 08, 2022
Offical implementation of Shunted Self-Attention via Multi-Scale Token Aggregation

Shunted Transformer This is the offical implementation of Shunted Self-Attention via Multi-Scale Token Aggregation by Sucheng Ren, Daquan Zhou, Shengf

156 Dec 27, 2022
This program can detect your face and add an Christams hat on the top of your head

Auto_Christmas This program can detect your face and add a Christmas hat to the top of your head. just run the Auto_Christmas.py, then you can see the

3 Dec 22, 2021
PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network"

HAN PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network" This repository is for HAN introduced in the

五维空间 140 Nov 23, 2022
GAN encoders in PyTorch that could match PGGAN, StyleGAN v1/v2, and BigGAN. Code also integrates the implementation of these GANs.

MTV-TSA: Adaptable GAN Encoders for Image Reconstruction via Multi-type Latent Vectors with Two-scale Attentions. This is the official code release fo

owl 37 Dec 24, 2022
Spatial Sparse Convolution Library

SpConv: Spatially Sparse Convolution Library PyPI Install Downloads CPU (Linux Only) pip install spconv CUDA 10.2 pip install spconv-cu102 CUDA 11.1 p

Yan Yan 1.2k Jan 07, 2023
A toolkit for Lagrangian-based constrained optimization in Pytorch

Cooper About Cooper is a toolkit for Lagrangian-based constrained optimization in Pytorch. This library aims to encourage and facilitate the study of

Cooper 34 Jan 01, 2023
Library for time-series-forecasting-as-a-service.

TIMEX TIMEX (referred in code as timexseries) is a framework for time-series-forecasting-as-a-service. Its main goal is to provide a simple and generi

Alessandro Falcetta 8 Jan 06, 2023
Kaggle competition: Springleaf Marketing Response

PruebaEnel Prueba Kaggle-Springleaf-master Prueba Kaggle-Springleaf Kaggle competition: Springleaf Marketing Response Competencia de Kaggle: Marketing

1 Feb 09, 2022
Generating Radiology Reports via Memory-driven Transformer

R2Gen This is the implementation of Generating Radiology Reports via Memory-driven Transformer at EMNLP-2020. Citations If you use or extend our work,

CUHK-SZ NLP Group 101 Dec 13, 2022
N-RPG - Novel role playing game da turfu

N-RPG Ce README sera la page de garde du projet. Contenu Il contiendra la présen

4 Mar 15, 2022
Inhomogeneous Social Recommendation with Hypergraph Convolutional Networks

Inhomogeneous Social Recommendation with Hypergraph Convolutional Networks This is our Pytorch implementation for the paper: Zirui Zhu, Chen Gao, Xu C

Zirui Zhu 3 Dec 30, 2022
ALBERT-pytorch-implementation - ALBERT pytorch implementation

ALBERT-pytorch-implementation developing... 모델의 개념이해를 돕기 위한 구현물로 현재 변수명을 상세히 적었고

BG Kim 3 Oct 06, 2022
retweet 4 satoshi ⚡️

rt4sat retweet 4 satoshi This bot is the codebase for https://twitter.com/rt4sat please feel free to create an issue if you saw any bugs basically thi

6 Sep 30, 2022
Official implementation of "MetaSDF: Meta-learning Signed Distance Functions"

MetaSDF: Meta-learning Signed Distance Functions Project Page | Paper | Data Vincent Sitzmann*, Eric Ryan Chan*, Richard Tucker, Noah Snavely Gordon W

Vincent Sitzmann 100 Jan 01, 2023
Code for ACM MM2021 paper "Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection"

CTDNet The PyTorch code for ACM MM2021 paper "Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection" Requirements Python 3.6

CVTEAM 28 Oct 20, 2022
🐸STT integration examples

🐸 STT 0.9.x Examples These are various examples on how to use or integrate 🐸 STT using our packages. It is a good way to just try out 🐸 STT before

coqui 92 Dec 19, 2022
Sign Language is detected in realtime using video sequences. Our approach involves MediaPipe Holistic for keypoints extraction and LSTM Model for prediction.

RealTime Sign Language Detection using Action Recognition Approach Real-Time Sign Language is commonly predicted using models whose architecture consi

Rishikesh S 15 Aug 20, 2022
Codebase for Amodal Segmentation through Out-of-Task andOut-of-Distribution Generalization with a Bayesian Model

Codebase for Amodal Segmentation through Out-of-Task andOut-of-Distribution Generalization with a Bayesian Model

Yihong Sun 12 Nov 15, 2022
Collection of generative models in Pytorch version.

pytorch-generative-model-collections Original : [Tensorflow version] Pytorch implementation of various GANs. This repository was re-implemented with r

Hyeonwoo Kang 2.4k Dec 31, 2022