Benchmark for Answering Existential First Order Queries with Single Free Variable

Overview

EFO-1-QA Benchmark for First Order Query Estimation on Knowledge Graphs

This repository contains the entire pipeline for the EFO-1-QA benchmark. EFO-1 stands for Existential First Order queries with a Single Free Variable. The related paper was submitted to the NeurIPS 2021 Datasets and Benchmarks track (see the OpenReview link) and has appeared on arXiv (arXiv:2109.08925).

If this work helps you, please cite

@article{EFO-1-QA,
  title={Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs},
  author={Wang, Zihao and Yin, Hang and Song, Yangqiu},
  journal={arXiv preprint arXiv:2109.08925},
  year={2021}
}

The pipeline overview.

  1. Query type generation and normalization: the query types are generated by a DFS traversal of the context-free grammar under the bounded negation hypothesis. The generated types are also normalized into several normal forms.
  2. Query grounding and answer sampling: the queries are grounded on specific knowledge graphs, and the non-trivial answers are sampled.
  3. Model training and estimation: models are trained and evaluated on the specific query structures.

Query type generation and normalization

The OpsTree is represented by nested objects of the FirstOrderSetQuery class in fol/foq_v2.py. We first generate a specific OpsTree and then store it in the formula property of FirstOrderSetQuery.

The OpsTree is generated by binary_formula_iterator in fol/foq_v2.py. The overall process is managed in formula_generation.py.
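
To give a feel for this kind of enumeration, here is a minimal sketch of a depth-first expansion of a toy grammar. It is not the repository's binary_formula_iterator: the string notation (e, p, i, u, n), the function name query_types, and the depth parameter are all assumptions made for illustration. Bounded negation is modeled here by allowing n(...) only as the second operand of an intersection.

```python
from typing import Iterator


def query_types(anchors: int, depth: int) -> Iterator[str]:
    """Yield toy query types that use exactly `anchors` anchor entities.

    `depth` bounds how many times the rule q -> p(q) may be stacked.
    The notation is hypothetical and only mimics the idea of the grammar.
    """
    if anchors < 1 or depth < 1:
        return
    if anchors == 1:
        yield "p(e)"  # one-hop projection from an anchor entity
    # Rule q -> p(q): project an already complete sub-query.
    for sub in query_types(anchors, depth - 1):
        yield f"p({sub})"
    # Binary rules: split the anchors between the two operands.
    for left in range(1, anchors):
        for a in query_types(left, depth):
            for b in query_types(anchors - left, depth):
                yield f"i({a},{b})"     # intersection
                yield f"u({a},{b})"     # union
                yield f"i({a},n({b}))"  # negation only inside an intersection


for query_type in query_types(2, 1):
    print(query_type)
# i(p(e),p(e))
# u(p(e),p(e))
# i(p(e),n(p(e)))
```

With more anchors this toy enumeration emits both i(a,b) and i(b,a), which is one reason a later normalization step is useful.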

To generate the formulas, run:

python formula_generation.py

The formula CSV file is then generated in the outputs folder. In the paper, we use the file outputs/test_generated_formula_anchor_node=3.csv.
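
The overview mentions that generated types are normalized into several normal forms. As one illustration, the sketch below rewrites a toy query tree into disjunctive normal form (a union of union-free branches). The nested-tuple encoding and the function to_dnf are assumptions for this example and are not the repository's FirstOrderSetQuery API; whether this matches any of the repository's normal forms exactly is also an assumption.

```python
def to_dnf(query):
    """Return union-free branches whose union is equivalent to `query`.

    Hypothetical encoding: ("e",) is an anchor, ("p", q) a projection,
    ("i", a, b) an intersection, ("u", a, b) a union, ("n", q) a negation.
    """
    op = query[0]
    if op == "e":
        return [query]
    if op == "p":
        # Projection distributes over union: p(a u b) = p(a) u p(b).
        return [("p", branch) for branch in to_dnf(query[1])]
    if op == "u":
        return to_dnf(query[1]) + to_dnf(query[2])
    if op == "i":
        # Intersection distributes over union.
        return [("i", a, b) for a in to_dnf(query[1]) for b in to_dnf(query[2])]
    if op == "n":
        # De Morgan: n(a u b) = i(n(a), n(b)).
        negated = [("n", branch) for branch in to_dnf(query[1])]
        result = negated[0]
        for branch in negated[1:]:
            result = ("i", result, branch)
        return [result]
    raise ValueError(f"unknown operator: {op!r}")


one_hop = ("p", ("e",))
two_hop = ("p", one_hop)
branches = to_dnf(("i", one_hop, ("u", one_hop, two_hop)))
print(branches)
```

Here the union nested inside the intersection is pulled to the top level, so the result consists of two intersection-only branches.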

Query grounding and answer sampling

We first prepare the KG data and then run the sampling code

The KG data (FB15k, FB15k-237, NELL995) should be placed under the data/ folder. We use the data provided by KGReasoning.

The structure of the data folder should be at least

data
	|---FB15k-237-betae
	|---FB15k-betae
	|---NELL-betae	
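
Before sampling, it can help to confirm that this layout is in place. The following is a small convenience sketch and not part of the repository; the helper name missing_kg_folders is hypothetical, and it only checks that the three folders listed above exist.

```python
from pathlib import Path

# Folder names taken from the layout listed above.
EXPECTED_FOLDERS = ("FB15k-237-betae", "FB15k-betae", "NELL-betae")


def missing_kg_folders(data_root="data"):
    """Return the expected KG folders that are absent under `data_root`."""
    root = Path(data_root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


missing = missing_kg_folders("data")
if missing:
    print("Missing KG folders:", ", ".join(missing))
else:
    print("All expected KG folders are present.")
```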

Then run the benchmark sampling code on a specific knowledge graph:

python benchmark_sampling.py --knowledge_graph FB15k-237 
python benchmark_sampling.py --knowledge_graph FB15k
python benchmark_sampling.py --knowledge_graph NELL
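
As a rough illustration of grounding and of what a non-trivial answer might look like, the sketch below grounds a two-hop projection query on a toy graph. It assumes that "non-trivial" means answers that appear only when held-out edges are added to the training graph, as in BetaE-style datasets; the triples, relation names, and helper functions are invented for this example and do not reflect benchmark_sampling.py.

```python
def project(edges, sources, relation):
    """Follow `relation` from every entity in `sources`."""
    return {tail for (head, rel, tail) in edges if rel == relation and head in sources}


def answer_two_hop(edges, anchor, first_relation, second_relation):
    """Ground the query type p(p(e)) with one anchor and two relations."""
    return project(edges, project(edges, {anchor}, first_relation), second_relation)


# Toy graphs: the full graph contains one edge that the training graph lacks.
train_edges = {("a", "r", "b"), ("b", "s", "c")}
full_edges = train_edges | {("b", "s", "d")}

easy = answer_two_hop(train_edges, "a", "r", "s")
hard = answer_two_hop(full_edges, "a", "r", "s") - easy

print("derivable from the training graph:", easy)
print("requires held-out edges:", hard)
```

Under this assumption, only the second set would count as non-trivial, since the first can be found by plain traversal of the training graph.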

Append new forms to existing data

One can append new forms to the existing dataset by running:

python append_new_normal_form.py --knowledge_graph FB15k-237 

Model training and estimation

Models

Examples

The detailed hyper-parameter settings and the choice of knowledge graph are in the config folder. You can modify those configurations to create your own. All experiments use FB15k-237 by default.

Besides the generated benchmark, one can also use the BetaE dataset after converting it to our format by running:

python transform_beta_data.py

Use the following command, substituting the values that match your choice of data and model:

python main.py --config config/{data_type}_{model_name}.yaml
  • The data_type includes benchmark and beta
  • The model_name includes BetaE, LogicE, NewLook and Query2Box
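
Since the example commands in this README use file names such as config/benchmark_beta.yaml, which do not follow the pattern literally, it may be safer to check which configuration files actually exist. The sketch below expands the {data_type}_{model_name} pattern and keeps only the files found on disk; the helper name available_configs is hypothetical and not part of the repository.

```python
from itertools import product
from pathlib import Path

# Values taken from the two bullets above.
DATA_TYPES = ("benchmark", "beta")
MODEL_NAMES = ("BetaE", "LogicE", "NewLook", "Query2Box")


def available_configs(config_dir="config"):
    """Return the pattern-matching config files that exist in `config_dir`."""
    root = Path(config_dir)
    candidates = (
        root / f"{data_type}_{model_name}.yaml"
        for data_type, model_name in product(DATA_TYPES, MODEL_NAMES)
    )
    return [path for path in candidates if path.is_file()]


for config_path in available_configs("config"):
    print(f"python main.py --config {config_path.as_posix()}")
```

If nothing is printed, the files in the config folder are probably named differently, and listing that folder directly is the better guide.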

To evaluate on the EFO-1-QA benchmark, be sure to load an existing model checkpoint. You can train one yourself or download a pretrained one. For example:

python main.py --config config/benchmark_beta.yaml --checkpoint_path ckpt/FB15k/Beta_full
python main.py --config config/benchmark_NewLook.yaml --checkpoint_path ckpt/FB15k/NLK_full --load_step 450000
python main.py --config config/benchmark_Logic.yaml --checkpoint_path ckpt/FB15k/Logic_full --load_step 450000

Note that the BetaE checkpoint above was trained with KGReasoning.

Paper Checklist

  1. For all authors...

    (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? Yes

    (b) Have you read the ethics review guidelines and ensured that your paper conforms to them? Yes

    (c) Did you discuss any potential negative societal impacts of your work? No

    (d) Did you describe the limitations of your work? Yes

  2. If you are including theoretical results...

    (a) Did you state the full set of assumptions of all theoretical results? N/A

    (b) Did you include complete proofs of all theoretical results? N/A

  3. If you ran experiments...

    (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? Yes

    (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? Yes

    (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? No

    (d) Did you include the amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? No

  4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...

    (a) If your work uses existing assets, did you cite the creators? Yes

    (b) Did you mention the license of the assets? No

    (c) Did you include any new assets either in the supplemental material or as a URL? Yes

    (d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? N/A

    (e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? N/A

  5. If you used crowdsourcing or conducted research with human subjects...

    (a) Did you include the full text of instructions given to participants and screenshots, if applicable? N/A

    (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? N/A

    (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? N/A

Owner
HKUST-KnowComp
Knowledge Computation Group at HKUST, led by Yangqiu Song