A Critical Assessment of State-of-the-Art in Entity Alignment
This repository contains the source code for the paper
A Critical Assessment of State-of-the-Art in Entity Alignment
Max Berrendorf, Ludwig Wacker, and Evgeniy Faerman
https://arxiv.org/abs/2010.16314
Installation
Setup and activate virtual environment:
python3.8 -m venv ./venv
source ./venv/bin/activate
Install requirements (in this virtual environment):
pip install -U pip
pip install -U -r requirements.txt
To run the DGMC scripts, you additionally need to set up its requirements as described in the corresponding GitHub repository's README. We do not include them in requirements.txt, since their installation is more involved and includes non-Python dependencies.
Preparation
MLFlow
To track results to an MLFlow server, start it first by running
mlflow server
Note: When storing the results for many configurations, we recommend setting up a database backend following the instructions. For the following examples, we assume that the server is running at
TRACKING_URI=http://localhost:5000
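As a rough sketch of the database-backed setup mentioned in the note above, the snippet below exports the tracking address and defines a helper that starts the server with a SQLite backend store. The helper name `start_mlflow_sqlite`, the database file `mlflow.db`, and the artifact directory `./mlruns` are assumptions chosen for illustration and are not prescribed by this repository.

```shell
# Address assumed by the commands in this README.
export TRACKING_URI=http://localhost:5000

# Hypothetical helper: start the MLFlow server with a SQLite backend store.
# mlflow.db and ./mlruns are placeholder locations; adjust them as needed.
start_mlflow_sqlite() {
    mlflow server \
        --backend-store-uri sqlite:///mlflow.db \
        --default-artifact-root ./mlruns \
        --host 127.0.0.1 \
        --port 5000
}
```

Calling `start_mlflow_sqlite` in a separate terminal keeps the server in the foreground, so it can be stopped with Ctrl+C.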
OpenEA RDGCN embeddings
Please download the RDGCN embeddings extracted with the OpenEA codebase from here and place them in ~/.kgm/openea_rdgcn_embeddings. They require around 160 MiB of storage.
BERT initialization
To generate data for the BERT-based initialization, run
(venv) PYTHONPATH=./src python3 executables/prepare_bert.py
We also provide preprocessed files at this URL. If you prefer to use those, please download them and place them in ~/.kgm/bert_prepared. They require around 6.1 GiB of storage.
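Before starting the experiments, it may help to confirm that both data directories are in place. The following is a minimal sketch; the helper name `check_data_dir` is hypothetical and not part of the repository, while the two paths are the ones given above.

```shell
# Hypothetical helper: report whether a data directory exists and its size.
check_data_dir() {
    if [ -d "$1" ]; then
        echo "found: $1 ($(du -sh "$1" | cut -f1))"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the two locations named in this README; keep going if one is missing.
check_data_dir "${HOME}/.kgm/openea_rdgcn_embeddings" || true
check_data_dir "${HOME}/.kgm/bert_prepared" || true
```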
Experiments
For all experiments, the results are logged to the running MLFlow instance.
Note: The hyperparameter searches take a significant amount of time (multiple days) and require access to GPU(s). You can abort a script at any time and inspect the current results via the web interface of MLFlow.
Zero-Shot
For the zero-shot evaluation, run
(venv) PYTHONPATH=./src python3 executables/zero_shot.py --tracking_uri=${TRACKING_URI}
GCN-Align
To run the hyperparameter search, run
(venv) PYTHONPATH=./src python3 executables/tune_gcn_align.py --tracking_uri=${TRACKING_URI}
RDGCN
To run the hyperparameter search, run
(venv) PYTHONPATH=./src python3 executables/tune_rdgcn.py --tracking_uri=${TRACKING_URI}
DGMC
To run the hyperparameter search, run
(venv) PYTHONPATH=./src python3 executables/tune_dgmc.py --tracking_uri=${TRACKING_URI}
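Since each search can run for days, one option is to queue the three tuning scripts one after another. The sketch below assumes the virtual environment is active and the commands are run from the repository root; the function name `run_all_searches` and the `DRY_RUN` switch are hypothetical conveniences, not part of the repository.

```shell
# Fall back to the address used in this README if TRACKING_URI is unset.
TRACKING_URI="${TRACKING_URI:-http://localhost:5000}"

# Hypothetical helper: run the three tuning scripts in sequence.
# With DRY_RUN=1 the commands are only printed, not executed.
run_all_searches() {
    for script in tune_gcn_align.py tune_rdgcn.py tune_dgmc.py; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "PYTHONPATH=./src python3 executables/${script} --tracking_uri=${TRACKING_URI}"
        else
            # Stop at the first script that fails.
            PYTHONPATH=./src python3 "executables/${script}" \
                --tracking_uri="${TRACKING_URI}" || return 1
        fi
    done
}
```

Running `DRY_RUN=1 run_all_searches` first prints the commands, which makes it easy to confirm the tracking address before committing to the full run with `run_all_searches`.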
Evaluation
To summarize the dataset statistics, run
(venv) PYTHONPATH=./src python3 executables/summarize.py --target datasets --force
To summarize all experiments, run
(venv) PYTHONPATH=./src python3 executables/summarize.py --target results --tracking_uri=${TRACKING_URI} --force
To generate the ablation study table, run
(venv) PYTHONPATH=./src python3 executables/summarize.py --target ablation --tracking_uri=${TRACKING_URI} --force