A Broader Picture of Random-walk Based Graph Embedding

Overview

Random-walk Embedding Framework

This repository is a reference implementation of the random-walk embedding framework as described in the paper:

A Broader Picture of Random-walk Based Graph Embedding.
Zexi Huang, Arlei Silva, Ambuj Singh.
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2021.

The framework decomposes random-walk based graph embedding into three major components: random-walk process, similarity function, and embedding algorithm. By tuning the components, it not only covers many existing approaches such as DeepWalk but naturally motivates novel ones that have shown superior performance on certain downstream tasks.

Usage

Example

To use the framework with default settings to embed the BlogCatalog network:
python src/embedding.py --graph graph/blogcatalog.edges --embeddings emb/blogcatalog.embeddings
where graph/blogcatalog.edges stores the input graph and emb/blogcatalog.embeddings is the target file for output embeddings.

Options

You can check out all the available options (framework components, Markov time parameters, graph types, etc.) with:
python src/embedding.py --help

Input Graph

The supported input graph format is a list of edges:

node1_id_int node2_id_int <weight_float, optional>

where node ids are should be consecutive integers starting from 1. The graph is by default undirected and unweighted, which can be changed by setting appropriate flags.

Output Embeddings

The output embedding file has n lines where n is the number of nodes in the graph. Each line stores the learned embedding of the node with its id equal to the line number:

emb_dim1 emb_dim2 ... emb_dimd

Evaluating

Here, we show by examples how to evaluate and compare different settings of our framework on node classification, link prediction, and community detection tasks. Full evaluation options are can be found with:
python src/evaluating.py --help

Note that the results shown below may not be identical to those in the paper due to different random seeds, but the conclusions are the same.

Node Classification

Once we generate the embedding with the script in previous section, we can call
python src/evaluating.py --task node-classification --embeddings emb/blogcatalog.embeddings --training-ratio 0.5
to compute the Micro-F1 and Macro-F1 scores of the node classification.

The results for comparing Pointwise Mutual Information (PMI) and Autocovariance (AC) similarity metrics with the best Markov times and varying training ratios are as follows:

Training Ratio 10% 20% 30% 40% 50% 60% 70% 80% 90%
PMI Micro-F1 0.3503 0.3814 0.3993 0.4106 0.4179 0.4227 0.4255 0.4222 0.4228
(time=4) Macro-F1 0.2212 0.2451 0.2575 0.2669 0.2713 0.2772 0.2768 0.2689 0.2678
AC Micro-F1 0.3547 0.3697 0.3785 0.3837 0.3872 0.3906 0.3912 0.3927 0.3930
(time=5) Macro-F1 0.2137 0.2299 0.2371 0.2406 0.2405 0.2413 0.2385 0.2356 0.2352

Link Prediction

Prepare

To evaluate the embedding method on link prediction, we first have to remove a ratio of edges in the original graph:
python src/evaluating.py --task link-prediction --mode prepare --graph graph/blogcatalog.edges --remaining-edges graph/blogcatalog.remaining-edges --removed-edges graph/blogcatalog.removed-edges

This takes the original graph graph/blogcatalog.edges as input and output the removed and remaining edges to graph/blogcatalog.removed-edges and graph/blogcatalog.remaining-edges.

Embed

Then, we embed based on the remaining edges of the network with the embedding script. For example:
python src/embedding.py --graph graph/blogcatalog.remaining-edges --embeddings emb/blogcatalog.residual-embeddings

Evaluate

Finally, we evaluate the performance of link prediction in terms of [email protected] based on the embeddings of the residual graph and the removed edges:
python src/evaluating.py --task link-prediction --mode evaluate --embeddings emb/blogcatalog.residual-embeddings --remaining-edges graph/blogcatalog.remaining-edges --removed-edges graph/blogcatalog.removed-edges --k 1.0

The results for comparing PMI and autocovariance similarity metrics with the best Markov times and varying k are as follows:

k 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
PMI (time=1) 0.2958 0.2380 0.2068 0.1847 0.1678 0.1560 0.1464 0.1382 0.1315 0.1260
AC (time=3) 0.4213 0.3420 0.2982 0.2667 0.2434 0.2253 0.2112 0.2000 0.1893 0.1802

Community Detection

Assume the embeddings for the Airport network emb/airport.embeddings have been generated. The following computes the Normalized Mutual Information (NMI) between the ground-truth country communities and the k-means clustering of embeddings:
python src/evaluating.py --task community-detection --embeddings emb/airport.embeddings --communities graph/airport.country-labels

Citing

If you find our framework useful, please consider citing the following paper:

@inproceedings{random-walk-embedding,
author = {Huang, Zexi and Silva, Arlei and Singh, Ambuj},
 title = {A Broader Picture of Random-walk Based Graph Embedding},
 booktitle = {SIGKDD},
 year = {2021}
}
Owner
Zexi Huang
Zexi Huang
From Perceptron model to Deep Neural Network from scratch in Python.

Neural-Network-Basics Aim of this Repository: From Perceptron model to Deep Neural Network (from scratch) in Python. ** Currently working on a basic N

Aditya Kahol 1 Jan 14, 2022
Tools for computational pathology

A toolkit for computational pathology and machine learning. View documentation Please cite our paper Installation There are several ways to install Pa

254 Dec 12, 2022
Code for Contrastive-Geometry Networks for Generalized 3D Pose Transfer

Code for Contrastive-Geometry Networks for Generalized 3D Pose Transfer

18 Jun 28, 2022
QICK: Quantum Instrumentation Control Kit

QICK: Quantum Instrumentation Control Kit The QICK is a kit of firmware and software to use the Xilinx RFSoC to control quantum systems. It consists o

81 Dec 15, 2022
Implementation supporting the ICCV 2017 paper "GANs for Biological Image Synthesis"

GANs for Biological Image Synthesis This codes implements the ICCV-2017 paper "GANs for Biological Image Synthesis". The paper and its supplementary m

Anton Osokin 95 Nov 25, 2022
Noether Networks: meta-learning useful conserved quantities

Noether Networks: meta-learning useful conserved quantities This repository contains the code necessary to reproduce experiments from "Noether Network

Dylan Doblar 33 Nov 23, 2022
Read and write layered TIFF ImageSourceData and ImageResources tags

Read and write layered TIFF ImageSourceData and ImageResources tags Psdtags is a Python library to read and write the Adobe Photoshop(r) specific Imag

Christoph Gohlke 4 Feb 05, 2022
Get 2D point positions (e.g., facial landmarks) projected on 3D mesh

points2d_projection_mesh Input 2D points (e.g. facial landmarks) on an image Camera parameters (extrinsic and intrinsic) of the image Aligned 3D mesh

5 Dec 08, 2022
PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and reinforcement learning

safe-control-gym Physics-based CartPole and Quadrotor Gym environments (using PyBullet) with symbolic a priori dynamics (using CasADi) for learning-ba

Dynamic Systems Lab 300 Dec 28, 2022
SLAMP: Stochastic Latent Appearance and Motion Prediction

SLAMP: Stochastic Latent Appearance and Motion Prediction Official implementation of the paper SLAMP: Stochastic Latent Appearance and Motion Predicti

Kaan Akan 34 Dec 08, 2022
[AAAI 2022] Separate Contrastive Learning for Organs-at-Risk and Gross-Tumor-Volume Segmentation with Limited Annotation

A paper Introduction This is an official release of the paper Separate Contrastive Learning for Organs-at-Risk and Gross-Tumor-Volume Segmentation wit

Jiacheng Wang 14 Dec 08, 2022
Multi-Anchor Active Domain Adaptation for Semantic Segmentation (ICCV 2021 Oral)

Multi-Anchor Active Domain Adaptation for Semantic Segmentation Munan Ning*, Donghuan Lu*, Dong Wei†, Cheng Bian, Chenglang Yuan, Shuang Yu, Kai Ma, Y

Munan Ning 36 Dec 07, 2022
Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening

Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening Introduction This is an implementation of the model used for breast

757 Dec 30, 2022
PyTorch implementation for the paper Pseudo Numerical Methods for Diffusion Models on Manifolds

Pseudo Numerical Methods for Diffusion Models on Manifolds (PNDM) This repo is the official PyTorch implementation for the paper Pseudo Numerical Meth

Luping Liu (刘路平) 196 Jan 05, 2023
Code of paper: "DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks"

DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks Abstract: Adversarial training has been proven to

倪仕文 (Shiwen Ni) 58 Nov 10, 2022
A Pytorch implementation of "Manifold Matching via Deep Metric Learning for Generative Modeling" (ICCV 2021)

Manifold Matching via Deep Metric Learning for Generative Modeling A Pytorch implementation of "Manifold Matching via Deep Metric Learning for Generat

69 Dec 10, 2022
Self-Supervised Pillar Motion Learning for Autonomous Driving (CVPR 2021)

Self-Supervised Pillar Motion Learning for Autonomous Driving Chenxu Luo, Xiaodong Yang, Alan Yuille Self-Supervised Pillar Motion Learning for Autono

QCraft 101 Dec 05, 2022
Pytorch implementation AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

AttnGAN Pytorch implementation for reproducing AttnGAN results in the paper AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative

Tao Xu 1.2k Dec 26, 2022
Pytorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)

Vision Transformer Pytorch reimplementation of Google's repository for the ViT model that was released with the paper An Image is Worth 16x16 Words: T

Eunkwang Jeon 1.4k Dec 28, 2022
Generalized Random Forests

generalized random forests A pluggable package for forest-based statistical estimation and inference. GRF currently provides non-parametric methods fo

GRF Labs 781 Dec 25, 2022