A Broader Picture of Random-walk Based Graph Embedding

Overview

Random-walk Embedding Framework

This repository is a reference implementation of the random-walk embedding framework as described in the paper:

A Broader Picture of Random-walk Based Graph Embedding.
Zexi Huang, Arlei Silva, Ambuj Singh.
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2021.

The framework decomposes random-walk based graph embedding into three major components: random-walk process, similarity function, and embedding algorithm. By tuning these components, it not only covers many existing approaches, such as DeepWalk, but also naturally motivates novel ones that have shown superior performance on certain downstream tasks.
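To make the three components concrete, here is a minimal NumPy sketch on a small dense adjacency matrix. It assumes a connected undirected graph, the standard discrete-time random walk M = D^-1 A with stationary distribution pi_i = d_i / vol(G), and an integer Markov time tau. The autocovariance (R = Pi M^tau - pi pi^T) and PMI (log((M^tau)_ij / pi_j)) similarities follow the paper's definitions as we read them, and the eigendecomposition is only one simple choice of embedding algorithm. All function names are hypothetical; this is not the code in src/embedding.py.

```python
import numpy as np

def transition_matrix(A: np.ndarray) -> np.ndarray:
    # Random-walk process: standard walk M = D^-1 A (each row sums to 1)
    return A / A.sum(axis=1, keepdims=True)

def stationary_distribution(A: np.ndarray) -> np.ndarray:
    # For a connected undirected graph, pi_i = d_i / vol(G)
    d = A.sum(axis=1)
    return d / d.sum()

def autocovariance(A: np.ndarray, tau: int) -> np.ndarray:
    # Similarity function: R = Pi M^tau - pi pi^T
    M, pi = transition_matrix(A), stationary_distribution(A)
    return np.diag(pi) @ np.linalg.matrix_power(M, tau) - np.outer(pi, pi)

def pmi(A: np.ndarray, tau: int) -> np.ndarray:
    # Similarity function: log(pi_i (M^tau)_ij) - log(pi_i pi_j) = log((M^tau)_ij / pi_j)
    # Entries with (M^tau)_ij = 0 become -inf; a real implementation must handle them
    M, pi = transition_matrix(A), stationary_distribution(A)
    with np.errstate(divide="ignore"):
        return np.log(np.linalg.matrix_power(M, tau) / pi[None, :])

def factorize(S: np.ndarray, dim: int) -> np.ndarray:
    # Embedding algorithm (one simple choice): keep the top-`dim` eigenpairs of the
    # symmetric similarity matrix, clipping negative eigenvalues to zero
    vals, vecs = np.linalg.eigh(S)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Toy graph: two triangles joined by the edge 2-3 (0-based ids)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

X = factorize(autocovariance(A, tau=3), dim=2)
print(X.shape)  # (6, 2)
```

Because the walk on an undirected graph is reversible, Pi M^tau is symmetric, which is what makes a symmetric eigendecomposition applicable here.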

Usage

Example

To use the framework with default settings to embed the BlogCatalog network:
python src/embedding.py --graph graph/blogcatalog.edges --embeddings emb/blogcatalog.embeddings
where graph/blogcatalog.edges stores the input graph and emb/blogcatalog.embeddings is the target file for output embeddings.

Options

You can check out all the available options (framework components, Markov time parameters, graph types, etc.) with:
python src/embedding.py --help

Input Graph

The supported input graph format is a list of edges:

node1_id_int node2_id_int <weight_float, optional>

where node ids should be consecutive integers starting from 1. By default, the graph is undirected and unweighted; this can be changed by setting the appropriate flags.
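For illustration, a minimal sketch of how such an edge list could be read, assuming whitespace-separated columns and the 1-based ids described above. The helper parse_edge_lines and its weighted/directed parameters are hypothetical and are not the repository's loader or its actual flag names.

```python
def parse_edge_lines(lines, weighted=False, directed=False):
    # Hypothetical helper: returns the node count and {(u, v): weight} with 0-based ids
    edges = {}
    n = 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        u, v = int(parts[0]) - 1, int(parts[1]) - 1  # ids in the file start from 1
        w = float(parts[2]) if weighted and len(parts) > 2 else 1.0
        edges[(u, v)] = w
        if not directed:
            edges[(v, u)] = w  # undirected: store both orientations
        n = max(n, u + 1, v + 1)  # valid because ids are consecutive
    return n, edges

# With a file: with open("graph/blogcatalog.edges") as f: n, edges = parse_edge_lines(f)
n, edges = parse_edge_lines(["1 2", "2 3 0.5"], weighted=True)
print(n, edges[(2, 1)])  # 3 0.5
```

When weighted is False, any third column is ignored and every edge gets weight 1.0, mirroring the unweighted default.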

Output Embeddings

The output embedding file has n lines, where n is the number of nodes in the graph. Line i stores the learned embedding of the node with id i:

emb_dim1 emb_dim2 ... emb_dimd
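A minimal sketch of reading this file back with NumPy, assuming space-separated floats as shown above. The helper names are hypothetical; the only point being illustrated is the off-by-one mapping between 1-based node ids and 0-based array rows.

```python
import io
import numpy as np

def load_embeddings(source):
    # One whitespace-separated row per node; ndmin=2 keeps a 2-D array even for one node
    return np.loadtxt(source, ndmin=2)

def embedding_of(X, node_id):
    # Node ids start from 1, so node i is stored in (0-based) row i - 1
    return X[node_id - 1]

# In practice: X = load_embeddings("emb/blogcatalog.embeddings")
X = load_embeddings(io.StringIO("0.1 0.2 0.3\n0.4 0.5 0.6\n"))
print(X.shape)  # (2, 3)
```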

Evaluating

Here, we show by example how to evaluate and compare different settings of our framework on node classification, link prediction, and community detection tasks. The full set of evaluation options can be found with:
python src/evaluating.py --help

Note that the results shown below may not be identical to those in the paper due to different random seeds, but the conclusions are the same.

Node Classification

Once we have generated the embeddings with the script in the previous section, we can call
python src/evaluating.py --task node-classification --embeddings emb/blogcatalog.embeddings --training-ratio 0.5
to compute the Micro-F1 and Macro-F1 scores for node classification.
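To clarify how the two scores differ, here is a minimal pure-Python sketch of Micro-F1 and Macro-F1 in a multi-label setting, assuming each test node has a set of true and a set of predicted labels. It only shows how predictions are aggregated; the classifier that evaluating.py trains on the embeddings is not reproduced, and the function names are hypothetical.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred, labels):
    # y_true / y_pred: one set of labels per test node
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn] per label
    for true, pred in zip(y_true, y_pred):
        for label in labels:
            if label in pred and label in true:
                counts[label][0] += 1
            elif label in pred:
                counts[label][1] += 1
            elif label in true:
                counts[label][2] += 1
    # Micro-F1 pools the counts over all labels before computing a single F1
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)
    # Macro-F1 averages per-label F1, so rare labels weigh as much as common ones
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    return micro, macro

micro, macro = micro_macro_f1([{0}, {1}, {0, 1}], [{0}, {0}, {1}], labels=[0, 1])
print(round(micro, 4), round(macro, 4))  # 0.5714 0.5833
```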

The results comparing the Pointwise Mutual Information (PMI) and Autocovariance (AC) similarity metrics, each with its best Markov time, across varying training ratios are as follows:

Training Ratio            10%     20%     30%     40%     50%     60%     70%     80%     90%
PMI (time=4)  Micro-F1    0.3503  0.3814  0.3993  0.4106  0.4179  0.4227  0.4255  0.4222  0.4228
PMI (time=4)  Macro-F1    0.2212  0.2451  0.2575  0.2669  0.2713  0.2772  0.2768  0.2689  0.2678
AC (time=5)   Micro-F1    0.3547  0.3697  0.3785  0.3837  0.3872  0.3906  0.3912  0.3927  0.3930
AC (time=5)   Macro-F1    0.2137  0.2299  0.2371  0.2406  0.2405  0.2413  0.2385  0.2356  0.2352

Link Prediction

Prepare

To evaluate the embedding method on link prediction, we first have to remove a fraction of the edges from the original graph:
python src/evaluating.py --task link-prediction --mode prepare --graph graph/blogcatalog.edges --remaining-edges graph/blogcatalog.remaining-edges --removed-edges graph/blogcatalog.removed-edges

This takes the original graph graph/blogcatalog.edges as input and outputs the removed and remaining edges to graph/blogcatalog.removed-edges and graph/blogcatalog.remaining-edges, respectively.
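Conceptually, the prepare step is a random hold-out of edges. A minimal sketch, assuming the edges are undirected (u, v) pairs and a hypothetical removal_ratio parameter; the ratio actually used by evaluating.py, and any safeguards it may apply (for example, keeping every node in the residual graph), are not shown here and should be checked with --help.

```python
import random

def split_edges(edges, removal_ratio=0.5, seed=0):
    # Hypothetical helper: shuffle the edges and hold out a fraction of them
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_removed = int(round(removal_ratio * len(shuffled)))
    removed, remaining = shuffled[:n_removed], shuffled[n_removed:]
    return remaining, removed

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
remaining, removed = split_edges(edges, removal_ratio=0.4)
print(len(remaining), len(removed))  # 3 2
```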

Embed

Then, we embed the residual graph formed by the remaining edges using the embedding script. For example:
python src/embedding.py --graph graph/blogcatalog.remaining-edges --embeddings emb/blogcatalog.residual-embeddings

Evaluate

Finally, we evaluate link prediction performance in terms of precision@k, based on the embeddings of the residual graph and the removed edges:
python src/evaluating.py --task link-prediction --mode evaluate --embeddings emb/blogcatalog.residual-embeddings --remaining-edges graph/blogcatalog.remaining-edges --removed-edges graph/blogcatalog.removed-edges --k 1.0
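For intuition, here is a minimal sketch of precision@k under several assumptions that the text does not confirm: candidate pairs are all node pairs that are not edges of the residual graph, each pair is scored by the inner product of its two embeddings, and k is a multiple of the number of removed edges (our reading of --k 1.0 and the percentage columns in the results). The helper precision_at_k is hypothetical, and its O(n^2) candidate enumeration only suits small graphs.

```python
import itertools
import numpy as np

def precision_at_k(X, remaining, removed, k=1.0):
    # Hypothetical helper: X[i] is the embedding of node i (0-based);
    # remaining / removed are lists of undirected (u, v) pairs
    seen = {frozenset(e) for e in remaining}
    held_out = {frozenset(e) for e in removed}
    # Candidates: every node pair that is not an edge of the residual graph
    candidates = [(u, v) for u, v in itertools.combinations(range(len(X)), 2)
                  if frozenset((u, v)) not in seen]
    # Assumed scoring rule: inner product of the two embeddings
    scores = np.array([X[u] @ X[v] for u, v in candidates])
    # Assumed meaning of k: rank k * |removed| pairs (at least one)
    top = max(1, int(round(k * len(held_out))))
    order = np.argsort(-scores, kind="stable")[:top]
    hits = sum(frozenset(candidates[i]) in held_out for i in order)
    return hits / top

X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(precision_at_k(X, remaining=[(0, 2), (1, 3)], removed=[(0, 1), (2, 3)], k=1.0))  # 1.0
```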

The results comparing the PMI and AC similarity metrics, each with its best Markov time, for varying k are as follows:

k               10%     20%     30%     40%     50%     60%     70%     80%     90%     100%
PMI (time=1)    0.2958  0.2380  0.2068  0.1847  0.1678  0.1560  0.1464  0.1382  0.1315  0.1260
AC (time=3)     0.4213  0.3420  0.2982  0.2667  0.2434  0.2253  0.2112  0.2000  0.1893  0.1802

Community Detection

Assume the embeddings of the Airport network, emb/airport.embeddings, have been generated. The following command computes the Normalized Mutual Information (NMI) between the ground-truth country communities and a k-means clustering of the embeddings:
python src/evaluating.py --task community-detection --embeddings emb/airport.embeddings --communities graph/airport.country-labels
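To show what the reported score measures, a minimal pure-Python sketch of NMI between ground-truth communities and cluster assignments. It assumes normalization by the arithmetic mean of the two entropies (other normalizations exist, and the one used by evaluating.py is not stated here); the k-means step that produces the cluster labels is not shown, and the example labels are made up.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    # Sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) p(y)))
    return sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())

def nmi(truth, clusters):
    # Assumed normalization: arithmetic mean of the two entropies
    h_t, h_c = entropy(truth), entropy(clusters)
    if h_t == 0 and h_c == 0:
        return 1.0  # both partitions are a single group
    return mutual_information(truth, clusters) / ((h_t + h_c) / 2)

countries = ["US", "US", "CN", "CN"]  # hypothetical ground-truth communities
clusters = [1, 1, 0, 0]               # hypothetical k-means assignments
print(nmi(countries, clusters))
```

NMI is invariant to how clusters are numbered, so the cluster ids from k-means need not match the community labels.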

Citing

If you find our framework useful, please consider citing the following paper:

@inproceedings{random-walk-embedding,
  author = {Huang, Zexi and Silva, Arlei and Singh, Ambuj},
  title = {A Broader Picture of Random-walk Based Graph Embedding},
  booktitle = {SIGKDD},
  year = {2021}
}