Distance Encoding for GNN Design

Overview

Distance-encoding for GNN design

This repository is the official PyTorch implementation of the DEGNN and DEAGNN framework reported in the paper:
Distance-Encoding -- Design Provably More PowerfulGNNs for Structural Representation Learning, to appear in NeurIPS 2020.

The project's home page is: http://snap.stanford.edu/distance-encoding/

Authors & Contact

Pan Li, Yanbang Wang, Hongwei Wang, Jure Leskovec

Questions on this repo can be emailed to [email protected] (Yanbang Wang)

Installation

Requirements: Python >= 3.5, Anaconda3

  • Update conda:
conda update -n base -c defaults conda
  • Install basic dependencies to virtual environment and activate it:
conda env create -f environment.yml
conda activate degnn-env
  • Install PyTorch >= 1.4.0 and torch-geometric >= 1.5.0 (please refer to the PyTorch and PyTorch Geometric official websites for more details). Commands examples are:
conda install pytorch=1.4.0 torchvision cudatoolkit=10.1 -c pytorch
pip install torch-scatter==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-sparse==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-cluster==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-spline-conv==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-geometric

The latest tested combination is: Python 3.8.2 + Pytorch 1.4.0 + torch-geometric 1.5.0.

Quick Start

python main.py --dataset celegans --feature sp --hidden_features 100 --prop_depth 1 --test_ratio 0.1 --epoch 300

    This uses 100-dimensional hidden features, 80/10/10 split of train/val/test set, and trains for 300 epochs.

  • To train DEAGNN-SPD for Task 3 (node-triads prediction) on C.elegans dataset:
python main.py --dataset celegans_tri --hidden_features 100 --prop_depth 2 --epoch 300 --feature sp --max_sp 5 --l2 1e-3 --test_ratio 0.1 --seed 9

    This enables 2-hop propagation per layer, truncates distance encoding at 5, and uses random seed 9.

  • To train DEGNN-LP (i.e. the random walk variant) for Task 1 (node-level prediction) on usa-airports using average accuracy as evaluation metric:
python main.py --dataset usa-airports --metric acc --hidden_features 100 --feature rw --rw_depth 2 --epoch 500 --bs 128 --test_ratio 0.1

Note that here the test_ratio currently contains both validation set and the actual test set, and will be changed to contain only test set.

  • To generate Figure2 LEFT of the paper (Simulation to validate Theorem 3.3):
python main.py --dataset simulation --max_sp 10

    The result will be plot to ./simulation_results.png.

  • All detailed training logs can be found at <log_dir>/<dataset>/<training-time>.log. A one-line summary will also be appended to <log_dir>/result_summary.log for each training instance.

Usage Summary

Interface for DE-GNN framework [-h] [--dataset DATASET] [--test_ratio TEST_RATIO]
                                      [--model {DE-GNN,GIN,GCN,GraphSAGE,GAT}] [--layers LAYERS]
                                      [--hidden_features HIDDEN_FEATURES] [--metric {acc,auc}] [--seed SEED] [--gpu GPU]
                                      [--data_usage DATA_USAGE] [--directed DIRECTED] [--parallel] [--prop_depth PROP_DEPTH]
                                      [--use_degree USE_DEGREE] [--use_attributes USE_ATTRIBUTES] [--feature FEATURE]
                                      [--rw_depth RW_DEPTH] [--max_sp MAX_SP] [--epoch EPOCH] [--bs BS] [--lr LR]
                                      [--optimizer OPTIMIZER] [--l2 L2] [--dropout DROPOUT] [--k K] [--n [N [N ...]]]
                                      [--N N] [--T T] [--log_dir LOG_DIR] [--summary_file SUMMARY_FILE] [--debug]

Optinal Arguments

  -h, --help            show this help message and exit
  
  # general settings
  --dataset DATASET     dataset name
  --test_ratio TEST_RATIO
                        ratio of the test against whole
  --model {DE-GCN,GIN,GAT,GCN,GraphSAGE}
                        model to use
  --layers LAYERS       largest number of layers
  --hidden_features HIDDEN_FEATURES
                        hidden dimension
  --metric {acc,auc}    metric for evaluating performance
  --seed SEED           seed to initialize all the random modules
  --gpu GPU             gpu id
  --adj_norm {asym,sym,None}
                        how to normalize adj
  --data_usage DATA_USAGE
                        use partial dataset
  --directed DIRECTED   (Currently unavailable) whether to treat the graph as directed
  --parallel            (Currently unavailable) whether to use multi cpu cores to prepare data
  
  # positional encoding settings
  --prop_depth PROP_DEPTH
                        propagation depth (number of hops) for one layer
  --use_degree USE_DEGREE
                        whether to use node degree as the initial feature
  --use_attributes USE_ATTRIBUTES
                        whether to use node attributes as the initial feature
  --feature FEATURE     distance encoding category: shortest path or random walk (landing probabilities)
  --rw_depth RW_DEPTH   random walk steps
  --max_sp MAX_SP       maximum distance to be encoded for shortest path feature
  
  # training settings
  --epoch EPOCH         number of epochs to train
  --bs BS               minibatch size
  --lr LR               learning rate
  --optimizer OPTIMIZER
                        optimizer to use
  --l2 L2               l2 regularization weight
  --dropout DROPOUT     dropout rate
  
  # imulation settings (valid only when dataset == 'simulation')
  --k K                 node degree (k) or synthetic k-regular graph
  --n [N [N ...]]       a list of number of nodes in each connected k-regular subgraph
  --N N                 total number of nodes in simultation
  --T T                 largest number of layers to be tested
  
  # logging
  --log_dir LOG_DIR     log directory
  --summary_file SUMMARY_FILE
                        brief summary of training result
  --debug               whether to use debug mode

Reference

If you make use of the code/experiment of Distance-encoding in your work, please cite our paper:

@article{li2020distance,
  title={Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning},
  author={Li, Pan and Wang, Yanbang and Wang, Hongwei and Leskovec, Jure},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}
CTRL-C: Camera calibration TRansformer with Line-Classification

CTRL-C: Camera calibration TRansformer with Line-Classification This repository contains the official code and pretrained models for CTRL-C (Camera ca

57 Nov 14, 2022
Deep Networks with Recurrent Layer Aggregation

RLA-Net: Recurrent Layer Aggregation Recurrence along Depth: Deep Networks with Recurrent Layer Aggregation This is an implementation of RLA-Net (acce

Joy Fang 21 Aug 16, 2022
[ICML'21] Estimate the accuracy of the classifier in various environments through self-supervision

What Does Rotation Prediction Tell Us about Classifier Accuracy under Varying Testing Environments? [Paper] [ICML'21 Project] PyTorch Implementation T

24 Oct 26, 2022
Malmo Collaborative AI Challenge - Team Pig Catcher

The Malmo Collaborative AI Challenge - Team Pig Catcher Approach The challenge involves 2 agents who can either cooperate or defect. The optimal polic

Kai Arulkumaran 66 Jun 29, 2022
Converting CPT to bert form for use

cpt-encoder 将CPT转成bert形式使用 说明 刚刚刷到又出了一种模型:CPT,看论文显示,在很多中文任务上性能比mac bert还好,就迫不及待想把它用起来。 根据对源码的研究,发现该模型在做nlu建模时主要用的encoder部分,也就是bert,因此我将这部分权重转为bert权重类型

黄辉 1 Oct 14, 2021
Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-identification

Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-identification

TANG, shixiang 6 Nov 25, 2022
SAGE: Sensitivity-guided Adaptive Learning Rate for Transformers

SAGE: Sensitivity-guided Adaptive Learning Rate for Transformers This repo contains our codes for the paper "No Parameters Left Behind: Sensitivity Gu

Chen Liang 23 Nov 07, 2022
code and data for paper "GIANT: Scalable Creation of a Web-scale Ontology"

GIANT Code and data for paper "GIANT: Scalable Creation of a Web-scale Ontology" https://arxiv.org/pdf/2004.02118.pdf Please cite our paper if this pr

Excalibur 39 Dec 29, 2022
[Preprint] "Chasing Sparsity in Vision Transformers: An End-to-End Exploration" by Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

Chasing Sparsity in Vision Transformers: An End-to-End Exploration Codes for [Preprint] Chasing Sparsity in Vision Transformers: An End-to-End Explora

VITA 64 Dec 08, 2022
Deep functional residue identification

DeepFRI Deep functional residue identification Citing @article {Gligorijevic2019, author = {Gligorijevic, Vladimir and Renfrew, P. Douglas and Koscio

Flatiron Institute 156 Dec 25, 2022
This code is an implementation for Singing TTS.

MLP Singer This code is an implementation for Singing TTS. The algorithm is based on the following papers: Tae, J., Kim, H., & Lee, Y. (2021). MLP Sin

Heejo You 22 Dec 23, 2022
Human Dynamics from Monocular Video with Dynamic Camera Movements

Human Dynamics from Monocular Video with Dynamic Camera Movements Ri Yu, Hwangpil Park and Jehee Lee Seoul National University ACM Transactions on Gra

215 Jan 01, 2023
SEOVER: Sentence-level Emotion Orientation Vector based Conversation Emotion Recognition Model

SEOVER-Master This code is the implementation of paper: SEOVER: Sentence-level Emotion Orientation Vector based Conversation Emotion Recognition Model

4 Feb 24, 2022
In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

In-Place Activated BatchNorm In-Place Activated BatchNorm for Memory-Optimized Training of DNNs In-Place Activated BatchNorm (InPlace-ABN) is a novel

1.3k Dec 29, 2022
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation This repo is the official implementation of "MHFormer: Multi-Hypothesis Transforme

Vegetabird 281 Jan 07, 2023
FAMIE is a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction (IE)

FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction

18 Sep 01, 2022
A modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (prediction model)

ParallelFold Author: Bozitao Zhong This is a modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (p

Bozitao Zhong 77 Dec 22, 2022
Official Pytorch implementation for video neural representation (NeRV)

NeRV: Neural Representations for Videos (NeurIPS 2021) Project Page | Paper | UVG Data Hao Chen, Bo He, Hanyu Wang, Yixuan Ren, Ser-Nam Lim, Abhinav S

hao 214 Dec 28, 2022
An offline deep reinforcement learning library

d3rlpy: An offline deep reinforcement learning library d3rlpy is an offline deep reinforcement learning library for practitioners and researchers. imp

Takuma Seno 817 Jan 02, 2023
Fashion Recommender System With Python

Fashion-Recommender-System Thr growing e-commerce industry presents us with a la

Omkar Gawade 2 Feb 02, 2022