Scalable Graph Neural Networks for Heterogeneous Graphs

Related tags

Deep LearningNARS
Overview

Neighbor Averaging over Relation Subgraphs (NARS)

NARS is an algorithm for node classification on heterogeneous graphs, based on scalable neighbor averaging techniques that have been previously used in e.g. SIGN to heterogeneous scenarios by generating neighbor-averaged features on sampled relation induced subgraphs.

For more details, please check out our paper:

Scalable Graph Neural Networks for Heterogeneous Graphs

Setup

Dependencies

  • torch==1.5.1+cu101
  • dgl-cu101==0.4.3.post2
  • ogb==1.2.1
  • dglke==0.1.0

Docker

We have prepared a dockerfile for building a container with clean environment and all required dependencies. Please checkout instructions in docker.

Data Preparation

Download and pre-process OAG dataset (optional)

If you plan to evaluate on OAG dataset, you need to follow instructions in oag_dataset to download and pre-process dataset.

Generate input for featureless node types

In academic graph datasets (ACM, MAG, OAG) in which only paper nodes are associated with input features. NARS featurizes other node types with TransE relational graph embedding pre-trained on the graph structure.

Please follow instructions in graph_embed to generate embeddings for each dataset.

Sample relation subsets

NARS samples Relation Subsets (see our paper for details). Please follow the instructions in sample_relation_subsets to generate these subsets.

Or you may skip this step and use the example subsets that have added to this repository.

Run NARS Experiments

NARS are evaluated on three academic graph datasets to predict publishing venues and fields of papers.

ACM

python3 train.py --dataset acm --use-emb TransE_acm --R 2 \
    --use-relation-subsets sample_relation_subsets/examples/acm \
    --num-hidden 64 --lr 0.003 --dropout 0.7 --eval-every 1 \
    --num-epochs 100 --input-dropout

OGBN-MAG

python3 train.py --dataset mag --use-emb TransE_mag --R 5 \
    --use-relation-subset sample_relation_subsets/examples/mag \
    --eval-batch-size 50000 --num-hidden 512 --lr 0.001 --batch-s 50000 \
    --dropout 0.5 --num-epochs 1000

OAG (venue prediction)

python3 train.py --dataset oag_venue --use-emb TransE_oag_venue --R 3 \
    --use-relation-subsets sample_relation_subsets/examples/oag_venue \
    --eval-batch-size 25000 --num-hidden 256 --lr 0.001 --batch-size 1000 \
    --data-dir oag_dataset --dropout 0.5 --num-epochs 200

OAG (L1-field prediction)

python3 train.py --dataset oag_L1 --use-emb TransE_oag_L1 --R 3 \
    --use-relation-subsets sample_relation_subsets/examples/oag_L1 \
    --eval-batch-size 25000 --num-hidden 256 --lr 0.001 --batch-size 1000 \
    --data-dir oag_dataset --dropout 0.5 --num-epochs 200

Results

Here is a summary of model performance using example relation subsets:

For ACM and OGBN-MAG dataset, the task is to predict paper publishing venue.

Dataset # Params Test Accuracy
ACM 0.40M 0.9305±0.0043
OGBN-MAG 4.13M 0.5240±0.0016

For OAG dataset, there are two different node predictions tasks: predicting venue (single-label) and L1-field (multi-label). And we follow Heterogeneous Graph Transformer to evaluate using NDCG and MRR metrics.

Task # Params NDCG MRR
Venue 2.24M 0.5214±0.0010 0.3434±0.0012
L1-field 1.41M 0.86420.0022 0.8542±0.0019

Run with limited GPU memory

The above commands were tested on Tesla V100 (32 GB) and Tesla T4 (15GB). If your GPU memory isn't enough for handling large graphs, try the following:

  • add --cpu-process to the command to move preprocessing logic to CPU
  • reduce evaluation batch size with --eval-batch-size. The evaluation result won't be affected since model is fixed.
  • reduce training batch with --batch-size

Run NARS with Reduced CPU Memory Footprint

As mentioned in our paper, using a lot of relation subsets may consume too much CPU memory. To reduce CPU memory footprint, we implemented an optimization in train_partial.py which trains part of our feature aggregation weights at a time.

Using OGBN-MAG dataset as an example, the following command randomly picks 3 subsets from all 8 sampled relation subsets and trains their aggregation weights every 10 epochs.

python3 train_partial.py --dataset mag --use-emb TransE_mag --R 5 \
    --use-relation-subsets sample_relation_subsets/examples/mag \
    --eval-batch-size 50000 --num-hidden 512 --lr 0.001 --batch-size 50000 \
    --dropout 0.5 --num-epochs 1000 --sample-size 3 --resample-every 10

Citation

Please cite our paper with:

@article{yu2020scalable,
    title={Scalable Graph Neural Networks for Heterogeneous Graphs},
    author={Yu, Lingfan and Shen, Jiajun and Li, Jinyang and Lerer, Adam},
    journal={arXiv preprint arXiv:2011.09679},
    year={2020}
}

License

NARS is CC-by-NC licensed, as found in the LICENSE file.

Owner
Facebook Research
Facebook Research
Automatic meme generation model using Tensorflow Keras.

Memefly You can find the project at MemeflyAI. Contributors Nick Buukhalter Harsh Desai Han Lee Project Overview Trello Board Product Canvas Automatic

BloomTech Labs 2 Jan 13, 2022
Continuous Time LiDAR odometry

CT-ICP: Elastic SLAM for LiDAR sensors This repository implements the SLAM CT-ICP (see our article), a lightweight, precise and versatile pure LiDAR o

385 Dec 29, 2022
PyTorch implementation for our AAAI 2022 Paper "Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning"

deepGCFX PyTorch implementation for our AAAI 2022 Paper "Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning" Pr

Thilini Cooray 4 Aug 11, 2022
A Pytorch implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE

SMU_pytorch A Pytorch Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE arXiv https://arxiv.org/ab

Fuhang 36 Dec 24, 2022
How to use TensorLayer

How to use TensorLayer While research in Deep Learning continues to improve the world, we use a bunch of tricks to implement algorithms with TensorLay

zhangrui 349 Dec 07, 2022
PyTorch implementation of Barlow Twins.

Barlow Twins: Self-Supervised Learning via Redundancy Reduction PyTorch implementation of Barlow Twins. @article{zbontar2021barlow, title={Barlow Tw

Facebook Research 839 Dec 29, 2022
A tutorial showing how to train, convert, and run TensorFlow Lite object detection models on Android devices, the Raspberry Pi, and more!

A tutorial showing how to train, convert, and run TensorFlow Lite object detection models on Android devices, the Raspberry Pi, and more!

Evan 1.3k Jan 02, 2023
Fairness Metrics: All you need to know

Fairness Metrics: All you need to know Testing machine learning software for ethical bias has become a pressing current concern. Recent research has p

Anonymous2020 1 Jan 17, 2022
Deep Learning Slide Captcha

滑动验证码深度学习识别 本项目使用深度学习 YOLOV3 模型来识别滑动验证码缺口,基于 https://github.com/eriklindernoren/PyTorch-YOLOv3 修改。 只需要几百张缺口标注图片即可训练出精度高的识别模型,识别效果样例: 克隆项目 运行命令: git cl

Python3WebSpider 55 Jan 02, 2023
mPose3D, a mmWave-based 3D human pose estimation model.

mPose3D, a mmWave-based 3D human pose estimation model.

KylinChen 35 Nov 08, 2022
The codebase for Data-driven general-purpose voice activity detection.

Data driven GPVAD Repository for the work in TASLP 2021 Voice activity detection in the wild: A data-driven approach using teacher-student training. S

Heinrich Dinkel 75 Nov 27, 2022
Disease Informed Neural Networks (DINNs) — neural networks capable of learning how diseases spread, forecasting their progression, and finding their unique parameters (e.g. death rate).

DINN We introduce Disease Informed Neural Networks (DINNs) — neural networks capable of learning how diseases spread, forecasting their progression, a

19 Dec 10, 2022
Generalized Random Forests

generalized random forests A pluggable package for forest-based statistical estimation and inference. GRF currently provides non-parametric methods fo

GRF Labs 781 Dec 25, 2022
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".

Codebase for learning control flow in transformers The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformer

Csordás Róbert 24 Oct 15, 2022
Towards Debiasing NLU Models from Unknown Biases

Towards Debiasing NLU Models from Unknown Biases Abstract: NLU models often exploit biased features to achieve high dataset-specific performance witho

Ubiquitous Knowledge Processing Lab 22 Jun 14, 2022
Implementation of Axial attention - attending to multi-dimensional data efficiently

Axial Attention Implementation of Axial attention in Pytorch. A simple but powerful technique to attend to multi-dimensional data efficiently. It has

Phil Wang 250 Dec 25, 2022
HW3 ― GAN, ACGAN and UDA

HW3 ― GAN, ACGAN and UDA In this assignment, you are given datasets of human face and digit images. You will need to implement the models of both GAN

grassking100 1 Dec 13, 2021
Self-describing JSON-RPC services made easy

ReflectRPC Self-describing JSON-RPC services made easy Contents What is ReflectRPC? Installation Features Datatypes Custom Datatypes Returning Errors

Andreas Heck 31 Jul 16, 2022
Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP Abstract: We introduce a method that allows to automatically se

Daniil Pakhomov 134 Dec 19, 2022
A generalist algorithm for cell and nucleus segmentation.

Cellpose | A generalist algorithm for cell and nucleus segmentation. Cellpose was written by Carsen Stringer and Marius Pachitariu. To learn about Cel

MouseLand 733 Dec 29, 2022