[NeurIPS 2021] Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Overview

Derek Lim*, Felix Hohne*, Xiuyu Li*, Sijia Linda Huang, Vaishnavi Gupta, Omkar Bhalerao, Ser-Nam Lim

Published at NeurIPS 2021

This repository contains code to load our proposed datasets, compute our measure of homophily, and train various graph machine learning models in our experimental setup. It also includes an implementation of LINKX, the new graph neural network that we develop.

Organization

main.py contains the main full-batch experimental script.

main_scalable.py contains the minibatching experimental script.

parse.py contains flags for running models with specific settings and hyperparameters.

dataset.py loads our datasets.

models.py contains implementations of graph machine learning models, including our LINKX model; C&S is implemented separately in correct_smooth.py and cs_tune_hparams.py. Running several of the GNN models on the larger datasets may require at least 24GB of VRAM.

homophily.py contains functions for computing homophily measures, including the measure we introduce (our_measure).

experiments/ contains the bash files to reproduce full-batch experiments.

scalable_experiments/ contains the bash files to reproduce minibatching experiments.

wiki_scraping/ contains the Python scripts to reproduce the "wiki" dataset by querying the Wikipedia API and cleaning up the data.
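homophily.py is the reference for the measures themselves. As a rough, dependency-free illustration, the sketch below computes plain edge homophily (the fraction of edges joining same-label nodes) and a class-insensitive measure, assumed here to be h_hat = 1/(C-1) * sum over classes k of max(0, h_k - |C_k|/n), where h_k is the fraction of edges incident to class-k nodes that lead to another class-k node. The function names are hypothetical, edges are given as (u, v) pairs rather than an edge_index tensor, C is the number of labels actually present, and this is not the code in our_measure.

```python
from collections import Counter


def edge_homophily(edges, labels):
    """Fraction of edges (u, v) whose endpoints share a label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def class_insensitive_homophily(edges, labels):
    """Sketch of 1/(C-1) * sum_k max(0, h_k - |C_k| / n).

    Each edge (u, v) is counted once from each endpoint, so listing an
    undirected edge in both directions doubles every count and leaves h_k
    unchanged.
    """
    n = len(labels)
    class_sizes = Counter(labels)
    num_classes = len(class_sizes)
    if num_classes < 2:
        raise ValueError("need at least two classes")
    same_deg = Counter()   # endpoints of class k whose neighbor is also class k
    total_deg = Counter()  # all edge endpoints belonging to class k
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            total_deg[labels[a]] += 1
            if labels[a] == labels[b]:
                same_deg[labels[a]] += 1
    score = 0.0
    for k, size in class_sizes.items():
        if total_deg[k] == 0:
            continue  # class with no edges: h_k is undefined, so it is skipped
        h_k = same_deg[k] / total_deg[k]
        # Only the excess of h_k over the class's share of nodes counts.
        score += max(0.0, h_k - size / n)
    return score / (num_classes - 1)
```

For instance, with labels [0, 0, 1] and edges [(0, 1), (1, 2)], half of the edges are homophilous, yet no class has an h_k above its share of the nodes, so the class-insensitive score is 0. This is the kind of gap between the two measures that the class-share correction is meant to expose.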

Datasets


As discussed in the paper, our proposed datasets are "genius", "twitch-gamer", "fb100", "pokec", "wiki", "arxiv-year", and "snap-patents", each of which can be loaded by passing its string name to load_nc_dataset in dataset.py. Many of these datasets are included in the data/ directory, but wiki, twitch-gamer, snap-patents, and pokec are automatically downloaded from a Google Drive link when loaded from dataset.py, and arxiv-year is downloaded using the OGB downloaders. load_nc_dataset returns an NCDataset, whose documentation is also provided in dataset.py. It is functionally equivalent to OGB's Library-Agnostic Loader for Node Property Prediction, except that it returns torch tensors; see the OGB website for more detailed documentation. Just like the OGB loader, dataset.get_idx_split() returns a fixed dataset split for training, validation, and testing.
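To make the interface described above concrete, here is a hypothetical, torch-free stand-in written with plain Python lists. It only mimics the OGB library-agnostic conventions: dataset[0] returns (graph, label), graph is a dict with edge_index, edge_feat, node_feat, and num_nodes, and get_idx_split() returns a dict with train, valid, and test indices. The class name ToyNCDataset and the seeded 50/25/25 split are assumptions for illustration; the real NCDataset in dataset.py returns torch tensors, and its splits may be constructed differently.

```python
import random


class ToyNCDataset:
    """Hypothetical, torch-free stand-in for the NCDataset interface."""

    def __init__(self, name, edge_index, node_feat, label, seed=0):
        self.name = name
        num_nodes = len(label)
        # OGB library-agnostic convention: the graph is a plain dict.
        self.graph = {
            "edge_index": edge_index,  # [sources, targets], each of length num_edges
            "edge_feat": None,
            "node_feat": node_feat,    # one feature list per node
            "num_nodes": num_nodes,
        }
        self.label = label
        # Assumption: a single seeded 50/25/25 split, fixed at construction.
        order = list(range(num_nodes))
        random.Random(seed).shuffle(order)
        n_train, n_valid = num_nodes // 2, num_nodes // 4
        self._split = {
            "train": sorted(order[:n_train]),
            "valid": sorted(order[n_train:n_train + n_valid]),
            "test": sorted(order[n_train + n_valid:]),
        }

    def __len__(self):
        return 1  # one graph per dataset

    def __getitem__(self, idx):
        if idx != 0:
            raise IndexError("this dataset holds a single graph")
        return self.graph, self.label

    def get_idx_split(self):
        # The same split is returned on every call, so runs are comparable.
        return self._split
```

Typical use mirrors the OGB pattern: graph, label = dataset[0], then split = dataset.get_idx_split() and, for example, train_labels = [label[i] for i in split["train"]].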

When a dataset consists of multiple graphs (as fb100 does), a specific one can be loaded by passing the sub_dataname argument to load_nc_dataset in dataset.py. fb100 consists of 100 graphs; this repo includes only ["Amherst41", "Cornell5", "Johns Hopkins55", "Penn94", "Reed98"], although the others may be downloaded from the Internet Archive. In the paper we test on Penn94.

References

The datasets come from a variety of sources, as listed here:

  • Penn94. Traud et al. 2012. Social Structure of Facebook Networks
  • pokec. Leskovec et al. Stanford Network Analysis Project
  • arxiv-year. Hu et al. 2020. Open Graph Benchmark
  • snap-patents. Leskovec et al. Stanford Network Analysis Project
  • genius. Lim and Benson 2020. Expertise and Dynamics within Crowdsourced Musical Knowledge Curation: A Case Study of the Genius Platform
  • twitch-gamer. Rozemberczki and Sarkar 2021. Twitch Gamers: a Dataset for Evaluating Proximity Preserving and Structural Role-based Node Embeddings
  • wiki. Collected by the authors of this work in 2021.

Installation instructions

  1. Create a new conda environment using python=3.8 (e.g. conda create --name non-hom python=3.8).
  2. Activate your conda environment (e.g. conda activate non-hom).
  3. Check your CUDA version using nvidia-smi.
  4. Run bash install.sh cu110, replacing cu110 with the tag for your CUDA version (CUDA 11 -> cu110, CUDA 10.2 -> cu102, CUDA 10.1 -> cu101). We tested on Ubuntu 18.04 with CUDA 11.0.
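Steps 3 and 4 can be combined. The sketch below reads the CUDA version from nvidia-smi output and maps it to the install.sh argument using only the mapping listed above (CUDA 11 -> cu110, CUDA 10.2 -> cu102, CUDA 10.1 -> cu101). The function name is hypothetical; it follows that mapping literally by treating every 11.x release as cu110, and it rejects any other version rather than guessing. Note that the "CUDA Version" field printed by nvidia-smi is the highest version the driver supports, which can differ from the toolkit actually installed.

```python
import re

# Mapping taken from the installation steps; no other versions are covered.
CUDA_TAGS = {"11": "cu110", "10.2": "cu102", "10.1": "cu101"}


def cuda_tag_from_smi(smi_output: str) -> str:
    """Return the install.sh tag for the CUDA version in nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found")
    major, minor = match.groups()
    # "CUDA 11 -> cu110": any 11.x maps to the same tag.
    key = major if major == "11" else f"{major}.{minor}"
    if key not in CUDA_TAGS:
        raise ValueError(f"CUDA {major}.{minor} is not in the documented mapping")
    return CUDA_TAGS[key]
```

For example, pass the stdout of subprocess.run(["nvidia-smi"], capture_output=True, text=True) to cuda_tag_from_smi and then run bash install.sh with the returned tag.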

Running experiments

  1. Make sure a results folder exists in the root directory.
  2. Our experiments are in the experiments/ and scalable_experiments/ directories, which contain bash scripts for running methods on single and multiple datasets. The experiments must be run from the root directory (e.g. bash experiments/mixhop_exp.sh snap-patents). For instance, to run the MixHop experiments on arxiv-year, use:
bash experiments/mixhop_exp.sh arxiv-year

To run LINKX on pokec, use:

bash experiments/linkx_exp.sh pokec

To run LINK on Penn94, use:

bash experiments/link_exp.sh fb100 Penn94

To run GCN-cluster on twitch-gamer, use:

bash scalable_experiments/gcn_cluster.sh twitch-gamer

To run minibatched LINKX on wiki, use:

bash scalable_experiments/linkx_exp.sh wiki

To run LINKX with the full hyperparameter grid on chameleon (one of the Geom-GCN datasets), use:

bash experiments/linkx_tuning.sh chameleon
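If you prefer to launch these scripts from Python, the following sketch builds the same invocations as the examples above, makes sure the results folder exists, and runs the script with the repository root as its working directory. The function names experiment_command and run_experiment are hypothetical; they only assemble and run the command line and do not check which script and dataset combinations are valid.

```python
import os
import subprocess


def experiment_command(script, dataset, sub_dataname=None, root="."):
    """Return the argv for one experiment script, e.g. experiments/linkx_exp.sh.

    `script` is a path relative to the repository root; `sub_dataname`
    selects one graph of a multi-graph dataset such as fb100 (e.g. "Penn94").
    """
    # The experiment scripts expect a results/ folder in the root directory.
    os.makedirs(os.path.join(root, "results"), exist_ok=True)
    argv = ["bash", script, dataset]
    if sub_dataname is not None:
        argv.append(sub_dataname)
    return argv


def run_experiment(script, dataset, sub_dataname=None, root="."):
    """Run one experiment script; raises CalledProcessError on failure."""
    argv = experiment_command(script, dataset, sub_dataname, root)
    # cwd=root because the scripts must be run from the repository root.
    return subprocess.run(argv, cwd=root, check=True)
```

For example, run_experiment("experiments/link_exp.sh", "fb100", "Penn94", root="path/to/repo") corresponds to running bash experiments/link_exp.sh fb100 Penn94 from the repository root.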

Owner: Cornell University Artificial Intelligence