Transformers are Graph Neural Networks!

Overview

🚀 Gated Graph Transformers

Gated Graph Transformers for graph-level property prediction, i.e. graph classification and regression.

Associated article: Transformers are Graph Neural Networks, by Chaitanya K. Joshi, published with The Gradient.

This repository is a continuously updated personal project to build intuitions about and track progress in Graph Representation Learning research. I aim to develop the most universal and powerful model which unifies state-of-the-art architectures from Graph Neural Networks and Transformers, without incorporating domain-specific tricks.

Gated Graph Transformer

Key Architectural Ideas

🤖 Deep, Residual Transformer Backbone

  • As the backbone architecture, I borrow the two-sub-layered, pre-normalization variant of Transformer encoders that has emerged as the standard in the NLP community, e.g. GPT-3. Each Transformer block consists of a message-passing sub-layer followed by a node-wise feedforward sub-layer. The graph convolution is described later.
  • The feedforward sub-layer projects node embeddings to an absurdly large dimension, passes them through a non-linear activation function, does dropout, and reduces back to the original embedding dimension.
  • The Transformer backbone enables training very deep and extremely overparameterized models. Overparameterization is important for performance in NLP and other combinatorially large domains, but was previously not possible for GNNs trained on small graph classifcation datasets. Coupled with unique node positional encodings (described later) and the feedforward sub-layer, overparameterization ensures that our GNN is Turing Universal (based on A. Loukas's recent insightful work, including this paper).

✉️ Anisotropic Graph Convolutions


Source: 'Deep Parametric Continuous Convolutional Neural Networks', Wang et al., 2018

  • As the graph convolution layer, I use the Gated Graph Convolution with dense attention mechanism, which we found to be the best performing graph convolution in Benchmarking GNNs. Intuitively, Gated GraphConv generalizes directional CNN filters for 2D images to arbitrary graphs by learning a weighted aggregations over the local neighbors of each node. It upgrades the node-to-node attention mechanism from GATs and MoNet (i.e. one attention weight per node pair) to consider dense feature-to-feature attention (i.e. d attention weights for pairs of d-dimensional node embeddings).
  • Another intuitive motivation for the Gated GraphConv is as a learnable directional diffusion process over the graph, or as a coupled PDE over node and edge features in the graph. Gated GraphConv makes the diffusion process/neighborhood aggregation anisotropic or directional, countering oversmoothing/oversquashing of features and enabling deeper models.
  • This graph convolution was originally proposed as a sentence encoder for NLP and further developed at NTU for molecule generation and combinatorial optimization. Evidently, I am partial to this idea. At the same time, it is worth noting that anisotropic local aggregations and generalizations of directed CNN filters have demonstrated strong performance across a myriad of applications, including 3D point clouds, drug discovery, material science, and programming languages.

🔄 Graph Positional Encodings


Source: 'Geometric Deep Learning: Going beyond Euclidean Data', Bronstein et al., 2017

  • I use the top-k non-trivial Laplacian Eigenvectors as unique node identifiers to inject structural/positional priors into the Transformer backbone. Laplacian Eigenvectors are a generalization of sinusoidal positional encodings from the original Transformers, and were concurrently proposed in the Benchmarking GNNs, EigenGNNs, and GCC papers.
  • Randomly flipping the sign of Laplacian Eigenvectors during training (due to symmetry) can be seen as an additional data augmentation or regularization technique, helping delay overfitting to training patterns. Going further, the Directional Graph Networks paper presents a more principled approach for using Laplacian Eigenvectors.

Some ideas still in the pipeline include:

  • Graph-specific Normalization - Originally motivated in Benchmarking GNNs as 'graph size normalization', there have been several subsequent graph-specific normalization techniques such as GraphNorm and MessageNorm, aiming to replace or augment standard Batch Normalization. Intuitively, there is room for improvement as BatchNorm flattens mini-batches of graphs instead of accounting for the underlying graph structure.

  • Theoretically Expressive Aggregation - There are several exciting ideas aiming to bridge the gap between theoretical expressive power, computational feasability, and generalization capacity for GNNs: PNA-style multi-head aggregation and scaling, generalized aggreagators from DeeperGCNs, pre-computing structural motifs as in GSN, etc.

  • Virtual Node and Low Rank Global Attention - After the message-passing step, the virtual node trick adds messages to-and-fro a virtual/super node connected to all graph nodes. LRGA comes with additional theretical motivations but does something similar. Intuitively, these techniques enable modelling long range or latent interactions in graphs and counter the oversquashing problem with deeper networks.

  • General Purpose Pre-training - It isn't truly a Transformer unless its pre-trained on hundreds of GPUs for thousands of hours...but general purpose pre-training for graph representation learning remains an open question!

Installation and Usage

# Create new Anaconda environment
conda create -n new-env python=3.7
conda activate new-env
# Install PyTorch 1.6 for CUDA 10.x
conda install pytorch=1.6 cudatoolkit=10.x -c pytorch
# Install DGL for CUDA 10.x
conda install -c dglteam dgl-cuda10.x
# Install other dependencies
conda install tqdm scikit-learn pandas urllib3 tensorboard
pip install -U ogb

# Train GNNs on ogbg-mol* datasets
python main_mol.py --dataset [ogbg-molhiv/ogbg-molpcba] --gnn [gated-gcn/gcn/mlp]

# Prepare submission for OGB leaderboards
bash scripts/ogbg-mol*.sh

# Collate results for submission
python submit.py --dataset [ogbg-molhiv/ogbg-molpcba] --expt [path-to-logs]

Note: The code was tested on Ubuntu 16.04, using Python 3.6, PyTorch 1.6 and CUDA 10.1.

Citation

@article{joshi2020transformers,
  author = {Joshi, Chaitanya K},
  title = {Transformers are Graph Neural Networks},
  journal = {The Gradient},
  year = {2020},
  howpublished = {\url{https://thegradient.pub/transformers-are-gaph-neural-networks/ } },
}
Owner
Chaitanya Joshi
Research Engineer at A*STAR, working on Graph Neural Networks
Chaitanya Joshi
A computational block to solve entity alignment over textual attributes in a knowledge graph creation pipeline.

How to apply? Create your config.ini file following the example provided in config.ini Choose one of the options below to run: Run with Python3 pip in

Scientific Data Management Group 3 Jun 23, 2022
CvT2DistilGPT2 is an encoder-to-decoder model that was developed for chest X-ray report generation.

CvT2DistilGPT2 Improving Chest X-Ray Report Generation by Leveraging Warm-Starting This repository houses the implementation of CvT2DistilGPT2 from [1

The Australian e-Health Research Centre 21 Dec 28, 2022
Physics-informed convolutional-recurrent neural networks for solving spatiotemporal PDEs

PhyCRNet Physics-informed convolutional-recurrent neural networks for solving spatiotemporal PDEs Paper link: [ArXiv] By: Pu Ren, Chengping Rao, Yang

Pu Ren 11 Aug 23, 2022
NER for Indian languages

CL-NERIL: A Cross-Lingual Model for NER in Indian Languages Code for the paper - https://arxiv.org/abs/2111.11815 Setup Setup a virtual environment Th

Akshara P 0 Nov 24, 2021
Data, notebooks, and articles associated with the RSNA AI Deep Learning Lab at RSNA 2021

RSNA AI Deep Learning Lab 2021 Intro Welcome Deep Learners! This document provides all the information you need to participate in the RSNA AI Deep Lea

RSNA 65 Dec 16, 2022
Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation"

EgoNet Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation". This repo inclu

Shichao Li 138 Dec 09, 2022
Multi-Template Mouse Brain MRI Atlas (MBMA): both in-vivo and ex-vivo

Multi-template MRI mouse brain atlas (both in vivo and ex vivo) Mouse Brain MRI atlas (both in-vivo and ex-vivo) (repository relocated from the origin

8 Nov 18, 2022
Bachelor's Thesis in Computer Science: Privacy-Preserving Federated Learning Applied to Decentralized Data

federated is the source code for the Bachelor's Thesis Privacy-Preserving Federated Learning Applied to Decentralized Data (Spring 2021, NTNU) Federat

Dilawar Mahmood 25 Nov 30, 2022
Conversational text Analysis using various NLP techniques

PyConverse Let me try first Installation pip install pyconverse Usage Please try this notebook that demos the core functionalities: basic usage noteb

Rita Anjana 158 Dec 25, 2022
Official code for MPG2: Multi-attribute Pizza Generator: Cross-domain Attribute Control with Conditional StyleGAN

This is the official code for Multi-attribute Pizza Generator (MPG2): Cross-domain Attribute Control with Conditional StyleGAN. Paper Demo Setup Envir

Fangda Han 5 Sep 01, 2022
Official PyTorch implementation of PS-KD

Self-Knowledge Distillation with Progressive Refinement of Targets (PS-KD) Accepted at ICCV 2021, oral presentation Official PyTorch implementation of

61 Dec 28, 2022
Official code for 'Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning' [ICCV 2021]

RTFM This repo contains the Pytorch implementation of our paper: Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Lear

Yu Tian 242 Jan 08, 2023
The mini-AlphaStar (mini-AS, or mAS) - mini-scale version (non-official) of the AlphaStar (AS)

A mini-scale reproduction code of the AlphaStar program. Note: the original AlphaStar is the AI proposed by DeepMind to play StarCraft II.

Ruo-Ze Liu 216 Jan 04, 2023
Implementation of EMNLP 2017 Paper "Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog" using PyTorch and ParlAI

Language Emergence in Multi Agent Dialog Code for the Paper Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog Satwik Kottur, José M.

Karan Desai 105 Nov 25, 2022
[CVPR'21] FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space by Quande Liu, Cheng Chen, Ji

Quande Liu 178 Jan 06, 2023
STBP is a way to train SNN with datasets by Backward propagation.

Spiking neural network (SNN), compared with depth neural network (DNN), has faster processing speed, lower energy consumption and more biological interpretability, which is expected to approach Stron

Ling Zhang 18 Dec 09, 2022
Pytorch Implementation of rpautrat/SuperPoint

SuperPoint-Pytorch (A Pure Pytorch Implementation) SuperPoint: Self-Supervised Interest Point Detection and Description Thanks This work is based on:

76 Dec 27, 2022
A baseline code for VSPW

A baseline code for VSPW Preparation Download VSPW dataset The VSPW dataset with extracted frames and masks is available here.

28 Aug 22, 2022
git《Pseudo-ISP: Learning Pseudo In-camera Signal Processing Pipeline from A Color Image Denoiser》(2021) GitHub: [fig5]

Pseudo-ISP: Learning Pseudo In-camera Signal Processing Pipeline from A Color Image Denoiser Abstract The success of deep denoisers on real-world colo

Yue Cao 51 Nov 22, 2022
This repository contains pre-trained models and some evaluation code for our paper Towards Unsupervised Dense Information Retrieval with Contrastive Learning

Contriever: Towards Unsupervised Dense Information Retrieval with Contrastive Learning This repository contains pre-trained models and some evaluation

Meta Research 207 Jan 08, 2023