Edge-Augmented Graph Transformer

Overview

PWCPWCPWCPWCPWC

Edge-augmented Graph Transformer

Introduction

This is the official implementation of the Edge-augmented Graph Transformer (EGT) as described in https://arxiv.org/abs/2108.03348, which augments the Transformer architecture with residual edge channels. The resultant architecture can directly process graph-structured data and acheives good results on supervised graph-learning tasks as presented by Dwivedi et al.. It also achieves good performance on the large-scale PCQM4M-LSC (0.1263 MAE on val) dataset. EGT beats convolutional/message-passing graph neural networks on a wide range of supervised tasks and thus demonstrates that convolutional aggregation is not an essential inductive bias for graphs.

Requirements

  • python >= 3.7
  • tensorflow >= 2.1.0
  • h5py >= 2.8.0
  • numpy >= 1.18.4
  • scikit-learn >= 0.22.1

Download the Datasets

For our experiments, we converted the datasets to HDF5 format for the convenience of using them without any specific library. Only the h5py library is required. The datasets can be downloaded from -

Or you can simply run the provided bash scripts download_medium_scale_datasets.sh, download_large_scale_datasets.sh. The default location of the datasets is the datasets directory.

Run Training and Evaluations

You must create a JSON config file containing the configuration of a model, its training and evaluation configs (configurations). The same config file is used to do both training and evaluations.

  • To run training: python run_training.py <config_file.json>
  • To end training (prematurely): python end_training.py <config_file.json>
  • To perform evaluations: python do_evaluations.py <config_file.json>

Config files for the main results presented in the paper are contained in the configs/main directory, whereas configurations for the ablation study are contained in the configs/ablation directory. The paths and names of the files are self-explanatory.

More About Training and Evaluations

Once the training is started a model folder will be created in the models directory, under the specified dataset name. This folder will contain a copy of the input config file, for the convenience of resuming training/evaluation. Also, it will contain a config.json which will contain all configs, including unspecified default values, used for the training. Training will be checkpointed per epoch. In case of any interruption you can resume training by running the run_training.py with the config.json file again.

In case you wish to finalize training midway, just stop training and run end_training.py script with the config.json file to save the model weights.

After training, you can run the do_evaluations.py script with the same config file to perform evaluations. Alongside being printed to stdout, results will be saved in the predictions directory, under the model directory.

Config File

The config file can contain many different configurations, however, the only required configuration is scheme, which specifies the training scheme. If the other configurations are not specified, a default value will be assumed for them. Here are some of the commonly used configurations:

scheme: Used to specify the training scheme. It has a format <dataset_name>.<positional_encoding>. For example: cifar10.svd or zinc.eig. If no encoding is to be used it can be something like pcqm4m.mat. For a full list you can explore the lib/training/schemes directory.

dataset_path: If the datasets are contained in the default location in the datasets directory, this config need not be specified. Otherwise you have to point it towards the <dataset_name>.h5 file.

model_name: Serves as an identifier for the model, also specifies default path of the model directory, weight files etc.

save_path: The training process will create a model directory containing the logs, checkpoints, configs, model summary and predictions/evaluations. By default it creates a folder at models/<dataset_name>/<model_name> but it can be changed via this config.

cache_dir: During first time of training/evaluation the data will be cached to a tensorflow cache format. Default path is data_cache/<dataset_name>/<positional_encoding>. But it can be changed via this config.

distributed: In a multi-gpu setting you can set it to True, for distributed training.

batch_size: Batch size.

num_epochs: Maximum Number of epochs.

initial_lr: Initial learning rate. In case of warmup it is the maximum learning rate.

rlr_factor: Reduce LR on plateau factor. Setting it to a value >= 1.0 turns off Reduce LR.

rlr_patience: Reduce LR patience, i.e. the number of epochs after which LR is reduced if validation loss doesn't improve.

min_lr_factor: The factor by which the minimum LR is smaller, of the initial LR. Default is 0.01.

model_height: The number of layers L.

model_width: The dimensionality of the node channels d_h.

edge_width: The dimensionality of the edge channels d_e.

num_heads: The number of attention heads. Default is 8.

ffn_multiplier: FFN multiplier for both channels. Default is 2.0 .

virtual_nodes: number of virtual nodes. 0 (default) would result in global average pooling being used instead of virtual nodes.

upto_hop: Clipping value of the input distance matrix. A value of 1 (default) would result in adjacency matrix being used as input structural matrix.

mlp_layers: Dimensionality of the final MLP layers, specified as a list of factors with respect to d_h. Default is [0.5, 0.25].

gate_attention: Set this to False to get the ungated EGT variant (EGT-U).

dropout: Dropout rate for both channels. Default is 0.

edge_dropout: If specified, applies a different dropout rate to the edge channels.

edge_channel_type: Used to create ablated variants of EGT. A value of "residual" (default) implies pure/full EGT. "constrained" implies EGT-constrained. "bias" implies EGT-simple.

warmup_steps: If specified, performs a linear learning rate warmup for the specified number of gradient update steps.

total_steps: If specified, performs a cosine annealing after warmup, so that the model is trained for the specified number of steps.

[For SVD-based encodings]:

use_svd: Turning this off (False) would result in no positional encoding being used.

sel_svd_features: Rank of the SVD encodings r.

random_neg: Augment SVD encodings by random negation.

[For Eigenvectors encodings]:

use_eig: Turning this off (False) would result in no positional encoding being used.

sel_eig_features: Number of eigen vectors.

[For Distance prediction Objective (DO)]:

distance_target: Predict distance up to the specified hop, nu.

distance_loss: Factor by which to multiply the distance prediction loss, kappa.

Creation of the HDF5 Datasets from Scratch

We included two Jupyter notebooks to demonstrate how the HDF5 datasets are created

  • For the medium scale datasets view create_hdf_benchmarking_datasets.ipynb. You will need pytorch, ogb==1.1.1 and dgl==0.4.2 libraries to run the notebook. The notebook is also runnable on Google Colaboratory.
  • For the large scale pcqm4m dataset view create_hdf_pcqm4m.ipynb. You will need pytorch, ogb>=1.3.0 and rdkit>=2019.03.1 to run the notebook.

Python Environment

The Anaconda environment in which our experiments were conducted is specified in the environment.yml file.

Citation

Please cite the following paper if you find the code useful:

@article{hussain2021edge,
  title={Edge-augmented Graph Transformers: Global Self-attention is Enough for Graphs},
  author={Hussain, Md Shamim and Zaki, Mohammed J and Subramanian, Dharmashankar},
  journal={arXiv preprint arXiv:2108.03348},
  year={2021}
}
Owner
Md Shamim Hussain
Md Shamim Hussain is a Ph.D. student in Computer Science at Rensselaer Polytechnic Institute, NY. He got his B.Sc. and M.Sc. in EEE from BUET, Dhaka.
Md Shamim Hussain
Natural Language Processing Specialization

Natural Language Processing Specialization In this folder, Natural Language Processing Specialization projects and notes can be found. WHAT I LEARNED

Kaan BOKE 3 Oct 06, 2022
SEJE is a prototype for the paper Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering.

SEJE is a prototype for the paper Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering. Contents Inst

0 Oct 21, 2021
A number of methods in order to perform Natural Language Processing on live data derived from Twitter

A number of methods in order to perform Natural Language Processing on live data derived from Twitter

1 Nov 24, 2021
Clone a voice in 5 seconds to generate arbitrary speech in real-time

This repository is forked from Real-Time-Voice-Cloning which only support English. English | 中文 Features 🌍 Chinese supported mandarin and tested with

Weijia Chen 25.6k Jan 06, 2023
✨Fast Coreference Resolution in spaCy with Neural Networks

✨ NeuralCoref 4.0: Coreference Resolution in spaCy with Neural Networks. NeuralCoref is a pipeline extension for spaCy 2.1+ which annotates and resolv

Hugging Face 2.6k Jan 04, 2023
CDLA: A Chinese document layout analysis (CDLA) dataset

CDLA: A Chinese document layout analysis (CDLA) dataset 介绍 CDLA是一个中文文档版面分析数据集,面向中文文献类(论文)场景。包含以下10个label: 正文 标题 图片 图片标题 表格 表格标题 页眉 页脚 注释 公式 Text Title

buptlihang 84 Dec 28, 2022
A fast and lightweight python-based CTC beam search decoder for speech recognition.

pyctcdecode A fast and feature-rich CTC beam search decoder for speech recognition written in Python, providing n-gram (kenlm) language model support

Kensho 315 Dec 21, 2022
A PyTorch implementation of the Transformer model in "Attention is All You Need".

Attention is all you need: A Pytorch Implementation This is a PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish V

Yu-Hsiang Huang 7.1k Jan 05, 2023
Unsupervised text tokenizer for Neural Network-based text generation.

SentencePiece SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabu

Google 6.4k Jan 01, 2023
This repository contains (not all) code from my project on Named Entity Recognition in philosophical text

NERphilosophy 👋 Welcome to the github repository of my BsC thesis. This repository contains (not all) code from my project on Named Entity Recognitio

Ruben 1 Jan 27, 2022
Implementation of TF-IDF algorithm to find documents similarity with cosine similarity

NLP learning Trying to learn NLP to use in my projects! Table of Contents About The Project Built With Getting Started Requirements Run Usage License

Faraz Farangizadeh 3 Aug 25, 2022
Materials (slides, code, assignments) for the NYU class I teach on NLP and ML Systems (Master of Engineering).

FREE_7773 Repo containing material for the NYU class (Master of Engineering) I teach on NLP, ML Sys etc. For context on what the class is trying to ac

Jacopo Tagliabue 90 Dec 19, 2022
Search msDS-AllowedToActOnBehalfOfOtherIdentity

前言 现在进行RBCD的攻击手段主要是搜索mS-DS-CreatorSID,如果机器的创建者是我们可控的话,那就可以修改对应机器的msDS-AllowedToActOnBehalfOfOtherIdentity,利用工具SharpAllowedToAct-Modify 那我们索性也试试搜索所有计算机

Jumbo 26 Dec 05, 2022
Code for "Generating Disentangled Arguments with Prompts: a Simple Event Extraction Framework that Works"

GDAP The code of paper "Code for "Generating Disentangled Arguments with Prompts: a Simple Event Extraction Framework that Works"" Event Datasets Prep

45 Oct 29, 2022
precise iris segmentation

PI-DECODER Introduction PI-DECODER, a decoder structure designed for Precise Iris Segmentation and Location. The decoder structure is shown below: Ple

8 Aug 08, 2022
An assignment from my grad-level data mining course demonstrating some experience with NLP/neural networks/Pytorch

NLP-Pytorch-Assignment An assignment from my grad-level data mining course (before I started personal projects) demonstrating some experience with NLP

David Thorne 0 Feb 06, 2022
BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions

BERTopic BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable

Maarten Grootendorst 3.6k Jan 07, 2023
Guide: Finetune GPT2-XL (1.5 Billion Parameters) and GPT-NEO (2.7 B) on a single 16 GB VRAM V100 Google Cloud instance with Huggingface Transformers using DeepSpeed

Guide: Finetune GPT2-XL (1.5 Billion Parameters) and GPT-NEO (2.7 Billion Parameters) on a single 16 GB VRAM V100 Google Cloud instance with Huggingfa

289 Jan 06, 2023
Python SDK for working with Voicegain Speech-to-Text

Voicegain Speech-to-Text Python SDK Python SDK for the Voicegain Speech-to-Text API. This API allows for large vocabulary speech-to-text transcription

Voicegain 3 Dec 14, 2022
Create a semantic search engine with a neural network (i.e. BERT) whose knowledge base can be updated

Create a semantic search engine with a neural network (i.e. BERT) whose knowledge base can be updated. This engine can later be used for downstream tasks in NLP such as Q&A, summarization, generation

Diego 1 Mar 20, 2022