Learning from graph data using Keras

Overview

Steps to run =>

  • Download the Cora dataset from https://linqs.soe.ucsc.edu/data
  • Unzip the files into the folder input/cora
  • cd code
  • python eda.py
  • python word_features_only.py # for baseline model 53.28% accuracy
  • python graph_embedding.py # for model_1 73.06% accuracy
  • python graph_features_embedding.py # for model_2 76.35% accuracy

Learning from Graph data using Keras and Tensorflow

Cora Data set Citation Graph

Motivation :

A lot of real-world data can be represented as a graph, for example in citation networks, social networks (followers graph, friends network, …), biological networks or telecommunications.
Using features extracted from the graph can boost the performance of predictive models by relying on the flow of information between close nodes. However, representing graph data is not straightforward, especially if we don’t intend to implement hand-crafted features.
In this post we will explore some ways to deal with generic graphs in order to do node classification based on graph representations learned directly from data.

Dataset :

The Cora citation network data set serves as the basis for the implementations and experiments throughout this post. Each node represents a scientific paper, and an edge between two nodes represents a citation relation between the two papers.
Each node is described by a set of binary features (bag of words) as well as by a set of edges that link it to other nodes.
The dataset has 2708 nodes, each classified into one of seven classes, and the network has 5429 links. Each binary word feature indicates the presence of the corresponding word in the paper; overall there are 1433 binary (sparse) features for each node. In what follows we use only 140 samples for training and the rest for validation/test.
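
As a rough illustration of the node and edge representation described above, here is a minimal sketch of a parser. It assumes the tab-separated layout commonly distributed with Cora (a `cora.content` file with paper id, binary word flags and class label on each line, and a `cora.cites` file with one pair of paper ids per line) and it treats citations as undirected links. The function names `parse_content` and `parse_cites` are hypothetical and are not taken from the repository.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse_content(lines: Iterable[str]) -> Tuple[Dict[str, List[int]], Dict[str, str]]:
    """Parse lines of the assumed form: paper_id <tab> w1 <tab> ... <tab> wN <tab> label."""
    features: Dict[str, List[int]] = {}
    labels: Dict[str, str] = {}
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        paper_id, *words, label = parts
        features[paper_id] = [int(w) for w in words]  # binary bag-of-words flags
        labels[paper_id] = label
    return features, labels


def parse_cites(lines: Iterable[str], known_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Build an adjacency map from lines of the assumed form: paper_id <tab> paper_id.

    Assumption: the citation direction is ignored, so every link is stored both ways.
    Links that mention a paper without features are dropped.
    """
    adjacency: Dict[str, Set[str]] = {node: set() for node in known_ids}
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue
        first, second = parts
        if first in adjacency and second in adjacency and first != second:
            adjacency[first].add(second)
            adjacency[second].add(first)
    return adjacency


# Toy data in the assumed format (three papers, three word features each).
toy_content = [
    "1\t0\t1\t0\tGenetic_Algorithms",
    "2\t1\t1\t0\tNeural_Networks",
    "3\t0\t0\t1\tNeural_Networks",
]
toy_cites = ["1\t2", "2\t3"]

toy_features, toy_labels = parse_content(toy_content)
toy_adjacency = parse_cites(toy_cites, toy_features)
print(toy_features["2"], toy_labels["2"], sorted(toy_adjacency["2"]))
```

With the real files, the same functions could be called on open file handles, since they only need an iterable of lines.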

Problem Setting :

Problem : Assigning a class label to nodes in a graph while having few training samples.
Intuition/Hypothesis : Nodes that are close in the graph are more likely to have similar labels.
Solution : Find a way to extract features from the graph to help classify new nodes.

Proposed Approach :


Baseline Model :

Simple Baseline Model

We first experiment with the simplest model, which learns to predict node classes using only the binary features and discards all graph information.
This model is a fully-connected Neural Network that takes as input the binary features and outputs the class probabilities for each node.

Baseline model Accuracy : 53.28%

This is the initial accuracy that we will try to improve on by adding graph-based features.

Adding Graph features :

One way to automatically learn graph features is to embed each node into a vector by training a network on the auxiliary task of predicting the inverse of the shortest path length between two input nodes, as detailed in the figure and code snippet below :

Learning an embedding vector for each node
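
To make the auxiliary task concrete, the following is a minimal sketch of how the regression target for a pair of nodes could be computed with a breadth-first search. It is not the repository's code. The function names are hypothetical, the graph is assumed to be an unweighted adjacency map, and assigning a target of 0.0 to unreachable pairs is an assumption of this sketch.

```python
from collections import deque
from typing import Dict, Hashable, Iterable


def shortest_path_lengths(adjacency: Dict[Hashable, Iterable[Hashable]],
                          source: Hashable) -> Dict[Hashable, int]:
    """Breadth-first search: number of hops from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def inverse_distance_target(adjacency: Dict[Hashable, Iterable[Hashable]],
                            node_a: Hashable, node_b: Hashable) -> float:
    """Target for the auxiliary task: 1 / shortest path length between two nodes.

    Assumptions of this sketch: the two nodes are distinct, and a pair with no
    connecting path gets a target of 0.0.
    """
    if node_a == node_b:
        raise ValueError("the inverse distance is undefined for identical nodes")
    distances = shortest_path_lengths(adjacency, node_a)
    if node_b not in distances:
        return 0.0
    return 1.0 / distances[node_b]


# Toy graph: a - b - c, with d isolated.
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
print(inverse_distance_target(toy_graph, "a", "b"))
print(inverse_distance_target(toy_graph, "a", "c"))
print(inverse_distance_target(toy_graph, "a", "d"))
```

Directly linked nodes get the largest target (1.0), and the target shrinks as nodes get farther apart. An embedding network trained to predict this value is therefore pushed to place nearby nodes close together.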

The next step is to use the pre-trained node embeddings as input to the classification model. We also add an additional input: the average of the binary features of the neighboring nodes, where neighbors are determined by the distance between learned embedding vectors.
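
The neighbor-averaged input can be sketched as follows. This is an illustration under stated assumptions rather than the repository's implementation. The text does not specify the distance measure or the number of neighbors, so the sketch uses Euclidean distance and a hypothetical parameter `k`, and the function names are also hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def neighbor_feature_average(node: Hashable,
                             embeddings: Dict[Hashable, Sequence[float]],
                             features: Dict[Hashable, Sequence[int]],
                             k: int = 2) -> List[float]:
    """Average the binary features of the k nodes closest to `node` in embedding space.

    Assumptions of this sketch: Euclidean distance, a fixed k, and the node
    itself is excluded from its own neighborhood.
    """
    width = len(features[node])
    others = [n for n in embeddings if n != node]
    nearest = sorted(others, key=lambda n: euclidean(embeddings[node], embeddings[n]))[:k]
    if not nearest:
        return [0.0] * width  # no other node to average over
    return [sum(features[n][i] for n in nearest) / len(nearest) for i in range(width)]


# Toy example: b and c are close to a in embedding space, d is far away.
toy_embeddings = {"a": (0.0, 0.0), "b": (0.0, 0.5), "c": (1.0, 0.0), "d": (10.0, 10.0)}
toy_features = {"a": [0, 0], "b": [1, 0], "c": [1, 1], "d": [0, 0]}
print(neighbor_feature_average("a", toy_embeddings, toy_features, k=2))
```

The resulting vector has the same width as the binary features, so it can be fed to the classifier next to the node's own embedding.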

The resulting classification network is described in the following figure :

Using pretrained embeddings to do node classification

Graph embedding classification model Accuracy : 73.06%

We can see that adding learned graph features as input to the classification model significantly improves the classification accuracy over the baseline model, from **53.28% to 73.06%** 😄.

Improving Graph feature learning :

We can improve the previous model further by pushing the pre-training further: we also feed the binary features to the node embedding network, then reuse the pre-trained weights applied to the binary features in addition to the node embedding vector. This results in a model that relies on more useful representations of the binary features, learned from the graph structure.

Improved Graph embedding classification model Accuracy : 76.35%

This additional improvement adds about 3.3 percentage points of accuracy over the previous approach (73.06% to 76.35%).

Conclusion :

In this post we saw that we can learn useful representations from graph-structured data and then use these representations to improve the generalization performance of a node classification model from **53.28% to 76.35%** 😎.

Code to reproduce the results is available here : https://github.com/CVxTz/graph_classification

Owner
Mansar Youness