A PyTorch implementation of "Signed Graph Convolutional Network" (ICDM 2018).

Overview

SGCN

Arxiv repo size codebeat badgebenedekrozemberczki

A PyTorch implementation of Signed Graph Convolutional Network (ICDM 2018).

Abstract

Due to the fact much of today's data can be represented as graphs, there has been a demand for generalizing neural network models for graph data. One recent direction that has shown fruitful results, and therefore growing interest, is the usage of graph convolutional neural networks (GCNs). They have been shown to provide a significant improvement on a wide range of tasks in network analysis, one of which being node representation learning. The task of learning low-dimensional node representations has shown to increase performance on a plethora of other tasks from link prediction and node classification, to community detection and visualization. Simultaneously, signed networks (or graphs having both positive and negative links) have become ubiquitous with the growing popularity of social media. However, since previous GCN models have primarily focused on unsigned networks (or graphs consisting of only positive links), it is unclear how they could be applied to signed networks due to the challenges presented by negative links. The primary challenges are based on negative links having not only a different semantic meaning as compared to positive links, but their principles are inherently different and they form complex relations with positive links. Therefore we propose a dedicated and principled effort that utilizes balance theory to correctly aggregate and propagate the information across layers of a signed GCN model. We perform empirical experiments comparing our proposed signed GCN against state-of-the-art baselines for learning node representations in signed networks. More specifically, our experiments are performed on four real-world datasets for the classical link sign prediction problem that is commonly used as the benchmark for signed network embeddings algorithms.

This repository provides an implementation for SGCN as described in the paper:

Signed Graph Convolutional Network. Tyler Derr, Yao Ma, and Jiliang Tang ICDM, 2018. [Paper]

The original implementation is available [here] and SGCN is also available in [PyTorch Geometric].

Requirements

The codebase is implemented in Python 3.5.2. package versions used for development are just below.

networkx           2.4
tqdm               4.28.1
numpy              1.15.4
pandas             0.23.4
texttable          1.5.0
scipy              1.1.0
argparse           1.1.0
sklearn            0.20.0
torch             1.1.0
torch-scatter     1.4.0
torch-sparse      0.4.3
torch-cluster     1.4.5
torch-geometric   1.3.2
torchvision       0.3.0

Datasets

The code takes an input graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. Sample graphs for the `Bitcoin Alpha` and `Bitcoin OTC` graphs are included in the `input/` directory. The structure of the edge dataset is the following:

NODE ID 1 NODE ID 2 Sign
0 3 -1
1 1 1
2 2 1
3 1 -1
... ... ...
n 9 -1

An attributed dataset for an `Erdos-Renyi` graph is also included in the input folder. **The node feature dataset rows are sorted by ID increasing**. The structure of the features csv has to be the following:

Feature 1 Feature 2 Feature 3 ... Feature d
3 0 1.37 ... 1
1 1 2.54 ... -11
2 0 1.08 ... -12
1 1 1.22 ... -4
. ... ... ... ... ...
5 0 2.47 ... 21

Options

Learning an embedding is handled by the `src/main.py` script which provides the following command line arguments.

Input and output options

  --edge-path                STR    Input graph path.          Default is `input/bitcoin_otc.csv`.
  --features-path            STR    Features path.             Default is `input/bitcoin_otc.csv`.
  --embedding-path           STR    Embedding path.            Default is `output/embedding/bitcoin_otc_sgcn.csv`.
  --regression-weights-path  STR    Regression weights path.   Default is `output/weights/bitcoin_otc_sgcn.csv`.
  --log-path                 STR    Log path.                  Default is `logs/bitcoin_otc_logs.json`.  

Model options

  --epochs                INT         Number of SGCN training epochs.      Default is 100. 
  --reduction-iterations  INT         Number of SVD epochs.                Default is 128.
  --reduction-dimensions  INT         SVD dimensions.                      Default is 30.
  --seed                  INT         Random seed value.                   Default is 42.
  --lamb                  FLOAT       Embedding regularization parameter.  Default is 1.0.
  --test-size             FLOAT       Test ratio.                          Default is 0.2.  
  --learning-rate         FLOAT       Learning rate.                       Default is 0.01.  
  --weight-decay          FLOAT       Weight decay.                        Default is 10^-5. 
  --layers                LST         Layer sizes in model.                Default is [32, 32].
  --spectral-features     BOOL        Layer sizes in autoencoder model.    Default is True
  --general-features      BOOL        Loss calculation for the model.      Sets spectral features to False.  

Examples

The following commands learn a node embedding, regression parameters and write the embedding to disk. The node representations are ordered by the ID. The layer sizes can be set manually.

Training an SGCN model on the default dataset. Saving the embedding, regression weights and logs at default paths. The regression weight file contains weights and bias (last column).

python src/main.py

``pos_ratio" in the output table gives the ratio of test links predicted to be positive.

Creating an SGCN model of the default dataset with a 96-64-32 architecture.

python src/main.py --layers 96 64 32

Creating a single layer SGCN model with 32 features.

python src/main.py --layers 32

Creating a model with some custom learning rate and epoch number.

python src/main.py --learning-rate 0.001 --epochs 200

Training a model on another dataset with features present - a signed Erdos-Renyi graph. Saving the weights, output and logs in a custom folder.

python src/main.py --general-features --edge-path input/erdos_renyi_edges.csv --features-path input/erdos_renyi_features.csv --embedding-path output/embedding/erdos_renyi.csv --regression-weights-path output/weights/erdos_renyi.csv --log-path logs/erdos_renyi.json

License


Comments
  • F1 score can not match the paper

    F1 score can not match the paper

    It seems that F1 score can not match the paper, for example, I set epoch=200, layers=96 64 32, learning rate=0.001, but on BIT-OTC dataset the F1 score can only reach 0.802.

    I was wondering if there is anything wrong about my experimental settings?

    opened by Coderbai 2
  • How can I run the SGCN with the data containing not numerical ID?

    How can I run the SGCN with the data containing not numerical ID?

    Before starting, thank you for developing the SGCN model. When I tried to run the code, I faced a problem with input data.

    In your example data(bitcoin_otc), all ids have a numerical value. But my data has string data that need to convert to numerical ID. I guess that I should add a converter such as key and library, I'm not sure where should I modify it. Can you suggest the method for running such cases?

    opened by songsong0425 1
  • F1 value in the result stays the same

    F1 value in the result stays the same

    Hi Benedek,

    Thanks for releasing the code!

    I find that the F1 value in the result stays the same. Could you give me some advice on how I should correct it?

    Looking forward to your reply. Thanks!

    opened by Lebesgue-zyker 1
  • BUGFIX: a little problem in ./src/signedsageconvolution.py

    BUGFIX: a little problem in ./src/signedsageconvolution.py

    Hello @benedekrozemberczki :

    I found out a little bug in file ./src/signedsageconvolution.py

    the code below in your file

    edge_index = add_self_loops(edge_index, num_nodes=x.size(0))
    

    should be replaced by

    edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))
    

    i.e. just add , _ after edge_index

    All the add_self_loops lines in the file ./src/signedsageconvolution.py are modified as I typed above.

    (perhaps it is just a slip of the finger... sorry, i just know a slip of the tongue...)

    and finally, it works.

    My Environment is:

    Ubuntu 16.04
    Python 3.7.3
    PyTorch 1.1.0
    PyTorch_Geometric 1.2.1
    PyTorch_Scatter 1.2.0
    

    Yours, @wmf1997

    opened by WMF1997 1
  • ImportError: No module named 'torch_spline_conv'

    ImportError: No module named 'torch_spline_conv'

    (.venv) [email protected]:~/ub16_prj/SGCN$ python src/main.py Traceback (most recent call last): File "src/main.py", line 3, in from sgcn import SignedGCNTrainer File "/home/ub16hp/ub16_prj/SGCN/src/sgcn.py", line 12, in from torch_geometric.nn import SAGEConv File "/home/ub16hp/ub16_prj/SGCN/.venv/lib/python3.5/site-packages/torch_geometric/nn/init.py", line 1, in from .conv import * # noqa File "/home/ub16hp/ub16_prj/SGCN/.venv/lib/python3.5/site-packages/torch_geometric/nn/conv/init.py", line 1, in from .spline_conv import SplineConv File "/home/ub16hp/ub16_prj/SGCN/.venv/lib/python3.5/site-packages/torch_geometric/nn/conv/spline_conv.py", line 3, in from torch_spline_conv import SplineConv as Conv ImportError: No module named 'torch_spline_conv' (.venv) [email protected]:~/ub16_prj/SGCN$

    opened by loveJasmine 1
  • How can I calculate AUPR in SGCN?

    How can I calculate AUPR in SGCN?

    Dear bendek, Greetings, I have a question about the calculation of AUPR in SGCN. I'm trying to get AUPR score by input the below code in utils.py:

    from sklearn.metrics import roc_auc_score, f1_score, average_precision_score, precision_recall_curve, auc, plot_precision_recall_curve
    
    def calculate_auc(targets, predictions, edges):
        """
        Calculate performance measures on test dataset.
        :param targets: Target vector to predict.
        :param predictions: Predictions vector.
        :param edges: Edges dictionary with number of edges etc.
        :return auc: AUC value.
        :return f1: F1-score.
        """
        targets = [0 if target == 1 else 1 for target in targets]
        auc_score = roc_auc_score(targets, predictions)
        #precision, recall, thresholds = precision_recall_curve(targets, predictions)
        #aupr=auc(recall, precision)
        pred = [1 if p > 0.5 else 0 for p in  predictions]
        f1 = f1_score(targets, pred)
        #precision, recall, thresholds = precision_recall_curve(targets, pred)
        #aupr=auc(recall, precision)
        pos_ratio = sum(pred)/len(pred)
        return auc_score, aupr, f1, pos_ratio
    

    But I'm not sure where should I put the new code (#). Does AUPR need to get predictions value or pred value? Also, what the different between predictions and pred? I guess 'predictions' is probability value and 'pred' is binary value, is it right?

    Sorry for frequent edit, but why did you set targets = [0 if target == 1 else 1 for target in targets]? I think that it means give opposite labels to positive and negative edges. (i.e., positive=0, negative=1) I look forward to your reply. Sincerely, Songyeon

    opened by songsong0425 0
  • Fix negative sampling. Avoid invalid samples. Also add regression bias.

    Fix negative sampling. Avoid invalid samples. Also add regression bias.

    The first commit: Fix negative sampling. Avoid sampling positive edges and viewing them as non-edges, and avoid sampling negative edges and viewing them as non-edges. Borrow utility functions from torch_geometric. The trick to avoiding sampling an edge that exists is to map the existent links and the sampled links to some unique numbers (by multiplying the source node with the total number of nodes, then add the target node, ref: https://github.com/rusty1s/pytorch_geometric/blob/712f581642bccc225189fc3eb364ce0642b915a3/torch_geometric/utils/negative_sampling.py#L95) After sampling and mapping, we use NumPy.isin to pick out the invalid links (assumed to be non-existent, but exists), then resample iteratively until no invalid samples exist.

    The second commit: Add a bias term in regression to predict link signs. This is to learn some prior distribution of different types of links: positive, negative and none. The bias term can be helpful in dealing with unbalanced links. In the data sets used in SGCN, most links do not exist, and positive links are much more frequent than negative ones.

    opened by SherylHYX 0
  • fix classifier so that it will not always predict the majority class

    fix classifier so that it will not always predict the majority class

    Fix classifier so that it will not always predict the majority class, also add one column to the output table, change README correspondingly as well.

    There are some problems using a threshold determined by the ratio of negative links: On the one hand, you need to know the ground truth negative link ratio, which is a kind of cheating. On the other hand, due to the high imbalance of positive and negative links, your threshold would be close to zero, resulting in voting the majority class all the time.

    opened by SherylHYX 0
  • self.y should be related to negative_edges and test_negative_edges?

    self.y should be related to negative_edges and test_negative_edges?

    thanks for your code! I have a little question,

    self.y should bu related to negative_edges and test_negative_edges, when we calculate "calculate_regression_loss" function, pos = torch.cat((self.positive_z_i, self.positive_z_j), 1) # [14624, 128] neg = torch.cat((self.negative_z_i, self.negative_z_j), 1) # [2522, 128]

    opened by 529261027 0
Releases(v_00001)
Owner
Benedek Rozemberczki
Machine Learning Engineer at AstraZeneca | PhD from The University of Edinburgh.
Benedek Rozemberczki
PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech

PortaSpeech - PyTorch Implementation PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech. Model Size Module Nor

Keon Lee 279 Jan 04, 2023
Main Results on ImageNet with Pretrained Models

This repository contains Pytorch evaluation code, training code and pretrained models for the following projects: SPACH (A Battle of Network Structure

Microsoft 151 Dec 14, 2022
Bib-parser - Convenient script to parse .bib files with the ACM Digital Library like metadata

Bib Parser Convenient script to parse .bib files with the ACM Digital Library li

Mehtab Iqbal (Shahan) 1 Jan 26, 2022
Implementation of "Deep Implicit Templates for 3D Shape Representation"

Deep Implicit Templates for 3D Shape Representation Zerong Zheng, Tao Yu, Qionghai Dai, Yebin Liu. arXiv 2020. This repository is an implementation fo

Zerong Zheng 144 Dec 07, 2022
[ICCV 2021 Oral] PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers Created by Xumin Yu*, Yongming Rao*, Ziyi Wang, Zuyan Liu, Jiwen Lu, Jie Zhou

Xumin Yu 317 Dec 26, 2022
[ICCV21] Self-Calibrating Neural Radiance Fields

Self-Calibrating Neural Radiance Fields, ICCV, 2021 Project Page | Paper | Video Author Information Yoonwoo Jeong [Google Scholar] Seokjun Ahn [Google

381 Dec 30, 2022
Code and data for ACL2021 paper Cross-Lingual Abstractive Summarization with Limited Parallel Resources.

Multi-Task Framework for Cross-Lingual Abstractive Summarization (MCLAS) The code for ACL2021 paper Cross-Lingual Abstractive Summarization with Limit

Yu Bai 43 Nov 07, 2022
A collection of inference modules for fastai2

fastinference A collection of inference modules for fastai including inference speedup and interpretability Install pip install fastinference There ar

Zachary Mueller 83 Oct 10, 2022
PyTorch trainer and model for Sequence Classification

PyTorch-trainer-and-model-for-Sequence-Classification After cloning the repository, modify your training data so that the training data is a .csv file

NhanTieu 2 Dec 09, 2022
Code for "Diversity can be Transferred: Output Diversification for White- and Black-box Attacks"

Output Diversified Sampling (ODS) This is the github repository for the NeurIPS 2020 paper "Diversity can be Transferred: Output Diversification for W

50 Dec 11, 2022
Self-labelling via simultaneous clustering and representation learning. (ICLR 2020)

Self-labelling via simultaneous clustering and representation learning 🆗 🆗 🎉 NEW models (20th August 2020): Added standard SeLa pretrained torchvis

Yuki M. Asano 469 Jan 02, 2023
Official Codes for Graph Modularity:Towards Understanding the Cross-Layer Transition of Feature Representations in Deep Neural Networks.

Dynamic-Graphs-Construction Official Codes for Graph Modularity:Towards Understanding the Cross-Layer Transition of Feature Representations in Deep Ne

11 Dec 14, 2022
Implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTorch

Neural Distance Embeddings for Biological Sequences Official implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTo

Gabriele Corso 56 Dec 23, 2022
Contains source code for the winning solution of the xView3 challenge

Winning Solution for xView3 Challenge This repository contains source code and pretrained models for my (Eugene Khvedchenya) solution to xView 3 Chall

Eugene Khvedchenya 51 Dec 30, 2022
An 16kHz implementation of HiFi-GAN for soft-vc.

HiFi-GAN An 16kHz implementation of HiFi-GAN for soft-vc. Relevant links: Official HiFi-GAN repo HiFi-GAN paper Soft-VC repo Soft-VC paper Example Usa

Benjamin van Niekerk 42 Dec 27, 2022
Official PyTorch implementation for paper "Efficient Two-Stage Detection of Human–Object Interactions with a Novel Unary–Pairwise Transformer"

UPT: Unary–Pairwise Transformers This repository contains the official PyTorch implementation for the paper Frederic Z. Zhang, Dylan Campbell and Step

Frederic Zhang 109 Dec 20, 2022
Largest list of models for Core ML (for iOS 11+)

Since iOS 11, Apple released Core ML framework to help developers integrate machine learning models into applications. The official documentation We'v

Kedan Li 5.6k Jan 08, 2023
A boosting-based Multiple Instance Learning (MIL) package that includes MIL-Boost and MCIL-Boost

A boosting-based Multiple Instance Learning (MIL) package that includes MIL-Boost and MCIL-Boost

Jun-Yan Zhu 27 Aug 08, 2022
Lighting the Darkness in the Deep Learning Era: A Survey, An Online Platform, A New Dataset

Lighting the Darkness in the Deep Learning Era: A Survey, An Online Platform, A New Dataset This repository provides a unified online platform, LoLi-P

Chongyi Li 457 Jan 03, 2023
Code for "Localization with Sampling-Argmax", NeurIPS 2021

Localization with Sampling-Argmax [Paper] [arXiv] [Project Page] Localization with Sampling-Argmax Jiefeng Li, Tong Chen, Ruiqi Shi, Yujing Lou, Yong-

JeffLi 71 Dec 17, 2022