Semi-Supervised Signed Clustering Graph Neural Network (and Implementation of Some Spectral Methods)

Overview

SSSNET

SSSNET: Semi-Supervised Signed Network Clustering

For details, please read our paper.

Environment Setup

Overview

The project has been tested on the following environment specification:

  1. Ubuntu 18.04.5 LTS (Other x86_64 based Linux distributions should also be fine, such as Fedora 32)
  2. Nvidia Graphic Card (NVIDIA GeForce RTX 2080 with driver version 440.36, and NVIDIA RTX 8000) and CPU (Intel Core i7-10700 CPU @ 2.90GHz)
  3. Python 3.6.13 (and Python 3.6.12)
  4. CUDA 10.2 (and CUDA 9.2)
  5. Pytorch 1.8.0 (built against CUDA 10.2) and Python 1.6.0 (built against CUDA 9.2)
  6. Other libraries and python packages (See below)

You should handle (1),(2) yourself. For (3), (4), (5) and (6), see following methods.

Installation Method 1 (Using Installation Script)

We provide two examples of environmental setup, one with CUDA 10.2 and GPU, the other with CPU.

Following steps assume you've done with (1) and (2).

  1. Install conda. Both Miniconda and Anaconda are OK.

  2. Run the following bash script under SSSNET's root directory.

./create_conda_env.sh

Installation Method 2 (.yml files)

We provide two examples of envionmental setup, one with CUDA 10.2 and GPU, the other with CPU.

Following steps assume you've done with (1) and (2).

  1. Install conda. Both Miniconda and Anaconda are OK.

  2. Create an environment and install python packages (GPU):

conda env create -f environment_GPU.yml
  1. Create an environment and install python packages (CPU):
conda env create -f environment_CPU.yml

Installation Method 3 (Manually Install)

The codebase is implemented in Python 3.6.12. package versions used for development are just below.

networkx           2.5
tqdm               4.50.2
numpy              1.19.2
pandas             1.1.4
texttable          1.6.3
latextable         0.1.1
scipy              1.5.4
argparse           1.1.0
sklearn            0.23.2
torch              1.8.1
torch-scatter      2.0.5
torch-geometric    1.6.3 (follow https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html)
matplotlib         3.3.4 (for generating plots and results)
SigNet         (for comparison methods, can get from the command: pip install git+https://github.com/alan-turing-institute/SigNet.git)

Execution checks

When installation is done, you could check you enviroment via:

bash setup_test.sh

Folder structure

  • ./execution/ stores files that can be executed to generate outputs. For vast number of experiments, we use GNU parallel, can be downloaded in command line and make it executable via:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 ./parallel
  • ./joblog/ stores job logs from parallel. You might need to create it by
mkdir joblog
  • ./Output/ stores raw outputs (ignored by Git) from parallel. You might need to create it by
mkdir Output
  • ./data/ stores processed data sets for node clustering.

  • ./src/ stores files to train various models, utils and metrics.

  • ./result_arrays/ stores results for different data sets. Each data set has a separate subfolder.

  • ./result_anlysis/ stores notebooks for generating result plots or tables.

  • ./logs/ stores trained models and logs, as well as predicted clusters (optional). When you are in debug mode (see below), your logs will be stored in ./debug_logs/ folder.

Options

SSSNET provides the following command line arguments, which can be viewed in the ./src/param_parser.py and ./src/link_sign_param_parser.py.

Synthetic data options:

See file ./src/param_parser.py.

  --p                     FLOAT         Probability of the existence of a link.                 Default is 0.02. 
  --eta                   FLOAT         Probability of flipping the sign of each edge.          Default is 0.1.
  --N                     INT           (Expected) Number of nodes in an SSBM.                  Default is 1000.
  --K                     INT           Number of blocks in an SSBM.                            Default is 3.
  --total_n               INT           Total number of nodes in the polarized network.         Default is 1050.
  --num_com               INT           Number of polarized communities (SSBMs).                Default is 2.

Major model options:

See file ./src/param_parser.py.

  --epochs                INT         Number of SSSNET (maximum) training epochs.               Default is 300. 
  --early_stopping        INT         Number of SSSNET early stopping epochs.                   Default is 100. 
  --num_trials            INT         Number of trials to generate results.                     Default is 10.
  --seed_ratio            FLOAT       Ratio in the training set of each cluster 
                                                        to serve as seed nodes.                 Default is 0.1.
  --loss_ratio            FLOAT       Ratio of loss_pbnc to loss_pbrc. -1 means only loss_pbnc. Default is -1.0.
  --supervised_loss_ratio FLOAT       Ratio of factor of supervised loss part to
                                      self-supervised loss part.                                Default is 50.
  --triplet_loss_ratio    FLOAT       Ratio of triplet loss to cross entropy loss in 
                                      supervised loss part.                                     Default is 0.1.
  --tau                   FLOAT       Regularization parameter when adding self-loops to the positive 
                                      part of the adjacency matrix, i.e. A -> A + tau * I,
                                      where I is the identity matrix.                           Default is 0.5.
  --hop                   INT         Number of hops to consider for the random walk.           Default is 2.
  --samples               INT         Number of samples in triplet loss.                        Default is 10000.
  --train_ratio           FLOAT       Training ratio.                                           Default is 0.8.  
  --test_ratio            FLOAT       Test ratio.                                               Default is 0.1.
  --lr                    FLOAT       Initial learning rate.                                    Default is 0.01.  
  --weight_decay          FLOAT       Weight decay (L2 loss on parameters).                     Default is 5^-4. 
  --dropout               FLOAT       Dropout rate (1 - keep probability).                      Default is 0.5.
  --hidden                INT         Number of hidden units.                                   Default is 32. 
  --seed                  INT         Random seed.                                              Default is 31.
  --no-cuda               BOOL        Disables CUDA training.                                   Default is False.
  --debug, -D             BOOL        Debug with minimal training setting, not to get results.  Default is False.
  --directed              BOOL        Directed input graph.                                     Default is False.
  --no_validation         BOOL        Whether to disable validation and early stopping
                                      during traing.                                            Default is False.
  --regenerate_data       BOOL        Whether to force creation of data splits.                 Default is False.
  --load_only             BOOL        Whether not to store generated data.                      Default is False.
  --dense                 BOOL        Whether not to use torch sparse.                          Default is False.
  -AllTrain, -All         BOOL        Whether to use all data to do gradient descent.           Default is False.
  --SavePred, -SP         BOOL        Whether to save predicted labels.                         Default is False.
  --dataset               STR         Data set to consider.                                     Default is 'SSBM/'.
  --all_methods           LST         Methods to use to generate results.                       Default is ['spectral','SSSNET'].
  --feature_options       LST         Features to use for SSSNET. 
                                      Can choose from ['A_reg','L','given','None'].            Default is ['A_reg'].

Reproduce results

First, get into the ./execution/ folder:

cd execution

To reproduce SSBM results.

bash SSBM.sh

To reproduce results on polarized SSBMs.

bash polarized.sh

To reproduce results of node clustering on real data.

bash real.sh

Note that if you are operating on CPU, you may delete the commands ``CUDA_VISIBLE_DEVICES=xx". You can also set you own number of parallel jobs, not necessarily following the j numbers in the .sh files.

You can also use CPU for training if you add ``--no-duca", or GPU if you delete this.

Direct execution with training files

First, get into the ./src/ folder:

cd src

Then, below are various options to try:

Creating an SSSNET model for SSBM of the default setting.

python ./train.py

Creating an SSSNET model for polarized SSBMs with 5000 nodes, N=500.

python ./train.py --dataset polarized --total_n 5000 --N 500

Creating a model for S&P1500 data set with some custom learning rate and epoch number.

python ./train.py --dataset SP1500 --lr 0.001 --epochs 300

Creating a model for Wiki-Rfa data set (directed) with specific number of trials and use CPU.

python ./train.py --dataset wikirfa --directed --no-cuda --num_trials 5

Note

  • When no ground-truth exists, the labels loaded for training/testing are not really meaningful. They simply provides some relative Adjusted Rand Index (ARI) for the model's predicted clustering to some fitted/dummy clustering. The codebase loads these fitted/dummy labels and prints out ARIs for completeness instead of evaluation purpose.

  • Other versions of the code. ./src/PyG_models.py provides a version of SSSNET with pytorch geometric message passing implementation, and ./src/PyG_train.py runs this model for SSSNET. Note that this implementation gives almost the same results.


Owner
Yixuan He
DPhil in Statistics @ University of Oxford
Yixuan He
First-Order Probabilistic Programming Language

FOPPL: A First-Order Probabilistic Programming Language This is an implementation of FOPPL, an S-expression based probabilistic programming language d

Renato Costa 23 Dec 20, 2022
Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch

Perceiver - Pytorch Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch Install $ pip install perceiver-pytorch Usage

Phil Wang 876 Dec 29, 2022
S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration (CVPR 2021)

S2-BNN (Self-supervised Binary Neural Networks Using Distillation Loss) This is the official pytorch implementation of our paper: "S2-BNN: Bridging th

Zhiqiang Shen 52 Dec 24, 2022
An intuitive library to extract features from time series

Time Series Feature Extraction Library Intuitive time series feature extraction This repository hosts the TSFEL - Time Series Feature Extraction Libra

Associação Fraunhofer Portugal Research 589 Jan 04, 2023
Modelisation on galaxy evolution using PEGASE-HR

model_galaxy Modelisation on galaxy evolution using PEGASE-HR This is a labwork done in internship at IAP directed by Damien Le Borgne (https://github

Adrien Anthore 1 Jan 14, 2022
Code for the paper "Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness"

DU-VAE This is the pytorch implementation of the paper "Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness" Acknowledgement

Dazhong Shen 4 Oct 19, 2022
Header-only library for using Keras models in C++.

frugally-deep Use Keras models in C++ with ease Table of contents Introduction Usage Performance Requirements and Installation FAQ Introduction Would

Tobias Hermann 927 Jan 05, 2023
"Learning and Analyzing Generation Order for Undirected Sequence Models" in Findings of EMNLP, 2021

undirected-generation-dev This repo contains the source code of the models described in the following paper "Learning and Analyzing Generation Order f

Yichen Jiang 0 Mar 25, 2022
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking

Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking (CVPR 2021) Pytorch implementation of the ArTIST motion model. In this repo

Fatemeh 38 Dec 12, 2022
AgML is a comprehensive library for agricultural machine learning

AgML is a comprehensive library for agricultural machine learning. Currently, AgML provides access to a wealth of public agricultural datasets for common agricultural deep learning tasks.

Plant AI and Biophysics Lab 1 Jul 07, 2022
Tilted Empirical Risk Minimization (ICLR '21)

Tilted Empirical Risk Minimization This repository contains the implementation for the paper Tilted Empirical Risk Minimization ICLR 2021 Empirical ri

Tian Li 40 Nov 28, 2022
Static Features Classifier - A static features classifier for Point-Could clusters using an Attention-RNN model

Static Features Classifier This is a static features classifier for Point-Could

ABDALKARIM MOHTASIB 1 Jan 25, 2022
Phonetic PosteriorGram (PPG)-Based Voice Conversion (VC)

ppg-vc Phonetic PosteriorGram (PPG)-Based Voice Conversion (VC) This repo implements different kinds of PPG-based VC models. Pretrained models. More m

Liu Songxiang 227 Dec 28, 2022
交互式标注软件,暂定名 iann

iann 交互式标注软件,暂定名iann。 安装 按照官网介绍安装paddle。 安装其他依赖 pip install -r requirements.txt 运行 git clone https://github.com/PaddleCV-SIG/iann/ cd iann python iann

294 Dec 30, 2022
Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021.

NL-CSNet-Pytorch Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021. Note: this repo only shows the strategy of

WenxueCui 7 Nov 07, 2022
Modeling CNN layers activity with Gaussian mixture model

GMM-CNN This code package implements the modeling of CNN layers activity with Gaussian mixture model and Inference Graphs visualization technique from

3 Aug 05, 2022
When in Doubt: Improving Classification Performance with Alternating Normalization

When in Doubt: Improving Classification Performance with Alternating Normalization Findings of EMNLP 2021 Menglin Jia, Austin Reiter, Ser-Nam Lim, Yoa

Menglin Jia 13 Nov 06, 2022
BrainGNN - A deep learning model for data-driven discovery of functional connectivity

A deep learning model for data-driven discovery of functional connectivity https://doi.org/10.3390/a14030075 Usman Mahmood, Zengin Fu, Vince D. Calhou

Usman Mahmood 3 Aug 28, 2022
The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding, by Chuhan Zhang, Ankush Gupta and Andrew Zisserman.

Temporal Query Networks for Fine-grained Video Understanding 📋 This repository contains the implementation of CVPR2021 paper Temporal_Query_Networks

55 Dec 21, 2022
Face Mask Detection is a project to determine whether someone is wearing mask or not, using deep neural network.

face-mask-detection Face Mask Detection is a project to determine whether someone is wearing mask or not, using deep neural network. It contains 3 scr

amirsalar 13 Jan 18, 2022