A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Overview

Splitter Arxiv repo sizebenedekrozemberczki

A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019).

Abstract

Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to 90%. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.

This repository provides a PyTorch implementation of Splitter as described in the paper:

Splitter: Learning Node Representations that Capture Multiple Social Contexts. Alessandro Epasto and Bryan Perozzi. WWW, 2019. [Paper]

The original Tensorflow implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2. package versions used for development are just below.

networkx          1.11
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
torch             1.1.0
gensim            3.6.0

Datasets

The code takes the **edge list** of the graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. A sample graph for `Cora` is included in the `input/` directory.

Outputs

The embeddings are saved in the `input/` directory. Each embedding has a header and a column with the node IDs. Finally, the node embedding is sorted by the node ID column.

Options

The training of a Splitter embedding is handled by the `src/main.py` script which provides the following command line arguments.

Input and output options

  --edge-path               STR    Edge list csv.           Default is `input/chameleon_edges.csv`.
  --embedding-output-path   STR    Embedding output csv.    Default is `output/chameleon_embedding.csv`.
  --persona-output-path     STR    Persona mapping JSON.    Default is `output/chameleon_personas.json`.

Model options

  --seed               INT     Random seed.                       Default is 42.
  --number of walks    INT     Number of random walks per node.   Default is 10.
  --window-size        INT     Skip-gram window size.             Default is 5.
  --negative-samples   INT     Number of negative samples.        Default is 5.
  --walk-length        INT     Random walk length.                Default is 40.
  --lambd              FLOAT   Regularization parameter.          Default is 0.1
  --dimensions         INT     Number of embedding dimensions.    Default is 128.
  --workers            INT     Number of cores for pre-training.  Default is 4.   
  --learning-rate      FLOAT   SGD learning rate.                 Default is 0.025

Examples

The following commands learn an embedding and save it with the persona map. Training a model on the default dataset.

python src/main.py

Training a Splitter model with 32 dimensions.

python src/main.py --dimensions 32

Increasing the number of walks and the walk length.

python src/main.py --number-of-walks 20 --walk-length 80

License


Owner
Benedek Rozemberczki
Machine Learning Engineer at AstraZeneca | PhD from The University of Edinburgh.
Benedek Rozemberczki
Blender addon - Scrub timeline from viewport with a shortcut

Viewport scrub timeline Move in the timeline directly in viewport and snap to nearest keyframe Note : This standalone feature will be added in the nat

Samuel Bernou 40 Nov 07, 2022
A toolkit for document-level event extraction, containing some SOTA model implementations

Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker Source code for ACL-IJCNLP 2021 Long paper: Document-le

84 Dec 15, 2022
In this project, we aim to achieve the task of predicting emojis from tweets. We aim to investigate the relationship between words and emojis.

Making Emojis More Predictable by Karan Abrol, Karanjot Singh and Pritish Wadhwa, Natural Language Processing (CSE546) under the guidance of Dr. Shad

Karanjot Singh 2 Jan 17, 2022
뉴스 도메인 질의응답 시스템 (21-1학기 졸업 프로젝트)

뉴스 도메인 질의응답 시스템 본 프로젝트는 뉴스기사에 대한 질의응답 서비스 를 제공하기 위해서 진행한 프로젝트입니다. 약 3개월간 ( 21. 03 ~ 21. 05 ) 진행하였으며 Transformer 아키텍쳐 기반의 Encoder를 사용하여 한국어 질의응답 데이터셋으로

TaegyeongEo 4 Jul 08, 2022
Binary LSTM model for text classification

Text Classification The purpose of this repository is to create a neural network model of NLP with deep learning for binary classification of texts re

Nikita Elenberger 1 Mar 11, 2022
auto_code_complete is a auto word-completetion program which allows you to customize it on your need

auto_code_complete v1.3 purpose and usage auto_code_complete is a auto word-completetion program which allows you to customize it on your needs. the m

RUO 2 Feb 22, 2022
This repo stores the codes for topic modeling on palliative care journals.

This repo stores the codes for topic modeling on palliative care journals. Data Preparation You first need to download the journal papers. bash 1_down

3 Dec 20, 2022
Code for paper "Role-oriented Network Embedding Based on Adversarial Learning between Higher-order and Local Features"

Role-oriented Network Embedding Based on Adversarial Learning between Higher-order and Local Features Train python main.py --dataset brazil-flights C

wang zhang 0 Jun 28, 2022
AI-powered literature discovery and review engine for medical/scientific papers

AI-powered literature discovery and review engine for medical/scientific papers paperai is an AI-powered literature discovery and review engine for me

NeuML 819 Dec 30, 2022
Statistics and Mathematics for Machine Learning, Deep Learning , Deep NLP

Stat4ML Statistics and Mathematics for Machine Learning, Deep Learning , Deep NLP This is the first course from our trio courses: Statistics Foundatio

Omid Safarzadeh 83 Dec 29, 2022
A program that uses real statistics to choose the best times to bet on BloxFlip's crash gamemode

Bloxflip Smart Bet A program that uses real statistics to choose the best times to bet on BloxFlip's crash gamemode. https://bloxflip.com/crash. THIS

43 Jan 05, 2023
Scikit-learn style model finetuning for NLP

Scikit-learn style model finetuning for NLP Finetune is a library that allows users to leverage state-of-the-art pretrained NLP models for a wide vari

indico 665 Dec 17, 2022
An open source framework for seq2seq models in PyTorch.

pytorch-seq2seq Documentation This is a framework for sequence-to-sequence (seq2seq) models implemented in PyTorch. The framework has modularized and

International Business Machines 1.4k Jan 02, 2023
CJK computer science terms comparison / 中日韓電腦科學術語對照 / 日中韓のコンピュータ科学の用語対照 / 한·중·일 전산학 용어 대조

CJK computer science terms comparison This repository contains the source code of the website. You can see the website from the following link: Englis

Hong Minhee (洪 民憙) 88 Dec 23, 2022
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Texar-PyTorch is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar

ASYML 726 Dec 30, 2022
A design of MIDI language for music generation task, specifically for Natural Language Processing (NLP) models.

MIDI Language Introduction Reference Paper: Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions: code This

Robert Bogan Kang 3 May 25, 2022
Trains an OpenNMT PyTorch model and SentencePiece tokenizer.

Trains an OpenNMT PyTorch model and SentencePiece tokenizer. Designed for use with Argos Translate and LibreTranslate.

Argos Open Tech 61 Dec 13, 2022
The entmax mapping and its loss, a family of sparse softmax alternatives.

entmax This package provides a pytorch implementation of entmax and entmax losses: a sparse family of probability mappings and corresponding loss func

DeepSPIN 330 Dec 22, 2022
A repo for open resources & information for people to succeed in PhD in CS & career in AI / NLP

A repo for open resources & information for people to succeed in PhD in CS & career in AI / NLP

420 Dec 28, 2022