BiNE: Bipartite Network Embedding

Related tags

Text Data & NLPBiNE
Overview

BiNE: Bipartite Network Embedding

This repository contains the demo code of the paper:

BiNE: Bipartite Network Embedding. Ming Gao, Leihui Chen, Xiangnan He & Aoying Zhou

which has been accepted by SIGIR2018.

Note: Any problems, you can contact me at [email protected]. Through email, you will get my rapid response.

Environment settings

  • python==2.7.11
  • numpy==1.13.3
  • sklearn==0.17.1
  • networkx==1.11
  • datasketch==1.2.5
  • scipy==0.17.0
  • six==1.10.0

Basic Usage

Main Parameters:

Input graph path. Defult is '../data/rating_train.dat' (--train-data)
Test dataset path. Default is '../data/rating_test.dat' (--test-data)
Name of model. Default is 'default' (--model-name)
Number of dimensions. Default is 128 (--d)
Number of negative samples. Default is 4 (--ns)
Size of window. Default is 5 (--ws)
Trade-off parameter $\alpha$. Default is 0.01 (--alpha)
Trade-off parameter $\beta$. Default is 0.01 (--beta)
Trade-off parameter $\gamma$. Default is 0.1 (--gamma)
Learning rate $\lambda$. Default is 0.01 (--lam)
Maximal iterations. Default is 50 (--max-iters)
Maximal walks per vertex. Default is 32 (--maxT)
Minimal walks per vertex. Default is 1 (--minT)
Walk stopping probability. Default is 0.15 (--p)
Calculate the recommendation metrics. Default is 0 (--rec)
Calculate the link prediction. Default is 0 (--lip)
File of training data for LR. Default is '../data/wiki/case_train.dat' (--case-train)
File of testing data for LR. Default is '../data/wiki/case_test.dat' (--case-test)
File of embedding vectors of U. Default is '../data/vectors_u.dat' (--vectors-u)
File of embedding vectors of V. Default is '../data/vectors_v.dat' (--vectors-v)
For large bipartite, 1 do not generate homogeneous graph file; 2 do not generate homogeneous graph. Default is 0 (--large)
Mertics of centrality. Default is 'hits', options: 'hits' and 'degree_centrality' (--mode)

Usage

We provide two processed dataset:

  • DBLP (for recommendation). It contains:

    • A training dataset ./data/dblp/rating_train.dat
    • A testing dataset ./data/dblp/rating_test.dat
  • Wikipedia (for link prediction). It contains:

    • A training dataset ./data/wiki/rating_train.dat
    • A testing dataset ./data/wiki/rating_test.dat
  • Each line is a instance: userID (begin with 'u')\titemID (begin with 'i') \t weight\n

    For example: u0\ti0\t1

Please run the './model/train.py'

cd model
python train.py --train-data ../data/dblp/rating_train.dat --test-data ../data/dblp/rating_test.dat --lam 0.025 --max-iter 100 --model-name dblp --rec 1 --large 2 --vectors-u ../data/dblp/vectors_u.dat --vectors-v ../data/dblp/vectors_v.dat

The embedding vectors of nodes are saved in file '/model-name/vectors_u.dat' and '/model-name/vectors_v.dat', respectively.

Example

Recommendation

Run

cd model
python train.py --train-data ../data/dblp/rating_train.dat --test-data ../data/dblp/rating_test.dat --lam 0.025 --max-iter 100 --model-name dblp --rec 1 --large 2 --vectors-u ../data/dblp/vectors_u.dat --vectors-v ../data/dblp/vectors_v.dat

Output (training process)

======== experiment settings =========
alpha : 0.0100, beta : 0.0100, gamma : 0.1000, lam : 0.0250, p : 0.1500, ws : 5, ns : 4, maxT :  32, minT : 1, max_iter : 100
========== processing data ===========
constructing graph....
number of nodes: 6001
walking...
walking...ok
number of nodes: 1177
walking...
walking...ok
getting context and negative samples....
negative samples is ok.....
context...
context...ok
context...
context...ok
============== training ==============
[*************************************************************************************************** ]100.00%

Output (testing process)

============== testing ===============
recommendation metrics: F1 : 0.1132, MAP : 0.2041, MRR : 0.3331, NDCG : 0.2609

Link Prediction

Run

cd model
python train.py --train-data ../data/wiki/rating_train.dat --test-data ../data/wiki/rating_test.dat --lam 0.01 --max-iter 100 --model-name wiki --lip 1 --large 2 --gamma 1 --vectors-u ../data/wiki/vectors_u.dat --vectors-v ../data/wiki/vectors_v.dat --case-train ../data/wiki/case_train.dat --case-test ../data/wiki/case_test.dat

Output (training process)

======== experiment settings =========
alpha : 0.0100, beta : 0.0100, gamma : 1.0000, lam : 0.0100, p : 0.1500, ws : 5, ns : 4, maxT :  32, minT : 1, max_iter : 100, d : 128
========== processing data ===========
constructing graph....
number of nodes: 15000
walking...
walking...ok
number of nodes: 2529
walking...
walking...ok
getting context and negative samples....
negative samples is ok.....
context...
context...ok
context...
context...ok
============== training ==============
[*************************************************************************************************** ]100.00%

Output (testing process)

============== testing ===============
link prediction metrics: AUC_ROC : 0.9468, AUC_PR : 0.9614
Owner
leihuichen
student
leihuichen
基于百度的语音识别,用python实现,pyaudio+pyqt

Speech-recognition 基于百度的语音识别,python3.8(conda)+pyaudio+pyqt+baidu-aip 百度有面向python

J-L 1 Jan 03, 2022
Fake news detector filters - Smart filter project allow to classify the quality of information and web pages

fake-news-detector-1.0 Lists, lists and more lists... Spam filter list, quality keyword list, stoplist list, top-domains urls list, news agencies webs

Memo Sim 1 Jan 04, 2022
Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

NeuroNER NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com. This page gives step-by-step instructions to insta

Franck Dernoncourt 1.6k Dec 27, 2022
Higher quality textures for the Metal Gear Solid series.

Metal Gear Solid: HD Textures Higher quality textures for the Metal Gear Solid series. The goal is to maximize the quality of assets that the engine w

Samantha 6 Dec 06, 2022
2021海华AI挑战赛·中文阅读理解·技术组·第三名

文字是人类用以记录和表达的最基本工具,也是信息传播的重要媒介。透过文字与符号,我们可以追寻人类文明的起源,可以传播知识与经验,读懂文字是认识与了解的第一步。对于人工智能而言,它的核心问题之一就是认知,而认知的核心则是语义理解。

21 Dec 26, 2022
aMLP Transformer Model for Japanese

aMLP-japanese Japanese aMLP Pretrained Model aMLPとは、Liu, Daiらが提案する、Transformerモデルです。 ざっくりというと、BERTの代わりに使えて、より性能の良いモデルです。 詳しい解説は、こちらの記事などを参考にしてください。 この

tanreinama 13 Aug 11, 2022
The proliferation of disinformation across social media has led the application of deep learning techniques to detect fake news.

Fake News Detection Overview The proliferation of disinformation across social media has led the application of deep learning techniques to detect fak

Kushal Shingote 1 Feb 08, 2022
Translators - is a library which aims to bring free, multiple, enjoyable translation to individuals and students in Python

Translators - is a library which aims to bring free, multiple, enjoyable translation to individuals and students in Python

UlionTse 907 Dec 27, 2022
Machine translation models released by the Gourmet project

Gourmet Models Overview The Gourmet project has released several machine translation models to translate low-resource languages. This repository conta

Edinburgh NLP 5 Dec 08, 2021
MMDA - multimodal document analysis

MMDA - multimodal document analysis

AI2 75 Jan 04, 2023
NLP-Project - Used an API to scrape 2000 reddit posts, then used NLP analysis and created a classification model to mixed succcess

Project 3: Web APIs & NLP Problem Statement How do r/Libertarian and r/Neoliberal differ on Biden post-inaguration? The goal of the project is to see

Adam Muhammad Klesc 2 Mar 29, 2022
Sentence Embeddings with BERT & XLNet

Sentence Transformers: Multilingual Sentence Embeddings using BERT / RoBERTa / XLM-RoBERTa & Co. with PyTorch This framework provides an easy method t

Ubiquitous Knowledge Processing Lab 9.1k Jan 02, 2023
本插件是pcrjjc插件的重置版,可以独立于后端api运行

pcrjjc2 本插件是pcrjjc重置版,不需要使用其他后端api,但是需要自行配置客户端 本项目基于AGPL v3协议开源,由于项目特殊性,禁止基于本项目的任何商业行为 配置方法 环境需求:.net framework 4.5及以上 jre8 别忘了装jre8 别忘了装jre8 别忘了装jre8

132 Dec 26, 2022
Black for Python docstrings and reStructuredText (rst).

Style-Doc Style-Doc is Black for Python docstrings and reStructuredText (rst). It can be used to format docstrings (Google docstring format) in Python

Telekom Open Source Software 13 Oct 24, 2022
Scikit-learn style model finetuning for NLP

Scikit-learn style model finetuning for NLP Finetune is a library that allows users to leverage state-of-the-art pretrained NLP models for a wide vari

indico 665 Dec 17, 2022
Crie tokens de autenticação íntegros e seguros com UToken.

UToken - Tokens seguros. UToken (ou Unhandleable Token) é uma bilioteca criada para ser utilizada na geração de tokens seguros e íntegros, ou seja, nã

Jaedson Silva 0 Nov 29, 2022
A repository to run gpt-j-6b on low vram machines (4.2 gb minimum vram for 2000 token context, 3.5 gb for 1000 token context). Model loading takes 12gb free ram.

Basic-UI-for-GPT-J-6B-with-low-vram A repository to run GPT-J-6B on low vram systems by using both ram, vram and pinned memory. There seem to be some

90 Dec 25, 2022
An algorithm that can solve the word puzzle Wordle with an optimal number of guesses on HARD mode.

WordleSolver An algorithm that can solve the word puzzle Wordle with an optimal number of guesses on HARD mode. How to use the program Copy this proje

Akil Selvan Rajendra Janarthanan 3 Mar 02, 2022
The code for the Subformer, from the EMNLP 2021 Findings paper: "Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers", by Machel Reid, Edison Marrese-Taylor, and Yutaka Matsuo

Subformer This repository contains the code for the Subformer. To help overcome this we propose the Subformer, allowing us to retain performance while

Machel Reid 10 Dec 27, 2022
Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modifies the input text with a textual template and directly uses PLMs to conduct pre

THUNLP 2.3k Jan 08, 2023