README

Code for the paper Asymptotics of L2 Regularized Network Embeddings.

Requirements

Requires StellarGraph 1.2.1, TensorFlow 2.6.0, scikit-learn 0.24.1, and tqdm, along with any dependencies of these packages.

Code

To run node classification or link prediction experiments, run

python -m code.train_embed [[args]]

or

python -m code.train_embed_link [[args]]

from the command line, respectively, where [[args]] stands for the command line arguments of each script. The scripts expect to be run from the parent directory of the code folder; if you move the Python files, you will need to update their import statements. Passing -h displays the arguments (with descriptions) of either script.
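
As an illustration of the invocation pattern above, the following is a minimal sketch that assembles the commands for a small sweep over `--reg-weight` values. The helper name `build_command` is hypothetical, and treating `--norm-reg` as a valueless switch is an assumption based on the argument table listing no default for it; check the `-h` output before relying on it.

```python
import sys


def build_command(module, options, flags=()):
    """Assemble an argv list for `python -m <module>` (hypothetical helper)."""
    cmd = [sys.executable, "-m", module]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    # Assumption: boolean options such as --norm-reg are valueless switches.
    cmd += [f"--{flag}" for flag in flags]
    return cmd


# One command per regularization weight.
commands = [
    build_command(
        "code.train_embed",
        {"dataset": "Cora", "method": "node2vec", "reg-weight": weight},
        flags=["norm-reg"],
    )
    for weight in (0.0, 0.001, 0.01)
]

for cmd in commands:
    # Show the arguments that follow the interpreter path.
    print(" ".join(cmd[1:]))
    # To launch a run: import subprocess; subprocess.run(cmd, check=True)
```

Each command should be launched from the parent directory of the code folder, as noted above.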

train_embed.py arguments

short	long	default	help
`-h`	`--help`		show this help message and exit
	`--dataset`	`Cora`	Dataset to perform training on. Available options: Cora,CiteSeer,PubMedDiabetes
	`--emb-size`	`128`	Embedding dimension. Defaults to 128.
	`--reg-weight`	`0.0`	Weight to use for L2 regularization. If norm_reg is True, then reg_weight/num_of_nodes is used instead.
	`--norm-reg`		Boolean for whether to normalize the L2 regularization weight by the number of nodes in the graph. Defaults to false.
	`--method`	`node2vec`	Algorithm to perform training on. Available options: node2vec,GraphSAGE,GCN,DGI
	`--verbose`	`1`	Level of verbosity. Defaults to 1.
	`--epochs`	`5`	Number of epochs through the dataset to be used for training.
	`--optimizer`	`Adam`	Optimization algorithm to use for training.
	`--learning-rate`	`0.001`	Learning rate to use for optimization.
	`--batch-size`	`64`	Batch size used for training.
	`--train-split`	`[0.01, 0.025, 0.05]`	Percentage(s) to use for the training split when using the learned embeddings for downstream classification tasks.
	`--train-split-num`	`25`	Number of random training/test splits to use for evaluating performance. Defaults to 25.
	`--output-fname`	`None`	If not None, saves the hyperparameters and testing results to a .json file with filename given by the argument.
	`--node2vec-p`	`1.0`	Hyperparameter governing probability of returning to source node.
	`--node2vec-q`	`1.0`	Hyperparameter governing probability of moving to a node away from the source node.
	`--node2vec-walk-number`	`50`	Number of walks used to generate a sample for node2vec.
	`--node2vec-walk-length`	`5`	Walk length to use for node2vec.
	`--dgi-sampler`	`fullbatch`	Specifies either a fullbatch or a minibatch sampling scheme for DGI.
	`--gcn-activation`	`['relu']`	Determines the activations of each layer within a GCN. Defaults to a single layer with relu activation.
	`--graphSAGE-aggregator`	`mean`	Specifies the aggregation rule used in GraphSAGE. Defaults to mean pooling.
	`--graphSAGE-nbhd-sizes`	`[10, 5]`	Specify multiple neighbourhood sizes for sampling in GraphSAGE. Defaults to [10, 5].
	`--tensorboard`		If toggled, saves Tensorboard logs for debugging purposes.
	`--visualize-embeds`	`None`	If specified with a directory, saves an image of a TSNE 2D projection of the learned embeddings at the specified directory.
	`--save-spectrum`	`None`	If specified, saves the spectrum of the learned embeddings output by the algorithm.
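
The interaction between `--reg-weight` and `--norm-reg` described in the table can be written out as a short sketch. The function name `effective_reg_weight` and the 1000-node graph are hypothetical; the arithmetic follows the help text (reg_weight/num_of_nodes when normalization is on) and is not taken from the scripts themselves.

```python
def effective_reg_weight(reg_weight, num_nodes, norm_reg=False):
    """Return the L2 weight actually applied, following the --reg-weight help text."""
    if norm_reg:
        # With --norm-reg, the weight is divided by the number of nodes.
        return reg_weight / num_nodes
    return reg_weight


# Hypothetical graph with 1000 nodes.
print(effective_reg_weight(0.5, 1000))                 # 0.5
print(effective_reg_weight(0.5, 1000, norm_reg=True))  # 0.0005
```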

train_embed_link.py arguments

short	long	default	help
`-h`	`--help`		show this help message and exit
	`--dataset`	`Cora`	Dataset to perform training on. Available options: Cora,CiteSeer,PubMedDiabetes
	`--emb-size`	`128`	Embedding dimension. Defaults to 128.
	`--reg-weight`	`0.0`	Weight to use for L2 regularization. If norm_reg is True, then reg_weight/num_of_nodes is used instead.
	`--norm-reg`		Boolean for whether to normalize the L2 regularization weight by the number of nodes in the graph. Defaults to false.
	`--method`	`node2vec`	Algorithm to perform training on. Available options: node2vec,GraphSAGE,GCN,DGI
	`--verbose`	`1`	Level of verbosity. Defaults to 1.
	`--epochs`	`5`	Number of epochs through the dataset to be used for training.
	`--optimizer`	`Adam`	Optimization algorithm to use for training.
	`--learning-rate`	`0.001`	Learning rate to use for optimization.
	`--batch-size`	`64`	Batch size used for training.
	`--test-split`	`0.1`	Split of edge/non-edge set to be used for testing.
	`--output-fname`	`None`	If not None, saves the hyperparameters and testing results to a .json file with filename given by the argument.
	`--node2vec-p`	`1.0`	Hyperparameter governing probability of returning to source node.
	`--node2vec-q`	`1.0`	Hyperparameter governing probability of moving to a node away from the source node.
	`--node2vec-walk-number`	`50`	Number of walks used to generate a sample for node2vec.
	`--node2vec-walk-length`	`5`	Walk length to use for node2vec.
	`--gcn-activation`	`['relu']`	Specifies layers in terms of their output activation (either relu or linear), with the number of arguments determining the length of the GCN. Defaults to a single layer with relu activation.
	`--graphSAGE-aggregator`	`mean`	Specifies the aggregation rule used in GraphSAGE. Defaults to mean pooling.
	`--graphSAGE-nbhd-sizes`	`[10, 5]`	Specify multiple neighbourhood sizes for sampling in GraphSAGE. Defaults to [10, 5].
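
To make the `--test-split` option concrete, here is a minimal sketch of one common way to build a link prediction test set: hold out a fraction of the edges as positives and sample the same number of non-edges as negatives. This is an illustration under assumptions, not the repository's implementation; the function `split_edges` and the toy ring graph are hypothetical.

```python
import random


def split_edges(edges, num_nodes, test_split=0.1, seed=0):
    """Hold out a fraction of edges and sample as many non-edges (hypothetical helper)."""
    rng = random.Random(seed)
    # Treat the graph as undirected: store each edge as a sorted pair.
    edge_set = {tuple(sorted(e)) for e in edges}
    shuffled = sorted(edge_set)
    rng.shuffle(shuffled)
    n_test = max(1, int(test_split * len(shuffled)))
    test_pos, train_edges = shuffled[:n_test], shuffled[n_test:]

    # Rejection-sample node pairs that are not edges of the original graph.
    test_neg = set()
    while len(test_neg) < n_test:
        u, v = rng.sample(range(num_nodes), 2)
        pair = (min(u, v), max(u, v))
        if pair not in edge_set:
            test_neg.add(pair)
    return train_edges, test_pos, sorted(test_neg)


# Toy undirected graph: a ring on 20 nodes.
ring = [(i, (i + 1) % 20) for i in range(20)]
train_edges, test_pos, test_neg = split_edges(ring, num_nodes=20, test_split=0.1)
print(len(train_edges), len(test_pos), len(test_neg))  # 18 2 2
```

The rejection sampling loop assumes the graph is sparse enough that non-edges are easy to find, which holds for citation graphs of the kind listed under `--dataset`.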


aday651/embed-reg
