Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)

Overview

Version repo size Arxiv build badge coverage badge benedekrozemberczki


Little Ball of Fur is a graph sampling extension library for Python.

Please look at the Documentation, relevant Paper, Promo video and External Resources.

Little Ball of Fur consists of methods that can sample from graph structured data. To put it simply it is a Swiss Army knife for graph sampling tasks. First, it includes a large variety of vertex, edge, and exploration sampling techniques. Second, it provides a unified application public interface which makes the application of sampling algorithms trivial for end-users. Implemented methods cover a wide range of networking (Networking, INFOCOM, SIGCOMM) and data mining (KDD, TKDD, ICDE) conferences, workshops, and pieces from prominent journals.


Citing

If you find Little Ball of Fur useful in your research, please consider citing the following paper:

@inproceedings{littleballoffur,
               title={{Little Ball of Fur: A Python Library for Graph Sampling}},
               author={Benedek Rozemberczki and Oliver Kiss and Rik Sarkar},
               year={2020},
               pages = {3133–3140},
               booktitle={Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM '20)},
               organization={ACM},
}

A simple example

Little Ball of Fur makes using modern graph subsampling techniques quite easy (see here for the accompanying tutorial). For example, this is all it takes to use Diffusion Sampling on a Watts-Strogatz graph:

import networkx as nx
from littleballoffur import DiffusionSampler

graph = nx.newman_watts_strogatz_graph(1000, 20, 0.05)

sampler = DiffusionSampler()

new_graph = sampler.sample(graph)

Methods included

In detail, the following sampling methods were implemented.

Node Sampling

Edge Sampling

Exploration Based Sampling

Head over to our documentation to find out more about installation and data handling, a full list of implemented methods, and datasets. For a quick start, check out our examples.

If you notice anything unexpected, please open an issue and let us know. If you are missing a specific method, feel free to open a feature request. We are motivated to constantly make Little Ball of Fur even better.


Installation

Little Ball of Fur can be installed with the following pip command.

$ pip install littleballoffur

As we create new releases frequently, upgrading the package casually might be beneficial.

$ pip install littleballoffur --upgrade

Running examples

As part of the documentation we provide a number of use cases to show how to use various sampling techniques. These can accessed here with detailed explanations.

Besides the case studies we provide synthetic examples for each model. These can be tried out by running the scripts in the examples folder. You can try out the random walk sampling example by running:

$ cd examples
$ python ./exploration_sampling/randomwalk_sampler.py

Running tests

$ python setup.py test

License

Comments
  • change initial num of nodes formula

    change initial num of nodes formula

    to avoid having more initial nodes than the requested final number of nodes (when the final number of nodes requested is much smaller than the graph size).

    opened by bricaud 7
  • Error install dependency networkit==7.1

    Error install dependency networkit==7.1

    I didn't manage to install littleballoffur due to one of its dependency that seems outdated. It didn't work to install networkit==7.1 but I did manage to run its latest version. However, littleballoffur runs on networkit==7.1.

    I am using a Jupyter notebook as an environment and the following system specs: posix Darwin 21.4.0 3.8.12 (default, Mar 17 2022, 14:54:15) [Clang 13.0.0 (clang-1300.0.29.30)]

    The specific error output:

    Collecting networkit==7.1
      Using cached networkit-7.1.tar.gz (3.1 MB)
      Preparing metadata (setup.py) ... error
      error: subprocess-exited-with-error
      
      × python setup.py egg_info did not run successfully.
      │ exit code: 1
      ╰─> [2 lines of output]
          ERROR: No suitable compiler found. Install any of these:  ['g++', 'g++-8', 'g++-7', 'g++-6.1', 'g++-6', 'g++-5.3', 'g++-5.2', 'g++-5.1', 'g++-5', 'g++-4.9', 'g++-4.8', 'clang++', 'clang++-3.8', 'clang++-3.7']
          If using AppleClang, OpenMP might be needed. Install with: 'brew install libomp'
          [end of output]
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: metadata-generation-failed
    
    × Encountered error while generating package metadata.
    ╰─> See above for output.
    
    note: This is an issue with the package mentioned above, not pip.
    hint: See above for details.
    
    

    Please note that: libomp 14.0.0 is already installed and up-to-date.

    Is there some way I could install the library on networkit v10? Thanks a lot!

    opened by CristinBSE 6
  • Node attributes are not copied from original graph

    Node attributes are not copied from original graph

    Breadth and Depth First Search return me subgraphs without correct attributes on nodes/edges. Actually, I found that the dict containing those attributes has been completely deleted in the sampled graph. Is this a known issue? Is the sampler supposed to work in this way?

    opened by jungla88 6
  • Why can't I use the graph imported by nx.read_edgelist()

    Why can't I use the graph imported by nx.read_edgelist()

    graph = nx.read_edgelist("filename", nodetype=int, data=(("Weight", int),))

    error : AssertionError: Graph is not connected. why? 'graph' is a networkx graph

    opened by DeathSentence 5
  • Spikyball exploration sampling

    Spikyball exploration sampling

    You might find the change a bit invasive (understandable :) This adds a new family exploration sampling method (spikyball) described in the paper Spikyball sampling: Exploring large networks via an inhomogeneous filtered diffusion available here https://arxiv.org/abs/2010.11786 and submitted for publication in Combinatorial Optimization, Graph, and Network Algorithms journal. The version number has been increased in order not to collide with official releases of lbof, you might want to change this...

    opened by naspert 4
  • Assumptions on graph properties

    Assumptions on graph properties

    Hi there,

    I am wondering if it would be possible to relax some constrain the graph has to satisfy in order to start an exploration on it. In particular, the requirement of connectivity seems a bit strong to me. I think a graph sampling procedure could easily deal with such property, since in the case the graph is not connected the sampling could take place on the single connected components or the exploration could rely on the neighborhood of the current node explored. For node sampling strategies like BFS and DFS looks pretty natural to me, also for Random Walk Sampling (maybe the one with the restart probability could be a little tricky). Something strange could probably happen for edge sampling if the connectivity property is not satisfied. Do you see any possibility to extend little ball of fur to such type of graphs? What was the reason that bring you to assume the connectivity property for graphs?

    Thank you !

    opened by jungla88 3
  • Error importing DiffusionSampler

    Error importing DiffusionSampler

    Hello,

    First of all, thank you for your great work building this library. Great extension to NetworkX.

    I am facing an issue when trying to import the DiffusionSampler specifically. All the other samplers get imported just fine. However the DiffusionSampler raises import issue.

    I am using a Jupyter notebook as an environment.

    The specific error output:

    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-29-fbd222d9c756> in <module>
    ----> 1 from littleballoffur import DiffusionSampler
          2 
          3 
          4 model = DiffusionSampler()
          5 new_graph = model.sample(wd50k_connected_relabeled)
    
    ImportError: cannot import name 'DiffusionSampler' from 'littleballoffur'
    

    Is this replicable?

    Thank you in advance for looking into it.

    opened by DimitrisAlivas 3
  • ForestFireSampler throws exceptions for some seed values

    ForestFireSampler throws exceptions for some seed values

    Hi,

    I am trying to sample an undirected, connected graph of 5559 nodes and 10804 edges into a sample of 100 nodes. As I loop over the "creation of samples" part, I am altering the seed for the ForeFireSampler every time to obtain a different sample.

    E.g. seed_value = random.randint(1,2147483646) sampler = ForestFireSampler(100, seed=seed_value )

    However, for some runs I get an exception thrown, which is also reproducible. I assume it is related to specific seed values which the sampler doesn´t seem to be able to handle. An example is seed value 1176372277.

    Traceback (most recent call last): File "/project/topology_extraction.py", line 472, in abstraction_G = graph_sampling(S) File "/project/topology_extraction.py", line 234, in graph_sampling new_graph = sampler.sample(S) File "/usr/local/lib/python3.8/dist-packages/littleballoffur/exploration_sampling/forestfiresampler.py", line 74, in sample self._start_a_fire(graph) File "/usr/local/lib/python3.8/dist-packages/littleballoffur/exploration_sampling/forestfiresampler.py", line 47, in _start_a_fire top_node = node_queue.popleft() IndexError: pop from an empty deque

    Process finished with exit code 1

    I believe this is a bug in the library.

    Thanks! Nils

    opened by nrodday 3
  • Error in forest fire sampling

    Error in forest fire sampling

    Hi,

    While running the forest fire sampling code, I got an error that it is trying to pop an element from an empty deque.

    File "/opt/anaconda3/lib/python3.7/site-packages/littleballoffur/exploration_sampling/forestfiresampler.py", line 47, in _start_a_fire top_node = node_queue.popleft() IndexError: pop from an empty deque

    I am not sure if it was due to data or needs an empty/try-catch check or should it be handled by application code. Hence opened an issue.

    Thank you

    opened by apurvamulay 2
  • Broken link in Readme (readdthedocs)

    Broken link in Readme (readdthedocs)

    https://littleballoffur.readthedocs.io/en/latest/notes/introduction.html

    as of 2020-05-18 9:35 AM EDT, it says "sorry this page does not exist"

    opened by bbrewington 1
  • Error in line 254 _checking_indexing() of backend.py

    Error in line 254 _checking_indexing() of backend.py

    According to your code, once numeric_indices != node_indices, the error raises. Under my scenario, I constructed a networkx graph in which the indices of nodes start from '1', and then, the sampler did not work. This error will be triggered if the indices of nodes in a networkx graph do not start from '0'. I have to adjust my graph such that the indices of nodes start from '0' to utilize your samplers. I hope you can refine this part of the code to avoid someone else meets this problem.

    opened by Haoran-Young 0
Releases(v_20200)
Owner
Benedek Rozemberczki
Machine Learning Engineer at AstraZeneca and PhD candidate at The University of Edinburgh.
Benedek Rozemberczki
Decision Tree Regression algorithm implemented on Python from scratch.

Decision_Tree_Regression I implemented the decision tree regression algorithm on Python. Unlike regular linear regression, this algorithm is used when

1 Dec 22, 2021
scikit-learn models hyperparameters tuning and feature selection, using evolutionary algorithms.

Sklearn-genetic-opt scikit-learn models hyperparameters tuning and feature selection, using evolutionary algorithms. This is meant to be an alternativ

Rodrigo Arenas 180 Dec 20, 2022
Lseng-iseng eksplor Machine Learning dengan menggunakan library Scikit-Learn

Kalo dengar istilah ML, biasanya rada ambigu. Soalnya punya beberapa kepanjangan, seperti Mobile Legend, Makan Lontong, Ma**ng L*v* dan lain-lain. Tapi pada repo ini membahas Machine Learning :)

Alfiyanto Kondolele 1 Apr 06, 2022
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Machine Learning Notebooks, 3rd edition This project aims at teaching you the fundamentals of Machine Learning in python. It contains the example code

Aurélien Geron 1.6k Jan 05, 2023
Implementation of K-Nearest Neighbors Algorithm Using PySpark

KNN With Spark Implementation of KNN using PySpark. The KNN was used on two separate datasets (https://archive.ics.uci.edu/ml/datasets/iris and https:

Zachary Petroff 4 Dec 30, 2022
Convoys is a simple library that fits a few statistical model useful for modeling time-lagged conversions.

Convoys is a simple library that fits a few statistical model useful for modeling time-lagged conversions. There is a lot more info if you head over to the documentation. You can also take a look at

Better 240 Dec 26, 2022
Regularization and Feature Selection in Least Squares Temporal Difference Learning

Regularization and Feature Selection in Least Squares Temporal Difference Learning Description This is Python implementations of Least Angle Regressio

Mina Parham 0 Jan 18, 2022
Interactive Parallel Computing in Python

Interactive Parallel Computing with IPython ipyparallel is the new home of IPython.parallel. ipyparallel is a Python package and collection of CLI scr

IPython 2.3k Dec 30, 2022
Pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code

pandas-method-chaining pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code. It is a fork from pandas-v

Francis 5 May 14, 2022
PennyLane is a cross-platform Python library for differentiable programming of quantum computers

PennyLane is a cross-platform Python library for differentiable programming of quantum computers. Train a quantum computer the same way as a neural ne

PennyLaneAI 1.6k Jan 01, 2023
Meerkat provides fast and flexible data structures for working with complex machine learning datasets.

Meerkat makes it easier for ML practitioners to interact with high-dimensional, multi-modal data. It provides simple abstractions for data inspection, model evaluation and model training supported by

Robustness Gym 115 Dec 12, 2022
Deep Survival Machines - Fully Parametric Survival Regression

Package: dsm Python package dsm provides an API to train the Deep Survival Machines and associated models for problems in survival analysis. The under

Carnegie Mellon University Auton Lab 10 Dec 30, 2022
A Python toolkit for rule-based/unsupervised anomaly detection in time series

Anomaly Detection Toolkit (ADTK) Anomaly Detection Toolkit (ADTK) is a Python package for unsupervised / rule-based time series anomaly detection. As

Arundo Analytics 888 Dec 30, 2022
Metric learning algorithms in Python

metric-learn: Metric Learning in Python metric-learn contains efficient Python implementations of several popular supervised and weakly-supervised met

1.3k Dec 28, 2022
CyLP is a Python interface to COIN-OR’s Linear and mixed-integer program solvers (CLP, CBC, and CGL)

CyLP CyLP is a Python interface to COIN-OR’s Linear and mixed-integer program solvers (CLP, CBC, and CGL). CyLP’s unique feature is that you can use i

COIN-OR Foundation 161 Dec 14, 2022
Time-series momentum for momentum investing strategy

Time-series-momentum Time-series momentum strategy. You can use the data_analysis.py file to find out the best trigger and window for a given asset an

Victor Caldeira 3 Jun 18, 2022
A repository for collating all the resources such as articles, blogs, papers, and books related to Bayesian Statistics.

A repository for collating all the resources such as articles, blogs, papers, and books related to Bayesian Statistics.

Aayush Malik 80 Dec 12, 2022
Hierarchical Time Series Forecasting using Prophet

htsprophet Hierarchical Time Series Forecasting using Prophet Credit to Rob J. Hyndman and research partners as much of the code was developed with th

Collin Rooney 131 Dec 02, 2022
Time Series Prediction with tf.contrib.timeseries

TensorFlow-Time-Series-Examples Additional examples for TensorFlow Time Series(TFTS). Read a Time Series with TFTS From a Numpy Array: See "test_input

Zhiyuan He 476 Nov 17, 2022
XAI - An eXplainability toolbox for machine learning

XAI - An eXplainability toolbox for machine learning XAI is a Machine Learning library that is designed with AI explainability in its core. XAI contai

The Institute for Ethical Machine Learning 875 Dec 27, 2022