A Graph Learning library for Humans

Overview

A Graph Learning library for Humans

These novel algorithms include but are not limited to:

  • A graph construction and graph searching class can be found here (NodeGraph). It was developed and invented as a faster alternative for hierarchical DAG construction and searching.
  • A fast DBSCAN method utilizing my connectivity code as invented during my PhD.
  • A NLP pattern matching algorithm useful for sequence alignment clustering.
  • High dimensional alignment code for aligning models to data.
  • An SVD based variant of the Distance Geometry algorithm. For going from relative to absolute coordinates.

License DOI Downloads

Visit the active code via : https://github.com/richardtjornhammar/graphtastic

Pip installation with :

pip install graphtastic

Version controlled installation of the Graphtastic library

The Graphtastic library

In order to run these code snippets we recommend that you download the nix package manager. Nix package manager links from Februari 2022:

https://nixos.org/download.html

$ curl -L https://nixos.org/nix/install | sh

If you cannot install it using your Wintendo then please consider installing Windows Subsystem for Linux first:

https://docs.microsoft.com/en-us/windows/wsl/install-win10

In order to run the code in this notebook you must enter a sensible working environment. Don't worry! We have created one for you. It's version controlled against python3.9 (and experimental python3.10 support) and you can get the file here:

https://github.com/richardtjornhammar/graphtastic/blob/master/env/env39.nix

Since you have installed Nix as well as WSL, or use a Linux (NixOS) or bsd like system, you should be able to execute the following command in a termnial:

$ nix-shell env39.nix

Now you should be able to start your jupyter notebook locally:

$ jupyter-notebook graphhaxxor.ipynb

and that's it.

EXAMPLE 0

Running

import graphtastic.graphs as gg
import graphtastic.clustering as gl
import graphtastic.fit as gf
import graphtastic.convert as gc

Should work if the install was succesful

Example 1 : Absolute and relative coordinates

In this example, we will use the SVD based distance geometry method to go between absolute coordinates, relative coordinate distances and back to ordered absolute coordinates. Absolute coordinates are float values describing the position of something in space. If you have several of these then the same information can be conveyed via the pairwise distance graph. Going from absolute coordinates to pairwise distances is simple and only requires you to calculate all the pairwise distances between your absolute coordinates. Going back to mutually orthogonal ordered coordinates from the pariwise distances is trickier, but a solved problem. The distance geometry can be obtained with SVD and it is implemented in the graphtastic.fit module under the name distance_matrix_to_absolute_coordinates. We start by defining coordinates afterwhich we can calculate the pair distance matrix and transforming it back by using the code below

import numpy as np

coordinates = np.array([[-23.7100 ,  24.1000 ,  85.4400],
  [-22.5600 ,  23.7600 ,  85.6500],
  [-21.5500 ,  24.6200 ,  85.3800],
  [-22.2600 ,  22.4200 ,  86.1900],
  [-23.2900 ,  21.5300 ,  86.4800],
  [-20.9300 ,  22.0300 ,  86.4300],
  [-20.7100 ,  20.7600 ,  86.9400],
  [-21.7900 ,  19.9300 ,  87.1900],
  [-23.0300 ,  20.3300 ,  86.9600],
  [-24.1300 ,  19.4200 ,  87.2500],
  [-23.7400 ,  18.0500 ,  87.0000],
  [-24.4900 ,  19.4600 ,  88.7500],
  [-23.3700 ,  19.8900 ,  89.5200],
  [-24.8500 ,  18.0000 ,  89.0900],
  [-23.9600 ,  17.4800 ,  90.0800],
  [-24.6600 ,  17.2400 ,  87.7500],
  [-24.0800 ,  15.8500 ,  88.0100],
  [-23.9600 ,  15.1600 ,  86.7600],
  [-23.3400 ,  13.7100 ,  87.1000],
  [-21.9600 ,  13.8700 ,  87.6300],
  [-24.1800 ,  13.0300 ,  88.1100],
  [-23.2900 ,  12.8200 ,  85.7600],
  [-23.1900 ,  11.2800 ,  86.2200],
  [-21.8100 ,  11.0000 ,  86.7000],
  [-24.1500 ,  11.0300 ,  87.3200],
  [-23.5300 ,  10.3200 ,  84.9800],
  [-23.5400 ,   8.9800 ,  85.4800],
  [-23.8600 ,   8.0100 ,  84.3400],
  [-23.9800 ,   6.5760 ,  84.8900],
  [-23.2800 ,   6.4460 ,  86.1300],
  [-23.3000 ,   5.7330 ,  83.7800],
  [-22.7300 ,   4.5360 ,  84.3100],
  [-22.2000 ,   6.7130 ,  83.3000],
  [-22.7900 ,   8.0170 ,  83.3800],
  [-21.8100 ,   6.4120 ,  81.9200],
  [-20.8500 ,   5.5220 ,  81.5200],
  [-20.8300 ,   5.5670 ,  80.1200],
  [-21.7700 ,   6.4720 ,  79.7400],
  [-22.3400 ,   6.9680 ,  80.8000],
  [-20.0100 ,   4.6970 ,  82.1500],
  [-19.1800 ,   3.9390 ,  81.4700] ]);

if __name__=='__main__':

    import graphtastic.fit as gf

    distance_matrix = gf.absolute_coordinates_to_distance_matrix( coordinates )
    ordered_coordinates = gf.distance_matrix_to_absolute_coordinates( distance_matrix , n_dimensions=3 )

    print ( ordered_coordinates )

You will notice that the largest variation is now aligned with the X axis, the second most variation aligned with the Y axis and the third most, aligned with the Z axis while the graph topology remained unchanged.

Example 2 : Deterministic DBSCAN

DBSCAN is a clustering algorithm that can be seen as a way of rejecting points, from any cluster, that are positioned in low dense regions of a point cloud. This introduces holes and may result in a larger segment, that would otherwise be connected via a non dense link to become disconnected and form two segments, or clusters. The rejection criterion is simple. The central concern is to evaluate a distance matrix with an applied cutoff this turns the distances into true or false values depending on if a pair distance between point i and j is within the distance cutoff. This new binary Neighbour matrix tells you wether or not two points are neighbours (including itself). The DBSCAN criterion states that a point is not part of any cluster if it has fewer than minPts neighbors. Once you've calculated the distance matrix you can immediately evaluate the number of neighbors each point has and the rejection criterion, via . If the rejection vector R value of a point is True then all the pairwise distances in the distance matrix of that point is set to a value larger than epsilon. This ensures that a distance matrix search will reject those points as neighbours of any other for the choosen epsilon. By tracing out all points that are neighbors and assessing the connectivity (search for connectivity) you can find all the clusters.

import numpy as np
from graphtastic.clustering import dbscan, reformat_dbscan_results
from graphtastic.fit import absolute_coordinates_to_distance_matrix

N   = 100
N05 = int ( np.floor(0.5*N) )
R   = 0.25*np.random.randn(N).reshape(N05,2) + 1.5
P   = 0.50*np.random.randn(N).reshape(N05,2)

coordinates = np.array([*P,*R])

results = dbscan ( distance_matrix = absolute_coordinates_to_distance_matrix(coordinates,bInvPow=True) , eps=0.45 , minPts=4 )
clusters = reformat_dbscan_results(results)
print ( clusters )

Example 3 : NodeGraph, distance matrix to DAG

Here we demonstrate how to convert the graph coordinates into a hierarchy. The leaf nodes will correspond to the coordinate positions.

import numpy as np

coordinates = np.array([[-23.7100 ,  24.1000 ,  85.4400],
  [-22.5600 ,  23.7600 ,  85.6500],
  [-21.5500 ,  24.6200 ,  85.3800],
  [-22.2600 ,  22.4200 ,  86.1900],
  [-23.2900 ,  21.5300 ,  86.4800],
  [-20.9300 ,  22.0300 ,  86.4300],
  [-20.7100 ,  20.7600 ,  86.9400],
  [-21.7900 ,  19.9300 ,  87.1900],
  [-23.0300 ,  20.3300 ,  86.9600],
  [-24.1300 ,  19.4200 ,  87.2500],
  [-23.7400 ,  18.0500 ,  87.0000],
  [-24.4900 ,  19.4600 ,  88.7500],
  [-23.3700 ,  19.8900 ,  89.5200],
  [-24.8500 ,  18.0000 ,  89.0900],
  [-23.9600 ,  17.4800 ,  90.0800],
  [-24.6600 ,  17.2400 ,  87.7500],
  [-24.0800 ,  15.8500 ,  88.0100],
  [-23.9600 ,  15.1600 ,  86.7600],
  [-23.3400 ,  13.7100 ,  87.1000],
  [-21.9600 ,  13.8700 ,  87.6300],
  [-24.1800 ,  13.0300 ,  88.1100],
  [-23.2900 ,  12.8200 ,  85.7600],
  [-23.1900 ,  11.2800 ,  86.2200],
  [-21.8100 ,  11.0000 ,  86.7000],
  [-24.1500 ,  11.0300 ,  87.3200],
  [-23.5300 ,  10.3200 ,  84.9800],
  [-23.5400 ,   8.9800 ,  85.4800],
  [-23.8600 ,   8.0100 ,  84.3400],
  [-23.9800 ,   6.5760 ,  84.8900],
  [-23.2800 ,   6.4460 ,  86.1300],
  [-23.3000 ,   5.7330 ,  83.7800],
  [-22.7300 ,   4.5360 ,  84.3100],
  [-22.2000 ,   6.7130 ,  83.3000],
  [-22.7900 ,   8.0170 ,  83.3800],
  [-21.8100 ,   6.4120 ,  81.9200],
  [-20.8500 ,   5.5220 ,  81.5200],
  [-20.8300 ,   5.5670 ,  80.1200],
  [-21.7700 ,   6.4720 ,  79.7400],
  [-22.3400 ,   6.9680 ,  80.8000],
  [-20.0100 ,   4.6970 ,  82.1500],
  [-19.1800 ,   3.9390 ,  81.4700] ]);


if __name__=='__main__':

    import graphtastic.graphs as gg
    import graphtastic.fit as gf
    GN = gg.NodeGraph()
    #
    # bInvPow refers to the distance type. If True then R distances are returned
    # instead of R2 (R**2) distances. That is also computing the square root if True
    #
    distm = gf.absolute_coordinates_to_distance_matrix( coordinates , bInvPow=True )
    #
    # Now a Graph DAG is constructed from the pairwise distances
    GN.distance_matrix_to_graph_dag( distm )
    #
    # And write it to a json file so that we may employ JS visualisations
    # such as D3 or other nice packages to view our hierarchy
    GN.write_json( jsonfile='./graph_hierarchy.json' )

Manually updated code backups for this library :

GitLab | https://gitlab.com/richardtjornhammar/graphtastic

CSDN | https://codechina.csdn.net/m0_52121311/graphtastic

You might also like...
Fastest Gephi's ForceAtlas2 graph layout algorithm implemented for Python and NetworkX
Fastest Gephi's ForceAtlas2 graph layout algorithm implemented for Python and NetworkX

ForceAtlas2 for Python A port of Gephi's Force Atlas 2 layout algorithm to Python 2 and Python 3 (with a wrapper for NetworkX and igraph). This is the

🐍PyNode Next allows you to easily create beautiful graph visualisations and animations
🐍PyNode Next allows you to easily create beautiful graph visualisations and animations

PyNode Next A complete rewrite of PyNode for the modern era. Up to five times faster than the original PyNode. PyNode Next allows you to easily create

LabGraph is a a Python-first framework used to build sophisticated research systems with real-time streaming, graph API, and parallelism.
LabGraph is a a Python-first framework used to build sophisticated research systems with real-time streaming, graph API, and parallelism.

LabGraph is a a Python-first framework used to build sophisticated research systems with real-time streaming, graph API, and parallelism.

Automatization of BoxPlot graph usin Python MatPlotLib and Excel

BoxPlotGraphAutomation Automatization of BoxPlot graph usin Python / Excel. This file is an automation of BoxPlot-Graph using python graph library mat

Library for exploring and validating machine learning data

TensorFlow Data Validation TensorFlow Data Validation (TFDV) is a library for exploring and validating machine learning data. It is designed to be hig

Library for exploring and validating machine learning data

TensorFlow Data Validation TensorFlow Data Validation (TFDV) is a library for exploring and validating machine learning data. It is designed to be hig

Declarative statistical visualization library for Python
Declarative statistical visualization library for Python

Altair http://altair-viz.github.io Altair is a declarative statistical visualization library for Python. With Altair, you can spend more time understa

Plotting library for IPython/Jupyter notebooks
Plotting library for IPython/Jupyter notebooks

bqplot 2-D plotting library for Project Jupyter Introduction bqplot is a 2-D visualization system for Jupyter, based on the constructs of the Grammar

Cartopy - a cartographic python library with matplotlib support
Cartopy - a cartographic python library with matplotlib support

Cartopy is a Python package designed to make drawing maps for data analysis and visualisation easy. Table of contents Overview Get in touch License an

Releases(v0.12.0)
Owner
Richard Tjörnhammar
PhD in Biological physics https://richardtjornhammar.github.io
Richard Tjörnhammar
Tandem Mass Spectrum Prediction with Graph Transformers

MassFormer This is the original implementation of MassFormer, a graph transformer for small molecule MS/MS prediction. Check out the preprint on arxiv

Röst Lab 13 Oct 27, 2022
Pydrawer: The Python package for visualizing curves and linear transformations in a super simple way

pydrawer 📐 The Python package for visualizing curves and linear transformations in a super simple way. ✏️ Installation Install pydrawer package with

Dylan Tintenfich 56 Dec 30, 2022
A simple python script using Numpy and Matplotlib library to plot a Mohr's Circle when given a two-dimensional state of stress.

Mohr's Circle Calculator This is a really small personal project done for Department of Civil Engineering, Delhi Technological University (formerly, D

Agyeya Mishra 0 Jul 17, 2021
Jupyter notebook and datasets from the pandas Q&A video series

Python pandas Q&A video series Read about the series, and view all of the videos on one page: Easier data analysis in Python with pandas. Jupyter Note

Kevin Markham 2k Jan 05, 2023
Practical-statistics-for-data-scientists - Code repository for O'Reilly book

Code repository Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python by Peter Bruce, Andrew Bruce, and Peter Gedeck Pub

1.7k Jan 04, 2023
A Python package for caclulations and visualizations in geological sciences.

geo_calcs A Python package for caclulations and visualizations in geological sciences. Free software: MIT license Documentation: https://geo-calcs.rea

Drew Heasman 1 Jul 12, 2022
Plot toolbox based on Matplotlib, simple and elegant.

Elegant-Plot Plot toolbox based on Matplotlib, simple and elegant. 绘制效果 绘制过程 数据准备 每种图标类型的目录下有data.csv文件,依据样例数据填入自己的数据。

3 Jul 15, 2022
Use Perspective to create the chart for the trader’s dashboard

Task Overview | Installation Instructions | Link to Module 3 Introduction Experience Technology at JP Morgan Chase Try out what real work is like in t

Abdulazeez Jimoh 1 Jan 22, 2022
PyPassword is a simple follow up to PyPassphrase

PyPassword PyPassword is a simple follow up to PyPassphrase. After finishing that project it occured to me that while some may wish to use that option

Scotty 2 Jan 22, 2022
https://there.oughta.be/a/macro-keyboard

inkkeys Details and instructions can be found on https://there.oughta.be/a/macro-keyboard In contrast to most of my other projects, I decided to put t

Sebastian Staacks 209 Dec 21, 2022
A Python library for plotting hockey rinks with Matplotlib.

Hockey Rink A Python library for plotting hockey rinks with Matplotlib. Installation pip install hockey_rink Current Rinks The following shows the cus

24 Jan 02, 2023
Custom Plotly Dash components based on Mantine React Components library

Dash Mantine Components Dash Mantine Components is a Dash component library based on Mantine React Components Library. It makes it easier to create go

Snehil Vijay 239 Jan 08, 2023
DataVisualization - The evolution of my arduino and python journey. New level of competence achieved

DataVisualization - The evolution of my arduino and python journey. New level of competence achieved

1 Jan 03, 2022
Data parsing and validation using Python type hints

pydantic Data validation and settings management using Python type hinting. Fast and extensible, pydantic plays nicely with your linters/IDE/brain. De

Samuel Colvin 12.1k Jan 06, 2023
🎨 Python Echarts Plotting Library

pyecharts Python ❤️ ECharts = pyecharts English README 📣 简介 Apache ECharts (incubating) 是一个由百度开源的数据可视化,凭借着良好的交互性,精巧的图表设计,得到了众多开发者的认可。而 Python 是一门富有表达

pyecharts 13.1k Jan 03, 2023
FairLens is an open source Python library for automatically discovering bias and measuring fairness in data

FairLens FairLens is an open source Python library for automatically discovering bias and measuring fairness in data. The package can be used to quick

Synthesized 69 Dec 15, 2022
A little word cloud generator in Python

Linux macOS Windows PyPI word_cloud A little word cloud generator in Python. Read more about it on the blog post or the website. The code is tested ag

Andreas Mueller 9.2k Dec 30, 2022
Make sankey, alluvial and sankey bump plots in ggplot

The goal of ggsankey is to make beautiful sankey, alluvial and sankey bump plots in ggplot2

David Sjoberg 156 Jan 03, 2023
Monochromatic colorscheme for matplotlib with opinionated sensible default

Monochromatic colorscheme for matplotlib with opinionated sensible default If you need a simple monochromatic colorscheme for your matplotlib figures,

Aria Ghora Prabono 2 May 06, 2022
Set of matplotlib operations that are not trivial

Matplotlib Snippets This repository contains a set of matplotlib operations that are not trivial. Histograms Histogram with bins adapted to log scale

Raphael Meudec 1 Nov 15, 2021