Make sankey, alluvial and sankey bump plots in ggplot

Overview

ggsankey

The goal of ggsankey is to make beautiful sankey, alluvial and sankey bump plots in ggplot2

Installation

You can install the development version of ggsankey from github with:

# install.packages("devtools")
devtools::install_github("davidsjoberg/ggsankey")

How does it work

Google defines a sankey as:

A sankey diagram is a visualization used to depict a flow from one set of values to another. The things being connected are called nodes and the connections are called links. Sankeys are best used when you want to show a many-to-many mapping between two domains or multiple paths through a set of stages.

To plot a sankey diagram with ggsankey each observation has a stage (called a discrete x-value in ggplot) and be part of a node. Furthermore, each observation needs to have instructions of which node it will belong to in the next stage. See the image below for some clarification.

Hence, to use geom_sankey the aestethics x, next_x, node and next_node are required. The last stage should point to NA. The aestethics fill and color will affect both nodes and flows.

To controll geometries (not changed by data) like fill, color, size, alpha etc for nodes and flows you can either choose to set a global value that affect both, or you can specify which one you want to alter. For example node.color = 'black' will only draw a black line around the nodes, but not the flows (links).

Example

geom_sankey

A basic sankey plot that shows how dimensions are linked.

library(ggsankey)
library(dplyr)
library(ggplot2)

df <- mtcars %>%
  make_long(cyl, vs, am, gear, carb)

ggplot(df, aes(x = x, 
               next_x = next_x, 
               node = node, 
               next_node = next_node,
               fill = factor(node))) +
  geom_sankey()

And by adding a little pimp.

  • Labels with geom_sankey_label which places labels in the center of nodes if given the same aestethics.

  • ggsankey also comes with custom minimalistic themes that can be used. Here I use theme_sankey.

ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node, fill = factor(node), label = node)) +
  geom_sankey(flow.alpha = .6,
              node.color = "gray30") +
  geom_sankey_label(size = 3, color = "white", fill = "gray40") +
  scale_fill_viridis_d() +
  theme_sankey(base_size = 18) +
  labs(x = NULL) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5)) +
  ggtitle("Car features")

geom_alluvial

Alluvial plots are very similiar to sankey plots but have no spaces between nodes and start at y = 0 instead being centered around the x-axis.

ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node, fill = factor(node), label = node)) +
  geom_alluvial(flow.alpha = .6) +
  geom_alluvial_text(size = 3, color = "white") +
  scale_fill_viridis_d() +
  theme_alluvial(base_size = 18) +
  labs(x = NULL) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5)) +
  ggtitle("Car features")

geom_sankey_bump

Sankey bump plots is mix between bump plots and sankey and mostly useful for time series. When a group becomes larger than another it bumps above it.

# install.packages("gapminder")
library(gapminder)

df <- gapminder %>%
  group_by(continent, year) %>%
  summarise(gdp = (sum_(pop * gdpPercap)/1e9) %>% round(0), .groups = "keep") %>%
  ungroup()

ggplot(df, aes(x = year,
               node = continent,
               fill = continent,
               value = gdp)) +
  geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 6) +
  scale_fill_viridis_d(option = "A", alpha = .8) +
  theme_sankey_bump(base_size = 16) +
  labs(x = NULL,
       y = "GDP ($ bn)",
       fill = NULL,
       color = NULL) +
  theme(legend.position = "bottom") +
  labs(title = "GDP development per continent")

Owner
David Sjoberg
Happy R user. Twitter: @davsjob
David Sjoberg
Application for viewing pokemon regional variants.

Pokemon Regional Variants Application Application for viewing pokemon regional variants. Run The Source Code Download Python https://www.python.org/do

Michael J Bailey 4 Oct 08, 2021
Productivity Tools for Plotly + Pandas

Cufflinks This library binds the power of plotly with the flexibility of pandas for easy plotting. This library is available on https://github.com/san

Jorge Santos 2.7k Dec 30, 2022
Visualize and compare datasets, target values and associations, with one line of code.

In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code! Sweetviz is an open-source Python library that generat

Francois Bertrand 2.3k Jan 05, 2023
Visualization Website by using Dash and Heroku

Visualization Website by using Dash and Heroku You can visit the website https://payroll-expense-analysis.herokuapp.com/ In this project, I am interes

YF Liu 1 Jan 14, 2022
Generating interfaces(CLI, Qt GUI, Dash web app) from a Python function.

oneFace is a Python library for automatically generating multiple interfaces(CLI, GUI, WebGUI) from a callable Python object. oneFace is an easy way t

NaNg 31 Oct 21, 2022
Visual Python is a GUI-based Python code generator, developed on the Jupyter Notebook environment as an extension.

Visual Python is a GUI-based Python code generator, developed on the Jupyter Notebook environment as an extension.

Visual Python 564 Jan 03, 2023
An open-source tool for visual and modular block programing in python

PyFlow PyFlow is an open-source tool for modular visual programing in python ! Although for now the tool is in Beta and features are coming in bit by

1.1k Jan 06, 2023
Material for dataviz course at university of Bordeaux

Material for dataviz course at university of Bordeaux

Nicolas P. Rougier 50 Jul 17, 2022
LabGraph is a a Python-first framework used to build sophisticated research systems with real-time streaming, graph API, and parallelism.

LabGraph is a a Python-first framework used to build sophisticated research systems with real-time streaming, graph API, and parallelism.

MLH Fellowship 7 Oct 05, 2022
High performance, editable, stylable datagrids in jupyter and jupyterlab

An ipywidgets wrapper of regular-table for Jupyter. Examples Two Billion Rows Notebook Click Events Notebook Edit Events Notebook Styling Notebook Pan

J.P. Morgan Chase 75 Dec 15, 2022
GD-UltraHack - A Mod Menu for Geometry Dash. Specifically a MegahackV5 clone in Python. Only for Windows

GD UltraHack: The Mod Menu that Nobody asked for. This is a mod menu for the gam

zeo 1 Jan 05, 2022
Python script for writing text on github contribution chart.

Github Contribution Drawer Python script for writing text on github contribution chart. Requirements Python 3.X Getting Started Create repository Put

Steven 0 May 27, 2022
Histogramming for analysis powered by boost-histogram

Hist Hist is an analyst-friendly front-end for boost-histogram, designed for Python 3.7+ (3.6 users get version 2.4). See what's new. Installation You

Scikit-HEP Project 97 Dec 25, 2022
Simple Python interface for Graphviz

Simple Python interface for Graphviz

Sebastian Bank 1.3k Dec 26, 2022
https://there.oughta.be/a/macro-keyboard

inkkeys Details and instructions can be found on https://there.oughta.be/a/macro-keyboard In contrast to most of my other projects, I decided to put t

Sebastian Staacks 209 Dec 21, 2022
3D-Lorenz-Attractor-simulation-with-python

3D-Lorenz-Attractor-simulation-with-python Animação 3D da trajetória do Atrator de Lorenz, implementada em Python usando o método de Runge-Kutta de 4ª

Hevenicio Silva 17 Dec 08, 2022
Interactive Data Visualization in the browser, from Python

Bokeh is an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords hi

Bokeh 17.1k Dec 31, 2022
股票行情实时数据接口-A股,完全免费的沪深证券股票数据-中国股市,python最简封装的API接口

股票行情实时数据接口-A股,完全免费的沪深证券股票数据-中国股市,python最简封装的API接口,包含日线,历史K线,分时线,分钟线,全部实时采集,系统包括新浪腾讯双数据核心采集获取,自动故障切换,STOCK数据格式成DataFrame格式,可用来查询研究量化分析,股票程序自动化交易系统.为量化研究者在数据获取方面极大地减轻工作量,更加专注于策略和模型的研究与实现。

dev 572 Jan 08, 2023
A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews

hvPlot A high-level plotting API for the PyData ecosystem built on HoloViews. Build Status Coverage Latest dev release Latest release Docs What is it?

HoloViz 697 Jan 06, 2023
Bioinformatics tool for exploring RNA-Protein interactions

Explore RNA-Protein interactions. RNPFind is a bioinformatics tool. It takes an RNA transcript as input and gives a list of RNA binding protein (RBP)

Nahin Khan 3 Jan 27, 2022