Python histogram library - histograms as updateable, fully semantic objects with visualization tools. [P]ython [HYST]ograms.

Overview

physt

P(i/y)thon h(i/y)stograms. Inspired by (and based on) numpy.histogram, but designed for humans(TM) on steroids(TM).

The goal is to unify the different concepts of histograms found in numpy, pandas, matplotlib, ROOT, etc. and to create one representation that is easy to manipulate from the data point of view and at the same time integrates nicely into the IPython notebook and offers various plotting options. In short, whatever you want to do with histograms, physt aims to be on your side.

Note: the bokeh plotting backend has been discontinued (because the external library was redesigned).

Join the chat at https://gitter.im/physt/Lobby

Versioning

  • Versions 0.3.x support Python 2.7 (no new releases in 2019)
  • Versions 0.4.x support Python 3.5+ while continuing the 0.3 API
  • Versions 0.4.9+ support only Python 3.6+ while continuing the 0.3 API
  • Versions 0.5.x slightly change the interpretation of *args in h1, h2, ...

Simple example

from physt import h1

# Create the sample
heights = [160, 155, 156, 198, 177, 168, 191, 183, 184, 179, 178, 172, 173, 175,
           172, 177, 176, 175, 174, 173, 174, 175, 177, 169, 168, 164, 175, 188,
           178, 174, 173, 181, 185, 166, 162, 163, 171, 165, 180, 189, 166, 163,
           172, 173, 174, 183, 184, 161, 162, 168, 169, 174, 176, 170, 169, 165]

hist = h1(heights, 10)           # <--- get the histogram data
hist << 190                      # <--- add a forgotten value
hist.plot()                      # <--- and plot it

Heights plot
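
For comparison, here is a rough sketch of what the `hist << 190` step amounts to when done by hand with plain numpy. It is not how physt is implemented; the variable names and the use of `np.searchsorted` are choices made for this illustration only.

```python
import numpy as np

heights = [160, 155, 156, 198, 177, 168, 191, 183, 184, 179, 178, 172, 173, 175,
           172, 177, 176, 175, 174, 173, 174, 175, 177, 169, 168, 164, 175, 188,
           178, 174, 173, 181, 185, 166, 162, 163, 171, 165, 180, 189, 166, 163,
           172, 173, 174, 183, 184, 161, 162, 168, 169, 174, 176, 170, 169, 165]

# numpy returns bare arrays: the counts and the bin edges
counts, edges = np.histogram(heights, bins=10)

new_value = 190
# Find the bin whose half-open interval [edge_i, edge_{i+1}) contains the value.
# (A value exactly on the last edge would need special handling.)
bin_index = int(np.searchsorted(edges, new_value, side="right")) - 1
if 0 <= bin_index < len(counts):
    counts[bin_index] += 1

print(counts.sum())
```

With numpy alone, the counts and edges have to be kept in sync manually; the histogram object in the example above carries both.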

2D example

from physt import h2
import seaborn as sns

iris = sns.load_dataset('iris')
iris_hist = h2(iris["sepal_length"], iris["sepal_width"], "human", bin_count=[12, 7], name="Iris")
iris_hist.plot(show_zero=False, cmap="gray_r", show_values=True);

Iris 2D plot

3D directional example

import numpy as np
from physt import special_histograms

# Generate some sample data
data = np.empty((1000, 3))
data[:,0] = np.random.normal(0, 1, 1000)
data[:,1] = np.random.normal(0, 1.3, 1000)
data[:,2] = np.random.normal(1, .6, 1000)

# Get histogram data (in spherical coordinates)
h = special_histograms.spherical(data)                 

# And plot its projection on a globe
h.projection("theta", "phi").plot.globe_map(density=True, figsize=(7, 7), cmap="rainbow")   

Directional 3D plot

See more in the docstrings and notebooks.

Installation

Using pip:

pip install physt

Features

Implemented

  • 1D histograms
  • 2D histograms
  • ND histograms
  • Some special histograms
    • 2D polar coordinates (with plotting)
    • 3D spherical / cylindrical coordinates (beta)
  • Adaptive rebinning for on-line filling of unknown data (beta)
  • Non-consecutive bins
  • Memory-effective histogramming of dask arrays (beta)
  • Understands any numpy-array-like object
  • Keep underflow / overflow / missed bins
  • Basic numeric operations (* / + -)
  • Items / slice selection (including mask arrays)
  • Add new values (fill, fill_n)
  • Cumulative values, densities
  • Simple statistics for original data (mean, std, sem)
  • Plotting with several backends
    • matplotlib (static plots with many options)
    • vega (interactive plots, beta, help wanted!)
    • folium (experimental for geo-data)
    • plotly (very basic, help wanted!)
    • ascii (experimental)
  • Algorithms for optimized binning
    • human-friendly
    • mathematical
  • IO, conversions
    • I/O JSON
    • I/O xarray.DataSet (experimental)
    • O ROOT file (experimental)
    • O pandas.DataFrame (basic)
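
Several of the items above (underflow / overflow bins, densities, cumulative values) are easy to state numerically. The following is a minimal numpy-only sketch of one common set of definitions; it is an illustration, not physt's implementation, and physt's exact conventions (for example, which edge of a bin is inclusive) may differ.

```python
import numpy as np

data = np.array([-1.0, 0.2, 0.4, 1.1, 1.5, 2.7, 3.0, 5.5])
edges = np.array([0.0, 1.0, 2.0, 4.0])  # unequal bin widths on purpose

# Values outside the binning are counted separately instead of being dropped
underflow = int(np.sum(data < edges[0]))
overflow = int(np.sum(data >= edges[-1]))

inside = data[(data >= edges[0]) & (data < edges[-1])]
frequencies, _ = np.histogram(inside, bins=edges)

widths = np.diff(edges)
densities = frequencies / widths      # counts per unit of x
cumulative = np.cumsum(frequencies)   # running total over the bins
```

Dividing by the bin width matters as soon as the bins are not equally wide: here the last bin holds as many values as the others but has half their density.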

Planned

  • Rebinning
    • using reference to original data?
    • merging bins
  • Statistics (based on original data)?
  • Stacked histograms (with names)
  • Potentially holoviews plotting backend (instead of the discontinued bokeh one)

Not planned

  • Kernel density estimates - use your favourite statistics package (like seaborn)
  • Rebinning using interpolation - it should be trivial to use rebin (https://github.com/jhykes/rebin) with physt

Rationale (for both): physt is dumb, but precise.

Dependencies

  • Python 3.6+
  • numpy
  • (optional) matplotlib - simple output
  • (optional) xarray - I/O
  • (optional) protobuf - I/O
  • (optional) uproot - I/O
  • (optional) astropy - additional binning algorithms
  • (optional) folium - map plotting
  • (optional) vega3 - for vega in-line in IPython notebook (note that to generate vega JSON, this is not necessary)
  • (optional) asciiplotlib - for ASCII bar plots
  • (optional) xtermcolor - for ASCII color maps
  • (testing) py.test, pandas
  • (docs) sphinx, sphinx_rtd_theme, ipython

Publicity

Talk at PyData Berlin 2018:

Contribution

I am looking for anyone interested in using / developing physt. You can contribute by reporting errors, implementing missing features, and suggesting new ones.

Thanks to:

Patches:

Alternatives and inspirations

Comments
  • python 2.7 plotting is not working

    When running the plot() function I get the error below even though matplotlib is installed. Also, the algorithm is pretty slow when running on anything bigger than a toy example.

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 137, in __call__
        return plot(self.histogram, kind=kind, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 91, in plot
        backend_name, backend = _get_backend(backend)
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 70, in _get_backend
        raise RuntimeError("No plotting backend available. Please, install matplotlib (preferred) or bokeh (limited).")
    RuntimeError: No plotting backend available. Please, install matplotlib (preferred) or bokeh (limited).
    
    bug 
    opened by romange 13
  • Smooth polar histograms?

    Thanks for writing this awesome library!

    I have a question regarding smoothing of polar 2D histograms. I am constructing a histogram as described on this page https://physt.readthedocs.io/en/latest/special_histograms.html#Polar-histogram and now I want to smooth it with a Gaussian kernel (like scipy.ndimage.gaussian_filter). What is the most elegant / correct method to do that?

    question 
    opened by horsto 7
  • Rebinning histograms related project

    Hi, I found a project on rebinning histograms at https://github.com/jhykes/rebin and I opened an issue (jhykes/rebin#5) on that project page asking about integrating its code into this project. I hope you will appreciate it.

    enhancement idea? 
    opened by DancingQuanta 7
  • Option to center labels on bins

    If you have a large dataset with a small number of values (such as consisting only of integers 1-10) then it would be nice to have the bin x-axis labels at the center under the respective bin instead of at the bin edges.

    I recognise this case is more of a 'histogram as bar plot' kind of thing, but it is a use-case I have often.

    opened by nzjrs 5
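
The integer case described in this request can be sketched with plain numpy: place the bin edges at half-integers so that every integer sits in the middle of its bin, and use the bin centers as label positions. This is an illustration of the idea only, not an existing physt option; the sample values are made up.

```python
import numpy as np

values = np.array([1, 2, 2, 3, 3, 3, 10])

# Edges at 0.5, 1.5, ..., 10.5 so that each integer 1..10 is centered in its bin
edges = np.arange(0.5, 11.0, 1.0)
counts, _ = np.histogram(values, bins=edges)

# Midpoint of every bin; these are the natural positions for centered labels
centers = (edges[:-1] + edges[1:]) / 2
```

The `centers` array could then be handed to the plotting library as tick positions (for instance matplotlib's `ax.set_xticks(centers)`).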
  • Usage of spherical histogram

    Hi, I have tried the example of spherical histogram. After a small modification of the code (normalized the data as unit vectors),

    n = 100
    data = np.empty((n, 3))
    data[:,0] = np.random.normal(0, 1, n)
    data[:,1] = np.random.normal(0, 1, n)
    data[:,2] = np.random.normal(0, 1, n)
    for i in range(n):
        scale = np.sqrt(data[i,0]**2 + data[i,1]**2 + data[i,2]**2)
        data[i,0] = data[i,0]/scale
        data[i,1] = data[i,1]/scale
        data[i,2] = data[i,2]/scale

    h = special.spherical_histogram(data, theta_bins=20, phi_bins=20)
    ax.scatter(data[:,0], data[:,1], data[:,2])

    globe = h.projection("theta", "phi")
    globe.plot.globe_map(density=True, figsize=(7, 7), cmap="rainbow")

    plt.show()

    I got an error: “RuntimeError: Bins not in rising order.” What did I do wrong? Thank you for your support.

    question 
    opened by zhengpuchen 3
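
As a side note, the row-by-row normalization in this report can be written in vectorized form. The sketch below uses only numpy and a seeded generator (an assumption made here for reproducibility); it does not address the reported "Bins not in rising order" error itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
data = rng.normal(0, 1, size=(n, 3))

# Length of every row vector, kept as a column so it broadcasts over x, y, z
norms = np.linalg.norm(data, axis=1, keepdims=True)
unit_vectors = data / norms
```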
  • approximate histograms

    I'm following the paper (http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf) implemented by https://github.com/carsonfarmer/streamhist, and the notion of approximate histograms seems elegant and efficient.

    After seeing the internals of streamhist (trying to fix bugs) and reading the paper, I can imagine ways to make a better implementation: e.g. much more efficient discovery of bins to be joined, and avoiding temporary lists when possible. Also the code seems overly complex, partially due to features like "bin freezing" which try to workaround poor bin joining performance.

    Anyway since streamhist is defunct, I'm thinking about trying an implementation. I wonder if this kind of histogram would fit into physt (and if sortedcollections would be reasonable as a dependency).

    opened by belm0 3
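
For readers unfamiliar with the paper mentioned in this issue, the core update step can be sketched in a few lines of plain Python: every new value becomes its own bin, and when the bin limit is exceeded the two closest centroids are merged into their count-weighted mean. This is a deliberately naive illustration (the function name `update` and the list-of-pairs layout are made up here), not code from physt or streamhist.

```python
from bisect import insort


def update(bins, value, max_bins):
    """Add one value to a list of [centroid, count] bins kept sorted by centroid."""
    insort(bins, [float(value), 1])
    if len(bins) > max_bins:
        # Linear scan for the adjacent pair with the smallest centroid gap
        i = min(range(len(bins) - 1), key=lambda j: bins[j + 1][0] - bins[j][0])
        (q1, k1), (q2, k2) = bins[i], bins[i + 1]
        # Replace the pair with a single bin at their count-weighted mean
        bins[i:i + 2] = [[(q1 * k1 + q2 * k2) / (k1 + k2), k1 + k2]]
    return bins


bins = []
for x in [23, 19, 10, 16, 36, 2, 9]:
    update(bins, x, max_bins=5)

print(bins)
```

The linear scan for the closest pair is the step the issue describes as a candidate for a more efficient implementation.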
  • please make this library discoverable

    name: physt (?)
    github tag line: P(i/y)thon h(i/y)stograms (???)

    google search for "python streaming histogram"

    • top result is https://github.com/carsonfarmer/streamhist (unused / unmaintained)
    • physt not in initial 10 pages of results...

    For over a year I've wanted to find a Python library which supports efficient histogram updates without a bunch of ugly dependencies. I've searched many times. Today I happened to get lucky by seeing physt mentioned at the bottom of a SO question (https://stackoverflow.com/questions/40627274/).

    To improve discoverability by search, please consider updating the github tag line to concisely and accurately describe the library (... rather than be cute).

    opened by belm0 2
  • Warning in current numpy

    If you try to merge bins:

    from physt import h2
    from scipy.stats import multivariate_normal
    hist = h2(*multivariate_normal.rvs((0,0), size=100_000).T, bins=100)
    hist.merge_bins(2)
    

    You get a warning from numpy:

    /home/schreihf/.local/lib/python3.7/site-packages/physt/histogram_base.py:572: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
      new_frequencies[new_index] += old_frequencies[old_index]
    /home/schreihf/.local/lib/python3.7/site-packages/physt/histogram_base.py:573: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
      new_errors2[new_index] += old_errors2[old_index]
    
    opened by henryiii 2
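
The change the warning asks for can be shown on a small array. This is a generic numpy sketch with made-up names and data, not the actual code in histogram_base.py: an index assembled as a list is converted to a tuple before it is used for multidimensional indexing.

```python
import numpy as np

frequencies = np.arange(12).reshape(3, 4)

# An index built up element by element, one entry per axis
index = [slice(0, 2), 1]

# Converting to a tuple makes numpy treat each entry as one axis of the index
selected = frequencies[tuple(index)]
```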
  • Add 2D & ND histograms

    • [x] Analogous data model to Histogram1D
    • [x] refactor HistogramBase class -> common behaviour of 1D and 2D
    • [x] revisit binning schemas
    • [x] histogram2D facade function to be compatible with numpy one
    • [x] plotting
    • [x] arithmetic operations
    • [x] documentation
    • [ ] stats
    enhancement 
    opened by janpipek 2
  • ImportError with newer plotly

    [SOMEDIR}\physt\physt\plotting\plotly.py in <module>
         12 
         13 import plotly.offline as pyo
    ---> 14 import plotly.plotly as pyp
         15 import plotly.graph_objs as go
         16 
    
    ~\Miniconda3\lib\site-packages\plotly\plotly\__init__.py in <module>
          2 from _plotly_future_ import _chart_studio_error
          3 
    ----> 4 _chart_studio_error("plotly")
    
    ~\Miniconda3\lib\site-packages\_plotly_future_\__init__.py in _chart_studio_error(submodule)
         41 
         42 def _chart_studio_error(submodule):
    ---> 43     raise ImportError(
         44         """
         45 The plotly.{submodule} module is deprecated,
    
    ImportError: 
    The plotly.plotly module is deprecated,
    please install the chart-studio package and use the
    chart_studio.plotly module instead. 
    
    bug visualization 
    opened by janpipek 1
  • Wrong bars center in polar_map

    I have found that the bars in polar_map are centered on the left edge of the phi bins instead of their center. Because of this, the representation of the histogram does not coincide with the data, as in the figure below: polarmap_wrong

    I think this can be easily solved by replacing

    bars = ax.bar(phipos[i], dr[i], width=dphi[i], bottom=rpos[i], color=bin_color,

    with

    bars = ax.bar(phipos[i] + 0.5*dphi[i], dr[i], width=dphi[i], bottom=rpos[i], color=bin_color,

    in the definition of polar_map.

    By the way, thank you for this amazing package!

    bug visualization 
    opened by ruhugu 1
  • Be more explicit about bins too narrow for float representation

    If the computed range for the binning divided by the number of bins is lower than the minimum float difference at the scale, we receive an error [ValueError: Bins not in rising order.] which is not very informative.

    To reproduce:

    data = [1, np.nextafter(1, 2)]
    physt.h1(data)
    

    It also happens when the range is 0, like in:

    data = [1, 1]
    physt.h1(data)
    
    enhancement 
    opened by janpipek 1
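
The underlying floating-point effect can be reproduced without physt. The sketch below uses `np.linspace` as a stand-in for the edge computation (an assumption; the issue does not say how physt computes its edges): between 1.0 and the next representable float there is no room for intermediate values, so the requested edges collapse onto the two endpoints.

```python
import numpy as np

low, high = 1.0, np.nextafter(1.0, 2.0)

# Ask for 10 bins (11 edges) inside an interval that contains only two floats
edges = np.linspace(low, high, 11)

strictly_rising = bool(np.all(np.diff(edges) > 0))
distinct = len(np.unique(edges))
print(strictly_rising, distinct)
```

A check of this kind before binning would allow a more descriptive error message than "Bins not in rising order."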
Releases(v0.5.2)
Owner
Jan Pipek
PyData Prague