Simple and fast histogramming in Python accelerated with OpenMP.

Overview

pygram11

Simple and fast histogramming in Python accelerated with OpenMP with help from pybind11.

pygram11 provides functions for very fast histogram calculations (and the variance in each bin) in one and two dimensions. The API is very simple; documentation can be found here (you'll also find some benchmarks there).

Installing

From PyPI

Binary wheels are provided for Linux and macOS. They can be installed from PyPI via pip:

pip install pygram11

From conda-forge

pygram11 is available from conda-forge for installation with the conda package manager:

conda install pygram11 -c conda-forge

Please note that on macOS the OpenMP libraries from LLVM (libomp) and Intel (libiomp) may clash if your conda environment includes the Intel Math Kernel Library (MKL) package distributed by Anaconda. You may need to install the nomkl package to prevent the clash (Intel MKL accelerates many linear algebra operations, but does not impact pygram11):

conda install nomkl ## sometimes necessary fix (macOS only)

From Source

All you need is a C++14 compiler and OpenMP. If you are using a relatively modern GCC release on Linux, you probably don't have to worry about the OpenMP dependency. If you are on macOS, you can install libomp from Homebrew (pygram11 does compile on Apple Silicon devices with Python version >= 3.9 and libomp installed from Homebrew). With those dependencies met, simply run:

git clone https://github.com/douglasdavis/pygram11.git --recurse-submodules
cd pygram11
pip install .

Or let pip handle the cloning procedure:

pip install git+https://github.com/douglasdavis/[email protected]

Tests are run on Python versions >= 3.7 and binary wheels are provided for those versions.

In Action

A histogram (with fixed bin width) of weighted data in one dimension:

>>> import numpy as np
>>> import pygram11
>>> rng = np.random.default_rng(123)
>>> x = rng.standard_normal(10000)
>>> w = rng.uniform(0.8, 1.2, x.shape[0])
>>> h, err = pygram11.histogram(x, bins=40, range=(-4, 4), weights=w)
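To make the two return values concrete, here is a plain-Python reference sketch of a weighted fixed-width histogram. It is not pygram11's implementation (which is C++ with OpenMP); the function name `weighted_histogram_sketch` is hypothetical, and the half-open range `[xmin, xmax)` is an assumption made for the illustration. Each bin holds the sum of weights, and the uncertainty is the square root of the sum of squared weights.

```python
import math


def weighted_histogram_sketch(x, weights, nbins, xmin, xmax):
    """Return (sum of weights, sqrt of sum of squared weights) per bin."""
    sumw = [0.0] * nbins
    sumw2 = [0.0] * nbins
    norm = nbins / (xmax - xmin)  # converts a value offset to a bin index
    for xi, wi in zip(x, weights):
        if xmin <= xi < xmax:  # assumption: values outside the range are dropped
            i = min(int((xi - xmin) * norm), nbins - 1)  # guard against rounding
            sumw[i] += wi
            sumw2[i] += wi * wi
    err = [math.sqrt(v) for v in sumw2]
    return sumw, err


counts, err = weighted_histogram_sketch(
    [-0.5, 0.1, 0.2, 3.9], [1.0, 0.5, 2.0, 1.0], nbins=4, xmin=-4.0, xmax=4.0
)
```

Here the two entries at 0.1 and 0.2 land in the same bin, so that bin's height is 0.5 + 2.0 and its uncertainty is sqrt(0.5**2 + 2.0**2).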

A histogram with fixed bin width which saves the under and overflow in the first and last bins:

>>> x = rng.standard_normal(1000000)
>>> h, __ = pygram11.histogram(x, bins=20, range=(-3, 3), flow=True)

where we've used __ to catch the None returned when weights are absent. A histogram in two dimensions with variable width bins:

>>> x = rng.standard_normal(1000)
>>> y = rng.standard_normal(1000)
>>> xbins = [-2.0, -1.0, -0.5, 1.5, 2.0, 3.1]
>>> ybins = [-3.0, -1.5, -0.1, 0.8, 2.0, 2.8]
>>> h, err = pygram11.histogram2d(x, y, bins=[xbins, ybins])
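With variable-width bins the bin index can no longer be computed by a single multiplication; it has to be searched for among the edges. The following is a minimal sketch of that idea using a binary search from the standard library. The helper names `find_bin` and `histogram2d_variable_sketch` are hypothetical, and dropping out-of-range values is an assumption of the sketch, not a statement about pygram11's internals.

```python
from bisect import bisect_right


def find_bin(edges, value):
    """Return the index of the bin containing value, or None if out of range."""
    if value < edges[0] or value >= edges[-1]:
        return None
    # bisect_right counts the edges <= value; the bin index is one less.
    return bisect_right(edges, value) - 1


def histogram2d_variable_sketch(x, y, xedges, yedges):
    """Count (x, y) pairs in a grid defined by two sorted edge lists."""
    counts = [[0] * (len(yedges) - 1) for _ in range(len(xedges) - 1)]
    for xi, yi in zip(x, y):
        i, j = find_bin(xedges, xi), find_bin(yedges, yi)
        if i is not None and j is not None:
            counts[i][j] += 1
    return counts


xbins = [-2.0, -1.0, -0.5, 1.5, 2.0, 3.1]
ybins = [-3.0, -1.5, -0.1, 0.8, 2.0, 2.8]
# The third point has x = 5.0, outside xbins, so it is not counted.
h2d = histogram2d_variable_sketch([0.0, -1.5, 5.0], [0.0, 2.5, 0.0], xbins, ybins)
```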

Manually controlling OpenMP acceleration with context managers:

>>> with pygram11.omp_disabled():  # disable all thresholds.
...     result, _ = pygram11.histogram(x, bins=10, range=(-3, 3))
...
>>> with pygram11.omp_forced(key="thresholds.var1d"):  # force a single threshold.
...     result, _ = pygram11.histogram(x, bins=[-3, -2, 0, 2, 3])
...
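As a rough illustration of the pattern behind such context managers, the sketch below temporarily overrides size thresholds stored in a dictionary and restores them on exit. Everything in it is hypothetical: `THRESHOLDS`, `thresholds_overridden`, `use_parallel`, the key `thresholds.fix1d`, and the numeric values are made up for the example, and pygram11's own configuration may be organized differently.

```python
import sys
from contextlib import contextmanager

# Hypothetical threshold table: parallel code runs when the input is larger
# than the threshold. Keys and values are illustrative only.
THRESHOLDS = {"thresholds.fix1d": 10_000, "thresholds.var1d": 5_000}


@contextmanager
def thresholds_overridden(value, key=None):
    """Temporarily set one threshold (or all, if key is None) to value."""
    keys = list(THRESHOLDS) if key is None else [key]
    saved = {k: THRESHOLDS[k] for k in keys}
    try:
        for k in keys:
            THRESHOLDS[k] = value
        yield
    finally:
        THRESHOLDS.update(saved)  # always restore the previous values


def use_parallel(n, key):
    """Decide whether an input of size n should take the parallel path."""
    return n > THRESHOLDS[key]


# "Forcing" corresponds to a threshold of 0, "disabling" to a huge threshold.
with thresholds_overridden(0, key="thresholds.var1d"):
    forced = use_parallel(10, "thresholds.var1d")
with thresholds_overridden(sys.maxsize):
    disabled = use_parallel(10**9, "thresholds.fix1d")
```

Restoring the values in a `finally` clause means the override cannot leak out of the `with` block, even if the histogram call raises.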

Histogramming multiple weight variations for the same data, then putting the result in a DataFrame (the input pandas DataFrame will be interpreted as a NumPy array):

>>> import pandas as pd
>>> N = 10000
>>> weights = pd.DataFrame({"weight_a": np.abs(rng.standard_normal(N)),
...                         "weight_b": rng.uniform(0.5, 0.8, N),
...                         "weight_c": rng.uniform(0.0, 1.0, N)})
>>> data = rng.standard_normal(N)
>>> count, err = pygram11.histogram(data, bins=20, range=(-3, 3), weights=weights, flow=True)
>>> count_df = pd.DataFrame(count, columns=weights.columns)
>>> err_df = pd.DataFrame(err, columns=weights.columns)
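The result has one row per bin and one column per weight variation, which is why it can be passed to `pd.DataFrame` with `columns=weights.columns`. A plain-Python sketch of the idea follows; `histogram_multiweight_sketch` is a hypothetical name, the weights are given as a dict of equal-length lists instead of a DataFrame, and the code only illustrates the semantics described above. The bin index is computed once per data point and reused for every weight column.

```python
import math


def histogram_multiweight_sketch(x, weight_columns, nbins, xmin, xmax, flow=False):
    """Histogram x once per weight column; rows are bins, columns are variations."""
    names = list(weight_columns)
    sumw = [[0.0] * len(names) for _ in range(nbins)]
    sumw2 = [[0.0] * len(names) for _ in range(nbins)]
    norm = nbins / (xmax - xmin)
    for row, xi in enumerate(x):
        if xi < xmin:
            if not flow:
                continue
            i = 0  # underflow goes into the first bin
        elif xi >= xmax:
            if not flow:
                continue
            i = nbins - 1  # overflow goes into the last bin
        else:
            i = min(int((xi - xmin) * norm), nbins - 1)
        for col, name in enumerate(names):
            w = weight_columns[name][row]
            sumw[i][col] += w
            sumw2[i][col] += w * w
    err = [[math.sqrt(v) for v in r] for r in sumw2]
    return sumw, err


mw_count, mw_err = histogram_multiweight_sketch(
    [-5.0, 0.5, 2.5],
    {"weight_a": [1.0, 2.0, 3.0], "weight_b": [0.5, 0.5, 0.5]},
    nbins=3, xmin=-3.0, xmax=3.0, flow=True,
)
```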

I also wrote a blog post with some simple examples.

Other Libraries

  • boost-histogram provides Pythonic object oriented histograms.
  • Simple and fast histogramming in Python using the NumPy C API: fast-histogram (no variance or overflow support).
  • To calculate histograms in Python on a GPU, see cupy.histogram.

If there is something you'd like to see in pygram11, please open an issue or pull request.

Comments
  • Where is pygram11 in performance compared to fast-histogram?

    Basically the title.

    I want to go for raw speed only and don't care for much more than being able to specify bin-edges and getting back a numpy-array from the function that calculates the histogram.

    To give a little more context: It could be, that i will have to calculate a histogram of 1'000'000 bins, where each bin is 1000 integers wide. I would have to fill 1000 values in those bins. Can you tell me if this is possible with pygram11?

    opened by tim-hilt 7
  • Config adjustments

    • Rewrite configuration; now controllable by a module, context managers, and decorator
    • modularize things a bit (separate numpy API from pygram11 core API)
    opened by douglasdavis 0
  • WIP: Large overhaul of backend, reorganize internal (non-public) API

    • [x] OpenMP required to build (not a difficult dependency to satisfy) - toggled via if statements on the input data size.
      • [x] omp function argument deprecated
    • [x] Backend C++ code for 1D histograms rewritten (use Python and NumPy C APIs directly, no pybind11)
    • [x] Rewrite multiweight histograms (still via pybind11) to support more types (avoid conversions)
    • [x] Python code moved to the top level module (__init__.py) (without changing public API)
    • [x] Documentation updates
    opened by douglasdavis 0
  • Improve OpenMP inspection; docs improvements; pybind11 update

    • Improve API for accessing information about OpenMP availability and properties
    • Docs improvements
    • Update to latest pybind11 release (2.4.3)
    • Clean up some old scripts
    opened by douglasdavis 0
  • Development for v0.5

    • [x] implement C++ back end for multi weight histogramming
    • [x] redesign one-dimensional histogramming backend, simplify code by using more of the pybind11 array API (front end API unchanged)
    opened by douglasdavis 0
  • Allow over/under flow for 1D histograms

    • extend the number of bins in memory by 2
    • fill bin 0 for data less than xmin
    • fill bin nbins + 1 for data larger than xmax
    • first real bin is now index 1
    • last real bin is now index nbins
    • add named arguments to python functions which dictate how to slice result so we return the desired bin counts.
    opened by douglasdavis 0
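The steps listed in that pull request can be sketched in plain Python as follows. This is an illustration of the described layout (two extra bins in memory, real bins at indices 1 through nbins, then a slice to return the requested counts), not the actual backend code; `histogram_flow_sketch` is a hypothetical name, and folding the flow bins into the first and last real bins follows the behavior of `flow=True` described in the README.

```python
def histogram_flow_sketch(x, nbins, xmin, xmax, flow=False):
    """Fixed-width counts computed with explicit underflow/overflow bins."""
    counts = [0] * (nbins + 2)  # index 0: underflow, index nbins + 1: overflow
    norm = nbins / (xmax - xmin)
    for xi in x:
        if xi < xmin:
            counts[0] += 1
        elif xi >= xmax:
            counts[nbins + 1] += 1
        else:
            # real bins are shifted by one to make room for the underflow bin
            counts[min(int((xi - xmin) * norm), nbins - 1) + 1] += 1
    if flow:
        counts[1] += counts[0]              # fold underflow into first real bin
        counts[nbins] += counts[nbins + 1]  # fold overflow into last real bin
    return counts[1:nbins + 1]  # slice away the two flow bins


sample = [-10.0, -0.5, 0.5, 10.0, 11.0]
without_flow = histogram_flow_sketch(sample, nbins=2, xmin=-1.0, xmax=1.0)
with_flow = histogram_flow_sketch(sample, nbins=2, xmin=-1.0, xmax=1.0, flow=True)
```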
Releases(0.13.2)
  • 0.13.2(Dec 5, 2022)

  • 0.13.1(Aug 30, 2022)

  • 0.13.0(Aug 24, 2022)

  • 0.12.2(May 25, 2022)

  • 0.12.1(Oct 19, 2021)

  • 0.12.0(Apr 16, 2021)

    • OpenMP configuration redesigned.
      • constants have been removed in favor of new pygram11.config module.
      • Decorators and context managers added to control toggling OpenMP thresholds
        • pygram11.without_omp decorator
        • pygram11.with_omp decorator
        • pygram11.disable_omp context manager
        • pygram11.force_omp context manager
      • See the documentation for more
    • New keyword argument added to histogramming functions (cons_var) for returning variance instead of standard error.
  • 0.11.2(Feb 27, 2021)

    Renamed internal Python files hist.py and misc.py to _hist.py and _misc.py, respectively.

    The contents of these modules are brought in to the main pygram11 module namespace by imports in __init__.py (the submodules themselves are not meant to be part of the public API). This avoids tab completions of pygram11.hi<tab> yielding pygram11.hist when we actually want pygram11.histogram.

  • 0.11.1(Feb 25, 2021)

    Two convenience functions added to the pygram11 namespace:

    • bin_centers: returns an array representing the center of histogram bins given a total number of bins and an axis range or given existing bin edges.
    • bin_edges: returns an array representing bin edges given a total number of bins and an axis range.
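For reference, the arithmetic behind these two helpers can be sketched in plain Python. The functions below are hypothetical stand-ins (note the `_sketch` suffix) that return lists instead of arrays; their signatures are assumptions and are not copied from pygram11.

```python
def bin_edges_sketch(nbins, axis_range):
    """Return the nbins + 1 equally spaced edges spanning axis_range."""
    xmin, xmax = axis_range
    width = (xmax - xmin) / nbins
    return [xmin + i * width for i in range(nbins + 1)]


def bin_centers_sketch(edges):
    """Return the midpoint of each pair of neighboring edges."""
    return [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]


edges = bin_edges_sketch(4, (-2.0, 2.0))
centers = bin_centers_sketch(edges)
```

Computing centers from edges also covers variable-width bins, since each center depends only on its own two edges.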
  • 0.11.0(Feb 11, 2021)

    • API change: function calls with weights=None now return None as the second return value. Previously the uncertainty was returned (which is just the square root of the bin heights); now users can take the square root themselves, and the back-end does not waste cycles tracking the uncertainty since it's trivial for unweighted histograms.
    • More types are supported without conversions (previously np.float64 and np.float32 were the only supported array types, and we converted non-floating point input). Now signed and unsigned integer inputs (both 64 and 32 bit) are supported.
      • If unsupported array types are used TypeError is now raised. This library prioritizes performance; hidden performance hits are explicitly avoided.
    • Configurable thresholds have been introduced to configure when OpenMP acceleration is used (described in the documentation).
    • The back-end was refactored with some help from boost::mp11 to aid in adding more type support without introducing more boilerplate. We now vendor boost::mp11 as a submodule.
    • Bumped the vendored pybind11 submodule to v2.6.2.
    • C++14 now required to build from source.
    • Added Apple Silicon support for Python 3.9 with libomp from Homebrew installed at /opt/homebrew.
    • Documentation improvements
    • Renamed master branch to main.
  • 0.11.0rc1(Feb 10, 2021)

  • 0.10.3(Oct 9, 2020)

  • 0.10.2(Oct 9, 2020)

    • Vendor pybind11 2.6 series.
    • Some changes to backend types (std::size_t -> pybind11::ssize_t for sizes) (https://github.com/pybind/pybind11/pull/2293)
    • Improvements to CI/CD: using cibuildwheel to build wheels.
    • Python 3.9 is now tested and compatible wheels are built.
  • 0.10.2rc1(Oct 9, 2020)

  • 0.10.1(Sep 5, 2020)

  • 0.10.0(Jun 30, 2020)

    Renamed internal Python module from histogram to hist. This avoids a clash with the module function of the same name. Some IDE features were confused.

  • 0.9.1(May 26, 2020)

  • 0.9.0(May 26, 2020)

  • 0.8.2(May 18, 2020)

  • 0.8.1(Apr 26, 2020)

  • 0.8.0(Feb 25, 2020)

    Public API changes:

    • Remove previously deprecated omp function argument

    Backend changed:

    • moved 1D backend back to pybind11 (template flexibility)
    • parallelization cutoffs consistently at array size of 5000 for all calculations
    • all non-floating point inputs are converted

    Other:

    • Improved tests
  • 0.7.3(Feb 11, 2020)

  • 0.7.2(Feb 6, 2020)

  • 0.7.1(Feb 6, 2020)

  • 0.7.0(Feb 5, 2020)

    • OpenMP is now required to build from source (not a difficult dependency to satisfy)
      • omp function argument deprecated
      • omp_max_threads renamed to omp_get_max_threads to mirror OpenMP C API
      • omp_available function removed
    • Backend C++ code for 1D histograms rewritten (use Python and NumPy C APIs directly, no pybind11), more types supported (avoid conversions)
    • Rewrite multiweight histograms (still via pybind11) to support more types (avoid conversions)
    • Python code moved to the top level module (__init__.py) (without changing public API)
  • 0.6.1(Oct 20, 2019)

    • OpenMP inspection improved; new functions replace pygram11.OPENMP:
      • pygram11.omp_available() -> bool checks for availability
      • pygram11.omp_max_threads() -> int checks for the maximum available threads
    • some documentation improvements
    • bump pybind11 version to 2.4.3
    • pygram11.OPENMP will be removed in a future release
  • 0.6.0(Oct 17, 2019)

  • 0.5.2(Aug 20, 2019)

  • 0.5.1(Aug 10, 2019)

    • Fix setup.py such that setuptools doesn't try to install numpy via easy_install (remove unnecessary setup_requires argument).
    • Add _max_threads attribute to pygram11._core.
    • Fixed macOS wheels (use delocate to ensure OpenMP symbols are bundled).
  • 0.5.0(Jul 29, 2019)

    New features

    • histogramming multiple weight variations in a single function call is now possible via fix1dmw and var1dmw
    • In the NumPy like API, passing a 2 dimensional array to the weights argument will use this feature as well.
    • Binary wheels for Linux and macOS are provided on PyPI (conda-forge binaries are of course still available as well)

    Other updates

    • All functions now return the sqrt(sum-of-weights-squared) instead of sum-of-weights; before v0.5.x the 2D functions returned the sum of weights squared.
  • 0.5.0.a2(Jul 29, 2019)

Owner
Doug Davis
OSS Engineer @anaconda