Simple and fast histogramming in Python accelerated with OpenMP.

Overview

pygram11

Simple and fast histogramming in Python accelerated with OpenMP with help from pybind11.

pygram11 provides functions for very fast histogram calculations (and the variance in each bin) in one and two dimensions. The API is very simple; documentation can be found here (you'll also find some benchmarks there).

Installing

From PyPI

Binary wheels are provided for Linux and macOS. They can be installed from PyPI via pip:

pip install pygram11

From conda-forge

pygram11 is also available from conda-forge for installation with the conda package manager:

conda install pygram11 -c conda-forge

Please note that on macOS the OpenMP libraries from LLVM (libomp) and Intel (libiomp) may clash if your conda environment includes the Intel Math Kernel Library (MKL) package distributed by Anaconda. You may need to install the nomkl package to prevent the clash (Intel MKL accelerates many linear algebra operations, but does not impact pygram11):

conda install nomkl ## sometimes necessary fix (macOS only)

From Source

You need a C++14 compiler and OpenMP. If you are using a relatively modern GCC release on Linux then you probably don't have to worry about the OpenMP dependency. If you are on macOS, you can install libomp from Homebrew (pygram11 does compile on Apple Silicon devices with Python version >= 3.9 and libomp installed from Homebrew). With those dependencies met, simply run:

git clone https://github.com/douglasdavis/pygram11.git --recurse-submodules
cd pygram11
pip install .

Or let pip handle the cloning procedure:

pip install git+https://github.com/douglasdavis/pygram11.git

Tests are run on Python versions >= 3.7 and binary wheels are provided for those versions.

In Action

A histogram (with fixed bin width) of weighted data in one dimension:

>>> import numpy as np
>>> import pygram11
>>> rng = np.random.default_rng(123)
>>> x = rng.standard_normal(10000)
>>> w = rng.uniform(0.8, 1.2, x.shape[0])
>>> h, err = pygram11.histogram(x, bins=40, range=(-4, 4), weights=w)
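For reference, the two returned quantities can be reproduced in plain Python. The following is a minimal sketch of a fixed-width weighted histogram, not pygram11's implementation. The helper name `fixed_width_histogram` is hypothetical, and the sketch assumes a half-open range and an uncertainty equal to the square root of the sum of squared weights in each bin, as the release notes describe.

```python
import math


def fixed_width_histogram(x, weights, bins, range_):
    """Reference sketch: weighted bin heights and per-bin uncertainty."""
    xmin, xmax = range_
    width = (xmax - xmin) / bins
    sumw = [0.0] * bins   # sum of weights per bin
    sumw2 = [0.0] * bins  # sum of squared weights per bin
    for xi, wi in zip(x, weights):
        # Assumption: the range is half-open, so values outside
        # [xmin, xmax) are dropped (no flow handling here).
        if xi < xmin or xi >= xmax:
            continue
        i = int((xi - xmin) / width)
        i = min(i, bins - 1)  # guard against rounding at the upper edge
        sumw[i] += wi
        sumw2[i] += wi * wi
    err = [math.sqrt(s) for s in sumw2]
    return sumw, err


counts, err = fixed_width_histogram(
    x=[-0.5, 0.1, 0.2, 3.9, 7.0],
    weights=[1.0, 2.0, 2.0, 0.5, 1.0],
    bins=4,
    range_=(-4, 4),
)
```

Here the value 7.0 falls outside the range and is ignored, and the two entries landing in the third bin contribute 2.0 + 2.0 to its height.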

A histogram with fixed bin width which saves the under and overflow in the first and last bins:

>>> x = rng.standard_normal(1000000)
>>> h, __ = pygram11.histogram(x, bins=20, range=(-3, 3), flow=True)

where we've used __ to catch the None returned when weights are absent. A histogram in two dimensions with variable width bins:

>>> x = rng.standard_normal(1000)
>>> y = rng.standard_normal(1000)
>>> xbins = [-2.0, -1.0, -0.5, 1.5, 2.0, 3.1]
>>> ybins = [-3.0, -1.5, -0.1, 0.8, 2.0, 2.8]
>>> h, err = pygram11.histogram2d(x, y, bins=[xbins, ybins])
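With variable-width bins the bin index can no longer be computed by a single division, so each value has to be located among the edges. A minimal pure-Python sketch of that lookup, using binary search from the standard library, might look as follows. The helper names `find_bin` and `histogram2d_variable` are hypothetical, and the half-open edge convention is an assumption rather than a statement about pygram11's behavior.

```python
from bisect import bisect_right


def find_bin(edges, value):
    """Return the bin index for value, or None if it is out of range."""
    if value < edges[0] or value >= edges[-1]:
        return None
    # bisect_right counts the edges that are <= value; subtract one to
    # get the index of the bin whose left edge is the last such edge.
    return bisect_right(edges, value) - 1


def histogram2d_variable(x, y, xedges, yedges):
    """Reference sketch: unweighted 2D counts for variable-width bins."""
    counts = [[0] * (len(yedges) - 1) for _ in range(len(xedges) - 1)]
    for xi, yi in zip(x, y):
        i = find_bin(xedges, xi)
        j = find_bin(yedges, yi)
        if i is not None and j is not None:
            counts[i][j] += 1
    return counts


xbins = [-2.0, -1.0, -0.5, 1.5, 2.0, 3.1]
ybins = [-3.0, -1.5, -0.1, 0.8, 2.0, 2.8]
counts = histogram2d_variable([0.0, 1.6, -5.0], [0.0, 2.5, 0.0], xbins, ybins)
```

The point (-5.0, 0.0) lies outside the x edges and is not counted.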

Manually controlling OpenMP acceleration with context managers:

>>> with pygram11.omp_disabled():  # disable all thresholds.
...     result, _ = pygram11.histogram(x, bins=10, range=(-3, 3))
...
>>> with pygram11.omp_forced(key="thresholds.var1d"):  # force a single threshold.
...     result, _ = pygram11.histogram(x, bins=[-3, -2, 0, 2, 3])
...
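The idea behind these context managers can be illustrated with a small standalone sketch: a table of size thresholds decides whether the parallel path is taken, and a context manager overrides entries temporarily. This is only an illustration of the pattern, not pygram11's code. The `THRESHOLDS` table, its values, the key `thresholds.fix1d`, and the functions `forced` and `use_parallel` are all hypothetical.

```python
from contextlib import contextmanager

# Hypothetical threshold table: take the parallel path once the input
# has at least this many elements. The values are made up.
THRESHOLDS = {"thresholds.fix1d": 10_000, "thresholds.var1d": 5_000}


@contextmanager
def forced(key=None):
    """Temporarily set one threshold (or all, if key is None) to zero."""
    saved = dict(THRESHOLDS)
    try:
        for k in THRESHOLDS:
            if key is None or k == key:
                THRESHOLDS[k] = 0
        yield
    finally:
        THRESHOLDS.update(saved)  # restore the previous values on exit


def use_parallel(key, n):
    """Decide whether an input of n elements takes the parallel path."""
    return n >= THRESHOLDS[key]


before = use_parallel("thresholds.var1d", 100)
with forced(key="thresholds.var1d"):
    during = use_parallel("thresholds.var1d", 100)
    other = use_parallel("thresholds.fix1d", 100)
after = use_parallel("thresholds.var1d", 100)
```

Only the named threshold changes inside the `with` block, and the original values are restored afterwards even if the body raises.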

Histogramming multiple weight variations for the same data, then putting the result in a DataFrame (the input pandas DataFrame will be interpreted as a NumPy array):

>>> import pandas as pd
>>> N = 10000
>>> weights = pd.DataFrame({"weight_a": np.abs(rng.standard_normal(N)),
...                         "weight_b": rng.uniform(0.5, 0.8, N),
...                         "weight_c": rng.uniform(0.0, 1.0, N)})
>>> data = rng.standard_normal(N)
>>> count, err = pygram11.histogram(data, bins=20, range=(-3, 3), weights=weights, flow=True)
>>> count_df = pd.DataFrame(count, columns=weights.columns)
>>> err_df = pd.DataFrame(err, columns=weights.columns)
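The benefit of handling several weight variations in one call is that each data point is assigned to a bin only once and then every weight column is accumulated for that bin. A minimal pure-Python sketch of this idea follows. The function name `histogram_multiweight` and the dictionary-of-columns layout are hypothetical, and the clamping is meant to mimic the `flow=True` behavior described earlier rather than reproduce pygram11's code.

```python
import math


def histogram_multiweight(data, weights, bins, range_):
    """Reference sketch: one histogram per weight column, with flow."""
    xmin, xmax = range_
    width = (xmax - xmin) / bins
    counts = {name: [0.0] * bins for name in weights}
    sumw2 = {name: [0.0] * bins for name in weights}
    for row, xi in enumerate(data):
        # Locate the bin once per data point; out-of-range values are
        # clamped into the first or last bin.
        i = int(math.floor((xi - xmin) / width))
        i = max(0, min(i, bins - 1))
        for name, column in weights.items():
            w = column[row]
            counts[name][i] += w
            sumw2[name][i] += w * w
    err = {name: [math.sqrt(s) for s in col] for name, col in sumw2.items()}
    return counts, err


weights = {"weight_a": [1.0, 2.0, 3.0], "weight_b": [0.5, 0.5, 0.5]}
counts, err = histogram_multiweight(
    [-2.5, 0.5, 9.0], weights, bins=3, range_=(-3, 3)
)
```

The value 9.0 is beyond the upper edge, so its weights end up in the last bin of each variation.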

I also wrote a blog post with some simple examples.

Other Libraries

  • boost-histogram provides Pythonic object oriented histograms.
  • Simple and fast histogramming in Python using the NumPy C API: fast-histogram (no variance or overflow support).
  • To calculate histograms in Python on a GPU, see cupy.histogram.

If there is something you'd like to see in pygram11, please open an issue or pull request.

Comments
  • Where is pygram11 in performance compared to fast-histogram?

    Basically the title.

    I want to go for raw speed only and don't care for much more than being able to specify bin-edges and getting back a numpy-array from the function that calculates the histogram.

    To give a little more context: I may have to calculate a histogram with 1'000'000 bins, where each bin is 1000 integers wide, and fill about 1000 values into those bins. Can you tell me if this is possible with pygram11?

    opened by tim-hilt 7
  • Config adjustments

    • Rewrite configuration; now controllable by a module, context managers, and decorator
    • modularize things a bit (separate numpy API from pygram11 core API)
    opened by douglasdavis 0
  • WIP: Large overhaul of backend, reorganize internal (non-public) API

    • [x] OpenMP required to build (not a difficult dependency to satisfy) - toggled via if statements on the input data size.
      • [x] omp function argument deprecated
    • [x] Backend C++ code for 1D histograms rewritten (use Python and NumPy C APIs directly, no pybind11)
    • [x] Rewrite multiweight histograms (still via pybind11) to support more types (avoid conversions)
    • [x] Python code moved to the top level module (__init__.py) (without changing public API)
    • [x] Documentation updates
    opened by douglasdavis 0
  • Improve OpenMP inspection; docs improvements; pybind11 update

    • Improve API for accessing information about OpenMP availability and properties
    • Docs improvements
    • Update to latest pybind11 release (2.4.3)
    • Clean up some old scripts
    opened by douglasdavis 0
  • Development for v0.5

    • [x] implement C++ back end for multi weight histogramming
    • [x] redesign one-dimensional histogramming backend, simplify code by using more of the pybind11 array API (front end API unchanged)
    opened by douglasdavis 0
  • Allow over/under flow for 1D histograms

    • extend the number of bins in memory by 2
    • fill bin 0 for data less than xmin
    • fill bin nbins + 1 for data larger than xmax
    • first real bin is now index 1
    • last real bin is now index nbins
    • add named arguments to python functions which dictate how to slice result so we return the desired bin counts.
    opened by douglasdavis 0
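The scheme listed in the flow item above (two extra slots in memory, real bins at indices 1 through nbins, then a slice controlled by a named argument) can be sketched in plain Python. This is an illustration of the described layout only. The function names are hypothetical, and folding the flow into the edge bins follows the `flow=True` description in the usage section.

```python
def histogram_with_flow_bins(x, nbins, xmin, xmax):
    """Count into nbins + 2 slots: [underflow, real bins..., overflow]."""
    width = (xmax - xmin) / nbins
    counts = [0] * (nbins + 2)
    for xi in x:
        if xi < xmin:
            counts[0] += 1          # underflow slot
        elif xi >= xmax:
            counts[nbins + 1] += 1  # overflow slot
        else:
            i = min(int((xi - xmin) / width), nbins - 1)
            counts[1 + i] += 1      # real bins occupy indices 1..nbins
    return counts


def slice_result(counts, flow):
    """Return only the real bins, optionally folding flow into the edges."""
    real = counts[1:-1]  # slicing copies, so the raw counts stay untouched
    if flow:
        real[0] += counts[0]
        real[-1] += counts[-1]
    return real


raw = histogram_with_flow_bins(
    [-9.0, -2.5, 0.0, 0.1, 2.9, 5.0, 6.0], nbins=3, xmin=-3.0, xmax=3.0
)
without_flow = slice_result(raw, flow=False)
with_flow = slice_result(raw, flow=True)
```

Keeping the flow slots separate until the final slice means the same filled array can serve both kinds of request.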
Releases(0.13.2)
  • 0.13.2(Dec 5, 2022)

  • 0.13.1(Aug 30, 2022)

  • 0.13.0(Aug 24, 2022)

  • 0.12.2(May 25, 2022)

  • 0.12.1(Oct 19, 2021)

  • 0.12.0(Apr 16, 2021)

    • OpenMP configuration redesigned.
      • constants have been removed in favor of new pygram11.config module.
      • Decorators and context managers added to control toggling OpenMP thresholds
        • pygram11.without_omp decorator
        • pygram11.with_omp decorator
        • pygram11.disable_omp context manager
        • pygram11.force_omp context manager
      • See the documentation for more
    • New keyword argument added to histogramming functions (cons_var) for returning variance instead of standard error.
  • 0.11.2(Feb 27, 2021)

    Renamed internal Python files hist.py and misc.py to _hist.py and _misc.py, respectively.

    The contents of these modules are brought in to the main pygram11 module namespace by imports in __init__.py (the submodules themselves are not meant to be part of the public API). This avoids tab completions of pygram11.hi<tab> yielding pygram11.hist when we actually want pygram11.histogram.

  • 0.11.1(Feb 25, 2021)

    Two convenience functions added to the pygram11 namespace:

    • bin_centers: returns an array representing the centers of histogram bins, given either a total number of bins and an axis range or existing bin edges.
    • bin_edges: returns an array representing bin edges given a total number of bins and an axis range.
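What these two helpers compute can be written out in a few lines of plain Python. The sketch below is not the library's code and does not mirror its signatures. The names `edges_from_range` and `centers_from_edges` are hypothetical stand-ins used only to show the arithmetic.

```python
def edges_from_range(bins, range_):
    """Evenly spaced bin edges: bins + 1 values from xmin to xmax."""
    xmin, xmax = range_
    width = (xmax - xmin) / bins
    return [xmin + i * width for i in range(bins + 1)]


def centers_from_edges(edges):
    """Midpoint of each pair of neighbouring edges."""
    return [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]


edges = edges_from_range(4, (-2.0, 2.0))
centers = centers_from_edges(edges)
```

Computing centers from edges also works for variable-width bins, since only neighbouring pairs of edges are used.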
  • 0.11.0(Feb 11, 2021)

    • API change: function calls with weights=None now return None as the second return value. Previously the uncertainty was returned (which is just the square root of the bin heights); now users can take the square root themselves, and the back-end does not waste cycles tracking the uncertainty since it's trivial for unweighted histograms.
    • More types are supported without conversions (previously np.float64 and np.float32 were the only supported array types, and we converted non-floating point input). Now signed and unsigned integer inputs (both 64 and 32 bit) are supported.
      • If unsupported array types are used TypeError is now raised. This library prioritizes performance; hidden performance hits are explicitly avoided.
    • Configurable thresholds have been introduced to configure when OpenMP acceleration is used (described in the documentation).
    • The back-end was refactored with some help from boost::mp11 to aid in adding more type support without introducing more boilerplate. We now vendor boost::mp11 as a submodule.
    • Bumped the vendored pybind11 submodule to v2.6.2.
    • C++14 now required to build from source.
    • Added Apple Silicon support for Python 3.9 with libomp from Homebrew installed at /opt/homebrew.
    • Documentation improvements
    • Renamed master branch to main.
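Regarding the API change in the first item above: when the second return value is None, the uncertainty of an unweighted histogram can be recovered from the counts alone. A minimal sketch of a caller-side helper follows. The function name `uncertainty` is hypothetical and is not part of pygram11.

```python
import math


def uncertainty(counts, err):
    """Return per-bin uncertainties, deriving them when err is None."""
    if err is None:
        # Unweighted case: the uncertainty is the square root of each
        # bin height.
        return [math.sqrt(c) for c in counts]
    return list(err)


unweighted = uncertainty([4, 9, 0, 16], None)
weighted = uncertainty([1.5, 2.5], [0.7, 0.9])
```

A helper like this lets downstream code treat weighted and unweighted results uniformly.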
  • 0.11.0rc1(Feb 10, 2021)

  • 0.10.3(Oct 9, 2020)

  • 0.10.2(Oct 9, 2020)

    • Vendor pybind11 2.6 series.
    • Some changes to backend types (std::size_t -> pybind11::ssize_t for sizes) (https://github.com/pybind/pybind11/pull/2293)
    • Improvements to CI/CD: using cibuildwheel to build wheels.
    • Python 3.9 is now tested and compatible wheels are built.
  • 0.10.2rc1(Oct 9, 2020)

  • 0.10.1(Sep 5, 2020)

  • 0.10.0(Jun 30, 2020)

    Renamed internal Python module from histogram to hist. This avoids a clash with the module function of the same name. Some IDE features were confused.

  • 0.9.1(May 26, 2020)

  • 0.9.0(May 26, 2020)

  • 0.8.2(May 18, 2020)

  • 0.8.1(Apr 26, 2020)

  • 0.8.0(Feb 25, 2020)

    Public API changes:

    • Remove previously deprecated omp function argument

    Backend changed:

    • moved 1D backend back to pybind11 (template flexibility)
    • parallelization cutoffs consistently at array size of 5000 for all calculations
    • all non-floating point inputs are converted

    Other:

    • Improved tests
  • 0.7.3(Feb 11, 2020)

  • 0.7.2(Feb 6, 2020)

  • 0.7.1(Feb 6, 2020)

  • 0.7.0(Feb 5, 2020)

    • OpenMP is now required to build from source (not a difficult dependency to satisfy)
      • omp function argument deprecated
      • omp_max_threads renamed to omp_get_max_threads to mirror OpenMP C API
      • omp_available function removed
    • Backend C++ code for 1D histograms rewritten (use Python and NumPy C APIs directly, no pybind11), more types supported (avoid conversions)
    • Rewrite multiweight histograms (still via pybind11) to support more types (avoid conversions)
    • Python code moved to the top level module (__init__.py) (without changing public API)
  • 0.6.1(Oct 20, 2019)

    • OpenMP inspection improved; new functions replace pygram11.OPENMP:
      • pygram11.omp_available() -> bool checks for availability
      • pygram11.omp_max_threads() -> int checks for the maximum available threads
    • some documentation improvements
    • bump pybind11 version to 2.4.3
    • pygram11.OPENMP will be removed in a future release
  • 0.6.0(Oct 17, 2019)

  • 0.5.2(Aug 20, 2019)

  • 0.5.1(Aug 10, 2019)

    • Fix setup.py such that setuptools doesn't try to install numpy via easy_install (remove unnecessary setup_requires argument).
    • Add _max_threads attribute to pygram11._core.
    • Fixed macOS wheels (use delocate to ensure OpenMP symbols are bundled).
  • 0.5.0(Jul 29, 2019)

    New features

    • histogramming multiple weight variations in a single function call is now possible via fix1dmw and var1dmw
    • In the NumPy like API, passing a 2 dimensional array to the weights argument will use this feature as well.
    • Binary wheels for Linux and macOS are provided on PyPI (conda-forge binaries are of course still available as well)

    Other updates

    • All functions now return the sqrt(sum-of-weights-squared) instead of sum-of-weights; before v0.5.x the 2D functions returned the sum of weights squared.
  • 0.5.0.a2(Jul 29, 2019)

Owner
Doug Davis
OSS Engineer @anaconda