Helper tools to construct probability distributions built from expert elicited data for use in monte carlo simulations.

Related tags

Data Analysiselicited
Overview

Elicited

Helper tools to construct probability distributions built from expert elicited data for use in monte carlo simulations.

Credit to Brett Hoover, packaging by @magoo

Usage

pip install elicited
import elicited as e

elicited is just a helper tool when using numpy and scipy, so you'll need these in your code.

import numpy as np
from scipy.stats import poisson, zipf, beta, pareto, lognorm

Lognormal

See Occurance and Applications for examples of lognormal distributions in nature.

Expert: Most customers hold around $20K (mode) but I could imagine a customer with $2.5M (max)

mode = 20000
max = 2500000

mean, stdv = e.elicitLogNormal(mode, max)
asset_values = lognorm(s=stdv, scale=np.exp(mean))
asset_values.rvs(100)

Pareto

The 80/20 rule. See Occurance and Applications

Expert: The legal costs of an incident could be devastating. Typically costs are almost zero (val_min) but a black swan could be $100M (val_max).

b = e.elicitPareto(val_min, val_max)
p = pareto(b, loc=val_min-1., scale=1.))

PERT

See PERT Distribution

Expert: Our customers have anywhere from $500-$6000 (val_min / val_max), but it's most typically around $4500 (val_mod)

PERT_a, PERT_b = e.elicitPERT(val_min, val_mod, val_max)
pert = beta(PERT_a, PERT_b, loc=val_min, scale=val_max-val_min)

Zipf's

See Applications

Expert: If we get sued, there will only be a few litigants (nMin). Very rarely it could be 30 or more litigants (nMax), maybe once every thousand cases (pMax) it would be more.

nMin = 1
nMax = 30
pMax = 1/1000

Zs = e.elicitZipf(nMin, nMax, pMax, report=True)

litigants = zipf(Zs, nMin-1)

litigants.rvs(100)

Reference: Other Useful Elicitations

Listed as a courtesy, these distributions are simple enough to elicit data into directly without a helper function.

Uniform

A "zero knowledge" distribution where all values within the range have equal probability of appearing. Similar to random.randint(a, b)

Expert: The crowd will be between 50 (min) and 500 (max) due to fire code restrictions and the existing residents in the building.

from scipy.stats import uniform

min = 50
max = 500

range = max - min

crowd_size = uniform(min, range)
crowd_size.rvs(100)

Poisson

Expert: About 3000 Customers (average) add a credit card to their account every quarter.

from scipy.stats import poisson
average = 3000
upsells = poisson(average)
upsells.rvs(100)
Owner
Ryan McGeehan
Founder / Advisor @ HackerOne Former Director of Security @ Coinbase Former Director of Security @ Facebook
Ryan McGeehan
Office365 (Microsoft365) audit log analysis tool

Office365 (Microsoft365) audit log analysis tool The header describes it all WHY?? The first line of code was written long time before other colleague

Anatoly 1 Jul 27, 2022
Orchest is a browser based IDE for Data Science.

Orchest is a browser based IDE for Data Science. It integrates your favorite Data Science tools out of the box, so you don’t have to. The application is easy to use and can run on your laptop as well

Orchest 3.6k Jan 09, 2023
A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.

Realtime Financial Market Data Visualization and Analysis Introduction This repo shows my project about real-time stock data pipeline. All the code is

6 Sep 07, 2022
PySpark bindings for H3, a hierarchical hexagonal geospatial indexing system

h3-pyspark: Uber's H3 Hexagonal Hierarchical Geospatial Indexing System in PySpark PySpark bindings for the H3 core library. For available functions,

Kevin Schaich 12 Dec 24, 2022
Python package for analyzing sensor-collected human motion data

Python package for analyzing sensor-collected human motion data

Simon Ho 71 Nov 05, 2022
This creates a ohlc timeseries from downloaded CSV files from NSE India website and makes a SQLite database for your research.

NSE-timeseries-form-CSV-file-creator-and-SQL-appender- This creates a ohlc timeseries from downloaded CSV files from National Stock Exchange India (NS

PILLAI, Amal 1 Oct 02, 2022
track your GitHub statistics

GitHub-Stalker track your github statistics 👀 features find new followers or unfollowers find who got a star on your project or remove stars find who

Bahadır Araz 34 Nov 18, 2022
A crude Hy handle on Pandas library

Quickstart Hyenas is a curde Hy handle written on top of Pandas API to allow for more elegant access to data-scientist's powerhouse that is Pandas. In

Peter Výboch 4 Sep 05, 2022
The Master's in Data Science Program run by the Faculty of Mathematics and Information Science

The Master's in Data Science Program run by the Faculty of Mathematics and Information Science is among the first European programs in Data Science and is fully focused on data engineering and data a

Amir Ali 2 Jun 17, 2022
apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly.

Please consider citing the manuscript if you use apricot in your academic work! You can find more thorough documentation here. apricot implements subm

Jacob Schreiber 457 Dec 20, 2022
Airflow ETL With EKS EFS Sagemaker

Airflow ETL With EKS EFS & Sagemaker (en desarrollo) Diagrama de la solución Imp

1 Feb 14, 2022
Improving your data science workflows with

Make Better Defaults Author: Kjell Wooding [email protected] This is the git re

Kjell Wooding 18 Dec 23, 2022
Jupyter notebooks for the book "The Elements of Statistical Learning".

This repository contains Jupyter notebooks implementing the algorithms found in the book and summary of the textbook.

Madiyar 369 Dec 30, 2022
Python scripts aim to use a Random Forest machine learning algorithm to predict the water affinity of Metal-Organic Frameworks

The following Python scripts aim to use a Random Forest machine learning algorithm to predict the water affinity of Metal-Organic Frameworks (MOFs). The training set is extracted from the Cambridge S

1 Jan 09, 2022
Creating a statistical model to predict 10 year treasury yields

Predicting 10-Year Treasury Yields Intitially, I wanted to see if the volatility in the stock market, represented by the VIX index (data source), had

10 Oct 27, 2021
My solution to the book A Collection of Data Science Take-Home Challenges

DS-Take-Home Solution to the book "A Collection of Data Science Take-Home Challenges". Note: Please don't contact me for the dataset. This repository

Jifu Zhao 1.5k Jan 03, 2023
Datashredder is a simple data corruption engine written in python. You can corrupt anything text, images and video.

Datashredder is a simple data corruption engine written in python. You can corrupt anything text, images and video. You can chose the cha

2 Jul 22, 2022
A stock analysis app with streamlit

StockAnalysisApp A stock analysis app with streamlit. You select the ticker of the stock and the app makes a series of analysis by using the price cha

Antonio Catalano 50 Nov 27, 2022
A set of functions and analysis classes for solvation structure analysis

SolvationAnalysis The macroscopic behavior of a liquid is determined by its microscopic structure. For ionic systems, like batteries and many enzymes,

MDAnalysis 19 Nov 24, 2022
ToeholdTools is a Python package and desktop app designed to facilitate analyzing and designing toehold switches, created as part of the 2021 iGEM competition.

ToeholdTools Category Status Repository Package Build Quality A library for the analysis of toehold switch riboregulators created by the iGEM team Cit

0 Dec 01, 2021