WhiteBox Utilities Toolkit: Tools to make your life easier

Fancy data functions that will make your life as a data scientist easier.

Installing

To install this library in your Python environment:

pip install whiteboxml

Documentation

Metrics

Classification

ROC curve / AUC:

import numpy as np

from whiteboxml.modeling.metrics import plot_roc_auc_binary

y_pred = np.random.normal(0, 1, 1000)
y_true = np.random.choice([0, 1], 1000)

ax, fpr, tpr, thr, auc_score = plot_roc_auc_binary(y_pred=y_pred, y_true=y_true, figsize=(8, 8))

ax.get_figure().savefig('roc_curve.png')
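To show what the returned `fpr`, `tpr`, `thr` and `auc_score` represent, here is a minimal NumPy-only sketch of a binary ROC curve and its trapezoidal AUC. It is not the `whiteboxml` implementation. The helper name `roc_curve_auc` is hypothetical, and it assumes 0/1 labels that contain both classes.

```python
import numpy as np


def roc_curve_auc(y_true, y_score):
    """Hypothetical helper: ROC points and AUC for 0/1 labels (not the whiteboxml API)."""
    y_true = np.asarray(y_true).ravel()
    y_score = np.asarray(y_score).ravel()
    # Sort by score, highest first, so each prefix means "predict positive above this threshold"
    order = np.argsort(-y_score, kind="mergesort")
    y_score, y_true = y_score[order], y_true[order]
    # Keep only the last position of each run of tied scores
    distinct = np.where(np.diff(y_score))[0]
    idx = np.r_[distinct, y_true.size - 1]
    tps = np.cumsum(y_true)[idx]   # true positives at each threshold
    fps = (idx + 1) - tps          # false positives at each threshold
    # Prepend the (0, 0) point, i.e. a threshold above every score
    tpr = np.r_[0, tps] / tps[-1]
    fpr = np.r_[0, fps] / fps[-1]
    thr = np.r_[np.inf, y_score[idx]]
    # Area under the curve with the trapezoidal rule
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
    return fpr, tpr, thr, auc


fpr, tpr, thr, auc = roc_curve_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

Scores do not need to be probabilities. Only their ranking matters, which is why the random normal `y_pred` in the example above is a valid input.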

Confusion Matrix:

import numpy as np

from whiteboxml.modeling.metrics import plot_confusion_matrix

y_true = np.random.choice([0, 1, 2, 3], 10000)
y_pred = np.random.choice([0, 1, 2, 3], 10000)

ax, matrix = plot_confusion_matrix(y_pred=y_pred, y_true=y_true, 
                                   class_labels=['a', 'b', 'c', 'd'])

ax.get_figure().savefig('confusion_matrix.png')
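The `matrix` returned alongside the plot is a table of counts of (true class, predicted class) pairs. The NumPy sketch below builds such a table. The helper `confusion_matrix_np` is hypothetical, and its convention (rows are true classes, columns are predicted classes, the same as scikit-learn) is an assumption, not something this README states about `plot_confusion_matrix`.

```python
import numpy as np


def confusion_matrix_np(y_true, y_pred, n_classes):
    """Hypothetical helper: rows = true class, columns = predicted class (assumed convention)."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when the same (true, pred) pair repeats
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix


cm = confusion_matrix_np([0, 1, 2, 2], [0, 2, 2, 2], n_classes=3)
print(cm)
# [[1 0 0]
#  [0 0 1]
#  [0 0 2]]
```

Correct predictions fall on the diagonal, so `np.trace(cm) / cm.sum()` gives the accuracy.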

Optimal Threshold:

import numpy as np

from whiteboxml.modeling.metrics import get_optimal_thr

y_pred_proba = np.random.uniform(0, 1, (100, 1))
y_true = np.random.choice([0, 1], (100, 1))

thr = get_optimal_thr(y_pred=y_pred_proba, y_true=y_true)
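This README does not say which criterion `get_optimal_thr` optimises. One common choice is Youden's J statistic (TPR minus FPR), and the sketch below uses it purely as an assumption. The helper `youden_threshold` is hypothetical. It tries every distinct predicted probability as a cut-off and predicts positive when `proba >= thr` (also an assumption).

```python
import numpy as np


def youden_threshold(y_true, y_proba):
    """Hypothetical helper: threshold maximising Youden's J = TPR - FPR (assumed criterion)."""
    y_true = np.asarray(y_true).ravel().astype(bool)
    y_proba = np.asarray(y_proba).ravel()
    n_pos = y_true.sum()
    n_neg = y_true.size - n_pos
    best_thr, best_j = None, -np.inf
    # Candidate thresholds: every distinct predicted probability
    for thr in np.unique(y_proba):
        y_hat = y_proba >= thr
        tpr = np.sum(y_hat & y_true) / n_pos
        fpr = np.sum(y_hat & ~y_true) / n_neg
        j = tpr - fpr
        if j > best_j:
            best_thr, best_j = thr, j
    return best_thr, best_j


thr, j = youden_threshold([0, 0, 1, 1], [0.1, 0.2, 0.7, 0.9])
print(thr, j)  # 0.7 1.0
```

The loop is O(n²), which is fine for an illustration. For large arrays, reuse the cumulative counts from a ROC computation instead.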

Owner

WhiteBox