XAI - An eXplainability toolbox for machine learning

Overview

GitHub GitHub GitHub GitHub

XAI - An eXplainability toolbox for machine learning

XAI is a Machine Learning library that is designed with AI explainability in its core. XAI contains various tools that enable for analysis and evaluation of data and models. The XAI library is maintained by The Institute for Ethical AI & ML, and it was developed based on the 8 principles for Responsible Machine Learning.

You can find the documentation at https://ethicalml.github.io/xai/index.html. You can also check out our talk at Tensorflow London where the idea was first conceived - the talk also contains an insight on the definitions and principles in this library.

YouTube video showing how to use XAI to mitigate undesired biases

This video of the talk presented at the PyData London 2019 Conference which provides an overview on the motivations for machine learning explainability as well as techniques to introduce explainability and mitigate undesired biases using the XAI Library.
Do you want to learn about more awesome machine learning explainability tools? Check out our community-built "Awesome Machine Learning Production & Operations" list which contains an extensive list of tools for explainability, privacy, orchestration and beyond.

0.1.0

If you want to see a fully functional demo in action clone this repo and run the Example Jupyter Notebook in the Examples folder.

What do we mean by eXplainable AI?

We see the challenge of explainability as more than just an algorithmic challenge, which requires a combination of data science best practices with domain-specific knowledge. The XAI library is designed to empower machine learning engineers and relevant domain experts to analyse the end-to-end solution and identify discrepancies that may result in sub-optimal performance relative to the objectives required. More broadly, the XAI library is designed using the 3-steps of explainable machine learning, which involve 1) data analysis, 2) model evaluation, and 3) production monitoring.

We provide a visual overview of these three steps mentioned above in this diagram:

XAI Quickstart

Installation

The XAI package is on PyPI. To install you can run:

pip install xai

Alternatively you can install from source by cloning the repo and running:

python setup.py install 

Usage

You can find example usage in the examples folder.

1) Data Analysis

With XAI you can identify imbalances in the data. For this, we will load the census dataset from the XAI library.

import xai.data
df = xai.data.load_census()
df.head()

View class imbalances for all categories of one column

ims = xai.imbalance_plot(df, "gender")

View imbalances for all categories across multiple columns

im = xai.imbalance_plot(df, "gender", "loan")

Balance classes using upsampling and/or downsampling

bal_df = xai.balance(df, "gender", "loan", upsample=0.8)

Perform custom operations on groups

groups = xai.group_by_columns(df, ["gender", "loan"])
for group, group_df in groups:    
    print(group) 
    print(group_df["loan"].head(), "\n")

Visualise correlations as a matrix

_ = xai.correlations(df, include_categorical=True, plot_type="matrix")

Visualise correlations as a hierarchical dendogram

_ = xai.correlations(df, include_categorical=True)

Create a balanced validation and training split dataset

# Balanced train-test split with minimum 300 examples of 
#     the cross of the target y and the column gender
x_train, y_train, x_test, y_test, train_idx, test_idx = \
    xai.balanced_train_test_split(
            x, y, "gender", 
            min_per_group=300,
            max_per_group=300,
            categorical_cols=categorical_cols)

x_train_display = bal_df[train_idx]
x_test_display = bal_df[test_idx]

print("Total number of examples: ", x_test.shape[0])

df_test = x_test_display.copy()
df_test["loan"] = y_test

_= xai.imbalance_plot(df_test, "gender", "loan", categorical_cols=categorical_cols)

2) Model Evaluation

We are able to also analyse the interaction between inference results and input features. For this, we will train a single layer deep learning model.

= 0.5).astype(int).T[0]) ">
model = build_model(proc_df.drop("loan", axis=1))

model.fit(f_in(x_train), y_train, epochs=50, batch_size=512)

probabilities = model.predict(f_in(x_test))
predictions = list((probabilities >= 0.5).astype(int).T[0])

Visualise permutation feature importance

def get_avg(x, y):
    return model.evaluate(f_in(x), y, verbose=0)[1]

imp = xai.feature_importance(x_test, y_test, get_avg)

imp.head()

Identify metric imbalances against all test data

_= xai.metrics_plot(
        y_test, 
        probabilities)

Identify metric imbalances across a specific column

_ = xai.metrics_plot(
    y_test, 
    probabilities, 
    df=x_test_display, 
    cross_cols=["gender"],
    categorical_cols=categorical_cols)

Identify metric imbalances across multiple columns

_ = xai.metrics_plot(
    y_test, 
    probabilities, 
    df=x_test_display, 
    cross_cols=["gender", "ethnicity"],
    categorical_cols=categorical_cols)

Draw confusion matrix

xai.confusion_matrix_plot(y_test, pred)

Visualise the ROC curve against all test data

_ = xai.roc_plot(y_test, probabilities)

Visualise the ROC curves grouped by a protected column

protected = ["gender", "ethnicity", "age"]
_ = [xai.roc_plot(
    y_test, 
    probabilities, 
    df=x_test_display, 
    cross_cols=[p],
    categorical_cols=categorical_cols) for p in protected]

Visualise accuracy grouped by probability buckets

d = xai.smile_imbalance(
    y_test, 
    probabilities)

Visualise statistical metrics grouped by probability buckets

d = xai.smile_imbalance(
    y_test, 
    probabilities,
    display_breakdown=True)

Visualise benefits of adding manual review on probability thresholds

d = xai.smile_imbalance(
    y_test, 
    probabilities,
    bins=9,
    threshold=0.75,
    manual_review=0.375,
    display_breakdown=False)

Comments
  • matplotlib error while installing package

    matplotlib error while installing package

    Collecting matplotlib==3.0.2

    Using cached matplotlib-3.0.2.tar.gz (36.5 MB) ERROR: Command errored out with exit status 1: command: /opt/anaconda3/envs/ethicalml/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/wv/m62_p54d5bx1dnq_m07ck3l40000gn/T/pip-install-303disb7/matplotlib/setup.py'"'"'; file='"'"'/private/var/folders/wv/m62_p54d5bx1dnq_m07ck3l40000gn/T/pip-install-303disb7/matplotlib/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/wv/m62_p54d5bx1dnq_m07ck3l40000gn/T/pip-install-303disb7/matplotlib/pip-egg-info

    opened by ArpitSisodia 3
  • Requirements

    Requirements

    your requirements are very restrictive. Can you please change it to >= instead of ==. for example:

    numpy>=1.3
    pandas>=0.23.0
    matplotlib>2.02,<=3.0.3
    scikit-learn>=0.19.0
    
    opened by idanmoradarthas 3
  • converters the probs into np array if its already not

    converters the probs into np array if its already not

    smile_imbalance() funciton argument "probs" does not specify that it is required to be numpy array, but it does so i have added that data type in the argument letting the user know if he/she is to refer to the docs and i have also added a line np.array() which is an idempotent operation(if the array passed is already numpy array then it does nothing but if its not it changes the list into numpy array)

    Suggestion

    • If possible can you guys consider adding "save_plot_path" method to each function, so that when this package is used in production (which i am and people considering Continuous model delivery would use) all these plots could be saved to a particular directory for data scientists to look at later since in production, code would be used in scripts running on EC2 or other cloud servers and not on jupyter notebooks
    • My use case is I am retraining the model every week and XAI allows me to generate a evaluation report allowing me to remotely decide weather to push this weeks mode into production
    • I considered adding it myself but i was not sure if this is the direction you guys wanted to take

    Thank you

    opened by sai-krishna-msk 2
  • Unable to install package

    Unable to install package

    Hello!

    I've been trying to install this package and am unable to do so. I've tried both methods on my Ubuntu machine.

    1. pip install xai
    2. python setup.py install

    What can I do to install this? Also, is this project active anymore at all?

    opened by varunbanda 2
  • Can we explain BERT models using this package?

    Can we explain BERT models using this package?

    I'm working with text data and looking for ways to explain BERT models. Is there any workaround using XAI or any other package/resources if anyone can recommend?

    opened by techwithshadab 1
  • Add a conda install option for `xai`

    Add a conda install option for `xai`

    A conda installation option could be very helpful. I have already started working on this, to add xai to conda-forge.

    Conda-forge PR:

    • https://github.com/conda-forge/staged-recipes/pull/17601

    Once the conda-forge PR is merged, you will be able to install the library with conda as follows:

    conda install -c conda-forge xai
    

    :bulb: I will push a PR to update the docs once the package is available on conda-forge.

    opened by sugatoray 0
  • Wrong series returned from _curve

    Wrong series returned from _curve

    There is some bug in https://github.com/EthicalML/xai/blob/master/xai/init.py#L962

    it was written as

    r1s = r2s = []
    

    but should be instead

    r1s, r2s = [], []
    

    The impact is that if the user would like to us r1s and r2s returned to construct the the curve (e.g. for storing the data for later analysis), they would find that r1s and r2s are referring to the same instance which stores all the curve data that should have been separately stored in r1s and r2s

    opened by chen0040 1
Releases(v0.1.0)
Owner
The Institute for Ethical Machine Learning
The Institute for Ethical Machine Learning is a think-tank that brings together with technology leaders, policymakers & academics to develop standards for ML.
The Institute for Ethical Machine Learning
A GitHub action that suggests type annotations for Python using machine learning.

Typilus: Suggest Python Type Annotations A GitHub action that suggests type annotations for Python using machine learning. This action makes suggestio

40 Sep 18, 2022
Transform ML models into a native code with zero dependencies

m2cgen (Model 2 Code Generator) - is a lightweight library which provides an easy way to transpile trained statistical models into a native code

Bayes' Witnesses 2.3k Jan 03, 2023
Machine Learning approach for quantifying detector distortion fields

DistortionML Machine Learning approach for quantifying detector distortion fields. This project is a feasibility study for training a surrogate model

Joel Bernier 1 Nov 05, 2021
CyLP is a Python interface to COIN-OR’s Linear and mixed-integer program solvers (CLP, CBC, and CGL)

CyLP CyLP is a Python interface to COIN-OR’s Linear and mixed-integer program solvers (CLP, CBC, and CGL). CyLP’s unique feature is that you can use i

COIN-OR Foundation 161 Dec 14, 2022
BudouX is the successor to Budou, the machine learning powered line break organizer tool.

BudouX Standalone. Small. Language-neutral. BudouX is the successor to Budou, the machine learning powered line break organizer tool. It is standalone

Google 868 Jan 05, 2023
Self Organising Map (SOM) for clustering of atomistic samples through unsupervised learning.

Self Organising Map for Clustering of Atomistic Samples - V2 Description Self Organising Map (also known as Kohonen Network) implemented in Python for

Franco Aquistapace 0 Nov 16, 2021
Code base of KU AIRS: SPARK Autonomous Vehicle Team

KU AIRS: SPARK Autonomous Vehicle Project Check this link for the blog post describing this project and the video of SPARK in simulation and on parkou

Mehmet Enes Erciyes 1 Nov 23, 2021
A chain of stores, 10 different stores and 50 different requests a 3-month demand forecast for its product.

Demand-Forecasting Business Problem A chain of stores, 10 different stores and 50 different requests a 3-month demand forecast for its product.

Ayşe Nur Türkaslan 3 Mar 06, 2022
Predict the income for each percentile of the population (Python) - FRENCH

05.income-prediction Predict the income for each percentile of the population (Python) - FRENCH Effectuez une prédiction de revenus Prérequis Pour ce

1 Feb 13, 2022
Iterative stochastic gradient descent (SGD) linear regressor with regularization

SGD-Linear-Regressor Iterative stochastic gradient descent (SGD) linear regressor with regularization Dataset: Kaggle “Graduate Admission 2” https://w

Zechen Ma 1 Oct 29, 2021
Adversarial Framework for (non-) Parametric Image Stylisation Mosaics

Fully Adversarial Mosaics (FAMOS) Pytorch implementation of the paper "Copy the Old or Paint Anew? An Adversarial Framework for (non-) Parametric Imag

Zalando Research 120 Dec 24, 2022
A toolkit for geo ML data processing and model evaluation (fork of solaris)

An open source ML toolkit for overhead imagery. This is a beta version of lunular which may continue to develop. Please report any bugs through issues

Ryan Avery 4 Nov 04, 2021
CVXPY is a Python-embedded modeling language for convex optimization problems.

CVXPY The CVXPY documentation is at cvxpy.org. We are building a CVXPY community on Discord. Join the conversation! For issues and long-form discussio

4.3k Jan 08, 2023
Distributed deep learning on Hadoop and Spark clusters.

Note: we're lovingly marking this project as Archived since we're no longer supporting it. You are welcome to read the code and fork your own version

Yahoo 1.3k Dec 28, 2022
Turns your machine learning code into microservices with web API, interactive GUI, and more.

Turns your machine learning code into microservices with web API, interactive GUI, and more.

Machine Learning Tooling 2.8k Jan 02, 2023
Pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code

pandas-method-chaining pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code. It is a fork from pandas-v

Francis 5 May 14, 2022
Mortality risk prediction for COVID-19 patients using XGBoost models

Mortality risk prediction for COVID-19 patients using XGBoost models Using demographic and lab test data received from the HM Hospitales in Spain, I b

1 Jan 19, 2022
MaD GUI is a basis for graphical annotation and computational analysis of time series data.

MaD GUI Machine Learning and Data Analytics Graphical User Interface MaD GUI is a basis for graphical annotation and computational analysis of time se

Machine Learning and Data Analytics Lab FAU 10 Dec 19, 2022
Tools for Optuna, MLflow and the integration of both.

HPOflow - Sphinx DOC Tools for Optuna, MLflow and the integration of both. Detailed documentation with examples can be found here: Sphinx DOC Table of

Telekom Open Source Software 17 Nov 20, 2022
Python implementation of the rulefit algorithm

RuleFit Implementation of a rule based prediction algorithm based on the rulefit algorithm from Friedman and Popescu (PDF) The algorithm can be used f

Christoph Molnar 326 Jan 02, 2023