This is the code repository for Interpretable Machine Learning with Python, published by Packt.

Overview

Interpretable Machine Learning with Python

Interpretable Machine Learning with Pythone

This is the code repository for Interpretable Machine Learning with Python, published by Packt.

Learn to build interpretable high-performance models with hands-on real-world examples

What is this book about?

Do you want to understand your models and mitigate the risks associated with poor predictions using practical machine learning (ML) interpretation? Interpretable Machine Learning with Python can help you overcome these challenges, using interpretation methods to build fairer and safer ML models.

This book covers the following exciting features:

  • Recognize the importance of interpretability in business
  • Study models that are intrinsically interpretable such as linear models, decision trees, and Naïve Bayes
  • Become well-versed in interpreting models with model-agnostic methods
  • Visualize how an image classifier works and what it learns
  • Understand how to mitigate the influence of bias in datasets

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders. For example, Chapter02.

The code will look like the following:

base_classifier = KerasClassifier(model=base_model,\
                                  clip_values=(min_, max_))
y_test_mdsample_prob = np.max(y_test_prob[sampl_md_idxs],\
                                                       axis=1)
y_test_smsample_prob = np.max(y_test_prob[sampl_sm_idxs],\
                                                       axis=1)

Following is what you need for this book: This book is for data scientists, machine learning developers, and data stewards who have an increasingly critical responsibility to explain how the AI systems they develop work, their impact on decision making, and how they identify and manage bias. Working knowledge of machine learning and the Python programming language is expected.

With the following software and hardware list you can run all code files present in the book (Chapter 1-14).

Software and Hardware List

You can install the software required in any operating system by first installing Jupyter Notebook or Jupyter Lab with the most recent version of Python, or install Anaconda which can install everything at once. While hardware requirements for Jupyter are relatively modest, we recommend a machine with at least 4 cores of 2Ghz and 8Gb of RAM.

Alternatively, to installing the software locally, you can run the code in the cloud using Google Colab or another cloud notebook service.

Either way, the following packages are required to run the code in all the chapters (Google Colab has all the packages denoted with a ^):

Chapter Software required OS required
1 - 13 ^ Python 3.6+ Windows, Mac OS X, and Linux (Any)
1 - 13 ^ matplotlib 3.2.2+ Windows, Mac OS X, and Linux (Any)
1 - 13 ^ scikit-learn 0.22.2+ Windows, Mac OS X, and Linux (Any)
1 - 12 ^ pandas 1.1.5+ Windows, Mac OS X, and Linux (Any)
2 - 13 machine-learning-datasets 0.01.16+ Windows, Mac OS X, and Linux (Any)
2 - 13 ^ numpy 1.19.5+ Windows, Mac OS X, and Linux (Any)
3 - 13 ^ seaborn 0.11.1+ Windows, Mac OS X, and Linux (Any)
3 - 13 ^ tensorflow 2.4.1+ Windows, Mac OS X, and Linux (Any)
5 - 12 shap 0.38.1+ Windows, Mac OS X, and Linux (Any)
1, 5, 10, 12 ^ scipy 1.4.1+ Windows, Mac OS X, and Linux (Any)
5, 10-12 ^ xgboost 0.90+ Windows, Mac OS X, and Linux (Any)
6, 11, 12 ^ lightgbm 2.2.3+ Windows, Mac OS X, and Linux (Any)
7 - 9 alibi 0.5.5+ Windows, Mac OS X, and Linux (Any)
10 - 13 ^ tqdm 4.41.1+ Windows, Mac OS X, and Linux (Any)
2, 9 ^ statsmodels 0.10.2+ Windows, Mac OS X, and Linux (Any)
3, 5 rulefit 0.3.1+ Windows, Mac OS X, and Linux (Any)
6, 8 lime 0.2.0.1+ Windows, Mac OS X, and Linux (Any)
7, 12 catboost 0.24.4+ Windows, Mac OS X, and Linux (Any)
8, 9 ^ Keras 2.4.3+ Windows, Mac OS X, and Linux (Any)
11, 12 ^ pydot 1.3.0+ Windows, Mac OS X, and Linux (Any)
11, 12 xai 0.0.4+ Windows, Mac OS X, and Linux (Any)
1 ^ beautifulsoup4 4.6.3+ Windows, Mac OS X, and Linux (Any)
1 ^ requests 2.23.0+ Windows, Mac OS X, and Linux (Any)
3 cvae 0.0.3+ Windows, Mac OS X, and Linux (Any)
3 interpret 0.2.2+ Windows, Mac OS X, and Linux (Any)
3 ^ six 1.15.0+ Windows, Mac OS X, and Linux (Any)
3 skope-rules 1.0.1+ Windows, Mac OS X, and Linux (Any)
4 PDPbox 0.2.0+ Windows, Mac OS X, and Linux (Any)
4 pycebox 0.0.1+ Windows, Mac OS X, and Linux (Any)
5 alepython 0.1+ Windows, Mac OS X, and Linux (Any)
5 tensorflow-docs 0.0.02+ Windows, Mac OS X, and Linux (Any)
6 ^ nltk 3.2.5+ Windows, Mac OS X, and Linux (Any)
7 witwidget 1.7.0+ Windows, Mac OS X, and Linux (Any)
8 ^ opencv-python 4.1.2.30+ Windows, Mac OS X, and Linux (Any)
8 ^ scikit-image 0.16.2+ Windows, Mac OS X, and Linux (Any)
8 tf-explain 0.2.1+ Windows, Mac OS X, and Linux (Any)
8 tf-keras-vis 0.5.5+ Windows, Mac OS X, and Linux (Any)
9 SALib 1.3.12+ Windows, Mac OS X, and Linux (Any)
9 distython 0.0.3+ Windows, Mac OS X, and Linux (Any)
10 ^ mlxtend 0.14.0+ Windows, Mac OS X, and Linux (Any)
10 sklearn-genetic 0.3.0+ Windows, Mac OS X, and Linux (Any)
11 aif360==0.3.0 Windows, Mac OS X, and Linux (Any)
11 BlackBoxAuditing==0.1.54 Windows, Mac OS X, and Linux (Any)
11 dowhy 0.5.1+ Windows, Mac OS X, and Linux (Any)
11 econml 0.9.0+ Windows, Mac OS X, and Linux (Any)
11 ^ networkx 2.5+ Windows, Mac OS X, and Linux (Any)
12 bayesian-optimization 1.2.0+ Windows, Mac OS X, and Linux (Any)
12 ^ graphviz 0.10.1+ Windows, Mac OS X, and Linux (Any)
12 tensorflow-lattice 2.0.7+ Windows, Mac OS X, and Linux (Any)
13 adversarial-robustness-toolbox 1.5.0+ Windows, Mac OS X, and Linux (Any)

NOTE: the library machine-learning-datasets is the official name of what in the book is referred to as mldatasets. Due to naming conflicts, it had to be changed.

The exact versions of each library, as tested, can be found in the requirements.txt file and installed like this should you have a dedicated environment for them:

> pip install -r requirements.txt

You might get some conflicts specifically with libraries cvae, alepython, pdpbox and xai. If this is the case, try:

> pip install --no-deps -r requirements.txt

Alternatively, you can install libraries one chapter at a time inside of a local Jupyter environment using cells with !pip install or run all the code in Google Colab with the following links:

Remember to make sure you click on the menu item "File > Save a copy in Drive" as soon you open each link to ensure that your notebook is saved as you run it. Also, notebooks denoted with plus sign (+) are relatively compute-intensive, and will take an extremely long time to run on Google Colab but if you must go to "Runtime > Change runtime type" and select "High-RAM" for runtime shape. Otherwise, a better cloud enviornment or local environment is preferable.

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Summary

The book does much more than explain technical topics, but here's a summary of the chapters:

Chapters topics

Related products

Get to Know the Authors

Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. Currently, he's a Climate and Agronomic Data Scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a startup, incubated by Harvard Innovation Labs, that combined the power of cloud computing and machine learning with principles in decision-making science to expose users to new places and events. Whether it pertains to leisure activities, plant diseases, or customer lifetime value, Serg is passionate about providing the often-missing link between data and decision-making — and machine learning interpretation helps bridge this gap more robustly.

Owner
Packt
Providing books, eBooks, video tutorials, and articles for IT developers, administrators, and users.
Packt
The easy way to combine mlflow, hydra and optuna into one machine learning pipeline.

mlflow_hydra_optuna_the_easy_way The easy way to combine mlflow, hydra and optuna into one machine learning pipeline. Objective TODO Usage 1. build do

shibuiwilliam 9 Sep 09, 2022
jaxfg - Factor graph-based nonlinear optimization library for JAX.

Factor graphs + nonlinear optimization in JAX

Brent Yi 134 Dec 21, 2022
Combines Bayesian analyses from many datasets.

PosteriorStacker Combines Bayesian analyses from many datasets. Introduction Method Tutorial Output plot and files Introduction Fitting a model to a d

Johannes Buchner 19 Feb 13, 2022
K-Means clusternig example with Python and Scikit-learn

Unsupervised-Machine-Learning Flat Clustering K-Means clusternig example with Python and Scikit-learn Flat clustering Clustering algorithms group a se

Emin 1 Dec 13, 2021
Simple, fast, and parallelized symbolic regression in Python/Julia via regularized evolution and simulated annealing

Parallelized symbolic regression built on Julia, and interfaced by Python. Uses regularized evolution, simulated annealing, and gradient-free optimization.

Miles Cranmer 924 Jan 03, 2023
This is a curated list of medical data for machine learning

Medical Data for Machine Learning This is a curated list of medical data for machine learning. This list is provided for informational purposes only,

Andrew L. Beam 5.4k Dec 26, 2022
Module for statistical learning, with a particular emphasis on time-dependent modelling

Operating system Build Status Linux/Mac Windows tick tick is a Python 3 module for statistical learning, with a particular emphasis on time-dependent

X - Data Science Initiative 410 Dec 14, 2022
Factorization machines in python

Factorization Machines in Python This is a python implementation of Factorization Machines [1]. This uses stochastic gradient descent with adaptive re

Corey Lynch 892 Jan 03, 2023
Library for machine learning stacking generalization.

stacked_generalization Implemented machine learning *stacking technic[1]* as handy library in Python. Feature weighted linear stacking is also availab

114 Jul 19, 2022
Relevance Vector Machine implementation using the scikit-learn API.

scikit-rvm scikit-rvm is a Python module implementing the Relevance Vector Machine (RVM) machine learning technique using the scikit-learn API. Quicks

James Ritchie 204 Nov 18, 2022
SmartSim makes it easier to use common Machine Learning (ML) libraries like PyTorch and TensorFlow

SmartSim makes it easier to use common Machine Learning (ML) libraries like PyTorch and TensorFlow, in High Performance Computing (HPC) simulations and workloads.

A simple python program that draws a tree for incrementing values using the Collatz Conjecture.

Collatz Conjecture A simple python program that draws a tree for incrementing values using the Collatz Conjecture. Values which can be edited: Length

davidgasinski 1 Oct 28, 2021
A Python package for time series classification

pyts: a Python package for time series classification pyts is a Python package for time series classification. It aims to make time series classificat

Johann Faouzi 1.4k Jan 01, 2023
Python implementation of the rulefit algorithm

RuleFit Implementation of a rule based prediction algorithm based on the rulefit algorithm from Friedman and Popescu (PDF) The algorithm can be used f

Christoph Molnar 326 Jan 02, 2023
Bayesian optimization based on Gaussian processes (BO-GP) for CFD simulations.

BO-GP Bayesian optimization based on Gaussian processes (BO-GP) for CFD simulations. The BO-GP codes are developed using GPy and GPyOpt. The optimizer

KTH Mechanics 8 Mar 31, 2022
Simple linear model implementations from scratch.

Hand Crafted Models Simple linear model implementations from scratch. Table of contents Overview Project Structure Getting started Citing this project

Jonathan Sadighian 2 Sep 13, 2021
Implementation of deep learning models for time series in PyTorch.

List of Implementations: Currently, the reimplementation of the DeepAR paper(DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks

Yunkai Zhang 275 Dec 28, 2022
AI and Machine Learning with Kubeflow, Amazon EKS, and SageMaker

Data Science on AWS - O'Reilly Book Get the book on Amazon.com Book Outline Quick Start Workshop (4-hours) In this quick start hands-on workshop, you

Data Science on AWS 2.8k Jan 03, 2023
Contains an implementation (sklearn API) of the algorithm proposed in "GENDIS: GEnetic DIscovery of Shapelets" and code to reproduce all experiments.

GENDIS GENetic DIscovery of Shapelets In the time series classification domain, shapelets are small subseries that are discriminative for a certain cl

IDLab Services 90 Oct 28, 2022
A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2021 Links Doc

Sebastian Raschka 4.2k Dec 29, 2022