A data-driven approach to quantify the value of classifiers in a machine learning ensemble.

Last update: Dec 29, 2022

Overview

Shapley is a Python library for evaluating binary classifiers in a machine learning ensemble.

The library consists of various methods to compute (or approximate) the Shapley values of players (models) in weighted voting games (ensemble games), a class of transferable-utility cooperative games. We cover exact, enumeration-based computation and several widely known approximation methods from the economics and computer science literature. There is also functionality to quantify the heterogeneity of the player pool based on the Shapley entropy. In addition, the framework comes with detailed documentation, an intuitive tutorial, 100% test coverage, and illustrative toy examples.
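As a rough illustration of what exact enumeration and the Shapley entropy involve, here is a minimal NumPy sketch. It assumes a coalition wins (value 1) when its total weight reaches the quota (sum >= q), and it takes the Shapley entropy to be the Shannon entropy (natural log) of the normalized Shapley values. Both are our assumptions, and `exact_shapley` and `shapley_entropy` are hypothetical helpers written for this example, not the library's API.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(weights, quota):
    """Exact Shapley values of one weighted voting game by subset enumeration.

    Assumes a coalition wins (value 1) when its total weight is >= quota.
    The cost grows as n * 2^(n - 1), so this is only practical for small n.
    """
    n = len(weights)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! shared by all coalitions of this size
            coef = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                total = sum(weights[j] for j in coalition)
                # Marginal contribution is 1 exactly when S loses but S + {i} wins
                if total < quota <= total + weights[i]:
                    phi[i] += coef
    return phi


def shapley_entropy(phi):
    """Shannon entropy (natural log) of the normalized Shapley values (assumed definition)."""
    p = np.asarray(phi, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log(p)))


# Player 0 is pivotal in 4 of the 3! = 6 orderings
phi = exact_shapley([2, 1, 1], quota=3)  # ≈ [2/3, 1/6, 1/6]
print(phi, shapley_entropy(phi))
```

Under this definition the entropy is largest (log n) when every model contributes equally and falls toward 0 when a single model dominates the ensemble.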

Citing

If you find Shapley useful in your research, please consider adding the following citation:

@misc{rozemberczki2021shapley,
      title = {{The Shapley Value of Classifiers in Ensemble Games}}, 
      author = {Benedek Rozemberczki and Rik Sarkar},
      year = {2021},
      eprint = {2101.02153},
      archivePrefix = {arXiv},
      primaryClass = {cs.LG}
}

A simple example

Shapley makes solving voting games easy (see the accompanying tutorial). For example, this is all it takes to solve a weighted voting game defined on the fly with permutation sampling:

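import numpy as np
from shapley import PermutationSampler

# Random weights for one game (row) with 7 players (columns), normalized to sum to 1
W = np.random.uniform(0, 1, (1, 7))
W = W / W.sum()
# Quota (winning threshold) for a coalition's total weight
q = 0.5

# Approximate the Shapley values with Monte Carlo permutation sampling
solver = PermutationSampler()
solver.solve_game(W, q)
shapley_values = solver.get_solution()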
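To make the permutation-sampling idea concrete, here is a minimal NumPy sketch of the Monte Carlo estimator for a single game given as a 1-D weight vector (the library's example passes a 2-D array W instead). It is not the library's `PermutationSampler` implementation: `permutation_sampling_shapley` is a hypothetical name, and we assume a coalition wins once its total weight reaches the quota (sum >= q).

```python
import numpy as np


def permutation_sampling_shapley(weights, quota, n_permutations=1000, seed=None):
    """Monte Carlo estimate of Shapley values in one weighted voting game.

    Assumes a coalition wins when its total weight is >= quota. For each random
    ordering of the players, the player whose arrival first lifts the running
    total to the quota (the pivotal player) gets a marginal contribution of 1
    and everyone else gets 0; the estimate is each player's pivotal frequency.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    counts = np.zeros(len(weights))
    for _ in range(n_permutations):
        order = rng.permutation(len(weights))
        running = np.cumsum(weights[order])
        # argmax on a boolean array returns the first True position (or 0 if none)
        pivot_position = np.argmax(running >= quota)
        # Only credit a pivot if some prefix really reaches the quota
        if running[pivot_position] >= quota:
            counts[order[pivot_position]] += 1
    return counts / n_permutations


estimate = permutation_sampling_shapley([2, 1, 1], quota=3, n_permutations=20000, seed=0)
print(estimate)  # close to the exact values [2/3, 1/6, 1/6]
```

When the grand coalition wins, every sampled ordering credits exactly one pivotal player, so the estimates sum to 1, and the standard error of each estimate shrinks as 1/sqrt(n_permutations).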

Methods Included

The following methods are implemented:

Expected Marginal Contribution Approximation from Fatima et al.: A Linear Approximation Method for the Shapley Value
Multilinear Extension from Owen: Multilinear Extensions of Games
Monte Carlo Permutation Sampling from Maleki et al.: Bounding the Estimation Error of Sampling-based Shapley Value Approximation
Exact Enumeration from Shapley: A Value for N-Person Games
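As an illustration of the multilinear-extension idea (not the library's implementation), the sketch below uses Owen's integral representation: a player's Shapley value equals the probability that it is pivotal when every other player joins independently with probability t, averaged over t in [0, 1]. For a weighted voting game, the other players' total weight is approximated by a normal distribution with mean t * sum(w_j) and variance t * (1 - t) * sum(w_j ** 2). The helper names are hypothetical, and the >= quota rule and the final rescaling to sum 1 are our assumptions.

```python
import math

import numpy as np

# Element-wise error function built from the standard library
_erf = np.vectorize(math.erf)


def normal_cdf(x, mean, std):
    """CDF of N(mean, std^2) evaluated at x (element-wise)."""
    return 0.5 * (1.0 + _erf((x - mean) / (std * math.sqrt(2.0))))


def multilinear_extension_shapley(weights, quota, n_grid=1000):
    """Approximate Shapley values with Owen's multilinear extension.

    Assumptions: a coalition wins when its total weight is >= quota, at least
    two players are present, and the total weight of the other players is
    approximated by a normal distribution (a central-limit approximation).
    """
    w = np.asarray(weights, dtype=float)
    # Midpoints of a uniform grid on (0, 1) for integrating over t
    t = (np.arange(n_grid) + 0.5) / n_grid
    phi = np.zeros(len(w))
    for i in range(len(w)):
        others = np.delete(w, i)
        # Every other player joins independently with probability t
        mean = t * others.sum()
        std = np.sqrt(t * (1.0 - t) * np.sum(others ** 2))
        # Probability that player i is pivotal: quota - w_i <= X < quota
        pivotal = normal_cdf(quota, mean, std) - normal_cdf(quota - w[i], mean, std)
        # Owen's formula: phi_i is the integral of this probability over t in [0, 1]
        phi[i] = pivotal.mean()
    # Rescale so the values sum to 1 (the approximation need not be efficient)
    return phi / phi.sum()


print(multilinear_extension_shapley([5, 1, 1, 1, 1, 1, 1], quota=6))
```

The normal approximation is crude when only a handful of players are present, so small-game outputs should be treated as rough estimates.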

Head over to our documentation for more on installation, creating datasets, and the full list of implemented methods and available datasets. For a quick start, check out the examples in the examples/ directory.

If you notice anything unexpected, please open an issue. If you are missing a specific method, feel free to open a feature request.

Installation

$ pip install shapley

Running tests

$ python setup.py test

Running examples

$ cd examples
$ python permutation_sampler_example.py

License

MIT License

Comments

Error in running MLE example

Thank you for sharing your great work. I truly enjoyed reading it. However, I met an error when I tried the example. It seems to be fine for the MC example.

$ python multilinear_extension_example.py
RuntimeWarning: invalid value encountered in true_divide
  self._Phi = self._Phi / np.sum(self._Phi, axis=1).reshape(-1, 1)
Traceback (most recent call last):
  File "multilinear_extension_example.py", line 11, in <module>
    solver.solve_game(W, q)
  File "/lib/python3.6/site-packages/shapley/solvers/multilinear_extension.py", line 34, in solve_game
    self._run_sanity_check(W, self._Phi)
  File "/lib/python3.6/site-packages/shapley/solution_concept.py", line 28, in _run_sanity_check
    self._verify_distribution(Phi)
  File "/lib/python3.6/site-packages/shapley/solution_concept.py", line 22, in _verify_distribution
    assert np.sum(Phi) - Phi.shape[0] < 0.001
AssertionError

Opened by xxlya.
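One plausible reading of this traceback, offered as a hedged illustration rather than a confirmed diagnosis: the RuntimeWarning means the row normalization divided 0 by 0 (or inf by inf), which yields NaN, and any comparison involving NaN is False, so the sanity-check assertion fails. The snippet below reproduces that chain with the two expressions shown in the traceback, applied to a hypothetical all-zero row of values.

```python
import numpy as np

# Hypothetical approximate values for one game (row) with 7 players, all zero
phi = np.zeros((1, 7))

# Row normalization from the traceback; 0 / 0 gives nan (warning silenced here)
with np.errstate(invalid="ignore"):
    phi = phi / np.sum(phi, axis=1).reshape(-1, 1)

# Sanity check from the traceback; nan < 0.001 is False, so the assert would fail
check_passes = np.sum(phi) - phi.shape[0] < 0.001

print(np.isnan(phi).all(), check_passes)  # True False
```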

Releases (latest: v_10003)

v_10003 (Apr 28, 2022)
Moves the Shapley library to a design based on abstract base classes (ABCs).

Adds a version attribute.

v_10002 (May 16, 2021)

v_10001 (Feb 1, 2021)
Fixed the expectations and variances.

v_10000 (Dec 31, 2020)

The official first release of Shapley.

Owner

Benedek Rozemberczki
