A data-driven approach to quantify the value of classifiers in a machine learning ensemble.

Overview


PyPI Version Docs Status Repo size Code Coverage Build Status Arxiv

Documentation | External Resources | Research Paper

Shapley is a Python library for evaluating binary classifiers in a machine learning ensemble.

The library consists of various methods to compute (approximate) the Shapley value of players (models) in weighted voting games (ensemble games) - a class of transferable utility cooperative games. We covered the exact enumeration based computation and various widely know approximation methods from economics and computer science research papers. There are also functionalities to identify the heterogeneity of the player pool based on the Shapley entropy. In addition, the framework comes with a detailed documentation, an intuitive tutorial, 100% test coverage and illustrative toy examples.


Citing

If you find Shapley useful in your research please consider adding the following citation:

@misc{rozemberczki2021shapley,
      title = {{The Shapley Value of Classifiers in Ensemble Games}}, 
      author = {Benedek Rozemberczki and Rik Sarkar},
      year = {2021},
      eprint = {2101.02153},
      archivePrefix = {arXiv},
      primaryClass = {cs.LG}
}

A simple example

Shapley makes solving voting games quite easy - see the accompanying tutorial. For example, this is all it takes to solve a weighted voting game with defined on the fly with permutation sampling:

import numpy as np
from shapley import PermutationSampler

W = np.random.uniform(0, 1, (1, 7))
W = W/W.sum()
q = 0.5

solver = PermutationSampler()
solver.solve_game(W, q)
shapley_values = solver.get_solution()

Methods Included

In detail, the following methods can be used.


Head over to our documentation to find out more about installation, creation of datasets and a full list of implemented methods and available datasets. For a quick start, check out the examples in the examples/ directory.

If you notice anything unexpected, please open an issue. If you are missing a specific method, feel free to open a feature request.


Installation

$ pip install shapley

Running tests

$ python setup.py test

Running examples

$ cd examples
$ python permutation_sampler_example.py

License

You might also like...
Scripts for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation and a convolutional neural network (CNN) for image classification
Scripts for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation and a convolutional neural network (CNN) for image classification

About subwAI subwAI - a project for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation

The Python ensemble sampling toolkit for affine-invariant MCMC

emcee The Python ensemble sampling toolkit for affine-invariant MCMC emcee is a stable, well tested Python implementation of the affine-invariant ense

Neural Ensemble Search for Performant and Calibrated Predictions
Neural Ensemble Search for Performant and Calibrated Predictions

Neural Ensemble Search Introduction This repo contains the code accompanying the paper: Neural Ensemble Search for Performant and Calibrated Predictio

 An Ensemble of CNN (Python 3.5.1 Tensorflow 1.3 numpy 1.13)
An Ensemble of CNN (Python 3.5.1 Tensorflow 1.3 numpy 1.13)

An Ensemble of CNN (Python 3.5.1 Tensorflow 1.3 numpy 1.13)

zeus is a Python implementation of the Ensemble Slice Sampling method.
zeus is a Python implementation of the Ensemble Slice Sampling method.

zeus is a Python implementation of the Ensemble Slice Sampling method. Fast & Robust Bayesian Inference, Efficient Markov Chain Monte Carlo (MCMC), Bl

Pytorch implementation of SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation
Pytorch implementation of SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation

SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation Efficient Self-Ensemble Framework for Semantic Segmentation by Walid Bousselham

Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning
Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning

Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning This repository is official Tensorflow implementation of paper: Ensemb

Using Hotel Data to predict High Value And Potential VIP Guests
Using Hotel Data to predict High Value And Potential VIP Guests

Description Using hotel data and AI to predict high value guests and potential VIP guests. Hotel can leverage on prediction resutls to run more effect

A Simple Key-Value Data-store written in Python

mercury-db This is a File Based Key-Value Datastore that supports basic CRUD (Create, Read, Update, Delete) operations developed using Python. The dat

Comments
  • Error in running MLE example

    Error in running MLE example

    Thank you for sharing your great work. I truly enjoyed reading it. However, I met an error when I tried the example. It seems to be fine for the MC example.

    $ python multilinear_extension_example.py RuntimeWarning: invalid value encountered in true_divide self._Phi = self._Phi / np.sum(self._Phi, axis=1).reshape(-1, 1) Traceback (most recent call last): File "multilinear_extension_example.py", line 11, in solver.solve_game(W, q) File "/lib/python3.6/site-packages/shapley/solvers/multilinear_extension.py", line 34, in solve_game self._run_sanity_check(W, self._Phi) File "/lib/python3.6/site-packages/shapley/solution_concept.py", line 28, in _run_sanity_check self._verify_distribution(Phi) File "/lib/python3.6/site-packages/shapley/solution_concept.py", line 22, in _verify_distribution assert np.sum(Phi) - Phi.shape[0] < 0.001 AssertionError

    opened by xxlya 2
Releases(v_10003)
Owner
Benedek Rozemberczki
PhD candidate at The University of Edinburgh @cdt-data-science working on machine learning and data mining related to graph structured data.
Benedek Rozemberczki
Custom IMDB Dataset is extracted between 2020-2021 and custom distilBERT model is trained for movie success probability prediction

IMDB Success Predictor Project involves Web Scraping custom IMDB data between 2020 and 2021 of 10000 movies and shows sorted by number of votes ,fine

Gautam Diwan 1 Jan 18, 2022
Reimplementation of the paper "Attention, Learn to Solve Routing Problems!" in jax/flax.

JAX + Attention Learn To Solve Routing Problems Reinplementation of the paper Attention, Learn to Solve Routing Problems! using Jax and Flax. Fully su

Gabriela Surita 7 Dec 01, 2022
Self-Supervised Deep Blind Video Super-Resolution

Self-Blind-VSR Paper | Discussion Self-Supervised Deep Blind Video Super-Resolution By Haoran Bai and Jinshan Pan Abstract Existing deep learning-base

Haoran Bai 35 Dec 09, 2022
Reading Group @mila-iqia on Computational Optimal Transport for Machine Learning Applications

Computational Optimal Transport for Machine Learning Reading Group Over the last few years, optimal transport (OT) has quickly become a central topic

Ali Harakeh 11 Aug 26, 2022
scikit-learn: machine learning in Python

scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license. The project was started

scikit-learn 52.5k Jan 08, 2023
Based on Yolo's low-power, ultra-lightweight universal target detection algorithm, the parameter is only 250k, and the speed of the smart phone mobile terminal can reach ~300fps+

Based on Yolo's low-power, ultra-lightweight universal target detection algorithm, the parameter is only 250k, and the speed of the smart phone mobile terminal can reach ~300fps+

567 Dec 26, 2022
Learning hidden low dimensional dyanmics using a Generalized Onsager Principle and neural networks

OnsagerNet Learning hidden low dimensional dyanmics using a Generalized Onsager Principle and neural networks This is the original pyTorch implemenati

Haijun.Yu 3 Aug 24, 2022
Recursive Bayesian Networks

Recursive Bayesian Networks This repository contains the code to reproduce the results from the NeurIPS 2021 paper Lieck R, Rohrmeier M (2021) Recursi

Robert Lieck 11 Oct 18, 2022
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech

EdiTTS: Score-based Editing for Controllable Text-to-Speech Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech. Au

Neosapience 98 Dec 25, 2022
Python scripts using the Mediapipe models for Halloween.

Mediapipe-Halloween-Examples Python scripts using the Mediapipe models for Halloween. WHY Mainly for fun. But this repository also includes useful exa

Ibai Gorordo 23 Jan 06, 2023
Fast, flexible and fun neural networks.

Brainstorm Discontinuation Notice Brainstorm is no longer being maintained, so we recommend using one of the many other,available frameworks, such as

IDSIA 1.3k Nov 21, 2022
Informal Persian Universal Dependency Treebank

Informal Persian Universal Dependency Treebank (iPerUDT) Informal Persian Universal Dependency Treebank, consisting of 3000 sentences and 54,904 token

Roya Kabiri 0 Jan 05, 2022
This is the source code of the solver used to compete in the International Timetabling Competition 2019.

ITC2019 Solver This is the source code of the solver used to compete in the International Timetabling Competition 2019. Building .NET Core (2.1 or hig

Edon Gashi 8 Jan 22, 2022
Unsupervised Image to Image Translation with Generative Adversarial Networks

Unsupervised Image to Image Translation with Generative Adversarial Networks Paper: Unsupervised Image to Image Translation with Generative Adversaria

Hao 71 Oct 30, 2022
Implementation of ConvMixer for "Patches Are All You Need? 🤷"

Patches Are All You Need? 🤷 This repository contains an implementation of ConvMixer for the ICLR 2022 submission "Patches Are All You Need?" by Asher

CMU Locus Lab 934 Jan 08, 2023
RITA is a family of autoregressive protein models, developed by LightOn in collaboration with the OATML group at Oxford and the Debora Marks Lab at Harvard.

RITA: a Study on Scaling Up Generative Protein Sequence Models RITA is a family of autoregressive protein models, developed by a collaboration of Ligh

LightOn 69 Dec 22, 2022
Code and model benchmarks for "SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology"

NeurIPS 2020 SEVIR Code for paper: SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology Requirement

USAF - MIT Artificial Intelligence Accelerator 46 Dec 15, 2022
Codes for "Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier"

Deep-RTC [project page] This repository contains the source code accompanying our ECCV 2020 paper. Solving Long-tailed Recognition with Deep Realistic

Gina Wu 16 May 26, 2022
A Pytorch implementation of MoveNet from Google. Include training code and pre-train model.

Movenet.Pytorch Intro MoveNet is an ultra fast and accurate model that detects 17 keypoints of a body. This is A Pytorch implementation of MoveNet fro

Mr.Fire 241 Dec 26, 2022