Interpretability and explainability of data and machine learning models

Overview

AI Explainability 360 (v0.2.1)

The AI Explainability 360 toolkit is an open-source library that supports interpretability and explainability of datasets and machine learning models. The AI Explainability 360 Python package includes a comprehensive set of algorithms that cover different dimensions of explanations along with proxy explainability metrics.

The AI Explainability 360 interactive experience provides a gentle introduction to the concepts and capabilities by walking through an example use case for different consumer personas. The tutorials and example notebooks offer a deeper, data scientist-oriented introduction. The complete API is also available.

There is no single approach to explainability that works best. There are many ways to explain: data vs. model, directly interpretable vs. post hoc explanation, local vs. global, etc. It may therefore be confusing to figure out which algorithms are most appropriate for a given use case. To help, we have created some guidance material and a chart that can be consulted.

We have developed the package with extensibility in mind. This library is still in development. We encourage the contribution of your explainability algorithms and metrics. To get started as a contributor, please join the AI Explainability 360 Community on Slack by requesting an invitation here. Please review the instructions to contribute code here.

Supported explainability algorithms

Data explanation

Local post-hoc explanation

Local direct explanation

Global direct explanation

Global post-hoc explanation 

Supported explainability metrics

Setup

Supported Configurations:

OS       Python version
macOS    3.6
Ubuntu   3.6
Windows  3.6

(Optional) Create a virtual environment

AI Explainability 360 requires specific versions of many Python packages which may conflict with other projects on your system. A virtual environment manager is strongly recommended to ensure dependencies may be installed safely. If you have trouble installing the toolkit, try this first.

Conda

Conda is recommended for all configurations though Virtualenv is generally interchangeable for our purposes. Miniconda is sufficient (see the difference between Anaconda and Miniconda if you are curious) and can be installed from here if you do not already have it.

Then, to create a new Python 3.6 environment, run:

conda create --name aix360 python=3.6
conda activate aix360

The shell prompt should now look like (aix360)$. To deactivate the environment, run:

(aix360)$ conda deactivate

The prompt will return to $ or (base)$.

Note: Older versions of conda may use source activate aix360 and source deactivate (activate aix360 and deactivate on Windows).

Installation

Clone the latest version of this repository:

(aix360)$ git clone https://github.com/Trusted-AI/AIX360

If you'd like to run the examples and tutorial notebooks, download the datasets now and place them in their respective folders as described in aix360/data/README.md.

Then, navigate to the root directory of the project, which contains the setup.py file, and run:

(aix360)$ pip install -e .
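
After installation, a quick sanity check of the environment can be sketched as follows. This is a minimal sketch and not part of the toolkit: the helper name check_environment is hypothetical, and it assumes only that the package installs under the import name aix360 (as in the imports used elsewhere in this document) and that Python 3.6 is the supported version listed above.

```python
import importlib.util
import sys


def check_environment(package="aix360", expected_python=(3, 6)):
    """Return (python_ok, package_found) for the running interpreter.

    `check_environment` is a hypothetical helper, not an AIX360 function.
    """
    # Compare only the major.minor part of the version, e.g. (3, 6).
    python_ok = tuple(sys.version_info[:2]) == tuple(expected_python)
    # find_spec returns None when a top-level package cannot be located.
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found


if __name__ == "__main__":
    python_ok, package_found = check_environment()
    print("Python 3.6 interpreter:", python_ok)
    print("aix360 importable:", package_found)
```

If either value is False, recreating the virtual environment described above is a reasonable first step.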

Using AI Explainability 360

The examples directory contains a diverse collection of Jupyter notebooks that use AI Explainability 360 in various ways. Both examples and tutorial notebooks illustrate working code using the toolkit. Tutorials provide additional discussion that walks the user through the various steps of the notebook. See the details about tutorials and examples here.

Citing AI Explainability 360

A technical description of AI Explainability 360 is available in this paper. Below is the BibTeX entry for this paper.

@misc{aix360-sept-2019,
title = "One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques",
author = {Vijay Arya and Rachel K. E. Bellamy and Pin-Yu Chen and Amit Dhurandhar and Michael Hind
and Samuel C. Hoffman and Stephanie Houde and Q. Vera Liao and Ronny Luss and Aleksandra Mojsilovi\'c
and Sami Mourad and Pablo Pedemonte and Ramya Raghavendra and John Richards and Prasanna Sattigeri
and Karthikeyan Shanmugam and Moninder Singh and Kush R. Varshney and Dennis Wei and Yunfeng Zhang},
month = sep,
year = {2019},
url = {https://arxiv.org/abs/1909.03012}
}

AIX360 Videos

  • Introductory video to AI Explainability 360 by Vijay Arya and Amit Dhurandhar, September 5, 2019 (35 mins)

Acknowledgements

AIX360 is built with the help of several open source packages, all of which are listed in setup.py.

License Information

Please view both the LICENSE file and the supplementary license folder in the root directory for license information.

Comments
  • ProtoDash: local variable 'newinnerProduct' referenced before assignment

    I am using the HELOC dataset and trying to explain a single test instance using prototypes from my training subset with the code below:

    explainer = ProtodashExplainer()
    (W, S, _) = explainer.explain(dfTrain.to_numpy(), dfTest.iloc[0:1,:].to_numpy(), m=2)

    However, I am getting the error below: [Screenshot: Screen Shot 2020-05-22 at 01 58 42]

    Is this intentional? Please help.

    Thank you

    opened by laramdemajo 8
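
For readers unfamiliar with this class of error, the sketch below shows how Python raises it in general: a local name that is assigned only inside a conditional branch is unbound if that branch never runs. The function pick_candidate and its logic are hypothetical and are not taken from the ProtoDash source, so this illustrates the error type rather than diagnosing the issue.

```python
def pick_candidate(scores):
    """Toy function: `best` is bound only when some score is positive."""
    for score in scores:
        if score > 0:
            best = score  # assignment happens only inside this branch
    # If no score was positive, `best` was never assigned.
    return best


try:
    pick_candidate([-1.0, -2.0])
except UnboundLocalError as err:
    # The exact wording of the message depends on the Python version.
    message = str(err)
    print(message)
```

A usual remedy for this pattern is to initialize the variable before the branch, or to handle the case where the branch is never taken.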
  • Add rule induction algorithms

    Includes the Ripper algorithm and TRXF ruleset exchange format.

    The rule_induction directory will eventually contain a set of closely related algorithms (drop-in replacements) that are used to induce and export rule sets in the common TRXF format for consumption by AIMEE, ADS, Red Hat Decision Manager, etc. This originates from the internal aix360i:ripper branch, and we intend to migrate parts of that code as the quality bar is achieved.

    In particular, as previously discussed with @vijay-arya, the reason for this migration to the public repo is to provide the more technical ADS (CP4BA) clients with the means to programmatically generate their own rule sets without relying on the AIMEE GUI.

    Required dependencies:

    • numpy
    • pandas
    • sklearn
    • nyoka
    • xmltodict
    • numba
    opened by kmyusk 7
  • ExternalRiskEstimate seems to be hard coded into HELOC data processing, but I cannot find it.

    [Screenshot: Screen Shot 2021-05-19 at 11 51 10 AM]

    If I change the name:


    ValueError                                Traceback (most recent call last)
    in
          2 from aix360.algorithms.rbm import FeatureBinarizer
          3 fb = FeatureBinarizer(negations=True, returnOrd=True)
    ----> 4 dfTrain, dfTrainStd = fb.fit_transform(dfTrain)
          5 dfTest, dfTestStd = fb.transform(dfTest)
          6 dfTrain['MostRecentBillAmountRaw'].head()

    ~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
        697         if y is None:
        698             # fit method of arity 1 (unsupervised transformation)
    --> 699             return self.fit(X, **fit_params).transform(X)
        700         else:
        701             # fit method of arity 2 (supervised transformation)

    ~/PycharmProjects/AIX360/aix360/algorithms/rbm/features.py in fit(self, X)
        111         self.ordinal = ordinal
        112         # Fit StandardScaler to ordinal features
    --> 113         self.scaler = StandardScaler().fit(data[ordinal])
        114         return self
        115

    ~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/preprocessing/_data.py in fit(self, X, y, sample_weight)
        728         # Reset internal state before fitting
        729         self._reset()
    --> 730         return self.partial_fit(X, y, sample_weight)
        731
        732     def partial_fit(self, X, y=None, sample_weight=None):

    ~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/preprocessing/_data.py in partial_fit(self, X, y, sample_weight)
        766         X = self._validate_data(X, accept_sparse=('csr', 'csc'),
        767                                 estimator=self, dtype=FLOAT_DTYPES,
    --> 768                                 force_all_finite='allow-nan', reset=first_call)
        769         n_features = X.shape[1]
        770

    ~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
        419             out = X
        420         elif isinstance(y, str) and y == 'no_validation':
    --> 421             X = check_array(X, **check_params)
        422             out = X
        423         else:

    ~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
         61             extra_args = len(args) - len(all_args)
         62             if extra_args <= 0:
    ---> 63                 return f(*args, **kwargs)
         64
         65             # extra_args > 0

    ~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
        538
        539         if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
    --> 540             dtype_orig = np.result_type(*dtypes_orig)
        541
        542     if dtype_numeric:

    <__array_function__ internals> in result_type(*args, **kwargs)

    ValueError: at least one array or dtype is required

    opened by BrianBlackman 5
  • beam_search_K1 with pandas > 1.1.0

    With pandas version > 1.1.0, lines 148, 145, and 150 raise an error: ValueError: cannot reindex from a duplicate axis. Locally, I just added '.values'. Example for line 148: colKeep[i[0]] = ((Xp[i[0]].columns.get_level_values(0) == '<=') & (thresh > i[2])).values

    opened by Hugomiralles 5
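
The '.values' workaround described in this issue can be illustrated with a small, self-contained sketch. The column labels and thresholds below are made up for illustration and the snippet does not import AIX360. It only shows that a boolean mask built from a threshold Series inherits that Series' index, which contains duplicates when the same threshold appears under both '<=' and '>', and that '.values' strips the index so no label alignment can take place on assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical binarized columns: (operation, threshold) pairs.
cols = pd.MultiIndex.from_tuples(
    [("<=", 1.0), ("<=", 2.0), (">", 1.0), (">", 2.0)]
)

# Same construction as in the issue: thresholds as a Series indexed by themselves.
thresh = cols.get_level_values(1).to_series()

# The result is a pandas Series whose index repeats the values 1.0 and 2.0.
mask = (cols.get_level_values(0) == "<=") & (thresh > 1.0)
print(type(mask).__name__, mask.index.is_unique)

# .values returns a plain NumPy array with no index attached.
plain = mask.values
print(type(plain).__name__, plain.tolist())
```

Whether assigning the indexed mask raises an error depends on the pandas version, as the issue reports. The plain array avoids the question because it carries no labels to reindex.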
  • Fixing the error that occurred when using negated binary columns with FeatureBinarizer

    Issue: #112

    I've just noticed that the FeatureBinarizer, when including the negated columns as well, does not work when using a dataset where there is a binary categorical feature. ~~That's probably another Pandas version error, where the 1.0.0 or newer Pandas versions work significantly different than they previously did.~~ (Got the error using Pandas 0.25.3)

    When calling fb.fit_transform(<dataset_with_binary_category>, negations=True), the error message was: TypeError: unsupported operand type(s) for -: 'int' and 'Categorical'
    It occurs at line 142, in the function transform(): A[(str(c), 'not', '')] = 1 - A[(str(c), '', '')]
    where A[(str(c), '', '')] = data[c].map(maps[c]) and c is a specific column.

    At that line the subtraction does not work, because the Series A[(str(c), '', '')] is categorical.

    Solution:
    Convert the type of A[(str(c), '', '')] to integer: A[(str(c), '', '')] = data[c].map(maps[c]).astype(int). Although this could be solved in several ways, I've seen the pattern astype(int) elsewhere in the codebase, so I hope that the solution is satisfactory.

    opened by gaborpelesz 4
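
The failure and the astype(int) fix described in this pull request can be reproduced in isolation. The sketch below uses a made-up binary categorical column and mapping and does not call FeatureBinarizer. It assumes only that mapping a categorical Series may keep the categorical dtype, which is the situation the pull request describes.

```python
import pandas as pd

# Hypothetical binary categorical column and value mapping.
col = pd.Series(["yes", "no", "yes"], dtype="category")
mapping = {"no": 0, "yes": 1}

# Mapping a categorical Series may return another categorical Series.
mapped = col.map(mapping)

try:
    negated = 1 - mapped  # arithmetic on categorical data is not supported
    arithmetic_failed = False
except TypeError:
    arithmetic_failed = True

# Casting to int first makes the negation an ordinary integer subtraction.
fixed = mapped.astype(int)
negated_fixed = 1 - fixed
print(arithmetic_failed, fixed.tolist(), negated_fixed.tolist())
```

The cast is safe here because the mapped categories are already the integers 0 and 1.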
  • BRCG train fails in copied "Credit Approval Tutorial" code

    Hi there,

    I've actually copied the code (did no modification at all) from the BRCG part of the "Credit Approval Tutorial" code and ran into errors. I'm quite sure that the dataset was loaded appropriately, as I have also trained a scikit learn Decision Tree Classifier on it with no problem and in the same notebook.

    Can someone help me with this issue? Am I missing something or is it an internal problem?

    Thanks in advance!

    Here is the code and the output. It was run on google colab, with pandas 1.1.2 and the latest aix360 release, which is 0.2.0.

    Copied code

    import warnings
    warnings.filterwarnings('ignore')
    
    # Load FICO HELOC data with special values converted to np.nan
    from aix360.datasets.heloc_dataset import HELOCDataset, nan_preprocessing
    data = HELOCDataset(custom_preprocessing=nan_preprocessing).data()
    # Separate target variable
    y = data.pop('RiskPerformance')
    
    # Split data into training and test sets using fixed random seed
    from sklearn.model_selection import train_test_split
    dfTrain, dfTest, yTrain, yTest = train_test_split(data, y, random_state=0, stratify=y)
    dfTrain.head().transpose()
    
    # Binarize data and also return standardized ordinal features
    from aix360.algorithms.rbm import FeatureBinarizer
    fb = FeatureBinarizer(negations=True, returnOrd=True)
    dfTrain, dfTrainStd = fb.fit_transform(dfTrain)
    dfTest, dfTestStd = fb.transform(dfTest)
    dfTrain['ExternalRiskEstimate'].head()
    
    # Instantiate BRCG with small complexity penalty and large beam search width
    from aix360.algorithms.rbm import BooleanRuleCG
    br = BooleanRuleCG(lambda0=1e-3, lambda1=1e-3, CNF=True)
    
    # Train, print, and evaluate model
    br.fit(dfTrain, yTrain)
    from sklearn.metrics import accuracy_score
    print('Training accuracy:', accuracy_score(yTrain, br.predict(dfTrain)))
    print('Test accuracy:', accuracy_score(yTest, br.predict(dfTest)))
    print('Predict Y=0 if ANY of the following rules are satisfied, otherwise Y=1:')
    print(br.explain()['rules'])
    

    Output

    Learning CNF rule with complexity parameters lambda0=0.001, lambda1=0.001
    Initial LP solved
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __setitem__(self, key, value)
       1001         try:
    -> 1002             self._set_with_engine(key, value)
       1003         except (KeyError, ValueError):
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/series.py in _set_with_engine(self, key, value)
       1032         # fails with AttributeError for IntervalIndex
    -> 1033         loc = self.index._engine.get_loc(key)
       1034         validate_numeric_casting(self.dtype, value)
    
    pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc()
    
    KeyError: 'ExternalRiskEstimate'
    
    During handling of the above exception, another exception occurred:
    
    ValueError                                Traceback (most recent call last)
    <ipython-input-98-8d81fbd6c0e1> in <module>()
         26 
         27 # Train, print, and evaluate model
    ---> 28 br.fit(dfTrain, yTrain)
         29 from sklearn.metrics import accuracy_score
         30 print('Training accuracy:', accuracy_score(yTrain, br.predict(dfTrain)))
    
    /usr/local/lib/python3.6/dist-packages/aix360/algorithms/rbm/boolean_rule_cg.py in fit(self, X, y)
        118         UB = min(UB.min(), 0)
        119         v, zNew, Anew = beam_search(r, X, self.lambda0, self.lambda1,
    --> 120                                     K=self.K, UB=UB, D=self.D, B=self.B, eps=self.eps)
        121 
        122         while (v < -self.eps).any() and (self.it < self.iterMax):
    
    /usr/local/lib/python3.6/dist-packages/aix360/algorithms/rbm/beam_search.py in beam_search(r, X, lambda0, lambda1, K, UB, D, B, wLB, eps, stopEarly)
        285             if i[1] == '<=':
        286                 thresh = Xp[i[0]].columns.get_level_values(1).to_series().replace('NaN', np.nan)
    --> 287                 colKeep[i[0]] = (Xp[i[0]].columns.get_level_values(0) == '>') & (thresh < i[2])
        288             elif i[1] == '>':
        289                 thresh = Xp[i[0]].columns.get_level_values(1).to_series().replace('NaN', np.nan)
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __setitem__(self, key, value)
       1008             else:
       1009                 # GH#12862 adding an new key to the Series
    -> 1010                 self.loc[key] = value
       1011 
       1012         except TypeError as e:
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in __setitem__(self, key, value)
        668 
        669         iloc = self if self.name == "iloc" else self.obj.iloc
    --> 670         iloc._setitem_with_indexer(indexer, value)
        671 
        672     def _validate_key(self, key, axis: int):
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
       1790                 # setting for extensionarrays that store dicts. Need to decide
       1791                 # if it's worth supporting that.
    -> 1792                 value = self._align_series(indexer, Series(value))
       1793 
       1794             elif isinstance(value, ABCDataFrame):
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _align_series(self, indexer, ser, multiindex_indexer)
       1909             # series, so need to broadcast (see GH5206)
       1910             if sum_aligners == self.ndim and all(is_sequence(_) for _ in indexer):
    -> 1911                 ser = ser.reindex(obj.axes[0][indexer[0]], copy=True)._values
       1912 
       1913                 # single indexer
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/series.py in reindex(self, index, **kwargs)
       4397     )
       4398     def reindex(self, index=None, **kwargs):
    -> 4399         return super().reindex(index=index, **kwargs)
       4400 
       4401     def drop(
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
       4457         # perform the reindex on the axes
       4458         return self._reindex_axes(
    -> 4459             axes, level, limit, tolerance, method, fill_value, copy
       4460         ).__finalize__(self, method="reindex")
       4461 
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
       4480                 fill_value=fill_value,
       4481                 copy=copy,
    -> 4482                 allow_dups=False,
       4483             )
       4484 
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
       4525                 fill_value=fill_value,
       4526                 allow_dups=allow_dups,
    -> 4527                 copy=copy,
       4528             )
       4529             # If we've made a copy once, no need to make another one
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate)
       1274         # some axes don't allow reindexing with dups
       1275         if not allow_dups:
    -> 1276             self.axes[axis]._can_reindex(indexer)
       1277 
       1278         if axis >= self.ndim:
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
       3283         # trying to reindex on an axis with duplicates
       3284         if not self.is_unique and len(indexer):
    -> 3285             raise ValueError("cannot reindex from a duplicate axis")
       3286 
       3287     def reindex(self, target, method=None, level=None, limit=None, tolerance=None):
    
    ValueError: cannot reindex from a duplicate axis
    
    opened by gaborpelesz 4
  • ValueError with using ProtoDash to get the Prototypes of a Dataset

    Hi!

    I'm encountering an error with a simple use case of ProtoDash to get prototypes of a given dataset. Here's an example that triggers the error:

    import pandas as pd
    from sklearn import datasets
    from aix360.algorithms.protodash import PDASH
    
    # Load Iris
    X, y = datasets.load_iris(return_X_y=True)
    df = pd.DataFrame(X, columns=range(X.shape[1]))
    df['y'] = y
    
    tmp = df[df['y'] == 0].drop('y', axis=1).values
    X_1 = PDASH.HeuristicSetSelection(X=tmp, Y=tmp, m=10, kernelType='gaussian', sigma=2)
    
    # This generates an error:
    # ---------------------------------------------------------------------------
    # ValueError                                Traceback (most recent call last)
    # <ipython-input-48-e631ba33f62a> in <module>
    #      1 tmp = df[df['y'] == 0].drop('y', axis=1).values
    # ----> 2 X_1 = PDASH.HeuristicSetSelection(X=tmp, Y=tmp, m=10, kernelType='gaussian', sigma=2)
    #
    # c:\users\pc\aix\aix360\aix360\algorithms\protodash\PDASH_utils.py in HeuristicSetSelection(X, Y, m, kernelType, sigma)
    #    267             currK = K2
    #    268             if maxGradient <= 0:
    #--> 269                 newCurrOptw = np.vstack((currOptw[:], np.array([0])))
    #    270                 newCurrSetValue = currSetValue
    #    271             else:
    #
    #~\AppData\Local\Continuum\anaconda3\envs\aix360\lib\site-packages\numpy\core\shape_base.py in vstack(tup)
    #    281     """
    #    282     _warn_for_nonsequence(tup)
    #--> 283     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    #    284 
    #    285 
    #
    #ValueError: all the input array dimensions except for the concatenation axis must match exactly
    

    Interestingly, the error does not pop up for m < 10.

    Is this a bug or am I using it incorrectly?

    Thanks,

    opened by hadrianpaulo 4
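
The ValueError in the traceback above comes from np.vstack, which requires all inputs to have the same number of columns once each is promoted to 2-D. The sketch below shows that rule in isolation. The array names are hypothetical, and it is not a diagnosis of what currOptw holds inside PDASH_utils.py.

```python
import numpy as np

column_weights = np.zeros((3, 1))  # shape (3, 1): three rows, one column
flat_weights = np.zeros(3)         # shape (3,): promoted to one row of width 3
new_entry = np.array([0])          # shape (1,): promoted to shape (1, 1)

# Column counts match (1 and 1), so a new row is appended.
stacked = np.vstack((column_weights, new_entry))
print(stacked.shape)

# Column counts differ (3 and 1), so NumPy raises ValueError.
try:
    np.vstack((flat_weights, new_entry))
    mismatch = None
except ValueError as err:
    mismatch = str(err)
    print(mismatch)
```

Printing the shape of the array being stacked just before the failing call is one way to check whether this kind of mismatch is involved.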
  • ModuleNotFoundError: No module named 'aix360.algorithms.rule_induction'

    Hi,

    I tried to run an example notebook via Docker.

    I followed the instructions here to build the Docker image and run the Jupyter server.

    However, when I ran the notebook examples/rule_induction/brcg_demo.ipynb, importing the dependent libraries simply gave:

    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    <ipython-input-3-33e99aef874b> in <module>
          3 from sklearn.model_selection import train_test_split
          4 from sklearn.metrics import precision_score, recall_score, accuracy_score, balanced_accuracy_score
    ----> 5 from aix360.algorithms.rule_induction.rbm.boolean_rule_cg import BooleanRuleCG as BRCG
          6 from aix360.algorithms.rbm import FeatureBinarizer
          7 import time
    
    ModuleNotFoundError: No module named 'aix360.algorithms.rule_induction'
    
    opened by xiaohan2012 3
  • CEMExplainer scikit-learn compatibility & "predict_long" method error

    Hello! I've got a problem you might be able to help me with.

    I'm trying to use CEM to explain scikit-learn binary classification models. I'm using RandomForestClassifier & KNeighborsClassifier to be more exact.

    Here's a snippet using K-Neighbors:

    from aix360.algorithms.contrastive import CEMExplainer
    from sklearn.neighbors import KNeighborsClassifier
    
    model_kneighbors = KNeighborsClassifier()
    
    fit_kneighbors = model_kneighbors.fit(X_train, y_train_)
    y_pred_kneighbors = model_kneighbors.predict(X_test)
    
    cem_kneighbors_clas = CEMExplainer(model_kneighbors)
    
    arg_mode = 'PN'             # Find pertinent negatives
    arg_max_iter = 1000         # Maximum number of iterations to search for the optimal PN for given parameter settings
    arg_init_const = 10.0       # Initial coefficient value for main loss term that encourages class change
    arg_b = 9                   # No. of updates to the coefficient of the main loss term
    arg_kappa = 0.2             # Minimum confidence gap between the PNs (changed) class probability and original class' probability
    arg_beta = 1e-1             # Controls sparsity of the solution (L1 loss)
    arg_gamma = 100             # Controls how much to adhere to a (optionally trained) auto-encoder
    my_AE_model = None          # Pointer to an auto-encoder
    arg_alpha = 0.01            # Penalizes L2 norm of the solution
    arg_threshold = 1.          # Automatically turn off features <= arg_threshold if arg_threshold < 1
    arg_offset = 0.5            # the model assumes classifier trained on data normalized
                                # in [-arg_offset, arg_offset] range, where arg_offset is 0 or 0.5
    
    (adv_pn, delta_pn, info_pn) = cem_kneighbors_clas.explain_instance(
        input_X=X_to_explain_clas,      # input_X (numpy.ndarray) – input instance to be explained
        arg_mode=arg_mode,              # arg_mode (str) – ‘PP’ or ‘PN’
        AE_model=my_AE_model,           # AE_model – Auto-encoder model
        arg_kappa=arg_kappa,            # arg_kappa (double) – Confidence gap between desired class and other classes
        arg_b=arg_b,                    # arg_b (double) – Number of different weightings of loss function to try
        arg_max_iter=arg_max_iter,      # arg_max_iter (int) – For each weighting of loss function number of iterations to search
        arg_init_const=arg_init_const,  # arg_init_const (double) – Initial weighting of loss function
        arg_beta=arg_beta,              # arg_beta (double) – Weighting of L1 loss
        arg_gamma=arg_gamma             # arg_gamma (double) – Weighting of auto-encoder                  
    )
    

    When I try to run it I get: 'KNeighborsClassifier' object has no attribute 'predict_long'. I checked out the CEM implementation and found the predict_long call. I also tried the HELOC tutorial example and it all works well, but with a classifier named KerasClassifier, that I think has been deprecated? I can't find documentation for it anywhere.

    Based on this, I've got a few questions:

    1. What estimators are compatible with CEM? The other classes are compatible with scikit-learn, but perhaps CEM is for NNs or TensorFlow only? I skimmed the paper but didn't quite catch that. I know there's a variation that deals with images, but "regular" CEM should work for tabular data, correct? Since ProtoDash is compatible with these estimators, I assumed CEM was too.
    2. Perhaps I simply need to downgrade the package? I didn't install anything in particular in my Jupyter Notebook, just aix360 (didn't specify a version either, just ran pip install aix360).
    3. If I were to "tweak" the implementation and change predict_long for the usual scikit-learn predict method, would it work? I don't fully understand the code, to be honest.
    4. If scikit-learn estimators aren't supported for CEM at the moment, is there an implementation planned for the future?

    Superb work with the library, by the way. I'm a huge fan of IBM Research and the amazing work you all do.

    Regards from a fellow IBMer!

    Kindly, Josefina

    opened by josefinarcasanova 3
  • PMML export enhancements and 3.8-3.6 compatibility of rule induction code

    • Categorical datafields with str type now include the list of possible/legal values in the PMML file.
    • Fixed one test in rule induction that was failing in Python 3.8

    More details in the individual commit messages.

    Edit: It was 3.8 not 3.7

    opened by kmyusk 2
  • Add Matching Explainer Algorithm

    Summary

    Adding a White Box Explainer for Matchings, as described in the upcoming ICML 2022 publication.

    • the below algorithm is implemented in a python package, which is imported into AIX360

    Fabian Lim, Laura Wynter, Shiau Hong Lim. 2022. "Order Constraints in Optimal Transport". https://arxiv.org/abs/2110.07275.

    Algorithm

    Given a matching, provide an explanation by returning alternate matchings that each focus on a sparse set of salient matches.

    • inherit from the LocalWBExplainer since it requires access to the internal coefficients of a matching (white box) and does not require retraining over a dataset (locality)

    Package Dependencies

    In setup.py we have the following dependency that is installed as an egg

    In examples/matching/matching-pairs-of-sentences.ipynb we ask the user to install the below packages in order to execute the demo

    • POT==0.7.0

    Example

    An NLP-based example inspired from one of the figures in the paper is provided

    TODO

    • [x] algorithm has been added in here.
    • [x] semantic dataset used in examples has been included in here.
    • [x] examples have been included in here, with the following subdirectories:
      • data: storing NLP embeddings that will be downloaded for the demo
      • models: contains code for an NLP embedding model used in demo
      • utils: utility functions for the demo
    • [x] docs have been updated. Sphinx has been run locally and tested.
    • [x] Update READMEs in examples, top-level directory, examples, etc
    • [x] tests have been included in here.
      • data: each test case is specified in a .json file and stored here.
    • [x] Update setup.py. The python package is hosted publicly at https://github.com/IBM/otoc.
      • [x] This repo is placed in install_requires as an egg link
      • [x] Update the GitHub link in the top-level README
    • [ ] Discuss with @vijay-arya on where to move the below items
      • examples/matching/data
      • examples/matching/models
      • tests/matching/data
    opened by fabianlim 2
  • Error: "elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison"

    When trying to run ProtoDash, I get the error: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison. Any idea why?

    opened by survivebycoding 0
  • Add CodeQL workflow for GitHub code scanning

    Hi Trusted-AI/AIX360!

    This is a one-off automatically generated pull request from LGTM.com. You might have heard that we’ve integrated LGTM’s underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository. Take a look! We tested it before opening this pull request, so all should be working. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!

    FAQ

    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request — to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches — this keeps the analysis results on your repository’s Security tab up to date.
    • Once a week at a fixed time — to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesn’t seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by lgtm-com[bot] 0
  • TRXF pmml scorecard reader

    Implemented the reader portion of the scorecard PMML export functionality. Together with the writer portion (https://github.com/Trusted-AI/AIX360/pull/166), this makes it possible to export TRXF scorecards to PMML.

    opened by kmyusk 1
  • Questions about the results obtained by XAI method

    I found a strange phenomenon. With the same model, the same training and test samples, and otherwise identical operations, the values obtained by an XAI method (such as Saliency) to evaluate the model's interpretability should in theory be the same. However, when I retrained a new model, the interpretability values were completely different from those obtained with the previous model. The values are unstable and the results cannot be reproduced, unless I save the model after training and reload its parameters, in which case the results are the same. Does anyone know why this happens?
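
    One plausible explanation, offered as an assumption rather than a diagnosis of this issue: training usually involves randomness (weight initialization, data shuffling, dropout), so each retraining ends with different parameters, and gradient-based attributions such as saliency depend on those parameters. The sketch below uses only Python's standard `random` module to show the effect on initial weights; `init_weights` is a hypothetical helper, not part of AIX360 or any deep learning framework.

```python
import random


def init_weights(n_weights, seed=None):
    """Draw initial weights as a training run might (hypothetical helper)."""
    # With seed=None the generator is seeded from the OS or the clock,
    # so every call produces a different sequence.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_weights)]


# Two unseeded "training runs" start from different weights...
run_a = init_weights(4)
run_b = init_weights(4)

# ...while two runs that share a fixed seed start from identical weights.
seeded_a = init_weights(4, seed=0)
seeded_b = init_weights(4, seed=0)

print("unseeded runs match:", run_a == run_b)
print("seeded runs match:  ", seeded_a == seeded_b)
```

    In a real setup the framework's own random number generators would need to be seeded as well, and some GPU operations can remain nondeterministic even then, which is consistent with saved and reloaded parameters being the only fully stable case.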

    opened by 9527-ly 0
  • Ripper rule induction algorithm treats timestamp type features as categorical

    The Ripper algorithm recognizes timestamp features (e.g. 2022-06-14-19.39.35.929641) as integers, and thus encodes them as categorical features. The resulting rules are expressed as equality predicates (e.g. timestamp == 2022-06-14-19.39.35.929641) rather than the intervals or inequalities one would expect.

    Proper timestamp type support for Ripper would be nice.
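
    Until such support exists, one possible workaround (a sketch, not something the issue or the library prescribes) is to convert timestamps to a numeric column before training, so that a rule learner can split on inequalities. The example below uses only the standard library; the format string is inferred from the single example in the issue, and `to_epoch_seconds` and the UTC assumption are illustrative choices.

```python
from datetime import datetime, timezone

# Format inferred from the example in the issue: 2022-06-14-19.39.35.929641
TIMESTAMP_FORMAT = "%Y-%m-%d-%H.%M.%S.%f"


def to_epoch_seconds(stamp):
    """Parse a timestamp string into seconds since 1970-01-01 UTC (float)."""
    parsed = datetime.strptime(stamp, TIMESTAMP_FORMAT)
    # Assumption: the timestamps are in UTC; change tzinfo if they are not.
    return parsed.replace(tzinfo=timezone.utc).timestamp()


stamps = ["2022-06-14-19.39.35.929641", "2022-06-15-08.00.00.000000"]
numeric = [to_epoch_seconds(s) for s in stamps]

# The numeric values preserve chronological order, so thresholds are meaningful.
print(numeric[0] < numeric[1])
```

    A numeric feature like this can then appear in rules as a threshold comparison instead of an equality test on one exact timestamp.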

    opened by kmyusk 0
Releases(v0.2.1)
  • v0.2.1(Oct 28, 2020)

    • Minor update to CEM parameters
    • FeatureBinarizerFromTrees for directly interpretable explainers
    • Minor updates to BRCG due to Pandas update
    • Updates to HELOC tutorial
    • Abstraction class for global black box
    • Comment updates to ProtoDash
    • Minor bug fixes
    • License updates
  • v0.2.0(Dec 9, 2019)

Owner
This GitHub org hosts LF AI Foundation projects in the category of Trusted and Responsible AI.
JittorVis - Visual understanding of deep learning model.

182 Jan 06, 2023
FairML - a Python toolbox for auditing machine learning models for bias.

FairML: Auditing Black-Box Predictive Models. FairML is a Python toolbox for auditing machine learning models for bias. Description Predictive

Julius Adebayo 338 Nov 09, 2022
Auralisation of learned features in CNN (for audio)

AuralisationCNN This repo is an example of the auralisation of CNNs that was demonstrated at ISMIR 2015. Files auralise.py: includes all required func

Keunwoo Choi 39 Nov 19, 2022
Many Class Activation Map methods implemented in Pytorch for CNNs and Vision Transformers. Including Grad-CAM, Grad-CAM++, Score-CAM, Ablation-CAM and XGrad-CAM

Class Activation Map methods implemented in Pytorch pip install grad-cam ⭐ Comprehensive collection of Pixel Attribution methods for Computer Vision.

Jacob Gildenblat 6.5k Jan 01, 2023
Visual analysis and diagnostic tools to facilitate machine learning model selection.

Yellowbrick Visual analysis and diagnostic tools to facilitate machine learning model selection. What is Yellowbrick? Yellowbrick is a suite of visual

District Data Labs 3.9k Dec 30, 2022
Code for "High-Precision Model-Agnostic Explanations" paper

Anchor This repository has code for the paper High-Precision Model-Agnostic Explanations. An anchor explanation is a rule that sufficiently “anchors”

Marco Tulio Correia Ribeiro 735 Jan 05, 2023
Visualizer for neural network, deep learning, and machine learning models

Netron is a viewer for neural network, deep learning and machine learning models. Netron supports ONNX, TensorFlow Lite, Keras, Caffe, Darknet, ncnn,

Lutz Roeder 20.9k Dec 28, 2022
Python implementation of R package breakDown

pyBreakDown Python implementation of breakDown package (https://github.com/pbiecek/breakDown). Docs: https://pybreakdown.readthedocs.io. Requirements

MI^2 DataLab 41 Mar 17, 2022
L2X - Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation.

L2X Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation at ICML 2018,

Jianbo Chen 113 Sep 06, 2022
Using / reproducing ACD from the paper "Hierarchical interpretations for neural network predictions" 🧠 (ICLR 2019)

Hierarchical neural-net interpretations (ACD) 🧠 Produces hierarchical interpretations for a single prediction made by a pytorch neural network. Offic

Chandan Singh 111 Jan 03, 2023
python partial dependence plot toolbox

PDPbox python partial dependence plot toolbox Motivation This repository is inspired by ICEbox. The goal is to visualize the impact of certain feature

Li Jiangchun 722 Dec 30, 2022
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)

Jesse Vig 4.7k Jan 01, 2023
Pytorch implementation of convolutional neural network visualization techniques

Convolutional Neural Network Visualizations This repository contains a number of convolutional neural network visualization techniques implemented in

Utku Ozbulak 7k Jan 03, 2023
Lime: Explaining the predictions of any machine learning classifier

lime This project is about explaining what machine learning classifiers (or models) are doing. At the moment, we support explaining individual predict

Marco Tulio Correia Ribeiro 10.3k Jan 01, 2023
Interactive convnet features visualization for Keras

Quiver Interactive convnet features visualization for Keras The quiver workflow Video Demo Build your model in keras model = Model(...) Launch the vis

Keplr 1.7k Dec 21, 2022
A game theoretic approach to explain the output of any machine learning model.

SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allo

Scott Lundberg 18.3k Jan 08, 2023
TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, Korean, Chinese, German and Easy to adapt for other languages)

🤪 TensorFlowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, Melgan, Multiband-Melgan, FastSpeech, FastSpeech2 based-on TensorFlow 2. With Tensorflow 2, we c

3k Jan 04, 2023
Logging MXNet data for visualization in TensorBoard.

Logging MXNet Data for Visualization in TensorBoard Overview MXBoard provides a set of APIs for logging MXNet data for visualization in TensorBoard. T

Amazon Web Services - Labs 327 Dec 05, 2022
Lucid library adapted for PyTorch

Lucent PyTorch + Lucid = Lucent The wonderful Lucid library adapted for the wonderful PyTorch! Lucent is not affiliated with Lucid or OpenAI's Clarity

Lim Swee Kiat 520 Dec 26, 2022