A scikit-learn-compatible module for estimating prediction intervals.

Overview

https://github.com/simai-ml/MAPIE/raw/master/doc/images/mapie_logo_nobg_cut.png

MAPIE - Model Agnostic Prediction Interval Estimator

MAPIE allows you to easily estimate prediction intervals (or prediction sets) using your favourite scikit-learn-compatible model for single-output regression or multi-class classification settings.

Prediction intervals output by MAPIE encompass both aleatoric and epistemic uncertainties and are backed by strong theoretical guarantees [1-4].

🔗 Requirements

Python 3.7+

MAPIE stands on the shoulders of giants.

Its only internal dependency is scikit-learn.

🛠 Installation

Install via pip:

$ pip install mapie

or via conda:

$ conda install -c conda-forge mapie

To install directly from the GitHub repository:

$ pip install git+https://github.com/simai-ml/MAPIE

⚡️ Quickstart

Let us start with a basic regression problem. Here, we generate one-dimensional noisy data that we fit with a linear model.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

regressor = LinearRegression()
X, y = make_regression(n_samples=500, n_features=1, noise=20, random_state=59)

Since MAPIE is compliant with the standard scikit-learn API, we follow the standard sequential fit and predict process like any scikit-learn regressor. We set two values for alpha, 0.05 and 0.32, to estimate prediction intervals at approximately two and one standard deviations from the mean, respectively.

from mapie.regression import MapieRegressor

alpha = [0.05, 0.32]  # target coverages of 0.95 and 0.68
mapie = MapieRegressor(regressor)
mapie.fit(X, y)
y_pred, y_pis = mapie.predict(X, alpha=alpha)
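
The link between an alpha value and a number of standard deviations assumes Gaussian noise: roughly 68% of the probability mass lies within one standard deviation of the mean and roughly 95% within two. A minimal check using only the Python standard library (the helper name gaussian_coverage is illustrative and not part of MAPIE):

```python
import math

def gaussian_coverage(n_sigma):
    """Probability mass of a normal distribution within n_sigma standard deviations of its mean."""
    return math.erf(n_sigma / math.sqrt(2))

for n_sigma in (1, 2):
    coverage = gaussian_coverage(n_sigma)
    print(f"{n_sigma} sigma: coverage={coverage:.4f}, alpha={1 - coverage:.4f}")
# 1 sigma: coverage=0.6827, alpha=0.3173
# 2 sigma: coverage=0.9545, alpha=0.0455
```

Rounded, these are the alpha values 0.32 and 0.05 used in the quickstart.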

MAPIE returns two np.ndarray objects: y_pred, of shape (n_samples,), giving the predictions, and y_pis, of shape (n_samples, 2, len(alpha)), giving the lower and upper bounds of the prediction intervals for each desired alpha value. The estimated prediction intervals can then be plotted as follows.

from matplotlib import pyplot as plt
from mapie.metrics import coverage_score
plt.xlabel("x")
plt.ylabel("y")
plt.scatter(X, y, alpha=0.3)
plt.plot(X, y_pred, color="C1")
order = np.argsort(X[:, 0])
plt.plot(X[order], y_pis[order][:, 0, 1], color="C1", ls="--")
plt.plot(X[order], y_pis[order][:, 1, 1], color="C1", ls="--")
plt.fill_between(
    X[order].ravel(),
    y_pis[order][:, 0, 0].ravel(),
    y_pis[order][:, 1, 0].ravel(),
    alpha=0.2
)
coverage_scores = [
    coverage_score(y, y_pis[:, 0, i], y_pis[:, 1, i])
    for i, _ in enumerate(alpha)
]
plt.title(
    f"Target and effective coverages for "
    f"alpha={alpha[0]:.2f}: ({1-alpha[0]:.3f}, {coverage_scores[0]:.3f})\n"
    f"Target and effective coverages for "
    f"alpha={alpha[1]:.2f}: ({1-alpha[1]:.3f}, {coverage_scores[1]:.3f})"
)
plt.show()

The title of the plot compares the target coverages with the effective coverages. The target coverage, or confidence level, is the fraction of true labels that we aim to capture within the prediction intervals for a given dataset. It is set through the alpha parameter passed to the predict method of MapieRegressor, here equal to 0.05 and 0.32, thus giving target coverages of 0.95 and 0.68. The effective coverage is the actual fraction of true labels lying in the prediction intervals.
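
As a rough illustration of the effective coverage defined above, the quantity can be computed by hand with NumPy. The function name effective_coverage and the toy numbers are assumptions made for this sketch; it is not MAPIE's own coverage_score implementation.

```python
import numpy as np

def effective_coverage(y_true, y_lower, y_upper):
    """Fraction of true values lying inside [y_lower, y_upper]."""
    y_true, y_lower, y_upper = map(np.asarray, (y_true, y_lower, y_upper))
    inside = (y_lower <= y_true) & (y_true <= y_upper)
    return float(inside.mean())

# Toy data: three of the four targets fall inside their interval.
y_true = [1.0, 2.0, 3.0, 4.0]
y_lower = [0.0, 1.5, 3.5, 3.0]
y_upper = [2.0, 2.5, 4.0, 5.0]
print(effective_coverage(y_true, y_lower, y_upper))  # 0.75
```

With the arrays from the quickstart, the same quantity for the i-th alpha would be effective_coverage(y, y_pis[:, 0, i], y_pis[:, 1, i]).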

https://github.com/simai-ml/MAPIE/raw/master/doc/images/quickstart_1.png

📘 Documentation

The full documentation can be found on this link.

How does MAPIE work for regression? It is based on cross-validation and relies on:

  • Residuals on the whole training set obtained by cross-validation,
  • Perturbed models generated during the cross-validation.

MAPIE then combines all these elements in a way that provides prediction intervals on new data with strong theoretical guarantees [1].
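
The two ingredients above can be sketched with the CV+ construction described in [1]. This is a simplified illustration under stated assumptions, not MAPIE's actual code: the base model is a straight line fitted with np.polyfit (MAPIE wraps any scikit-learn estimator), and the function name cv_plus_interval is hypothetical.

```python
import numpy as np

def cv_plus_interval(x, y, x_new, alpha=0.1, n_folds=5, seed=0):
    """CV+ prediction interval for a single new point x_new."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    lower_cands, upper_cands = np.empty(n), np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Perturbed model: fitted without the current fold.
        slope, intercept = np.polyfit(x[train_mask], y[train_mask], deg=1)
        # Out-of-fold absolute residuals for the held-out points.
        resid = np.abs(y[test_idx] - (slope * x[test_idx] + intercept))
        pred_new = slope * x_new + intercept
        lower_cands[test_idx] = pred_new - resid
        upper_cands[test_idx] = pred_new + resid
    # Order statistics used by the jackknife+/CV+ bounds in [1].
    k_lo = int(np.floor(alpha * (n + 1)))
    k_hi = int(np.ceil((1 - alpha) * (n + 1)))
    lower = -np.inf if k_lo < 1 else np.sort(lower_cands)[k_lo - 1]
    upper = np.inf if k_hi > n else np.sort(upper_cands)[k_hi - 1]
    return lower, upper

# Synthetic data: y = 2x + unit Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 200)
y = 2 * x + rng.normal(0, 1, 200)
print(cv_plus_interval(x, y, x_new=0.5, alpha=0.1))
```

Each training point contributes one candidate bound built from the model that did not see it, which is what makes the resulting interval robust to overfitting of the base model.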

https://github.com/simai-ml/MAPIE/raw/master/doc/images/mapie_internals_regression.png

📝 Contributing

You are welcome to propose and contribute new ideas. We encourage you to open an issue so that we can align on the work to be done. It is generally a good idea to have a quick discussion before opening a pull request that is potentially out-of-scope. For more information on the contribution process, please go here.

🤝 Affiliations

MAPIE has been developed through a collaboration between Quantmetry, Michelin, and ENS Paris-Saclay, with financial support from Région Île-de-France.

🔍 References

MAPIE methods belong to the field of conformal inference.

[1] Rina Foygel Barber, Emmanuel J. Candès, Aaditya Ramdas, and Ryan J. Tibshirani. "Predictive inference with the jackknife+." Ann. Statist., 49(1):486–507, February 2021.

[2] Mauricio Sadinle, Jing Lei, and Larry Wasserman. "Least Ambiguous Set-Valued Classifiers With Bounded Error Levels." Journal of the American Statistical Association, 114:525, 223-234, 2019.

[3] Yaniv Romano, Matteo Sesia and Emmanuel J. Candès. "Classification with Valid and Adaptive Coverage." NeurIPS 2020 (spotlight).

[4] Anastasios Nikolas Angelopoulos, Stephen Bates, Michael Jordan and Jitendra Malik. "Uncertainty Sets for Image Classifiers using Conformal Prediction." International Conference on Learning Representations 2021.

📝 License

MAPIE is free and open-source software licensed under the 3-clause BSD license.

Comments
  • MAPIE is not able to one-hot encode columns when the estimator is a scikit-learn pipeline with a preprocessor step

    Describe the bug Dear colleagues, I am creating a system to classify customers in 2 binary classes and then apply a regression model to one of the classes.

    Some of my features are strings that I need to encode, in this case with one-hot encoding.

    To Reproduce

    My code is as follows:

    from sklearnex import patch_sklearn
    patch_sklearn()
    
    from sklearn.pipeline import Pipeline
    from sklearn.compose import ColumnTransformer, TransformedTargetRegressor, make_column_selector as selector 
    from sklearn.preprocessing import StandardScaler, OneHotEncoder, LabelEncoder 
    from sklearn.impute import SimpleImputer
    #from sklearn.model_selection import cross_val_score, cross_validate
    from sklearn.multioutput import RegressorChain, MultiOutputRegressor
    from sklearn.feature_selection import VarianceThreshold
    from sklearn.metrics import mean_absolute_error, make_scorer, mean_tweedie_deviance, auc
    from sklearn.model_selection import RandomizedSearchCV, train_test_split, LeaveOneGroupOut, LeavePGroupsOut, cross_validate
    from sklearn.metrics import roc_auc_score, plot_roc_curve, roc_curve, confusion_matrix, classification_report, ConfusionMatrixDisplay
    from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingRegressor, HistGradientBoostingClassifier
    
    from sklearn import set_config
    from mapie.regression import MapieRegressor
    
    data_train, data_test, target_train, target_test = train_test_split(
        df.drop(columns=target_reg + target_class + METADATA_COLUMNS), 
        df[target_reg + target_class], 
        random_state=42)
    
    categorical_columns_of_interest = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7']
    numerical_columns = ml_data.drop(columns=target_reg + target_class + METADATA_COLUMNS).select_dtypes(include=np.number).columns
    numerical_columns = [x for x in MY_FEATURES if x not in FEATURES_NOT_TO_IMPUTE]
    numerical_columns = [x for x in numerical_columns if x not in categorical_columns_of_interest]
    
    categorical_transformer = OneHotEncoder(handle_unknown="ignore")
    numeric_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="mean")), 
            ("scaler", StandardScaler()),
            ("variance_selector", VarianceThreshold(threshold=0.03))
            ]
    )
    preprocessor = ColumnTransformer(
        transformers=[
            ("numeric_only", numeric_transformer, numerical_columns),
            ("get_dummies", categorical_transformer, categorical_columns_of_interest)])
    
    pipeline_hist_boost_reg= Pipeline([('preprocessor', preprocessor),
                                 ('estimator', HistGradientBoostingRegressor())])
    
    regressor = TransformedTargetRegressor(pipeline_hist_boost_reg, func=np.log1p, inverse_func=np.expm1)
    
    mapie_estimator = MapieRegressor(pipeline_hist_boost_reg)
    mapie_estimator.fit(data_train, target_train)
    

    Expected behavior After this, I would expect to be able to run:

    y_pred, y_pis = mapie_estimator.predict(data_test)

    Screenshots

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-43-61f70ee71787> in <module>
          1 mapie_estimator = MapieRegressor(pipeline_hist_boost_reg)
    ----> 2 mapie_estimator.fit(X_train_reg, y_train_reg)
    
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/mapie/regression.py in fit(self, X, y, sample_weight)
        457         cv = self._check_cv(self.cv)
        458         estimator = self._check_estimator(self.estimator)
    --> 459         X, y = check_X_y(
        460             X, y, force_all_finite=False, dtype=["float64", "int", "object"]
        461         )
    
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
        962         raise ValueError("y cannot be None")
        963 
    --> 964     X = check_array(
        965         X,
        966         accept_sparse=accept_sparse,
    
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
        683     if has_pd_integer_array:
        684         # If there are any pandas integer extension arrays,
    --> 685         array = array.astype(dtype)
        686 
        687     if force_all_finite not in (True, False, "allow-nan"):
    
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
       5804         else:
       5805             # else, only a single dtype is given
    -> 5806             new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
       5807             return self._constructor(new_data).__finalize__(self, method="astype")
       5808 
    
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
        412 
        413     def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
    --> 414         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
        415 
        416     def convert(
    
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
        325                     applied = b.apply(f, **kwargs)
        326                 else:
    --> 327                     applied = getattr(b, f)(**kwargs)
        328             except (TypeError, NotImplementedError):
        329                 if not ignore_failures:
    
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
        590         values = self.values
        591 
    --> 592         new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
        593 
        594         new_values = maybe_coerce_values(new_values)
    
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in astype_array_safe(values, dtype, copy, errors)
       1298 
       1299     try:
    -> 1300         new_values = astype_array(values, dtype, copy=copy)
       1301     except (ValueError, TypeError):
       1302         # e.g. astype_nansafe can fail on object-dtype of strings
    
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in astype_array(values, dtype, copy)
       1246 
       1247     else:
    -> 1248         values = astype_nansafe(values, dtype, copy=copy)
       1249 
       1250     # in pandas we don't store numpy str dtypes, so convert to object
    
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
       1083         flags = arr.flags
       1084         flat = arr.ravel("K")
    -> 1085         result = astype_nansafe(flat, dtype, copy=copy, skipna=skipna)
       1086         order: Literal["C", "F"] = "F" if flags.f_contiguous else "C"
       1087         # error: Item "ExtensionArray" of "Union[ExtensionArray, ndarray]" has no
    
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
       1190     if copy or is_object_dtype(arr.dtype) or is_object_dtype(dtype):
       1191         # Explicit copy, or required since NumPy can't view from / to object.
    -> 1192         return arr.astype(dtype, copy=True)
       1193 
       1194     return arr.astype(dtype, copy=copy)
    
    ValueError: could not convert string to float: 'group C'
    

    This value belongs to one of the categorical columns that the preprocessor is supposed to encode.

    When training the model without MAPIE, everything works correctly.

    Desktop (please complete the following information):

    import platform
    print(platform.machine())
    print(platform.version())
    print(platform.platform())
    print(platform.system())
    print(platform.processor())
    
    x86_64
    #58~18.04.1-Ubuntu SMP Wed Jul 28 23:14:18 UTC 2021
    Linux-5.4.0-1056-azure-x86_64-with-glibc2.10
    Linux
    x86_64
    

    Scikit learn dependencies:

    scikit-learn==1.0.2
    scikit-learn-intelex==2021.5.1
    imbalance-learn==0.9
    mapie==0.3.1
    
    bug 
    opened by edgBR 19
  • [ENHANCEMENT] time series conformal prediction

    As described in this paper : https://arxiv.org/abs/1802.06300

    Or the EnbPI method proposed by Xu & Xie (2021) : http://proceedings.mlr.press/v139/xu21h.html https://arxiv.org/pdf/2010.09107.pdf

    enhancement 
    opened by gmartinonQM 8
  • sklearn estimator_checks in tests

    Description

    Inclusion of scikit-learn estimator checks (see this page). This automates the verification of sklearn compatibility via pytest.

    Type of change

    • [X] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    How Has This Been Tested?

    Test Configuration:

    • OS version: Osx Sierra 10.12
    • Python version: 3.9.2
    • MAPIE version: 0.1.0

    Checklist:

    • [X] My code follows the style guidelines of this project
    • [ ] I have performed a self-review of my own code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] My changes generate no new warnings
    • [X] I have added tests that prove my fix is effective or that my feature works
    • [ ] New and existing unit tests pass locally with my changes
    • [ ] Any dependent changes have been merged and published in downstream modules

    N.B.

    Unit tests are failing on purpose: the sklearn checks are not passing for the moment.

    enhancement 
    opened by remiadon 6
  • Mapie can not use Pipelines to its full extent, throws exception

    Describe the bug I want to use mapie on a model, which I obtained from gscv.best_estimator_ . The model uses a pipeline, which looks like this:

    Pipeline(steps=[('preprocessor',
                     ColumnTransformer(n_jobs=-1,
                                       transformers=[('enc_plz',
                                                      BinaryEncoder(drop_invariant=True),
                                                      ['Postcode']),
                                                     ('enc_obj',
                                                      OneHotEncoder(drop_invariant=True),
                                                      ['PropertyType']),
                                                     ('features', 'passthrough',
                                                      Index(['YearSurvey', 'GeoY', 'SecondBathroom', 'Income'], dtype='object')),
                                                     ('log',
                                                      FunctionTransformer(func=<ufunc 'log1p'>,
                                                                          validate...
                                                      Index(['Pensioner', 'Balcony', 'YearModernization', 'YearBuilt'], dtype='object'))])),
                    ('scaler', RobustScaler()),
                    ('clf',
                     ClfSwitcher(estimator=LGBMRegressor(bagging_fraction=0.4,
                                                         bagging_freq=4,
                                                         bagging_seed=15871193,
                                                         feature_fraction=0.11,
                                                         feature_fraction_seed=15871193,
                                                         learning_rate=0.007,
                                                         max_bin=63,
                                                         min_data_in_leaf=10,
                                                         n_estimators=4000,
                                                         num_leaves=6,
                                                         objective='regression',
                                                         random_state=15871193)))])
    

    When I use the lines

    
    mapie = MapieRegressor(best_estimator_, method="plus", cv=4)
    mapie.fit(X_train, y_train)
    

    Mapie throws the exception:

    ValueError: could not convert string to float: 'EFH'

    in

    ~\miniconda3\envs\Master_ML\lib\site-packages\mapie\regression.py in fit(self, X, y, sample_weight)
        457         cv = self._check_cv(self.cv)
        458         estimator = self._check_estimator(self.estimator)
    --> 459         X, y = check_X_y(
        460             X, y, force_all_finite=False, dtype=["float64", "int", "object"]
        461         )

    X_train and y_train are still in raw format (strings, not scaled, ...); the pipeline was designed to address this.

    My guess is that when mapie.fit is called on X_train, the categorical value "EFH" produces the error because it is not of type float64, int or object. However, the pipeline would address this by using an encoder.

    Expected behavior I would expect the pipeline to preprocess my data before an exception is thrown because of a wrong data type.
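
The suspected failure mode can be reproduced without MAPIE. The sketch below assumes, as the traceback suggests, that input validation tries to cast the raw features to a numeric dtype before the pipeline's encoders ever run. The second category value "MFH" and the numeric column are made up for the illustration.

```python
import numpy as np

# Raw features as they reach fit(): one numeric column and one string column.
X_raw = np.array([[1200.0, "EFH"], [850.0, "MFH"]], dtype=object)

try:
    X_raw.astype("float64")  # what an early numeric cast amounts to
    cast_error = None
except ValueError as exc:
    cast_error = str(exc)

print(cast_error)

# The numeric column alone casts without trouble; only the un-encoded
# string column makes the conversion fail.
numeric_only = X_raw[:, :1].astype("float64")
print(numeric_only.ravel())
```

This is why the error appears even though the pipeline contains an encoder: the cast happens on the raw input, upstream of the preprocessing step.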

    bug 
    opened by nilslacroix 5
  • [ENHANCEMENT] Provide new residual scores for regression

    Problem description Currently, the regression confidence intervals are based on the absolute value of the residuals self.residuals_ = np.abs(np.ravel(y) - y_pred). From my experience, in some cases, this may not be appropriate, for two main reasons:

    • We expect the confidence interval not to be symmetrical around the predicted value.
    • We expect the confidence interval to be conditioned (scaled for instance) on the predicted value.

    I ended up with these conclusions by applying MAPIE to a house price dataset from kaggle. My initial hunch was that the confidence intervals should scale with the predicted value. For instance, I would prefer to get the two following house prices 100 k€ +/- 10 k€ and 1000 k€ +/- 100 k€ instead of 100 k€ +/- 10 k€ and 1000 k€ +/- 10 k€.

    Describe the solution you'd like I suggest adding an argument to the fit method of the MapieRegressor class. The argument would be an instance of a class (that I chose to name ResidualScore). The confidence intervals would then be conditioned on the residual scores (I prefer to call them "scores" instead of "residuals" since they may not simply be np.ravel(y) - y_pred, which is the definition of residuals if I am correct). Different classes may inherit from ResidualScore. For instance, I wrote the class AbsoluteResidualScore, which gives the same results as the current behaviour, and the class GammaResidualScore, which computes the residual scores as (y - y_pred) / y_pred.
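
A minimal sketch of what such score classes could look like. The class names come from the proposal above, but the method names and bodies are hypothetical. For simplicity the gamma score here takes the absolute value of the relative residual, so that a single quantile q yields a two-sided interval, and it assumes strictly positive predictions.

```python
import numpy as np

class AbsoluteResidualScore:
    """Score |y - y_pred|: interval width is the same for every prediction."""

    def scores(self, y, y_pred):
        return np.abs(y - y_pred)

    def interval(self, y_pred, q):
        return y_pred - q, y_pred + q

class GammaResidualScore:
    """Score |y - y_pred| / y_pred: interval width scales with the prediction."""

    def scores(self, y, y_pred):
        return np.abs(y - y_pred) / y_pred

    def interval(self, y_pred, q):
        return y_pred * (1 - q), y_pred * (1 + q)

y_pred = np.array([100.0, 1000.0])  # house prices in k€
abs_low, abs_high = AbsoluteResidualScore().interval(y_pred, q=10.0)
gam_low, gam_high = GammaResidualScore().interval(y_pred, q=0.1)
print(abs_low, abs_high)
print(gam_low, gam_high)
```

With these toy numbers the absolute score gives ±10 k€ around both predictions, while the gamma score gives ±10 k€ around 100 k€ and ±100 k€ around 1000 k€, which is the behaviour asked for in the problem description.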

    Illustrations I have tested the proposed approach on the house price dataset. The two figures depict the predicted values with their confidence intervals based on the absolute value of the residuals (on the left) and on the proposed gamma residual score (on the right).

    [Figure: comparison of AbsoluteResidualScore and GammaResidualScore]

    NB I know that the quantile regression may be a solution to meet my expectations regarding this particular problem. I do not know if my proposal could be another tool to address such problems or if quantile regression should/must be prioritized.

    enhancement 
    opened by kapytaine 5
  • No module named 'mapie.quantile_regression'

    Describe the bug Hi! I wanted to use conformalized_quantile_regression for which I need the mapie.quantile_regression import. However, I tried a couple of things but cannot get it working. Is there any other way to use conformalized_quantile_regression with your library? Thanks in advance :)

    To Reproduce

    from mapie.quantile_regression import MapieQuantileRegressor

    Error: No module named 'mapie.quantile_regression'

    Expected behavior I expected that I could use Conformal Quantile Regression but have not managed to import mapie.quantile_regression.

    Desktop (please complete the following information):

    • OS:Windows
    • Browser: Chrome
    • MAPIE Version : 0.3.2
    bug 
    opened by masakljun 4
  • Feat/allow custom score function

    Description

    As explained in the following issue, this is a proposal to implement new residual scores, in addition to the current absolute residual scores.

    Fixes #141

    Type of change

    • New feature (non-breaking change which adds functionality)
    • This change requires a documentation update

    How Has This Been Tested?

    • [x] The current behaviour is identical to this proposal with the AbsoluteResidualScore class. The outputs are the same. The doctest has been used for this purpose.

    Checklist

    • [x] I have read the contributing guidelines
    • [ ] I have updated the HISTORY.rst and AUTHORS.rst files. I updated AUTHORS.rst but not HISTORY.rst, since I do not know whether my proposal would be included with other updates.
    • [x] Linting passes successfully : make lint
    • [x] Typing passes successfully : make type-check
    • [x] Unit tests pass successfully : make tests
    • [x] Coverage is 100% : make coverage
    • [x] Documentation builds successfully : make doc
    enhancement 
    opened by kapytaine 4
  • [BUG]: Binary classification

    Describe the bug Assertion error for binary classification problems with y in [0, 1].

    Additional context https://github.com/scikit-learn-contrib/MAPIE/blob/master/mapie/classification.py#L513

    assert type_of_target(y) in ["binary", "multiclass"]

    bug 
    opened by prashanthharshangi 4
  • First instance of mapie classifier

    Description

    First instance of mapie classifier with the fit and predict methods, the corresponding checks and the test file.

    Type of change

    Please check options that are relevant.

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [x] This change requires a documentation update

    How Has This Been Tested?

    The tests are carried out on the classification.py module with test_classification.py (see file for more details)

    Checklist:

    • [x] I have read the contributing guidelines
    • [x] I have updated the HISTORY.rst and AUTHORS.rst files
    • [x] Linting passes successfully : flake8 . --exclude=doc
    • [x] Typing passes successfully : mypy mapie examples --strict
    • [x] Unit tests pass successfully : pytest -vs --doctest-modules mapie
    • [x] Coverage is 100% : pytest -vs --doctest-modules --cov-branch --cov=mapie --pyargs mapie
    • [x] Documentation builds successfully : cd doc; make clean; make html
    enhancement 
    opened by aagoumbala 4
  • Jackknife+-after-Bootstrap for ensemble models

    In the original article on the Jackknife+-after-Bootstrap, one of the selling points is that ensemble models such as Random Forest already consist of a set of bootstrapped models that can be recycled to give the prediction interval with the Jackknife+. Therefore, the prediction intervals are "free" in the same sense as the out-of-bag score is free in relation to cross-validation and bootstrapping for the prediction error.

    Is there some way to leverage this approach in MAPIE? Currently, it seems like MapieRegressor with SubSample would always retrain the given estimator, and in the case of Random Forest, give an ensemble of ensembles.
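
To make the idea concrete, here is a rough sketch of the Jackknife+-after-Bootstrap recipe in which the bootstrap members are fitted once and then recycled as leave-one-out models. It is an illustration under assumptions, not MAPIE code: the members are straight lines fitted with np.polyfit rather than trees, aggregation is a plain mean, and the function name is hypothetical.

```python
import numpy as np

def jackknife_plus_after_bootstrap(x, y, x_new, alpha=0.1, n_boot=50, seed=0):
    """Interval for one new point, reusing bootstrap members as leave-one-out models."""
    rng = np.random.default_rng(seed)
    n = len(y)
    in_bag = np.zeros((n_boot, n), dtype=bool)
    coefs = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap sample, drawn with replacement
        in_bag[b, idx] = True
        coefs[b] = np.polyfit(x[idx], y[idx], deg=1)  # (slope, intercept)

    lower_cands, upper_cands = np.empty(n), np.empty(n)
    for i in range(n):
        oob = ~in_bag[:, i]  # members whose sample does not contain point i
        if not oob.any():
            raise ValueError(f"point {i} is in every bootstrap sample; increase n_boot")
        # Aggregate the out-of-bag members' predictions at x[i] and at x_new.
        mu_i = np.mean(coefs[oob, 0] * x[i] + coefs[oob, 1])
        mu_new = np.mean(coefs[oob, 0] * x_new + coefs[oob, 1])
        resid = abs(y[i] - mu_i)
        lower_cands[i], upper_cands[i] = mu_new - resid, mu_new + resid

    k_lo = int(np.floor(alpha * (n + 1)))
    k_hi = int(np.ceil((1 - alpha) * (n + 1)))
    lower = -np.inf if k_lo < 1 else np.sort(lower_cands)[k_lo - 1]
    upper = np.inf if k_hi > n else np.sort(upper_cands)[k_hi - 1]
    return lower, upper

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 100)
y = 2 * x + rng.normal(0, 1, 100)
print(jackknife_plus_after_bootstrap(x, y, x_new=0.5))
```

No member is refitted after the initial loop, which is the sense in which the intervals come "for free" once the ensemble exists.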

    enhancement 
    opened by kjelljorner 3
  • Update theoretical_description_classification.rst

    Description

    This PR fixes an error in the documentation. Assuming the scores are ranked from highest to lowest until the conformity score of the true label is found, then the true label Yᵢ should be equal to πₖ (the last label to be included), not πⱼ. j is a running index, so Yᵢ cannot be equal to πⱼ for all j from 1 to k.
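
The indexing argument can be checked on a small example. The sketch below assumes the cumulated-score construction of [3]: labels are sorted by decreasing predicted probability and probabilities are summed up to and including the true label. The function name cumulated_score is illustrative.

```python
import numpy as np

def cumulated_score(probas, true_label):
    """Return (score, k): the summed probabilities of pi_1..pi_k, where pi_k is the true label."""
    order = np.argsort(-probas)  # pi_1, pi_2, ...: labels by decreasing probability
    k = int(np.where(order == true_label)[0][0]) + 1  # 1-based rank of the true label
    return float(np.sum(probas[order[:k]])), k

probas = np.array([0.1, 0.6, 0.3])
score, k = cumulated_score(probas, true_label=2)
print(score, k)
```

Here the ranking is (1, 2, 0), so the true label 2 is reached at k = 2 and the score sums the probabilities of pi_1 and pi_2 only. The true label coincides with pi_k, the last label included, while j merely runs from 1 to k.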

    Type of change

    • This change requires a documentation update

    Checklist

    opened by AndreaPi 3
  • Return ensembled predictions even when alpha is None

    Hi all, and thank you so much for developing MAPIE!

    I recently hit a limitation of MapieRegressor and MapieTimeSeriesRegressor : I would like to get the ensembled predictions (using the ensemble=True argument in .predict()), but without computing any confidence interval. So I used alpha=None, but in this case, the ensemble argument is ignored, since the code returns immediately the single_estimator_ predictions.

    However, it might be that we still want to extract the ensembled prediction (which is easily available in MAPIE class internals) even when alpha=None, given that the computation of this prediction is rather involved.

    So it appears that there is an unnecessary coupling between alpha=None and the ability to retrieve ensembled predictions.

    Would it be possible to decouple these two quite different things ?

    enhancement 
    opened by gmartinonQM 0
  • Typo in J+ab documentation

    Hi all, and thanks for developing MAPIE!

    I just found a typo in section 7 of the theoretical description of the J+ab documentation.

    There is a missing parenthesis closing the agg statement, that leaves an ambiguity about what is aggregated.

    documentation 
    opened by gmartinonQM 0
  • Consistency of attribute names across regression classes and the `single_estimator_` attribute in CQR

    Hi all ! And many thanks for your time on MAPIE development, this library rocks !

    I found a syntax discrepancy between CQR and other regression classes, that makes using the library a little bit awkward when performing benchmarks.

    For example, the MapieRegressor and TimeSeriesRegressor classes both have an attribute called single_estimator_, that is clear from the docstring and represents the model we would use without MAPIE.

    However, the MapieQuantileRegressor class has no such attribute (which is surprising for a class inheriting from MapieRegressor, which does have it). Instead, the model that would naturally be used without MAPIE, the 0.5-quantile estimator, is hidden in estimators_[2].

    As a result, code comparing MapieRegressor with MapieQuantileRegressor cannot use a homogeneous syntax.

    Would it be possible to add a single_estimator_ attribute to MapieQuantileRegressor, in the same spirit as MapieRegressor?

    enhancement 
    opened by gmartinonQM 0
  • Clarification of the "symmetry" argument in CQR and more general documentation about CQR

    Hi all, and thanks again for all your developments of MAPIE !

    I recently struggled with CQR, wondering what was the impact of the symmetry argument in MapieQuantileRegressor.predict function. The docstring is quite elusive :

    "Deciding factor to whether to find the quantile value for each residuals separatly or to use the maximum of the two combined."

    (BTW there is a typo on "separatly" -> "separately").

    And I cannot find more information, be it in the theoretical description, the tutorial on CQR or the 1D-heteroscedastic example.

    Would it be possible to better describe the impact of the argument and to illustrate it in the tutorial for example ?

    Moreover, I am not sure that I understand the notations in the theoretical description. For example, there are three different notations E_i, E_{low} and E_{high} but none is defined. As for the vocabulary, I find the word "residual" confusing in the context of CQR, because it suggests that we compute the difference between the target and the main model prediction (the median estimator) whereas we compare with the other two quantiles.
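
For readers with the same question, here is one possible reading of the docstring quoted above, written as a NumPy sketch. It is an assumption, not a description of MAPIE's code: E_low and E_high are taken to be the signed distances of the target below the lower quantile and above the upper quantile, the "symmetric" branch uses one quantile of their maximum, and the other branch uses a separate quantile for each side at level 1 - alpha/2. All function names are hypothetical.

```python
import numpy as np

def conformal_quantile(scores, level):
    """Finite-sample quantile: the ceil(level * (n + 1))-th smallest score."""
    n = len(scores)
    k = int(np.ceil(level * (n + 1)))
    return np.inf if k > n else float(np.sort(scores)[k - 1])

def cqr_corrections(y_cal, q_low_cal, q_high_cal, alpha, symmetric=True):
    """Amounts by which the lower and upper quantile predictions are widened."""
    e_low = q_low_cal - y_cal    # positive when y falls below the lower quantile
    e_high = y_cal - q_high_cal  # positive when y falls above the upper quantile
    if symmetric:
        q = conformal_quantile(np.maximum(e_low, e_high), 1 - alpha)
        return q, q
    return (conformal_quantile(e_low, 1 - alpha / 2),
            conformal_quantile(e_high, 1 - alpha / 2))

# Toy calibration set with constant quantile predictions of -1 and +1.
rng = np.random.default_rng(0)
y_cal = rng.normal(size=99)
q_low_cal, q_high_cal = np.full(99, -1.0), np.full(99, 1.0)
print(cqr_corrections(y_cal, q_low_cal, q_high_cal, alpha=0.2, symmetric=True))
print(cqr_corrections(y_cal, q_low_cal, q_high_cal, alpha=0.2, symmetric=False))
```

Under this reading the final interval is [q_low(x) - c_low, q_high(x) + c_high], and the two settings differ only in whether c_low and c_high are forced to be equal.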

    Would it be possible to clarify these points ?

    Thanks in advance !

    documentation 
    opened by gmartinonQM 1
  • Quantile crossing warning

    Describe the bug

    When trying to use the MapieQuantileRegressor class I get the following UserWarnings:

    UserWarning: WARNING: The initial prediction values from the quantile method present issues as the upper quantile values might be higher than the lower quantile values.

    UserWarning: WARNING: Following the additional value added to have conformal predictions, the upper and lower bound present issues as one might be higher or lower than the other.

    The upper quantile better be greater than (or equal to) the lower quantile, so the former warning definitely does not make any sense and the latter one is rather vague. What is meant by the additional value?

    To Reproduce Steps to reproduce the behavior:

    • Use MapieQuantileRegressor on any data set (see screenshot)

    Expected behavior

    I know that quantile regressors can suffer from the quantile crossing problem (without any modifications). However, with the current warning it is hard to see whether this is the problem that occurs or something else is going on.
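
A minimal sketch of how one might check for quantile crossing directly, independently of the warning text. It is plain Python and does not use MAPIE; the helper names `find_quantile_crossings` and `uncross` are hypothetical, and the pointwise swap is just one simple remedy, not something MAPIE is known to apply.

```python
def find_quantile_crossings(lower, upper):
    """Return the indices where the lower quantile exceeds the upper quantile."""
    return [i for i, (lo, hi) in enumerate(zip(lower, upper)) if lo > hi]


def uncross(lower, upper):
    """Swap crossed bounds pointwise so that lower <= upper everywhere."""
    fixed_lower = [min(lo, hi) for lo, hi in zip(lower, upper)]
    fixed_upper = [max(lo, hi) for lo, hi in zip(lower, upper)]
    return fixed_lower, fixed_upper


# Toy predictions from two separately fitted quantile models.
lower_pred = [1.0, 2.5, 3.0]
upper_pred = [2.0, 2.0, 4.0]

crossed = find_quantile_crossings(lower_pred, upper_pred)
print(f"{len(crossed)} crossing(s) at indices {crossed}")
print(uncross(lower_pred, upper_pred))
```

Running the check once on the raw quantile predictions and once on the conformalized bounds would show which of the two warnings corresponds to an actual crossing.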

    Desktop

    • MAPIE Version: 0.5.0

    Screenshot image

    bug 
    opened by nmdwolf 2
  • question - Does MAPIE support multiple time series in tabular format?

    I wonder whether MapieTimeSeriesRegressor supports building prediction intervals when the input data is in tabular form and contains multiple time series. Specifically, how does it know to pull the right residuals in BlockBootstrap to build the distribution when the scales of the time series in the input data differ significantly?
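
The concern about scale can be illustrated with a small plain-Python sketch that compares a pooled residual quantile with per-series quantiles. This does not describe what MapieTimeSeriesRegressor does; the function names and the series identifiers are hypothetical and only serve to show why mixing residuals from series with very different scales is problematic.

```python
import math
from collections import defaultdict


def conformal_quantile(scores, level):
    """Return the ceil((n + 1) * level)-th smallest score (finite-sample correction)."""
    n = len(scores)
    k = math.ceil((n + 1) * level)
    return math.inf if k > n else sorted(scores)[k - 1]


def per_series_quantiles(series_ids, residuals, alpha):
    """Compute one absolute-residual quantile per series instead of a pooled one."""
    grouped = defaultdict(list)
    for sid, res in zip(series_ids, residuals):
        grouped[sid].append(abs(res))
    return {sid: conformal_quantile(s, 1 - alpha) for sid, s in grouped.items()}


# Two toy series whose residuals differ by two orders of magnitude.
ids = ["a"] * 4 + ["b"] * 4
res = [1, -2, 3, -4, 100, -200, 300, -400]

pooled = conformal_quantile([abs(r) for r in res], 1 - 0.5)
print("pooled:", pooled)
print("per series:", per_series_quantiles(ids, res, alpha=0.5))
```

In this toy case the pooled quantile is far too wide for the small-scale series and too narrow for the large-scale one, which is the behavior the question is worried about.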

    enhancement 
    opened by Arsa-Nik 0
Releases(v0.5.0)
  • v0.5.0(Oct 20, 2022)

  • v0.4.2(Sep 2, 2022)

  • v0.4.1(Jun 27, 2022)

  • v0.4.0(Jun 27, 2022)

    • Relax and fix typing
    • Add Split Conformal Quantile Regression
    • Add EnbPI method for Time Series Regression
    • Add EnbPI Documentation
    • Add example with heteroscedastic data
    • Add ConformityScore class that allows the user to define custom conformity scores
  • v0.3.2(Mar 11, 2022)

    • Refactor unit tests
    • Add "naive" and "top-k" methods in MapieClassifier
    • Include J+aB method in regression tutorial
    • Add MNIST example for classification
    • Add cross-conformal for classification
    • Add notebooks folder containing notebooks used for generating documentation tutorials
    • Standardize the use of matrix k_ and add an argument "ensemble" to method "predict" in regression.py
    • Add replication of Chen Xu's tutorial testing Jackknife+aB vs Jackknife+
    • Add Jackknife+-after-Bootstrap documentation
    • Improve scikit-learn pipelines compatibility
  • v0.3.1(Nov 19, 2021)

  • v0.3.0(Sep 10, 2021)

    • Renaming estimators.py module to regression.py
    • New classification.py module with MapieClassifier class, that estimates prediction sets from softmax score
    • New set of unit tests for classification.py module
    • Modification of the documentation architecture
    • Split example gallery into separate regression and classification galleries
    • Add first classification examples
    • Add method classification_coverage_score in the module metrics.py
    • Fixed code error for plotting of interval widths in tutorial of documentation
    • Added missing import statements in tutorial of documentation
    • Refactor tests of n_jobs and verbose in utils.py
  • v0.2.3(Jul 9, 2021)

    • Inclusion in conda-forge with updated release checklist
    • Add time series example
    • Add epistemic uncertainty example
    • Remove CircleCI redundancy with ReadTheDocs
    • Remove Pep8speaks
    • Include linting in CI/CD
    • Use PyPA GitHub Actions for releases
  • v0.2.2(Jun 10, 2021)

    • Set alpha parameter as predict argument, with None as default value
    • Switch to github actions for continuous integration of the code
    • Add image explaining MAPIE internals on the README
  • v0.2.1(Jun 4, 2021)

  • v0.2.0(May 21, 2021)

    • Add n_jobs argument using joblib parallel processing
    • Allow cv to take the value -1 equivalently to LeaveOneOut()
    • Introduce the cv parameter to get closer to scikit-learn API
    • Remove the n_splits, shuffle and random_state parameters
    • Simplify the method parameter
    • Fix typos in documentation and add methods descriptions in sphinx
    • Accept alpha parameter as a list or np.ndarray
    • If alpha is an Iterable, .predict() returns a np.ndarray of shape (n_samples, 3, len(alpha))
  • v0.1.4(May 7, 2021)

    • Move all alpha related operations to predict
    • Assume default LinearRegression if estimator is None
    • Improve documentation
    • return_pred string argument is now a bool ensemble
  • 0.1.3(Apr 30, 2021)

    First official and clean release on pypi:

    • Update PyPi homepage
    • Set up publication workflows as a github action
    • Update issue and pull request templates
    • Increase sklearn compatibility (coverage_score and unit tests)
    • Create mapie.estimators and mapie.metrics
Owner
scikit-learn compatible projects