MBTR is a python package for multivariate boosted tree regressors trained in parameter space.

Multivariate Boosted TRee

What is MBTR

MBTR is a Python package for multivariate boosted tree regressors trained in parameter space. The package can handle arbitrary multivariate losses, as long as their gradient and Hessian are known. Gradient boosted trees are competition-winning, general-purpose, non-parametric regressors, which exploit sequential model fitting and gradient descent to minimize a specific loss function. The most popular implementations are tailored to univariate regression and classification tasks, which prevents them from capturing cross-correlations between multivariate targets and from applying conditional penalties to the predictions. This package allows the predictions to be regularized arbitrarily, so that properties like smoothness, consistency and functional relations can be enforced.
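To make the role of the gradient and Hessian concrete, here is a minimal pure-Python sketch of second-order (Newton) boosting on a two-dimensional target. It is an illustration under simplifying assumptions, not MBTR's implementation: the base learner is a single-split stump on one feature, the loss is the squared error (so the Hessian is the identity), and all function names are hypothetical.

```python
def grad_hess_mse(y, pred):
    """Gradient and diagonal Hessian of 0.5 * ||pred - y||^2 for one sample."""
    grad = [p - t for p, t in zip(pred, y)]
    hess = [1.0] * len(y)
    return grad, hess


def leaf_sums(idx, grads, hesss):
    """Per-target sums of gradients and Hessians over the samples in a leaf."""
    n_targets = len(grads[0])
    g = [sum(grads[i][j] for i in idx) for j in range(n_targets)]
    h = [sum(hesss[i][j] for i in idx) for j in range(n_targets)]
    return g, h


def leaf_value(idx, grads, hesss, lam):
    """Newton step of a leaf: -sum(g) / (sum(h) + lam), one entry per target."""
    g, h = leaf_sums(idx, grads, hesss)
    return [-gj / (hj + lam) for gj, hj in zip(g, h)]


def leaf_score(idx, grads, hesss, lam):
    """Quantity proportional to the loss reduction of the leaf's Newton step."""
    g, h = leaf_sums(idx, grads, hesss)
    return sum(gj * gj / (hj + lam) for gj, hj in zip(g, h))


def fit_stump(x, grads, hesss, lam):
    """Pick the threshold on x with the best total score (needs >= 2 distinct x)."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        thr = 0.5 * (lo + hi)
        left = [i for i, v in enumerate(x) if v <= thr]
        right = [i for i, v in enumerate(x) if v > thr]
        score = leaf_score(left, grads, hesss, lam) + leaf_score(right, grads, hesss, lam)
        if best is None or score > best[0]:
            best = (score, thr, left, right)
    _, thr, left, right = best
    return thr, leaf_value(left, grads, hesss, lam), leaf_value(right, grads, hesss, lam)


def boost(x, y, n_boosts=20, learning_rate=0.5, lam=0.0):
    """Sequentially fit stumps to the current gradients and Hessians."""
    pred = [[0.0] * len(y[0]) for _ in x]
    stumps = []
    for _ in range(n_boosts):
        gh = [grad_hess_mse(t, p) for t, p in zip(y, pred)]
        grads = [g for g, _ in gh]
        hesss = [h for _, h in gh]
        thr, left_val, right_val = fit_stump(x, grads, hesss, lam)
        stumps.append((thr, left_val, right_val))
        for i, v in enumerate(x):
            step = left_val if v <= thr else right_val
            pred[i] = [p + learning_rate * s for p, s in zip(pred[i], step)]
    return stumps, pred


def mse(y, pred):
    """Mean over samples of the squared error summed over targets."""
    return sum((p - t) ** 2 for yt, yp in zip(y, pred) for t, p in zip(yt, yp)) / len(y)


# Toy data: one feature, two correlated targets.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [[v, 2.0 * v] for v in x]
stumps, pred = boost(x, y, n_boosts=30, learning_rate=0.5)
print("initial MSE:", mse(y, [[0.0, 0.0] for _ in x]))
print("final MSE:  ", mse(y, pred))
```

Swapping `grad_hess_mse` for another function that returns a gradient and a Hessian is what changes the loss in this sketch. A non-diagonal Hessian, which couples the target dimensions, would require solving a small linear system in each leaf instead of the element-wise division used here.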

Installation

pip install --upgrade git+https://github.com/supsi-dacd-isaac/mbtr.git

Usage

The MBT regressor follows the scikit-learn syntax for regressors. Creating a default instance and training it is as simple as:

from mbtr.mbtr import MBT
m = MBT().fit(x, y)

while predictions for the test set are obtained through

y_hat = m.predict(x_te)

The most important parameters are n_boosts (the number of boosts, that is, the number of fitted trees), learning_rate and loss_type. An extensive explanation of the different parameters can be found in the documentation.
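As a rough intuition for how the number of boosts and the learning rate interact, consider an idealized setting: squared-error loss and a base learner that fits the current residual exactly. Each boost then removes a fraction `learning_rate` of the residual, so a fraction `(1 - learning_rate) ** n` remains after `n` boosts. Real trees fit the residual only approximately, so the sketch below (with hypothetical helper names) gives an optimistic estimate and does not describe MBTR's actual convergence.

```python
import math


def residual_fraction(n_boosts, learning_rate):
    """Fraction of the initial residual left after n_boosts idealized boosts."""
    return (1.0 - learning_rate) ** n_boosts


def boosts_needed(target_fraction, learning_rate):
    """Smallest number of idealized boosts leaving at most target_fraction."""
    return math.ceil(math.log(target_fraction) / math.log(1.0 - learning_rate))


# Halving the learning rate roughly doubles the boosts needed for the same fit.
for lr in (0.5, 0.1, 0.05):
    print(f"learning_rate={lr}: {boosts_needed(0.01, lr)} boosts to reach 1% residual")
```

In practice a smaller learning rate is usually paired with a larger number of boosts, at the cost of longer training time.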

Documentation

Documentation and examples on the usage can be found at docs.

Reference

If you make use of this software for your work, we would appreciate it if you would cite us:

Lorenzo Nespoli and Vasco Medici (2020). Multivariate Boosted Trees and Applications to Forecasting and Control. arXiv preprint arXiv:2003.03835.

@article{nespoli2020multivariate,
  title={Multivariate Boosted Trees and Applications to Forecasting and Control},
  author={Nespoli, Lorenzo and Medici, Vasco},
  journal={arXiv preprint arXiv:2003.03835},
  year={2020}
}

Acknowledgments

The authors would like to thank the Swiss Federal Office of Energy (SFOE) and the Swiss Competence Center for Energy Research - Future Swiss Electrical Infrastructure (SCCER-FURIES), for their financial and technical support to this research work.


Comments
  • Is it possible to define custom loss function ?

    Dear all, first, thank you for developing this tool, which I believe is of great interest. I am working with:

    • environmental variables (e.g. temperature, salinity)
    • multi-dimensional targets, namely relative abundances that sum to 1 at each site

    Therefore, I was wondering if it is possible to implement a custom loss function in the mbtr framework, that would be adapted for proportions. Please note that I am quite new to python.

    To do some testing, I tried to duplicate the mse loss function under another name in the losses.py file and added the new loss to the LOSS_MAP in __init__.py. Then I compiled the files. However, I get this error when trying to run the model from the multi_reg.py example:

    >>> m = MBT(loss_type = 'mse', n_boosts=30,  min_leaf=100, lambda_weights=1e-3).fit(x_tr, y_tr, do_plot=True)
      3%|▎         | 1/30 [00:03<01:45,  3.63s/it]
    >>> m = MBT(loss_type = 'custom_mse', n_boosts=30,  min_leaf=100, lambda_weights=1e-3).fit(x_tr, y_tr, do_plot=True)
      0%|          | 0/30 [00:00<?, ?it/s]KeyError: 'custom_mse'
    

    It seems that the new loss is not recognised in LOSS_MAP:

    >>> LOSS_MAP = {'custom_mse': losses.custom_MSE,
    ...             'mse': losses.MSE,
    ...             'time_smoother': losses.TimeSmoother,
    ...             'latent_variable': losses.LatentVariable,
    ...             'linear_regression': losses.LinRegLoss,
    ...             'fourier': losses.FourierLoss,
    ...             'quantile': losses.QuantileLoss,
    ...             'quadratic_quantile': losses.QuadraticQuantileLoss}
    AttributeError: module 'mbtr.losses' has no attribute 'custom_MSE'
    

    I guess that I missed something when trying to duplicate and rename the mse loss. I would appreciate any help, if defining a custom loss function is possible.

    Best regards,

    opened by alexschickele 2
  • Dataset cannot be reached

    Hi, thank you for your effort in creating this. I would like to try it, but I can neither download the dataset nor reach the website given in the example multivariate_forecas.py.

    Is there an alternative link for that dataset? Thank you, regards!

    opened by kristfrizh 1
  • Error at import time with python 3.10.*

    I want to use MBTR in a teaching module, and for teaching purposes I need to run it in jupyter-lab inside a conda environment. While MBTR works as expected in a vanilla python 3.8, it errors out (on the same machine) in a conda environment using python 3.10.

    Steps to reproduce

    conda create --name testenv
    conda activate testenv
    
    conda install -c conda-forge jupyterlab
    pip install --upgrade git+https://github.com/supsi-dacd-isaac/mbtr.git
    # to make sure to get the latest version; but the version on pypi gives the same error 
    

    Then

    python
    

    and in python

    from mbtr.mbtr import MBT
    

    which outputs the following error

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/mbtr/mbtr.py", line 317, in <module>
        def leaf_stats(y, edges, x, order):
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/decorators.py", line 219, in wrapper
        disp.compile(sig)
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/dispatcher.py", line 965, in compile
        cres = self._compiler.compile(args, return_type)
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/dispatcher.py", line 129, in compile
        raise retval
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/dispatcher.py", line 139, in _compile_cached
        retval = self._compile_core(args, return_type)
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/dispatcher.py", line 152, in _compile_core
        cres = compiler.compile_extra(self.targetdescr.typing_context,
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/compiler.py", line 716, in compile_extra
        return pipeline.compile_extra(func)
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/compiler.py", line 452, in compile_extra
        return self._compile_bytecode()
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/compiler.py", line 520, in _compile_bytecode
        return self._compile_core()
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/compiler.py", line 499, in _compile_core
        raise e
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/compiler.py", line 486, in _compile_core
        pm.run(self.state)
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 368, in run
        raise patched_exception
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 356, in run
        self._runPass(idx, pass_inst, state)
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
        return func(*args, **kwargs)
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
        mutated |= check(pss.run_pass, internal_state)
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 273, in check
        mangled = func(compiler_state)
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/typed_passes.py", line 105, in run_pass
        typemap, return_type, calltypes, errs = type_inference_stage(
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/typed_passes.py", line 83, in type_inference_stage
        errs = infer.propagate(raise_errors=raise_errors)
      File "/home/myself/.conda/envs/testenv/lib/python3.10/site-packages/numba/core/typeinfer.py", line 1086, in propagate
        raise errors[0]
    numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
    No conversion from UniTuple(none x 2) to UniTuple(array(float64, 2d, A) x 2) for '$116return_value.7', defined at None
    
    File ".conda/envs/testenv/lib/python3.10/site-packages/mbtr/mbtr.py", line 327:
    def leaf_stats(y, edges, x, order):
        <source elided>
            s_left, s_right = None, None
        return s_left, s_right
        ^
    
    During: typing of assignment at /home/myself/.conda/envs/testenv/lib/python3.10/site-packages/mbtr/mbtr.py (327)
    
    File ".conda/envs/test/lib/python3.10/site-packages/mbtr/mbtr.py", line 327:
    def leaf_stats(y, edges, x, order):
        <source elided>
            s_left, s_right = None, None
        return s_left, s_right
        ^
    

    Thanks in advance for any pointer/help. The course where I want to present this is a summer course and is closing in on me 😉

    opened by jiho 0
Releases(v0.1.3)
Owner
SUPSI-DACD-ISAAC