Python based GBDT implementation

Overview

Py-boost: a research tool for exploring GBDTs

Modern gradient boosting toolkits are very complex and are written in low-level programming languages. As a result,

  • It is hard to customize them to suit one’s needs
  • New ideas and methods are not easy to implement
  • It is difficult to understand how they work

Py-boost is a Python-based gradient boosting library which aims at overcoming the aforementioned problems.

Authors: Anton Vakhrushev, Leonid Iosipoi.

Py-boost Key Features

Simple. Py-boost is a simplified gradient boosting library but it supports all main features and hyperparameters available in other implementations.

Fast with GPU. Despite the fact that Py-boost is written in Python, it works only on GPU and uses Python GPU libraries such as CuPy and Numba.

Easy to customize. Py-boost can be easily customized even if one is not familiar with GPU programming (just replace np with cp). What can be customized? Almost everuthing via custom callbacks. Examples: Row/Col sampling strategy, Training control, Losses/metrics, Multioutput handling strategy, Anything via custom callbacks

Installation

Before installing py-boost via pip you should have cupy installed. You can use:

pip install -U cupy-cuda110 py-boost

Note: replace with your cuda version! For the details see this guide

Quick tour

Py-boost is easy to use since it has similar to scikit-learn interface. For usage example please see:

More examples are comming soon

Other Sber AI Lab Projects

LightAutoML: https://github.com/sberbank-ai-lab/LightAutoML
AutoWoE: https://github.com/sberbank-ai-lab/AutoMLWhitebox
RePlay: https://github.com/sberbank-ai-lab/RePlay

Owner
Sberbank AI Lab
Sberbank AI Lab
Machine Learning University: Accelerated Natural Language Processing Class

Machine Learning University: Accelerated Natural Language Processing Class This repository contains slides, notebooks and datasets for the Machine Lea

AWS Samples 2k Jan 01, 2023
Python module for data science and machine learning users.

dsnk-distributions package dsnk distribution is a Python module for data science and machine learning that was created with the goal of reducing calcu

Emmanuel ASIFIWE 1 Nov 23, 2021
dirty_cat is a Python module for machine-learning on dirty categorical variables.

dirty_cat dirty_cat is a Python module for machine-learning on dirty categorical variables.

637 Dec 29, 2022
A repository of PyBullet utility functions for robotic motion planning, manipulation planning, and task and motion planning

pybullet-planning (previously ss-pybullet) A repository of PyBullet utility functions for robotic motion planning, manipulation planning, and task and

Caelan Garrett 260 Dec 27, 2022
Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores

Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores

Oracle 95 Dec 28, 2022
Model Validation Toolkit is a collection of tools to assist with validating machine learning models prior to deploying them to production and monitoring them after deployment to production.

Model Validation Toolkit is a collection of tools to assist with validating machine learning models prior to deploying them to production and monitoring them after deployment to production.

FINRA 25 Dec 28, 2022
Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

Thoughtworks 318 Jan 02, 2023
Bayesian Additive Regression Trees For Python

BartPy Introduction BartPy is a pure python implementation of the Bayesian additive regressions trees model of Chipman et al [1]. Reasons to use BART

187 Dec 16, 2022
Accelerating model creation and evaluation.

EmeraldML A machine learning library for streamlining the process of (1) cleaning and splitting data, (2) training, optimizing, and testing various mo

Yusuf 0 Dec 06, 2021
Given the names and grades for each student in a class N of students, store them in a nested list and print the name(s) of any student(s) having the second lowest grade.

Hackerank-Nested-List Given the names and grades for each student in a class N of students, store them in a nested list and print the name(s) of any s

Sangeeth Mathew John 2 Dec 14, 2021
A webpage that utilizes machine learning to extract sentiments from tweets.

Tweets_Classification_Webpage The goal of this project is to be able to predict what rating customers on social media platforms would give to products

Ayaz Nakhuda 1 Dec 30, 2021
Cool Python features for machine learning that I used to be too afraid to use. Will be updated as I have more time / learn more.

python-is-cool A gentle guide to the Python features that I didn't know existed or was too afraid to use. This will be updated as I learn more and bec

Chip Huyen 3.3k Jan 05, 2023
🔬 A curated list of awesome machine learning strategies & tools in financial market.

🔬 A curated list of awesome machine learning strategies & tools in financial market.

GeorgeZou 1.6k Dec 30, 2022
whylogs: A Data and Machine Learning Logging Standard

whylogs: A Data and Machine Learning Logging Standard whylogs is an open source standard for data and ML logging whylogs logging agent is the easiest

WhyLabs 2k Jan 06, 2023
Sequence learning toolkit for Python

seqlearn seqlearn is a sequence classification toolkit for Python. It is designed to extend scikit-learn and offer as similar as possible an API. Comp

Lars 653 Dec 27, 2022
A Python toolkit for rule-based/unsupervised anomaly detection in time series

Anomaly Detection Toolkit (ADTK) Anomaly Detection Toolkit (ADTK) is a Python package for unsupervised / rule-based time series anomaly detection. As

Arundo Analytics 888 Dec 30, 2022
A library to generate synthetic time series data by easy-to-use factors and generator

timeseries-generator This repository consists of a python packages that generates synthetic time series dataset in a generic way (under /timeseries_ge

Nike Inc. 87 Dec 20, 2022
onelearn: Online learning in Python

onelearn: Online learning in Python Documentation | Reproduce experiments | onelearn stands for ONE-shot LEARNning. It is a small python package for o

15 Nov 06, 2022
Learning --> Numpy January 2022 - winter'22

Numerical-Python Numpy NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along

Shahzaneer Ahmed 0 Mar 12, 2022
Machine learning that just works, for effortless production applications

Machine learning that just works, for effortless production applications

Elisha Yadgaran 16 Sep 02, 2022