Stochastic Gradient Trees implementation in Python

Overview

Stochastic Gradient Trees - Python

Stochastic Gradient Trees1 by Henry Gouk, Bernhard Pfahringer, and Eibe Frank implementation in Python. Based on the parer's accompanied repository code.

Python Version 3.7 or later

Used Python libraries:

  • numpy>=1.20.2
  • scipy>=1.6.2
  • pandas>=1.3.3
  • scikit-learn>=0.24.2

Usage:

    from StochasticGradientTree import StochasticGradientTreeClassifier

    from sklearn.model_selection import train_test_split
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import confusion_matrix, accuracy_score, log_loss

    def train(X, y):

        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.34)
        
        tree = StochasticGradientTreeClassifier()

        tree.fit(X_train, y_train)
    
        y_pred = tree.predict(X_test)

        proba = tree.predict_proba(X_test)        

        acc_test = accuracy_score(y_test, y_pred)
        print(confusion_matrix(y_test, y_pred))
        print('Acc test: ', acc_test)
        print('Cross entropy loss: ', log_loss(y_test, proba))

        return tree, acc_test

    if __name__ == "__main__":

        breast = load_breast_cancer(as_frame=True)

        X = breast.frame.copy()
        y = breast.frame.target
        
        X.drop(['target'], axis=1, inplace=True) 

        tree, _ = train(X, y)

Binary classification example:

python classification_breast.py

Multiclass classification (using the One-vs-the-rest multiclass strategy):

python classification_iris.py

Regression example:

python regression_diabetes.py

Footnotes

  1. Gouk, H., Pfahringer, B., and Frank, E. Stochastic gradient trees. In Proceedings of The Eleventh Asian Conference on Machine Learning, volume 101 of Proceedings of Machine Learning Research, pp. 1094–1109. PMLR, 2019.

Owner
John Koumentis
Machine Learning
John Koumentis
A tax calculator for stocks and dividends activities.

Revolut Stocks calculator for Bulgarian National Revenue Agency Information Processing and calculating the required information about stock possession

Doino Gretchenliev 200 Oct 25, 2022
DaDRA (day-druh) is a Python library for Data-Driven Reachability Analysis.

DaDRA (day-druh) is a Python library for Data-Driven Reachability Analysis. The main goal of the package is to accelerate the process of computing estimates of forward reachable sets for nonlinear dy

2 Nov 08, 2021
An Indexer that works out-of-the-box when you have less than 100K stored Documents

U100KIndexer An Indexer that works out-of-the-box when you have less than 100K stored Documents. U100K means under 100K. At 100K stored Documents with

Jina AI 7 Mar 15, 2022
Stream-Kafka-ELK-Stack - Weather data streaming using Apache Kafka and Elastic Stack.

Streaming Data Pipeline - Kafka + ELK Stack Streaming weather data using Apache Kafka and Elastic Stack. Data source: https://openweathermap.org/api O

Felipe Demenech Vasconcelos 2 Jan 20, 2022
This cosmetics generator allows you to generate the new Fortnite cosmetics, Search pak and search cosmetics!

COSMETICS GENERATOR This cosmetics generator allows you to generate the new Fortnite cosmetics, Search pak and search cosmetics! Remember to put the l

ᴅᴊʟᴏʀ3xᴢᴏ 11 Dec 13, 2022
Galvanalyser is a system for automatically storing data generated by battery cycling machines in a database

Galvanalyser is a system for automatically storing data generated by battery cycling machines in a database, using a set of "harvesters", whose job it

Battery Intelligence Lab 20 Sep 28, 2022
ETL pipeline on movie data using Python and postgreSQL

Movies-ETL ETL pipeline on movie data using Python and postgreSQL Overview This project consisted on a automated Extraction, Transformation and Load p

Juan Nicolas Serrano 0 Jul 07, 2021
Desafio proposto pela IGTI em seu bootcamp de Cloud Data Engineer

Desafio Modulo 4 - Cloud Data Engineer Bootcamp - IGTI Objetivos Criar infraestrutura como código Utuilizando um cluster Kubernetes na Azure Ingestão

Otacilio Filho 4 Jan 23, 2022
Learn machine learning the fun way, with Oracle and RedBull Racing

Red Bull Racing Analytics Hands-On Labs Introduction Are you interested in learning machine learning (ML)? How about doing this in the context of the

Oracle DevRel 55 Oct 24, 2022
We're Team Arson and we're using the power of predictive modeling to combat wildfires.

We're Team Arson and we're using the power of predictive modeling to combat wildfires. Arson Map Inspiration There’s been a lot of wildfires in Califo

Jerry Lee 3 Oct 17, 2021
Minimal working example of data acquisition with nidaqmx python API

Data Aquisition using NI-DAQmx python API Based on this project It is a minimal working example for data acquisition using the NI-DAQmx python API. It

Pablo 1 Nov 05, 2021
A Python package for modular causal inference analysis and model evaluations

Causal Inference 360 A Python package for inferring causal effects from observational data. Description Causal inference analysis enables estimating t

International Business Machines 506 Dec 19, 2022
Sensitivity Analysis Library in Python (Numpy). Contains Sobol, Morris, Fractional Factorial and FAST methods.

Sensitivity Analysis Library (SALib) Python implementations of commonly used sensitivity analysis methods. Useful in systems modeling to calculate the

SALib 663 Jan 05, 2023
A Numba-based two-point correlation function calculator using a grid decomposition

A Numba-based two-point correlation function (2PCF) calculator using a grid decomposition. Like Corrfunc, but written in Numba, with simplicity and hackability in mind.

Lehman Garrison 3 Aug 24, 2022
Multiple Pairwise Comparisons (Post Hoc) Tests in Python

scikit-posthocs is a Python package that provides post hoc tests for pairwise multiple comparisons that are usually performed in statistical data anal

Maksim Terpilowski 264 Dec 30, 2022
PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

PyStan PyStan is a Python interface to Stan, a package for Bayesian inference. Stan® is a state-of-the-art platform for statistical modeling and high-

Stan 229 Dec 29, 2022
A probabilistic programming language in TensorFlow. Deep generative models, variational inference.

Edward is a Python library for probabilistic modeling, inference, and criticism. It is a testbed for fast experimentation and research with probabilis

Blei Lab 4.7k Jan 09, 2023
Time ranges with python

timeranges Time ranges. Read the Docs Installation pip timeranges is available on pip: pip install timeranges GitHub You can also install the latest v

Micael Jarniac 2 Sep 01, 2022
GWpy is a collaboration-driven Python package providing tools for studying data from ground-based gravitational-wave detectors

GWpy is a collaboration-driven Python package providing tools for studying data from ground-based gravitational-wave detectors. GWpy provides a user-f

GWpy 342 Jan 07, 2023