Scikit-learn compatible wrapper of the Random Bits Forest program written by (Wang et al., 2016)

Overview

sklearn-compatible Random Bits Forest

Scikit-learn compatible wrapper of the Random Bits Forest program written by Wang et al., 2016, available as a binary on Sourceforge. All credits belong to the authors. This is just some quick and dirty wrapper and testing code.

The authors present "...a classification and regression algorithm called Random Bits Forest (RBF). RBF integrates neural network (for depth), boosting (for wideness) and random forest (for accuracy). It first generates and selects ~10,000 small three-layer threshold random neural networks as basis by gradient boosting scheme. These binary basis are then feed into a modified random forest algorithm to obtain predictions. In conclusion, RBF is a novel framework that performs strongly especially on data with large size."

Note: the executable supplied by the authors has been compiled for Linux, and for CPUs supporting SSE instructions.

Fig1 from Wang et al., 2016

Usage

Usage example of the Random Bits Forest:

from uci_loader import *
from randombitsforest import RandomBitsForest
X, y = getdataset('diabetes')

from sklearn.ensemble.forest import RandomForestClassifier

classifier = RandomBitsForest()
classifier.fit(X[:len(y)/2], y[:len(y)/2])
p = classifier.predict(X[len(y)/2:])
print "Random Bits Forest Accuracy:", np.mean(p == y[len(y)/2:])

classifier = RandomForestClassifier(n_estimators=20)
classifier.fit(X[:len(y)/2], y[:len(y)/2])
print "Random Forest Accuracy:", np.mean(classifier.predict(X[len(y)/2:]) == y[len(y)/2:])

Usage example for the UCI comparison:

from uci_comparison import compare_estimators
from sklearn.ensemble.forest import RandomForestClassifier, ExtraTreesClassifier
from randombitsforest import RandomBitsForest

estimators = {
              'RandomForest': RandomForestClassifier(n_estimators=200),
              'ExtraTrees': ExtraTreesClassifier(n_estimators=200),
              'RandomBitsForest': RandomBitsForest(number_of_trees=200)
            }

# optionally, pass a list of UCI dataset identifiers as the datasets parameter, e.g. datasets=['iris', 'diabetes']
# optionally, pass a dict of scoring functions as the metric parameter, e.g. metrics={'F1-score': f1_score}
compare_estimators(estimators)

"""
                          ExtraTrees F1score RandomBitsForest F1score RandomForest F1score
========================================================================================
  breastcancer (n=683)      0.960 (SE=0.003)      0.954 (SE=0.003)     *0.963 (SE=0.003)
       breastw (n=699)     *0.956 (SE=0.003)      0.951 (SE=0.003)      0.953 (SE=0.005)
      creditg (n=1000)     *0.372 (SE=0.005)      0.121 (SE=0.003)      0.371 (SE=0.005)
      haberman (n=306)      0.317 (SE=0.015)     *0.346 (SE=0.020)      0.305 (SE=0.016)
         heart (n=270)      0.852 (SE=0.004)     *0.854 (SE=0.004)      0.852 (SE=0.006)
    ionosphere (n=351)      0.740 (SE=0.037)     *0.741 (SE=0.037)      0.736 (SE=0.037)
          labor (n=57)      0.246 (SE=0.016)      0.128 (SE=0.014)     *0.361 (SE=0.018)
liverdisorders (n=345)      0.707 (SE=0.013)     *0.723 (SE=0.013)      0.713 (SE=0.012)
     tictactoe (n=958)      0.030 (SE=0.007)     *0.336 (SE=0.040)      0.030 (SE=0.007)
          vote (n=435)     *0.658 (SE=0.012)      0.228 (SE=0.017)     *0.658 (SE=0.012)
"""
Owner
Tamas Madl
Tamas Madl
Kalman filter library

The kalman filter framework described here is an incredibly powerful tool for any optimization problem, but particularly for visual odometry, sensor fusion localization or SLAM.

comma.ai 276 Jan 01, 2023
Empyrial is a Python-based open-source quantitative investment library dedicated to financial institutions and retail investors

By Investors, For Investors. Want to read this in Chinese? Click here Empyrial is a Python-based open-source quantitative investment library dedicated

Santosh 640 Dec 31, 2022
XManager: A framework for managing machine learning experiments ๐Ÿง‘โ€๐Ÿ”ฌ

XManager is a platform for packaging, running and keeping track of machine learning experiments. It currently enables one to launch experiments locally or on Google Cloud Platform (GCP). Interaction

DeepMind 620 Dec 27, 2022
Create large-scale ML-driven multiscale simulation ensembles to study the interactions

MuMMI RAS v0.1 Released: Nov 16, 2021 MuMMI RAS is the application component of the MuMMI framework developed to create large-scale ML-driven multisca

4 Feb 16, 2022
Cohort Intelligence used to solve various mathematical functions

Cohort-Intelligence-for-Mathematical-Functions About Cohort Intelligence : Cohort Intelligence ( CI ) is an optimization technique. It attempts to mod

Aayush Khandekar 2 Oct 25, 2021
A collection of machine learning examples and tutorials.

machine_learning_examples A collection of machine learning examples and tutorials.

LazyProgrammer.me 7.1k Jan 01, 2023
A toolkit for geo ML data processing and model evaluation (fork of solaris)

An open source ML toolkit for overhead imagery. This is a beta version of lunular which may continue to develop. Please report any bugs through issues

Ryan Avery 4 Nov 04, 2021
BudouX is the successor to Budou, the machine learning powered line break organizer tool.

BudouX Standalone. Small. Language-neutral. BudouX is the successor to Budou, the machine learning powered line break organizer tool. It is standalone

Google 868 Jan 05, 2023
Educational python for Neural Networks, written in pure Python/NumPy.

Educational python for Neural Networks, written in pure Python/NumPy.

127 Oct 27, 2022
Code base of KU AIRS: SPARK Autonomous Vehicle Team

KU AIRS: SPARK Autonomous Vehicle Project Check this link for the blog post describing this project and the video of SPARK in simulation and on parkou

Mehmet Enes Erciyes 1 Nov 23, 2021
Multiple Linear Regression using the LinearRegression class from sklearn.linear_model library

Multiple-Linear-Regression-master - A python program to implement Multiple Linear Regression using the LinearRegression class from sklearn.linear model library

Kushal Shingote 1 Feb 06, 2022
K-Means clusternig example with Python and Scikit-learn

Unsupervised-Machine-Learning Flat Clustering K-Means clusternig example with Python and Scikit-learn Flat clustering Clustering algorithms group a se

Emin 1 Dec 13, 2021
Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)

Little Ball of Fur is a graph sampling extension library for Python. Please look at the Documentation, relevant Paper, Promo video and External Resour

Benedek Rozemberczki 619 Dec 14, 2022
Open source time series library for Python

PyFlux PyFlux is an open source time series library for Python. The library has a good array of modern time series models, as well as a flexible array

Ross Taylor 2k Jan 02, 2023
We have a dataset of user performances. The project is to develop a machine learning model that will predict the salaries of baseball players.

Salary-Prediction-with-Machine-Learning 1. Business Problem Can a machine learning project be implemented to estimate the salaries of baseball players

AyลŸe Nur Tรผrkaslan 9 Oct 14, 2022
Implementation of K-Nearest Neighbors Algorithm Using PySpark

KNN With Spark Implementation of KNN using PySpark. The KNN was used on two separate datasets (https://archive.ics.uci.edu/ml/datasets/iris and https:

Zachary Petroff 4 Dec 30, 2022
ThunderSVM: A Fast SVM Library on GPUs and CPUs

What's new We have recently released ThunderGBM, a fast GBDT and Random Forest library on GPUs. add scikit-learn interface, see here Overview The miss

Xtra Computing Group 1.4k Dec 22, 2022
Machine-Learning with python (jupyter)

Machine-Learning with python (jupyter) ๋จธ์‹ ๋Ÿฌ๋‹ ์•ผํ•™ ์ž‘์‹ฌ 10์ผ๊ณผ ์ฅฌํ”ผํ„ฐ ๋…ธํŠธ๋ถ ๊ธฐ๋ฐ˜ ๋ฐ์ดํ„ฐ ์‚ฌ์ด์–ธ์Šค ์‹œ์ž‘ ๋“ค์–ด๊ฐ€๊ธฐ์ „ https://nbviewer.org/ ํŽ˜์ด์ง€๋ฅผ ํ†ตํ•ด์„œ ์ฅฌํ”ผํ„ฐ ๋…ธํŠธ๋ถ ๋‚ด์šฉ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ์œ„ ํŽ˜์ด์ง€์—์„œ ํ˜„์žฌ ๋ ˆํฌ ๊ธฐ

HyeonWoo Jeong 1 Jan 23, 2022
A repository for collating all the resources such as articles, blogs, papers, and books related to Bayesian Statistics.

A repository for collating all the resources such as articles, blogs, papers, and books related to Bayesian Statistics.

Aayush Malik 80 Dec 12, 2022
A Powerful Serverless Analysis Toolkit That Takes Trial And Error Out of Machine Learning Projects

KXY: A Seemless API to 10x The Productivity of Machine Learning Engineers Documentation https://www.kxy.ai/reference/ Installation From PyPi: pip inst

KXY Technologies, Inc. 35 Jan 02, 2023