Python implementation of the rulefit algorithm

Overview

RuleFit

Implementation of a rule based prediction algorithm based on the rulefit algorithm from Friedman and Popescu (PDF)

The algorithm can be used for predicting an output vector y given an input matrix X. In the first step a tree ensemble is generated with gradient boosting. The trees are then used to form rules, where the paths to each node in each tree form one rule. A rule is a binary decision if an observation is in a given node, which is dependent on the input features that were used in the splits. The ensemble of rules together with the original input features are then being input in a L1-regularized linear model, also called Lasso, which estimates the effects of each rule on the output target but at the same time estimating many of those effects to zero.

You can use rulefit for predicting a numeric response (categorial not yet implemented). The input has to be a numpy matrix with only numeric values.

Installation

The latest version can be installed from the master branch using pip:

pip install git+git://github.com/christophM/rulefit.git

Another option is to clone the repository and install using python setup.py install or python setup.py develop.

Usage

Train your model:

import numpy as np
import pandas as pd

from rulefit import RuleFit

boston_data = pd.read_csv("boston.csv", index_col=0)

y = boston_data.medv.values
X = boston_data.drop("medv", axis=1)
features = X.columns
X = X.as_matrix()

rf = RuleFit()
rf.fit(X, y, feature_names=features)

If you want to have influence on the tree generator you can pass the generator as argument:

from sklearn.ensemble import GradientBoostingRegressor
gb = GradientBoostingRegressor(n_estimators=500, max_depth=10, learning_rate=0.01)
rf = RuleFit(gb)

rf.fit(X, y, feature_names=features)

Predict

rf.predict(X)

Inspect rules:

rules = rf.get_rules()

rules = rules[rules.coef != 0].sort_values("support", ascending=False)

print(rules)

Notes

  • In contrast to the original paper, the generated trees are always fitted with the same maximum depth. In the original implementation the maximum depth of the tree are drawn from a distribution each time
  • This implementation is in progress. If you find a bug, don't hesitate to contact me.

Changelog

All notable changes to this project will be documented here.

[v0.3] - IN PROGRESS

  • set default of exclude_zero_coef to False in get_rules():
  • syntax fix (Issue 21)

[v0.2] - 2017-11-24

  • Introduces classification for RuleFit
  • Adds scaling of variables (Friedscale)
  • Allows random size trees for creating rules

[v0.1] - 2016-06-18

  • Start changelog and versions
Owner
Christoph Molnar
Interpretable Machine Learning researcher. Author of Interpretable Machine Learning Book: https://christophm.github.io/interpretable-ml-book/
Christoph Molnar
A high-performance topological machine learning toolbox in Python

giotto-tda is a high-performance topological machine learning toolbox in Python built on top of scikit-learn and is distributed under the G

giotto.ai 632 Dec 29, 2022
This is a curated list of medical data for machine learning

Medical Data for Machine Learning This is a curated list of medical data for machine learning. This list is provided for informational purposes only,

Andrew L. Beam 5.4k Dec 26, 2022
Toolkit for building machine learning models that generalize to unseen domains and are robust to privacy and other attacks.

Toolkit for Building Robust ML models that generalize to unseen domains (RobustDG) Divyat Mahajan, Shruti Tople, Amit Sharma Privacy & Causal Learning

Microsoft 149 Jan 06, 2023
A concept I came up which ditches the idea of "layers" in a neural network.

Dynet A concept I came up which ditches the idea of "layers" in a neural network. Install Copy Dynet.py to your project. Run the example Install matpl

Anik Patel 4 Dec 05, 2021
Machine Learning Study 혼자 해보기

Machine Learning Study 혼자 해보기 기여자 (Contributors) ✨ Teddy Lee 🏠 HongJaeKwon 🏠 Seungwoo Han 🏠 Tae Heon Kim 🏠 Steve Kwon 🏠 SW Song 🏠 K1A2 🏠 Wooil

Teddy Lee 1.7k Jan 01, 2023
NumPy-based implementation of a multilayer perceptron (MLP)

My own NumPy-based implementation of a multilayer perceptron (MLP). Several of its components can be tuned and played with, such as layer depth and size, hidden and output layer activation functions,

1 Feb 10, 2022
Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable.

SDK: Overview of the Kubeflow pipelines service Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on

Kubeflow 3.1k Jan 06, 2023
icepickle is to allow a safe way to serialize and deserialize linear scikit-learn models

icepickle It's a cooler way to store simple linear models. The goal of icepickle is to allow a safe way to serialize and deserialize linear scikit-lea

vincent d warmerdam 24 Dec 09, 2022
CorrProxies - Optimizing Machine Learning Inference Queries with Correlative Proxy Models

CorrProxies - Optimizing Machine Learning Inference Queries with Correlative Proxy Models

ZhihuiYangCS 8 Jun 07, 2022
AutoX是一个高效的自动化机器学习工具,它主要针对于表格类型的数据挖掘竞赛。 它的特点包括: 效果出色、简单易用、通用、自动化、灵活。

English | 简体中文 AutoX是什么? AutoX一个高效的自动化机器学习工具,它主要针对于表格类型的数据挖掘竞赛。 它的特点包括: 效果出色: AutoX在多个kaggle数据集上,效果显著优于其他解决方案(见效果对比)。 简单易用: AutoX的接口和sklearn类似,方便上手使用。

4Paradigm 431 Dec 28, 2022
GroundSeg Clustering Optimized Kdtree

ground seg and clustering based on kitti velodyne data, and a additional optimized kdtree for knn and radius nn search

2 Dec 02, 2021
Examples and code for the Practical Machine Learning workshop series

Practical Machine Learning Workshop Series Practical Machine Learning for Quantitative Finance Post conference workshop at the WBS Spring Conference D

CompatibL 21 Jun 25, 2022
Simulate & classify transient absorption spectroscopy (TAS) spectral features for bulk semiconducting materials (Post-DFT)

PyTASER PyTASER is a Python (3.9+) library and set of command-line tools for classifying spectral features in bulk materials, post-DFT. The goal of th

Materials Design Group 4 Dec 27, 2022
Implementation of the Object Relation Transformer for Image Captioning

Object Relation Transformer This is a PyTorch implementation of the Object Relation Transformer published in NeurIPS 2019. You can find the paper here

Yahoo 158 Dec 24, 2022
Visualize classified time series data with interactive Sankey plots in Google Earth Engine

sankee Visualize changes in classified time series data with interactive Sankey plots in Google Earth Engine Contents Description Installation Using P

Aaron Zuspan 76 Dec 15, 2022
A Collection of Conference & School Notes in Machine Learning 🦄📝🎉

Machine Learning Conference & Summer School Notes. 🦄📝🎉

558 Dec 28, 2022
A handy tool for common machine learning models' hyper-parameter tuning.

Common machine learning models' hyperparameter tuning This repo is for a collection of hyper-parameter tuning for "common" machine learning models, in

Kevin Hu 2 Jan 27, 2022
A Python implementation of FastDTW

fastdtw Python implementation of FastDTW [1], which is an approximate Dynamic Time Warping (DTW) algorithm that provides optimal or near-optimal align

tanitter 651 Jan 04, 2023
Implementation of linesearch Optimization Algorithms in Python

Nonlinear Optimization Algorithms During my time as Scientific Assistant at the Karlsruhe Institute of Technology (Germany) I implemented various Opti

Paul 3 Dec 06, 2022