This is the repo for the paper "Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement".

Overview

Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement

The repository is structured as follows:

  • PyPruning: This repository contains the implementations of all pruning algorithms. It can be installed as a regular Python package and used in other projects. For more information, have a look at PyPruning/Readme.md and the documentation in PyPruning/docs.
  • experiment_runner: A simple package / script that can run multiple experiments in parallel on the same machine or distributed across many different machines. It can also be installed as a regular Python package and used in other projects. For more information, have a look at experiment_runner/Readme.md.
  • {adult, bank, connect, ..., wine-quality}: Each folder contains a script init.sh which downloads the necessary files and performs pre-processing if necessary (e.g. extracting archives).
  • init_all.sh: Iterates over all datasets and calls the respective init.sh files. Depending on your internet connection, this may take some time.
  • environment.yml: Anaconda environment file which contains all dependencies. For more details, see below.
  • LeafRefinement.py: The implementation of the leaf-refinement method. We initially implemented a more complex method which uses Proximal Gradient Descent to simultaneously learn the weights and refine the leaf nodes. During our experiments we discovered that leaf-refinement by itself was sufficient and much simpler. We kept our old code, but added the class in LeafRefinement.py for easier usage.
  • run.py: The script which executes the experiments. For more details, see the examples below.
  • plot_results.py: The script used to explore and display the results. It also creates the plots for the paper.
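To illustrate the general idea behind leaf-refinement, here is a minimal NumPy sketch. It is not the code in LeafRefinement.py: it assumes the forest output is the average of the per-tree leaf vectors, uses softmax cross-entropy on that average, and applies plain full-batch gradient descent to the leaf values while keeping the tree structure fixed. The function names refine_leaves and predict are hypothetical. With scikit-learn, leaf_idx can be obtained from forest.apply(X), which returns one node id per sample and tree, and max(e.tree_.node_count for e in forest.estimators_) is a safe size for the per-tree lookup table.

```python
import numpy as np

def refine_leaves(leaf_idx, y, n_classes, n_nodes, lr=1.0, epochs=200, init=None):
    """Refine per-leaf class scores of a fixed tree ensemble by gradient descent.

    leaf_idx: int array (n_samples, n_trees); leaf_idx[i, t] is the node id
              sample i reaches in tree t (e.g. RandomForestClassifier.apply(X)).
    y:        int labels in [0, n_classes).
    n_nodes:  size of the per-tree lookup table (>= largest node id + 1).
    init:     optional start values (n_trees, n_nodes, n_classes), e.g. the
              trees' original leaf predictions; zeros if omitted.
    """
    leaf_idx = np.asarray(leaf_idx)
    n_samples, n_trees = leaf_idx.shape
    W = np.zeros((n_trees, n_nodes, n_classes)) if init is None else np.array(init, dtype=float)
    Y = np.eye(n_classes)[y]                                   # one-hot targets
    T = np.broadcast_to(np.arange(n_trees), leaf_idx.shape)    # tree id per (sample, tree)
    for _ in range(epochs):
        # Ensemble output: average of the leaf vectors each sample lands in.
        out = W[T, leaf_idx].mean(axis=1)                      # (n_samples, n_classes)
        # Gradient of the mean softmax cross-entropy w.r.t. the averaged output.
        z = out - out.max(axis=1, keepdims=True)
        p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        g_out = (p - Y) / n_samples
        # Every leaf collects the gradients of the samples routed to it (scaled by 1/n_trees).
        G = np.zeros_like(W)
        contrib = np.broadcast_to(g_out[:, None, :] / n_trees, leaf_idx.shape + (n_classes,))
        np.add.at(G, (T, leaf_idx), contrib)
        W -= lr * G
    return W

def predict(W, leaf_idx):
    """Class prediction of the refined ensemble (argmax of averaged leaf vectors)."""
    leaf_idx = np.asarray(leaf_idx)
    T = np.broadcast_to(np.arange(W.shape[0]), leaf_idx.shape)
    return W[T, leaf_idx].mean(axis=1).argmax(axis=1)
```

Only the leaf values change here; which trees are kept (the pruning part) is handled elsewhere, e.g. by the algorithms in PyPruning.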

Getting everything ready

This git repository contains two submodules, PyPruning and experiment_runner, which need to be cloned as well:

git clone --recurse-submodules git@github.com:sbuschjaeger/leaf-refinement-experiments.git

After obtaining the code, you need to install all dependencies. If you use Anaconda, you can simply call

conda env create -f environment.yml

to create the environment LR. Activate it with

conda activate LR

and then install the Python packages PyPruning and experiment_runner into it via pip:

pip install -e file:PyPruning
pip install -e file:experiment_runner

Lastly, you need to download the data. If you are only interested in a specific dataset, you can run the accompanying init.sh script in its folder via

cd ${Dataset}
./init.sh

or if you want to download all datasets use

./init_all.sh

Depending on your internet connection this may take some time.

Running experiments

If everything worked as expected, you should now be able to run the run.py script to prune some ensembles. The script has a fair number of parameters. See further below for a minimal working example.

  • n_jobs: Number of jobs / threads used for multiprocessing
  • base: Base learner used for experiments. Can be {RandomForestClassifier, ExtraTreesClassifier, BaggingClassifier, HeterogenousForest}. Can be a list of arguments for multiple experiments.
  • nl: Maximum number of leaf nodes (corresponds to scikit-learn's max_leaf_nodes parameter). Can be a list of arguments for multiple experiments.
  • dataset: Dataset used for experiment. Can be a list of arguments for multiple experiments.
  • n_estimators: Number of estimators trained for the base learner.
  • n_prune: Size of the pruned ensemble. Can be a list of arguments for multiple experiments.
  • xval: Number of cross-validation folds (default is 5)
  • use_prune: If set then the script uses a train / prune / test split. If not set then the training data is also used for pruning.
  • timeout: Maximum number of seconds per run. If the runtime exceeds the provided value, stop execution (default is 5400 seconds)
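To make the list-valued options above more concrete, the following sketch re-creates them with argparse and expands them into individual configurations. It is a hypothetical mirror of the documented parameters, not the parser in run.py: the default for n_jobs is assumed, and it assumes list-valued options are combined as a Cartesian product.

```python
import argparse
import itertools

BASES = ["RandomForestClassifier", "ExtraTreesClassifier", "BaggingClassifier", "HeterogenousForest"]

def build_parser():
    # Hypothetical mirror of the documented options; run.py's actual parser may differ.
    p = argparse.ArgumentParser()
    p.add_argument("--n_jobs", type=int, default=1)  # default is an assumption
    p.add_argument("--base", nargs="+", choices=BASES, required=True)
    p.add_argument("--nl", type=int, nargs="+", required=True)
    p.add_argument("--dataset", nargs="+", required=True)
    p.add_argument("--n_estimators", type=int, required=True)
    p.add_argument("--n_prune", type=int, nargs="+", required=True)
    p.add_argument("--xval", type=int, default=5)       # documented default
    p.add_argument("--use_prune", action="store_true")
    p.add_argument("--timeout", type=int, default=5400)  # documented default, in seconds
    return p

def expand_grid(args):
    # Assumption: every list-valued option is combined with every other one.
    return [
        {"dataset": d, "base": b, "nl": nl, "n_prune": k}
        for d, b, nl, k in itertools.product(args.dataset, args.base, args.nl, args.n_prune)
    ]

cmd = ("--dataset adult --n_estimators 256 --n_prune 2 4 8 16 32 64 128 256 "
       "--nl 64 128 256 512 1024 --n_jobs 128 --xval 5 --base RandomForestClassifier")
args = build_parser().parse_args(cmd.split())
grid = expand_grid(args)
print(len(grid), "configurations x", args.xval, "folds")  # 40 configurations x 5 folds
```

Under this assumption the number of configurations grows multiplicatively with every list you pass, which is why large calls quickly become expensive.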

Note that all base ensembles for all cross-validation splits of a dataset are trained before any of the pruning algorithms are applied. If you want to evaluate many datasets or hyperparameter configurations in one run, this requires a lot of memory.
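The xval and use_prune options listed above can be pictured with the following NumPy sketch of a train / prune / test split. It is an illustration under stated assumptions, not run.py's splitting code: it assumes a shuffled k-fold split, and the hypothetical parameter prune_fraction decides how much of each training fold is held out for pruning when use_prune is set.

```python
import numpy as np

def xval_splits(n_samples, xval=5, use_prune=False, prune_fraction=0.25, seed=0):
    """Yield (train_idx, prune_idx, test_idx) for each of `xval` folds.

    Assumption: prune_fraction and the shuffling scheme are illustrative only.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), xval)
    for i in range(xval):
        test_idx = folds[i]
        rest = np.concatenate([folds[j] for j in range(xval) if j != i])
        if use_prune:
            # Hold out a separate pruning set from the training part of the fold.
            n_prune = int(len(rest) * prune_fraction)
            prune_idx, train_idx = rest[:n_prune], rest[n_prune:]
        else:
            # Without use_prune the training data is also used for pruning.
            train_idx = prune_idx = rest
        yield train_idx, prune_idx, test_idx
```

With 100 samples and 5 folds, each fold tests on 20 samples; with use_prune it trains on 60 and prunes on 20, otherwise it trains and prunes on the same 80.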

To train and prune forests on the adult dataset, you can for example do

./run.py --dataset adult --n_estimators 256 --n_prune 2 4 8 16 32 64 128 256 --nl 64 128 256 512 1024 --n_jobs 128 --xval 5 --base RandomForestClassifier

The results are stored in ${Dataset}/results/${base}/${use_prune}/${date}/results.jsonl, where ${Dataset} is the dataset (e.g. magic), ${base} is the base learner, ${use_prune} reflects the use_prune setting, and ${date} is the date and time of the run.
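Because results.jsonl contains one JSON object per line, the files can be collected with a few lines of standard-library Python. This is a minimal sketch that relies only on the directory layout described above; the function name load_results is hypothetical, and the keys inside each record are whatever run.py wrote, so they are not interpreted here.

```python
import json
from pathlib import Path

def load_results(root="."):
    """Collect every results.jsonl below `root` into a list of dicts.

    Assumes the layout ${Dataset}/results/${base}/${use_prune}/${date}/results.jsonl.
    """
    records = []
    for path in sorted(Path(root).glob("*/results/*/*/*/results.jsonl")):
        # parts[-6:-1] == (dataset, "results", base, use_prune, date)
        dataset, _, base, use_prune, date = path.parts[-6:-1]
        meta = {"dataset": dataset, "base": base, "use_prune": use_prune, "date": date}
        with path.open() as f:
            for line in f:
                if line.strip():
                    # Values from the file take precedence over the path-derived metadata.
                    records.append({**meta, **json.loads(line)})
    return records
```

The resulting list of dicts can be passed directly to a dataframe library for filtering and plotting.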

To reproduce the experiments from the paper, you can call:

./run.py --dataset adult anura bank chess connect eeg elec postures japanese-vowels magic mozilla mnist nomao avila ida2016 satimage --n_estimators 256 --n_prune 2 4 8 16 32 64 128 256 --nl 64 128 256 512 1024 --n_jobs 128 --xval 5 --base RandomForestClassifier

Important: This call uses 128 threads and requires a substantial amount of memory (roughly 64 GB) to work.

Exploring the results

After running the experiments, you can view the results with the plot_results.py script. We recommend using an interactive Python environment for this, such as Jupyter or VSCode with the ability to execute cells, but you should also be able to run the script as-is. The script is fairly well commented, so please have a look at it for more details.
