[ICML 2021] A fast algorithm for fitting robust decision trees.

Overview

GROOT: Growing Robust Trees

Growing Robust Trees (GROOT) is an algorithm that fits binary classification decision trees such that they are robust against user-specified adversarial examples. The algorithm closely resembles algorithms used for fitting normal decision trees (i.e. CART) but changes the splitting criterion and the way samples propagate when creating a split.

This repository contains the module groot that implements GROOT as a Scikit-learn compatible classifier, an adversary for model evaluation and easy functions to import datasets. For documentation see https://groot.cyber-analytics.nl

Simple example

To train and evaluate GROOT on a toy dataset against an attacker that can move samples by 0.5 in each direction one can use the following code:

from groot.adversary import DecisionTreeAdversary
from groot.model import GrootTreeClassifier

from sklearn.datasets import make_moons

X, y = make_moons(noise=0.3, random_state=0)
X_test, y_test = make_moons(noise=0.3, random_state=1)

attack_model = [0.5, 0.5]
is_numerical = [True, True]
tree = GrootTreeClassifier(attack_model=attack_model, is_numerical=is_numerical, random_state=0)

tree.fit(X, y)
accuracy = tree.score(X_test, y_test)
adversarial_accuracy = DecisionTreeAdversary(tree, "groot").adversarial_accuracy(X_test, y_test)

print("Accuracy:", accuracy)
print("Adversarial Accuracy:", adversarial_accuracy)

Installation

groot can be installed from PyPi: pip install groot-trees

To use Kantchelian's MILP attack it is required that you have GUROBI installed along with their python package: python -m pip install -i https://pypi.gurobi.com gurobipy

Specific dependency versions

To reproduce our experiments with exact package versions you can clone the repository and run: pip install -r requirements.txt

We recommend using virtual environments.

Reproducing 'Efficient Training of Robust Decision Trees Against Adversarial Examples' (article)

To reproduce the results from the paper we provide generate_k_fold_results.py, a script that takes the trained models (from JSON format) and generates tables and figures. The resulting figures generate under /out/.

To not only generate the results but to also retrain all models we include the scripts train_kfold_models.py and fit_chen_xgboost.py. The first script runs the algorithms in parallel for each dataset then outputs to /out/trees/ and /out/forests/. Warning: the script can take a long time to run (about a day given 16 cores). The second script train specifically the Chen et al. boosting ensembles. /out/results.zip contains all results from when we ran the scripts.

To experiment on image datasets we have a script image_experiments.py that fits and output the results. In this script, one can change the dataset variable to 'mnist' or 'fmnist' to switch between the two.

The scripts summarize_datasets.py and visualize_threat_models.py output some figures we used in the text.

Implementation details

The TREANT implementation (groot.treant.py) is copied almost completely from the authors of TREANT at https://github.com/gtolomei/treant with small modifications to better interface with the experiments. The heuristic by Chen et al. runs in the GROOT code, only with a different score function. This score function can be enabled by setting chen_heuristic=True on a GrootTreeClassifier before calling .fit(X, y). The provably robust boosting implementation comes almost completely from their code at https://github.com/max-andr/provably-robust-boosting and we use a small wrapper around their code (groot.provably_robust_boosting.wrapper.py) to use it. When we recorded the runtimes we turned off all parallel options in the @jit annotations from the code. The implementation of Chen et al. boosting can be found in their own repo https://github.com/chenhongge/RobustTrees, from whic we need to compile and copy the binary xgboost to the current directory. The script fit_chen_xgboost.py then calls this binary and uses the command line interface to fit all models.

Important note on TREANT

To encode L-infinity norms correctly we had to modify TREANT to NOT apply rules recursively. This means we added a single break statement in the treant.Attacker.__compute_attack() method. If you are planning on using TREANT with recursive attacker rules then you should remove this statement or use TREANT's unmodified code at https://github.com/gtolomei/treant .

Contact

For any questions or comments please create an issue or contact me directly.

Comments
  • Reproducing results from the article, issue with runtimes.csv

    Reproducing results from the article, issue with runtimes.csv

    Hello! I am trying to reproduce results from the article, and I can't figure out certain problem. First I am trying to run train_kfold_models, but the code always ouputs an error: "ImportError: cannot import name 'GrootTree' from 'groot.model'". Is there something wrong with the .py file I am trying to run, or is this problem something that doesn't occur to you and everyone else (-->something wrong on computer or files or environment)?

    Onni Mansikkamäki

    opened by OnniMansikkamaki 3
  • is_numerical argument GrootTreeClassifier

    is_numerical argument GrootTreeClassifier

    Running the example code on the make moons data in the README I get:

    Traceback (most recent call last):
      File "/home/.../groot_test.py", line 11, in <module>
        tree = GrootTreeClassifier(attack_model=attack_model, is_numerical=is_numerical, random_state=0)
    TypeError: __init__() got an unexpected keyword argument 'is_numerical'
    

    Leaving out the argument and having this line instead: tree = GrootTreeClassifier(attack_model=attack_model, random_state=0) results in this error:

    Traceback (most recent call last):
      File "/home/.../groot_test.py", line 15, in <module>
        adversarial_accuracy = DecisionTreeAdversary(tree, "groot").adversarial_accuracy(X_test, y_test)
      File "/home/.../venv/lib/python3.9/site-packages/groot/adversary.py", line 259, in __init__
        self.is_numeric = self.decision_tree.is_numerical
    AttributeError: 'GrootTreeClassifier' object has no attribute 'is_numerical'
    

    I'm guessing the code got an update, but the readme didn't. Or I made a stupid mistake, also very possible.

    opened by laudv 2
  • Reproducing result from paper

    Reproducing result from paper

    Hello! I am trying to reproduce the results from the paper. I am struggling to find, where these files: generate_k_fold_results.py, train_kfold_models.py, fit_chen_xgboost.py, image_experiments.py, summarize_datasets.py and visualize_threat_models.py are provided?

    Onni Mansikkamäki

    opened by OnniMansikkamaki 0
  • Regression decision trees and random forests

    Regression decision trees and random forests

    This PR adds GROOT decision trees and random forests that use the adversarial sum of absolute errors to make splits. It also adds new tests, speeds them up and updates the documentation.

    opened by daniel-vos 0
  • Add regression, tests and refactor into base class

    Add regression, tests and refactor into base class

    This PR adds a regression GROOT tree based on the adversarial sum of absolute errors, more tests and refactors GROOT trees into a base class (BaseGrootTree) with subclasses GrootTreeClassifier and GrootTreeRegressor extending it.

    opened by daniel-vos 0
Releases(v0.0.1)
Owner
Cyber Analytics Lab
@ Delft University of Technology
Cyber Analytics Lab
This repository contains the reference implementation for our proposed Convolutional CRFs.

ConvCRF This repository contains the reference implementation for our proposed Convolutional CRFs in PyTorch (Tensorflow planned). The two main entry-

Marvin Teichmann 553 Dec 07, 2022
Rethinking Transformer-based Set Prediction for Object Detection

Rethinking Transformer-based Set Prediction for Object Detection Here are the code for the ICCV paper. The code is adapted from Detectron2 and AdelaiD

Zhiqing Sun 62 Dec 03, 2022
The implement of papar "Enhanced Graph Learning for Collaborative Filtering via Mutual Information Maximization"

SIGIR2021-EGLN The implement of paper "Enhanced Graph Learning for Collaborative Filtering via Mutual Information Maximization" Neural graph based Col

15 Dec 27, 2022
Temporally Coherent GAN SIGGRAPH project.

TecoGAN This repository contains source code and materials for the TecoGAN project, i.e. code for a TEmporally COherent GAN for video super-resolution

Duc Linh Nguyen 2 Jan 18, 2022
Keras documentation, hosted live at keras.io

Keras.io documentation generator This repository hosts the code used to generate the keras.io website. Generating a local copy of the website pip inst

Keras 2k Jan 08, 2023
E2C implementation in PyTorch

Embed to Control implementation in PyTorch Paper can be found here: https://arxiv.org/abs/1506.07365 You will need a patched version of OpenAI Gym in

Yicheng Luo 42 Dec 12, 2022
It is an open dataset for object detection in remote sensing images.

RSOD-Dataset It is an open dataset for object detection in remote sensing images. The dataset includes aircraft, oiltank, playground and overpass. The

136 Dec 08, 2022
This is the official released code for our paper, The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos

The-Emergence-of-Objectness This is the official released code for our paper, The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos

44 Oct 08, 2022
Deep Probabilistic Programming Course @ DIKU

Deep Probabilistic Programming Course @ DIKU

52 May 14, 2022
Relaxed-machines - explorations in neuro-symbolic differentiable interpreters

Relaxed Machines Explorations in neuro-symbolic differentiable interpreters. Baby steps: inc_stop Libraries JAX Haiku Optax Resources Chapter 3 (∂4: A

Nada Amin 6 Feb 02, 2022
The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval".

Code for "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval" (ACL 2021, Long) This is the repository for baseline m

Akari Asai 25 Oct 30, 2022
Generative vs Discriminative: Rethinking The Meta-Continual Learning (NeurIPS 2021)

Generative vs Discriminative: Rethinking The Meta-Continual Learning (NeurIPS 2021) In this repository we provide PyTorch implementations for GeMCL; a

4 Apr 15, 2022
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

Awesome production machine learning This repository contains a curated list of awesome open source libraries that will help you deploy, monitor, versi

The Institute for Ethical Machine Learning 12.9k Jan 04, 2023
The final project of "Applying AI to 3D Medical Imaging Data" from "AI for Healthcare" nanodegree - Udacity.

Quantifying Hippocampus Volume for Alzheimer's Progression Background Alzheimer's disease (AD) is a progressive neurodegenerative disorder that result

Omar Laham 1 Jan 14, 2022
Hyperparameter tuning for humans

KerasTuner KerasTuner is an easy-to-use, scalable hyperparameter optimization framework that solves the pain points of hyperparameter search. Easily c

Keras 2.6k Dec 27, 2022
Official implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis with PyTorch.

Multi-speaker DGP This repository provides official implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis with PyTorch. O

sarulab-speech 24 Sep 07, 2022
Baseline powergrid model for NY

Baseline-powergrid-model-for-NY Table of Contents About The Project Built With Usage License Contact Acknowledgements About The Project As the urgency

Anderson Energy Lab at Cornell 6 Nov 24, 2022
Breast Cancer Classification Model is applied on a different dataset

Breast Cancer Classification Model is applied on a different dataset

1 Feb 04, 2022
TResNet: High Performance GPU-Dedicated Architecture

TResNet: High Performance GPU-Dedicated Architecture paperV2 | pretrained models Official PyTorch Implementation Tal Ridnik, Hussam Lawen, Asaf Noy, I

426 Dec 28, 2022
3D mesh stylization driven by a text input in PyTorch

Text2Mesh [Project Page] Text2Mesh is a method for text-driven stylization of a 3D mesh, as described in "Text2Mesh: Text-Driven Neural Stylization fo

Threedle (University of Chicago) 649 Dec 27, 2022