Extended Isolation Forest for Anomaly Detection

Last update: Dec 18, 2022

Overview

Extended Isolation Forest
Installation
- Requirements
Use
Citation
Releases

Extended Isolation Forest

This is a simple Python implementation of the Extended Isolation Forest method described in this paper (https://doi.org/10.1109/TKDE.2019.2947676). It is an improvement on the original Isolation Forest algorithm (Liu2008) for detecting anomalies and outliers in multidimensional data point distributions. An R wrapper around the core Python implementation is also available.

Summary

The problem of anomaly detection has a wide range of applications in various fields and scientific applications. Anomalous data can have as much scientific value as normal data, or in some cases even more, and it is of vital importance to have robust, fast and reliable algorithms to detect and flag such anomalies. Here, we present an extension to the model-free anomaly detection algorithm, Isolation Forest (Liu2008). This extension, named Extended Isolation Forest (EIF), improves the consistency and reliability of the anomaly score produced by standard methods for a given data point. We show that the standard Isolation Forest produces inconsistent anomaly score maps, and that these score maps suffer from an artifact produced by the way the criterion for the branching operation of the binary tree is selected.

Our method allows the data to be sliced using hyperplanes with random slopes, which results in improved score maps. The consistency and reliability of the algorithm are much improved by this extension. Here we show the need for an improvement on the source algorithm to improve the scoring of anomalies and the robustness of the score maps, especially around the edges of nominal data. We discuss the sources of the problem, and we present an efficient way of choosing these hyperplanes, which gives rise to multiple extension levels in the case of higher dimensional data. The standard Isolation Forest is therefore a special case of the Extended Isolation Forest as presented here. For an N dimensional dataset, Extended Isolation Forest has N levels of extension, with 0 being identical to the case of standard Isolation Forest, and N-1 being the fully extended version.

Motivation

Figure 1: Example training data. a) Normally distributed cluster. b) Two normally distributed clusters. c) Sinusoidal data points with Gaussian noise.

While various techniques exist for approaching anomaly detection, Isolation Forest (Liu2008) is one with unique capabilities. This algorithm can readily work on high dimensional data, it is model free, and it scales well. It is therefore highly desirable and easy to use. However, looking at score maps for some basic examples, we can see that the anomaly scores produced by the standard Isolation Forest are inconsistent. To see this, we look at the three examples shown in Figure 1.

In each case, we use the data to train our Isolation Forest. We then use the trained models to score a square grid of uniformly distributed data points, which results in score maps shown in Figure 2. Through the simplicity of the example data, we have an intuition about what the score maps should look like. For example, for the data shown in Figure 1a, we expect to see low anomaly scores in the center of the map, while the anomaly score should increase as we move radially away from the center. Similarly for the other figures.

Looking at the score maps produced by the standard Isolation Forest shown in Figure 2, we can clearly see the inconsistencies in the scores. While there is a region of low anomaly score in the center in Figure 2a, there are also regions aligned with the x and y axes passing through the origin that have lower anomaly scores than the four corners of the region. Based on our intuitive understanding of the data, this cannot be correct. A similar phenomenon is observed in Figure 2b. In this case, the problem is amplified. Since there are two clusters, the artificially low anomaly score regions intersect close to points (0,0) and (10,10), and create low anomaly score regions where there is no data. It is immediately obvious how this can be problematic. As for the third example, Figure 2c shows that the structure of the data is completely lost. The sinusoidal shape is essentially treated as one rectangular blob.

Figure 2: Score maps using the standard Isolation Forest for the points from Figure 1. We can see the bands and artifacts on these maps.

Isolation Forest

Given a dataset of dimension N, the algorithm chooses a random sub-sample of data to construct a binary tree. The branching process of the tree occurs by selecting a random dimension x_i with i in {1,2,...,N} of the data (a single variable). It then selects a random value v within the minimum and maximum values in that dimension. If a given data point possesses a value smaller than v for dimension x_i, then that point is sent to the left branch, otherwise it is sent to the right branch. In this manner the data on the current node of the tree is split in two. This process of branching is performed recursively over the dataset until a single point is isolated, or a predetermined depth limit is reached. The process begins again with a new random sub-sample to build another randomized tree. After building a large ensemble of trees, i.e. a forest, the training is complete.
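The branching procedure above can be sketched in a few lines of plain Python. This is a minimal illustration and not the code shipped in this package: the names `build_itree`, `build_forest` and `count_points`, the dictionary-based node layout, and the default parameters are all assumptions made for the example. The depth limit of ceil(log2(sample_size)) follows the convention of the original Isolation Forest paper.

```python
import math
import random


def build_itree(points, depth, max_depth, rng):
    """Recursively build one axis-parallel isolation tree (illustrative only)."""
    # Stop when at most one point remains or the depth limit is reached.
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))  # random dimension x_i
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        # Simplification for this sketch: if the chosen dimension cannot
        # separate the points, treat the node as a leaf.
        return {"size": len(points)}
    v = rng.uniform(lo, hi)  # random split value between min and max
    left = [p for p in points if p[dim] < v]    # smaller than v: left branch
    right = [p for p in points if p[dim] >= v]  # otherwise: right branch
    return {
        "dim": dim,
        "value": v,
        "left": build_itree(left, depth + 1, max_depth, rng),
        "right": build_itree(right, depth + 1, max_depth, rng),
    }


def build_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build a forest, drawing a new random sub-sample for every tree."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = max(1, math.ceil(math.log2(sample_size)))
    return [
        build_itree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]


def count_points(node):
    """Total number of training points stored in the leaves below a node."""
    if "dim" not in node:
        return node["size"]
    return count_points(node["left"]) + count_points(node["right"])


# Example: a single Gaussian blob in 2D, similar in spirit to Figure 1a.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
forest = build_forest(data, n_trees=10, sample_size=64)
```

Every split in this sketch is of the form "coordinate `dim` is smaller than `v`", so every branch cut is parallel to one of the coordinate axes.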

During the scoring step, a new candidate data point (or one chosen from the data used to create the trees) is run through all the trees, and an ensemble anomaly score is assigned based on the depth the point reaches in each tree. Figure 3 shows a schematic example of a tree and a forest plotted radially.

Figure 3: a) Shows an example tree formed from the example data, while b) shows the forest generated, where each tree is represented by a radial line from the center to the outer circle. Anomalous points (shown in red) are isolated very quickly, which means they reach shallower depths than nominal points (shown in blue).
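As a rough sketch of how depth turns into a score, the snippet below uses the normalization from the original Isolation Forest paper, s = 2^(-E[h(x)] / c(n)), where h(x) is the depth reached by point x and c(n) is the average path length of an unsuccessful binary search tree lookup over n points. The function names, the dictionary-based tree layout, and the tiny hand-built tree are assumptions made for illustration; they are not this package's API.

```python
import math

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def path_length(x, node, depth=0):
    """Depth at which x lands, plus a correction for unsplit leaf points."""
    if "dim" not in node:  # leaf node
        return depth + c_factor(node["size"])
    branch = "left" if x[node["dim"]] < node["value"] else "right"
    return path_length(x, node[branch], depth + 1)


def anomaly_score(x, forest, sample_size):
    """Ensemble score in (0, 1]; shallower average depth gives a higher score."""
    mean_depth = sum(path_length(x, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_depth / c_factor(sample_size))


# A tiny hand-built tree (hypothetical) over a sub-sample of 8 points:
# one point is cut off at the root, the remaining 7 share two deeper leaves.
tree = {
    "dim": 0, "value": -3.0,
    "left": {"size": 1},
    "right": {"dim": 1, "value": 0.0,
              "left": {"size": 3},
              "right": {"size": 4}},
}
forest = [tree]

score_far = anomaly_score((-5.0, 0.0), forest, sample_size=8)   # isolated early
score_near = anomaly_score((1.0, 1.0), forest, sample_size=8)   # lands deeper
```

With a real forest the same `anomaly_score` call would average over many trees; a single tree is used here only to keep the arithmetic easy to follow.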

It turns out the splitting process described above is the main source of the bias observed in the score maps. Figure 4 shows the process described above for each one of the examples considered thus far. The branch cuts are always parallel to the axes, and as a result, over the construction of many trees, regions of the domain that contain no data points receive superfluous branch cuts.

Figure 4: Splitting of data in the domain during the process of construction of one tree.

Extension

The Extended Isolation Forest remedies this problem by allowing the branching process to occur in every direction. The process of choosing branch cuts is altered so that at each node, instead of choosing a random feature along with a random value, we choose a random normal vector along with a random intercept point.
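A minimal sketch of such a branch cut is given below, using only the standard library. The helper names (`random_hyperplane`, `goes_left`, `split`) are hypothetical, and two details are assumptions based on our reading of the paper rather than a copy of this package's code: the intercept is drawn uniformly from the bounding box of the points at the node, and for extension level `ext` in N dimensions, N - ext - 1 randomly chosen components of the normal vector are set to zero.

```python
import random


def random_hyperplane(points, extension_level, rng):
    """Draw a random normal vector and intercept point for one branch cut."""
    dim = len(points[0])
    if not 0 <= extension_level <= dim - 1:
        raise ValueError("extension_level must be between 0 and dim - 1")
    # Gaussian components give a direction that is uniform on the sphere.
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Zero out dim - extension_level - 1 components; at level 0 only one
    # component survives, which reproduces an axis-parallel cut.
    for i in rng.sample(range(dim), dim - extension_level - 1):
        normal[i] = 0.0
    # Intercept drawn uniformly inside the bounding box of the node's points.
    intercept = [
        rng.uniform(min(p[i] for p in points), max(p[i] for p in points))
        for i in range(dim)
    ]
    return normal, intercept


def goes_left(x, normal, intercept):
    """Branching test: (x - p) . n <= 0 sends the point to the left branch."""
    return sum((xi - pi) * ni for xi, pi, ni in zip(x, intercept, normal)) <= 0.0


def split(points, extension_level, rng):
    """Split the points at a node with one randomly oriented hyperplane."""
    normal, intercept = random_hyperplane(points, extension_level, rng)
    left = [p for p in points if goes_left(p, normal, intercept)]
    right = [p for p in points if not goes_left(p, normal, intercept)]
    return left, right, normal, intercept


rng = random.Random(0)
points = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(50)]
left, right, normal_full, _ = split(points, extension_level=2, rng=rng)
_, _, normal_axis, _ = split(points, extension_level=0, rng=rng)
```

Replacing the "coordinate smaller than value" test of the standard algorithm with this dot-product test is the only change needed in the tree construction; the recursion and the depth-based scoring stay the same.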

Figure 5 shows the resulting branch cuts in the domain for each of our examples.

Figure 5: Same as Figure 4 but using Extended Isolation Forest

We can see that the region is divided much more uniformly, and without the bias-introducing effects of the coordinate system. As in the case of the standard Isolation Forest, the anomaly score is computed from the aggregated depth that a given point reaches on each iTree.

As we see in Figure 6, these modifications completely fix the issue with the score maps that we saw before and produce reliable results. Clearly, these score maps are a much better representation of anomaly score distributions.

Figure 6: Score maps using the Extended Isolation Forest.

Figure 7 shows a very simple example of anomalies and nominal points from the single-blob example shown in Figure 1a. It also shows the distribution of the anomaly scores, which can be used to make hard cuts on the definition of anomalies or even to assign probabilities to each point.

Figure 7: a) Shows the dataset used, some sample anomalous data points discovered using the algorithm are highlighted in black. We also highlight some nominal points in red. In b), we have the distribution of anomaly scores obtained by the algorithm.
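One simple way to make such a hard cut is to flag a fixed fraction of the highest-scoring points. The sketch below assumes the scores have already been computed; the function name `flag_anomalies`, the `contamination` parameter, and the score values are all made up for illustration.

```python
def flag_anomalies(scores, contamination=0.05):
    """Return the indices of the highest-scoring points.

    `contamination` is the assumed fraction of anomalies in the data;
    it is a choice made by the user, not something the algorithm provides.
    """
    if not 0.0 < contamination < 1.0:
        raise ValueError("contamination must be strictly between 0 and 1")
    n_flag = max(1, round(contamination * len(scores)))
    # Rank point indices from the highest to the lowest anomaly score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flag])


# Hypothetical scores for ten points; two of them stand out.
scores = [0.42, 0.45, 0.81, 0.40, 0.47, 0.44, 0.77, 0.43, 0.46, 0.41]
flagged = flag_anomalies(scores, contamination=0.2)
```

A fixed score threshold read off the histogram in Figure 7b would work equally well; the fraction-based cut is used here only because it does not depend on the absolute scale of the scores.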

The Code

Here we provide the source code for the algorithm as well as documented example notebooks to help get started. Various visualizations are provided such as score distributions, score maps, aggregate slicing of the domain, and tree and whole forest visualizations. Most examples are in 2D. We present one 3D example. However, the algorithm works readily with higher dimensional data.

Installation

pip install eif

or directly from the repository

pip install git+https://github.com/sahandha/eif.git

Alternatively, you can install the eif R package from here, which provides an R wrapper around the core Python implementation.

Requirements

numpy
cython

No other packages are required for the core algorithm. The package also contains means to draw the created trees, which use the igraph library. See the example for tree visualizations.

Use

See these notebooks for examples of how to use it.

Citation

If you use this code and method, please consider using the following reference:

The paper can be found at https://doi.org/10.1109/TKDE.2019.2947676

@ARTICLE{8888179,
author={S. {Hariri} and M. {Carrasco Kind} and R. J. {Brunner}},
journal={IEEE Transactions on Knowledge and Data Engineering},
title={Extended Isolation Forest},
year={2019},
volume={},
number={},
pages={1-1},
keywords={Forestry;Vegetation;Distributed databases;Anomaly detection;Standards;Clustering algorithms;Heating systems;Anomaly Detection;Isolation Forest},
doi={10.1109/TKDE.2019.2947676},
ISSN={},
month={},}

Releases

v2.0.2

2019-NOV-14

Converted the code to C++ using Cython.
Much faster and more efficient forest generation and scoring procedures.
Previous implementation renamed; use import eif_old to use the old version.

v1.0.2

2018-OCT-01

Release
Added documentation, examples and software paper

v1.0.1

2018-AUG-08

Bugfix for multidimensional data

v1.0.0

2018-JUL-15

Initial Release

Comments

Error while installing eif

Hi!

Trying to install eif through pip I get the following error:


(base) C:\WINDOWS\system32>pip install eif
Collecting eif
  Using cached https://files.pythonhosted.org/packages/83/b2/d87d869deeb192ab599c899b91a9ad1d3775d04f5b7adcaf7ff6daa54c24/eif-2.0.2.tar.gz
Requirement already satisfied: numpy in c:\users\o.korshun\appdata\local\continuum\anaconda3\lib\site-packages (from eif) (1.16.5)
Requirement already satisfied: cython in c:\users\o.korshun\appdata\local\continuum\anaconda3\lib\site-packages (from eif) (0.29.13)
Building wheels for collected packages: eif
  Building wheel for eif (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: 'C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\O0ADF~1.KOR\\AppData\\Local\\Temp\\pip-install-1adywqes\\eif\\setup.py'"'"'; __file__='"'"'C:\\Users\\O0ADF~1.KOR\\AppData\\Local\\Temp\\pip-install-1adywqes\\eif\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\O0ADF~1.KOR\AppData\Local\Temp\pip-wheel-kw_2kpwv' --python-tag cp37
       cwd: C:\Users\O0ADF~1.KOR\AppData\Local\Temp\pip-install-1adywqes\eif\
  Complete output (60 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.7
  copying eif_old.py -> build\lib.win-amd64-3.7
  copying version.py -> build\lib.win-amd64-3.7
  running egg_info
  writing eif.egg-info\PKG-INFO
  writing dependency_links to eif.egg-info\dependency_links.txt
  writing requirements to eif.egg-info\requires.txt
  writing top-level names to eif.egg-info\top_level.txt
  reading manifest file 'eif.egg-info\SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  writing manifest file 'eif.egg-info\SOURCES.txt'
  running build_ext
  cythoning _eif.pyx to _eif.cpp
  building 'eif' extension
  creating build\temp.win-amd64-3.7
  creating build\temp.win-amd64-3.7\Release
  C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\Library\mingw-w64\bin\gcc.exe -mdll -O -Wall -DMS_WIN64 -IC:\Users\o.korshun\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\include -IC:\Users\o.korshun\AppData\Local\Continuum\anaconda3\include -IC:\Users\o.korshun\AppData\Local\Continuum\anaconda3\include -c _eif.cpp -o build\temp.win-amd64-3.7\Release\_eif.o -Wcpp
  In file included from C:/Users/o.korshun/AppData/Local/Continuum/anaconda3/Library/mingw-w64/include/c++/5.3.0/random:35:0,
                   from eif.hxx:5,
                   from _eif.cpp:614:
  C:/Users/o.korshun/AppData/Local/Continuum/anaconda3/Library/mingw-w64/include/c++/5.3.0/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
   #error This file requires compiler and library support for the \
    ^
  In file included from C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\include/numpy/ndarraytypes.h:1822:0,
                   from C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\include/numpy/ndarrayobject.h:12,
                   from C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\include/numpy/arrayobject.h:4,
                   from _eif.cpp:612:
  C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\include/numpy/npy_1_7_deprecated_api.h:15:77: note: #pragma message: C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\include/numpy/npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
                            "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION")
                                                                               ^
  In file included from _eif.cpp:614:0:
  eif.hxx:11:28: error: 'std::mt19937_64' has not been declared
   #define RANDOM_ENGINE std::mt19937_64
                              ^
  eif.hxx:65:55: note: in expansion of macro 'RANDOM_ENGINE'
           void build_tree (double*, int, int, int, int, RANDOM_ENGINE&, int);
                                                         ^
  eif.hxx:11:28: error: 'std::mt19937_64' has not been declared
   #define RANDOM_ENGINE std::mt19937_64
                              ^
  eif.hxx:66:44: note: in expansion of macro 'RANDOM_ENGINE'
           Node* add_node (double*, int, int, RANDOM_ENGINE&);
                                              ^
  eif.hxx:11:28: error: 'std::mt19937_64' has not been declared
   #define RANDOM_ENGINE std::mt19937_64
                              ^
  eif.hxx:132:63: note: in expansion of macro 'RANDOM_ENGINE'
   inline std::vector<int> sample_without_replacement (int, int, RANDOM_ENGINE&);
                                                                 ^
  _eif.cpp: In function 'PyTypeObject* __Pyx_ImportType(PyObject*, const char*, const char*, size_t, __Pyx_ImportType_CheckSize)':
  _eif.cpp:8085:53: warning: unknown conversion type character 'z' in format [-Wformat=]
               module_name, class_name, size, basicsize);
                                                       ^
  _eif.cpp:8085:53: warning: unknown conversion type character 'z' in format [-Wformat=]
  _eif.cpp:8085:53: warning: too many arguments for format [-Wformat-extra-args]
  error: command 'C:\\Users\\o.korshun\\AppData\\Local\\Continuum\\anaconda3\\Library\\mingw-w64\\bin\\gcc.exe' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for eif
  Running setup.py clean for eif
Failed to build eif
Installing collected packages: eif
    Running setup.py install for eif ... error
    ERROR: Command errored out with exit status 1:
     command: 'C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\O0ADF~1.KOR\\AppData\\Local\\Temp\\pip-install-1adywqes\\eif\\setup.py'"'"'; __file__='"'"'C:\\Users\\O0ADF~1.KOR\\AppData\\Local\\Temp\\pip-install-1adywqes\\eif\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\O0ADF~1.KOR\AppData\Local\Temp\pip-record-yqa9lmac\install-record.txt' --single-version-externally-managed --compile
         cwd: C:\Users\O0ADF~1.KOR\AppData\Local\Temp\pip-install-1adywqes\eif\
    Complete output (60 lines):
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.7
    copying eif_old.py -> build\lib.win-amd64-3.7
    copying version.py -> build\lib.win-amd64-3.7
    running egg_info
    writing eif.egg-info\PKG-INFO
    writing dependency_links to eif.egg-info\dependency_links.txt
    writing requirements to eif.egg-info\requires.txt
    writing top-level names to eif.egg-info\top_level.txt
    reading manifest file 'eif.egg-info\SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file 'eif.egg-info\SOURCES.txt'
    running build_ext
    skipping '_eif.cpp' Cython extension (up-to-date)
    building 'eif' extension
    creating build\temp.win-amd64-3.7
    creating build\temp.win-amd64-3.7\Release
    C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\Library\mingw-w64\bin\gcc.exe -mdll -O -Wall -DMS_WIN64 -IC:\Users\o.korshun\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\include -IC:\Users\o.korshun\AppData\Local\Continuum\anaconda3\include -IC:\Users\o.korshun\AppData\Local\Continuum\anaconda3\include -c _eif.cpp -o build\temp.win-amd64-3.7\Release\_eif.o -Wcpp
    In file included from C:/Users/o.korshun/AppData/Local/Continuum/anaconda3/Library/mingw-w64/include/c++/5.3.0/random:35:0,
                     from eif.hxx:5,
                     from _eif.cpp:614:
    C:/Users/o.korshun/AppData/Local/Continuum/anaconda3/Library/mingw-w64/include/c++/5.3.0/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
     #error This file requires compiler and library support for the \
      ^
    In file included from C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\include/numpy/ndarraytypes.h:1822:0,
                     from C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\include/numpy/ndarrayobject.h:12,
                     from C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\include/numpy/arrayobject.h:4,
                     from _eif.cpp:612:
    C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\include/numpy/npy_1_7_deprecated_api.h:15:77: note: #pragma message: C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\include/numpy/npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
                              "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION")
                                                                                 ^
    In file included from _eif.cpp:614:0:
    eif.hxx:11:28: error: 'std::mt19937_64' has not been declared
     #define RANDOM_ENGINE std::mt19937_64
                                ^
    eif.hxx:65:55: note: in expansion of macro 'RANDOM_ENGINE'
             void build_tree (double*, int, int, int, int, RANDOM_ENGINE&, int);
                                                           ^
    eif.hxx:11:28: error: 'std::mt19937_64' has not been declared
     #define RANDOM_ENGINE std::mt19937_64
                                ^
    eif.hxx:66:44: note: in expansion of macro 'RANDOM_ENGINE'
             Node* add_node (double*, int, int, RANDOM_ENGINE&);
                                                ^
    eif.hxx:11:28: error: 'std::mt19937_64' has not been declared
     #define RANDOM_ENGINE std::mt19937_64
                                ^
    eif.hxx:132:63: note: in expansion of macro 'RANDOM_ENGINE'
     inline std::vector<int> sample_without_replacement (int, int, RANDOM_ENGINE&);
                                                                   ^
    _eif.cpp: In function 'PyTypeObject* __Pyx_ImportType(PyObject*, const char*, const char*, size_t, __Pyx_ImportType_CheckSize)':
    _eif.cpp:8085:53: warning: unknown conversion type character 'z' in format [-Wformat=]
                 module_name, class_name, size, basicsize);
                                                         ^
    _eif.cpp:8085:53: warning: unknown conversion type character 'z' in format [-Wformat=]
    _eif.cpp:8085:53: warning: too many arguments for format [-Wformat-extra-args]
    error: command 'C:\\Users\\o.korshun\\AppData\\Local\\Continuum\\anaconda3\\Library\\mingw-w64\\bin\\gcc.exe' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'C:\Users\o.korshun\AppData\Local\Continuum\anaconda3\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\O0ADF~1.KOR\\AppData\\Local\\Temp\\pip-install-1adywqes\\eif\\setup.py'"'"'; __file__='"'"'C:\\Users\\O0ADF~1.KOR\\AppData\\Local\\Temp\\pip-install-1adywqes\\eif\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\O0ADF~1.KOR\AppData\Local\Temp\pip-record-yqa9lmac\install-record.txt' --single-version-externally-managed --compile Check the logs for full command output.

opened by PoradaKev 25

Can the extension concept be applied to Gradient Boosted Machines?

Hi there,

This might be a dumb question.

I was curious whether the "extension" concept that you introduce can be applied to supervised algorithms such as Gradient Boosted Trees. There are several widely known implementations, like XGBoost or LightGBM. All of these GBT implementations also suffer from "box"-like decision boundaries. I believe it would be great to see GBT create decision boundaries the way your Extended Isolation Forest does.

What do you guys think?

Feel free to close this issue since it's not a real issue, just a discussion.

opened by alfian777 5

Installation problem

Hello, I'm trying to install this package, but I get error messages and the installation fails. Can you help?

Windows 10

(base) C:\Users\quirosgu>pip install eif Collecting eif Using cached eif-2.0.2.tar.gz (1.6 MB) Requirement already satisfied: numpy in c:\users\quirosgu\anaconda3\lib\site-packages (from eif) (1.18.5) Requirement already satisfied: cython in c:\users\quirosgu\anaconda3\lib\site-packages (from eif) (0.29.21) Building wheels for collected packages: eif Building wheel for eif (setup.py) ... error ERROR: Command errored out with exit status 1: command: 'C:\Users\quirosgu\Anaconda3\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\quirosgu\AppData\Local\Temp\pip-install-wz5r6gph\eif\setup.py'"'"'; file='"'"'C:\Users\quirosgu\AppData\Local\Temp\pip-install-wz5r6gph\eif\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\quirosgu\AppData\Local\Temp\pip-wheel-6t9epked' cwd: C:\Users\quirosgu\AppData\Local\Temp\pip-install-wz5r6gph\eif
Complete output (19 lines): running bdist_wheel running build running build_py creating build creating build\lib.win32-3.8 copying eif_old.py -> build\lib.win32-3.8 copying version.py -> build\lib.win32-3.8 running egg_info writing eif.egg-info\PKG-INFO writing dependency_links to eif.egg-info\dependency_links.txt writing requirements to eif.egg-info\requires.txt writing top-level names to eif.egg-info\top_level.txt reading manifest file 'eif.egg-info\SOURCES.txt' reading manifest template 'MANIFEST.in' writing manifest file 'eif.egg-info\SOURCES.txt' running build_ext cythoning _eif.pyx to _eif.cpp building 'eif' extension error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

ERROR: Failed building wheel for eif Running setup.py clean for eif Failed to build eif Installing collected packages: eif Running setup.py install for eif ... error ERROR: Command errored out with exit status 1: command: 'C:\Users\quirosgu\Anaconda3\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\quirosgu\AppData\Local\Temp\pip-install-wz5r6gph\eif\setup.py'"'"'; file='"'"'C:\Users\quirosgu\AppData\Local\Temp\pip-install-wz5r6gph\eif\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\quirosgu\AppData\Local\Temp\pip-record-fjpa9g_k\install-record.txt' --single-version-externally-managed --compile --install-headers 'C:\Users\quirosgu\Anaconda3\Include\eif' cwd: C:\Users\quirosgu\AppData\Local\Temp\pip-install-wz5r6gph\eif
Complete output (19 lines): running install running build running build_py creating build creating build\lib.win32-3.8 copying eif_old.py -> build\lib.win32-3.8 copying version.py -> build\lib.win32-3.8 running egg_info writing eif.egg-info\PKG-INFO writing dependency_links to eif.egg-info\dependency_links.txt writing requirements to eif.egg-info\requires.txt writing top-level names to eif.egg-info\top_level.txt reading manifest file 'eif.egg-info\SOURCES.txt' reading manifest template 'MANIFEST.in' writing manifest file 'eif.egg-info\SOURCES.txt' running build_ext skipping '_eif.cpp' Cython extension (up-to-date) building 'eif' extension error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/ ---------------------------------------- ERROR: Command errored out with exit status 1: 'C:\Users\quirosgu\Anaconda3\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\quirosgu\AppData\Local\Temp\pip-install-wz5r6gph\eif\setup.py'"'"'; file='"'"'C:\Users\quirosgu\AppData\Local\Temp\pip-install-wz5r6gph\eif\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\quirosgu\AppData\Local\Temp\pip-record-fjpa9g_k\install-record.txt' --single-version-externally-managed --compile --install-headers 'C:\Users\quirosgu\Anaconda3\Include\eif' Check the logs for full command output.

In this case, I have already installed all the required dependencies (Microsoft Visual C++, etc.), but the problem persists.

I tried to reproduce it on another Windows machine and it does not work there either. In contrast, on a Linux-based system it does work.

opened by luigiquiros 3

PR for Parallelization and Reduce Memory

Hello,

For high dimensional datasets, I'm finding that multi-processing parallelization can speed things up a bit. I also find that storing the original data in each Node and each iTree consumes a lot of memory needlessly. Would you be open to reviewing a Pull Request (or Pull Requests) addressing both of these items? If so, would you accept them bundled together as one PR, or would you like them separated?

Thanks

opened by pford221 3
Use in novelty detection/one-class classification

From what I understand, your api doesn't distinguish between constructing the trees and querying to obtain scores (like the fit/predict methods of scikit-learn), is that correct?

So it's not currently possible to use this implementation for novelty detection/one-class classification, where the training set is different from the test set?

opened by oulenz 2
Scoring takes too long

My training and validation data are of similar size (about 1,500,000 rows and 11 features). Model building took very little time, even with full extension. But when scoring the validation data using compute_paths, the function has been running for close to 15 hours and scoring is still not done. Is there some way to speed up the scoring process?

opened by thedarklord310780 2
Add Arxiv paper to readme

Thanks for providing this code. Please add mention of and a link to your associated Arxiv paper into the repo's readme. The link is https://arxiv.org/abs/1811.02141

opened by impredicative 2
setting ExtensionLevel

If I understand the paper correctly, we obtain the full EIF approach by setting ExtensionLevel equal to the number of dimensions of the data minus 1, correct?

opened by oulenz 1
Small fix install progress

One of the extra compile arguments in setup.py seemed to prevent successful installation on multiple systems. Simply removing this argument seems to resolve this with no negative implications. The argument seems to try to force the compiler to run in C++11 mode. I am unsure whether it was even present on the tested systems.

opened by Dainean 1
Update eif.py

Goal: more convenient usage. Inspired by the tutorial document, I added two functions to iForest, outlier_pred and outlier_index, which return the outlier prediction index and label matrix.

opened by MaiRajborirug 0
How to save the eif Model?

I am trying to save the model using pickle.dump(), but this is not working. How do I save the eif model? Please provide me with a solution, as I am stuck on this problem. Thank you.

opened by SanthanaMano 0
module 'eif' has no attribute '__version__'

I installed eif with "pip install eif" and it reported "Successfully installed eif-2.0.2", but when I use eif.iForest an AttributeError is raised: module 'eif' has no attribute '__version__'

opened by wererLinC 0
I can't install eif 2.0.2, please tell me the reason

(base) C:\Users\22393\eif-2.0.2\eif-2.0.2>python setup.py install running install running bdist_egg running egg_info writing eif.egg-info\PKG-INFO writing dependency_links to eif.egg-info\dependency_links.txt writing requirements to eif.egg-info\requires.txt writing top-level names to eif.egg-info\top_level.txt reading manifest file 'eif.egg-info\SOURCES.txt' reading manifest template 'MANIFEST.in' writing manifest file 'eif.egg-info\SOURCES.txt' installing library code to build\bdist.win-amd64\egg running install_lib running build_py running build_ext skipping '_eif.cpp' Cython extension (up-to-date) building 'eif' extension C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IE:\ProgramFiles\anaconda\lib\site-packages\numpy\core\include -IE:\ProgramFiles\anaconda\include -IE:\ProgramFiles\anaconda\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" /EHsc /Tp_eif.cpp /Fobuild\temp.win-amd64-3.8\Release_eif.obj -Wcpp cl : Command line error D8021 : invalid numeric argument '/Wcpp' error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\cl.exe' failed with exit status 2

opened by whmwhm123 0
Unable to install eif 2.0.2

Dear Team, I am getting the error below while trying to install eif 2.0.2. Methods tried:

pip install eif

Downloaded eif tar file from pypi.org and tried installing

Downloaded the code from github and tried installing

In one of the issue it is mentioned to edit setup.py file(Remove the extra_compile line) and executed

All of the above methods failed. Below is the error:

ERROR: Complete output from command 'C:\Anaconda3\python.exe' -u -c 'import setuptools, tokenize;file='"'"'C:\Users\XSVIJA~1\AppData\Local\Temp\pip-req-build-rqacf45o\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\XSVIJA~1\AppData\Local\Temp\pip-wheel-wjzxwp64' --python-tag cp37:
ERROR: running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
copying eif_old.py -> build\lib.win-amd64-3.7
copying version.py -> build\lib.win-amd64-3.7
running egg_info
writing eif.egg-info\PKG-INFO
writing dependency_links to eif.egg-info\dependency_links.txt
writing requirements to eif.egg-info\requires.txt
writing top-level names to eif.egg-info\top_level.txt
reading manifest file 'eif.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'eif.egg-info\SOURCES.txt'
running build_ext
cythoning _eif.pyx to _eif.cpp
building 'eif' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Anaconda3\lib\site-packages\numpy\core\include -IC:\Anaconda3\include -IC:\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /EHsc /Tp_eif.cpp /Fobuild\temp.win-amd64-3.7\Release_eif.obj -Wcpp
cl : Command line error D8021 : invalid numeric argument '/Wcpp'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2

ERROR: Failed building wheel for eif
Running setup.py clean for eif
Failed to build eif
Installing collected packages: eif
Running setup.py install for eif ... error
ERROR: Complete output from command 'C:\Anaconda3\python.exe' -u -c 'import setuptools, tokenize;file='"'"'C:\Users\XSVIJA~1\AppData\Local\Temp\pip-req-build-rqacf45o\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\XSVIJA~1\AppData\Local\Temp\pip-record-f8wv7_fl\install-record.txt' --single-version-externally-managed --compile:
ERROR: running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
copying eif_old.py -> build\lib.win-amd64-3.7
copying version.py -> build\lib.win-amd64-3.7
running egg_info
writing eif.egg-info\PKG-INFO
writing dependency_links to eif.egg-info\dependency_links.txt
writing requirements to eif.egg-info\requires.txt
writing top-level names to eif.egg-info\top_level.txt
reading manifest file 'eif.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'eif.egg-info\SOURCES.txt'
running build_ext
skipping '_eif.cpp' Cython extension (up-to-date)
building 'eif' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Anaconda3\lib\site-packages\numpy\core\include -IC:\Anaconda3\include -IC:\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /EHsc /Tp_eif.cpp /Fobuild\temp.win-amd64-3.7\Release_eif.obj -Wcpp
cl : Command line error D8021 : invalid numeric argument '/Wcpp'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2
----------------------------------------
ERROR: Command "'C:\Anaconda3\python.exe' -u -c 'import setuptools, tokenize;file='"'"'C:\Users\XSVIJA~1\AppData\Local\Temp\pip-req-build-rqacf45o\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\XSVIJA~1\AppData\Local\Temp\pip-record-f8wv7_fl\install-record.txt' --single-version-externally-managed --compile" failed with error code 1 in C:\Users\XSVIJA~1\AppData\Local\Temp\pip-req-build-rqacf45o\

Please help.
opened by botlavijaykumar 1
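
Both Windows logs above end with cl.exe rejecting -Wcpp (error D8021), which is a GCC/Clang-style flag. A minimal sketch of choosing the flag by platform is shown here. It assumes the flag comes from an extra_compile_args entry in setup.py, as the issue above suggests. The helper name extra_compile_args and the platform check are illustrative and not taken from the package.

```python
import sys


def extra_compile_args(platform=sys.platform):
    """Return compiler flags for the extension build (hypothetical helper).

    -Wcpp is a GCC/Clang-style warning flag. MSVC's cl.exe reads it as
    /Wcpp and stops with error D8021, as in the logs above, so the flag
    is left out on Windows.
    """
    if platform.startswith("win"):
        return []
    return ["-Wcpp"]


# In setup.py the result could be passed to the extension definition, e.g.
# Extension("eif", sources=[...], extra_compile_args=extra_compile_args())
print(extra_compile_args("win32"))  # []
print(extra_compile_args("linux"))  # ['-Wcpp']
```

Checking the platform is only an approximation of checking the compiler: a MinGW toolchain on Windows would accept the flag. After editing setup.py, the install would need to be run from the edited local source, since "pip install eif" downloads a fresh copy.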
Effect of feature scaling
Hi, thanks for the great package (and example notebooks!). My issue is summarised in two points:

It appears that feature scale influences the orientation of the hyperplane splits in the trees, resulting in a poor anomaly score map.

Is this expected behaviour? If so, can anyone explain how it comes about? From the paper, it seems that the orientation of all hyperplanes is random.

The following illustrates this further:

I have noticed that the extended forest shows odd results when applied to features with very different scales. For example, if I draw 2D points from two normal distributions with variances 1 and 1000 and plot contour maps comparing the regular iForest with the extended one, the contours become horizontal and the heat map is generally worse than that of the regular iForest.

It seems as though the choice of hyperplane is biased towards horizontal lines. This is also noticeable in the examples given in the paper (figure 9), where three plots of tree splits are shown. In the first two examples (a and b), the x and y values of the data lie on the same scale and the splits look randomly oriented. In c), however, the x scale of the data is much larger than the y scale, and most splits look more vertical. As a result, we see areas of higher anomaly score above and below the point cloud in the resulting heat map.

This issue is easily fixed by scaling all features before using the forest. However, I was wondering: if the splits are made on hyperplanes of random orientation, why and how does feature scale influence the orientation of the splits in each tree?

Apologies if I am missing something obvious; any insight would be useful. Thanks!
opened by felixcaz 0
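
One possible way to see the effect described in this issue is sketched below. It is not the package's code. It assumes each split tests (x - p) . n <= 0 with the normal n drawn from an isotropic standard normal, which is how the paper describes the random slopes. Under that assumption, n . (x - p) = (n * s) . ((x - p) / s) for per-feature scales s, so relative to the spread of the data the split behaves as if its normal were n * s, and the large-scale feature dominates. The helper names effective_normals and fraction_aligned are hypothetical, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)


def effective_normals(scales, n_draws, rng):
    """Draw n ~ N(0, I) and return the unit normals n * s seen relative to the data spread."""
    normals = rng.normal(size=(n_draws, len(scales)))
    stretched = normals * np.asarray(scales)
    return stretched / np.linalg.norm(stretched, axis=1, keepdims=True)


def fraction_aligned(normals, axis, threshold=0.9):
    """Fraction of unit normals whose component along `axis` exceeds `threshold` in magnitude."""
    return float(np.mean(np.abs(normals[:, axis]) > threshold))


# Synthetic 2D data with variances 1 and 1000, as in the example above.
stds = np.array([1.0, np.sqrt(1000.0)])
X = rng.normal(size=(500, 2)) * stds

# Standardise each feature to zero mean and unit variance before fitting a forest.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

raw_frac = fraction_aligned(effective_normals(X.std(axis=0), 10_000, rng), axis=1)
scaled_frac = fraction_aligned(effective_normals(X_scaled.std(axis=0), 10_000, rng), axis=1)

# With unscaled data nearly all splits cut across the large-scale feature only;
# after standardisation the orientations are spread evenly again.
print(f"unscaled: {raw_frac:.2f}, standardised: {scaled_frac:.2f}")
```

In this sketch the unscaled fraction comes out well above 0.9 and the standardised fraction well below 0.4, which is consistent with the near-axis-parallel contours reported above and with the observation that scaling the features removes them.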

Releases(v2.0.2)

v2.0.2(Nov 14, 2019)

Cxx Implementation, much faster, same results
v1.0.2(Oct 1, 2018)

v1.0.1(Aug 8, 2018)

v1.0.0(Jul 15, 2018)

Extended Isolation Forest

Initial Release

Owner

Sahand Hariri


