🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams

Overview

rrcf 🌲 🌲 🌲

Implementation of the Robust Random Cut Forest Algorithm for anomaly detection by Guha et al. (2016).

S. Guha, N. Mishra, G. Roy, & O. Schrijvers, Robust random cut forest based anomaly detection on streams, in Proceedings of the 33rd International conference on machine learning, New York, NY, 2016 (pp. 2712-2721).

About

The Robust Random Cut Forest (RRCF) algorithm is an ensemble method for detecting outliers in streaming data. RRCF offers a number of features that many competing anomaly detection algorithms lack. Specifically, RRCF:

  • Is designed to handle streaming data.
  • Performs well on high-dimensional data.
  • Reduces the influence of irrelevant dimensions.
  • Gracefully handles duplicates and near-duplicates that could otherwise mask the presence of outliers.
  • Features an anomaly-scoring algorithm with a clear underlying statistical meaning.

This repository provides an open-source implementation of the RRCF algorithm and its core data structures for the purposes of facilitating experimentation and enabling future extensions of the RRCF algorithm.

Documentation

Read the docs here 📖.

Installation

Use pip to install rrcf from PyPI:

$ pip install rrcf

Currently, only Python 3 is supported.

Dependencies

The following dependencies are required to install and use rrcf:

The following optional dependencies are required to run the examples shown in the documentation:

Listed version numbers have been tested and are known to work (this does not necessarily preclude older versions).

Robust random cut trees

A robust random cut tree (RRCT) is a binary search tree that can be used to detect outliers in a point set. An RRCT can be instantiated from a point set. Points can also be added to and removed from an RRCT.

Creating the tree

import numpy as np
import rrcf

# A (robust) random cut tree can be instantiated from a point set (n x d)
X = np.random.randn(100, 2)
tree = rrcf.RCTree(X)

# A random cut tree can also be instantiated with no points
tree = rrcf.RCTree()

Inserting points

tree = rrcf.RCTree()

for i in range(6):
    x = np.random.randn(2)
    tree.insert_point(x, index=i)
─+
 ├───+
 │   ├───+
 │   │   ├──(0)
 │   │   └───+
 │   │       ├──(5)
 │   │       └──(4)
 │   └───+
 │       ├──(2)
 │       └──(3)
 └──(1)

Deleting points

tree.forget_point(2)
─+
 ├───+
 │   ├───+
 │   │   ├──(0)
 │   │   └───+
 │   │       ├──(5)
 │   │       └──(4)
 │   └──(3)
 └──(1)

Anomaly score

The likelihood that a point is an outlier is measured by its collusive displacement (CoDisp): if including a new point significantly changes the model complexity (i.e. bit depth), then that point is more likely to be an outlier.

# Seed tree with zero-mean, normally distributed data
X = np.random.randn(100,2)
tree = rrcf.RCTree(X)

# Generate an inlier and outlier point
inlier = np.array([0, 0])
outlier = np.array([4, 4])

# Insert into tree
tree.insert_point(inlier, index='inlier')
tree.insert_point(outlier, index='outlier')
tree.codisp('inlier')
>>> 1.75
tree.codisp('outlier')
>>> 39.0
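
To make the score more concrete, here is a minimal pure-Python sketch that does not use rrcf. It assumes CoDisp for a leaf is the largest ratio, over the leaf's ancestors, of the sibling subtree's size to the size of the subtree containing the leaf. The nested-tuple tree and the helper names are hypothetical; the tree mirrors the six-point tree drawn in the insertion example.

```python
def leaf_count(node):
    """Number of leaves under a node; any non-tuple value is a leaf label."""
    if isinstance(node, tuple):
        return leaf_count(node[0]) + leaf_count(node[1])
    return 1


def contains(node, label):
    """True if the leaf `label` lies somewhere under `node`."""
    if isinstance(node, tuple):
        return contains(node[0], label) or contains(node[1], label)
    return node == label


def codisp(tree, label):
    """Largest (sibling size) / (own subtree size) ratio on the path to `label`."""
    if not contains(tree, label):
        raise KeyError(label)
    ratios = []
    node = tree
    while isinstance(node, tuple):
        left, right = node
        # Descend into the child that holds the leaf; the other child is its sibling
        child, sibling = (left, right) if contains(left, label) else (right, left)
        ratios.append(leaf_count(sibling) / leaf_count(child))
        node = child
    return max(ratios, default=0.0)


# Hypothetical encoding of the six-point tree shown earlier: (left, right) pairs
toy_tree = (((0, (5, 4)), (2, 3)), 1)

shallow_score = codisp(toy_tree, 1)  # leaf 1 hangs directly off the root
deep_score = codisp(toy_tree, 5)     # leaf 5 sits deep inside the tree
```

In this toy tree the shallow leaf 1 gets the largest score, because removing it would displace the five points on the other side of the root cut.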

Batch anomaly detection

This example shows how a robust random cut forest can be used to detect outliers in a batch setting. Outliers correspond to large CoDisp.

import numpy as np
import pandas as pd
import rrcf

# Set parameters
np.random.seed(0)
n = 2010
d = 3
num_trees = 100
tree_size = 256

# Generate data
X = np.zeros((n, d))
X[:1000,0] = 5
X[1000:2000,0] = -5
X += 0.01*np.random.randn(*X.shape)

# Construct forest
forest = []
while len(forest) < num_trees:
    # Select random subsets of points uniformly from point set
    ixs = np.random.choice(n, size=(n // tree_size, tree_size),
                           replace=False)
    # Add sampled trees to forest
    trees = [rrcf.RCTree(X[ix], index_labels=ix) for ix in ixs]
    forest.extend(trees)

# Compute average CoDisp
avg_codisp = pd.Series(0.0, index=np.arange(n))
index = np.zeros(n)
for tree in forest:
    codisp = pd.Series({leaf : tree.codisp(leaf) for leaf in tree.leaves})
    avg_codisp[codisp.index] += codisp
    np.add.at(index, codisp.index.values, 1)
avg_codisp /= index
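
Once average scores are available, one simple way to act on "large CoDisp" is to keep the highest-scoring fraction of points. The sketch below is only an illustration: the function name, the default fraction, and the example scores are assumptions, and a suitable cutoff depends on the data. It works on a plain dict mapping labels to scores (for the Series above, `avg_codisp.to_dict()` would give one).

```python
def flag_outliers(scores, fraction=0.005):
    """Return the labels of the highest-scoring `fraction` of points (at least one)."""
    k = max(1, round(len(scores) * fraction))
    # Rank labels from highest to lowest score
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


# Hypothetical scores for four points; label 2 stands out
example_scores = {0: 1.2, 1: 1.5, 2: 40.0, 3: 1.1}
flagged = flag_outliers(example_scores, fraction=0.25)
```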

Streaming anomaly detection

This example shows how the algorithm can be used to detect anomalies in streaming time series data.

import numpy as np
import rrcf

# Generate data
n = 730
A = 50
center = 100
phi = 30
T = 2*np.pi/100
t = np.arange(n)
sin = A*np.sin(T*t-phi*T) + center
sin[235:255] = 80

# Set tree parameters
num_trees = 40
shingle_size = 4
tree_size = 256

# Create a forest of empty trees
forest = []
for _ in range(num_trees):
    tree = rrcf.RCTree()
    forest.append(tree)
    
# Use the "shingle" generator to create rolling window
points = rrcf.shingle(sin, size=shingle_size)

# Create a dict to store anomaly score of each point
avg_codisp = {}

# For each shingle...
for index, point in enumerate(points):
    # For each tree in the forest...
    for tree in forest:
        # If tree is above permitted size, drop the oldest point (FIFO)
        if len(tree.leaves) > tree_size:
            tree.forget_point(index - tree_size)
        # Insert the new point into the tree
        tree.insert_point(point, index=index)
        # Compute codisp on the new point and take the average among all trees
        if index not in avg_codisp:
            avg_codisp[index] = 0
        avg_codisp[index] += tree.codisp(index) / num_trees
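
For readers unfamiliar with shingling, the following sketch shows the idea of a rolling window in plain Python. It is not the implementation of `rrcf.shingle`, whose return type and edge-case behaviour may differ; `rolling_windows` is a hypothetical helper used only for illustration.

```python
from collections import deque


def rolling_windows(sequence, size):
    """Yield each run of `size` consecutive values as a tuple."""
    window = deque(maxlen=size)
    for value in sequence:
        # Appending to a full deque silently drops the oldest value
        window.append(value)
        if len(window) == size:
            yield tuple(window)


windows = list(rolling_windows([1, 2, 3, 4, 5], size=3))
```

Each window becomes one multi-dimensional point, so a short run of unusual values in the series shows up as an unusual point in the tree.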

Contributing

We welcome contributions to the rrcf repo. To contribute, submit a pull request to the dev branch.

Types of contributions

Some suggested types of contributions include:

  • Bug fixes
  • Documentation improvements
  • Performance enhancements
  • Extensions to the algorithm

Check the issue tracker for any specific issues that need help. If you encounter a problem using rrcf, or have an idea for an extension, feel free to raise an issue.

Guidelines for contributors

Please consider the following guidelines when contributing to the codebase:

  • Ensure that any new methods, functions or classes include docstrings. Docstrings should include a description of the code, as well as descriptions of the inputs (arguments) and outputs (returns). Providing an example use case is recommended (see existing methods for examples).
  • Write unit tests for any new code and ensure that all tests are passing with no warnings. Please ensure that overall code coverage does not drop below 80%.

Running unit tests

To run unit tests, first ensure that pytest and pytest-cov are installed:

$ pip install pytest pytest-cov

To run the tests, navigate to the root directory of the repo and run:

$ pytest --cov=rrcf/

Citing

If you have used this codebase in a publication and wish to cite it, please use the Journal of Open Source Software article.

M. Bartos, A. Mullapudi, & S. Troutman, rrcf: Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams, in: Journal of Open Source Software, The Open Journal, Volume 4, Number 35. 2019

@article{bartos_2019_rrcf,
  title={{rrcf: Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams}},
  author={Matthew Bartos and Abhiram Mullapudi and Sara Troutman},
  journal={{The Journal of Open Source Software}},
  volume={4},
  number={35},
  pages={1336},
  year={2019}
}
Comments
  • 'branch' variable can be referenced before assignment

    I was essentially replicating the streaming example in this repo except with my own dataset and the code broke with the error shown in the screenshot.

    My full code is here.

    The bug is at line 460 of rrcf.py: local variable 'branch' referenced before assignment. What's going wrong is beyond me. Will someone please look into this? I work at LASP, a laboratory in Boulder, CO, and we're considering using this code in production, but we can't while this bug exists.

    opened by sapols 6
  • Pickling issues

    Hi, I have found a problem in pickling and unpickling the trees. After following the original streaming example, I tried pickling and unpickling a single tree, but the results are not the same.

    t = forest[0]
    with open("a.pkl", "wb") as f:
        pickle.dump(t.to_dict(), f)
    
    t2 = rrcf.RCTree()
    with open("a.pkl", "rb") as f:
        t2.load_dict(pickle.load(f))
    
    len(t.leaves)
    # 257
    
    len(t2.leaves)
    # 238
    

    I thought the issue is while calling pickle.dump as the tree dict is nested, but the documentation says it'll raise RecursionError if such an object is encountered. So I think the issue could be with the to_dict or load_dict functions. I used both pickle and dill to test this.

    bug 
    opened by TrigonaMinima 5
  • Added random state to the constructor of RCTree

    Added an optional parameter to the constructor of RCTree, called "random_state". It can be an int, an np.random.RandomState instance, or None (default), as in sklearn modules.

    This allows for optional generation of the same tree, if the same seed (int or RandomState) is provided, which is incredibly useful (for writing tests etc.)

    opened by nikosgavalas 4
  • subSampleSize hyperparameter missing in the implementation

    While constructing the trees, all the data points are currently taken at once for each tree, and the various trees are constructed using the same set of data points. As I understand the algorithm, a number (numtrees) of trees of size subSampleSize are constructed, each from a random sample of subSampleSize points drawn from the full dataset. Then, for each point in the dataset, we calculate the codisp score based on how it changes the shape of each tree and average the displacement across trees. Is the scenario described above planned as a future addition, or is my understanding of the algorithm wrong? Please correct me if I am wrong.

    opened by ptiagi 4
  • Unable to copy/save model using pickle

    I'm using the model in a streaming anomaly detection scenario where I want to generate the trees up to a certain point in time, then repeatedly advance the models from that starting point on various predicted time-series.

    The method I came up with was to "train" the model, then save or copy it, and run the copied version on the new time series.

    However, it looks like the trees can't be pickled which is causing copy and save issues:

    X = np.random.randn(100, 2)
    tree = rrcf.RCTree(X)
    copy.deepcopy(tree)
    
    TypeError: can't pickle module objects
    

    This would seem to indicate that somewhere in the RCTree class, the instances are referencing a module rather than an instance of a module. Is there any way to address this? Either by using instances rather than modules or perhaps just an alternative way to copy/save the tree classes?

    opened by colinkyle 3
  • Duplicates seem to break batch instantiation

    Traceback:

        157         else:
        158             # Create a leaf node from isolated point
    --> 159             i = np.asscalar(np.flatnonzero(S2))
        160             leaf = Leaf(i=i, d=depth, u=branch, x=X[i, :], n=N[i])
        161             # Link leaf node to parent
    
    /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/numpy/lib/type_check.py in asscalar(a)
        487 
        488     """
    --> 489     return a.item()
        490 
        491 #-----------------------------------------------------------------------------
    
    ValueError: can only convert an array of size 1 to a Python scalar
    
    opened by mdbartos 2
  • include license in `setup.py`

    Hi,

    Thanks for maintaining this nice project. I see that the project is under the MIT License, which is great. Can you also add this to setup.py so that it is correctly reflected on PyPI? Many companies use the metadata available on PyPI to determine a project's license, and for rrcf it currently shows as unknown. We just need to add one line to setup.py. Could you please consider updating it? I can send a PR if you would prefer that.

    opened by whiletruelearn 1
  • Don't store leaves dict

    Storing leaves in a dict can be troublesome (need to update it every time tree structure is changed). Could probably achieve similar performance using a function.

    opened by mdbartos 1
  • Issue when predicting identical points using a batch trained model

    Hi, I have a training dataset with many identical datapoints. I use batch-mode to train the model. Thereafter, when I insert a point that is identical to a subset of points in the training dataset, the point will displace all its existing copies. This results in a high (co)displacement-score for this point, even though the point is very common.

    Update: setting the tolerance to 0 when inserting a point did the trick.

    opened by alexstrid 0
  • clarify reproducibility using numpy.random.seed

    I made this change in README.md to clarify how to maintain reproducibility (might be useful for paper publication, hyper parameter optimization, and debugging).

    Control tree random seed

    Even with the same data, a tree (or a forest) generated from rrcf.RCTree() depends on np.random and might change on every run (resulting in a different tree shape and anomaly score). To maintain reproducibility, use numpy.random.seed():

    # Before making a tree or forest
    seed_number = 42 # your_number
    np.random.seed(seed_number)
    tree = rrcf.RCTree(X)
    
    opened by yasirroni 0
  • Question related to the paper

    Hi,

    my question is more "theoretical" rather than directly related to the implementation of the algorithm, but I hope someone will be able to help me in understanding this point.

    In the paper, it is stated that the insertion/removal of a point is possible because of how the splits are performed; in particular, the authors say:

    "For example, if we choose the dimensions uniformly at random as in (Liu et al., 2012), suppose we build a tree for (1,0),(ε,ε),(0,1) where 1 ≫ ε > 0 and then delete (1,0). The probability of getting a tree over the two remaining points that uses a vertical separator is 3/4 − ε/2 and not 1/2 as desired".

    Could anyone help me to understand this statement? It's not clear to me how these probabilities are obtained.

    Thanks

    opened by adavoli91 0
  • Is it always encouraged to scale the data?

    I use RRCF to detect anomalies in a streaming 6-dimensional dataset where prices for 6 related products come in all the time. Since their distribution is not stationary but has a trend (due to inflation, product prices are going up), would it be a good idea to scale the data to [0, 1] before applying the algorithm?

    opened by mtomic123 0
  • ValueError: can only convert an array of size 1 to a Python scalar

    Hi, I have found a problem when building the tree with some datasets, but I don't know why it happens. When I run the initial code as follows,

    # Construct forest
    forest = []
    while len(forest) < num_trees:
        # Select random subsets of points uniformly
        ixs = np.random.choice(n, size=sample_size_range, replace=False)
        # Add sampled trees to forest
        trees = [rrcf.RCTree(X[ix], index_labels=ix) for ix in ixs]
        forest.extend(trees)

    the error occurs, and it is displayed like this:

    ValueError                                Traceback (most recent call last)
    in
        116                            replace=False)
        117     # Add sampled trees to forest
    --> 118     trees = [rrcf.RCTree(X[ix], index_labels=ix)
        119              for ix in ixs]
        120     forest.extend(trees)

    in (.0)
        116                            replace=False)
        117     # Add sampled trees to forest
    --> 118     trees = [rrcf.RCTree(X[ix], index_labels=ix)
        119              for ix in ixs]
        120     forest.extend(trees)

    ~/anaconda3/lib/python3.8/site-packages/rrcf/rrcf.py in __init__(self, X, index_labels, precision, random_state)
        104             # Create RRC Tree
        105             S = np.ones(n, dtype=np.bool)
    --> 106             self._mktree(X, S, N, I, parent=self)
        107             # Remove parent of root
        108             self.root.u = None

    ~/anaconda3/lib/python3.8/site-packages/rrcf/rrcf.py in _mktree(self, X, S, N, I, parent, side, depth)
        196         if S2.sum() > 1:
        197             # Recursively construct tree on S2
    --> 198             self._mktree(X, S2, N, I, parent=branch, side='r', depth=depth)
        199         # Otherwise...
        200         else:

    ~/anaconda3/lib/python3.8/site-packages/rrcf/rrcf.py in _mktree(self, X, S, N, I, parent, side, depth)
        174         if S1.sum() > 1:
        175             # Recursively construct tree on S1
    --> 176             self._mktree(X, S1, N, I, parent=branch, side='l', depth=depth)
        177         # Otherwise...
        178         else:

    ~/anaconda3/lib/python3.8/site-packages/rrcf/rrcf.py in _mktree(self, X, S, N, I, parent, side, depth)
        174         if S1.sum() > 1:
        175             # Recursively construct tree on S1
    --> 176             self._mktree(X, S1, N, I, parent=branch, side='l', depth=depth)
        177         # Otherwise...
        178         else:

    ~/anaconda3/lib/python3.8/site-packages/rrcf/rrcf.py in _mktree(self, X, S, N, I, parent, side, depth)
        174         if S1.sum() > 1:
        175             # Recursively construct tree on S1
    --> 176             self._mktree(X, S1, N, I, parent=branch, side='l', depth=depth)
        177         # Otherwise...
        178         else:

    ~/anaconda3/lib/python3.8/site-packages/rrcf/rrcf.py in _mktree(self, X, S, N, I, parent, side, depth)
        200         else:
        201             # Create a leaf node from isolated point
    --> 202             i = np.asscalar(np.flatnonzero(S2))
        203             leaf = Leaf(i=i, d=depth, u=branch, x=X[i, :], n=N[i])
        204             # Link leaf node to parent

    <array_function internals> in asscalar(*args, **kwargs)

    ~/anaconda3/lib/python3.8/site-packages/numpy/lib/type_check.py in asscalar(a)
        579     24
        580     """
    --> 581     return a.item()
        582
        583 #-----------------------------------------------------------------------------

    ValueError: can only convert an array of size 1 to a Python scalar

    However, when I run X[ix] and build the tree again, it runs.

    opened by futtery 0
  • QUESTION: Simulating sampling of points in streaming detection

    Hi! I've tested both your implementation of 'streaming detection' and 'batch detection'. So far, I'm getting the best results with the 'batch detection'. However, I want to use the streaming approach to dynamically update the model according to a continuous stream of data.

    My current understanding is that 'batch detection' performs better because of the random sampling of points. With 'streaming detection', all trees contain the same points. Therefore, I tested an approach where some points are randomly deleted from trees after calculating the codisp. That way, the trees will contain different points, which in a way simulates random sampling of points. My current results tell me that this works well.

    Does this sound like a valid alternative to the standard 'streaming detection', or are there some traps I'm missing here?

    opened by stianvale 2
  • QUESTION: Feature importance

    Hi, and thanks for building this great repo!

    I have a general question; what's the proper way to compute feature importance for RRCF? Basically, I want to know what features contribute the most to the collusive displacement value.

    opened by stianvale 3
  • Wrong use of assert statements

    I noticed that some assert statements are caught incorrectly: a failing assert raises an AssertionError, not a ValueError or a KeyError.

    https://github.com/kLabUM/rrcf/blob/34504c14bba233f86a7dcae35d55fc84cc5b7508/rrcf/rrcf.py#L429-L438

    Also consider removing all assert statements, because they are ignored if __debug__ is not True. This is the case when you run in production (See Docs).

    The lines could be rewritten as:

            if point.size != self.ndim:
                raise ValueError(
                    "Point must be same dimension as existing points in tree.")
            # Check for existing index in leaves dict
            if index in self.leaves:
                raise KeyError("Index already exists in leaves dict.")
    
    opened by sebtrack 0
  • RCTree cannot handle when the data consists of only one unique value

    I ran into issues when a subset of my sample data points only contain ONE unique value. How should we handle such an exception?

    The error message basically suggests a NaN value for probability (caused by division by zero). I tried to turn this into a uniform distribution, but that caused a subsequent issue: after a cut, the right side contains no values. I think this violates the principle of the RRCF algorithm. Is there a better way of resolving such cases?

    File "<ipython-input-2-b3a957a401e5>", line 139, in <listcomp>
        rrcf.RCTree(x[ix], index_labels=ix) for ix in ixs]
      File "C:\ProgramData\Anaconda3\lib\site-packages\rrcf-0.4.3-py3.8.egg\rrcf\rrcf.py", line 106, in __init__
        self._mktree(X, S, N, I, parent=self)
      File "C:\ProgramData\Anaconda3\lib\site-packages\rrcf-0.4.3-py3.8.egg\rrcf\rrcf.py", line 177, in _mktree
        S1, S2, branch = self._cut(X, S, parent=parent, side=side)
      File "C:\ProgramData\Anaconda3\lib\site-packages\rrcf-0.4.3-py3.8.egg\rrcf\rrcf.py", line 159, in _cut
        q = self.rng.choice(self.ndim, p=l)
      File "mtrand.pyx", line 928, in numpy.random.mtrand.RandomState.choice
    ValueError: probabilities contain NaN
    
    opened by kongwilson 3
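
The traceback in the last report above ends at a draw of the cut dimension with probabilities `p=l`. As a rough sketch of how a NaN can arise there, assume the weights are the side lengths of the points' bounding box divided by their sum (an assumption about the internals, not something confirmed here). If every point is identical, all side lengths are zero and the normalisation divides zero by zero. The helper names below are hypothetical and are not part of rrcf.

```python
import random


def cut_dimension_weights(points):
    """Per-dimension weights proportional to bounding-box side lengths, or None."""
    dims = range(len(points[0]))
    side_lengths = [
        max(p[d] for p in points) - min(p[d] for p in points) for d in dims
    ]
    total = sum(side_lengths)
    if total == 0:
        # All points coincide, so there is nothing to normalise by
        return None
    return [length / total for length in side_lengths]


def choose_cut_dimension(points, rng=random):
    """Draw a dimension index with probability proportional to its side length."""
    weights = cut_dimension_weights(points)
    if weights is None:
        raise ValueError("cannot cut a set of identical points")
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


degenerate = cut_dimension_weights([(1.0, 2.0), (1.0, 2.0)])
regular = cut_dimension_weights([(0.0, 0.0), (1.0, 3.0)])
```

Checking for the degenerate case before building a tree at least turns the NaN into an explicit error; how coincident points should then be handled is a separate design question.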
Releases(0.4.3)
Owner
Real-time water systems lab