Automated Machine Learning Pipeline for tabular data. Designed for predictive maintenance applications, failure identification, failure prediction, condition monitoring, etc.

Overview

Amplo - AutoML (for Machine Data)


Welcome to the Automated Machine Learning package Amplo. Amplo's AutoML is designed specifically for machine data and works very well with tabular time series data (especially unbalanced classification!).

Though this is a standalone Python package, Amplo's AutoML is also available on Amplo's Smart Maintenance Platform. With a graphical user interface and various data connectors, it is the ideal place for service engineers to get started on Predictive Maintenance.

Amplo's AutoML Pipeline covers the entire Machine Learning development cycle, including exploratory data analysis, data cleaning, feature extraction, feature selection, model selection, hyperparameter optimization, stacking, version control, production-ready models and documentation. It comes with additional tools such as interval analysers, drift detectors and data quality checks.

1. Downloading Amplo

The easiest way is to install our Python package through PyPI:

pip install Amplo

2. Usage

Amplo's AutoML Pipeline is used through fit and predict calls. The example below covers classification and regression:

from Amplo import Pipeline
from sklearn.datasets import make_classification, make_regression

# Classification: fit the pipeline, then predict class probabilities
x, y = make_classification()
pipeline = Pipeline()
pipeline.fit(x, y)
yp = pipeline.predict_proba(x)

# Regression: same interface, returning point predictions
x, y = make_regression()
pipeline = Pipeline()
pipeline.fit(x, y)
yp = pipeline.predict(x)

3. Amplo AutoML Features

Interval Analyser

from Amplo.AutoML import IntervalAnalyser

Interval Analyser for log file classification. When log files have to be classified and there is not enough data for time series methods (such as LSTMs, ROCKET, WEASEL or BOSS), one needs to fall back on classical machine learning models, which cope better with small sample sizes. This raises the question of which samples within a log to classify. Classifying every sample and accumulating the results can severely degrade classification performance. The interval analyser addresses this: using an approximate K-Nearest Neighbors algorithm, it estimates the strength of correlation for every sample inside a log, which allows better interval selection for classical machine learning models.
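The idea of scoring every sample by its neighbourhood can be sketched as follows. This is only one plausible reading of "strength of correlation", not Amplo's implementation: it uses an exact brute-force nearest-neighbour search instead of an approximate one, and the function name `knn_label_agreement` and the toy data are assumptions made for illustration.

```python
import math


def knn_label_agreement(samples, labels, k=3):
    """For each sample, return the fraction of its k nearest neighbours
    (Euclidean distance) that carry the same label."""
    scores = []
    for i, (x, label) in enumerate(zip(samples, labels)):
        # Exact brute-force search over all other samples (assumption:
        # a real implementation would use an approximate index instead).
        neighbours = sorted(
            (math.dist(x, other), labels[j])
            for j, other in enumerate(samples)
            if j != i
        )[:k]
        same = sum(1 for _, other_label in neighbours if other_label == label)
        scores.append(same / len(neighbours))
    return scores


# Toy data: the last sample is labelled "faulty" but looks like the healthy ones.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
           (0.05, 0.05)]
labels = ["healthy"] * 3 + ["faulty"] * 4
scores = knn_label_agreement(samples, labels, k=2)
print(scores)
```

Samples with a high score sit in a region dominated by their own class and are good candidates for training intervals, whereas low-scoring samples resemble other classes and are better left out.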

To use this interval analyser, place each log in a folder named after its class, and put all class folders under one parent folder, e.g.:

+-- Parent Folder
|   +-- Class_1
|       +-- Log_1.*
|       +-- Log_2.*
|   +-- Class_2
|       +-- Log_3.*

Exploratory Data Analysis

from Amplo.AutoML import DataExplorer

Automated Exploratory Data Analysis. Covers binary classification and regression. It generates:

  • Missing Values Plot
  • Line Plots of all features
  • Box plots of all features
  • Co-linearity Plot
  • SHAP Values
  • Random Forest Feature Importance
  • Predictive Power Score

Additional plots for Regression:

  • Seasonality Plots
  • Differentiated Variance Plot
  • Auto Correlation Function Plot
  • Partial Auto Correlation Function Plot
  • Cross Correlation Function Plot
  • Scatter Plots
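As a reference for what an Auto Correlation Function plot displays, the sample autocorrelation at a given lag can be computed as below. This is a plain-Python sketch of the textbook estimator, not the code the DataExplorer uses; the function name and the toy signal are assumptions.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# An alternating signal is negatively correlated at odd lags
# and positively correlated at even lags.
signal = [1.0, -1.0] * 4
acf = [autocorrelation(signal, lag) for lag in range(4)]
print(acf)
```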

Data Processing

from Amplo.AutoML import DataProcesser

Automated Data Cleaning:

  • Infers & converts data types (integer, floats, categorical, datetime)
  • Reformats column names
  • Removes duplicate columns and rows
  • Handles missing values by:
    • Removing columns
    • Removing rows
    • Interpolating
    • Filling with zeros
  • Removes outliers using:
    • Clipping
    • Z-score
    • Quantiles
  • Removes constant columns
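Two of the cleaning steps listed above, interpolating missing values and removing outliers by Z-score, can be sketched on a single column as follows. The helpers `interpolate_missing` and `remove_zscore_outliers` are hypothetical names for illustration and do not reflect the DataProcesser API.

```python
from statistics import mean, pstdev


def interpolate_missing(values):
    """Fill None gaps linearly between the nearest known neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("cannot interpolate a column with no known values")
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # leading gap: copy the first known value
            filled[i] = values[right]
        elif right is None:     # trailing gap: copy the last known value
            filled[i] = values[left]
        else:
            span = values[right] - values[left]
            filled[i] = values[left] + span * (i - left) / (right - left)
    return filled


def remove_zscore_outliers(values, threshold=3.0):
    """Drop values whose absolute Z-score exceeds `threshold`."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:              # constant column: nothing to remove
        return list(values)
    return [v for v in values if abs(v - mu) / sigma <= threshold]


cleaned = interpolate_missing([1.0, None, 3.0, 4.0, None, None, 7.0])
kept = remove_zscore_outliers([10.0, 11.0, 9.0, 10.0, 11.0, 9.0, 10.0, 100.0],
                              threshold=2.0)
print(cleaned)
print(kept)
```

Note that with only eight readings a single outlier cannot reach a Z-score of 3, which is why the example uses a threshold of 2.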

Data Sampler

from Amplo.AutoML import DataSampler

This pipeline is designed to handle unbalanced classification problems. Besides weighted loss functions, undersampling the majority class or oversampling the minority class helps. Various algorithms are analysed:

  • SMOTE
  • Borderline SMOTE
  • Random Over Sampler
  • Tomek Links
  • One Sided Selection
  • Random Under Sampler
  • Edited Nearest Neighbours
  • SMOTE Tomek
  • SMOTE Edited Nearest Neighbours

Feature Processing

from Amplo.AutoML import FeatureProcesser

Automatically extracts and selects features, and removes collinear features. Included feature extraction algorithms:

  • Multiplicative Features
  • Dividing Features
  • Additive Features
  • Subtractive Features
  • Trigonometric Features
  • K-Means Features
  • Lagged Features
  • Differencing Features
  • Inverse Features
  • Datetime Features
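To make two of these concrete, lagged and differencing features for a single signal can be derived as below. The helper names and the `temperature__lag_1` style column names are assumptions for illustration and need not match what the FeatureProcesser produces.

```python
def lagged(series, lag):
    """Shift a series by `lag` steps; the first `lag` entries have no
    history yet and are set to None."""
    return [None] * lag + list(series[: len(series) - lag])


def differenced(series):
    """First-order difference; the first entry has no predecessor."""
    return [None] + [curr - prev for prev, curr in zip(series, series[1:])]


temperature = [20.0, 21.0, 23.0, 26.0]
features = {
    "temperature": temperature,
    "temperature__lag_1": lagged(temperature, 1),   # hypothetical column name
    "temperature__diff": differenced(temperature),  # hypothetical column name
}
print(features)
```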

Included Feature Selection algorithms:

  • Random Forest Feature Importance (Threshold and Increment)
  • Predictive Power Score

Sequencing

from Amplo.AutoML import Sequencer

For time series regression problems, it is often useful to include multiple previous samples instead of just the latest. This class sequences the data, based on which time steps you want included in the in- and output. This is also very useful when working with tensors, as a tensor can be returned which directly fits into a Recurrent Neural Network.
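The sequencing step can be pictured as a sliding window over the series. The sketch below uses plain lists and a hypothetical `make_sequences` helper with `back` and `forward` parameters; the actual Sequencer interface may differ.

```python
def make_sequences(series, back, forward):
    """Pair each window of `back` past samples with the `forward`
    samples that immediately follow it."""
    inputs, outputs = [], []
    for start in range(len(series) - back - forward + 1):
        split = start + back
        inputs.append(series[start:split])
        outputs.append(series[split:split + forward])
    return inputs, outputs


series = [1, 2, 3, 4, 5, 6]
x_seq, y_seq = make_sequences(series, back=3, forward=1)
print(x_seq)
print(y_seq)
```

Stacking the input windows gives an array of shape (samples, time steps), which becomes (samples, time steps, features) for multivariate data, the layout recurrent networks typically expect.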

Modelling

from Amplo.AutoML import Modeller

Runs various regression or classification models. Includes:

  • Scikit's Linear Model
  • Scikit's Random Forest
  • Scikit's Bagging
  • Scikit's GradientBoosting
  • Scikit's HistGradientBoosting
  • DMLC's XGBoost
  • Catboost's Catboost
  • Microsoft's LightGBM
  • Stacking Models

Grid Search

from Amplo.GridSearch import *

Contains three hyperparameter optimizers with extended predefined model parameters:

  • Grid Search
  • Halving Random Search
  • Optuna's Tree-structured Parzen Estimator
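For orientation, an exhaustive grid search simply evaluates every combination of candidate values and keeps the best one. The sketch below is generic and does not use Amplo.GridSearch: `grid_search`, `toy_score` and the parameter names are assumptions, and the toy objective stands in for a cross-validated model score.

```python
from itertools import product


def grid_search(objective, param_grid):
    """Evaluate `objective` on every parameter combination and
    return the best (params, score) pair; higher scores are better."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(depth, learning_rate):
    """Toy objective that peaks at depth=4, learning_rate=0.1."""
    return -((depth - 4) ** 2) - (learning_rate - 0.1) ** 2


best_params, best_score = grid_search(
    toy_score,
    {"depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]},
)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes, which is the motivation for the halving and Parzen-estimator based alternatives in the list above.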

Automatic Documentation

from Amplo.AutoML import Documenter

Contains a documenter for classification (binary and multiclass) as well as regression problems. It creates a PDF report for a Pipeline, including metrics, data processing steps, and everything else needed to recreate the result.

Releases: v0.10.2
Owner: Amplo, a Zurich-based SaaS startup providing a Smart Maintenance Platform