A project-based example of data pipelines, ML workflow management, API endpoints, and monitoring.

Last update: Dec 03, 2022

Overview

MLOps

Tools used:

Data Pipeline: Dagster
ML workflow: MLflow
API Deployment: FastAPI
Monitoring: ElasticAPM

Requirements

Poetry (dependency management)

$ curl -sSL https://install.python-poetry.org | python3 -
$ poetry --version
# Poetry version 1.1.10

pre-commit (static code analysis)

$ pip install pre-commit
$ pre-commit --version
# pre-commit 2.15.0

Minio (S3-compatible object storage)

Follow the installation instructions at https://min.io/download.
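
A quick pre-flight check can confirm that the command-line tools from the requirements above are on PATH before you run the setup steps. This is a minimal sketch that uses only the Python standard library. `missing_tools` and `REQUIRED_TOOLS` are hypothetical names and are not part of the project.

```python
import shutil
from typing import Iterable, List

# CLIs invoked by the commands in this README (hypothetical list).
# mlflow and dagit come from `poetry install`, so add them and run the
# check inside `poetry shell` if you want to verify those too.
REQUIRED_TOOLS = ["poetry", "pre-commit", "minio", "docker-compose"]


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```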

Setup

Environment setup

$ poetry install

MLflow

$ poetry shell
$ export MLFLOW_S3_ENDPOINT_URL=http://127.0.0.1:9000
$ export AWS_ACCESS_KEY_ID=minioadmin
$ export AWS_SECRET_ACCESS_KEY=minioadmin

# Make sure the backend store and artifact locations match the values in the .env file
$ mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root s3://mlflow \
    --host 0.0.0.0
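
The comment above asks for the same backend store and artifact root in the .env file. One way to check this is sketched below using only the standard library. The key names MLFLOW_BACKEND_STORE_URI and MLFLOW_ARTIFACT_ROOT are assumptions, because the project's .env is not shown here, so substitute whatever keys it actually uses. The parser handles only simple KEY=VALUE lines, not the full dotenv syntax.

```python
from pathlib import Path
from typing import Dict

# Values passed to `mlflow server` above, under hypothetical .env key names.
SERVER_SETTINGS = {
    "MLFLOW_BACKEND_STORE_URI": "sqlite:///mlflow.db",
    "MLFLOW_ARTIFACT_ROOT": "s3://mlflow",
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def mismatches(env: Dict[str, str], expected: Dict[str, str]) -> Dict[str, str]:
    """Return keys whose .env value is missing or differs from the server flags."""
    return {
        key: env.get(key, "<missing>")
        for key, value in expected.items()
        if env.get(key) != value
    }


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        print(mismatches(parse_env(env_file.read_text()), SERVER_SETTINGS) or "OK")
```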

Minio

$ export MINIO_ROOT_USER=minioadmin
$ export MINIO_ROOT_PASSWORD=minioadmin

$ mkdir minio_data
$ minio server minio_data --console-address ":9001"

# API: http://192.168.29.103:9000  http://10.119.80.13:9000  http://127.0.0.1:9000
# RootUser: minioadmin
# RootPass: minioadmin

# Console: http://192.168.29.103:9001 http://10.119.80.13:9001 http://127.0.0.1:9001
# RootUser: minioadmin
# RootPass: minioadmin

# Command-line: https://docs.min.io/docs/minio-client-quickstart-guide
#    $ mc alias set myminio http://192.168.29.103:9000 minioadmin minioadmin

# Documentation: https://docs.min.io

Go to http://127.0.0.1:9001/buckets/ and create a bucket called mlflow.
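
The bucket name matters because MLflow treats the artifact root s3://mlflow as the bucket mlflow. With MLFLOW_S3_ENDPOINT_URL set to http://127.0.0.1:9000, those S3 requests go to Minio instead of AWS. The sketch below only illustrates how such a URI splits into a bucket and a key. `split_s3_uri` is a hypothetical helper, and the nested artifact path used in the usage line is made up.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3:// URI into (bucket, key prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# The artifact root passed to `mlflow server` names the bucket to create.
print(split_s3_uri("s3://mlflow"))
# A hypothetical nested artifact path lands in the same bucket.
print(split_s3_uri("s3://mlflow/1/abc/artifacts/model.pkl"))
```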

Dagster

$ poetry shell
$ dagit -f mlops/pipeline.py

ElasticAPM

$ docker-compose -f docker-compose-monitoring.yaml up
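
Once Minio, the MLflow server, Dagit and the monitoring stack are running, a small port probe can show which services are actually reachable. This is a sketch under assumptions. Only Minio's ports 9000 and 9001 come from this README. Ports 5000 and 3000 are the usual defaults of `mlflow server` and `dagit`, and 8200 is APM Server's usual default, so check docker-compose-monitoring.yaml for the real mapping. The helper names are hypothetical.

```python
import socket
from typing import Dict

# Minio ports come from the startup output above; the others are assumed
# defaults and may differ in your setup.
SERVICES = {
    "minio-api": 9000,
    "minio-console": 9001,
    "mlflow": 5000,
    "dagit": 3000,
    "apm-server": 8200,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused and timeouts.
        return False


def service_status(services: Dict[str, int]) -> Dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in service_status(SERVICES).items():
        print(f"{name:15} {'up' if up else 'down'}")
```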

FastAPI

$ poetry shell
$ export PYTHONPATH=.
$ python mlops/app/application.py
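
Running a script puts the script's own directory (here mlops/app) at the front of sys.path, not the repository root. That is presumably why PYTHONPATH=. is exported, so that imports of the mlops package resolve. Once the app is up, FastAPI serves its OpenAPI schema at /openapi.json by default, which means the exposed endpoints can be listed without knowing them in advance. The sketch assumes uvicorn's default port 8000, which mlops/app/application.py may override. `list_routes` and `fetch_schema` are hypothetical helpers, and the routes in any example schema are made up.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen

# Assumption: the app listens on uvicorn's default port 8000.
OPENAPI_URL = "http://127.0.0.1:8000/openapi.json"
HTTP_METHODS = {"get", "post", "put", "delete", "patch", "options", "head", "trace"}


def list_routes(schema: Dict[str, Any]) -> List[str]:
    """Return sorted 'METHOD /path' strings from an OpenAPI schema."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for key in operations:
            # Path items may also hold non-method keys such as "parameters".
            if key in HTTP_METHODS:
                routes.append(f"{key.upper()} {path}")
    return sorted(routes)


def fetch_schema(url: str = OPENAPI_URL, timeout: float = 5.0) -> Dict[str, Any]:
    """Download and decode the OpenAPI schema from the running app."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        for route in list_routes(fetch_schema()):
            print(route)
    except OSError as exc:
        print(f"API not reachable at {OPENAPI_URL}: {exc}")
```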

TODO

Setup with docker-compose.
Load testing.
Test cases.
CI/CD pipeline.
Drift detection.


Owner

Utsav
