A simple guide to MLOps through ZenML and its various integrations.

Last update: Dec 27, 2022

Overview

ZenBytes

Join our Slack Community and become part of the ZenML family.

Give the main ZenML repo a GitHub star to show your love.

ZenBytes is a series of practical lessons about MLOps through ZenML and its various integrations. It is intended both for people who want to learn about MLOps in general and for practitioners who want to learn more about ZenML specifically.

🙏 About ZenML

ZenML is an extensible, open-source MLOps framework for creating production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and provides interfaces and abstractions tailored to ML workflows. The ZenML repository and docs have more details.

ZenML is a good tool for learning MLOps for two reasons:

🔹 ZenML focuses on being un-opinionated about underlying tooling and infrastructure across the MLOps stack.
🔹 ZenML presents itself as a pipeline tool, making all development in ZenML data-centric rather than model-centric.

🧱 Structure of Lessons

The lessons are structured in Chapters. Each chapter is a notebook that walks through and explains various concepts:

Chapter 0: Basics
Chapter 1: Building a ML(Ops) pipeline
Chapter 2: Transitioning across stacks
Coming soon: More chapters

💻 System Requirements

To run these lessons, you need some packages installed on your machine. Currently, the lessons only run on UNIX systems. Some parts of the codebase need only Python and pip install -r requirements.txt, but we recommend installing all of the following:

package	MacOS installation	Linux installation
docker	Docker Desktop for Mac	Docker Engine for Linux
kubectl	kubectl for mac	kubectl for linux
k3d	Brew Installation of k3d	k3d installation linux

You might also need to install Anaconda to get the MLflow deployment to work.
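
As a quick sanity check before starting, the following minimal sketch reports which of the tools from the table above are not yet on your PATH. The helper name find_missing_tools is hypothetical and not part of ZenBytes or ZenML; it only checks that an executable can be found, not that the tool is configured correctly.

```python
import shutil

# Command-line tools listed in the system requirements table.
REQUIRED_TOOLS = ("docker", "kubectl", "k3d")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```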

🐍 Python Requirements

Once you've got the system requirements figured out, let's jump into the Python packages you need. Within the Python environment of your choice, run:

git clone https://github.com/zenml-io/zenbytes
cd zenbytes
pip install -r requirements.txt

If you are running the run.py script, you will also need to install some integrations using zenml:

zenml integration install sklearn -f
zenml integration install dash -f
zenml integration install evidently -f
zenml integration install mlflow -f
zenml integration install kubeflow -f
zenml integration install seldon -f
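
If you prefer to script this step, the six commands above can be generated from a single list. This is only a sketch: it assumes the zenml CLI is already installed and on your PATH, and the helper names build_install_commands and install_all are hypothetical, not part of ZenML.

```python
import subprocess

# Integrations named in the commands above.
INTEGRATIONS = ["sklearn", "dash", "evidently", "mlflow", "kubeflow", "seldon"]


def build_install_commands(integrations=INTEGRATIONS):
    """Build one `zenml integration install <name> -f` command per integration."""
    return [["zenml", "integration", "install", name, "-f"] for name in integrations]


def install_all(integrations=INTEGRATIONS, runner=subprocess.run):
    """Run each install command in order, stopping at the first failure."""
    for command in build_install_commands(integrations):
        runner(command, check=True)


if __name__ == "__main__":
    install_all()
```

Passing check=True makes the script stop as soon as one installation fails, so a broken integration is not silently skipped.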

📓 Diving into the code

We're ready to go now. You can work through the notebooks step by step:

jupyter notebook

🏁 Cleaning up when you're done

Once you are done running all notebooks, you might want to stop all running processes. To do so, run the following commands, which tear down your k3d cluster and the local Docker registry:

zenml stack set aws_kubeflow_stack
zenml stack down -f
zenml stack set local_kubeflow_stack
zenml stack down -f

❓ FAQ

MacOS: When starting the container registry for Kubeflow, I get an error about port 5000 not being available:

OSError: [Errno 48] Address already in use

Solution: For Kubeflow to run, the Docker container registry currently needs to be on port 5000. MacOS, however, uses port 5000 for the AirPlay Receiver. The guide "Freeing up port 5000" explains how to fix this.
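
To see whether port 5000 is already taken before starting the registry, you can try to bind to it yourself. The sketch below uses only the Python standard library; the helper name port_is_free is hypothetical, and checking the loopback address 127.0.0.1 is an assumption, so treat a "free" result as a hint rather than a guarantee that the registry will start.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use": another process holds the port.
            return False
        return True


if __name__ == "__main__":
    if port_is_free(5000):
        print("Port 5000 looks free.")
    else:
        print("Port 5000 is in use; free it before starting the registry.")
```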

Owner

ZenML