Repository for the Demo of using DVC with PyCaret & MLOps (DVC Office Hours - 20th Jan, 2022)

Overview

Using DVC with PyCaret & FastAPI (Demo)

This repo contains all the resources for my demo on using DVC alongside PyCaret & FastAPI for data & model versioning, experimenting with ML models & quickly deploying those models for inference.

This demo was presented at the DVC Office Hours on 20th Jan 2022.

Note: We will use Azure Blob Storage as our remote storage for this demo. To follow along, it is advised to either create an Azure account or use a different remote for storage.


Steps Followed for the Demo

0. Preliminaries

Create a virtual environment named dvc-demo & install required packages

python3 -m venv dvc-demo
source dvc-demo/bin/activate

pip install "dvc[azure]" pycaret fastapi uvicorn python-multipart

Initialize the repo with DVC tracking & create a data/ folder

mkdir dvc-pycaret-fastapi-demo
cd dvc-pycaret-fastapi-demo
git init
dvc init

git remote add origin https://github.com/tezansahu/dvc-pycaret-fastapi-demo.git

mkdir data

1. Tracking Data with DVC

We use the Heart Failure Prediction Dataset for this demo.

First, we download the heart.csv file & retain ~800 rows from this file in the data/ folder. (We will use the file with all the rows later - this is to simulate the change/increase in data that an ML workflow sees during its lifetime)
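One way to produce the reduced file is a few lines of Python. This is only a minimal sketch, not necessarily how the file was prepared for the demo: the name data/heart_full.csv for the downloaded original and the helper keep_first_rows are assumptions made for illustration.

```python
import csv
from pathlib import Path


def keep_first_rows(src: str, dst: str, n_rows: int = 800) -> int:
    """Copy the header plus the first n_rows data rows from src to dst.

    Returns the number of data rows written.
    """
    written = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: nothing to copy
        writer.writerow(header)
        for row in reader:
            if written >= n_rows:
                break
            writer.writerow(row)
            written += 1
    return written


if __name__ == "__main__":
    # Hypothetical path for the full download; adjust to your own layout.
    full = Path("data/heart_full.csv")
    if full.exists():
        print(keep_first_rows(str(full), "data/heart.csv"), "rows kept")
```

Keeping the full download under a separate name makes it easy to swap it in later when simulating the arrival of new data.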

Track this data/heart.csv using DVC

dvc add data/heart.csv
git add data/heart.csv.dvc
git commit -m "add data - phase 1"

2. Setup the Remote for Storing Tracked Data & Models

  • Go to the Azure Portal & create a Storage Account (here, we name it dvcdemo)

  • Within the storage account, create a Container (here, we name it demo20jan2022)

  • Obtain the Connection String from the storage account (in the Azure Portal, it is listed under the storage account's Access keys)

  • Install the Azure CLI & log into Azure from within the terminal using az login

Now, we store the tracked data in Azure:

dvc remote add -d storage azure://demo20jan2022/dvcstore
dvc remote modify --local storage connection_string '<connection-string>'

dvc push
git push origin main
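An Azure connection string is a semicolon-separated list of Key=Value pairs, so it is best wrapped in quotes on the command line to keep the shell from splitting it. The sketch below splits such a string into a dictionary, which can be handy for checking that it points at the expected storage account before handing it to DVC. The function name and the sample key are made up for illustration; the key is a placeholder, not a real credential.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key1=Value1;Key2=Value2' into a dict.

    Values may themselves contain '=' (the account key is base64 and
    often ends in '=' padding), so split on the first '=' only.
    """
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key] = value
    return parts


# Placeholder value with the usual field names; not a real credential.
sample = (
    "DefaultEndpointsProtocol=https;AccountName=dvcdemo;"
    "AccountKey=ZmFrZS1rZXk=;EndpointSuffix=core.windows.net"
)
info = parse_connection_string(sample)
print(info["AccountName"])
```

Because the connection string is set with --local, DVC keeps it in .dvc/config.local, which is not committed to Git.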

3. ML Experimentation with PyCaret

Create the notebooks/ folder using mkdir notebooks & download the notebooks/experimentation_with_pycaret.ipynb notebook from this repo into it.

Track this notebook with Git:

git add notebooks/
git commit -m "add ml training notebook"

Run all the cells mentioned under Phase 1 in the notebook. This involves basics of PyCaret:

  • Setting up a vanilla experiment with setup()
  • Comparing various classification models with compare_models()
  • Evaluating the performance of a model with evaluate_model()
  • Making predictions on the held-out eval data using predict_model()
  • Finalizing the model by training on the full training + eval data using finalize_model()
  • Saving the model pipeline using save_model()

This will create a model.pkl file in the models/ folder
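The Phase 1 steps listed above can be sketched roughly as follows. This is not the notebook's actual code: the target column name HeartDisease, the session_id value and the function name train_phase1 are assumptions, and the notebook may pass different options to setup(). The interactive evaluate_model() call is left out because it only renders inside a notebook.

```python
def train_phase1(csv_path="data/heart.csv", model_path="models/model"):
    """Run a basic PyCaret classification experiment and save the pipeline."""
    # Imported lazily so this file can be loaded without PyCaret installed.
    import os

    import pandas as pd
    from pycaret.classification import (
        compare_models,
        finalize_model,
        predict_model,
        save_model,
        setup,
    )

    data = pd.read_csv(csv_path)

    # Assumption: "HeartDisease" is the target column of heart.csv.
    # Some PyCaret versions ask you to confirm the inferred data types here.
    setup(data=data, target="HeartDisease", session_id=42)

    best = compare_models()       # cross-validate candidates, return the best one
    predict_model(best)           # score the held-out split created by setup()
    final = finalize_model(best)  # refit on training + held-out data

    os.makedirs(os.path.dirname(model_path) or ".", exist_ok=True)
    save_model(final, model_path)  # PyCaret appends ".pkl" to the name
    return final
```

Calling train_phase1() from the repo root would then write models/model.pkl, assuming PyCaret and pandas are installed.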

4. Tracking Models with DVC

Now, we track the ML model using DVC & store it in our remote storage

dvc add models/model.pkl
git add models/model.pkl.dvc
git commit -m "add model - phase 1"

dvc push
git push origin main

5. Deploy the Model with FastAPI

First, delete .dvc/cache/ & models/model.pkl to simulate a fresh production environment. Then, pull the tracked files from the DVC remote storage.

dvc pull

Check that the model.pkl file is now present in models/ folder.
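Beyond checking that the file exists, you can confirm that the pulled model is the exact version recorded in Git by comparing its MD5 hash with the one stored in models/model.pkl.dvc. The sketch below assumes the pointer file contains a plain md5: entry for the file, which is how DVC records single-file outputs; the helper names are hypothetical.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def recorded_md5(dvc_file):
    """Return the first 'md5:' value found in a .dvc pointer file, or None."""
    with open(dvc_file) as f:
        for line in f:
            # Drop indentation and the YAML list marker ("- md5: ...").
            stripped = line.strip().lstrip("- ").strip()
            if stripped.startswith("md5:"):
                return stripped.split(":", 1)[1].strip()
    return None


def pulled_ok(path, dvc_file):
    """True if path exists and its hash matches the pointer file."""
    return os.path.exists(path) and file_md5(path) == recorded_md5(dvc_file)


if __name__ == "__main__":
    print(pulled_ok("models/model.pkl", "models/model.pkl.dvc"))
```

Running dvc status gives a similar answer from the command line; the script is mainly useful as a startup check inside a deployment.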

Now, create a server/ folder & place the main.py file in it after downloading the server/main.py file from this repo. This RESTful API server has 2 POST endpoints:

  • Inferencing on an individual record
  • Batch inferencing on a CSV file

We commit this to our repo:

git add server/
git commit -m "create basic fastapi server"

Now, we can run our local server on port 8000

cd server
uvicorn main:app --port=8000

Go to http://localhost:8000/docs & play with the endpoints present in the interactive documentation.

For the individual inference, you could use the following data:

{
  "Age": 61,
  "Sex": "M",
  "ChestPainType": "ASY",
  "RestingBP": 148,
  "Cholesterol": 203,
  "FastingBS": 0,
  "RestingECG": "Normal",
  "MaxHR": 161,
  "ExerciseAngina": "N",
  "Oldpeak": 0,
  "ST_Slope": "Up"
}
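The same record can also be sent from a script instead of the interactive docs. The sketch below uses only the Python standard library. The endpoint path /predict is an assumption, since the actual route names are defined in server/main.py; check the /docs page for the real path before using it.

```python
import json
import urllib.request

# Assumption: the single-record endpoint is served at /predict.
URL = "http://localhost:8000/predict"

record = {
    "Age": 61,
    "Sex": "M",
    "ChestPainType": "ASY",
    "RestingBP": 148,
    "Cholesterol": 203,
    "FastingBS": 0,
    "RestingECG": "Normal",
    "MaxHR": 161,
    "ExerciseAngina": "N",
    "Oldpeak": 0,
    "ST_Slope": "Up",
}


def build_request(url, payload):
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url, payload, timeout=10):
    """Send the record to the server and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(url, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# With the server running:
# print(predict(URL, record))
```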

6. Simulating the arrival of New Data

Now, we use the full heart.csv file to simulate the arrival of new data with time. We place it in the data/ folder, overwriting the reduced file, & upload it to the DVC remote.

dvc add data/heart.csv
git add data/heart.csv.dvc
git commit -m "add data - phase 2"

dvc push
git push origin main

7. More Experimentation with PyCaret

Now, we run the experiment in Phase 2 of the notebooks/experimentation_with_pycaret.ipynb notebook. This involves:

  • Feature engineering while setting up the experiment
  • Fine-tuning of models with tune_model()
  • Creating an ensemble of models with blend_models()

The blended model is saved as models/model.pkl
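As a rough illustration of the Phase 2 steps, the sketch below tunes the top few models and blends them. It is not the notebook's code: the target column HeartDisease, the choice of normalize and remove_multicollinearity as preprocessing options, the number of models blended and the function name train_phase2 are all assumptions.

```python
def train_phase2(csv_path="data/heart.csv", model_path="models/model", top_n=3):
    """Tune the top_n models, blend them, and save the resulting pipeline."""
    # Imported lazily so this file can be loaded without PyCaret installed.
    import os

    import pandas as pd
    from pycaret.classification import (
        blend_models,
        compare_models,
        finalize_model,
        save_model,
        setup,
        tune_model,
    )

    data = pd.read_csv(csv_path)

    # Assumption: these preprocessing options stand in for the notebook's
    # feature engineering, and "HeartDisease" is the target column.
    setup(
        data=data,
        target="HeartDisease",
        session_id=42,
        normalize=True,
        remove_multicollinearity=True,
    )

    candidates = compare_models(n_select=top_n)
    if not isinstance(candidates, list):
        candidates = [candidates]  # a single model is returned when top_n == 1

    tuned = [tune_model(model) for model in candidates]  # hyperparameter search
    blended = blend_models(tuned)                        # voting ensemble
    final = finalize_model(blended)                      # refit on all data

    os.makedirs(os.path.dirname(model_path) or ".", exist_ok=True)
    save_model(final, model_path)  # overwrites models/model.pkl
    return final
```

Saving under the same name as in Phase 1 is what lets the server pick up the new model without code changes.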

We upload it to our DVC remote.

dvc add models/model.pkl
git add models/model.pkl.dvc
git commit -m "add model - phase 2"

dvc push
git push origin main

8. Redeploying the New Model using FastAPI

Now, we start the server again (no code changes are required, because the model file has the same name) & perform inference.

cd server
uvicorn main:app --port=8000

With this, we demonstrate how DVC can be used in conjunction with PyCaret & FastAPI for iterating & experimenting efficiently with ML models & deploying them with minimal effort.


Additional Resources


Created with ❤️ by Tezan Sahu
