Deploying ML models with FastAPI, Docker, and Kubernetes

By: Sayak Paul and Chansung Park

This project shows how to serve an ONNX-optimized image classification model as a RESTful web service with FastAPI, Docker, and Kubernetes (k8s). The idea is to first Dockerize the API and then deploy it on a k8s cluster running on Google Kubernetes Engine (GKE). We do this integration using GitHub Actions.

👋 Note: Even though this project uses an image classification model, its structure and techniques can be used to serve other models as well.

Deploying the model as a service with k8s

  • We decouple the model optimization part from our API code. The optimization part is available within the notebooks/TF_to_ONNX.ipynb notebook.

  • Then we locally test the API. You can find the instructions within the api directory.

  • To deploy the API, we define our deployment.yaml workflow file inside .github/workflows. It does the following tasks:

    • Looks for any changes in the specified directory. If there are any changes, it:
      • Builds and pushes the latest Docker image to Google Container Registry (GCR).
      • Deploys the Docker container on the k8s cluster running on GKE.
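
The workflow tasks listed above can be sketched in Python as follows. This only illustrates the control flow and is not the content of deployment.yaml: the function names, the directory names, the gcr.io/<project>/<image>:<tag> image reference, and the use of kubectl apply -k (Kustomize) for the deploy step are assumptions made for this example.

```python
from pathlib import PurePosixPath
from typing import List


def has_changes_under(changed_files: List[str], watched_dir: str) -> bool:
    """Return True if any changed file lives inside watched_dir."""
    watched = PurePosixPath(watched_dir)
    return any(watched in PurePosixPath(f).parents for f in changed_files)


def deployment_commands(project_id: str, image: str, tag: str,
                        build_dir: str, kustomize_dir: str) -> List[List[str]]:
    """Build (but do not run) the commands for one build, push, deploy cycle."""
    # Hypothetical image naming scheme, assumed for illustration only.
    image_ref = f"gcr.io/{project_id}/{image}:{tag}"
    return [
        ["docker", "build", "-t", image_ref, build_dir],
        ["docker", "push", image_ref],
        ["kubectl", "apply", "-k", kustomize_dir],
    ]
```

Each returned command could then be executed with subprocess.run(cmd, check=True), but only when has_changes_under() reports a change in the watched directory.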

Configurations needed beforehand

  • Create a k8s cluster on GKE. Here's a relevant resource.

  • Create a service account key (JSON) file. It's a good practice to only grant it the roles required for the project. For example, for this project, we created a fresh service account and granted it permissions for the following: Storage Admin, GKE Developer, and GCR Developer.

  • Create a secret named GCP_CREDENTIALS on your GitHub repository and paste the contents of the service account key file into it.

  • Configure bucket storage related permissions for the service account:

    $ export PROJECT_ID=<PROJECT_ID>
    $ export ACCOUNT=<ACCOUNT>
    
    $ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.admin
    
    $ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.objectAdmin
    
    $ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.objectCreator
  • If you're already on the main branch, then upon a new push, the workflow defined in .github/workflows/deployment.yaml should run automatically. See the run link for how the final outputs should look.
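
Before pasting the service account key into the GCP_CREDENTIALS secret, it can help to sanity-check the file. The sketch below is a minimal example and not part of the project: the function name is hypothetical, and it only checks the "type", "client_email", and "private_key" fields of the key JSON against the ${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com address used in the commands above.

```python
import json
from typing import List


def check_service_account_key(key_text: str, account: str, project_id: str) -> List[str]:
    """Return a list of problems found in a service account key; empty if none."""
    try:
        key = json.loads(key_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["key file must contain a JSON object"]

    problems = []
    if key.get("type") != "service_account":
        problems.append("'type' is not 'service_account'")
    # Same address format as in the gcloud commands above.
    expected_email = f"{account}@{project_id}.iam.gserviceaccount.com"
    if key.get("client_email") != expected_email:
        problems.append(f"'client_email' is not {expected_email}")
    if not key.get("private_key"):
        problems.append("'private_key' is missing or empty")
    return problems
```

An empty list means the key at least looks like it belongs to the intended service account; it does not verify the roles granted to that account.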

Notes

  • Since we use CPU-based pods within the k8s cluster, we rely on ONNX optimizations, which are known to speed up inference in CPU-based environments. If you are using GPU-based pods, look into TensorRT.
  • We use Kustomize to manage the deployment on k8s.

Querying the API endpoint

From the workflow outputs, you should see something like this:

NAME             TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
fastapi-server   LoadBalancer   xxxxxxxxxx   xxxxxxxxxx    80:30768/TCP   23m
kubernetes       ClusterIP      xxxxxxxxxx   <none>        443/TCP        160m
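
The EXTERNAL-IP column of a table like the one above can also be read programmatically, which is one possible way to report the endpoint. The following is a minimal sketch, assuming the table is whitespace-separated text with the header shown above; the function names are hypothetical, and the addresses in the sample are made-up placeholders.

```python
from typing import Optional


def external_ip(services_table: str, service_name: str = "fastapi-server") -> Optional[str]:
    """Extract the EXTERNAL-IP of service_name from a services table in text form."""
    lines = [line.split() for line in services_table.strip().splitlines()]
    if not lines:
        return None
    header, rows = lines[0], lines[1:]
    if "NAME" not in header or "EXTERNAL-IP" not in header:
        return None
    name_col, ip_col = header.index("NAME"), header.index("EXTERNAL-IP")
    for row in rows:
        if len(row) > max(name_col, ip_col) and row[name_col] == service_name:
            ip = row[ip_col]
            # Values in angle brackets (such as <none>) mean no address is assigned.
            return None if ip.startswith("<") else ip
    return None


def predict_url(ip: str) -> str:
    """Build the prediction endpoint URL for a given external IP."""
    return f"http://{ip}:80/predict/image"


# Sample input with placeholder addresses (not real cluster values).
SAMPLE = """
NAME             TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
fastapi-server   LoadBalancer   10.8.0.12    203.0.113.7   80:30768/TCP   23m
kubernetes       ClusterIP      10.8.0.1     <none>        443/TCP        160m
"""
sample_ip = external_ip(SAMPLE)
```

If the service has a different name, pass it as service_name; the function returns None when the service is missing or has no external address yet.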

Note the EXTERNAL-IP corresponding to fastapi-server (if you have named your service that way). Then query it with cURL:

curl -X POST -F <FILE_FIELD>=@cat.jpg -F with_resize=True -F with_post_process=True http://{EXTERNAL-IP}:80/predict/image

Here, <FILE_FIELD> stands for the name of the form field under which the /predict/image endpoint expects the image. The request assumes that you have a file called cat.jpg present in your working directory.

You should get the following output (if you're using the cat.jpg image present in the api directory):

"{\"Label\": \"tabby\", \"Score\": \"0.538\"}"
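
The same request can be sent from Python using only the standard library. This is a sketch under stated assumptions rather than the project's own client: the function names are hypothetical, and file_field must be set to the form-field name the endpoint expects for the image. Note that the response shown above is a JSON string whose content is itself JSON, so it is decoded twice.

```python
import json
import os
import urllib.request
import uuid
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], file_field: str,
                    filename: str, file_bytes: bytes) -> Tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream'
        f"\r\n\r\n".encode() + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def decode_prediction(response_text: str) -> dict:
    """Decode the response body into a dict, unwrapping the outer JSON string."""
    decoded = json.loads(response_text)
    if isinstance(decoded, str):
        decoded = json.loads(decoded)
    return decoded


def predict(url: str, image_path: str, file_field: str) -> dict:
    """POST an image to the endpoint and return the decoded prediction."""
    with open(image_path, "rb") as f:
        image = f.read()
    body, content_type = build_multipart(
        {"with_resize": "True", "with_post_process": "True"},
        file_field, os.path.basename(image_path), image,
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return decode_prediction(response.read().decode("utf-8"))
```

With the output shown above, decode_prediction() yields a dict with the keys "Label" and "Score"; the score arrives as a string and can be converted with float() if needed.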

TODOs

  • Set up logging for the k8s pods.
  • Find a better way to report the latest API endpoint.

Acknowledgements

Thanks to the ML-GDE program for providing GCP credit support.

Comments
  • Feat/locust grpc

    @deep-diver currently, the load test runs into:

    [Screenshot 2022-04-02 at 10:54:26 AM]

    I have ensured https://github.com/sayakpaul/ml-deployment-k8s-fastapi/blob/feat/locust-grpc/locust/grpc/locustfile.py#L49 returns the correct output. But after a few requests, I run into the above problem.

    Also, I should mention that the gRPC client currently does not take care of image resizing, which makes it a bit less comparable to the REST client, which handles preprocessing as well as postprocessing.

    opened by sayakpaul 18
  • Setup TF Serving based deployment

    For this new feature, the following work is expected:

    • Update the notebook Create a new notebook with the TF Serving prototype based on both gRPC(Ref) and RestAPI(Ref).

    • Update the notebook Update the newly created notebook to check the %%timeit on the TF Serving server locally.

    • Build/Commit docker image based on TF Serving base image using this method.

    • Deploy the built docker image on GKE cluster

    • Check the deployed model's performance under various scenarios (maybe the same ones applied to the ONNX+FastAPI setup)

    new feature 
    opened by deep-diver 11
  • Perform load testing with Locust

    Resources:

    • https://towardsdatascience.com/performance-testing-an-ml-serving-api-with-locust-ecd98ab9b7f7
    • https://microsoft.github.io/PartsUnlimitedMRP/pandp/200.1x-PandP-LocustTest.html
    • https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/tree/main/course4/week2-ungraded-labs/C4_W2_Lab_3_Latency_Test_Compose
    opened by sayakpaul 10
  • 4 dockerize

    fix

    • move api/utils/requirements.txt to /api
    • add missing dependency python-multipart to the requirements.txt

    add

    • Dockerfile

    Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/4

    opened by deep-diver 4
  • Deployment on GKE with GitHub Actions

    Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/5, https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/7, and https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/6.

    opened by sayakpaul 2
  • chore: refactored the colab notebook.

    Just added a text cell explaining why it's better to include the preprocessing function in the final exported model. Also, added a cell to show if the TF and ONNX outputs match with np.testing.assert_allclose().

    opened by sayakpaul 2