This project shows how to serve an ONNX-optimized image classification model as a web service with FastAPI, Docker, and Kubernetes.

Last update: Dec 23, 2022

Overview

Deploying ML models with FastAPI, Docker, and Kubernetes

By: Sayak Paul and Chansung Park

This project shows how to serve an ONNX-optimized image classification model as a RESTful web service with FastAPI, Docker, and Kubernetes (k8s). The idea is to first Dockerize the API and then deploy it on a k8s cluster running on Google Kubernetes Engine (GKE). We do this integration using GitHub Actions.

👋 Note: Even though this project uses an image classification model, its structure and techniques can be used to serve other models as well.

Deploying the model as a service with k8s

We decouple the model optimization part from our API code. The optimization part is available within the notebooks/TF_to_ONNX.ipynb notebook.
Then we locally test the API. You can find the instructions within the api directory.
To deploy the API, we define our deployment.yaml workflow file inside .github/workflows. It does the following tasks:
- Looks for any changes in the specified directory. If there are any changes:
  - Builds and pushes the latest Docker image to Google Container Registry (GCR).
  - Deploys the Docker container on the k8s cluster running on GKE.
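
The gating logic in the first step can be pictured with a small Python sketch. This is only an illustration of the idea, not the workflow itself: the function name `should_deploy`, the default watched directory `api`, and the example file paths are assumptions made up for this example.

```python
from pathlib import PurePosixPath


def should_deploy(changed_files, watched_dir="api"):
    """Return True if any changed file lives under the watched directory.

    `watched_dir` is a hypothetical default; the real workflow declares
    its own path filter.
    """
    watched = PurePosixPath(watched_dir)
    for name in changed_files:
        path = PurePosixPath(name)
        # A file counts if the watched directory is one of its parents.
        if watched in path.parents:
            return True
    return False


# Example inputs (made up for illustration).
print(should_deploy(["api/main.py", "README.md"]))    # a file under api/ changed
print(should_deploy(["notebooks/TF_to_ONNX.ipynb"]))  # nothing under api/ changed
```

Only when a check like this passes do the build and deploy steps need to run, which keeps documentation-only pushes from triggering a new rollout.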

Configurations needed beforehand

Create a k8s cluster on GKE. Here's a relevant resource.
Create a service account key (JSON) file. It's good practice to grant it only the roles required for the project. For example, for this project, we created a fresh service account and granted it permissions for the following: Storage Admin, GKE Developer, and GCR Developer.
Create a secret named GCP_CREDENTIALS on your GitHub repository and paste the contents of the service account key file into the secret.

Configure bucket storage related permissions for the service account:

$ export PROJECT_ID=<PROJECT_ID>
$ export ACCOUNT=<ACCOUNT>

$ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
    --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
    --role roles/storage.admin

$ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
    --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
    --role roles/storage.objectAdmin

$ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
    --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
    --role roles/storage.objectCreator
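
The three bindings differ only in the role, so they can also be generated in a loop. The following is a minimal Python sketch, assuming `gcloud` is installed and authenticated. The helper name `build_binding_commands` and the placeholder values `my-project` and `my-account` are made up for illustration, and the commands are only printed here rather than executed.

```python
import shlex

# The same three roles granted by the commands above.
ROLES = [
    "roles/storage.admin",
    "roles/storage.objectAdmin",
    "roles/storage.objectCreator",
]


def build_binding_commands(project_id, account):
    """Build one `gcloud projects add-iam-policy-binding` call per role."""
    member = f"serviceAccount:{account}@{project_id}.iam.gserviceaccount.com"
    return [
        ["gcloud", "-q", "projects", "add-iam-policy-binding", project_id,
         f"--member={member}", "--role", role]
        for role in ROLES
    ]


# Placeholder values; substitute your own project ID and account name.
for cmd in build_binding_commands("my-project", "my-account"):
    # Print instead of running; pass `cmd` to subprocess.run(cmd, check=True)
    # to actually apply the binding.
    print(shlex.join(cmd))
```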

If you're already on the main branch, the workflow defined in .github/workflows/deployment.yaml should run automatically upon a new push. Here's what the final outputs should look like (run link):

Notes

We use CPU-based pods within the k8s cluster, so we apply ONNX optimizations, which are known to provide performance speed-ups in CPU-based environments. If you are using GPU-based pods, then look into TensorRT.
We use Kustomize to manage the deployment on k8s.

Querying the API endpoint

In the workflow outputs, you should see something like this:

NAME             TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
fastapi-server   LoadBalancer   xxxxxxxxxx   xxxxxxxxxx    80:30768/TCP   23m
kubernetes       ClusterIP      xxxxxxxxxx   <none>        443/TCP        160m
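
If you would rather pull the address out in a script than read it by eye, a listing like this can be parsed by column. The following is a minimal Python sketch, assuming the listing is whitespace-separated with a header row as shown. The helper name `external_ip` and the addresses in the sample are made up for illustration (they come from ranges reserved for documentation and private use).

```python
def external_ip(listing, service="fastapi-server"):
    """Return the EXTERNAL-IP of `service` from a service listing, or None."""
    lines = [line for line in listing.splitlines() if line.strip()]
    if not lines:
        return None
    # Locate the column by name instead of hard-coding its position.
    header = lines[0].split()
    ip_column = header.index("EXTERNAL-IP")
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == service:
            value = fields[ip_column]
            # Placeholders such as <none> mean no address is assigned.
            return None if value.startswith("<") else value
    return None


# Sample listing with made-up addresses.
SAMPLE = """\
NAME             TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
fastapi-server   LoadBalancer   10.0.0.12    203.0.113.7   80:30768/TCP   23m
kubernetes       ClusterIP      10.0.0.1     <none>        443/TCP        160m
"""

print(external_ip(SAMPLE))
```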

Note the EXTERNAL-IP corresponding to fastapi-server (if you have named your service that way). Then query it with cURL:

curl -X POST -F image_file=@cat.jpg -F with_resize=True -F with_post_process=True http://{EXTERNAL-IP}:80/predict/image

You should get the following output (if you're using the cat.jpg image present in the api directory):

"{\"Label\": \"tabby\", \"Score\": \"0.538\"}"

The request assumes that you have a file called cat.jpg present in your working directory.
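
The response shown above is a JSON string whose content is itself JSON, so a client has to decode it twice to get at the fields. The following is a minimal Python sketch that uses the sample response from this section as a literal. The helper name `parse_prediction` is made up for illustration.

```python
import json


def parse_prediction(body):
    """Decode the double-encoded response into (label, score)."""
    inner = json.loads(body)    # first pass yields a str holding JSON
    result = json.loads(inner)  # second pass yields the actual dict
    # The score arrives as a string, so convert it explicitly.
    return result["Label"], float(result["Score"])


# The sample response body from above, as raw text.
raw = r'"{\"Label\": \"tabby\", \"Score\": \"0.538\"}"'
label, score = parse_prediction(raw)
print(label, score)
```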

TODO (s)

Set up logging for the k8s pods.
Find a better way to report the latest API endpoint.

Acknowledgements

ML-GDE program for providing GCP credit support.

Comments

Feat/locust grpc

@deep-diver currently, the load test runs into:

I have ensured https://github.com/sayakpaul/ml-deployment-k8s-fastapi/blob/feat/locust-grpc/locust/grpc/locustfile.py#L49 returns the correct output. But after a few requests, I run into the above problem.

Also, I should mention that the gRPC client currently does not take care of image resizing, which makes it a bit less comparable to the REST client, which handles preprocessing as well as postprocessing.

opened by sayakpaul 18
Setup TF Serving based deployment
In this new feature, the following work is expected:

~~Update the notebook~~ Create a new notebook with the TF Serving prototype based on both gRPC(Ref) and RestAPI(Ref).

~~Update the notebook~~ Update the newly created notebook to check the %%timeit on the TF Serving server locally.

Build/Commit docker image based on TF Serving base image using this method.

Deploy the built docker image on GKE cluster

Check the deployed model's performance under various scenarios (maybe the same ones applied to the ONNX+FastAPI setup)

new feature
opened by deep-diver 11
Perform load testing with Locust
Resources:

https://towardsdatascience.com/performance-testing-an-ml-serving-api-with-locust-ecd98ab9b7f7

https://microsoft.github.io/PartsUnlimitedMRP/pandp/200.1x-PandP-LocustTest.html

https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/tree/main/course4/week2-ungraded-labs/C4_W2_Lab_3_Latency_Test_Compose
opened by sayakpaul 10
4 dockerize
fix
- move api/utils/requirements.txt to /api
- add missing dependency python-multipart to the requirements.txt

add
- Dockerfile

Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/4
opened by deep-diver 4
Deployment on GKE with GitHub Actions

Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/5, https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/7, and https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/6.

opened by sayakpaul 2
chore: refactored the colab notebook.

Just added a text cell explaining why it's better to include the preprocessing function in the final exported model. Also, added a cell to show if the TF and ONNX outputs match with np.testing.assert_allclose().

opened by sayakpaul 2

Releases(v1.0.0)

v1.0.0(Feb 21, 2022)

Source code(tar.gz)
Source code(zip)
resnet50_w_preprocessing.onnx(97.42 MB)
resnet50_w_preprocessing_tf.tar.gz(101.89 MB)

Owner

Sayak Paul

ML Engineer at @carted | One PR at a time


API written using Fast API to manage events and implement a leaderboard / badge system.

High-performance Async REST API, in Python. FastAPI + GINO + Arq + Uvicorn (w/ Redis and PostgreSQL).

Signalling for FastAPI.

CLI and Streamlit applications to create APIs from Excel data files within seconds, using FastAPI

🐍Pywork is a Yeoman generator to scaffold a Bare-bone Python Application

Ready-to-use and customizable users management for FastAPI

Dead simple CSRF security middleware for Starlette ⭐ and Fast API ⚡

Sample project showing reliable data ingestion application using FastAPI and dramatiq

FastAPI + Postgres + Docker Compose + Heroku Deploy Template

Single Page App with Flask and Vue.js

Adds simple SQLAlchemy support to FastAPI

A simple Blogging Backend app created with Fast API

Keepalive - Discord Bot to keep threads from expiring

Voucher FastAPI

Generate modern Python clients from OpenAPI

JD.com image click CAPTCHA recognition

Minecraft biome tile server writing on Python using FastAPI

Opentracing support for Starlette and FastApi

Repository for the Demo of using DVC with PyCaret & MLOps (DVC Office Hours - 20th Jan, 2022)

A simple example of deploying FastAPI as a Zeit Serverless Function