This project shows how to serve an ONNX-optimized image classification model as a web service with FastAPI, Docker, and Kubernetes.

Overview

Deploying ML models with FastAPI, Docker, and Kubernetes

By: Sayak Paul and Chansung Park

This project shows how to serve an ONNX-optimized image classification model as a RESTful web service with FastAPI, Docker, and Kubernetes (k8s). The idea is to first Dockerize the API and then deploy it on a k8s cluster running on Google Kubernetes Engine (GKE). We do this integration using GitHub Actions.

๐Ÿ‘‹ Note: Even though this project uses an image classification its structure and techniques can be used to serve other models as well.

Deploying the model as a service with k8s

  • We decouple the model optimization part from our API code. The optimization part is available within the notebooks/TF_to_ONNX.ipynb notebook.

  • Then we locally test the API. You can find the instructions within the api directory.

  • To deploy the API, we define our deployment.yaml workflow file inside .github/workflows. It does the following tasks:

    • Looks for any changes in the specified directory. If there are any changes:
    • Builds and pushes the latest Docker image to Google Container Register (GCR).
    • Deploys the Docker container on the k8s cluster running on GKE.

Configurations needed beforehand

  • Create a k8s cluster on GKE. Here's a relevant resource.

  • Create a service account key (JSON) file. It's a good practice to only grant it the roles required for the project. For example, for this project, we created a fresh service account and granted it permissions for the following: Storage Admin, GKE Developer, and GCR Developer.

  • Crete a secret named GCP_CREDENTIALS on your GitHub repository and copy paste the contents of the service account key file into the secret.

  • Configure bucket storage related permissions for the service account:

    $ export PROJECT_ID=<PROJECT_ID>
    $ export ACCOUNT=<ACCOUNT>
    
    $ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.admin
    
    $ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.objectAdmin
    
    gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.objectCreator
  • If you're on the main branch already then upon a new push, the worflow defined in .github/workflows/deployment.yaml should automatically run. Here's how the final outputs should look like so (run link):

Notes

  • Since we use CPU-based pods within the k8s cluster, we use ONNX optimizations since they are known to provide performance speed-ups for CPU-based environments. If you are using GPU-based pods then look into TensorRT.
  • We use Kustomize to manage the deployment on k8s.

Querying the API endpoint

From workflow outputs, you should see something like so:

NAME             TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
fastapi-server   LoadBalancer   xxxxxxxxxx   xxxxxxxxxx        80:30768/TCP   23m
kubernetes       ClusterIP      xxxxxxxxxx     <none>          443/TCP        160m

Note the EXTERNAL-IP corresponding to fastapi-server (iff you have named your service like so). Then cURL it:

curl -X POST -F [email protected] -F with_resize=True -F with_post_process=True http://{EXTERNAL-IP}:80/predict/image

You should get the following output (if you're using the cat.jpg image present in the api directory):

"{\"Label\": \"tabby\", \"Score\": \"0.538\"}"

The request assumes that you have a file called cat.jpg present in your working directory.

TODO (s)

  • Set up logging for the k8s pods.
  • Find a better way to report the latest API endpoint.

Acknowledgements

ML-GDE program for providing GCP credit support.

Comments
  • Feat/locust grpc

    Feat/locust grpc

    @deep-diver currently, the load test runs into:

    Screenshot 2022-04-02 at 10 54 26 AM

    I have ensured https://github.com/sayakpaul/ml-deployment-k8s-fastapi/blob/feat/locust-grpc/locust/grpc/locustfile.py#L49 returns the correct output. But after a few requests, I run into the above problem.

    Also, I should mention that the gRPC client currently does not take care of image resizing which makes it a bit less comparable to the REST client which handles preprocessing as well postprocessing.

    opened by sayakpaul 18
  • Setup TF Serving based deployment

    Setup TF Serving based deployment

    In this new feature, the following works are expected

    • Update the notebook Create a new notebook with the TF Serving prototype based on both gRPC(Ref) and RestAPI(Ref).

    • Update the notebook Update the newly created notebook to check the %%timeit on the TF Serving server locally.

    • Build/Commit docker image based on TF Serving base image using this method.

    • Deploy the built docker image on GKE cluster

    • Check the deployed model's performance with a various scenarios (maybe the same ones applied to ONNX+FastAPI scenarios)

    new feature 
    opened by deep-diver 11
  • Perform load testing with Locust

    Perform load testing with Locust

    Resources:

    • https://towardsdatascience.com/performance-testing-an-ml-serving-api-with-locust-ecd98ab9b7f7
    • https://microsoft.github.io/PartsUnlimitedMRP/pandp/200.1x-PandP-LocustTest.html
    • https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/tree/main/course4/week2-ungraded-labs/C4_W2_Lab_3_Latency_Test_Compose
    opened by sayakpaul 10
  • 4 dockerize

    4 dockerize

    fix

    • move api/utils/requirements.txt to /api
    • add missing dependency python-multipart to the requirements.txt

    add

    • Dockerfile

    Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/4

    opened by deep-diver 4
  • Deployment on GKE with GitHub Actions

    Deployment on GKE with GitHub Actions

    Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/5, https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/7, and https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/6.

    opened by sayakpaul 2
  • chore: refactored the colab notebook.

    chore: refactored the colab notebook.

    Just added a text cell explaining why it's better to include the preprocessing function in the final exported model. Also, added a cell to show if the TF and ONNX outputs match with np.testing.assert_allclose().

    opened by sayakpaul 2
Owner
Sayak Paul
ML Engineer at @carted | One PR at a time
Sayak Paul
์Šคํƒ€ํŠธ์—… ๊ฐœ๋ฐœ์ž ์ฑ„์šฉ

์Šคํƒ€ํŠธ์—… ๊ฐœ๋ฐœ์ž ์ฑ„์šฉ ๅคง ๋ฐ•๋žŒํšŒ Seed ~ Series B์— ์žˆ๋Š” ์Šคํƒ€ํŠธ์—…์„ ์œ„ํ•œ ์ฑ„์šฉ์ •๋ณด ํŽ˜์ด์ง€์ž…๋‹ˆ๋‹ค. Back-end, Frontend, Mobile ๋“ฑ ๊ฐœ๋ฐœ์ž๋ฅผ ๋Œ€์ƒ์œผ๋กœ ์ง„ํ–‰ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ํ•ด๋‹น ์Šคํƒ€ํŠธ์—…์— ์ข…์‚ฌํ•˜์‹œ๋Š” ๋ถ„๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์ฑ„์šฉ ๊ด€๋ จ ์ •๋ณด๋ฅผ ์•Œ๊ณ  ๊ณ„์‹œ๋‹ค๋ฉด

JuHyun Lee 58 Dec 14, 2022
FastAPI with Docker and Traefik

Dockerizing FastAPI with Postgres, Uvicorn, and Traefik Want to learn how to build this? Check out the post. Want to use this project? Development Bui

51 Jan 06, 2023
Simple example of FastAPI + Celery + Triton for benchmarking

You can see the previous work from: https://github.com/Curt-Park/producer-consumer-fastapi-celery https://github.com/Curt-Park/triton-inference-server

Jinwoo Park (Curt) 37 Dec 29, 2022
FastAPI native extension, easy and simple JWT auth

fastapi-jwt FastAPI native extension, easy and simple JWT auth

Konstantin Chernyshev 19 Dec 12, 2022
Browse JSON API in a HTML interface.

Falcon API Browse This project provides a middleware for Falcon Web Framework that will render the response in an HTML form for documentation purpose.

Abhilash Raj 4 Mar 16, 2022
Adds GraphQL support to your Flask application.

Flask-GraphQL Adds GraphQL support to your Flask application. Usage Just use the GraphQLView view from flask_graphql from flask import Flask from flas

GraphQL Python 1.3k Dec 31, 2022
SQLAlchemy Admin for Starlette/FastAPI

SQLAlchemy Admin for Starlette/FastAPI SQLAdmin is a flexible Admin interface for SQLAlchemy models. Main features include: SQLAlchemy sync/async engi

Amin Alaee 683 Jan 03, 2023
Cookiecutter API for creating Custom Skills for Azure Search using Python and Docker

cookiecutter-spacy-fastapi Python cookiecutter API for quick deployments of spaCy models with FastAPI Azure Search The API interface is compatible wit

Microsoft 379 Jan 03, 2023
Complete Fundamental to Expert Codes of FastAPI for creating API's

FastAPI FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3 based on standard Python type hints. The key featu

Pranav Anand 1 Nov 28, 2021
A minimal FastAPI implementation for Django !

Caution!!! This project is in early developing stage. So use it at you own risk. Bug reports / Fix PRs are welcomed. Installation pip install django-m

toki 23 Dec 24, 2022
Keycloack plugin for FastApi.

FastAPI Keycloack Keycloack plugin for FastApi. Your aplication receives the claims decoded from the access token. Usage Run keycloak on port 8080 and

Elber 4 Jun 24, 2022
Cbpa - Coinbase Pro Automation for buying your favourite cryptocurrencies

cbpa Coinbase Pro Automation for making buy orders from a default bank account.

Anthony Corletti 3 Nov 27, 2022
API for Submarino store

submarino-api API for the submarino e-commerce documentation read the documentation in: https://submarino-api.herokuapp.com/docs or in https://submari

Miguel 1 Oct 14, 2021
A Python framework to build Slack apps in a flash with the latest platform features.

Bolt for Python A Python framework to build Slack apps in a flash with the latest platform features. Read the getting started guide and look at our co

SlackAPI 684 Jan 09, 2023
fastapi-cache is a tool to cache fastapi response and function result, with backends support redis and memcached.

fastapi-cache Introduction fastapi-cache is a tool to cache fastapi response and function result, with backends support redis, memcache, and dynamodb.

long2ice 551 Jan 08, 2023
Turns your Python functions into microservices with web API, interactive GUI, and more.

Instantly turn your Python functions into production-ready microservices. Deploy and access your services via HTTP API or interactive UI. Seamlessly export your services into portable, shareable, and

Machine Learning Tooling 2.8k Jan 04, 2023
๐Ÿž A debug toolbar for FastAPI based on the original django-debug-toolbar. ๐Ÿž

Debug Toolbar ๐Ÿž A debug toolbar for FastAPI based on the original django-debug-toolbar. ๐Ÿž Swagger UI & GraphQL are supported. Documentation: https:/

Dani 74 Dec 30, 2022
A FastAPI Framework for things like Database, Redis, Logging, JWT Authentication and Rate Limits

A FastAPI Framework for things like Database, Redis, Logging, JWT Authentication and Rate Limits Install You can install this Library with: pip instal

Tert0 33 Nov 28, 2022
An alternative implement of Imjad API | Imjad API ็š„ๅผ€ๆบๆ›ฟไปฃ

HibiAPI An alternative implement of Imjad API. Imjad API ็š„ๅผ€ๆบๆ›ฟไปฃ. ๅ‰่จ€ ็”ฑไบŽImjad API่ฟ™ๆ˜ฏไป€ไนˆ?ไฝฟ็”จไบบๆ•ฐ่ฟ‡ๅคš, ่‡ดไฝฟ่ฐƒ็”จ่ถ…ๅ‡บ้™ๅˆถ, ๆ‰€ไปฅๆœฌไบบๅธŒๆœ›ๆไพ›ไธ€ไธชๅผ€ๆบๆ›ฟไปฃๆฅไพ›็คพๅŒบ่ฟ›่กŒ่‡ช็”ฑ็š„้ƒจ็ฝฒๅ’Œไฝฟ็”จ, ไปŽ่€Œๅ‡่ฝปไธ€้ƒจๅˆ†่ฏฅAPI็š„ไฝฟ็”จๅŽ‹ๅŠ› ไผ˜ๅŠฟ

Mix Technology 450 Dec 29, 2022
The template for building scalable web APIs based on FastAPI, Tortoise ORM and other.

FastAPI and Tortoise ORM. Powerful but simple template for web APIs w/ FastAPI (as web framework) and Tortoise-ORM (for working via database without h

prostomarkeloff 95 Jan 08, 2023