Inferoxy is a service for quickly deploying and using dockerized Computer Vision models.

Last update: Oct 10, 2022

Overview

What is it?

Inferoxy is a service for quickly deploying and using dockerized Computer Vision models. It is the core of Vision Hub, EORA's Computer Vision platform, which runs on top of AWS EKS.

Why use it?

You should use it if:

You want to simplify deploying Computer Vision models with an appropriate Data Science stack to production: all you need to do is build a Docker image with your model, including any pre- and post-processing steps, and push it to an accessible registry
You have only one machine or cluster for inference (CPU/GPU)
You want automatic batching for a multi-GPU or multi-node setup
You need model versioning

Architecture

Inferoxy is built on the message broker pattern.

Roughly speaking, it accepts user requests through different interfaces, which we call "bridges". Multiple bridges can run simultaneously. The currently supported bridges are REST API, gRPC and ZeroMQ.
The requests are split into batches and processed on a single multi-GPU machine or on a multi-node cluster.
The models to be deployed are managed through the Model Manager, which communicates with Redis to store and retrieve model information such as the Docker image URL, the maximum batch size, etc.
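The batch-splitting step described above can be sketched in a few lines of Python. This is only an illustration, not Inferoxy's actual code: the `ModelInfo` record and the `split_into_batches` helper are hypothetical names, and the fields mirror the model information mentioned above (Docker image URL and maximum batch size).

```python
from dataclasses import dataclass
from typing import List, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical record of the per-model information kept by a model manager."""
    name: str
    address: str     # Docker image URL
    batch_size: int  # maximum batch size the model accepts


def split_into_batches(items: Sequence[T], model: ModelInfo) -> List[List[T]]:
    """Chunk items into consecutive batches of at most model.batch_size elements."""
    if model.batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(items[i:i + model.batch_size])
        for i in range(0, len(items), model.batch_size)
    ]


stub = ModelInfo(
    name="stub",
    address="public.registry.visionhub.ru/models/stub:v5",
    batch_size=256,
)
frames = list(range(600))  # stand-ins for 600 queued images
batches = split_into_batches(frames, stub)
print([len(b) for b in batches])  # [256, 256, 88]
```

The last batch is simply smaller than the rest; a real scheduler might instead wait briefly for more requests to fill it, which this sketch does not attempt.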

Batching

One of Inferoxy's core features is its batching mechanism.

Batching takes into account that different models can use different batch sizes, and that some models process a series of batches from a specific user, e.g. for video processing tasks. The latter are called "stateful" models, while models that don't depend on user state are called "stateless".
Multiple copies of the same model can run on different machines, but only one copy can run on a given GPU device. To increase model efficiency, it's therefore recommended to set each model's batch size as high as possible.
A user of a stateful model reserves a whole copy of the model and releases it when their task is finished.
Users of stateless models can use the same copy of the model simultaneously.
NumPy tensors of RGB images, together with their metadata, are sent to the models over ZeroMQ, and the results are read back from a ZeroMQ socket.
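The difference between stateful and stateless models can be illustrated with a small bookkeeping sketch. It is an assumption about how such reservation could work, not Inferoxy's implementation; `ModelCopy`, `CopyPool` and the copy names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ModelCopy:
    """One running copy of a model (hypothetical bookkeeping record)."""
    copy_id: str
    stateless: bool
    users: Set[str] = field(default_factory=set)


class CopyPool:
    """Hands out copies: stateless copies are shared, stateful ones are exclusive."""

    def __init__(self, copies: List[ModelCopy]) -> None:
        self._copies = copies

    def acquire(self, user: str) -> Optional[ModelCopy]:
        # A user who already holds a copy keeps it, so a series of batches
        # (e.g. consecutive video frames) reaches the same model state.
        for copy in self._copies:
            if user in copy.users:
                return copy
        for copy in self._copies:
            # Stateless copies can be shared; stateful ones must be free.
            if copy.stateless or not copy.users:
                copy.users.add(user)
                return copy
        return None  # every stateful copy is currently reserved

    def release(self, user: str) -> None:
        for copy in self._copies:
            copy.users.discard(user)


stateful = CopyPool([ModelCopy("tracker-0", stateless=False),
                     ModelCopy("tracker-1", stateless=False)])
print(stateful.acquire("alice").copy_id)  # tracker-0
print(stateful.acquire("bob").copy_id)    # tracker-1
print(stateful.acquire("carol"))          # None: both copies are reserved
stateful.release("alice")
print(stateful.acquire("carol").copy_id)  # tracker-0

shared = CopyPool([ModelCopy("stub-0", stateless=True)])
print(shared.acquire("alice").copy_id)    # stub-0
print(shared.acquire("bob").copy_id)      # stub-0
```

The sketch always returns the first suitable copy; balancing load across several stateless copies is left out for brevity.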

Cluster management

The cluster management consists of keeping track of the running copies of the models, load analysis, health checking and alerting.

Requirements

You can run Inferoxy locally on a single machine or on a Kubernetes (k8s) cluster. To run Inferoxy, you need at least 4 GB of RAM and a CPU or GPU device, depending on your speed/cost trade-off.

Basic commands

Local run

To run locally, use the Inferoxy Docker image. You can find the latest version here.

docker pull public.registry.visionhub.ru/inferoxy:v1.0.4

After the image is pulled, create a basic configuration in a .env file:

# .env
CLOUD_CLIENT=docker
TASK_MANAGER_DOCKER_CONFIG_NETWORK=inferoxy
TASK_MANAGER_DOCKER_CONFIG_REGISTRY=
TASK_MANAGER_DOCKER_CONFIG_LOGIN=
TASK_MANAGER_DOCKER_CONFIG_PASSWORD=
MODEL_STORAGE_DATABASE_HOST=redis
MODEL_STORAGE_DATABASE_PORT=6379
MODEL_STORAGE_DATABASE_NUMBER=0
LOGGING_LEVEL=INFO
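Before starting the container, it can help to confirm that the .env file contains every key from the example above. The following is a minimal sketch using only the Python standard library; `parse_env`, `missing_keys` and `EXPECTED_KEYS` are hypothetical helpers, and treating all nine keys as expected is an assumption based solely on the example, not on Inferoxy's documentation.

```python
from typing import Dict, List

# Keys taken from the example .env above (assumed, not an official list).
EXPECTED_KEYS = [
    "CLOUD_CLIENT",
    "TASK_MANAGER_DOCKER_CONFIG_NETWORK",
    "TASK_MANAGER_DOCKER_CONFIG_REGISTRY",
    "TASK_MANAGER_DOCKER_CONFIG_LOGIN",
    "TASK_MANAGER_DOCKER_CONFIG_PASSWORD",
    "MODEL_STORAGE_DATABASE_HOST",
    "MODEL_STORAGE_DATABASE_PORT",
    "MODEL_STORAGE_DATABASE_NUMBER",
    "LOGGING_LEVEL",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the expected keys that are absent (empty values still count as present)."""
    return [key for key in EXPECTED_KEYS if key not in values]


sample = "# .env\nCLOUD_CLIENT=docker\nTASK_MANAGER_DOCKER_CONFIG_REGISTRY=\nLOGGING_LEVEL=INFO\n"
print(missing_keys(parse_env(sample)))
```

To check a real file, pass its contents instead of `sample`, for example `parse_env(open(".env").read())`.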

The next step is to create the inferoxy Docker network.

docker network create inferoxy

Now run Redis in this network. Inferoxy uses Redis to store information about your models.

docker run -d --network inferoxy --name redis redis:latest

Create a models.yaml file with a simple set of models. You can read more about models.yaml in the documentation.

stub:
  address: public.registry.visionhub.ru/models/stub:v5
  batch_size: 256
  run_on_gpu: False
  stateless: True

Now we can start Inferoxy:

docker run --env-file .env \
	-v /var/run/docker.sock:/var/run/docker.sock \
	-p 7787:7787 -p 7788:7788 -p 8000:8000 -p 8698:8698 \
	--name inferoxy --rm \
	--network inferoxy \
	-v "$(pwd)/models.yaml:/etc/inferoxy/models.yaml" \
	public.registry.visionhub.ru/inferoxy:v1.0.4
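Once the container is up, a quick way to confirm that the published ports accept connections is a plain TCP check. The sketch below uses only the Python standard library; the port numbers come from the `-p` flags in the command above, while `is_port_open` and `check_ports` are hypothetical helpers. It does not assume which bridge listens on which port, and an open port only shows that something is listening, not that Inferoxy is healthy.

```python
import socket
from typing import Dict, Iterable

# Ports published by the docker run command above.
INFEROXY_PORTS = (7787, 7788, 8000, 8698)


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host: str, ports: Iterable[int] = INFEROXY_PORTS) -> Dict[int, bool]:
    """Map each port to whether it currently accepts TCP connections."""
    return {port: is_port_open(host, port) for port in ports}


if __name__ == "__main__":
    for port, ok in check_ports("localhost").items():
        print(f"port {port}: {'open' if ok else 'closed'}")
```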

Documentation

You can find the full documentation here

Discord

Join our community on the Discord server to discuss Inferoxy usage and development.
