Inferoxy is a service for quickly deploying and using dockerized Computer Vision models.

Inferoxy

What is it?

Inferoxy is a service for quickly deploying and using dockerized Computer Vision models. It is the core of EORA's Computer Vision platform, Vision Hub, which runs on top of AWS EKS.

Why use it?

You should use it if:

  • You want to simplify deploying Computer Vision models with an appropriate Data Science stack to production: all you need to do is build a Docker image with your model, including any pre- and post-processing steps, and push it to an accessible registry
  • You have a single machine or a single cluster (CPU or GPU) for inference
  • You want automatic batching for a multi-GPU or multi-node setup
  • You need model versioning

Architecture

Inferoxy is built using the message broker pattern.

  • Roughly speaking, it accepts user requests through different interfaces which we call "bridges". Multiple bridges can run simultaneously. Currently supported bridges are REST API, gRPC and ZeroMQ
  • The requests are split into batches and processed on a single multi-GPU machine or a multi-node cluster
  • The models to be deployed are managed through the Model Manager, which communicates with Redis to store and retrieve model information such as the Docker image URL, the maximum batch size, etc.

Batching

One of Inferoxy's core features is its batching mechanism.

  • Batching takes into account that different models can use different batch sizes, and that some models process a series of batches from a specific user, e.g. for video processing tasks. The latter are called "stateful" models, while models that don't depend on user state are called "stateless"
  • Multiple copies of the same model can run on different machines, but only one copy can run on a given GPU device. To increase model efficiency, it's therefore recommended to set each model's batch size as high as possible
  • A user of a stateful model reserves a whole copy of the model and releases it when their task is finished
  • Users of stateless models can use the same copy of the model simultaneously
  • NumPy tensors of RGB images, together with their metadata, are sent to the models through ZeroMQ, and the results are also read from a ZeroMQ socket
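The difference between stateless and stateful batching described above can be illustrated with a minimal Python sketch. It is not Inferoxy's actual implementation: the `Request` class, its `user` and `payload` fields, and the `make_batches` function are hypothetical names introduced here for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Request:
    """One inference request; `user` and `payload` are hypothetical field names."""
    user: str
    payload: Any


def chunk(items: List[Request], batch_size: int) -> List[List[Request]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def make_batches(requests: List[Request], batch_size: int, stateless: bool) -> List[List[Request]]:
    """Group requests into batches for one model (illustrative only)."""
    if stateless:
        # Stateless model: requests from different users may share a batch.
        return chunk(requests, batch_size)
    # Stateful model: every batch contains requests from a single user,
    # kept in arrival order, because the model copy is reserved per user.
    per_user: Dict[str, List[Request]] = defaultdict(list)
    for request in requests:
        per_user[request.user].append(request)
    batches: List[List[Request]] = []
    for user_requests in per_user.values():
        batches.extend(chunk(user_requests, batch_size))
    return batches
```

With a stateless model, five requests from two users and a batch size of 2 are packed into batches of 2, 2 and 1 regardless of who sent them. With a stateful model, the same requests are first separated by user, so no batch mixes users.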

Cluster management

The cluster management consists of keeping track of the running copies of the models, load analysis, health checking and alerting.

Requirements

You can run Inferoxy locally on a single machine or on a Kubernetes cluster. To run Inferoxy, you need at least 4 GB of RAM and a CPU or GPU device, depending on your speed/cost trade-off.

Basic commands

Local run

To run locally, use the Inferoxy Docker image. You can find the latest version here.

docker pull public.registry.visionhub.ru/inferoxy:v1.0.4

After the image is pulled, create a basic configuration in a .env file:

# .env
CLOUD_CLIENT=docker
TASK_MANAGER_DOCKER_CONFIG_NETWORK=inferoxy
TASK_MANAGER_DOCKER_CONFIG_REGISTRY=
TASK_MANAGER_DOCKER_CONFIG_LOGIN=
TASK_MANAGER_DOCKER_CONFIG_PASSWORD=
MODEL_STORAGE_DATABASE_HOST=redis
MODEL_STORAGE_DATABASE_PORT=6379
MODEL_STORAGE_DATABASE_NUMBER=0
LOGGING_LEVEL=INFO

The next step is to create the inferoxy Docker network.

docker network create inferoxy

Now run Redis in this network. Redis is needed to store information about your models. The -d flag runs the container in the background so that the terminal stays free for the next steps.

docker run -d --network inferoxy --name redis redis:latest

Create a models.yaml file with a simple set of models. You can read more about models.yaml in the documentation.

stub:
  address: public.registry.visionhub.ru/models/stub:v5
  batch_size: 256
  run_on_gpu: False
  stateless: True
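Before starting Inferoxy, it can help to sanity-check the entries of models.yaml. The Python sketch below assumes the file has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`) and checks only the four fields that appear in the `stub` example. The real schema may accept other fields, and `EXPECTED_FIELDS` and `check_models` are hypothetical names, not part of Inferoxy.

```python
from typing import Any, Dict, List

# Field names and types are taken from the `stub` entry in this README;
# treating them as the complete, required set is an assumption.
EXPECTED_FIELDS = {
    "address": str,
    "batch_size": int,
    "run_on_gpu": bool,
    "stateless": bool,
}


def check_models(models: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of readable problems; an empty list means none were found."""
    problems: List[str] = []
    for name, entry in models.items():
        for field, expected_type in EXPECTED_FIELDS.items():
            if field not in entry:
                problems.append(f"{name}: missing '{field}'")
                continue
            value = entry[field]
            # bool is a subclass of int in Python, so reject it for integer fields.
            bool_as_int = expected_type is int and isinstance(value, bool)
            if bool_as_int or not isinstance(value, expected_type):
                problems.append(f"{name}: '{field}' should be {expected_type.__name__}")
            elif field == "batch_size" and value < 1:
                problems.append(f"{name}: 'batch_size' must be at least 1")
    return problems


# The same content as the models.yaml example, written as a Python dict.
example = {
    "stub": {
        "address": "public.registry.visionhub.ru/models/stub:v5",
        "batch_size": 256,
        "run_on_gpu": False,
        "stateless": True,
    }
}
problems = check_models(example)
```

For the example entry, `problems` is an empty list. An entry with a missing field or a non-integer batch size produces one message per problem.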

Now we can start Inferoxy. Set INFEROXY_VERSION to the image tag you pulled (v1.0.4 in the example above):

export INFEROXY_VERSION=v1.0.4
docker run --env-file .env \
	-v /var/run/docker.sock:/var/run/docker.sock \
	-p 7787:7787 -p 7788:7788 -p 8000:8000 -p 8698:8698 \
	--name inferoxy --rm \
	--network inferoxy \
	-v "$(pwd)/models.yaml:/etc/inferoxy/models.yaml" \
	public.registry.visionhub.ru/inferoxy:${INFEROXY_VERSION}
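Once the container is running, a quick way to confirm that the published ports accept connections is a plain TCP check. The Python sketch below uses only the standard library; the port numbers come from the `docker run` command above, while `port_is_open` and `check_ports` are hypothetical helper names. A successful connection only shows that the port is published and accepting connections, not that the bridge behind it is healthy.

```python
import socket
from typing import Dict, Iterable

# Ports published by the `docker run` command in this README.
PUBLISHED_PORTS = (7787, 7788, 8000, 8698)


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host: str = "127.0.0.1",
                ports: Iterable[int] = PUBLISHED_PORTS) -> Dict[int, bool]:
    """Map each port to whether it currently accepts TCP connections."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling `check_ports()` on the machine that runs the container returns a dictionary keyed by port number, with False for any port that does not accept connections.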

Documentation

You can find the full documentation here

Discord

Join our community on the Discord server to discuss Inferoxy usage and development.
