Machine Learning Systems Design

Read this booklet here.

This booklet covers four main steps of designing a machine learning system:

Project setup
Data pipeline
Modeling: selecting, training, and debugging
Serving: testing, deploying, and maintaining

It comes with links to practical resources that explain each aspect in more detail. It also suggests case studies written by machine learning engineers at major tech companies who have deployed machine learning systems to solve real-world problems.

At the end, the booklet contains 27 open-ended machine learning systems design questions that might come up in machine learning interviews. The answers to these questions will be published in the book Machine Learning Interviews. You can look at and contribute to community answers to these questions on GitHub here. You can read more about the book and sign up for the book's mailing list here.

Contribute

This is a work in progress, so any type of contribution is very much appreciated. Here are a few ways you can contribute:

Improve the text by fixing any lexical, grammatical, or technical errors
Add more relevant resources to each aspect of the machine learning project flow
Add/edit questions
Add/edit answers
Other

This book was created using the wonderful magicbook package. For detailed instructions on how to use the package, see its GitHub repo. The package requires Node.js. If you're on a Mac, you can install Node.js using:

brew install node

Install magicbook with:

npm install magicbook

Clone this repository:

git clone https://github.com/chiphuyen/machine-learning-systems-design.git
cd machine-learning-systems-design

After you've made changes to the files in the content folder, you can build the booklet with:

magicbook build

You'll find the generated HTML and PDF files in the build folder.

Acknowledgment

I'd like to thank Ben Krause for being a great friend and helping me with this draft!


Owner

Chip Huyen
