The code from the Machine Learning Bookcamp book and a free course based on the book

Machine Learning Bookcamp

The code from the Machine Learning Bookcamp book

Machine Learning Zoomcamp

Machine Learning Zoomcamp is a course based on the book

  • It's online and free
  • You can join at any moment
  • More information in the course-zoomcamp folder

Reading Plan

Chapters

Chapter 1: Introduction to Machine Learning

  • Understanding machine learning and the problems it can solve
  • CRISP-DM: Organizing a successful machine learning project
  • Training and selecting machine learning models
  • Performing model validation

Code: no code

Chapter 2: Machine Learning for Regression

  • Creating a car-price prediction project with a linear regression model
  • Doing an initial exploratory data analysis with Jupyter notebooks
  • Setting up a validation framework
  • Implementing the linear regression model from scratch
  • Performing simple feature engineering for the model
  • Keeping the model under control with regularization
  • Using the model to predict car prices

Code: chapter-02-car-price/02-carprice.ipynb
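
As a rough illustration of the "from scratch" and regularization items above, linear regression can be fit with the regularized normal equation. This is a minimal sketch, not the notebook's code: the function name `train_linear_regression_reg`, the parameter `r`, and the toy data are assumptions made for this example.

```python
import numpy as np


def train_linear_regression_reg(X, y, r=0.01):
    """Fit a linear model by solving the regularized normal equation."""
    # Prepend a column of ones so the first weight acts as the bias term.
    ones = np.ones(X.shape[0])
    X = np.column_stack([ones, X])

    # Solve (X^T X + r I) w = X^T y. Note that this simple form also
    # regularizes the bias weight.
    XTX = X.T.dot(X) + r * np.eye(X.shape[1])
    w = np.linalg.solve(XTX, X.T.dot(y))
    return w[0], w[1:]


# Toy data made up for illustration: target = 10 + 2 * feature.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([12.0, 14.0, 16.0, 18.0])

# With r=0.0 this reduces to ordinary least squares.
w0, w = train_linear_regression_reg(X, y, r=0.0)
y_pred = w0 + X.dot(w)
```

A small positive `r` trades a slightly biased fit for a better-conditioned system when features are correlated.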

Chapter 3: Machine Learning for Classification

  • Predicting customers who will churn with logistic regression
  • Doing exploratory data analysis for identifying important features
  • Encoding categorical variables to use them in machine learning models
  • Using logistic regression for classification

Code: chapter-03-churn-prediction/03-churn.ipynb
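
The encoding and logistic regression items above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper `one_hot_encode`, the `contract` column, and the weights are hypothetical, and a real model would learn its weights from data.

```python
import math


def one_hot_encode(records, column):
    """Return the sorted category names and one 0/1 row per record."""
    categories = sorted({record[column] for record in records})
    rows = [
        [1 if record[column] == category else 0 for category in categories]
        for record in records
    ]
    return categories, rows


def sigmoid(score):
    """Map a raw linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


# Made-up customers with a single categorical feature.
customers = [
    {"contract": "monthly"},
    {"contract": "yearly"},
    {"contract": "monthly"},
]
categories, encoded = one_hot_encode(customers, "contract")

# Hypothetical weights, one per encoded column; real ones come from training.
bias = -1.0
weights = [1.5, -0.5]
churn_probability = [
    sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
    for row in encoded
]
```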

Chapter 4: Evaluation Metrics for Classification

  • Accuracy as a way of evaluating binary classification models and its limitations
  • Determining where our model makes mistakes using a confusion table
  • Deriving other metrics like precision and recall from the confusion table
  • Using ROC and AUC to further understand the performance of a binary classification model
  • Cross-validating a model to make sure it behaves optimally
  • Tuning the parameters of a model to achieve the best predictive performance

Code: chapter-03-churn-prediction/04-metrics.ipynb
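
To make the confusion table and the metrics derived from it concrete, here is a small sketch in plain Python. The labels and predictions are made up, and the function name `confusion_table` is an assumption for this example.

```python
def confusion_table(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


# Made-up ground truth and hard predictions (1 = churn, 0 = no churn).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_table(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)
```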

Chapter 5: Deploying Machine Learning Models

  • Saving models with Pickle
  • Serving models with Flask
  • Managing dependencies with Pipenv
  • Making the service self-contained with Docker
  • Deploying it to the cloud using AWS Elastic Beanstalk

Code: chapter-05-deployment
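
The first step above, saving a model with Pickle, might look roughly like the following. This is a sketch under assumptions: the "model" is a plain dictionary of made-up weights standing in for a trained object, and `model.bin` is an arbitrary file name.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model: plain data is enough to
# show the save/load round trip.
model = {
    "features": ["tenure", "monthly_charges"],
    "weights": [-0.05, 0.02],
    "bias": -1.0,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.bin"

    # Save: serialize the object to a binary file.
    with open(model_path, "wb") as f_out:
        pickle.dump(model, f_out)

    # Load: a separate serving process would do only this part.
    with open(model_path, "rb") as f_in:
        restored = pickle.load(f_in)
```

Unpickling can execute arbitrary code, so only load files from a source you trust.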

Chapter 6: Decision Trees and Ensemble Learning

  • Predicting the risk of default with tree-based models
  • Decision trees and the decision tree learning algorithm
  • Random forest: putting multiple trees together into one model
  • Gradient boosting as an alternative way of combining decision trees

Code: chapter-06-trees/06-trees.ipynb
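
The core of decision tree learning is picking the split that leaves the purest groups. The sketch below finds the best threshold for a single feature, assuming misclassification rate as the impurity measure and a size-weighted average over the two sides; the function names and the toy data are made up for this example.

```python
def misclassification_rate(labels):
    """Fraction of labels that differ from the majority class."""
    if not labels:
        return 0.0
    majority = max(labels.count(0), labels.count(1))
    return 1 - majority / len(labels)


def best_split(values, labels):
    """Try each threshold and return the one with the lowest impurity."""
    best_threshold, best_impurity = None, float("inf")
    n = len(labels)
    # The largest value is skipped: it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        impurity = (
            len(left) / n * misclassification_rate(left)
            + len(right) / n * misclassification_rate(right)
        )
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Made-up data: customer assets and whether they defaulted (1) or not (0).
assets = [0, 2000, 3000, 4000, 5000, 8000]
default = [1, 1, 1, 0, 0, 0]

threshold, impurity = best_split(assets, default)
```

A full tree repeats this search over every feature and then recurses into each side of the chosen split.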

Chapter 7: Neural Networks and Deep Learning

  • Convolutional neural networks for image classification
  • TensorFlow and Keras — frameworks for building neural networks
  • Using pre-trained neural networks
  • Internals of a convolutional neural network
  • Training a model with transfer learning
  • Data augmentations — the process of generating more training data

Code: chapter-07-neural-nets/07-neural-nets-train.ipynb

Chapter 8: Serverless Deep Learning

  • Serving models with TensorFlow Lite, a lightweight environment for applying TensorFlow models
  • Deploying deep learning models with AWS Lambda
  • Exposing the Lambda function as a web service via API Gateway

Code: chapter-08-serverless
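
An AWS Lambda function written in Python is a handler that receives an event and a context object. The sketch below shows only that shape: `predict` is a hypothetical placeholder that returns made-up scores instead of running a TensorFlow Lite model, and the `url` field of the event is an assumption for this example.

```python
def predict(url):
    """Hypothetical placeholder for model inference.

    A real implementation would download the image at `url`, preprocess
    it, and run it through the model. Here the scores are made up.
    """
    return {"pants": 0.9, "shirt": 0.1}


def lambda_handler(event, context):
    """Entry point that Lambda calls with the request payload."""
    url = event["url"]
    return predict(url)


# Local smoke test: call the handler directly with a fake event.
result = lambda_handler({"url": "https://example.com/pants.jpg"}, None)
```

Calling the handler directly like this is a convenient way to check the logic before packaging the function.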

Chapter 9: Kubernetes and Kubeflow

Kubernetes:

  • Understanding different methods of deploying and serving models in the cloud
  • Serving Keras and TensorFlow models with TensorFlow-Serving
  • Deploying TensorFlow-Serving to Kubernetes

Code: chapter-09-kubernetes

Kubeflow:

  • Using Kubeflow and KFServing for simplifying the deployment process

Code: chapter-09-kubeflow

Appendices

Appendix A: Setting up the Environment

  • Installing Anaconda, a Python distribution that includes most of the scientific libraries we need
  • Running a Jupyter Notebook service from a remote machine
  • Installing and configuring the Kaggle command line interface tool for accessing datasets from Kaggle
  • Creating an EC2 machine on AWS using the web interface and the command-line interface

Code: no code

Appendix B: Introduction to Python

  • Basic Python syntax: variables and control-flow structures
  • Collections: lists, tuples, sets, and dictionaries
  • List comprehensions: a concise way of operating on collections
  • Reusability: functions, classes and importing code
  • Package management: using pip for installing libraries
  • Running Python scripts

Code: appendix-b-python.ipynb
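
As a quick taste of the comprehension syntax mentioned above, here is a small example with made-up data. It is an illustration only, not taken from the notebook.

```python
numbers = [1, 2, 3, 4, 5]

# Transform every element.
squares = [n * n for n in numbers]

# Transform only the elements that pass a filter.
even_squares = [n * n for n in numbers if n % 2 == 0]

# The same idea builds dictionaries.
lengths = {word: len(word) for word in ["list", "tuple", "set"]}
```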

Appendix C: Introduction to NumPy and Linear Algebra

  • One-dimensional and two-dimensional NumPy arrays
  • Generating NumPy arrays randomly
  • Operations with NumPy arrays: element-wise operations, summarizing operations, sorting and filtering
  • Multiplication in linear algebra: vector-vector, matrix-vector and matrix-matrix multiplications
  • Finding the inverse of a matrix and solving the normal equation

Code: appendix-c-numpy.ipynb
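
The multiplication and inverse topics above can be tried out in a few lines of NumPy. The arrays below are made up for illustration.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
A = np.array([[2.0, 0.0],
              [1.0, 3.0]])

# Vector-vector multiplication (dot product): 1*3 + 2*4.
uv = u.dot(v)

# Matrix-vector multiplication: each row of A dotted with v.
Av = A.dot(v)

# Matrix-matrix multiplication of A with its inverse gives the identity.
A_inv = np.linalg.inv(A)
identity = A.dot(A_inv)
```

When the goal is to solve a linear system such as the normal equation, `np.linalg.solve` is usually preferred over computing the inverse explicitly.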

Appendix D: Introduction to Pandas

  • The main data structures in Pandas: DataFrame and Series
  • Accessing rows and columns of a DataFrame
  • Element-wise and summarizing operations
  • Working with missing values
  • Sorting and grouping

Code: appendix-d-pandas.ipynb
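
The operations listed above fit into a short Pandas session. This is a sketch with a made-up DataFrame; the column names and values are assumptions for this example.

```python
import pandas as pd

# Made-up data with one missing price.
df = pd.DataFrame({
    "make": ["Nissan", "Hyundai", "Nissan", "Hyundai"],
    "price": [12000.0, None, 15000.0, 9000.0],
})

# Accessing a column gives a Series; iloc accesses a row by position.
prices = df["price"]
first_row = df.iloc[0]

# Working with missing values: count them, then fill them with zero.
missing_count = prices.isnull().sum()
prices_filled = prices.fillna(0)

# Sorting and grouping. mean() skips missing values by default.
sorted_df = df.sort_values(by="price")
mean_by_make = df.groupby("make")["price"].mean()
```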

Appendix E: AWS SageMaker

  • Increasing the GPU quota limits
  • Renting a Jupyter notebook with GPU in AWS SageMaker