The code from the Machine Learning Bookcamp book and a free course based on the book

Overview

Machine Learning Bookcamp

The code from the Machine Learning Bookcamp book

Useful links:

Machine Learning Zoomcamp

Machine Learning Zoomcamp is a course based on the book

  • It's online and free
  • You can join at any moment
  • More information in the course-zoomcamp folder

Reading Plan

Chapters

Chapter 1: Introduction to Machine Learning

  • Understanding machine learning and the problems it can solve
  • CRISP-DM: Organizing a successful machine learning project
  • Training and selecting machine learning models
  • Performing model validation

No code

Chapter 2: Machine Learning for Regression

  • Creating a car-price prediction project with a linear regression model
  • Doing an initial exploratory data analysis with Jupyter notebooks
  • Setting up a validation framework
  • Implementing the linear regression model from scratch
  • Performing simple feature engineering for the model
  • Keeping the model under control with regularization
  • Using the model to predict car prices

Code: chapter-02-car-price/02-carprice.ipynb

Chapter 3: Machine Learning for Classification

  • Predicting customers who will churn with logistic regression
  • Doing exploratory data analysis for identifying important features
  • Encoding categorical variables to use them in machine learning models
  • Using logistic regression for classification

Code: chapter-03-churn-prediction/03-churn.ipynb

Chapter 4: Evaluation Metrics for Classification

  • Accuracy as a way of evaluating binary classification models and its limitations
  • Determining where our model makes mistakes using a confusion table
  • Deriving other metrics like precision and recall from the confusion table
  • Using ROC and AUC to further understand the performance of a binary classification model
  • Cross-validating a model to make sure it behaves optimally
  • Tuning the parameters of a model to achieve the best predictive performance

Code: chapter-03-churn-prediction/04-metrics.ipynb

Chapter 5: Deploying Machine Learning Models

  • Saving models with Pickle
  • Serving models with Flask
  • Managing dependencies with Pipenv
  • Making the service self-contained with Docker
  • Deploying it to the cloud using AWS Elastic Beanstalk

Code: chapter-05-deployment

Chapter 6: Decision Trees and Ensemble Learning

  • Predicting the risk of default with tree-based models
  • Decision trees and the decision tree learning algorithm
  • Random forest: putting multiple trees together into one model
  • Gradient boosting as an alternative way of combining decision trees

Code: chapter-06-trees/06-trees.ipynb

Chapter 7: Neural Networks and Deep Learning

  • Convolutional neural networks for image classification
  • TensorFlow and Keras — frameworks for building neural networks
  • Using pre-trained neural networks
  • Internals of a convolutional neural network
  • Training a model with transfer learning
  • Data augmentations — the process of generating more training data

Code: chapter-07-neural-nets/07-neural-nets-train.ipynb

Chapter 8: Serverless Deep Learning

  • Serving models with TensorFlow-Lite — a light-weight environment for applying TensorFlow models
  • Deploying deep learning models with AWS Lambda
  • Exposing the Lambda function as a web service via API Gateway

Code: chapter-08-serverless

Chapter 9: Kubernetes and Kubeflow

Kubernetes:

  • Understanding different methods of deploying and serving models in the cloud.
  • Serving Keras and TensorFlow models with TensorFlow-Serving
  • Deploying TensorFlow-Serving to Kubernetes

Code: chapter-09-kubernetes

Kubeflow:

  • Using Kubeflow and KFServing for simplifying the deployment process

Code: chapter-09-kubeflow

Articles from mlbookcamp.com:

Appendices

Appendix A: Setting up the Environment

  • Installing Anaconda, a Python distribution that includes most of the scientific libraries we need
  • Running a Jupyter Notebook service from a remote machine
  • Installing and configuring the Kaggle command line interface tool for accessing datasets from Kaggle
  • Creating an EC2 machine on AWS using the web interface and the command-line interface

Code: no code

Articles from mlbookcamp.com:

Appendix B: Introduction to Python

  • Basic python syntax: variables and control-flow structures
  • Collections: lists, tuples, sets, and dictionaries
  • List comprehensions: a concise way of operating on collections
  • Reusability: functions, classes and importing code
  • Package management: using pip for installing libraries
  • Running python scripts

Code: appendix-b-python.ipynb

Articles from mlbookcamp.com:

Appendix C: Introduction to NumPy and Linear Algebra

  • One-dimensional and two-dimensional NumPy arrays
  • Generating NumPy arrays randomly
  • Operations with NumPy arrays: element-wise operations, summarizing operations, sorting and filtering
  • Multiplication in linear algebra: vector-vector, matrix-vector and matrix-matrix multiplications
  • Finding the inverse of a matrix and solving the normal equation

Code: appendix-c-numpy.ipynb

Articles from mlbookcamp.com:

Appendix C: Introduction to Pandas

  • The main data structures in Pandas: DataFrame and Series
  • Accessing rows and columns of a DataFrame
  • Element-wise and summarizing operations
  • Working with missing values
  • Sorting and grouping

Code: appendix-d-pandas.ipynb

Appendix D: AWS SageMaker

  • Increasing the GPU quota limits
  • Renting a Jupyter notebook with GPU in AWS SageMaker
You might also like...
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Examples and code for the Practical Machine Learning workshop series

Practical Machine Learning Workshop Series Practical Machine Learning for Quantitative Finance Post conference workshop at the WBS Spring Conference D

100 Days of Machine and Deep Learning Code

💯 Days of Machine Learning and Deep Learning Code MACHINE LEARNING TOPICS COVERED - FROM SCRATCH Linear Regression Logistic Regression K Means Cluste

Turns your machine learning code into microservices with web API, interactive GUI, and more.
Turns your machine learning code into microservices with web API, interactive GUI, and more.

Turns your machine learning code into microservices with web API, interactive GUI, and more.

TorchDrug is a PyTorch-based machine learning toolbox designed for drug discovery

A powerful and flexible machine learning platform for drug discovery

Machine learning template for projects based on sklearn library.

Machine learning template for projects based on sklearn library.

Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Painless Machine Learning for python based on scikit-learn

PlainML Painless Machine Learning Library for python based on scikit-learn. Install pip install plainml Example from plainml import KnnModel, load_ir

Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

Comments
  • Adding setup with docker

    Adding setup with docker

    Hi @alexeygrigorev ,

    I created a small guide for anyone who feels comfortable using Docker or might want to try it for setting up the environment.

    Since I saw a couple of questions today related to environment setup, I thought of sharing what I usually use when working on projects or courses, then it can be re-usable.

    Hoping is helpful :)

    Changelog:

    • Updated readme with link to guide to create docker container
    • Added new guide to build docker container and run it
    • Added Dockerfile and environment.yml
    opened by laurauzcategui 5
  • While converting keras to tflite error

    While converting keras to tflite error

    While converting keras to tflite error :

    raise ValueError('Unrecognized keyword arguments:', kwargs.keys()) ValueError: ('Unrecognized keyword arguments:', dict_keys(['ragged']))

    Traceback (most recent call last): File "convert.py", line 5, in <module> model = keras.models.load_model('xception_v4_large_08_0.894.h5')

    opened by saisubramani 5
  • notes correction in 06 Decision Trees...

    notes correction in 06 Decision Trees...

    Inside 02-data-prep.md , in the train/val/test split bullet note at the moment is : "Split the data with the distribution of 80% train, 20% validation, and 20% test sets with random seed to 11"

    should be:

    Split the data with the distribution of 60% train, 20% validation, and 20% test sets with random seed to 11

    opened by lucapug 4
  • Update homework.md

    Update homework.md

    Updated Question 4 text from "when one grows" to "when one grows up" and the F1 formula from "F1 = 2 * P * R / (P + R)" to "$$F1 = {2.}\frac{P . R}{P+R}$$"

    opened by ukokobili 3
Releases(chapter7-model)
Owner
Alexey Grigorev
Alexey Grigorev
Microsoft Machine Learning for Apache Spark

Microsoft Machine Learning for Apache Spark MMLSpark is an ecosystem of tools aimed towards expanding the distributed computing framework Apache Spark

Microsoft Azure 3.9k Dec 30, 2022
fastFM: A Library for Factorization Machines

Citing fastFM The library fastFM is an academic project. The time and resources spent developing fastFM are therefore justified by the number of citat

1k Dec 24, 2022
Python library for multilinear algebra and tensor factorizations

scikit-tensor is a Python module for multilinear algebra and tensor factorizations

Maximilian Nickel 394 Dec 09, 2022
Decision tree is the most powerful and popular tool for classification and prediction

Diabetes Prediction Using Decision Tree Introduction Decision tree is the most powerful and popular tool for classification and prediction. A Decision

Arjun U 1 Jan 23, 2022
Transform ML models into a native code with zero dependencies

m2cgen (Model 2 Code Generator) - is a lightweight library which provides an easy way to transpile trained statistical models into a native code

Bayes' Witnesses 2.3k Jan 03, 2023
QML: A Python Toolkit for Quantum Machine Learning

QML is a Python2/3-compatible toolkit for representation learning of properties of molecules and solids.

176 Dec 09, 2022
Fourier-Bayesian estimation of stochastic volatility models

fourier-bayesian-sv-estimation Fourier-Bayesian estimation of stochastic volatility models Code used to run the numerical examples of "Bayesian Approa

15 Jun 20, 2022
Uses WiFi signals :signal_strength: and machine learning to predict where you are

Uses WiFi signals and machine learning (sklearn's RandomForest) to predict where you are. Even works for small distances like 2-10 meters.

Pascal van Kooten 5k Jan 09, 2023
This repo includes some graph-based CTR prediction models and other representative baselines.

Graph-based CTR prediction This is a repository designed for graph-based CTR prediction methods, it includes our graph-based CTR prediction methods: F

Big Data and Multi-modal Computing Group, CRIPAC 47 Dec 30, 2022
Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)

FFT-accelerated Interpolation-based t-SNE (FIt-SNE) Introduction t-Stochastic Neighborhood Embedding (t-SNE) is a highly successful method for dimensi

Kluger Lab 547 Dec 21, 2022
Real-time stream processing for python

Streamz Streamz helps you build pipelines to manage continuous streams of data. It is simple to use in simple cases, but also supports complex pipelin

Python Streamz 1.1k Dec 28, 2022
In this Repo a simple Sklearn Model will be trained and pushed to MLFlow

SKlearn_to_MLFLow In this Repo a simple Sklearn Model will be trained and pushed to MLFlow Install This Repo is based on poetry python3 -m venv .venv

1 Dec 13, 2021
Send rockets to Mars with artificial intelligence(Genetic algorithm) in python.

Send Rockets To Mars With AI Send rockets to Mars with artificial intelligence(Genetic algorithm) in python. Tools Python 3 EasyDraw How to Play Insta

Mohammad Dori 3 Jul 15, 2022
Firebase + Cloudrun + Machine learning

A simple end to end consumer lending decision engine powered by Google Cloud Platform (firebase hosting and cloudrun)

Emmanuel Ogunwede 8 Aug 16, 2022
Summer: compartmental disease modelling in Python

Summer: compartmental disease modelling in Python Summer is a Python-based framework for the creation and execution of compartmental (or "state-based"

6 May 13, 2022
Required for a machine learning pipeline data preprocessing and variable engineering script needs to be prepared

Feature-Engineering Required for a machine learning pipeline data preprocessing and variable engineering script needs to be prepared. When the dataset

kemalgunay 5 Apr 21, 2022
Machine-Learning with python (jupyter)

Machine-Learning with python (jupyter) 머신러닝 야학 작심 10일과 쥬피터 노트북 기반 데이터 사이언스 시작 들어가기전 https://nbviewer.org/ 페이지를 통해서 쥬피터 노트북 내용을 볼 수 있다. 위 페이지에서 현재 레포 기

HyeonWoo Jeong 1 Jan 23, 2022
Kaggle Tweet Sentiment Extraction Competition: 1st place solution (Dark of the Moon team)

Kaggle Tweet Sentiment Extraction Competition: 1st place solution (Dark of the Moon team)

Artsem Zhyvalkouski 64 Nov 30, 2022
Gaussian Process Optimization using GPy

End of maintenance for GPyOpt Dear GPyOpt community! We would like to acknowledge the obvious. The core team of GPyOpt has moved on, and over the past

Sheffield Machine Learning Software 847 Dec 19, 2022
BASTA: The BAyesian STellar Algorithm

BASTA: BAyesian STellar Algorithm Current stable version: v1.0 Important note: BASTA is developed for Python 3.8, but Python 3.7 should work as well.

BASTA team 16 Nov 15, 2022