AI and Machine Learning with Kubeflow, Amazon EKS, and SageMaker

Overview

Data Science on AWS - O'Reilly Book

Open In SageMaker Studio Lab

Get the book on Amazon.com

Data Science on AWS

Book Outline

Book Outline

Quick Start Workshop (4-hours)

Workshop Paths

In this quick start hands-on workshop, you will build an end-to-end AI/ML pipeline for natural language processing with Amazon SageMaker. You will train and tune a text classifier to predict the star rating (1 is bad, 5 is good) for product reviews using the state-of-the-art BERT model for language representation. To build our BERT-based NLP text classifier, you will use a product reviews dataset where each record contains some review text and a star rating (1-5).

Quick Start Workshop Learning Objectives

Attendees will learn how to do the following:

  • Ingest data into S3 using Amazon Athena and the Parquet data format
  • Visualize data with pandas, matplotlib on SageMaker notebooks
  • Detect statistical data bias with SageMaker Clarify
  • Perform feature engineering on a raw dataset using Scikit-Learn and SageMaker Processing Jobs
  • Store and share features using SageMaker Feature Store
  • Train and evaluate a custom BERT model using TensorFlow, Keras, and SageMaker Training Jobs
  • Evaluate the model using SageMaker Processing Jobs
  • Track model artifacts using Amazon SageMaker ML Lineage Tracking
  • Run model bias and explainability analysis with SageMaker Clarify
  • Register and version models using SageMaker Model Registry
  • Deploy a model to a REST endpoint using SageMaker Hosting and SageMaker Endpoints
  • Automate ML workflow steps by building end-to-end model pipelines using SageMaker Pipelines

Extended Workshop (8-hours)

Workshop Paths

In the extended hands-on workshop, you will get hands-on with advanced model training and deployment techniques such as hyper-parameter tuning, A/B testing, and auto-scaling. You will also setup a real-time, streaming analytics and data science pipeline to perform window-based aggregations and anomaly detection.

Extended Workshop Learning Objectives

Attendees will learn how to do the following:

  • Perform automated machine learning (AutoML) to find the best model from just your dataset with low-code
  • Find the best hyper-parameters for your custom model using SageMaker Hyper-parameter Tuning Jobs
  • Deploy multiple model variants into a live, production A/B test to compare online performance, live-shift prediction traffic, and autoscale the winning variant using SageMaker Hosting and SageMaker Endpoints
  • Setup a streaming analytics and continuous machine learning application using Amazon Kinesis and SageMaker

Workshop Instructions

Open In SageMaker Studio Lab

Amazon SageMaker Studio Lab is a free service that enables anyone to learn and experiment with ML without needing an AWS account, credit card, or cloud configuration knowledge.

1. Request Amazon SageMaker Studio Lab Account

Go to Amazon SageMaker Studio Lab, and request a free acount by providing a valid email address.

Amazon SageMaker Studio Lab Amazon SageMaker Studio Lab - Request Account

Note that Amazon SageMaker Studio Lab is currently in public preview. The number of new account registrations will be limited to ensure a high quality of experience for all customers.

2. Create Studio Lab Account

When your account request is approved, you will receive an email with a link to the Studio Lab account registration page.

You can now create your account with your approved email address and set a password and your username. This account is separate from an AWS account and doesn't require you to provide any billing information.

Amazon SageMaker Studio Lab - Create Account

3. Sign in to your Studio Lab Account

You are now ready to sign in to your account.

Amazon SageMaker Studio Lab - Sign In

4. Select your Compute instance, Start runtime, and Open project

CPU Option

Select CPU as the compute type and click Start runtime.

Amazon SageMaker Studio Lab - CPU

Once the Status shows Running, click Open project

Amazon SageMaker Studio Lab - GPU Running

5. Launch a New Terminal within Studio Lab

Amazon SageMaker Studio Lab - New Terminal

6. Clone this GitHub Repo in the Terminal

Within the Terminal, run the following:

cd ~ && git clone https://github.com/data-science-on-aws/oreilly_book

Amazon SageMaker Studio Lab - Clone Repo

7. Create data_science_on_aws Conda kernel

Within the Terminal, run the following:

cd ~/oreilly_book/ && conda env create -f environment.yml || conda env update -f environment.yml && conda activate data_science_on_aws

Amazon SageMaker Studio Lab - Create Kernel

If you see an error like the following, just ignore it. This will appear if you already have an existing Conda environment with this name. In this case, we will update the environment.

CondaValueError: prefix already exists: /home/studio-lab-user/.conda/envs/data_science_on_aws

8. Start the Workshop!

Navigate to oreilly_book/00_quickstart/ in SageMaker Studio Lab and start the workshop!

You may need to refresh your browser if you don't see the new oreilly_book/ directory.

Amazon SageMaker Studio Lab - Start Workshop

When you open the notebooks, make sure to select the data_science_on_aws kernel.

Amazon SageMaker Studio Lab - Select Kernel

Owner
Data Science on AWS
Data Science on AWS
Data Science on AWS
Time series forecasting with PyTorch

Our article on Towards Data Science introduces the package and provides background information. Pytorch Forecasting aims to ease state-of-the-art time

Jan Beitner 2.5k Jan 02, 2023
Diabetes Prediction with Logistic Regression

Diabetes Prediction with Logistic Regression Exploratory Data Analysis Data Preprocessing Model & Prediction Model Evaluation Model Validation: Holdou

AZİZE SULTAN PALALI 2 Oct 23, 2021
A Python Module That Uses ANN To Predict A Stocks Price And Also Provides Accurate Technical Analysis With Many High Potential Implementations!

Stox A Module to predict the "close price" for the next day and give "technical analysis". It uses a Neural Network and the LSTM algorithm to predict

Stox 31 Dec 16, 2022
AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications.

AutoTabular AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just

wenqi 2 Jun 26, 2022
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

pmdarima Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time se

alkaline-ml 1.3k Jan 06, 2023
An implementation of Relaxed Linear Adversarial Concept Erasure (RLACE)

Background This repository contains an implementation of Relaxed Linear Adversarial Concept Erasure (RLACE). Given a dataset X of dense representation

Shauli Ravfogel 4 Apr 13, 2022
Code for the TCAV ML interpretability project

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) Been Kim, Martin Wattenberg, Justin Gilmer, C

552 Dec 27, 2022
Python implementation of the rulefit algorithm

RuleFit Implementation of a rule based prediction algorithm based on the rulefit algorithm from Friedman and Popescu (PDF) The algorithm can be used f

Christoph Molnar 326 Jan 02, 2023
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Machine Learning Notebooks, 3rd edition This project aims at teaching you the fundamentals of Machine Learning in python. It contains the example code

Aurélien Geron 1.6k Jan 05, 2023
Napari sklearn decomposition

napari-sklearn-decomposition A simple plugin to use with napari This napari plug

1 Sep 01, 2022
MLFlow in a Dockercontainer based on Azurite and Postgres

mlflow-azurite-postgres docker This is a MLFLow image which works with a postgres DB and a local Azure Blob Storage Instance (Azurite). This image is

2 May 29, 2022
Simple linear model implementations from scratch.

Hand Crafted Models Simple linear model implementations from scratch. Table of contents Overview Project Structure Getting started Citing this project

Jonathan Sadighian 2 Sep 13, 2021
Given the names and grades for each student in a class N of students, store them in a nested list and print the name(s) of any student(s) having the second lowest grade.

Hackerank-Nested-List Given the names and grades for each student in a class N of students, store them in a nested list and print the name(s) of any s

Sangeeth Mathew John 2 Dec 14, 2021
GAM timeseries modeling with auto-changepoint detection. Inspired by Facebook Prophet and implemented in PyMC3

pm-prophet Pymc3-based universal time series prediction and decomposition library (inspired by Facebook Prophet). However, while Faceook prophet is a

Luca Giacomel 314 Dec 25, 2022
Distributed scikit-learn meta-estimators in PySpark

sk-dist: Distributed scikit-learn meta-estimators in PySpark What is it? sk-dist is a Python package for machine learning built on top of scikit-learn

Ibotta 282 Dec 09, 2022
MLBox is a powerful Automated Machine Learning python library.

MLBox is a powerful Automated Machine Learning python library. It provides the following features: Fast reading and distributed data preprocessing/cle

Axel 1.4k Jan 06, 2023
Pandas DataFrames and Series as Interactive Tables in Jupyter

Pandas DataFrames and Series as Interactive Tables in Jupyter Star Turn pandas DataFrames and Series into interactive datatables in both your notebook

Marc Wouts 364 Jan 04, 2023
PennyLane is a cross-platform Python library for differentiable programming of quantum computers

PennyLane is a cross-platform Python library for differentiable programming of quantum computers. Train a quantum computer the same way as a neural ne

PennyLaneAI 1.6k Jan 01, 2023
Microsoft Machine Learning for Apache Spark

Microsoft Machine Learning for Apache Spark MMLSpark is an ecosystem of tools aimed towards expanding the distributed computing framework Apache Spark

Microsoft Azure 3.9k Dec 30, 2022
A single Python file with some tools for visualizing machine learning in the terminal.

Machine Learning Visualization Tools A single Python file with some tools for visualizing machine learning in the terminal. This demo is composed of t

Bram Wasti 35 Dec 29, 2022