A preparation toolkit for the Databricks Certified Associate Developer - Apache Spark exam. It sets up a single-node Spark Standalone cluster and provides the study material as Jupyter Notebooks.

Overview

Databricks Certification Spark

This toolkit sets up a single-node Spark Standalone cluster and provides preparation material in the form of Jupyter Notebooks for the Databricks Certified Associate Developer - Apache Spark exam. It is used extensively in our Udemy courses as well as in our upcoming guided programs for the same certification.

Udemy Courses

This GitHub repository can be used to set up a single-node Spark Standalone cluster along with JupyterLab to prepare for the Databricks Certified Associate Developer - Apache Spark exam. The accompanying Udemy courses cost at most $25, and we provide $10 coupons twice every month. The courses are also part of Udemy for Business.

Technologies Covered

Our custom image includes the following as a preparation toolkit for Databricks Certified Associate Developer - Apache Spark.

  • Apache Spark 3 on a Spark Standalone cluster
  • A Jupyter-based environment along with material for preparing for Databricks Certified Associate Developer - Apache Spark
  • If you set up the environment as instructed in our courses, you will also get the data sets as well as the material in the form of Jupyter Notebooks.

For video lectures, up-to-date material, and live support, feel free to sign up for our Udemy courses or our upcoming guided programs.

Setup Spark Lab for Databricks Certified Associate Developer - Apache Spark

Pre-requisites

Here are the pre-requisites to set up the lab.

  • Memory: 16 GB RAM
  • CPU: at least quad-core
  • If you are using Windows or Mac, make sure to set up Docker Desktop.
  • If your system does not meet these requirements, you need to set up the environment using AWS Cloud9.
  • Even with 16 GB RAM and a quad-core CPU, the system might slow down once the Docker containers start, because of the resources they require. You can always use AWS Cloud9 as a fallback option.
  • In my case, I will be demonstrating using Cloud9.
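
The minimums above can be checked with a small script. This is only a sketch: the helper name meets_requirements is hypothetical, the probes assume a Linux host (for example Cloud9) with /proc/meminfo and nproc, and nproc counts logical CPUs rather than physical cores.

```shell
#!/bin/sh
# Hypothetical helper: compare host resources with the minimums listed above.
MIN_MEM_GB=16
MIN_CPUS=4

# meets_requirements MEM_GB CPUS -> exit status 0 when both minimums are met
meets_requirements() {
  [ "$1" -ge "$MIN_MEM_GB" ] && [ "$2" -ge "$MIN_CPUS" ]
}

# Linux-only probes (assumption): /proc/meminfo reports MemTotal in kB.
if [ -r /proc/meminfo ] && command -v nproc >/dev/null 2>&1; then
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  # Round up, because the kernel reports slightly less than the installed RAM.
  mem_gb=$(( (mem_kb + 1048575) / 1048576 ))
  cpus=$(nproc)
  if meets_requirements "$mem_gb" "$cpus"; then
    echo "OK: ${mem_gb} GB RAM, ${cpus} CPUs"
  else
    echo "Below minimum: ${mem_gb} GB RAM, ${cpus} CPUs; consider AWS Cloud9"
  fi
fi
```

On Windows or Mac the probes are skipped, so read the values from the system settings instead.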

Configure Docker Desktop

If you are using Windows or Mac, you need to change the settings so that Docker can use as many resources as possible.

  • Go to Docker Desktop preferences.
  • Change memory to 12 GB.
  • Change CPUs to the maximum number.
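
To confirm that the new settings took effect, one option is to ask the Docker engine what it sees. The sketch below uses the NCPU and MemTotal fields of docker info; the function name docker_resources is hypothetical, and it assumes the Docker daemon is running.

```shell
#!/bin/sh
# Hypothetical helper: print the CPUs and memory available to the Docker engine.
docker_resources() {
  # NCPU is the CPU count; MemTotal is the total memory in bytes.
  info=$(docker info --format '{{.NCPU}} {{.MemTotal}}') || return 1
  cpus=${info% *}        # text before the space
  mem_bytes=${info#* }   # text after the space
  # Integer division truncates, so the GiB figure is rounded down.
  echo "CPUs: ${cpus}, memory: $(( mem_bytes / 1073741824 )) GiB"
}
```

Call docker_resources after restarting Docker Desktop. Because the figure is rounded down, the memory shown may be a little below the value chosen in the preferences.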

Setup Environment

Here are the steps to follow to set up the lab.

  • Clone the repository by running git clone https://github.com/itversity/databricks-certification-spark.

Pull the Image

The Spark image is moderately sized, close to 1.5 GB.

  • Make sure to pull it before running the docker-compose command to set up the lab.
  • You can pull the image using docker pull itversity/itvspark3.
  • You can validate whether the image was pulled successfully by running the docker images command.
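
The pull and the validation can be combined into one step. This is a minimal sketch with a hypothetical function name, pull_and_verify; it relies on docker images -q, which prints only image IDs.

```shell
#!/bin/sh
# Image name taken from the step above.
IMAGE="itversity/itvspark3"

# Hypothetical helper: pull the image, then confirm it is present locally.
pull_and_verify() {
  docker pull "$IMAGE" || return 1
  # `docker images -q` prints the image ID, or nothing when the image is absent.
  [ -n "$(docker images -q "$IMAGE")" ]
}
```

Run pull_and_verify && echo "image ready" before starting the environment.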

Start Environment

Here are the steps to start the environment.

  • Run docker-compose up -d --build itvspark3.
  • It will set up a single-node Spark Standalone cluster.
  • You can run docker-compose logs -f itvspark3 to review the progress. It will take some time to complete the setup process.
  • You can stop the environment using the docker-compose stop command.
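
The start step can be wrapped with a check that the container is actually running. This is a sketch under assumptions: the function names and the COMPOSE variable are hypothetical, and COMPOSE exists only so that the newer "docker compose" form can be substituted for docker-compose.

```shell
#!/bin/sh
# Compose command; override with COMPOSE="docker compose" if you use Compose v2.
COMPOSE=${COMPOSE:-docker-compose}
SERVICE="itvspark3"

# Hypothetical helper: succeed only if the service's container is running.
service_is_running() {
  # `ps -q` prints the container ID of the service, if one exists.
  cid=$($COMPOSE ps -q "$SERVICE") || return 1
  [ -n "$cid" ] || return 1
  [ "$(docker inspect -f '{{.State.Running}}' "$cid")" = "true" ]
}

# Hypothetical helper: start the service, then report whether it is running.
start_lab() {
  $COMPOSE up -d --build "$SERVICE" || return 1
  service_is_running
}
```

A running container does not mean the setup has finished, so the logs still need to be reviewed as described above.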

Access the Lab

Here are the steps to access the lab.

  • Make sure both the Postgres and Jupyter Lab containers are up and running by using docker-compose ps.
  • Get the token from the Jupyter Lab container using the command below.
docker-compose exec itvspark3 \
  sh -c "cat .local/share/jupyter/runtime/jpserver-*.json"
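
The command above prints the whole JSON file. To get only the token, the output can be filtered. This is a sketch: the function name extract_token is hypothetical, and it assumes the file contains a "token" field with a quoted string value.

```shell
#!/bin/sh
# Hypothetical helper: read the jpserver JSON on stdin and print the token value.
extract_token() {
  # Capture the quoted value that follows "token":, and keep the first match
  # in case several jpserver-*.json files were concatenated.
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (needs the running lab; -T disables the pseudo-tty so the pipe is clean):
#   docker-compose exec -T itvspark3 \
#     sh -c "cat .local/share/jupyter/runtime/jpserver-*.json" | extract_token
```

Paste the printed token into the Jupyter Lab login page.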

Access Databricks Certified Associate Developer - Apache Spark Material

Once you log in, you should be able to go through the modules under itversity-material to access the content.
