Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Overview

Data Scientist Learning Plan

Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials.

This learning path consists of several series of self-paced (E-Learning) courses and paid instructor-led courses. If you are interested in ILT, please be sure to search the course catalog for more information.

Learning Plan Structure

  • What is the Databricks Lakehouse Platform?

    This course (formerly Fundamentals of the Databricks Lakehouse Platform) is designed for everyone who is brand new to the Platform and wants to learn more about what it is, why it was developed, what it does, and the components that make it up.

    Our goal is that by the time you finish this course, you’ll have a better understanding of the Platform in general and be able to answer questions like: What is Databricks? Where does Databricks fit into my workflow? How have other customers been successful with Databricks?

    Learning objectives

    • Describe what the Databricks Lakehouse Platform is.
    • Explain the origins of the Lakehouse data management paradigm.
    • Outline fundamental problems that cause most enterprises to struggle with managing and making use of their data.
    • Identify the most popular components of the Databricks Lakehouse - Platform used by data practitioners, depending on their unique role.
    • Give examples of organizations that have used the Databricks Lakehouse Platform to streamline big data processing and analytics.
  • What is Delta Lake?

    Today, many organizations struggle with achieving successful big data and artificial intelligence (AI) projects. One of the biggest challenges they face is ensuring that quality, reliable data is available to data practitioners running these projects. After all, an organization that does not have reliable data will not succeed with AI. To help organizations bring structure, reliability, and performance to their data lakes, Databricks created Delta Lake.

    Delta Lake is an open format storage layer that sits on top of your organization’s data lake. It is the foundation of a cost-effective, highly scalable Lakehouse and is an integral part of the Databricks Lakehouse Platform.

    In this course (formerly Fundamentals of Delta Lake), we’ll break down the basics behind Delta Lake - what it does, how it works, and why it is valuable from a business perspective, to any organization with big data and AI projects.

    Learning objectives

    • Describe how Delta Lake fits into the Databricks Lakehouse Platform.
    • Explain the four elements encompassed by Delta Lake.
    • Summarize high-level Delta Lake functionality that helps organizations solve common challenges related to enterprise-scale data analytics.
    • Articulate examples of how organizations have employed Delta Lake on Databricks to improve business outcomes.
  • What is Databricks SQL?

    Databricks SQL offers SQL users a platform for querying, analyzing, and visualizing data. This course (formerly Fundamentals of Databricks SQL) guides users through the interface and demonstrates many of the tools and features available in the Databricks SQL interface.

    Learning objectives

    • Describe the basics of the Databricks SQL service.
    • Describe the benefits of using Databricks SQL to perform data analyses.
    • Describe how to complete a basic query, visualization, and dashboard workflow using Databricks SQL.
  • What is Databricks Machine Learning?

    Databricks Machine Learning offers data scientists and other machine learning practitioners a platform for completing and managing the end-to-end machine learning lifecycle. This course (formerly Fundamentals of Databricks Machine Learning) guides business leaders and practitioners through a basic overview of Databricks Machine Learning, the benefits of using Databricks Machine Learning, its fundamental components and functionalities, and examples of successful customer use.

    Learning objectives

    • Describe the basic overview of Databricks Machine Learning.
    • Identify how using Databricks Machine Learning benefits data science and machine learning teams.
    • Summarize the fundamental components and functionalities of Databricks Machine Learning.
    • Exemplify successful use cases of Databricks Machine Learning by real Databricks customers.
  • Fundamentals of the Databricks Lakehouse Platform Accreditation

  • Apache Spark Programming with Databricks

  • Certification Overview Course for the Databricks Certified Associate Developer for Apache Spark Exam

  • Getting Started with Databricks Machine Learning

  • Scaling Machine Learning Pipelines

Owner
Trung-Duy Nguyen
Trung-Duy Nguyen
A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).

This tutorial's purpose is to introduce Pythonistas to methods for scaling their data science and machine learning work to larger datasets and larger models, using the tools and APIs they know and lo

Coiled 102 Nov 10, 2022
CubingB is a timer/analyzer for speedsolving Rubik's cubes, with smart cube support

CubingB is a timer/analyzer for speedsolving Rubik's cubes (and related puzzles). It focuses on supporting "smart cubes" (i.e. bluetooth cubes) for recording the exact moves of a solve in real time.

Zach Wegner 5 Sep 18, 2022
DataPrep — The easiest way to prepare data in Python

DataPrep — The easiest way to prepare data in Python

SFU Database Group 1.5k Dec 27, 2022
PCAfold is an open-source Python library for generating, analyzing and improving low-dimensional manifolds obtained via Principal Component Analysis (PCA).

PCAfold is an open-source Python library for generating, analyzing and improving low-dimensional manifolds obtained via Principal Component Analysis (PCA).

Burn Research 4 Oct 13, 2022
Transform-Invariant Non-Negative Matrix Factorization

Transform-Invariant Non-Negative Matrix Factorization A comprehensive Python package for Non-Negative Matrix Factorization (NMF) with a focus on learn

EMD Group 6 Jul 01, 2022
LynxKite: a complete graph data science platform for very large graphs and other datasets.

LynxKite is a complete graph data science platform for very large graphs and other datasets. It seamlessly combines the benefits of a friendly graphical interface and a powerful Python API.

124 Dec 14, 2022
PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

PyStan PyStan is a Python interface to Stan, a package for Bayesian inference. Stan® is a state-of-the-art platform for statistical modeling and high-

Stan 229 Dec 29, 2022
PyPSA: Python for Power System Analysis

1 Python for Power System Analysis Contents 1 Python for Power System Analysis 1.1 About 1.2 Documentation 1.3 Functionality 1.4 Example scripts as Ju

758 Dec 30, 2022
A lightweight interface for reading in output from the Weather Research and Forecasting (WRF) model into xarray Dataset

xwrf A lightweight interface for reading in output from the Weather Research and Forecasting (WRF) model into xarray Dataset. The primary objective of

National Center for Atmospheric Research 43 Nov 29, 2022
CS50 pset9: Using flask API to create a web application to exchange stocks' shares.

C$50 Finance In this guide we want to implement a website via which users can “register”, “login” “buy” and “sell” stocks, like below: Background If y

1 Jan 24, 2022
API>local_db>AWS_RDS - Disclaimer! All data used is for educational purposes only.

APIlocal_dbAWS_RDS Disclaimer! All data used is for educational purposes only. ETL pipeline diagram. Aim of project By creating a fully working pipe

0 Apr 25, 2022
Business Intelligence (BI) in Python, OLAP

Open Mining Business Intelligence (BI) Application Server written in Python Requirements Python 2.7 (Backend) Lua 5.2 or LuaJIT 5.1 (OML backend) Mong

Open Mining 1.2k Dec 27, 2022
Projects that implement various aspects of Data Engineering.

DATAWAREHOUSE ON AWS The purpose of this project is to build a datawarehouse to accomodate data of active user activity for music streaming applicatio

2 Oct 14, 2021
Snakemake workflow for converting FASTQ files to self-contained CRAM files with maximum lossless compression.

Snakemake workflow: name A Snakemake workflow for description Usage The usage of this workflow is described in the Snakemake Workflow Catalog. If

Algorithms for reproducible bioinformatics (Koesterlab) 1 Dec 16, 2021
Predictive Modeling & Analytics on Home Equity Line of Credit

Predictive Modeling & Analytics on Home Equity Line of Credit Data (Python) HMEQ Data Set In this assignment we will use Python to examine a data set

Dhaval Patel 1 Jan 09, 2022
Python package for processing UC module spectral data.

UC Module Python Package How To Install clone repo. cd UC-module pip install . How to Use uc.module.UC(measurment=str, dark=str, reference=str, heade

Nicolai Haaber Junge 1 Oct 20, 2021
Clean and reusable data-sciency notebooks.

KPACUBO KPACUBO is a set Jupyter notebooks focused on the best practices in both software development and data science, namely, code reuse, explicit d

Matvey Morozov 1 Jan 28, 2022
PyClustering is a Python, C++ data mining library.

pyclustering is a Python, C++ data mining library (clustering algorithm, oscillatory networks, neural networks). The library provides Python and C++ implementations (C++ pyclustering library) of each

Andrei Novikov 1k Jan 05, 2023
Integrate bus data from a variety of sources (batch processing and real time processing).

Purpose: This is integrate bus data from a variety of sources such as: csv, json api, sensor data ... into Relational Database (batch processing and r

1 Nov 25, 2021
The Dash Enterprise App Gallery "Oil & Gas Wells" example

This app is based on the Dash Enterprise App Gallery "Oil & Gas Wells" example. For more information and more apps see: Dash App Gallery See the Dash

Austin Caudill 1 Nov 08, 2021