Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Overview

Data Scientist Learning Plan

Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials.

This learning path consists of several series of self-paced (E-Learning) courses and paid instructor-led courses. If you are interested in ILT, please be sure to search the course catalog for more information.

Learning Plan Structure

  • What is the Databricks Lakehouse Platform?

    This course (formerly Fundamentals of the Databricks Lakehouse Platform) is designed for everyone who is brand new to the Platform and wants to learn more about what it is, why it was developed, what it does, and the components that make it up.

    Our goal is that by the time you finish this course, you’ll have a better understanding of the Platform in general and be able to answer questions like: What is Databricks? Where does Databricks fit into my workflow? How have other customers been successful with Databricks?

    Learning objectives

    • Describe what the Databricks Lakehouse Platform is.
    • Explain the origins of the Lakehouse data management paradigm.
    • Outline fundamental problems that cause most enterprises to struggle with managing and making use of their data.
    • Identify the most popular components of the Databricks Lakehouse - Platform used by data practitioners, depending on their unique role.
    • Give examples of organizations that have used the Databricks Lakehouse Platform to streamline big data processing and analytics.
  • What is Delta Lake?

    Today, many organizations struggle with achieving successful big data and artificial intelligence (AI) projects. One of the biggest challenges they face is ensuring that quality, reliable data is available to data practitioners running these projects. After all, an organization that does not have reliable data will not succeed with AI. To help organizations bring structure, reliability, and performance to their data lakes, Databricks created Delta Lake.

    Delta Lake is an open format storage layer that sits on top of your organization’s data lake. It is the foundation of a cost-effective, highly scalable Lakehouse and is an integral part of the Databricks Lakehouse Platform.

    In this course (formerly Fundamentals of Delta Lake), we’ll break down the basics behind Delta Lake - what it does, how it works, and why it is valuable from a business perspective, to any organization with big data and AI projects.

    Learning objectives

    • Describe how Delta Lake fits into the Databricks Lakehouse Platform.
    • Explain the four elements encompassed by Delta Lake.
    • Summarize high-level Delta Lake functionality that helps organizations solve common challenges related to enterprise-scale data analytics.
    • Articulate examples of how organizations have employed Delta Lake on Databricks to improve business outcomes.
  • What is Databricks SQL?

    Databricks SQL offers SQL users a platform for querying, analyzing, and visualizing data. This course (formerly Fundamentals of Databricks SQL) guides users through the interface and demonstrates many of the tools and features available in the Databricks SQL interface.

    Learning objectives

    • Describe the basics of the Databricks SQL service.
    • Describe the benefits of using Databricks SQL to perform data analyses.
    • Describe how to complete a basic query, visualization, and dashboard workflow using Databricks SQL.
  • What is Databricks Machine Learning?

    Databricks Machine Learning offers data scientists and other machine learning practitioners a platform for completing and managing the end-to-end machine learning lifecycle. This course (formerly Fundamentals of Databricks Machine Learning) guides business leaders and practitioners through a basic overview of Databricks Machine Learning, the benefits of using Databricks Machine Learning, its fundamental components and functionalities, and examples of successful customer use.

    Learning objectives

    • Describe the basic overview of Databricks Machine Learning.
    • Identify how using Databricks Machine Learning benefits data science and machine learning teams.
    • Summarize the fundamental components and functionalities of Databricks Machine Learning.
    • Exemplify successful use cases of Databricks Machine Learning by real Databricks customers.
  • Fundamentals of the Databricks Lakehouse Platform Accreditation

  • Apache Spark Programming with Databricks

  • Certification Overview Course for the Databricks Certified Associate Developer for Apache Spark Exam

  • Getting Started with Databricks Machine Learning

  • Scaling Machine Learning Pipelines

Owner
Trung-Duy Nguyen
Trung-Duy Nguyen
Pyspark Spotify ETL

This is my first Data Engineering project, it extracts data from the user's recently played tracks using Spotify's API, transforms data and then loads it into Postgresql using SQLAlchemy engine. Data

16 Jun 09, 2022
A script to "SHUA" H1-2 map of Mercenaries mode of Hearthstone

lushi_script Introduction This script is to "SHUA" H1-2 map of Mercenaries mode of Hearthstone Installation Make sure you installed python=3.6. To in

210 Jan 02, 2023
Tablexplore is an application for data analysis and plotting built in Python using the PySide2/Qt toolkit.

Tablexplore is an application for data analysis and plotting built in Python using the PySide2/Qt toolkit.

Damien Farrell 81 Dec 26, 2022
PrimaryBid - Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift

Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift This project is composed of two parts: Part1 and Part2

Emmanuel Boateng Sifah 1 Jan 19, 2022
Projeto para realizar o RPA Challenge . Utilizando Python e as bibliotecas Selenium e Pandas.

RPA Challenge in Python Projeto para realizar o RPA Challenge (www.rpachallenge.com), utilizando Python. O objetivo deste desafio é criar um fluxo de

Henrique A. Lourenço 1 Apr 12, 2022
A set of tools to analyse the output from TraDIS analyses

QuaTradis (Quadram TraDis) A set of tools to analyse the output from TraDIS analyses Contents Introduction Installation Required dependencies Bioconda

Quadram Institute Bioscience 2 Feb 16, 2022
Finding project directories in Python (data science) projects, just like there R rprojroot and here packages

Find relative paths from a project root directory Finding project directories in Python (data science) projects, just like there R here and rprojroot

Daniel Chen 102 Nov 16, 2022
Python beta calculator that retrieves stock and market data and provides linear regressions.

Stock and Index Beta Calculator Python script that calculates the beta (β) of a stock against the chosen index. The script retrieves the data and resa

sammuhrai 4 Jul 29, 2022
An extension to pandas dataframes describe function.

pandas_summary An extension to pandas dataframes describe function. The module contains DataFrameSummary object that extend describe() with: propertie

Mourad 450 Dec 30, 2022
Show you how to integrate Zeppelin with Airflow

Introduction This repository is to show you how to integrate Zeppelin with Airflow. The philosophy behind the ingtegration is to make the transition f

Jeff Zhang 11 Dec 30, 2022
Data exploration done quick.

Pandas Tab Implementation of Stata's tabulate command in Pandas for extremely easy to type one-way and two-way tabulations. Support: Python 3.7 and 3.

W.D. 20 Aug 27, 2022
A collection of learning outcomes data analysis using Python and SQL, from DQLab.

Data Analyst with PYTHON Data Analyst berperan dalam menghasilkan analisa data serta mempresentasikan insight untuk membantu proses pengambilan keputu

6 Oct 11, 2022
The Master's in Data Science Program run by the Faculty of Mathematics and Information Science

The Master's in Data Science Program run by the Faculty of Mathematics and Information Science is among the first European programs in Data Science and is fully focused on data engineering and data a

Amir Ali 2 Jun 17, 2022
A simple and efficient tool to parallelize Pandas operations on all available CPUs

Pandaral·lel Without parallelization With parallelization Installation $ pip install pandarallel [--upgrade] [--user] Requirements On Windows, Pandara

Manu NALEPA 2.8k Dec 31, 2022
A tool to compare differences between dataframes and create a differences report in Excel

similarpanda A module to check for differences between pandas Dataframes, and generate a report in Excel format. This is helpful in a workplace settin

Andre Pretorius 9 Sep 15, 2022
This is a python script to navigate and extract the FSD50K dataset

FSD50K navigator This is a script I use to navigate the sound dataset from FSK50K.

sweemeng 2 Nov 23, 2021
For making Tagtog annotation into csv dataset

tagtog_relation_extraction for making Tagtog annotation into csv dataset How to Use On Tagtog 1. Go to Project Downloads 2. Download all documents,

hyeong 4 Dec 28, 2021
Minimal working example of data acquisition with nidaqmx python API

Data Aquisition using NI-DAQmx python API Based on this project It is a minimal working example for data acquisition using the NI-DAQmx python API. It

Pablo 1 Nov 05, 2021
A columnar data container that can be compressed.

Unmaintained Package Notice Unfortunately, and due to lack of resources, the Blosc Development Team is unable to maintain this package anymore. During

944 Dec 09, 2022