Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

Compulsory join Telegram Bot

mussjoin About Compulsory join Telegram Bot this Telegram Bot Application can be added users to Telegram Channel or Group compulsorily. in addition wh

Hamed Mohammadvand 4 Dec 03, 2021
Automatically deploy freqtrade to a remote Docker host and auto update strategies.

Freqtrade Automatically deploy freqtrade to a remote Docker host and auto update strategies. I've been using it to automatically deploy to vultr, but

p-zombie 109 Jan 07, 2023
E-Commerce Telegram Bot for UCA Students

ucaStudentStore To buy from and sell to other students Features Register the first time, after that you will always be recognised You can login either

Shukur Sabzaliev 5 Jun 26, 2022
Yet another discord-BOT

Note I have not added comments to the initial code as it is for my educational purpose. Use This is the code for a discord-BOT API py-cord-2.0.0a4178+

IRONMELTS 1 Dec 18, 2021
iCloudPy is a simple iCloud webservices wrapper library written in Python

iCloudPy 🤟 Please star this repository if you end up using the library. It will help me continue supporting this product. 🙏 iCloudPy is a simple iCl

Mandar Patil 49 Dec 26, 2022
Clippin n grafting Backend

Clipping' n Grafting Presenting you, 🎉 Clippin' n Grafting 🎉 , your very own ecommerce website displaying all your artsy-craftsy stuff. Not only the

Google-Developer-Student-Club-ISquareIT (GDSC I²IT) 2 Oct 22, 2021
YARSAW is an Async Python API Wrapper for the Random Stuff API.

Yet Another Random Stuff API Wrapper - YARSAW YARSAW is an Async Python API Wrapper for the Random Stuff API. This module makes it simpler for you to

Bruce 6 Mar 27, 2022
Converts between Spotify's new lyrics (and their proprietary format) to an LRC file for local playback.

spotify-lyrics-to-lrc Converts between Spotify's new lyrics (and their proprietary format) to an LRC file for local playback. How to use: Open Spotify

~noah~ 6 Nov 19, 2022
This is a simple Telegram bot to Delete User Messages based on Groupmembers Votes. Heroku deployable

ibCleaner Bot This is a simple Telegram bot to Delete User Messages based on Groupmembers Votes. Deploy to Heroku Deploy locally Edit config.py and ad

8 Oct 21, 2022
A wrapper for The Movie Database API v3 and v4 that only uses the read access token (not api key).

fulltmdb A wrapper for The Movie Database API v3 and v4 that only uses the read access token (not api key). Installation Use the package manager pip t

Jacob Hale 2 Sep 26, 2021
Leveraged grid-trading bot using CCXT/CCXT Pro library in FTX exchange.

Leveraged-grid-trading-bot The code is designed to perform infinity grid trading strategy in FTX exchange. The basic trader named Gridtrader.py contro

Hao-Liang Wen 25 Oct 07, 2021
A Rich renderable for viewing Multiple Sequence Alignments in the terminal.

rich-msa A simple module to render colorful Multiple Sequence Alignment with rich in the terminal. 🔧 Installing Install the rich-msa package directly

Martin Larralde 64 Dec 04, 2022
An Python SDK for QQ based on mirai-api-http v2.

Argon 一个基于 graia-broadcast 和 mirai-api-http v2 的 Python SDK。 本项目适用于 mirai-api-http 2.0 以上版本。 目前仍处于开发阶段,内部接口可能会有较大的变化。 The Stasis / 停滞 为维持 GraiaProject

BlueGlassBlock 1 Oct 29, 2021
Dodo - A graphical, hackable email client based on notmuch

Dodo Dodo is a graphical email client written in Python/PyQt5, based on the comm

Aleks Kissinger 44 Nov 12, 2022
Vladilena Mirize Music - Bot Music Telegram By @zenfrans

Vladilena Mirize Music - Bot Music Telegram By @zenfrans

Wahyusaputra 3 Feb 12, 2022
A simple Discord Mass Dm with Scraper

Python-Mass-DM A simple Discord Mass Dm with Scraper If Member Scraper in Taliban.py doesn't work. You can DM me cuz that scraper is for tokens that g

RyanzSantos 4 Sep 02, 2022
Utility for converting IP Fabric webhooks into a Teams format

IP Fabric Webhook Integration for Microsoft Teams and/or Slack Setup IP Fabric Setup Go to Settings Webhooks Add webhook Provide a name URL will b

Community Fabric 1 Jan 26, 2022
Discord Voice Channel Automatic Online

Discord-Selfbot-voice Features: Discord Voice Channel Automatic Online FAQ Q: How can I obtain my token? A: 1. How to obtain your token in android 2.

Pranav Ajay 3 Oct 31, 2022
Utilizing the freqtrade high-frequency cryptocurrency trading framework to build and optimize trading strategies. The bot runs nonstop on a Rasberry Pi.

Freqtrade Strategy Repository Please test all scripts and dry run them before using them in live mode Contact me on discord if you have any questions!

Michael Fourie 90 Jan 01, 2023
A multi purpose discord bot for python

Sypher The best multi purpose discord bot. Add Sypher right now Invite Me | Join

Johan Naizu 1 Dec 15, 2022