Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

A multipurpose, semi-modular Discord bot written in Python with the new discord.py module.

Discord.py Reaction Bot MIRAI KURIYAMA A multipurpose, semi-modular Discord bot written in Python with the new discord.py module. Installing dependenc

1 Dec 02, 2021
Apprise - Push Notifications that work with just about every platform!

ap·prise / verb To inform or tell (someone). To make one aware of something. Apprise allows you to send a notification to almost all of the most popul

Chris Caron 7.2k Jan 07, 2023
music recommend chat bot

Your Song A chat bot who can recommend music for you. Project Documents https://drive.google.com/drive/folders/1zbHbuRyrUgMrO-LtDXrXwqycN_ysuAUx Dir I

4 Mar 27, 2022
Send to Telegram, Vk, Discord

Triple send Версия для русских: здесь Demo: Telegram: @Triple_project_bot Discord: Triple project#0877 Vkontakte: @dev.santaspeen How to run Install r

2 Sep 27, 2022
A Simple, Easy to use and light-weight Pyrogram Userbot

Nexa Userbot A Simple, Easy to use and light-weight Pyrogram Userbot Deploy With Heroku With VPS (Local) Clone Nexa-Userbot repository git clone https

I'm Not A Bot #Left_TG 28 Nov 12, 2022
Project made to analyse movie trends

MovieTrends Project to analyse the daily movie trends from the website The Movie DataBase. The main idea is upload the results to a PostgreSQL server

Jazmín López Chacón 0 Feb 15, 2022
A Phyton script working for stream twits from twitter by tweepy to mongoDB

twitter-to-mongo A python script that uses the Tweepy library to pull Tweets with specific keywords from Twitter's Streaming API, and then stores the

3 Feb 03, 2022
HTTP Calls to Amazon Web Services Rest API for IoT Core Shadow Actions 💻🌐💡

aws-iot-shadow-rest-api HTTP Calls to Amazon Web Services Rest API for IoT Core Shadow Actions 💻 🌐 💡 This simple script implements the following aw

AIIIXIII 3 Jun 06, 2022
LoL 台版10周年活動自動輸入邀請碼

LoLTW_10Year_88Event LoLTW 8.8 周年慶 邀請碼自動輸入 設定 在 LoLTW_10Year_88Evnet.exe 的位置建立一個檔案 .env,內容如下 Bahamut_Discussion = https://forum.gamer.com.tw/C.php?bsn

古丁丁 5 Dec 13, 2021
Wordle-bot: A Discord bot to track you and your friends' Wordle scores.

wordle-bot A Discord bot to track you and your friends' Wordle scores, so you can see who's the best! To submit a score to wordle-bot, just paste the

Spencer Murray 8 Feb 16, 2022
Automated endpoint management for Amazon Aurora Global Database

This sample code can be used to manage Aurora global database endpoints. After failover the global database writer endpoints swap from one region to the other. This solution automates creation and ma

AWS Samples 13 Dec 08, 2022
The world's first public V2ray manager Telegram bot

📌 DarkV2ray-Manager-Bot 0.1 UPDATE 11/11/2021 Telegram bot v2ray Test user expired date data limit paylode && sni usage user on/off heroku bot hostin

@Dk_king_offcial 1 Nov 11, 2021
🔮 Uncover some followers of a private instagram account

Private Instagram Chaining 🔮 Uncover part of followers of an instagram private account I have this private instagram account julianakhao. I need to g

аэт 69 Dec 17, 2022
Telegram bot to stream videos in telegram voicechat for both groups and channels

Telegram bot to stream videos in telegram voicechat for both groups and channels. Supports live streams, YouTube videos and telegram media. With record stream support, Schedule streams, and many more

ALBY 9 Feb 20, 2022
Tglogging - A python package to send your app logs to a telegram chat in realtime

Telegram Logger A simple python package to send your app logs to a telegram chat

SUBIN 60 Dec 27, 2022
A complete Python application to automatize the process of uploading files to Amazon S3

Upload files or folders (even with subfolders) to Amazon S3 in a totally automatized way taking advantage of: Amazon S3 Multipart Upload: The uploaded

Pol Alzina 1 Nov 20, 2021
A Telegram Bot That Can Find Lyrics Of Song

Lyrics-Search-Bot A Telegram Bot That Can Find Lyrics Of Song A Simple Telegram Bot That Can Extract Lyrics Of Any Songs Deploy Commands start - To St

Muhammed Fazin 11 Oct 21, 2022
Ap lokit lokit

🎵 FANDA PROJECT 🎵 HAI AKU FANDA! Requirements 📝 FFmpeg NodeJS nodesource.com Python 3.8 or higher PyTgCalls MongoDB Get STRING_SESSION from below:

Fatur 2 Nov 18, 2021
Linky bot, A open-source discord bot that allows you to add links to ur website, youtube url, etc for the people all around discord to see!

LinkyBot Linky bot, An open-source discord bot that allows you to add links to ur website, youtube url, etc for the people all around discord to see!

AlexyDaCoder 1 Sep 20, 2022
Мои личные наработки по новому API Тинькофф. Не официально.

TinkoffNewAPI Мои личные наработки по новому API Тинькофф. Не официально. Официально по ссылке: https://github.com/Tinkoff/investAPI/ Выложено по прос

1 Jan 20, 2022