Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

A comand-line utility for taking automated screenshots of websites

shot-scraper A comand-line utility for taking automated screenshots of websites For background on this project see shot-scraper: automated screenshots

Simon Willison 837 Jan 07, 2023
Python binding for Terraform.

Python libterraform Python binding for Terraform. Installation $ pip install libterraform NOTE Please install version 0.3.1 or above, which solves the

Prodesire 28 Dec 29, 2022
A discord.py bot template with easy deployment through Github Actions

discord.py bot template A discord.py bot template with easy deployment through Github Actions. You can use this template to just run a Python instance

Thomas Van Iseghem 1 Feb 09, 2022
Unofficial Meteor Client wiki

Welcome to the Unofficial Meteor Client wiki! Meteor FAQs | A rewritten and better FAQ page. Installation Guide | A guide on how to install Meteor Cli

Anti Cope 0 Feb 21, 2022
Herramienta para transferir eventos de Sucuri WAF hacia Azure Blob Storage.

Transfiere eventos de Sucuri hacia Azure Blob Storage Script para transferir eventos del Sucuri Web Application Firewall (WAF) hacia Azure Blob Storag

CSIRT-RD 1 Dec 22, 2021
Python based Algo trading bot for Nifty / Banknifty futures and options

Fully automated Alice Blue Algo Trading with Python on NSE and MCX for Nifty / Crude / Banknifty futures and options , absolutely FREE ! This algo tra

Rajesh Sivadasan 49 Dec 31, 2022
Create Discord Accounts Semi-Automatically Without Captcha Solving API Key

Discord-Account-Generator Create Discord Accounts Semi-Automatically without captcha solving api key IMPORTANT: Your chromedriver version should be th

NotSakshyam 11 Mar 21, 2022
Simple Bot With Python 3.8+ For Converstaion Your Media

Media-Conversation Simple Bot With Python 3.8+ For Converstaion Your Media

Farzin 2 Dec 06, 2021
“ HOLA HUMANS 👋 I'M DAISYX 2.0 ❤️ „ LATEST VERSION OF DAISYX.. Source Code of @Daisyxbot

❤️ DaisyX 2.0 ❤️ A Powerful, Smart And Simple Group Manager ... Written with AioGram , Pyrogram and Telethon... ⭐️ Thanks to everyone who starred Dais

TeamDaisyX 153 Dec 06, 2022
This is a telegram bot hosted by a Raspberry Pi equipped with a temperature and humidity sensor. The bot is capable of sending plots and readings.

raspy-temperature-bot This is a telegram bot hosted by a Raspberry Pi equipped with a temperature and humidity sensor. The bot is capable of sending p

31 May 22, 2022
Trabalho N1 para a materia Tecnicas de Progamação da Anhembi Morumbi

Projeto da Anhembi Morumbi - Tecnicas de Programação. RPG de Console (CMD) Trabalho proposto pelo professor André Santana, na materia Tecnicas de Prog

Leonardo Silva M de Barros 3 Sep 12, 2021
Discord Bot written in Python that plays music in your voice channel

Discord Bot that plays music! I decided to create a simple Discord bot using Python in order to advance my coding skills. Please don't ask me for help

Eric Yeung 39 Jan 01, 2023
Univerity-student oriented (lithuanian) discord bot

Univerity-student oriented (lithuanian) discord bot

3 Nov 30, 2021
A Superfast SMS & Call bomber for Linux And Termux

PSKR_BOMBER 💣 📱 💀 A Superfast SMS & Call bomber for Linux And Termux ! Disclaimer This tool is for educational purposes only ! Don't use this to ta

1 Dec 20, 2021
This is Telegram Files Store Bot by @AbirHasan2005

PyroFilesStoreBot This is Telegram Parmanent Files Store Bot by @AbirHasan2005. Language: Python3 Library: Pyrogram Features: In PM Just Forward or Se

Abir Hasan 168 Dec 19, 2022
Protect Discord server invite link

DiscordOauth2Join Protect discord server invite links! Setup I will not help setting up the discord application, but just python. First, install the r

ZEEE 4 Aug 12, 2021
QR login for pyrogram client

Generate Pyrogram session via QRlogin

ポキ 18 Oct 21, 2022
WikipediaBot from mohirdev.uz

wiki-bot WikipediaBot from mohirdev.uz Requirements wikipedia aiogram Installing wiki/aiogram pip install wikipedia pip install aiogram

Muhammad Ali 5 Sep 28, 2022
An API Client package to access the APIs for NBA.com

nba_api An API Client package to access the APIs for NBA.com Development Version: v1.1.9 nba_api is an API Client for www.nba.com. This package is mea

Swar Patel 1.4k Jan 01, 2023
Script que envia e-mails de denúncia para desativar número de WhatsApp.

SpamReport (Alpha) Este script foi feito apenas para uso educacional, não me responsabilizo por qualquer uso indevido. Version: 1.0 Alpha Ative essa o

Kiny-Kiny 83 Dec 20, 2022