Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

The aim is to contain multiple models for materials discovery under a common interface

Aviary The aviary contains: - roost, - wren, cgcnn. The aim is to contain multiple models for materials discovery under a common interface Environment

Rhys Goodall 20 Jan 06, 2023
Materials for the AMS 2022 Student Conference Python Workshop.

AMS 2022 Student Conference Python Workshop Let's talk MetPy! Here you will find a collection of notebooks we will be demonstrating and working throug

Unidata 4 Dec 13, 2022
Verkehrsunfälle in Deutschland, aufgeschlüsselt nach Verkehrsmittel des Hauptverursachers und Nebenverursachers

How-To Einfach ./main.py ausführen mit der Statistik-Datei aus dem Ordner "Unfälle_mit_mehreren_Beteiligten" als erstem Argument. Requirements python,

4 Oct 12, 2022
Python client for CoinPayments API

pyCoinPayments - Python API client for CoinPayments Updates This library has now been converted to work with python3 This is an unofficial client for

James 27 Sep 21, 2022
This bot will pull a stream of tweets based on rules you set and automatically reply to them.

Twitter reply bot This bot will pull a stream of tweets based on rules you set and automatically reply to them. I built this bot in order to help comb

Brains 1 Feb 13, 2022
A jokes api python module

A jokes api python module

Fayas Noushad 3 Nov 28, 2021
Market calendar RESTful API with holiday, late open, and early close. Over 50+ unique exchange calendars for global equity and futures markets.

Trading Calendar Market calendar RESTful API with holiday, late open, and early close. Over 50+ unique exchange calendars for global equity and future

Apptastic Software 1 Feb 03, 2022
Script for downloading Coursera.org videos and naming them.

Coursera Downloader Coursera Downloader Introduction Features Disclaimer Installation instructions Recommended installation method for all Operating S

Coursera Downloader 9k Jan 02, 2023
A simple telegram bot to forward files from one channel to other.

Forward_2.0 Bot to forward messages from one channel to other without admin permission in source channel. Can be used for both private and Public chan

SUBIN 56 Dec 29, 2022
A basic Ubisoft API wrapper created in python.

UbisoftAPI A basic Ubisoft API wrapper created in python. I will be updating this with more endpoints as time goes on. Please note that this is my fir

Ethan 2 Oct 31, 2021
Cleiton Leonel 4 Apr 22, 2022
Instrument asyncio Python for distributed tracing with AWS X-Ray.

xraysink (aka xray-asyncio) Extra AWS X-Ray instrumentation to use distributed tracing with asyncio Python libraries that are not (yet) supported by t

Gary Donovan 12 Nov 10, 2022
Ever wanted a dashboard for making your antispam? This is it.

Ever wanted a dashboard for making your antispam? This is it.

Skelmis 1 Oct 27, 2021
A bot framework for Reddit to manage threads, wiki pages, widgets, menus and more.

Sub Manager Sub Manager is a bot framework for Reddit to automate a variety of tasks on one or more subreddits, and can be configured and run without

r/SpaceX 3 Aug 26, 2022
AWS DeepRacer Free Student Workshop: Run faster by using your custom waypoints

AWS DeepRacer Free Student Workshop: Run faster by using your custom waypoints Reward Function Template for waypoints def reward_function(params):

Yuen Cheuk Lam 88 Nov 27, 2022
in-progress decompilation of Gauntlet Legends decompression code on the N64

Gauntlet-Legends A in-progress decompilation of Gauntlet-Legends (N64) decompression code. This project currently supports the US release. Building (L

6 Jul 23, 2022
Telegram bot for stream music or video on telegram

KYURA MUSIC Telegram bot for stream music or video on telegram, powered by PyTgCalls and Pyrogram Help Need Help me to translate this repo, click the

0 Dec 08, 2022
A simple discord bot written in python which can surf subreddits, send a random meme, jokes and also weather of a given place

A simple Discord Bot A simple discord bot written in python which can surf subreddits, send a random meme, jokes and also weather of a given place. We

1 Jan 24, 2022
Telegram File to Link Fastest Bot , also used for movies streaming

Telegram File Stream Bot ! A Telegram bot to stream files to web. Report a Bug | Request Feature About This Bot This bot will give you stream links fo

Avishkar Patil 194 Jan 07, 2023
Connects to a local SenseCap M1 Helium Hotspot and pulls API Data.

sensecap_api_checker_HELIUM Connects to a local SenseCap M1 Helium Hotspot and pulls API Data.

Lorentz Factr 1 Nov 03, 2021