Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

A discord nuking tool made by python, this also has nuke accounts, inbuilt Selfbot, Massreport, Token Grabber, Nitro Sniper and ALOT more!

Disclaimer: Rage Multi Tool was made for Educational Purposes This project was created only for good purposes and personal use. By using Rage, you agr

†† 50 Jul 19, 2022
Simple Instagram Login Link Generator

instagram-account-login Simple Instagram Login Link Generator Info Program generates instagram login links and you may get into someone´s thought the

Kevin 5 Dec 03, 2022
A module grouping multiple translation APIs

translatepy (originally: translate) An aggregation of multiple translation API Translate, transliterate, get the language of texts in no time with the

349 Jan 06, 2023
Gets instagram public username and returns usefull informations like profilepic(b64), video_urls etc.

InstaSucker Gets instagram public username and returns usefull informations like profilepic(b64), video_urls etc. Information this project contains a

Armin Amiri 5 Apr 30, 2022
Discord-disnake - This package allows to use disnake without changing the discord namespace

This package is a shim This module allows to use disnake using discord namespace. This is not an independent library. Installing Python 3.8 or higher

5 Dec 13, 2022
Demo of using Telegram to send alert message

MIAI_Telegram Demo of using Telegram to send alert message Video link: https://youtu.be/oZ9CsIrlMgg #MìAI Fanpage: http://facebook.com/miaiblog Group

4 Jun 20, 2021
A multi exploit instagram exploitation framework

Instagram Exploitation Framework About IEF Is an open source Instagram Exploitation Framework with various Exploits that could be used to mod your pro

Instagram Exploitation Framework - BirdSecurity 1 May 23, 2022
Asynchronous Python API Wrapper for phisherman.gg

Asynchronous Python API Wrapper for phisherman.gg

Qrista Labs 4 Apr 30, 2022
🎄 JustaGrabber - A discord token grabber written in python3

🎄 JustaGrabber - A discord token grabber written in python3 🎇 Made by kldiscord https://github.com/kldiscord 🌟 Please leave a star if you liked Jus

1 Dec 19, 2022
PR Changes Matrix Builder

Pr-changes-matrix-builder - A Github Action that will output a variable to be used in a matrix strategy job based on a PR&'s changes

Kyle James Walker (he/him) 21 Oct 04, 2022
Pure Python implementation of the Windows API method IDvdInfo2::GetDiscID.

pydvdid-m Pure Python implementation of the Windows API method IDvdInfo2::GetDiscID. This is a modification of sjwood's pydvdid. The Windows API metho

4 Nov 22, 2022
A collection of scripts to steal BTC from Lightning Network enabled custodial services. Only for educational purpose! Share your findings only when design flaws are fixed.

Lightning Network Fee Siphoning Attack LN-fee-siphoning is a collection of scripts to subtract BTC from Lightning Network enabled custodial services b

Reckless_Satoshi 14 Oct 15, 2022
An Anime Themed Fast And Safe Group Managing Bot.

Ξ L I N Λ 👸 A Powerful, Smart And Simple Group Manager bot Avaiilable a latest version as Ξ L I N Λ 👸 on Telegram Self-hosting (For Devs) vps # Inst

7 Nov 12, 2022
Python client for using Prefect Cloud with Saturn Cloud

prefect-saturn prefect-saturn is a Python package that makes it easy to run Prefect Cloud flows on a Dask cluster with Saturn Cloud. For a detailed tu

Saturn Cloud 15 Dec 07, 2022
Python client for CoinPayments API

pyCoinPayments - Python API client for CoinPayments Updates This library has now been converted to work with python3 This is an unofficial client for

James 27 Sep 21, 2022
Discord CTF helper bot for CyberErudites

Eruditus - CTF helper bot Eruditus - CTF helper bot About Eruditus is a Discord CTF helper bot built with Python, it was initially designed to be used

Hafidh 34 Dec 30, 2022
Solcast rooftop api for HA

Solcast Solar Home Assistant(https://www.home-assistant.io/) Component This custom component integrates the Solcast API into Home Assistant. Modified

Greg 1 Oct 11, 2021
🔏 Discordちゃんねる ◆wGFzKUzY7E

使い方 discord.pyをインストール. python -m pip install -r requirements.txtを実行. bot.pyと同じ階層に.tokenを用意. bot.pyを実行. ※現状、使用しているライブラリの関係でWindowsOSは未対応です。 コマンド ニックネーム

Gattxxa 3 Feb 02, 2022
GitPython is a python library used to interact with Git repositories.

Gitoxide: A peek into the future… I started working on GitPython in 2009, back in the days when Python was 'my thing' and I had great plans with it. O

3.8k Jan 03, 2023
Create Fast and easy image datasets using reddit

Reddit-Image-Scraper Reddit Reddit is an American Social news aggregation, web content rating, and discussion website. Reddit has been devided by topi

Wasin Silakong 4 Apr 27, 2022