AWS Glue PySpark - Apache Hudi Quick Start Guide

Overview

AWS Glue PySpark - Apache Hudi Quick Start Guide

Disclaimer:

This is a quick start guide for the Apache Hudi Python Spark connector, running on AWS Glue.

It's also specifically configured for the following Glue version:

  • AWS Glue 3.0
    • Spark 3.1.1
    • Python 3.7

Glue Configuration Reference: https://docs.aws.amazon.com/glue/latest/dg/add-job.html

Apache Hudi Reference: https://hudi.apache.org/docs/quick-start-guide/ for more information

Prerequisites:

- Python 3.6 or higher
- AWS CLI - Profile named 'dev' with Administrator Access (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html)

Folder Structure:

glue-hudi-hello
├── README.md
├── cloud-formation
│   ├── command.md
│   └── GlueJobPySparkHudi.yaml
├── jars
│   ├── command.md
│   ├── hudi-spark3-bundle_2.12-0.9.0.jar
│   └── spark-avro_2.12-3.0.1.jar
├── job
│   ├── command.md
│   └── job.py
│   └── upload_job.py
├── requirements.txt

Step 1: Create and activate a virtualenv:

Create a new virtual environment for the project in its root directory:

python3 -m venv venv

Activate it:

source venv/bin/activate

Run from the root directory the pip install to get boto3.

pip install -r requirements.txt

Step 2: Create the AWS Resources:

Now, with a aws configured profile named as dev, cd into the cloud-formation folder and run the command in command.md.

As a AWS Cloud Formation exercise, read the command Parameters and how they are used on the GlueJobPySparkHudi.yaml file to dynamically create the Glue Job and S3 Bucket.

Step 3: Upload the Job and Jars to S3:

cd into the job folder and run the command in command.md.

cd into the jars folder and run the commands in command.md. Note: There is one command for each jar.

Step 4: Check AWS Resources results:

Log into aws console and check the Glue Job and S3 Bucket.

On the AWS Glue console, you can run the Glue Job by clicking on the job name.

After the job is finished, you can check the Glue Data Catalog and query the new database from AWS Athena.

On AWS Athena check for the database: hudi_demo and for the table: hudi_trips.

Owner
Gabriel Amazonas Mesquita
Gabriel Amazonas Mesquita
🔮 Uncover some followers of a private instagram account

Private Instagram Chaining 🔮 Uncover part of followers of an instagram private account I have this private instagram account julianakhao. I need to g

аэт 69 Dec 17, 2022
A Telegram bot that scrapes websites for available vaccination appointments to notify users. (German)

@dachau_impf_bot 🇬🇧 A Telegram bot to check the contents of https://termin.dachau-med.de for available slots and inform users of the available dates

1 Nov 21, 2021
Bot simply search for the files from provided channel according to given query and gives link to those files as buttons!

Auto Filter Bot ㅤㅤㅤㅤㅤㅤㅤ ㅤㅤㅤㅤㅤㅤㅤ You can call this as an Auto Filter Bot if you like :D Bot simply search for the files from provided channel according

TroJanzHEX 89 Nov 23, 2022
Video Stream: an Advanced Telegram Bot that's allow you to play Video & Music on Telegram Group Video Chat

Video Stream is an Advanced Telegram Bot that's allow you to play Video & Music

SHU KURENAI TEAM 4 Nov 05, 2022
A simple Discord bot that can fetch definitions and post them in chat.

A simple Discord bot that can fetch definitions and post them in chat. If you are connected to a voice channel, the bot will also read out the definition to you.

Tycho Bellers 4 Sep 29, 2022
Astro Bot With Golang

Astro-Bot Features: Astronomy Picture of the day Horoscope People In Space How we built it Our team was broken, one person didn't do anything the othe

Vaarun Sinha 1 Nov 21, 2021
Python Library for Secp256k1 Bitcoin curve to do fast ECC calculation

secp256k1 Python Library for Secp256k1 Bitcoin curve to do fast ECC calculation Example Usage import secp256k1 as ice print('[C]',privatekey_to_addres

iceland 49 Jan 01, 2023
Huggingface transformers for discord

disformers Huggingface transformers for discord base source butyr/huggingface-transformer-chatbots install pip install -U disformers example see examp

SpaceDEVofficial 1 Nov 09, 2021
Financial portfolio optimisation in python, including classical efficient frontier, Black-Litterman, Hierarchical Risk Parity

PyPortfolioOpt has recently been published in the Journal of Open Source Software 🎉 PyPortfolioOpt is a library that implements portfolio optimizatio

Robert Martin 3.2k Jan 02, 2023
Discord Rich Presence for Team Fortress 2

TF2 Rich Presence Discord Rich Presence for Team Fortress 2 Detects current game state, queue info, playtime, and more Configurable, reliable, and per

Kataiser 33 Dec 31, 2022
Probably Overengineered Unimore Booker

POUB Probably Overengineered Unimore Booker A python-powered, actor-based, telegram-facing, timetable-aware booker for unimore (if you know more adjec

Lorenzo Rossi 3 Feb 20, 2022
Disco is an extensive and extendable Python 2.x/3.x library for the Discord API.

disco Disco is an extensive and extendable Python 2.x/3.x library for the Discord API. Disco boasts the following major features: Expressive, function

1 Nov 18, 2021
Cities bot - A simple example of using aiogram and the wikipedia package

Cities game A simple example of using aiogram and the wikipedia package. The bot

Artem Meller 2 Jan 29, 2022
Bot playing "mathbattle" game from Telegram messenger

mathbattlebot Bot playing mathbattle game from Telegram messenger Installing: run in command line pip3 install -r requirements.txt Running: Example c

Egor 1 May 30, 2022
A listener for RF >= 4.0 that prints a Stack Trace to console to faster find the code section where the failure appears.

robotframework-stacktrace A listener for RF = 4.0 that prints a Stack Trace to console to faster find the code section where the failure appears. Ins

marketsquare 16 Nov 24, 2022
Ap lokit lokit

🎵 FANDA PROJECT 🎵 HAI AKU FANDA! Requirements 📝 FFmpeg NodeJS nodesource.com Python 3.8 or higher PyTgCalls MongoDB Get STRING_SESSION from below:

Fatur 2 Nov 18, 2021
A Pythonic client for the official https://data.gov.gr API.

pydatagovgr An unofficial Pythonic client for the official data.gov.gr API. Aims to be an easy, intuitive and out-of-the-box way to: find data publish

Ilias Antonopoulos 40 Nov 10, 2022
music recommend chat bot

Your Song A chat bot who can recommend music for you. Project Documents https://drive.google.com/drive/folders/1zbHbuRyrUgMrO-LtDXrXwqycN_ysuAUx Dir I

4 Mar 27, 2022
Anti Spam/NSFW Telegram Bot Written In Python With Pyrogram.

Anti Spam/NSFW Telegram Bot Written In Python With Pyrogram.

Wahyusaputra 2 Dec 29, 2021
A Discord Token Spammer, multi webhooks compatibility, made in python +3.7. By Ezermoz

DiscordWebhookSpammer A Discord Token Spammer, multi webhooks compatibility, made in python +3.7. By Ezermoz Put you webhook in webhooks.txt if you wa

3 Nov 24, 2021