AWS Glue PySpark - Apache Hudi Quick Start Guide

Overview

AWS Glue PySpark - Apache Hudi Quick Start Guide

Disclaimer:

This is a quick start guide for the Apache Hudi Python Spark connector, running on AWS Glue.

It's also specifically configured for the following Glue version:

  • AWS Glue 3.0
    • Spark 3.1.1
    • Python 3.7

Glue Configuration Reference: https://docs.aws.amazon.com/glue/latest/dg/add-job.html

Apache Hudi Reference: https://hudi.apache.org/docs/quick-start-guide/ for more information

Prerequisites:

- Python 3.6 or higher
- AWS CLI - Profile named 'dev' with Administrator Access (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html)

Folder Structure:

glue-hudi-hello
├── README.md
├── cloud-formation
│   ├── command.md
│   └── GlueJobPySparkHudi.yaml
├── jars
│   ├── command.md
│   ├── hudi-spark3-bundle_2.12-0.9.0.jar
│   └── spark-avro_2.12-3.0.1.jar
├── job
│   ├── command.md
│   └── job.py
│   └── upload_job.py
├── requirements.txt

Step 1: Create and activate a virtualenv:

Create a new virtual environment for the project in its root directory:

python3 -m venv venv

Activate it:

source venv/bin/activate

Run from the root directory the pip install to get boto3.

pip install -r requirements.txt

Step 2: Create the AWS Resources:

Now, with a aws configured profile named as dev, cd into the cloud-formation folder and run the command in command.md.

As a AWS Cloud Formation exercise, read the command Parameters and how they are used on the GlueJobPySparkHudi.yaml file to dynamically create the Glue Job and S3 Bucket.

Step 3: Upload the Job and Jars to S3:

cd into the job folder and run the command in command.md.

cd into the jars folder and run the commands in command.md. Note: There is one command for each jar.

Step 4: Check AWS Resources results:

Log into aws console and check the Glue Job and S3 Bucket.

On the AWS Glue console, you can run the Glue Job by clicking on the job name.

After the job is finished, you can check the Glue Data Catalog and query the new database from AWS Athena.

On AWS Athena check for the database: hudi_demo and for the table: hudi_trips.

Owner
Gabriel Amazonas Mesquita
Gabriel Amazonas Mesquita
Implementation of the paper 'Sentence Bottleneck Autoencoders from Transformer Language Models'

Introduction This repository contains the code for the paper Sentence Bottleneck Autoencoders from Transformer Language Models by Ivan Montero, Nikola

Ivan Montero 14 Dec 28, 2022
asyncio client for Deta Cloud

aiodeta Unofficial client for Deta Clound Install pip install aiodeta Supported functionality Deta Base Deta Drive Decorator for cron tasks Examples i

Andrii Leitsius 19 Feb 14, 2022
A Slack bot for playing Texas Hold 'Em where the currency is various workout tasks e.g. pushups

A Slack app/bot for playing Texas Hold 'Em where the currency is various workout tasks e.g. pushups. The intent is to make the workday more fun & active for remote teams.

Kyle McIntyre 3 Sep 19, 2022
Sakura: an powerfull Autofilter bot that can be used in your groups

Sakura AutoFilter This Bot May Look Like Mwk_AutofilterBot And Its Because I Like Its UI, That's All Sakura is an powerfull Autofilter bot that can be

PaulWalker 12 Oct 21, 2022
Simple Bot With Python 3.8+ For Converstaion Your Media

Media-Conversation Simple Bot With Python 3.8+ For Converstaion Your Media

Farzin 2 Dec 06, 2021
This automation protect against subdomain takeover on AWS env which also send alerts on slack.

AWS_Subdomain_Takeover_Detector Purpose The purpose of this automation is to detect misconfigured Route53 entries which are vulnerable to subdomain ta

Puneet Kumar Maurya 8 May 18, 2022
A PowerPacked Version Of Telegram Leech Bot With Modern Easy-To-Use Interface & UI !

FuZionX Leech Bot A Powerful Telegram Leech Bot Modded by MysterySD to directly Leech to Telegram, with Multi Direct Links Support for Enhanced Leechi

MysterySD 28 Oct 09, 2022
This Telegram bot allows you to create direct links with pre-filled text to WhatsApp Chats

WhatsApp API Bot Telegram bot to create direct links with pre-filled text for WhatsApp Chats You can check our bot here. The bot is based on the API p

RobotTrick • רובוטריק 17 Aug 20, 2022
C Y B Ξ R UserBot is a project that simplifies the use of Telegram.

C Y B Ξ R USΞRBOT 🇦🇿 C Y B Ξ R UserBot is a project that simplifies the use of Telegram. All rights reserved. Automatic Setup Android: open Termux p

FVREED 4 Dec 07, 2022
Python Dialogflow CX Scripting API (SCRAPI)

Python Dialogflow CX Scripting API (SCRAPI) A high level scripting API for bot builders, developers, and maintainers. Table of Contents Introduction W

Google Cloud Platform 39 Dec 09, 2022
A custom rom post bot for Telegram.

Rom Poster Bot A simple Post Bot written in Python using pyTelegramBotAPI to post rom updates to telegram whenever you need. Made by lazy peep for laz

Prajwal 6 Nov 03, 2022
Braintree Python library

Braintree Python library The Braintree Python library provides integration access to the Braintree Gateway. TLS 1.2 required The Payment Card Industry

Braintree 230 Dec 18, 2022
OpenVisionAPI client

OpenVisionAPI Client 🚀 Getting Started Prerequisites Installing Install the dependencies $ make setup Usage $ source .venv/bin/activate $ ./ova_clie

Open Vision API 40 Nov 11, 2022
Telegram Bot to Connect Strangers

Telegram Bot to Connect Strangers How to Run Set your telegram bot token as environment variable TELEGRAM_BOT_TOKEN: export TELEGRAM_BOT_TOKEN=your_t

PyTopia 12 Dec 24, 2022
Using Streamlit to build a simple UI on top of the OpenSea API

OpenSea API Explorer Using Streamlit to build a simple UI on top of the OpenSea API. 🤝 Contributing Contributions, issues and feature requests are we

Gavin Capriola 1 Jan 04, 2022
This is a Anti Channel Ban Robots

AntiChannelBan This is a Anti Channel Ban Robots delete and ban message sent by channels Heroku Deployment 💜 Heroku is the best way to host ur Projec

BᵣₐyDₑₙ 25 Dec 10, 2021
Discord bot do sprawdzania ceny pizzy.

Discord bot do sprawdzania ceny pizzy w pizzeri Bombola. Umieszczony jest na platformie Heroku, dzięki czemu działa 24/7. Commands List Info: Jako com

1 Sep 18, 2021
Open Source Discord bot with many cool features like Weather, Balance, Avatar, User, Server, RP-commands, Gif search, YouTube search, VK post search etc.

Сокобот Дискорд бот с открытым исходным кодом. Содержит в себе экономику, полезные команды (!аватар, !юзер, !сервер и тд.), рп-команды (!обнять, !глад

serverok 2 Jan 16, 2022
Google translator bot using pyTelegramBotAPI

iTranslator-bot Super google translator bot using pyTelegramBotAPI A bot is a professional bot that automatically detects a language in texts or capti

Abdulatif 6 Nov 22, 2022
Make low level API wrapper in fast, easy.

The lowrapper is a library for quickly and easily creating an environment for tapping the API without implementation.

tasuren 1 Oct 25, 2022