Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for the video demonstration detailed in the post, Building a Simple Data Lake on AWS. Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture


TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (originally _raw in video)               |
| _raw        | Raw/Bronze data from data sources (originally _converted in video) |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+
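
The naming convention can be checked mechanically. The following is a minimal sketch, assuming the AWS Glue Data Catalog table names begin with source_, raw_, refined_, or agg_; the stage_of helper and the sample table names are hypothetical and are not part of the original source code.

```shell
# Map a table name to its data lake stage based on its prefix.
# Assumption: tables are prefixed source_, raw_, refined_, or agg_.
stage_of() {
  case "$1" in
    source_*)  echo "source" ;;
    raw_*)     echo "raw (bronze)" ;;
    refined_*) echo "refined (silver)" ;;
    agg_*)     echo "aggregated (gold)" ;;
    *)         echo "unknown" ;;
  esac
}

# Hypothetical table names, for illustration only
for name in source_tickit_saas_event raw_tickit_saas_event \
            refined_tickit_saas_event agg_tickit_sales_by_category; do
  printf '%s -> %s\n' "${name}" "$(stage_of "${name}")"
done
```

A helper like this could be combined with the output of aws glue get-tables to group the tables of a database by stage.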

AWS CLI Commands

Two small changes were made to the source code, compared to the video demonstration, to help clarify the flow of data. The prefix of the (7) data source AWS Glue Data Catalog tables was switched from raw_ to source_. Also, the prefix of the (7) Raw/Bronze AWS Glue Data Catalog tables was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

# Name of the S3 bucket used for the data lake (replace with your own)
DATA_LAKE_BUCKET="your-data-lake-bucket"

# Clean up from any previous run: delete all objects under the tickit/ prefix
aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

# Delete and recreate the AWS Glue Data Catalog database
aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

# List the tables in the database (expected to be empty at this point)
aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

# Start the three crawlers that catalog the data sources
aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

# List only the source_ tables
aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*" \
  --output table

# Start the Raw/Bronze jobs
aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

# Start the Refined/Silver jobs
aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

# List all tables in the database again
aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

# List the objects written to the data lake bucket under the tickit/ prefix
aws s3api list-objects-v2 \
  --bucket "${DATA_LAKE_BUCKET}" \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
  --output table
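
The fourteen start-job-run commands above differ only in table name and stage, so they can be generated in a loop. The following is a minimal sketch, assuming the job names keep the tickit_public_<table>_<stage> pattern shown above. The TABLES variable, the job_names and start_stage helpers, and the DRY_RUN switch are illustrative names that are not part of the original source code. By default the sketch only prints the commands it would run.

```shell
# Tables covered by the demo jobs (taken from the job names above)
TABLES="category date event listing sales users venue"

# Print the job names for one stage ("raw" or "refine"), one per line.
job_names() {
  for table in ${TABLES}; do
    echo "tickit_public_${table}_$1"
  done
}

# Start every job for one stage.
# With DRY_RUN=1 (the default in this sketch), only print the commands.
start_stage() {
  for job in $(job_names "$1"); do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "aws glue start-job-run --job-name ${job}"
    else
      aws glue start-job-run --job-name "${job}"
    fi
  done
}

start_stage raw
start_stage refine
```

Set DRY_RUN=0 to actually start the jobs. Note that start-job-run returns as soon as the run is submitted, so the refine jobs should only be started once the raw jobs have finished.
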
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer