Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
OpenShot Video Editor is an award-winning free and open-source video editor for Linux, Mac, and Windows, and is dedicated to delivering high quality video editing and animation solutions to the world.

OpenShot Video Editor is an award-winning free and open-source video editor for Linux, Mac, and Windows, and is dedicated to delivering high quality v

OpenShot Studios, LLC 3.1k Jan 01, 2023
Create a Video Membership app using FastAPI & NoSQL

Video Membership Create a Video Membership app using FastAPI & NoSQL. In this series, we're going to explore building a membership application using F

Coding For Entrepreneurs 69 Dec 25, 2022
Become a virtual character with just your webcam!

Become a virtual character with just your webcam!

Rich 300 Jan 03, 2023
Script simples para baixar vídeos/áudios/playlist do YouTube

🔗 VilelaTube ▶️ Script simples para baixar vídeos/áudios/playlist do YouTube Requisitos • Como usar • Melhorias futuras ⚠️ Atenção! ⚠️ Lembre-se de a

João Victor Vilela dos Santos 2 Nov 03, 2021
Simple background blur for your webcam

backgroundblur Simple background blur for your webcam. This script will capture your webcams output, add a blur effect to the background and output th

Stefan Wagner 4 Dec 07, 2021
Python package for Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.

The Python package for near duplicate video detection ⭐️ Introduction Videohash is a Python package for detecting near-duplicate videos (Perceptual Vi

Akash Mahanty 144 Dec 19, 2022
Stream music with ffmpeg and python

youtube-stream Stream music with ffmpeg and python original Usage set the KEY in stream.sh run server.py run stream.sh (You can use Git bash or WSL in

Giyoung Ryu 14 Nov 17, 2021
Python program - to extract slides from videos

Programa em Python - que fiz em algumas horas e que provavelmente tem bugs - para extrair slides de vídeos.

Natanael Antonioli 45 Nov 12, 2022
Automatic video generator for local news

Automatic video generator for local news

Gabriel Monteiro 2 Jan 11, 2022
Wonkey - an open source programming language for the creation of cross-platform video games

Wonkey Programming Language Wonkey is an open source programming language for the creation of cross-platform video games, highly inspired by the “Blit

Wonkey Coders 110 Nov 09, 2022
Stream-Cli application that allow you to play your favorite movies from the terminal

Stream-Cli application that allow you to play your favorite movies from the terminal

redouane 380 Jan 08, 2023
A way to run youtube videos in TTY

TTY youtube client its finally here, the one thing literally NO ONE ASKED FOR!! A way to run youtube videos in TTY Dependencies: (pip) yt-search (syst

1 Nov 28, 2021
Adblocker for movie subtitles.

SubAdBlock Adblocker for movie subtitles. Usage Place "main.py" and "config.conf" in directory with subtitles and run "main.py". It will automatically

Marko Živić 1 Jan 09, 2022
Media player custom component which works with MQTT.

Media player custom component which works with MQTT. I designed this to specifically work with a ESP32 which i used to control a speakercraft amp.

2 Feb 10, 2022
pygamevideo module helps developer to embed videos into their Pygame display

pygamevideo module helps developer to embed videos into their Pygame display. Audio playback doesn't use pygame.mixer.

Kadir Aksoy 10 Dec 28, 2022
video streaming userbot (vsu) based on pytgcalls for streaming video trought the telegram video chat group.

VIDEO STREAM USERBOT ✨ an another telegram userbot for streaming video trought the telegram video chat. Environmental Variables 📌 API_ID : Get this v

levina 6 Oct 17, 2021
Real-time video and audio streams over the network, with Streamlit.

streamlit-webrtc Example You can try out the sample app using the following commands.

Yuichiro Tachibana (Tsuchiya) 648 Jan 01, 2023
Source files for the data lake demo video using the AWS TICKIT database

Data Lake Demo Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a

Gary A. Stafford 97 Dec 23, 2022
Python script for extracting audio from video files and creating Mel spectrograms

video2spectrogram About This package is meant to automate the process of extracting audio files from videos and saving the plots computed from these a

Alexandros Stergiou 1 Oct 28, 2021
Program to play videos with props in Apex Legends

R5Fresh A video player for the Apex Legends mod R5Reloaded

9 Nov 13, 2022