Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
A Python extension that provides bindings to WebRTC M92

This project follows the W3C specification with some modifications and additions to make it work better with Python applications, with useful APIs like programmatic audio and video.

Il'ya 104 Dec 26, 2022
Python retagging utility for mkv files using mkvmerge.

pyretag A python script to retag mkv files. Setting Up pip install pyfiglet pip install rich Move the mkv files to input folder.

25 Dec 04, 2022
A platform which give you info about the newest video on a channel

youtube A platform which give you info about the newest video on a channel. This uses web scraping, a better implementation will be to use the API. BR

Custom components for Home Assistant 36 Sep 29, 2022
Boltstream Live Video Streaming Website + Backend

Boltstream Self-hosted Live Video Streaming Website + Backend Reference

Ben Wilber 1.7k Dec 28, 2022
Rune - a video miniplayer made with Python.

Rune - a video miniplayer made with Python.

1 Dec 13, 2021
Streamlink is a CLI utility which pipes video streams from various services into a video player

Streamlink is a CLI utility which pipes video streams from various services into a video player

8.2k Dec 26, 2022
Rembg Video Virtual Green Screen Edition

Rembg Virtual Greenscreen Edition is a tool to create a green screen matte for videos

Tim Scarfe 217 Jan 06, 2023
Wonkey - an open source programming language for the creation of cross-platform video games

Wonkey Programming Language Wonkey is an open source programming language for the creation of cross-platform video games, highly inspired by the “Blit

Wonkey Coders 110 Nov 09, 2022
Converts Betaflight blackbox gyro to MP4 GoPro Meta data so it can be used with ReelSteady GO

Here are a bunch of scripts that I created some time ago as a proof of concept that Betaflight blackbox gyro data can be converted to GoPro Metadata F

108 Oct 05, 2022
Become a virtual character with just your webcam!

Become a virtual character with just your webcam!

Rich 300 Jan 03, 2023
Python application that can be used to generate video thumbnail for mp4 and mkv file types.

Thumbnail Generator 🎬 What is This This is a Python application that can be used to generate video thumbnail for mp4 and mkv file types. Installation

Tharindu N. 13 Jan 03, 2023
Tautulli - A Python based monitoring and tracking tool for Plex Media Server.

Tautulli A python based web application for monitoring, analytics and notifications for Plex Media Server. This project is based on code from Headphon

Tautulli 4.7k Jan 07, 2023
camKapture is an open source application that allows users to access their webcam device and take pictures or create videos.

camKapture is an open source application that allows users to access their webcam device and take pictures or create videos.

manoj 1 Jun 21, 2022
Video-stream - A telegram video stream bot repo

This is a Telegram Video stream Bot. Binary Tech 💫 Features stream videos downl

silentz lk 1 Feb 02, 2022
Your self hosted Youtube media server

The Tube Archivist Your self hosted Youtube media server Core functionality Subscribe to your favourite Youtube channels Download Videos using yt-dlp

Simon 2.1k Dec 31, 2022
Simple background blur for your webcam

backgroundblur Simple background blur for your webcam. This script will capture your webcams output, add a blur effect to the background and output th

Stefan Wagner 4 Dec 07, 2021
video streaming userbot (vsu) based on pytgcalls for streaming video trought the telegram video chat group.

VIDEO STREAM USERBOT ✨ an another telegram userbot for streaming video trought the telegram video chat. Environmental Variables 📌 API_ID : Get this v

levina 6 Oct 17, 2021
DICexport is a GUI (PyQt5) to export digital image correlation videos

DIC Video Exporter DICexport is a GUI (PyQt5) to export digital image correlation videos. It offers the flexibility to choose a selected range of a vi

Chaoyi Zhu 0 Jun 23, 2022
Convert lecture videos to slides in one line. Takes an input of a directory containing your lecture videos and outputs a directory containing .PDF files containing the slides of each lecture.

Convert lecture videos to slides in one line. Takes an input of a directory containing your lecture videos and outputs a directory containing .PDF files containing the slides of each lecture.

Sidharth Anand 12 Sep 10, 2022
This application makes a webrtc video call with jitsi meet signaling

gstreamer-jitsi-meet This application makes a webrtc video call with jitsi meet signaling. Other end can be any jitsi meet app or web app. It doesn't

Linh 7 Apr 26, 2022