Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
Python package for Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.

The Python package for near duplicate video detection ⭐️ Introduction Videohash is a Python package for detecting near-duplicate videos (Perceptual Vi

Akash Mahanty 144 Dec 19, 2022
Docker container to expose a local RTMP, RTSP, and HLS stream for all your Wyze cameras including v3

Docker container to expose a local RTMP, RTSP, and HLS stream for all your Wyze cameras including v3. No Third-party or special firmware required.

1.2k Jan 07, 2023
Video processing routines for SciPy

scikit-video Video Processing SciKit BETA Video processing algorithms, including I/O, quality metrics, temporal filtering, motion/object detection, mo

Alex Izvorski 119 Dec 27, 2022
Rune - a video miniplayer made with Python.

Rune - a video miniplayer made with Python.

1 Dec 13, 2021
A platform which give you info about the newest video on a channel

youtube A platform which give you info about the newest video on a channel. This uses web scraping, a better implementation will be to use the API. BR

Custom components for Home Assistant 36 Sep 29, 2022
Turn any live video stream or locally stored video into a dataset of interesting samples for ML training, or any other type of analysis.

Sieve Video Data Collection Example Find samples that are interesting within hours of raw video, for free and completely automatically using Sieve API

Sieve 72 Aug 01, 2022
PyAV is a Pythonic binding for the FFmpeg libraries.

PyAV is a Pythonic binding for the FFmpeg libraries. We aim to provide all of the power and control of the underlying library, but manage the gritty details as much as possible.

PyAV 1.8k Jan 01, 2023
Uncompress DEFLATE streams in pure Python

stream-deflate Uncompress DEFLATE streams in pure Python. Work in progress. This README serves as a rough design spec. Installation pip install stream

Michal Charemza 7 Oct 13, 2022
A way to run youtube videos in TTY

TTY youtube client its finally here, the one thing literally NO ONE ASKED FOR!! A way to run youtube videos in TTY Dependencies: (pip) yt-search (syst

1 Nov 28, 2021
Video stream recording dockerized server using python/ffmpeg.

Stream Recording Server Video stream recording dockerized server using python/ffmpeg. Usage Configuration Prepare .env file, check .env.example for th

GR 2 Jan 14, 2022
Youtube as covert-channel - Control systems remotely and execute commands by uploading videos to Youtube

covert-tube A program to control systems remotely by uploading videos to Youtube using Python to create the videos and the listener, emulating some ma

Ricardo Ruiz 101 Nov 01, 2022
This plugin generates json files used by deovr allowing you to play 2d and 3d video's using the player

deovr-plugin This plugin generates json files used by deovr allowing you to play 2d and 3d video's using the player. Deovr looks for an index file /de

10 Sep 29, 2022
Add a "flame" effect on each hand's index onto a video stream.

Add a "flame" effect on each hand's index onto a video stream. recording.webm.mov This script is just a quick hack, it's a bit of glue between mediapi

Paul Willot 7 Sep 15, 2022
Video-to-GIF-Converter - A small code snippet that can be used to convert any video to a gif

Video to GIF Converter Project Description: This is a small code snippet that ca

Hassan Shahzad 3 Jun 22, 2022
Python retagging utility for mkv files using mkvmerge.

pyretag A python script to retag mkv files. Setting Up pip install pyfiglet pip install rich Move the mkv files to input folder.

25 Dec 04, 2022
A project that uses optical flow and machine learning to detect aimhacking in video clips.

waldo-anticheat A project that aims to use optical flow and machine learning to visually detect cheating or hacking in video clips from fps games. Che

RicanSamurai 542 Dec 03, 2022
Media player custom component which works with MQTT.

Media player custom component which works with MQTT. I designed this to specifically work with a ESP32 which i used to control a speakercraft amp.

2 Feb 10, 2022
Simple nightcore song+video maker

nighty nighty is a simple nightcore song maker (+ video) Installation clone the repo wherever you want git clone https://www.github.com/DanyB0/nighty

DanyB0 2 Oct 02, 2022
KonomiTV: Kind and Optimized Next brOadcast watching systeM Infrastructure for TV

備考・注意事項 現在 α 版で、まだ実験的なプロダクトです。通常利用には耐えないでしょうし、サポートもできません。 安定しているとは到底言いがたい品質ですが、それでも構わない方のみ導入してください。 使い方などの説明も用意できていないため、自力でトラブルに対処できるエンジニアの方以外には現状おすすめ

tsukumi 244 Dec 31, 2022
This is a simple script to generate a .opml file from a list of youtube channels.

Youtube to rss Don't spend more time than you need to on youtube.com This is a simple script to generate a .opml file from a list of youtube channels.

Kostas 1 Oct 04, 2022