Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
Source files for the data lake demo video using the AWS TICKIT database

Data Lake Demo Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a

Gary A. Stafford 97 Dec 23, 2022
720p FPGA Media Player (RISC-V + Motion JPEG + SD + HDMI on an Artix 7)

FPGA Media Player This project is a FPGA based media player which is capable of playing Motion JPEG encoded video over HDMI or VGA on commonly availab

179 Dec 02, 2022
Telegram Video Chat Video Streaming bot 🇱🇰

🧪 Get SESSION_NAME from below: Pyrogram 🎭 Preview ✨ Features Music & Video stream support MultiChat support Playlist & Queue support Skip, Pause, Re

DOOZY YEZ 5 Jun 26, 2022
DICexport is a GUI (PyQt5) to export digital image correlation videos

DIC Video Exporter DICexport is a GUI (PyQt5) to export digital image correlation videos. It offers the flexibility to choose a selected range of a vi

Chaoyi Zhu 0 Jun 23, 2022
Converts Betaflight blackbox gyro to MP4 GoPro Meta data so it can be used with ReelSteady GO

Here are a bunch of scripts that I created some time ago as a proof of concept that Betaflight blackbox gyro data can be converted to GoPro Metadata F

108 Oct 05, 2022
Become a virtual character with just your webcam!

Become a virtual character with just your webcam!

Rich 300 Jan 03, 2023
Python script for extracting audio from video files and creating Mel spectrograms

video2spectrogram About This package is meant to automate the process of extracting audio files from videos and saving the plots computed from these a

Alexandros Stergiou 1 Oct 28, 2021
Video stream image stacking -- live version

video stream image stacking v2 -- live version A very simple streamed video image stacking code! Version 2.1 left mouse click to select a small region

Chakravarthy Mathiazhagan 1 Jan 03, 2022
A Python media index

pyvideo https://pyvideo.org is simply an index of Python-related media records. The raw data being used here comes out of the pyvideo/data repo. Befor

pyvideo 235 Dec 24, 2022
This is a simple script to generate a .opml file from a list of youtube channels.

Youtube to rss Don't spend more time than you need to on youtube.com This is a simple script to generate a .opml file from a list of youtube channels.

Kostas 1 Oct 04, 2022
Python bindings for FFmpeg - with complex filtering support

ffmpeg-python: Python bindings for FFmpeg Overview There are tons of Python FFmpeg wrappers out there but they seem to lack complex filter support. ff

Karl Kroening 7.7k Jan 03, 2023
Meteor scan - Scan through video for meteor

meteor_scan Scan through video for meteor Installation Install python packages b

2 Jun 04, 2022
KonomiTV: Kind and Optimized Next brOadcast watching systeM Infrastructure for TV

備考・注意事項 現在 α 版で、まだ実験的なプロダクトです。通常利用には耐えないでしょうし、サポートもできません。 安定しているとは到底言いがたい品質ですが、それでも構わない方のみ導入してください。 使い方などの説明も用意できていないため、自力でトラブルに対処できるエンジニアの方以外には現状おすすめ

tsukumi 244 Dec 31, 2022
A Python extension that provides bindings to WebRTC M92

This project follows the W3C specification with some modifications and additions to make it work better with Python applications, with useful APIs like programmatic audio and video.

Il'ya 104 Dec 26, 2022
pyffstream - A CLI frontend for streaming over SRT and RTMP specializing in sending off files

pyffstream - A CLI frontend for streaming over SRT and RTMP specializing in sending off files

Gregory Beauregard 3 Mar 04, 2022
Turn any live video stream or locally stored video into a dataset of interesting samples for ML training, or any other type of analysis.

Sieve Video Data Collection Example Find samples that are interesting within hours of raw video, for free and completely automatically using Sieve API

Sieve 72 Aug 01, 2022
Filtering user-generated video content(SberZvukTechDays)Filtering user-generated video content(SberZvukTechDays)

Filtering user-generated video content(SberZvukTechDays) Table of contents General info Team members Technologies Setup Result General info This is a

Roman 6 Apr 06, 2022
A simple Telegram bot to extract hard-coded subtitle from videos using FFmpeg & Tesseract.

Video Subtitle Extractor Bot A simple Telegram bot to extract hard-coded subtitle from videos using FFmpeg & Tesseract. Note that the accuracy of reco

14 Oct 28, 2022
Video-to-GIF-Converter - A small code snippet that can be used to convert any video to a gif

Video to GIF Converter Project Description: This is a small code snippet that ca

Hassan Shahzad 3 Jun 22, 2022