Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
Simple VLC-based media player that can play multiple videos at the same time

Screenshots About Simple VLC-based media player that can play multiple videos at the same time. You can play as many videos as you like, the only limi

161 Jan 05, 2023
This program is to make a video based on Deep Dream

This program is to make a video based on Deep Dream. The program is modified from DeepDreamAnim and DeepDreamVideo with additional functions for bleding two frames based on the optical flows. It also

Aertist 23 Jan 22, 2022
Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser or phone. Install it on your server, access it anywhere and enjoy.

Vigilio Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser

Tugcan Olgun 141 Jan 06, 2023
Video stream image stacking -- live version

video stream image stacking v2 -- live version A very simple streamed video image stacking code! Version 2.1 left mouse click to select a small region

Chakravarthy Mathiazhagan 1 Jan 03, 2022
Komposition - The video editor built for screencasters

Komposition The video editor built for screencasters Tutorial Video | Introduction | Installation Documentation See the documentation and user guide.

Oskar Wickström 428 Jan 08, 2023
Cross-platform command-line AV1 / VP9 / HEVC / H264 encoding framework with per scene quality encoding

Av1an A cross-platform framework to streamline encoding Easy, Fast, Efficient and Feature Rich An easy way to start using AV1 / HEVC / H264 / VP9 / VP

Zen 947 Jan 01, 2023
A multithreaded view bot for YouTube

Simple program to increase YouTube views written in Python.

Monirul Shawon 906 Jan 09, 2023
Repository to create Ascii art in CMD based on video file.

Made to take any file format, and transform it into ascii art, displayed as a video in the cmd. If the cmd formatting is wrong, try zooming a little and remember to make cmd fullscreen. I made my cmd

60 Dec 17, 2022
A Python extension that provides bindings to WebRTC M92

This project follows the W3C specification with some modifications and additions to make it work better with Python applications, with useful APIs like programmatic audio and video.

Il'ya 104 Dec 26, 2022
Webcam Indicator is an application to recieve and send messages from your own Webcam Server.

Welcome to Webcam Indicator 👋 Webcam Indicator is an application to recieve and send messages from your own Webcam Server. 🏠 Homepage Prerequisites

Lorenzo Carbonell 2 Apr 04, 2022
Add filters (background blur, etc) to your webcam on Linux.

Add filters (background blur, etc) to your webcam on Linux.

Jashandeep Sohi 480 Dec 14, 2022
A script to disable steam servers regionwise. [Works on Windows only]

Csgo-server-blocker A script to disable steam servers regionwise. [Works on Windows only] Dependencies python3.x Usage: pip install requirements.txt I

Aditya Bennur 2 Jun 10, 2022
Home Assistant custom component for viewing IP cameras RTSP stream in real time using WebRTC technology

WebRTC Camera Home Assistant custom component for viewing IP cameras RTSP stream in real time using WebRTC technology. Based on: Pion - pure Go implem

Alex X 739 Dec 30, 2022
High-performance cross-platform Video Processing Python framework powerpacked with unique trailblazing features :fire:

Releases | Gears | Documentation | Installation | License VidGear is a High-Performance Video Processing Python Library that provides an easy-to-use,

Abhishek Thakur 2.6k Dec 28, 2022
Jio TV Server - Watch TV right from your laptop

Jio-PyServer Jio TV - Python Server Watch TV right from your laptop! Requirements: Python 3.X Internet Access A Jio Account Known Issues: Channel Stre

Elvis Tony 11 Apr 05, 2022
TkVideoplayer - This is a simple library to play video files in tkinter.

TkVideoplayer - This is a simple library to play video files in tkinter.

Art/Paul 38 Dec 23, 2022
A GUI application for cropping images from videos

v-trimming-gui A GUI application for cropping images from videos. 動画をシークバーで操作しながらスクリーンショットを撮るためのアプリ。 Requirement Python =3.7 opencv-python ^4.5.5 PyS

Menrui 6 Feb 05, 2022
Uncompress DEFLATE streams in pure Python

stream-deflate Uncompress DEFLATE streams in pure Python. Work in progress. This README serves as a rough design spec. Installation pip install stream

Michal Charemza 7 Oct 13, 2022
Boltstream Live Video Streaming Website + Backend

Boltstream Self-hosted Live Video Streaming Website + Backend Reference

Ben Wilber 1.7k Dec 28, 2022
Stream music with ffmpeg and python

youtube-stream Stream music with ffmpeg and python original Usage set the KEY in stream.sh run server.py run stream.sh (You can use Git bash or WSL in

Giyoung Ryu 14 Nov 17, 2021