Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
基于BililiveRecorder 的集群录播客户端

高度自动化的录播服务端! 一、项目介绍 1、介绍 这是NGlive的录播服务器集群的客户端部分实现代码,它可以自动化的进行录制-压制-上传-通知,同时流程高度可自定义,并且可以任意受中心服务器的调度,有一定的错误修复能力。可以保证长期稳定的运行。 2、基本功能 这个客户端集 录制、转码压制、上传为一

NGWORKS 7 Jul 10, 2022
MoviePy is a Python library for video editing, can read and write all the most common audio and video formats

MoviePy is a Python library for video editing: cutting, concatenations, title insertions, video compositing (a.k.a. non-linear editing), video processing, and creation of custom effects. See the gall

10k Jan 08, 2023
Converts Betaflight blackbox gyro to MP4 GoPro Meta data so it can be used with ReelSteady GO

Here are a bunch of scripts that I created some time ago as a proof of concept that Betaflight blackbox gyro data can be converted to GoPro Metadata F

108 Oct 05, 2022
Techie Sneh 17 Nov 23, 2021
Jio TV Server - Watch TV right from your laptop

Jio-PyServer Jio TV - Python Server Watch TV right from your laptop! Requirements: Python 3.X Internet Access A Jio Account Known Issues: Channel Stre

Elvis Tony 11 Apr 05, 2022
Terminal-Video-Player - A program that can display video in the terminal using ascii characters

Terminal-Video-Player - A program that can display video in the terminal using ascii characters

15 Nov 10, 2022
Help for manipulating the plex-media-server transcode on the raspberry pi

raspi-plex-transcode Help for manipulating the plex-media-server transcode on the raspberry pi Ensure hardware decoding works and your firmware is up

10 Sep 29, 2022
Extracting frames from video and create video using frames

Extracting frames from video and create video using frames This program uses opencv library to extract the frames from video and create video from ext

1 Nov 19, 2021
Simple background blur for your webcam

backgroundblur Simple background blur for your webcam. This script will capture your webcams output, add a blur effect to the background and output th

Stefan Wagner 4 Dec 07, 2021
A VcVideoPlayer Bot for Telegram made with 💞 By @ThePro_CoderZ

VcVideoPlayer A VcVideoPlayer Bot for Telegram made with 💞 By @Akki_ThePro Heroku Deploy The easiest way to deploy this Bot is via Heroku. License

1 Dec 06, 2021
Adblocker for movie subtitles.

SubAdBlock Adblocker for movie subtitles. Usage Place "main.py" and "config.conf" in directory with subtitles and run "main.py". It will automatically

Marko Živić 1 Jan 09, 2022
A simple Python Youtube Wachtime for YTbebot

Simple bot that was development in python 3.7, that automatically watch youtube videos. It can be used to give more views in your channel helping in the spread and increase the followers because your

Rian eka wiratma 1 Dec 05, 2021
Real-time video and audio streams over the network, with Streamlit.

streamlit-webrtc Example You can try out the sample app using the following commands.

Yuichiro Tachibana (Tsuchiya) 648 Jan 01, 2023
A pure python media player that can be used in AI media API development.

A pure python media player that can be used in AI media API development.

YDOOK 1 Dec 04, 2021
It is a simple python package to play videos in the terminal using characters as pixels

It is a simple python package to play videos in the terminal using characters as pixels

Joel Ibaceta 1.4k Jan 07, 2023
Python Simple Mass Video Clipper (PSMVC)

Python Simple Mass Video Clipper (PSMVC) PSMVC é um gerador de cortes via terminal construído em python. Uso Basta abrir o arquivo start.py Dependenci

Bruno 2 Oct 16, 2021
goal: render videos on eu4's timeline function

Rendering Videos on the EU4 Time Line This repository contains code to create an eu4-savefile that plays back a video in question.

29 Dec 24, 2022
Program to play videos with props in Apex Legends

R5Fresh A video player for the Apex Legends mod R5Reloaded

9 Nov 13, 2022
A GUI application for cropping images from videos

v-trimming-gui A GUI application for cropping images from videos. 動画をシークバーで操作しながらスクリーンショットを撮るためのアプリ。 Requirement Python =3.7 opencv-python ^4.5.5 PyS

Menrui 6 Feb 05, 2022
Streamlink is a CLI utility which pipes video streams from various services into a video player

Streamlink is a CLI utility which pipes video streams from various services into a video player

8.2k Dec 26, 2022