Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
Simple background blur for your webcam

backgroundblur Simple background blur for your webcam. This script will capture your webcams output, add a blur effect to the background and output th

Stefan Wagner 4 Dec 07, 2021
Filtering user-generated video content(SberZvukTechDays)Filtering user-generated video content(SberZvukTechDays)

Filtering user-generated video content(SberZvukTechDays) Table of contents General info Team members Technologies Setup Result General info This is a

Roman 6 Apr 06, 2022
Boltstream Live Video Streaming Website + Backend

Boltstream Self-hosted Live Video Streaming Website + Backend Reference

Ben Wilber 1.7k Dec 28, 2022
Automagically synchronize subtitles with video.

FFsubsync Language-agnostic automatic synchronization of subtitles with video, so that subtitles are aligned to the correct starting point within the

Stephen Macke 5.7k Jan 06, 2023
A GUI based glitch tool that uses FFMPEG to create motion interpolated glitches in your videos.

FF Dissolve Glitch This is a GUI based glitch tool that uses FFmpeg to create awesome and wierd motion interpolated glitches in videos. I call it FF d

Akash Bora 19 Nov 10, 2022
Tweet stream in OBS browser source

OBS-Twitter-Stream OBSなどの配信ソフトのブラウザソースで特定のキーワードを含んだツイートを表示します 使い方 使い方については以下のwikiを御覧ください https://github.com/CubeZeero/OBS-Twitter-Stream/wiki ダウンロード W

Cube 23 Dec 18, 2022
Image and video quality assessment

CenseoQoE: 视觉感知画质评价框架 项目介绍 图像/视频在编解码、传输和显示等过程中难免引入不同类型/程度的失真导致图像质量下降。图像/视频质量评价(IVQA)的研究目标是希望模仿人类视觉感知系统, 通过算法评估图片/视频在终端用户的眼中画质主观体验的好坏,目前在视频编解码、画质增强、画质监。

Tencent 133 Dec 20, 2022
Simple nightcore song+video maker

nighty nighty is a simple nightcore song maker (+ video) Installation clone the repo wherever you want git clone https://www.github.com/DanyB0/nighty

DanyB0 2 Oct 02, 2022
PyAV is a Pythonic binding for the FFmpeg libraries.

PyAV is a Pythonic binding for the FFmpeg libraries. We aim to provide all of the power and control of the underlying library, but manage the gritty details as much as possible.

PyAV 1.8k Jan 01, 2023
Program to play videos with props in Apex Legends

R5Fresh A video player for the Apex Legends mod R5Reloaded

9 Nov 13, 2022
获取斗鱼&虎牙&哔哩哔哩&抖音&快手等 48 个直播平台的真实流媒体地址(直播源)和弹幕,直播源可在 PotPlayer、flv.js 等播放器中播放。

获取斗鱼&虎牙&哔哩哔哩&抖音&快手等 48 个直播平台的真实流媒体地址(直播源)和弹幕,直播源可在 PotPlayer、flv.js 等播放器中播放。

乌帮图 5.6k Jan 06, 2023
Use ZWO astronomy camera as an IP camera.

ZWO Astronomy Camera as IP Camera Astronomy cameras are known for their high sensitivity and flexibility on whether to have IR pass through and bayer

Yan Wang 9 Oct 15, 2022
A GUI application for cropping images from videos

v-trimming-gui A GUI application for cropping images from videos. 動画をシークバーで操作しながらスクリーンショットを撮るためのアプリ。 Requirement Python =3.7 opencv-python ^4.5.5 PyS

Menrui 6 Feb 05, 2022
Ffmpeg videostream - High speed video frame access in Python, using FFmpeg and FFshow

FFmpeg VideoStream High speed video frame access in Python, using FFmpeg and FFshow This script requires: Karl Kroening's 'ffmpeg-python' library. (ht

3 Sep 29, 2022
This application makes a webrtc video call with jitsi meet signaling

gstreamer-jitsi-meet This application makes a webrtc video call with jitsi meet signaling. Other end can be any jitsi meet app or web app. It doesn't

Linh 7 Apr 26, 2022
This is an example of building a video Question-Answer system using Jina.

example-video-search This is an example of building a video Question-Answer system using Jina. The index data is subtitle files of YouTube videos. Aft

Jina AI 9 Oct 18, 2022
Script simples para baixar vídeos/áudios/playlist do YouTube

🔗 VilelaTube ▶️ Script simples para baixar vídeos/áudios/playlist do YouTube Requisitos • Como usar • Melhorias futuras ⚠️ Atenção! ⚠️ Lembre-se de a

João Victor Vilela dos Santos 2 Nov 03, 2021
Stream anime from kaa.si with python

kaa.si-cli Stream anime using MPV player from kaa.si with python

Muhammad Rovino Sanjaya 52 Dec 24, 2022
BlogBot - a Python script that create blogs from YouTube videos.

BlogBot - Convert Youtube Videos To Blogs BlogBot is a Python script that create blogs from YouTube videos.

Nikhil Bhamere 4 Apr 22, 2022
Rembg Video Virtual Green Screen Edition

Rembg Virtual Greenscreen Edition is a tool to create a green screen matte for videos

Tim Scarfe 217 Jan 06, 2023