Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
A platform which give you info about the newest video on a channel

youtube A platform which give you info about the newest video on a channel. This uses web scraping, a better implementation will be to use the API. BR

Custom components for Home Assistant 36 Sep 29, 2022
Zaid Vc Player Bot For Telegram

Zaid Vc Player Bot For Telegram

1 Dec 04, 2021
A python youtube search module

A python youtube search module

Fayas Noushad 4 Dec 01, 2021
Real-time video and audio streams over the network, with Streamlit.

streamlit-webrtc Example You can try out the sample app using the following commands.

Yuichiro Tachibana (Tsuchiya) 648 Jan 01, 2023
Add a "flame" effect on each hand's index onto a video stream.

Add a "flame" effect on each hand's index onto a video stream. recording.webm.mov This script is just a quick hack, it's a bit of glue between mediapi

Paul Willot 7 Sep 15, 2022
Use ZWO astronomy camera as an IP camera.

ZWO Astronomy Camera as IP Camera Astronomy cameras are known for their high sensitivity and flexibility on whether to have IR pass through and bayer

Yan Wang 9 Oct 15, 2022
Meteor scan - Scan through video for meteor

meteor_scan Scan through video for meteor Installation Install python packages b

2 Jun 04, 2022
Script simples para baixar vídeos/áudios/playlist do YouTube

🔗 VilelaTube ▶️ Script simples para baixar vídeos/áudios/playlist do YouTube Requisitos • Como usar • Melhorias futuras ⚠️ Atenção! ⚠️ Lembre-se de a

João Victor Vilela dos Santos 2 Nov 03, 2021
Ffmpeg videostream - High speed video frame access in Python, using FFmpeg and FFshow

FFmpeg VideoStream High speed video frame access in Python, using FFmpeg and FFshow This script requires: Karl Kroening's 'ffmpeg-python' library. (ht

3 Sep 29, 2022
Play Video & Music on Telegram Group Video Chat

🖤 DEMONGIRL 🖤 ʜᴇʟʟᴏ ❤️ 🇱🇰 Join us ᴠɪᴅᴇᴏ sᴛʀᴇᴀᴍ ɪs ᴀɴ ᴀᴅᴠᴀɴᴄᴇᴅ ᴛᴇʟᴇʀᴀᴍ ʙᴏᴛ ᴛʜᴀᴛ's ᴀʟʟᴏᴡ ʏᴏᴜ ᴛᴏ ᴘʟᴀʏ ᴠɪᴅᴇᴏ & ᴍᴜsɪᴄ ᴏɴ ᴛᴇʟᴇɢʀᴀᴍ ɢʀᴏᴜᴘ ᴠɪᴅᴇᴏ ᴄʜᴀᴛ 🧪 ɢ

Jonathan 5 Dec 31, 2021
Video stream image stacking -- live version

video stream image stacking v2 -- live version A very simple streamed video image stacking code! Version 2.1 left mouse click to select a small region

Chakravarthy Mathiazhagan 1 Jan 03, 2022
camKapture is an open source application that allows users to access their webcam device and take pictures or create videos.

camKapture is an open source application that allows users to access their webcam device and take pictures or create videos.

manoj 1 Jun 21, 2022
Video Object Segmentation(VOS) From Zero to HeroVideo Object Segmentation(VOS) From Zero to Hero

Video Object Segmentation(VOS) From Zero to Hero! Goal 1:train a two layers cnn model for vos. Finish! see model.py FFNet for more diteal.(2021.9.30)

1 Oct 22, 2021
This is an example of building a video Question-Answer system using Jina.

example-video-search This is an example of building a video Question-Answer system using Jina. The index data is subtitle files of YouTube videos. Aft

Jina AI 9 Oct 18, 2022
Program to play videos with props in Apex Legends

R5Fresh A video player for the Apex Legends mod R5Reloaded

9 Nov 13, 2022
基于BililiveRecorder 的集群录播客户端

高度自动化的录播服务端! 一、项目介绍 1、介绍 这是NGlive的录播服务器集群的客户端部分实现代码,它可以自动化的进行录制-压制-上传-通知,同时流程高度可自定义,并且可以任意受中心服务器的调度,有一定的错误修复能力。可以保证长期稳定的运行。 2、基本功能 这个客户端集 录制、转码压制、上传为一

NGWORKS 7 Jul 10, 2022
This is a simple script to generate a .opml file from a list of youtube channels.

Youtube to rss Don't spend more time than you need to on youtube.com This is a simple script to generate a .opml file from a list of youtube channels.

Kostas 1 Oct 04, 2022
A self-hosted streaming platform with Discord authentication, auto-recording and more!

A self-hosted streaming platform with Discord authentication, auto-recording and more!

John Patrick Glattetre 331 Dec 27, 2022
This program is to make a video based on Deep Dream

This program is to make a video based on Deep Dream. The program is modified from DeepDreamAnim and DeepDreamVideo with additional functions for bleding two frames based on the optical flows. It also

Aertist 23 Jan 22, 2022
Docker container to expose a local RTMP, RTSP, and HLS stream for all your Wyze cameras including v3

Docker container to expose a local RTMP, RTSP, and HLS stream for all your Wyze cameras including v3. No Third-party or special firmware required.

1.2k Jan 07, 2023