Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
I have baked a custom integration to control Eufy Security Cameras and access RTSP and P2P stream if possible.

I have baked a custom integration to control Eufy Security Cameras and access RTSP (real time streaming protocol) and P2P (peer to peer) stream if pos

Fuat Akgün 422 Jan 01, 2023
Tiny python video cutter

tiny_python_video_cutter Source code based on a discussion in StackOverflow Setup project in Pycharm: Configure virtual env in Pycharm. You are done w

Truong 2 May 28, 2022
This is a tool for making a every day video if you take a picture of you everyday

Face-Everyday-Maker-Studio Description This project is a tool for making a everyday video, which is timelapse video or slides video, of images but for

John A Betancourt G 9 Sep 06, 2022
A python program which converts images and video into excel spreadsheets.

image2excel A program which converts images and video into Excel spreadsheets. Usage examples can be found in examples Videos can take a long time to

Oscar Peace 2 Aug 09, 2021
GStreamer Inspector GUI

gst-explorer GStreamer GUI Interface Tool GUI interface for inspecting GStreamer Plugins, Elements and Type Finders. Expects Python3 Qt, PyQt5 and GSt

Jetsonhacks 31 Nov 29, 2022
Become a virtual character with just your webcam!

Become a virtual character with just your webcam!

Rich 300 Jan 03, 2023
A VcVideoPlayer Bot for Telegram made with 💞 By @ThePro_CoderZ

VcVideoPlayer A VcVideoPlayer Bot for Telegram made with 💞 By @Akki_ThePro Heroku Deploy The easiest way to deploy this Bot is via Heroku. License

1 Dec 06, 2021
Search a video semantically with AI.

Which Frame? Search a video semantically with AI. For example, try a natural language search query like "a person with sunglasses". You can also searc

David Chuan-En Lin 1 Nov 06, 2021
基于BililiveRecorder 的集群录播客户端

高度自动化的录播服务端! 一、项目介绍 1、介绍 这是NGlive的录播服务器集群的客户端部分实现代码,它可以自动化的进行录制-压制-上传-通知,同时流程高度可自定义,并且可以任意受中心服务器的调度,有一定的错误修复能力。可以保证长期稳定的运行。 2、基本功能 这个客户端集 录制、转码压制、上传为一

NGWORKS 7 Jul 10, 2022
Use ZWO astronomy camera as an IP camera.

ZWO Astronomy Camera as IP Camera Astronomy cameras are known for their high sensitivity and flexibility on whether to have IR pass through and bayer

Yan Wang 9 Oct 15, 2022
Python based script to operate FFMPEG.

FMPConvert Python based script to operate FFMPEG. Ver 1.0 -- 2022.02.08 Feature ✅ Maximum compatibility: Third-party dependency libraries unused ✅ Che

cybern000b 1 Feb 28, 2022
Adblocker for movie subtitles.

SubAdBlock Adblocker for movie subtitles. Usage Place "main.py" and "config.conf" in directory with subtitles and run "main.py". It will automatically

Marko Živić 1 Jan 09, 2022
Docker container to expose a local RTMP, RTSP, and HLS stream for all your Wyze cameras including v3

Docker container to expose a local RTMP, RTSP, and HLS stream for all your Wyze cameras including v3. No Third-party or special firmware required.

1.2k Jan 07, 2023
Simple background blur for your webcam

backgroundblur Simple background blur for your webcam. This script will capture your webcams output, add a blur effect to the background and output th

Stefan Wagner 4 Dec 07, 2021
Video Translation Into Text

2021/12/9 The project has been updated Added a home screen Just drag it onto the screen The final results \ 2021/12/9 项目已更新 添加了主界面 拖到即可 最后结果 \ Using t

10 Mar 12, 2022
Automatically logs into VTOP and can perform certain tasks

VTOP_Login Automatically logs into VTOP and can perform certain tasks To run the

Jatin 1 Jan 30, 2022
Vigia-youtube - The YouTube Watch bot is able to monitor channels on Google's video platform

Vigia do YouTube O bot Vigia do YouTube é capaz de monitorar canais na plataform

Alessandro Feitosa Jr 10 Oct 03, 2022
Program to play videos with props in Apex Legends

R5Fresh A video player for the Apex Legends mod R5Reloaded

9 Nov 13, 2022
Lightweight, zero-dependency proxy and storage RTSP server

python-rtsp-server Python-rtsp-server is a lightweight, zero-dependency proxy and storage server for several IP-cameras and multiple clients. Features

Vlad 39 Nov 23, 2022
A Advanced Anime Theme VC Video Player created for playing vidio in the voice chats of Telegram Groups

Yui Vidio Player A Advanced Anime Theme VC Video Player created for playing vidio in the voice chats of Telegram Groups Demo Setting up Add this Bot t

Achu biju 32 Sep 16, 2021