Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
video streaming userbot (vsu) based on pytgcalls for streaming video trought the telegram video chat group.

VIDEO STREAM USERBOT ✨ an another telegram userbot for streaming video trought the telegram video chat. Environmental Variables 📌 API_ID : Get this v

levina 6 Oct 17, 2021
Stream-Cli application that allow you to play your favorite movies from the terminal

Stream-Cli application that allow you to play your favorite movies from the terminal

redouane 380 Jan 08, 2023
A project that uses optical flow and machine learning to detect aimhacking in video clips.

waldo-anticheat A project that aims to use optical flow and machine learning to visually detect cheating or hacking in video clips from fps games. Che

RicanSamurai 542 Dec 03, 2022
pyYotubemanager is full web automated bot capable of General tasks like:- Uploading a Video , Downloading , adding Title , Description , Listing types , adding Thumbnail

PyYoutubemanager Explore the docs » View Demo · Report Bug · Request Feature About The Project PyYotubemanager is full web automated bot capable of Ge

4 Jun 29, 2022
Telegram Music/ Video Streaming Bot Using Pytgcalls

Video Player 🔥 ᴢᴀɪᴅ ᴠᴄ ᴘʟᴀyᴇʀ ɪꜱ ᴀ ᴛᴇʟᴇɢʀᴀᴍ ᴘʀᴏᴊᴇᴄᴛ ʙᴀꜱᴇᴅ ᴏɴ ᴘʏʀᴏɢʀᴀᴍ ꜰᴏʀ ᴘʟᴀʏ ᴍᴜꜱɪᴄꜱ ɪɴ ᴠᴄ ᴄʜᴀᴛꜱ... 🅡🅔🅟🅞 🅢🅣🅐🅣🅢 ʀᴇQᴜɪʀᴇᴍᴇɴᴛꜱ 📝 FFmpeg NodeJ

16 Nov 30, 2022
Techie Sneh 17 Nov 23, 2021
A simple Python Youtube Wachtime for YTbebot

Simple bot that was development in python 3.7, that automatically watch youtube videos. It can be used to give more views in your channel helping in the spread and increase the followers because your

Rian eka wiratma 1 Dec 05, 2021
Spotify playlist video generator

This program creates a video version of your Spotify playlist by using the Spotify API and YouTube-dl.

0 Mar 03, 2022
Python package for Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.

The Python package for near duplicate video detection ⭐️ Introduction Videohash is a Python package for detecting near-duplicate videos (Perceptual Vi

Akash Mahanty 144 Dec 19, 2022
A GUI based glitch tool that uses FFMPEG to create motion interpolated glitches in your videos.

FF Dissolve Glitch This is a GUI based glitch tool that uses FFmpeg to create awesome and wierd motion interpolated glitches in videos. I call it FF d

Akash Bora 19 Nov 10, 2022
MPV remote controller is a program for remote controlling mpv player with device in your local network through web browser.

MPV remote controller is a program for remote controlling mpv player with device in your local network through web browser.

5 May 26, 2022
Cvplayer - A simple video player written in python using ffpyplayer and OpenCV

Video Player cvplayer is a minimal wrapper around the ffpyplayer.MediaPlayer cla

ADI 7 Dec 19, 2022
DICexport is a GUI (PyQt5) to export digital image correlation videos

DIC Video Exporter DICexport is a GUI (PyQt5) to export digital image correlation videos. It offers the flexibility to choose a selected range of a vi

Chaoyi Zhu 0 Jun 23, 2022
Video Translation Into Text

2021/12/9 The project has been updated Added a home screen Just drag it onto the screen The final results \ 2021/12/9 项目已更新 添加了主界面 拖到即可 最后结果 \ Using t

10 Mar 12, 2022
Tweet stream in OBS browser source

OBS-Twitter-Stream OBSなどの配信ソフトのブラウザソースで特定のキーワードを含んだツイートを表示します 使い方 使い方については以下のwikiを御覧ください https://github.com/CubeZeero/OBS-Twitter-Stream/wiki ダウンロード W

Cube 23 Dec 18, 2022
Video Chat Streamer With Python

Voice Chat Streamer This bot can stream audio or video files and urls in telegram voice chats :) 🎯 Follow me and star this repo for more telegram bot

WiskeyWorm 4 Oct 09, 2022
A Python media index

pyvideo https://pyvideo.org is simply an index of Python-related media records. The raw data being used here comes out of the pyvideo/data repo. Befor

pyvideo 235 Dec 24, 2022
Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser or phone. Install it on your server, access it anywhere and enjoy.

Vigilio Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser

Tugcan Olgun 141 Jan 06, 2023
Youtube-dislikes-adder - Add dislikes to the description of your YouTube videos.

Add number of dislikes to the description of your YouTube videos. Number of dislikes are updated if you let this function as a bot.

fluks 1 Aug 23, 2022