Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
PyAV is a Pythonic binding for the FFmpeg libraries.

PyAV is a Pythonic binding for the FFmpeg libraries. We aim to provide all of the power and control of the underlying library, but manage the gritty details as much as possible.

PyAV 1.8k Jan 01, 2023
video streaming userbot (vsu) based on pytgcalls for streaming video trought the telegram video chat group.

VIDEO STREAM USERBOT ✨ an another telegram userbot for streaming video trought the telegram video chat. Environmental Variables 📌 API_ID : Get this v

levina 6 Oct 17, 2021
All the code in these repos was created and explained by HashLips on the main YouTube channel.

Welcome to HashLips 👄 All the code in these repos was created and explained by HashLips on the main YouTube channel. To find out more please visit: ?

HashLips 6.7k Jan 06, 2023
Synchronize Two Cameras in Real Time using Multiprocessing

Synchronize Two Cameras in Real Time using Multiprocessing In progress ... 📁 Project Structure 📚 Install Libraries for this Project (requirements.tx

Eduardo Carvalho Nunes 2 Oct 31, 2021
iYTDL - Asynchronous Standalone Inline YouTube-DL Module

iYTDL Asynchronous Standalone Inline YouTube-DL Module ⬇️ Installing Install pip3 install iytdl Upgrade pip3 install -U iytdl ⭐️ Features Fully Asynch

iYTDL 46 Dec 24, 2022
Turn any live video stream or locally stored video into a dataset of interesting samples for ML training, or any other type of analysis.

Sieve Video Data Collection Example Find samples that are interesting within hours of raw video, for free and completely automatically using Sieve API

Sieve 72 Aug 01, 2022
Convert lecture videos to slides in one line. Takes an input of a directory containing your lecture videos and outputs a directory containing .PDF files containing the slides of each lecture.

Convert lecture videos to slides in one line. Takes an input of a directory containing your lecture videos and outputs a directory containing .PDF files containing the slides of each lecture.

Sidharth Anand 12 Sep 10, 2022
A Python extension that provides bindings to WebRTC M92

This project follows the W3C specification with some modifications and additions to make it work better with Python applications, with useful APIs like programmatic audio and video.

Il'ya 104 Dec 26, 2022
Python bindings for FFmpeg - with complex filtering support

ffmpeg-python: Python bindings for FFmpeg Overview There are tons of Python FFmpeg wrappers out there but they seem to lack complex filter support. ff

Karl Kroening 7.7k Jan 03, 2023
Image and video quality assessment

CenseoQoE: 视觉感知画质评价框架 项目介绍 图像/视频在编解码、传输和显示等过程中难免引入不同类型/程度的失真导致图像质量下降。图像/视频质量评价(IVQA)的研究目标是希望模仿人类视觉感知系统, 通过算法评估图片/视频在终端用户的眼中画质主观体验的好坏,目前在视频编解码、画质增强、画质监。

Tencent 133 Dec 20, 2022
GStreamer Inspector GUI

gst-explorer GStreamer GUI Interface Tool GUI interface for inspecting GStreamer Plugins, Elements and Type Finders. Expects Python3 Qt, PyQt5 and GSt

Jetsonhacks 31 Nov 29, 2022
A youtube video link or id to video thumbnail python package.

Youtube-Video-Thumbnail A youtube video link or id to video thumbnail python package. Made with Python3

Fayas Noushad 10 Oct 21, 2022
Video stream recording dockerized server using python/ffmpeg.

Stream Recording Server Video stream recording dockerized server using python/ffmpeg. Usage Configuration Prepare .env file, check .env.example for th

GR 2 Jan 14, 2022
Python package for Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.

The Python package for near duplicate video detection ⭐️ Introduction Videohash is a Python package for detecting near-duplicate videos (Perceptual Vi

Akash Mahanty 144 Dec 19, 2022
Techie Sneh 17 Nov 23, 2021
Takes a video as an input and creates a video which is suitable to upload on Youtube Shorts and Tik Tok (1080x1920 resolution).

Shorts-Tik-Tok-Creator Takes a video as an input and creates a video which is suitable to upload on Youtube Shorts and Tik Tok (1080x1920 resolution).

Arber Hakaj 5 Nov 09, 2022
Automatically segment in-video YouTube sponsorships.

SponsorBlock Auto Segment [Model Download] Automatically segment in-video YouTube sponsorships. Trained on a large dataset of YouTube sponsor transcri

Akmal 7 Aug 22, 2022
Meteor scan - Scan through video for meteor

meteor_scan Scan through video for meteor Installation Install python packages b

2 Jun 04, 2022
A GUI based datamoshing apllication for everyone! Apply this glitch to your videos and gifs. Supports all video formats!

A GUI based datamoshing apllication for everyone! Apply this glitch to your videos and gifs. Supports all video formats!

Akascape 131 Dec 31, 2022
This is an example of building a video Question-Answer system using Jina.

example-video-search This is an example of building a video Question-Answer system using Jina. The index data is subtitle files of YouTube videos. Aft

Jina AI 9 Oct 18, 2022