This is a repository for the Duke University Cloud Computing course project on Serveless Data Engineering Pipeline. For this project, I recreated the below pipeline.

Overview

AWS Data Engineering Pipeline

This is a repository for the Duke University Cloud Computing course project on Serverless Data Engineering Pipeline. For this project, I recreated the below pipeline in iCloud9 (reference: https://github.com/noahgift/awslambda):

drawing

Below are the steps of how to build this pipeline in AWS:

1️⃣ Create a new iCloud9 environment dedicated to this project.

🤔 Need a refresher? Please check this repo.

⚠️ Make sure to use name as your unique id for your items in the fang table.

2️⃣ Create a fang table in DynamoDB and SQS queue.

You can check how to do it here.

3️⃣ Build producer Lambda Function

  1. In iCloud9, initialize a serverless application with SAM template:

    sam init 

Inputs: 1, 2, 4, "producer"

  1. Set virtual environment and source it:

    # I called my virtual environment "comprehendProducer"
    python3 -m venv ~/.comprehendProducer
    source ~/.comprehendProducer/bin/activate
  2. Add the code for your application to app.py

  3. Add relevant packages used in your app to requirements.txt file

  4. Install requirements

     cd hello_world/
     pip install -r requirements.txt 
     cd .. 
  5. Create a repository (producer) in Elastic Container Registry (ECR) and copy its URI

  6. Build and deploy your serverless application:

    sam build 
    sam deploy --guided

    When prompted to input URI, paste the URI for the producer repository that you've just created.

  7. Create IAM Role granting Administrator Access to the Producer Lambda function.

    🤔 Not sure how to create IAM Role? Check out this video (17 min ).

  8. Add the execution role that you created to the Producer Lambda function.

    In case you forgot how to do it:

    In AWS console: Lambda ➡️ click on producer function ➡️ configuration ➡️ permissions ➡️ Edit ➡️ Select the role under Existing role.

  9. You are all set with the producer function! Now deactivate virtual environment:

    deactivate 
    cd .. 
    

4️⃣ Create an S3 bucket and note its name

5️⃣ Build consumer Lambda Function

Repeat steps in 3️⃣ .

⚠️ In #3 when you add the code for a consumer app to app.py, make sure to replace bucket="fangsentiment" with the name of your S3 bucket.

6️⃣ Add triggers to Lambda Functions

🤔 Not sure how to do it? Check out this video (start times are noted below):

Producer Lambda Function: CloudWatchEvent(30 min)

Consumer Lambda Function: SQS (42 min)

7️⃣ If all goes well, you will see sentiment results in your S3 bucket:

s3

💡 Tip: If you've already deployed your Lambda function but need to edit your application, you can make the necessary edits to your app and build and deploy the app again:

sam build && sam deploy 

💡 Tip: If you don't have space left on disk, you may want to remove a few docker containers that you don't use.

#list containers 
docker image ls 
# remove a container 
docker image rm <containerId>
Leakvertise is a Python open-source project which aims to bypass these fucking annoying captchas and ads from linkvertise, easily

Leakvertise Leakvertise is a Python open-source project which aims to bypass these fucking annoying captchas and ads from linkvertise, easily. You can

Quatrecentquatre 9 Oct 06, 2022
A Powerful, Smart And Simple Userbot In Telethon.

Owner: Masterolic 🇮🇳 BLACK LIGHTNING A Powerful, Smart And Simple Userbot In Telethon. Credits This is A Remix Bot Of Many UserBot. DARKCOBRA Friday

Masterolic 1 Nov 28, 2021
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.

TWINT - Twitter Intelligence Tool No authentication. No API. No limits. Twint is an advanced Twitter scraping tool written in Python that allows for s

TWINT Project 14.2k Jan 03, 2023
TgMusicBot is a telegram userbot for playing songs in telegram voice calls based on Pyrogram and PyTgCalls.

TgMusicBot [Stable] TgMusicBot is a telegram userbot for playing songs in telegram voice calls based on Pyrogram and PyTgCalls. Commands !start / !hel

Kürşad 21 Dec 25, 2022
Twitter for Python!

Tweepy: Twitter for Python! Installation The easiest way to install the latest version from PyPI is by using pip: pip install tweepy You can also use

9.4k Jan 07, 2023
A repo containing toolings and software useful for a DevOps Engineer

DevOps-Tooling A repo containing toolings and software useful for a DevOps Engineer (or if you're setting up your Mac from the beginning) Currently se

Mohamed Abukar 45 Dec 12, 2022
dex.guru python sdk

dexguru-sdk.py dexguru-sdk.py allows you to access dex.guru public methods from your async python scripts. Installation To install latest version, jus

DexGuru 17 Dec 06, 2022
A Telegram bot to transcribe audio, video and image into text.

Transcriber Bot A Telegram bot to transcribe audio, video and image into text. Deploy to Heroku Local Deploying Install the FFmpeg. Make sure you have

10 Dec 19, 2022
Connect your Nintendo Switch playing status to Discord!

Disclaimer: Unfortunately, it appears that Nintendo has removed returning self-Presence in their API as of recently, making this project near obsolete

Deltaion Lee 145 Dec 30, 2022
AWSXenos will list all the trust relationships in all the IAM roles and S3 buckets

AWS External Account Scanner Xenos, is Greek for stranger. AWSXenos will list all the trust relationships in all the IAM roles, and S3 buckets, in an

AirWalk 57 Nov 07, 2022
A multipurpose, semi-modular Discord bot written in Python with the new discord.py module.

Discord.py Reaction Bot MIRAI KURIYAMA A multipurpose, semi-modular Discord bot written in Python with the new discord.py module. Installing dependenc

1 Dec 02, 2021
A Tᴇʟᴇɢʀᴀᴍ Vɪᴅᴇᴏ Pʟᴀʏᴇʀ Bᴏᴛ Tᴏ Pʟᴀʏ YT Vɪᴅᴇᴏs & Lɪᴠᴇ Sᴛʀᴇᴀᴍ.

Tuktuky_Music Telegram bot to stream videos in telegram voicechat for both groups and channels. Supports live strams, YouTube videos and telegram medi

TᑌKTᑌKY ᖇᗩᕼᗰᗩᑎ 3 Sep 14, 2021
Código para trabalho com o dataset Wine em Python

Um perceptron multicamadas (MLP) é uma rede neural artificial feedforward que gera um conjunto de saídas a partir de um conjunto de entradas. Um MLP é

Hemili Beatriz 1 Jan 08, 2022
File-sharing-Bot: Telegram Bot to store Posts and Documents and it can Access by Special Links.

Bromélia HSS bromelia-hss is the second official implementation of a Diameter-based protocol application by using the Python micro framework Bromélia.

1 Dec 17, 2021
A Python API wrapper for the Twitter API!

PyTweet PyTweet is an api wrapper made for twitter using twitter's api version 2! Installation Windows py3 -m pip install PyTweet Linux python -m pip

TheFarGG 1 Nov 19, 2022
Command-line program to download image galleries and collections from several image hosting sites

gallery-dl gallery-dl is a command-line program to download image galleries and collections from several image hosting sites (see Supported Sites). It

Mike Fährmann 6.4k Jan 06, 2023
SpamBot.py allows you, to spam other Chat Partners etc.

SpamBot -SpamBot.py allows you, to spam other Chat Partners etc. Install If you downloaded it yet, you have to install "requirements.txt" write the di

Marco 1 Jan 16, 2022
Verkehrsunfälle in Deutschland, aufgeschlüsselt nach Verkehrsmittel des Hauptverursachers und Nebenverursachers

How-To Einfach ./main.py ausführen mit der Statistik-Datei aus dem Ordner "Unfälle_mit_mehreren_Beteiligten" als erstem Argument. Requirements python,

4 Oct 12, 2022
Framework to collect and process weather data from wttr.in.

Weathercrawler Automatic extraction and processing framework for weather data from wttr.in Installation tested with: Python 3.7.3 Python 3.9.4 git clo

Maurice Günder 0 Jul 26, 2021
Telegram Link Shortener Bot (With 20 Shorteners)

Telegram ShortenerBot ShortenerBot: 🇬🇧 Telegram Link Shortener Bot (11 + 9 Shorteners) 🇹🇷 Telegram Link Kısaltıcı Bot (11 + 9 Kısaltıcı) All suppo

Hüzünlü Artemis [HuzunluArtemis] 10 May 24, 2022