Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisites

  1. Spark

  2. AWS CLI

  3. AWS Account
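
Before going further, it can help to confirm that the command-line tools are actually on your PATH. The snippet below is a minimal sketch, not part of this repo: the helper name `missing_tools` is hypothetical, and it assumes Spark is exposed through the usual `spark-submit` launcher and the AWS CLI through `aws`.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Assumed executable names: `aws` for the AWS CLI, `spark-submit` for Spark.
missing = missing_tools(["aws", "spark-submit"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All command-line prerequisites were found.")
```

This only checks that the executables exist; it does not verify that your AWS credentials are configured.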

Run

Follow the instructions in the article. Once you have uploaded the files to your S3 bucket, run:

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 
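
The bucket name appears many times in the command above, so typing it by hand is error-prone. As a minimal sketch (not part of this repo), the same call can be assembled in Python from a single bucket name. The function name `build_create_cluster_command` and the placeholder bucket `my-bucket` are assumptions for illustration; every flag and file name is copied from the command above.

```python
import shlex


def build_create_cluster_command(bucket, output_name="sankey.json"):
    """Return the `aws emr create-cluster` call from this README as an argv list."""
    s3 = f"s3://{bucket}"
    # Arguments passed to the Spark step, joined with commas as the CLI expects.
    step_args = ",".join([
        "--deploy-mode", "cluster",
        "--master", "yarn",
        "--py-files", f"{s3}/helper_function.py",
        f"{s3}/spark_3mer.py",
        f"{s3}/test.sam",
        bucket,
        output_name,
    ])
    step = (
        "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,"
        f"Args=[{step_args}]"
    )
    return [
        "aws", "emr", "create-cluster",
        "--name", "Spark_step_pip",
        "--release-label", "emr-6.5.0",
        "--applications", "Name=Spark",
        "--log-uri", f"{s3}/logs/",
        "--instance-type", "m5.xlarge",
        "--instance-count", "3",
        "--bootstrap-actions", f"Path={s3}/emr_bootstrap.sh",
        "--use-default-roles", "--auto-terminate",
        "--steps", step,
    ]


# "my-bucket" is a placeholder; replace it with your own bucket name.
print(shlex.join(build_create_cluster_command("my-bucket")))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly, which avoids shell quoting issues around the `--steps` value.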

When the job finishes, download sankey.json and run this command to visualize it:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License. See the LICENSE file for details.

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.