Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
EZXT - A ccxt wrapped client for binance & ftx

EZXT Open source & beginner-friendly ccxt wrapped client for binance & ftx Want

Shaft 10 Oct 25, 2022
Creates Spotify playlists from Spinitron playlists.

spin2spot Creates Spotify playlists from Spinitron playlists. Quick Start You can use spin2spot as a command-line tool: Erik Didriksen 1 Aug 28, 2021

A ShareX alternative for Mac OS built in Python.

Clipboard Uploader A ShareX alternative for Mac OS built in Python. Install and setup Download the latest release and put it in your applications fold

Ben Tettmar 2 Jun 07, 2022
Die wichtigsten APIs Deutschlands in einem Python Paket.

Deutschland A python package that gives you easy access to the most valuable datasets of Germany. Installation pip install deutschland Geographic data

Bundesstelle für Open Data 921 Jan 08, 2023
Wallpaper API from wallpaperscraft.com

wallpaper-api Wallpaper API from https://wallpaperscraft.com for API documentation see https://maajid-wallpaper-api.deta.dev/docs How to Run first, cl

Athallah Muhammad Maajid 2 Apr 06, 2022
A simple, lightweight Discord bot running with only 512 MB memory on Heroku

Haruka This used to be a music bot, but people keep using it for NSFW content. Can't everyone be less horny? Bot commands See the built-in help comman

Haruka 4 Dec 26, 2022
⛑ REDCap API interface in Python

REDCap API in Python Description Supports structured data extraction for REDCap projects. The API module d3b_redcap_api.redcap.REDCapStudy can be logi

D3b 1 Nov 21, 2022
We have made you a wrapper you can't refuse

We have made you a wrapper you can't refuse We have a vibrant community of developers helping each other in our Telegram group. Join us! Stay tuned fo

20.6k Jan 04, 2023
A continued fork of Disco

Orca Orca is an extensive and extendable Python 3.x library for the Discord API. orca boasts the following major features: Expressive, functional inte

RPS 4 Apr 03, 2022
Telegram Bot for saving and sharing personal and group notes

EZ Notes Bot (ezNotesBot) Telegram Bot for saving and sharing personal and group notes. Usage Personal notes: reply to any message in PM to save it as

Dash Eclipse 8 Nov 07, 2022
Asynchronous Python Wrapper for the Ufile API

Ufile.io Asynchronous Python Wrapper for the Ufile API (Unofficial).

Gautam Kumar 16 Aug 31, 2022
Autofilterv5 With Same more Features

Autofilterv5 With Same more Features ✨ Imbd + Index +.....

Selfie SD 8 Oct 21, 2022
CloudFormation Drift Remediation - Use Cloud Control API to remediate drift that was detected on a CloudFormation stack

CloudFormation Drift Remediation - Use Cloud Control API to remediate drift that was detected on a CloudFormation stack

Cloudar 36 Dec 11, 2022
SquireBot is a Discord bot designed to run and manage tournaments entirely within a Discord.

Overview SquireBot is a Discord bot designed to run and manage tournaments entirely within a Discord. The current intended usecase is Magic: the Gathe

7 Nov 29, 2022
YouTube playlist Files downloaded by FDM are not organized according to the original order on YouTube

Youtube-Playlist-File-Organizer YouTube playlist Files downloaded by Free Download Manager are not organized according to the original order on YouTub

David Mainoo 3 Dec 27, 2021
Discord Token Nuker With Python

Discord token nuker a.k.a A$$Fvcker Setup For installing the requirements do this: pip install -r requirements.txt To start the Token nuker run this

PR3C14D0 8 Sep 22, 2022
Facebook fishing on telegram bot

Facebook-fishing Facebook fishing on telegram bot تثبيت الاداة pkg update -y pkg upgrade -y pkg install git -y pkg install python -y git clone https:/

sadamalsharabi 7 Oct 18, 2022
A Bot Telegram Anti Users Channel to automatic ban users who using channel to send message in group.

Tg_Anti_UsersChannel A Bot Telegram Anti Users Channel to automatic ban users who using channel to send message in group. Features: Automatic ban Whit

idzeroid 6 Dec 26, 2021
A program used to create accounts in bulk, still a work in progress as of now.

Discord Account Creator This project is still a work in progress. It will be published upon its full completion. About This project is still under dev

patched 8 Sep 15, 2022
A simple python script for rclone. Use multiple Google Service Accounts and cycle through them.

About GSAclone GSAclone is a simple python script for rclone, written with the purpose of using multiple Google service accounts on Google Drive and "

Shiro39 6 Feb 25, 2022