Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
WBMS automates sending of message to multiple numbers via WhatsApp Web

WhatsApp Bulk Message Sender - WBMS WBMS automates sending of message to multiple numbers via WhatsApp Web. Report Bug · Request Feature Love the proj

Akshay Parakh 3 Jun 26, 2022
A GitHub Action that automatically reports your Advent of Code progress in a table in your README

Advent README Stars This action adds and maintains a stars report in your README based on your Advent of Code progress. Example Table 2021 Results Day

Kevin Duff 36 Dec 30, 2022
A Twitter bot developed in Python using the Tweepy library and hosted in AWS.

Twitter Cameroon: @atangana_aron A Twitter bot developed in Python using the Tweepy library and hosted in AWS. https://twitter.com/atangana_aron Cost

1 Jan 30, 2022
Python wrapper for eBay API

python-ebay - Python Wrapper for eBay API This project intends to create a simple python wrapper around eBay APIs. Development and Download Sites The

Roopesh 99 Nov 16, 2022
Trading through Binance's API using Python & sqlite

pycrypt Automate trading crypto using Python to pull data from Binance's API and analyse trends. May or may not consistently lose money but oh well it

Maxim 4 Sep 02, 2022
A simple bot which using an API , detects reported discord scams and kicks the user if possible while deleting the message

A simple bot which using an API , detects reported discord scams and kicks the user if possible while deleting the message

Vioshim 3 Nov 16, 2022
A python API for BSCScan (Binance Smart Chain Explorer), available on PyPI.

bscscan-python A complete Python API for BscScan.com, available on PyPI. Powered by BscScan.com APIs. This is a gently modified fork of the etherscan-

Panagiotis Kotsias 246 Dec 31, 2022
Check your bot status automatically using userbot, simply and easy

Status Checker Userbot check your bot status automatically using userbot, simply and easy. Mandatory Vars API_ID : Telegram API_ID, get it from my.tel

ALBY 6 Feb 20, 2022
An API wrapper for convertio.co written in Python.

An API wrapper for convertio.co written in Python.

Moonrise 9 Sep 27, 2022
A calculator telegram bot.

Calculator-Bot A calculator telegram bot. Made with Python3 (C) @FayasNoushad Copyright permission under MIT License License - https://github.com/Fay

Fayas Noushad 33 Nov 30, 2022
数字货币动态趋势网格,随着行情变动。目前实盘月化10%。目前支持币安,未来上线火币、OKEX。

数字货币动态趋势网格,随着行情变动。目前实盘月化10%。目前支持币安,未来上线火币、OKEX。

幸福村的码农 98 Dec 27, 2022
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Modern Data Lake Storage Layers This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache

Damon P. Cortesi 25 Oct 31, 2022
Easily update resume to naukri with one click

NAUKRI RESUME AUTO UPDATER I am using poetry for dependencies. you can check or change in data.txt file for username and password Resume file must be

Rahul.p 1 May 02, 2022
qualysclient - a python SDK for interacting with the Qualys API

qualysclient - a python SDK for interacting with the Qualys API

5 Oct 28, 2022
Ig-Crackv2 - Crack Instagram Version 2.9

★★ Information ★★ ★★Menu Special Crack Melalui Pengikut Crack Melalui Mengikuti

Risky [ Zero Tow ] 11 Aug 30, 2022
Simple spam bot made in python

Simple Spam Bot A Simple and easy way to be the most hated person between your friends, All you have to do is spam the group chat using this bot until

Kareem Osama 6 Sep 05, 2022
Autofilterv5 With Same more Features

Autofilterv5 With Same more Features ✨ Imbd + Index +.....

Selfie SD 8 Oct 21, 2022
Reverse engineering the dengue virus (under development construction)

Reverse engineering the dengue virus (under development 🚧 ) What is dengue? Dengue is a viral infection transmitted to humans through the bite of inf

kjain 4 Feb 09, 2022
Python Library for Secp256k1 Bitcoin curve to do fast ECC calculation

secp256k1 Python Library for Secp256k1 Bitcoin curve to do fast ECC calculation Example Usage import secp256k1 as ice print('[C]',privatekey_to_addres

iceland 49 Jan 01, 2023
Bendford analysis of Ethereum transaction

Bendford analysis of Ethereum transaction The python script script.py extract from already downloaded archive file the ethereum transaction. The value

sleepy ramen 2 Dec 18, 2021