Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
Play Video & Music on Telegram Group Video Chat

Video Stream is an Advanced Telegram Bot that's allow you to play Video & Music on Telegram Group Video Chat 🧪 Get SESSION_NAME from below: Pyrogram

Sehath Perera 1 Jan 17, 2022
Cdk-python-crud-app - CDK Python CRUD App

Welcome to your CDK Python project! You should explore the contents of this proj

Shapon Sheikh 1 Jan 12, 2022
CnCL - CnCLess it's an Easy to deploy Botnet without CnC/C2

CnCL CnCLess it's an Easy to deploy Botnet without CnC/C2, Harder to track and t

ZSendokame 2 Jan 10, 2022
Python wrapper for eBay API

python-ebay - Python Wrapper for eBay API This project intends to create a simple python wrapper around eBay APIs. Development and Download Sites The

Roopesh 99 Nov 16, 2022
Telegram Userbot to steram youtube live or Youtube vido in telegram vc by help of pytgcalls

TGVCVidioPlayerUB Telegram Userbot to steram youtube live or youtube vidio in telegram vc by help of pytgcalls Commands = Vidio Playing 🎧 stream :

Achu biju 3 Oct 28, 2022
A Python Library to interface with LinkedIn API, OAuth and JSON responses

#Overview Here's another library based on the LinkedIn API, OAuth and JSON responses. Hope this documentation explains everything you need to get star

Mike Helmick 69 Dec 11, 2022
Discord Token Nuker With Python

Discord token nuker a.k.a A$$Fvcker Setup For installing the requirements do this: pip install -r requirements.txt To start the Token nuker run this

PR3C14D0 8 Sep 22, 2022
Home Assistant Hilo Integration via HACS

BETA This is a beta release. There will be some bugs, issues, etc. Please bear with us and open issues in the repo. Hilo Hilo integration for Home Ass

66 Dec 23, 2022
A client that allows a user, specifiy their discord token, to send images remotely to discord

ImageBot_for_Discord A client that allows a user, specifiy their discord token, to send images remotely to discord. Can select images using a file dia

0 Aug 24, 2022
Python client for QIWI payment system

Pyqiwi Lib for QIWI payment system Installation pip install pyqiwi Usage from decimal import Decimal from datetime import datetime, timedelta from p

Andrey 12 Jun 03, 2022
A telegram bot to track whales activities on multiple blockchains.

Telegram Bot : Whale Watcher A straightforward telegram bot written in python to track whales activity on multiple blockchains, using whale-alert API

Laurenz Bougan 1 Dec 10, 2021
Desafio de projeto sobre Git/Github

Maçã ou Laranja? 🤔 Desafio Projeto Dio para Git/Github 🔶 Para esse primeiro repositório, decidir adicionar o primeiro algoritmo de inteligência arti

José Filipe 2 Oct 23, 2022
Michelle is a Discord Bot coded in Python with Discord.py by Mudit07.

Michelle is a Discord Bot coded in Python with Discord.py by Mudit07.

Michelle 3 Oct 09, 2021
This is a small package to interact with the OpenLigaDB API.

OpenLigaDB This is a small package to interact with the OpenLigaDB API. Installation Run the following to install: pip install openligadb Usage from o

1 Dec 31, 2021
Copier template for solving Advent of Code puzzles with Python

Advent of Code Python Template for Copier This template creates scaffolding for one day of Advent of Code. It includes tests and can download your per

Geir Arne Hjelle 6 Dec 25, 2022
Python wrappers for INHECO ODTC and SCILA libraries by INHECO GmbH.

Python wrappers for INHECO ODTC and SCILA libraries by INHECO GmbH.

1 Feb 09, 2022
Easily update resume to naukri with one click

NAUKRI RESUME AUTO UPDATER I am using poetry for dependencies. you can check or change in data.txt file for username and password Resume file must be

Rahul.p 1 May 02, 2022
Automatically pick a winner who Retweeted, Commented, and Followed your Twitter account!

AutomaticTwitterGiveaways automates selecting winners for "Retweet, Comment, Follow" type Twitter giveaways.

1 Jan 13, 2022
A multifunctional bot for Discord

Um bot multifuncional e divertido para Discord Estive desenvolvendo o BotDaora desde o começo de outubro de 2021 e agora ele é open-source! tomei essa

Ruan 4 Dec 28, 2021
Simple web browser to visualize HiC tracks

HiCBrowser : A simple web browser to visualize Hi-C and other genomic tracks Fidel Ramirez, José Villaveces, Vivek Bhardwaj Installation You can insta

The deepTools ecosystem 14 Jun 21, 2022