Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
Bot Auto Chess.com

Bot Auto Chess.com Is a suggestion for chess moves on the chess.com platform. The available features are: chess suggestions and moves automatically. i

Tn. Ninja 34 Jan 01, 2023
Automate UCheck COVID-19 self-assessment form submission

ucheck Automate UCheck COVID-19 self-assessment form submission. Disclaimer ucheck automatically completes the University of Tornto's UCheck COVID-19

Ira Horecka 15 Nov 30, 2022
unofficial library for discord components(on development)

discord.py-buttons unofficial library for discord buttons(on development) Install pip install --upgrade discord_buttons Example from discord import Cl

kiki7000 129 Dec 31, 2022
A discord bot that can detect Nitro Scam Links and delete them to protect other users

A discord bot that can detect Nitro Scam Links and delete them to protect other users. Add it to your server from here.

Kanak Mittal 9 Oct 20, 2022
A Bot To Get Info Of Telegram messages , Media , Channel id Group ID etc.

Info-Bot A Bot To Get Info Of Telegram messages , Media , Channel id Group ID etc. Get Info Of Your And Messages , Channels , Groups ETC... How to mak

Vɪᴠᴇᴋ 23 Nov 12, 2022
A corona statistics and information telegram bot.

A corona statistics and information telegram bot.

Fayas Noushad 15 Oct 21, 2022
use python script to fix vmp dump api in ida

FixVmpDump use python script to fix vmp dump api in ida. support x86 and x64. details in my blog: https://blog.csdn.net/yan_star/article/details/11279

97 Nov 02, 2022
Automates the process to obtain an appointment for NIE in spain.

get-nie-appointment A Python script that automates the process of getting an appointment for NIE assignation. It can be modified in order to change th

Ezequiel Aceto 39 Nov 28, 2022
AnyAPI is a library that helps you to write any API wrapper with ease and in pythonic way.

AnyAPI AnyAPI is a library that helps you to write any API wrappers with ease and in pythonic way. Features Have better looking code using dynamic met

Fatih Kilic 129 Sep 20, 2022
A script written in python3 for bruteforcing Gmail accounts.

GmailBruteforce Made for bruteforcing gmail accounts. It needs Less Secure Apps setting turned on in order to work. Installation For windows git clone

Shinero 4 Sep 16, 2022
Official Python client for the MonkeyLearn API. Build and consume machine learning models for language processing from your Python apps.

MonkeyLearn API for Python Official Python client for the MonkeyLearn API. Build and run machine learning models for language processing from your Pyt

MonkeyLearn 157 Nov 22, 2022
A file-based quote bot written in Python

Let's Write a Python Quote Bot! This repository will get you started with building a quote bot in Python. It's meant to be used along with the Learnin

1 Jan 19, 2022
Repositorio dedicado a contener los archivos fuentes del bot de discord "Lector de Ejercicios".

Lector de Ejercicios Este bot de discord está pensado para usarse únicamente en el discord de la materia Algoritmos y Programación I, de la Facultad d

Franco Lighterman Reismann 3 Sep 17, 2022
A small python script which runs a speedtest using speedtest.net and inserts it into a Google Docs Spreadsheet.

speedtest-google-sheets This is a small python script which runs a speedtest using speedtest.net and inserts it into a Google Docs Spreadsheet. Setup

marie 2 Feb 10, 2022
An API serving data on all creatures, monsters, materials, equipment, and treasure in The Legend of Zelda: Breath of the Wild

Hyrule Compendium API An API serving data on all creatures, monsters, materials, equipment, and treasure in The Legend of Zelda: Breath of the Wild. B

Aarav Borthakur 116 Dec 01, 2022
Find songs by lyrics.

LyricSearch Hi, welcome to LyricSearch - a simple (Yes), fast (Maybe), and powerful (Approach) lyric search engine. We support Three search methods to

Dicer_ 1 Dec 13, 2021
Automated AWS account hardening with AWS Control Tower and AWS Step Functions

Automate activities in Control Tower provisioned AWS accounts Table of contents Introduction Architecture Prerequisites Tools and services Usage Clean

AWS Samples 20 Dec 07, 2022
Automatically pick a winner who Retweeted, Commented, and Followed your Twitter account!

AutomaticTwitterGiveaways automates selecting winners for "Retweet, Comment, Follow" type Twitter giveaways.

1 Jan 13, 2022
Repository to access information of stocks in Bombay Stock Exchange.

BSE Repository to access information of stocks in Bombay Stock Exchange. The code in this repository uses BSE API and conclusions made using the code

1 Nov 13, 2021
L3DAS22 challenge supporting API

L3DAS22 challenge supporting API This repository supports the L3DAS22 IEEE ICASSP Grand Challenge and it is aimed at downloading the dataset, pre-proc

L3DAS 38 Dec 25, 2022