Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
🔏 Discordちゃんねる ◆wGFzKUzY7E

使い方 discord.pyをインストール. python -m pip install -r requirements.txtを実行. bot.pyと同じ階層に.tokenを用意. bot.pyを実行. ※現状、使用しているライブラリの関係でWindowsOSは未対応です。 コマンド ニックネーム

Gattxxa 3 Feb 02, 2022
Materials to reproduce our findings in our stories, "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up"

Amazon Brands and Exclusives This repository contains code to reproduce the findings featured in our story "Amazon Puts Its Own 'Brands' First Above B

The Markup 60 Nov 11, 2022
A liblary whre you can find helpful functions for your discord bot

DBotUtils A liblary whre you can find helpful functions for your discord bot Easy setup Setup is easily and flexible. Change anytime. After setup just

Kondek286 1 Nov 02, 2021
Cities bot - A simple example of using aiogram and the wikipedia package

Cities game A simple example of using aiogram and the wikipedia package. The bot

Artem Meller 2 Jan 29, 2022
A simple object model for the Notion SDK.

A simplified object model for the Notion SDK. This is loosely modeled after concepts found in SQLAlchemy.

Jason Heddings 54 Jan 02, 2023
Search stock images (e.g. via Unsplash) and save them to your Wagtail image library.

Wagtail Stock Images Search stock images (e.g. via Unsplash) and save them to your Wagtail image library. Requirements Python 3 Django = 2 Wagtail =

Vicktor 12 Oct 12, 2022
Official Python wrapper for the Quantel Finance API

Quantel is a powerful financial data and insights API. It provides easy access to world-class financial information. Quantel goes beyond just financial statements, giving users valuable information l

Guy 47 Oct 16, 2022
Script to get a notification when a product, on Amazon Warehouse, is available within a target price

Amazon_Warehouse_Scraping This script aims to scrape Amazon Warehouse and send an email back if there are products whose price matches with the target

2 Oct 25, 2021
This is a simple grabber written in Python which helps you to grab products from Willhaben.at

Willhaben Grabber This is a simple grabber written in Python which helps you to grab products from Willhaben.at General info The tool generates a sear

Ramo 0 Feb 16, 2022
Cogs for RedDiscord-Bot V3

Cogs v3 Disclaimer: This is an unapproved repo, meaning no one has formally reviewed this repo yet and any loss of data in your bot isn't my fault (An

Honkertonken 5 Nov 17, 2022
A Python API wrapper for the Twitter API!

PyTweet PyTweet is an api wrapper made for twitter using twitter's api version 2! Installation Windows py3 -m pip install PyTweet Linux python -m pip

TheFarGG 1 Nov 19, 2022
CryptoBar - A simple MenuBar app that shows the price of 3 cryptocurrencies

CryptoBar A very simple MenuBar app that shows the price of the following crypto

4 Jul 04, 2022
Twitter-Scrapping - Tweeter tweets extracting using python

Twitter-Scrapping Twitter tweets extracting using python This project is to extr

Suryadeepsinh Gohil 2 Feb 04, 2022
The best (and now open source) Discord selfbot.

React Selfbot Yes, for real Why am I making this open source? Because can't stop calling my product a rat, tokenlogger and what else not. But there is

30 Nov 13, 2022
Telegram bot for making Heroku app.json by @AbirHasan2005

Heroku-app.json A Telegram bot for making Heroku app.json by @AbirHasan2005. Demo Bot Host Bot Deploy to Heroku Click Below Button to Deploy to Heroku

Abir Hasan 46 Nov 13, 2022
Python script to Funge NFTs.

Python script to Funge NFTs. It scrapes OpenSea for a given list of NFT collections and downloads a certain number of NFTs from each collection or the entire collections.

3 Apr 28, 2022
CloudFormation Drift Remediation - Use Cloud Control API to remediate drift that was detected on a CloudFormation stack

CloudFormation Drift Remediation - Use Cloud Control API to remediate drift that was detected on a CloudFormation stack

Cloudar 36 Dec 11, 2022
Spore REST API asyncio client

Spore REST API asyncio client

LEv145 16 Aug 02, 2022
A discord bot for tracking Iranian Minecraft servers and showing the statistics of them

A discord bot for tracking Iranian Minecraft servers and showing the statistics of them

MCTracker 20 Dec 30, 2022
Eva Maria Bot With Python

Eva Maria Bot Features Auto Filter Manual Filter IMDB Admin Commands Broadcast Index IMDB search Inline Search Random pics ids and User info Stats, Us

Aadhi 3 Jan 06, 2022