Snakemake worflow to process and filter long read data from Oxford Nanopore Technologies.

Overview

Nanopore-Workflow

Snakemake workflow to process and filter long read data from Oxford Nanopore Technologies. It is designed to compare whole human genome tumor/normal pairs, but can also run individual samples. Reports and plots are generated for de novo genome assembly, differentially methylated regions, copy number variants, and structural variants. Filtering heuristics typically reduce the reported translocations to the break points. It is suggested to have at least 15x - 20x of coverage, and a median read length of at least 5kbp - 6kbp.

nanopore_workflow

Installation instructions

Download the latest code from GitHub:

git clone https://github.com/mike-molnar/nanopore-workflow.git

Before running the workflow, you will need to download the reference genome. I have not included the download as part of the workflow because it is designed to run a cluster that may not have internet access. You can use a local copy of GRCh38 if you have one, but the chromosomes must be named chr1, chr2, ... , and the reference can only contain the autosomes and sex chromosomes. To download the reference genome and index it, change to the reference directory of the workflow and run the script:

cd /path/to/nanopore-workflow/reference
chmod u+x download_reference.sh
./download_reference.sh

To run the workflow copy the Snakefile and config.yaml files to the directory that you want to run the workflow:

cp /path/to/nanopore-workflow/Snakefile /path/to/nanopore-workflow/config.yaml /path/to/samples

Modify the config.yaml file to represent the information for the necessary files and directories of your sample(s). The workflow is currently designed to have a single FASTQ, and a single sequencing summary file in a folder named fastq that is in a folder named after the sample. The config.yaml file provides an example of how to format the initial files and directories before running the workflow.

To run on a grid engine

There are a few different grid engines, so the exact format may be different for your particular grid engine. To run everything except the de novo assembly on a Univa grid engine:

snakemake --jobs 500 --rerun-incomplete --keep-going --latency-wait 30 --cluster "qsub -cwd -V -o snakemake.output.log -e snakemake.error.log -q queue_name -P project_name -pe smp {threads} -l h_vmem={params.memory_per_thread} -l h_rt={params.run_time} -b y" all_but_assembly

You will have to replace queue_name and project_name with the necessary values to run on your grid.

Dependencies

There are many dependencies, so it is best to create a new Conda environment using the YAML files in the env directory. There is a YAML file for the workflow, and another for Medaka. You will need to install a separate environment for QUAST if you are going to run the de novo assembly portion of the workflow. Change to the env directory and create the environments with Conda:

cd /path/to/nanopore-worflow/env
conda env create -n nanopore-workflow -f nanopore-workflow_env.yml
conda env create -n medaka -f medaka_env.yml
conda env create -n quast -f quast_env.yml
conda env create -n R_env -f R_env.yml
conda activate nanopore-workflow

Before running the workflow you will need to export the paths of the four environments to your PATH variable:

export PATH="/path/to/conda/envs/nanopore-workflow/bin:$PATH"
export PATH="/path/to/conda/envs/medaka/bin:$PATH"
export PATH="/path/to/conda/envs/quast/bin:$PATH"
export PATH="/path/to/conda/envs/R_env/bin:$PATH"

nanopore-workflow dependencies:

  • bcftools
  • bedtools
  • cutesv
  • flye
  • longshot
  • nanofilt v2.8.0
  • nanoplot v1.20.0
  • nanopolish
  • seaborn v0.10.0
  • snakemake
  • sniffles
  • survivor
  • svim
  • whatshap
  • winnowmap

R_env dependencies:

  • bioconductor-karyoploter
  • bioconductor-txdb.hsapiens.ucsc.hg38.knowngene
  • bioconductor-org.hs.eg.db
  • bioconductor-dss
  • r-tidyverse
You might also like...
 A simple way to read and write LAPS passwords from linux.
A simple way to read and write LAPS passwords from linux.

A simple way to read and write LAPS passwords from linux. This script is a python setter/getter for property ms-Mcs-AdmPwd used by LAPS inspired by @s

 ⚙️ Compile, Read and update your .conf file in python
⚙️ Compile, Read and update your .conf file in python

⚙️ Compile, Read and update your .conf file in python

Discovering local read-level DNA methylation patterns and DNA methylation heterogeneity in intermediately methylated regions

Discovering local read-level DNA methylation patterns and DNA methylation heterogeneity in intermediately methylated regions

Users can read others' travel journeys in addition to being able to upload and delete posts detailing their own experiences

Users can read others' travel journeys in addition to being able to upload and delete posts detailing their own experiences! Posts are organized by country and destination within that country.

To lazy to read your homework ? Get it done with LOL

LOL To lazy to read your homework ? Get it done with LOL Needs python 3.x L:::::::::L OO:::::::::OO L:::::::::L L:::::::

Pequenos programas variados que estou praticando e implementando, leia o Read.me!

my-small-programs Pequenos programas variados que estou praticando e implementando! Arquivo: automacao Automacao de processos de rotina com código Pyt

Show my read on kindle this year

Show my kindle status on GitHub

Incident Response Process and Playbooks | Goal: Playbooks to be Mapped to MITRE Attack Techniques
Incident Response Process and Playbooks | Goal: Playbooks to be Mapped to MITRE Attack Techniques

PURPOSE OF PROJECT That this project will be created by the SOC/Incident Response Community Develop a Catalog of Incident Response Playbook for every

These are After Effects and Python files that were made in the process of creating the video for the contest.

spirograph These are After Effects and Python files that were made in the process of creating the video for the contest. In the python file you can qu

Releases(v0.1.0)
The blancmange curve can be visually built up out of triangle wave functions if the infinite sum is approximated by finite sums of the first few terms.

Blancmange-curve The blancmange curve can be visually built up out of triangle wave functions if the infinite sum is approximated by finite sums of th

Shankar Mahadevan L 1 Nov 30, 2021
Pampy: The Pattern Matching for Python you always dreamed of.

Pampy: Pattern Matching for Python Pampy is pretty small (150 lines), reasonably fast, and often makes your code more readable and hence easier to rea

Claudio Santini 3.5k Dec 30, 2022
General tricks that may help you find bad, or noisy, labels in your dataset

doubtlab A lab for bad labels. Warning still in progress. This repository contains general tricks that may help you find bad, or noisy, labels in your

vincent d warmerdam 449 Dec 26, 2022
Submission from Team OMR for the TRI-NIT Hackathon

Submission from Team OMR for the TRI-NIT Hackathon

0 Feb 01, 2022
peace-performance (Rust) binding for python. To calculate star ratings and performance points for all osu! gamemodes

peace-performance-python Fast, To calculate star ratings and performance points for all osu! gamemodes peace-performance (Rust) binding for python bas

9 Sep 19, 2022
SuperCollider library for Python

SuperCollider library for Python This project is a port of core features of SuperCollider's language to Python 3. It is intended to be the same librar

Lucas Samaruga 65 Dec 22, 2022
This is the improvised version of Dobot Magician which can be implemented for Dobot M1

pydobotM1 This is the edited driver for Dobot M1 version of the original pydobot library intended for use with the Dobot Magician. Here's what you nee

Shaik Abdullah 2 Jul 11, 2022
Syntax highlighting for yarn.lock and bun.lockb files

Yarn.lock Syntax Highlighting Syntax highlighting for yarn.lock and bun.lockb files Installation Plugin is not publushed yet on Package Control, to in

Alexander Kuznetsov 4 Jul 06, 2022
An experimental Python-to-C transpiler and domain specific language for embedded high-performance computing

An experimental Python-to-C transpiler and domain specific language for embedded high-performance computing

Andrea Zanelli 562 Dec 28, 2022
The first Python 1v1.lol triggerbot working with colors !

1v1.lol TriggerBot Afin d'utiliser mon triggerbot, vous devez activer le plein écran sur 1v1.lol sur votre naviguateur (quelque-soit ce dernier). Vous

Venax 5 Jul 25, 2022
Python dictionaries with advanced dot notation access

from box import Box movie_box = Box({ "Robin Hood: Men in Tights": { "imdb stars": 6.7, "length": 104 } }) movie_box.Robin_Hood_Men_in_Tights.imdb_s

Chris Griffith 2.1k Dec 28, 2022
Python script which synchronizes the replica-directoty with the original-one.

directories_synchronizer Python script which synchronizes the replica-directoty with the original-one. Automatically detects all changes when script i

0 Feb 13, 2022
Proyectos de ejercicios básicos y avanzados hecho en python

Proyectos Básicos y Avanzados hecho en python Instalación: Tener instalado python 3.x o superior. Tener pip instalado. Tener virtualenv o venv instala

Karlo Xavier Chok 1 Dec 27, 2021
Usos Semester average helper

Usos Semester average helper Dzieki temu skryptowi mozesz sprawdzic srednia ocen na kazdy odbyty przez ciebie semestr PARAMETERS required: '--username

2 Jan 17, 2022
Generate a wordlist to fuzz amounts or any other numerical values.

Generate a wordlist to fuzz amounts or any other numerical values. Based on Common Security Issues in Financially-Oriented Web Applications.

Ivan Šincek 3 Oct 14, 2022
Just some information about this nerd.

Greetings, mates, I am ErrorDIM - aka ErrorDimension 👋 🧬 Programming Languages I Can Use: 🥇 Top Starred Repositories: # Name Stars Size Major Langu

ErrorDIM 3 Jan 11, 2022
NeoInterface - Neo4j made easy for Python programmers!

Neointerface - Neo4j made easy for Python programmers! A Python interface to use the Neo4j graph database, and simplify its use. class NeoInterface: C

15 Dec 15, 2022
Adansons Base is a data management tool that organizes metadata of unstructured data and creates and organizes datasets.

Adansons Base is a data management tool that organizes metadata of unstructured data and creates and organizes datasets. It makes dataset creation more effective and helps find essential insights fro

Adansons Inc 27 Oct 22, 2022
Python script to combine the statistical results of a TOPAS simulation that was split up into multiple batches.

topas-merge-simulations Python script to combine the statistical results of a TOPAS simulation that was split up into multiple batches At the top of t

Sebastian Schäfer 1 Aug 16, 2022
A maubot plugin to invite users to Matrix rooms according to LDAP groups

LDAP Inviter Bot This is a maubot plugin that invites users to Matrix rooms according to their membership in LDAP groups.

David Mehren 14 Dec 09, 2022