HiFi DeepVariant + WhatsHap workflow

Workflow steps

1. align HiFi reads to reference with pbmm2
2. call small variants with DeepVariant, using two-pass method (DeepVariant ➡️ WhatsHap phase ➡️ WhatsHap haplotag ➡️ DeepVariant)
3. phase small variants with WhatsHap
4. haplotag aligned BAMs with WhatsHap and merge

Directory structure within basedir

.
├── cluster_logs  # slurm stderr/stdout logs
├── reference
│   ├── reference.chr_lengths.txt  # cut -f1,2 reference.fasta.fai > reference.chr_lengths.txt
│   ├── reference.fasta
│   └── reference.fasta.fai
├── samples
│   └── <sample_id>  # sample_id regex: r'[A-Za-z0-9_-]+'
│       ├── whatshap/  # phased small variants; merged haplotagged alignments
│       ├── logs/  # per-rule stdout/stderr logs
│       ├── aligned/  # intermediate
│       ├── deepvariant/  # intermediate
│       ├── deepvariant_intermediate/  # intermediate
│       └── whatshap_intermediate/  # intermediate
├── smrtcells
│   ├── done  # move folders from smrtcells/ready to smrtcells/done to prevent re-processing
│   └── ready
│       └── <sample_id>  # uBAMs or FASTQs per sample
│                        # filename regex: r'(m\d{5}[Ue]?_\d{6}_\d{6}).(ccs|hifi_reads).bam' or r'(m\d{5}[Ue]?_\d{6}_\d{6}).fastq.gz'
└── workflow  # clone of this repo
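
The naming rules in the tree above can be checked before submitting a job. The following is a minimal Python sketch, not part of the workflow itself: the helper names `is_valid_sample_id` and `movie_name` are hypothetical, and the dots in the filename patterns are escaped on the assumption that they are meant as literal dots.

```python
import re

# Patterns copied from the directory-structure comments.
# Assumption: "." in the filename patterns is meant literally, so it is escaped here.
SAMPLE_ID_RE = re.compile(r"[A-Za-z0-9_-]+")
UBAM_RE = re.compile(r"(m\d{5}[Ue]?_\d{6}_\d{6})\.(ccs|hifi_reads)\.bam")
FASTQ_RE = re.compile(r"(m\d{5}[Ue]?_\d{6}_\d{6})\.fastq\.gz")


def is_valid_sample_id(sample_id):
    """Return True if the whole string matches the sample_id pattern."""
    return SAMPLE_ID_RE.fullmatch(sample_id) is not None


def movie_name(filename):
    """Return the movie prefix of a uBAM or FASTQ filename, or None if it does not match."""
    for pattern in (UBAM_RE, FASTQ_RE):
        match = pattern.fullmatch(filename)
        if match:
            return match.group(1)
    return None
```

For example, `movie_name("m64012_201231_123456.hifi_reads.bam")` returns the movie prefix, while a file named `reads.bam` returns `None` and would presumably not be picked up from `smrtcells/ready/<sample_id>`.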

To run the pipeline

$ conda create \
    --channel bioconda \
    --channel conda-forge \
    --prefix ./conda_env \
    python=3 snakemake mamba lockfile

$ conda activate ./conda_env

$ sbatch workflow/run_snakemake.sh <sample_id>
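
Once a sample has finished, the directory layout calls for moving its folder from smrtcells/ready to smrtcells/done to prevent re-processing. A minimal Python sketch of that housekeeping step follows; the function name `mark_sample_done` is hypothetical and nothing here is provided by the workflow, so a plain `mv` works equally well.

```python
import shutil
from pathlib import Path


def mark_sample_done(basedir, sample_id):
    """Move smrtcells/ready/<sample_id> to smrtcells/done/<sample_id> under basedir."""
    ready_dir = Path(basedir) / "smrtcells" / "ready" / sample_id
    done_root = Path(basedir) / "smrtcells" / "done"
    target = done_root / sample_id

    if not ready_dir.is_dir():
        raise FileNotFoundError(f"no ready folder for sample: {ready_dir}")
    if target.exists():
        # Refuse to overwrite or nest into an existing done folder.
        raise FileExistsError(f"sample already marked done: {target}")

    done_root.mkdir(parents=True, exist_ok=True)
    shutil.move(str(ready_dir), str(target))
    return target
```

Calling `mark_sample_done(".", "<sample_id>")` from basedir, with a real sample ID substituted, would leave smrtcells/ready without that sample's folder.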

Owner

William Rowell
