Gapmm2: gapped alignment using minimap2 (align transcripts to genome)

Related tags

Deep Learninggapmm2
Overview

Latest Github release Conda

gapmm2: gapped alignment using minimap2

This tool is a wrapper for minimap2 to run spliced/gapped alignment, ie aligning transcripts to a genome. You are probably saying, yes minimap2 runs this with -x splice --cs option (you are correct). However, there are instances where the terminal exons from stock minimap2 alignments are missing. This tool detects those alignments that have unaligned terminal eons and uses edlib to find the terminal exon positions. The tool then updates the PAF output file with the updated information.

Rationale

We can pull out a gene model in GFF3 format that has a short 5' terminal exon:

scaffold_9	funannotate	gene	408904	409621	.	-	.	ID=OPO1_006919;
scaffold_9	funannotate	mRNA	408904	409621	.	-	.	ID=OPO1_006919-T1;Parent=OPO1_006919;product=hypothetical protein;
scaffold_9	funannotate	exon	409609	409621	.	-	.	ID=OPO1_006919-T1.exon1;Parent=OPO1_006919-T1;
scaffold_9	funannotate	exon	409320	409554	.	-	.	ID=OPO1_006919-T1.exon2;Parent=OPO1_006919-T1;
scaffold_9	funannotate	exon	409090	409255	.	-	.	ID=OPO1_006919-T1.exon3;Parent=OPO1_006919-T1;
scaffold_9	funannotate	exon	408904	409032	.	-	.	ID=OPO1_006919-T1.exon4;Parent=OPO1_006919-T1;
scaffold_9	funannotate	CDS	409609	409621	.	-	0	ID=OPO1_006919-T1.cds;Parent=OPO1_006919-T1;
scaffold_9	funannotate	CDS	409320	409554	.	-	2	ID=OPO1_006919-T1.cds;Parent=OPO1_006919-T1;
scaffold_9	funannotate	CDS	409090	409255	.	-	1	ID=OPO1_006919-T1.cds;Parent=OPO1_006919-T1;
scaffold_9	funannotate	CDS	408904	409032	.	-	0	ID=OPO1_006919-T1.cds;Parent=OPO1_006919-T1;

If we then map this transcript against the genome, we get the following PAF alignment.

$ minimap2 -x splice --cs genome.fasta cds-transcripts.fa | grep 'OPO1_006919'
OPO1_006919-T1	543	13	543	-	scaffold_9	658044	408903	409554	530	530	60	NM:i:0	ms:i:530	AS:i:466	nn:i:0	ts:A:+	tp:A:P	cm:i:167	s1:i:510	s2:i:0	de:f:0	rl:i:0	cs:Z::129~ct57ac:166~ct64ac:235

The --cs flag in minimap2 can be used to parse the coordinates (below) and you can see we are missing the 5' exon.

>>> cs2coords(408903, 13, 543, '-', ':129~ct57ac:166~ct64ac:235')
([(409320, 409554), (409090, 409255), (408904, 409032)],

So if we run this same alignment with gapmm2 we are able to properly align the 5' terminal exon.

$ gapmm2 genome.fa cds-transcripts.fa | grep 'OPO1_006919'
OPO1_006919-T1	543	0	543	-	scaffold_9	658044	408903	409621	543	543	60	tp:A:P	ts:A:+	NM:i:0	cs:Z::129~ct57ac:166~ct64ac:235~ct54ac:13
>>> cs2coords(408903, 0, 543, '-', ':129~ct57ac:166~ct64ac:235~ct54ac:13')
([(409609, 409621), (409320, 409554), (409090, 409255), (408904, 409032)]

Usage:

gapmm2 can be run as a command line script:

$ gapmm2
usage: gapmm2 [-o] [-t] [-m] [-d] [-h] [--version] reference query

gapmm2: gapped alignment with minimap2. Performs minimap2/mappy alignment with splice options and refines terminal alignments with edlib. Output is PAF format.

Positional arguments:
  reference         reference genome (FASTA)
  query             transcipts in FASTA or FASTQ

Optional arguments:
  -o , --out        output in PAF format (default: stdout)
  -t , --threads    number of threads to use with minimap2 (default: 3)
  -m , --min-mapq   minimum map quality value (default: 1)
  -d, --debug       write some debug info to stderr (default: False)

Help:
  -h, --help        Show this help message and exit
  --version         Show program's version number and exit

It can also be run as a python module. The splice_aligner function will return a list of lists containing PAF formatted data for each alignment and a dictionary of simple stats.

>>> from gapmm2.align import splice_aligner
>>> results, stats = splice_aligner('genome.fa', 'transcripts.fa')
>>> stats
{'n': 6926, 'low-mapq': 0, 'refine-left': 409, 'refine-right': 63}
>>> len(results)
6926
>>> results[0]
['OPO1_000001-T1', 2184, 0, 2184, '+', 'scaffold_1', 1803704, 887, 3127, 2184, 2184, 60, 'tp:A:P', 'ts:A:+', 'NM:i:0', 'cs:Z::958~gt56ag:1226']
>>> 

To install the python package, you can do this with pip:

python -m pip install gapmm2

To install the most updated code in master you can run:

python -m pip install git+https://github.com/nextgenusfs/gapmm2.git
You might also like...
[NAACL & ACL 2021] SapBERT: Self-alignment pretraining for BERT.
[NAACL & ACL 2021] SapBERT: Self-alignment pretraining for BERT.

SapBERT: Self-alignment pretraining for BERT This repo holds code for the SapBERT model presented in our NAACL 2021 paper: Self-Alignment Pretraining

the code of the paper: Recurrent Multi-view Alignment Network for Unsupervised Surface Registration (CVPR 2021)
the code of the paper: Recurrent Multi-view Alignment Network for Unsupervised Surface Registration (CVPR 2021)

RMA-Net This repo is the implementation of the paper: Recurrent Multi-view Alignment Network for Unsupervised Surface Registration (CVPR 2021). Paper

Pytorch implementation for
Pytorch implementation for "Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter".

Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter This is a pytorch-based implementation for paper Implicit Feature Alignme

The PyTorch improved version of TPAMI 2017 paper: Face Alignment in Full Pose Range: A 3D Total Solution.
The PyTorch improved version of TPAMI 2017 paper: Face Alignment in Full Pose Range: A 3D Total Solution.

Face Alignment in Full Pose Range: A 3D Total Solution By Jianzhu Guo. [Updates] 2020.8.30: The pre-trained model and code of ECCV-20 are made public

🧠 A PyTorch implementation of 'Deep CORAL: Correlation Alignment for Deep Domain Adaptation.', ECCV 2016
🧠 A PyTorch implementation of 'Deep CORAL: Correlation Alignment for Deep Domain Adaptation.', ECCV 2016

Deep CORAL A PyTorch implementation of 'Deep CORAL: Correlation Alignment for Deep Domain Adaptation. B Sun, K Saenko, ECCV 2016' Deep CORAL can learn

An official implementation of the paper Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers
An official implementation of the paper Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers

Sequence Feature Alignment (SFA) By Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-jun Zha, Yonggang Wen, and Dacheng Tao This repository is an o

Collective Multi-type Entity Alignment Between Knowledge Graphs (WWW'20)

CG-MuAlign A reference implementation for "Collective Multi-type Entity Alignment Between Knowledge Graphs", published in WWW 2020. If you find our pa

Seeing Dynamic Scene in the Dark: High-Quality Video Dataset with Mechatronic Alignment (ICCV2021)
Seeing Dynamic Scene in the Dark: High-Quality Video Dataset with Mechatronic Alignment (ICCV2021)

Seeing Dynamic Scene in the Dark: High-Quality Video Dataset with Mechatronic Alignment This is a pytorch project for the paper Seeing Dynamic Scene i

The tl;dr on a few notable transformer/language model papers + other papers (alignment, memorization, etc).
The tl;dr on a few notable transformer/language model papers + other papers (alignment, memorization, etc).

The tl;dr on a few notable transformer/language model papers + other papers (alignment, memorization, etc).

Releases(v0.2.0)
Owner
Jon Palmer
Jon Palmer
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations Implementation of the method described in the Speech Resynthesis from Di

Facebook Research 253 Jan 06, 2023
Spectrum Surveying: Active Radio Map Estimation with Autonomous UAVs

Spectrum Surveying: The Python code in this repository implements the simulations and plots the figures described in the paper “Spectrum Surveying: Ac

Universitetet i Agder 2 Dec 06, 2022
Complete* list of autonomous driving related datasets

AD Datasets Complete* and curated list of autonomous driving related datasets Contributing Contributions are very welcome! To add or update a dataset:

Daniel Bogdoll 13 Dec 19, 2022
Implementation of "Meta-rPPG: Remote Heart Rate Estimation Using a Transductive Meta-Learner"

Meta-rPPG: Remote Heart Rate Estimation Using a Transductive Meta-Learner This repository is the official implementation of Meta-rPPG: Remote Heart Ra

Eugene Lee 137 Dec 13, 2022
ReferFormer - Official Implementation of ReferFormer

The official implementation of the paper: Language as Queries for Referring Video Object Segmentation Language as Queries for Referring Video Object S

Jonas Wu 232 Dec 29, 2022
UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss

UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss This repository contains the TensorFlow implementation of the paper UnF

Simon Meister 270 Nov 06, 2022
PClean: A Domain-Specific Probabilistic Programming Language for Bayesian Data Cleaning

PClean: A Domain-Specific Probabilistic Programming Language for Bayesian Data Cleaning Warning: This is a rapidly evolving research prototype.

MIT Probabilistic Computing Project 190 Dec 27, 2022
Implementation of Invariant Point Attention, used for coordinate refinement in the structure module of Alphafold2, as a standalone Pytorch module

Invariant Point Attention - Pytorch Implementation of Invariant Point Attention as a standalone module, which was used in the structure module of Alph

Phil Wang 113 Jan 05, 2023
TACTO: A Fast, Flexible and Open-source Simulator for High-Resolution Vision-based Tactile Sensors

TACTO: A Fast, Flexible and Open-source Simulator for High-Resolution Vision-based Tactile Sensors This package provides a simulator for vision-based

Facebook Research 255 Dec 27, 2022
i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery

i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery This is a public code repository for the publication: i-SpaSP: Structured Neural Pruning

Cameron Ronald Wolfe 5 Nov 04, 2022
Python wrappers to the C++ library SymEngine, a fast C++ symbolic manipulation library.

SymEngine Python Wrappers Python wrappers to the C++ library SymEngine, a fast C++ symbolic manipulation library. Installation Pip See License section

136 Dec 28, 2022
Pairwise model for commonlit competition

Pairwise model for commonlit competition To run: - install requirements - create input directory with train_folds.csv and other competition data - cd

abhishek thakur 45 Aug 31, 2022
This repository contains the code for the ICCV 2019 paper "Occupancy Flow - 4D Reconstruction by Learning Particle Dynamics"

Occupancy Flow This repository contains the code for the project Occupancy Flow - 4D Reconstruction by Learning Particle Dynamics. You can find detail

189 Dec 29, 2022
We present a regularized self-labeling approach to improve the generalization and robustness properties of fine-tuning.

Overview This repository provides the implementation for the paper "Improved Regularization and Robustness for Fine-tuning in Neural Networks", which

NEU-StatsML-Research 21 Sep 08, 2022
A Pytorch implementation of CVPR 2021 paper "RSG: A Simple but Effective Module for Learning Imbalanced Datasets"

RSG: A Simple but Effective Module for Learning Imbalanced Datasets (CVPR 2021) A Pytorch implementation of our CVPR 2021 paper "RSG: A Simple but Eff

120 Dec 12, 2022
Tutorial materials for Part of NSU Intro to Deep Learning with PyTorch.

Intro to Deep Learning Materials are part of North South University (NSU) Intro to Deep Learning with PyTorch workshop series. (Slides) Related materi

Hasib Zunair 9 Jun 08, 2022
Supplementary materials to "Spin-optomechanical quantum interface enabled by an ultrasmall mechanical and optical mode volume cavity" by H. Raniwala, S. Krastanov, M. Eichenfield, and D. R. Englund, 2022

Supplementary materials to "Spin-optomechanical quantum interface enabled by an ultrasmall mechanical and optical mode volume cavity" by H. Raniwala,

Stefan Krastanov 1 Jan 17, 2022
Deep Semisupervised Multiview Learning With Increasing Views (IEEE TCYB 2021, PyTorch Code)

Deep Semisupervised Multiview Learning With Increasing Views (ISVN, IEEE TCYB) Peng Hu, Xi Peng, Hongyuan Zhu, Liangli Zhen, Jie Lin, Huaibai Yan, Dez

3 Nov 19, 2022
Auto-Encoding Score Distribution Regression for Action Quality Assessment

DAE-AQA It is an open source program reference to paper Auto-Encoding Score Distribution Regression for Action Quality Assessment. 1.Introduction DAE

13 Nov 16, 2022
Calculates JMA (Japan Meteorological Agency) seismic intensity (shindo) scale from acceleration data recorded in NumPy array

shindo.py Calculates JMA (Japan Meteorological Agency) seismic intensity (shindo) scale from acceleration data stored in NumPy array Introduction Japa

RR_Inyo 3 Sep 23, 2022