nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures.

Related tags

Deep LearningnextPARS
Overview

nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures.

Here you will find the scripts necessary to produce the scores described in our paper from fastq files obtained during the experiment.

Install Prerequisites

First install git:

sudo apt-get update
sudo apt-get install git-all

Then clone this repository

git clone https://github.com/jwill123/nextPARS.git

Now, ensure the necessary python packages are installed, and can be found in the $PYTHONPATH environment variable by running the script packages_for_nextPARS.sh in the nextPARS directory.

cd nextPARS/conf
chmod 775 packages_for_nextPARS.sh
./packages_for_nextPARS.sh

Convert fastq to tab

In order to go from the fastq outputs of the nextPARS experiments to a format that allows us to calculate scores, first map the reads in the fastq files to a reference using the program of your choice. Once you have obtained a bam file, use PARSParser_0.67.b.jar. This program counts the number of reads beginning at each position (which indicates a cut site for the enzyme in the file name) and outputs it in .tab format (count values for each position are separated by semi-colons).

Example usage:

java -jar PARSParser_0.67.b.jar -a bamFile -b bedFile -out outFile -q 20 -m 5

where the required arguments are:

  • -a gives the bam file of interest
  • -b is the bed file for the reference
  • -out is the name given to the output file in .tab format

Also accepts arguments:

  • -q for minimum mapping quality for reads to be included [default = 0]
  • -m for minimum average counts per position for a given transcript [default = 5.0]

Sample Data

There are sample data files found in the folder nextPARS/data, as well as the necessary fasta files in nextPARS/data/SEQS/PROBES, and the reference structures obtained from PDB in nextPARS/data/STRUCTURES/REFERENCE_STRUCTURES There are also 2 folders of sample output files from the PARSParser_0.67.b.jar program that can be used as further examples of the nextPARS score calculations described below. These folders are found in nextPARS/data/PARSParser_outputs. NOTE: these are randomly generated sequences with random enzyme values, so they are just to be used as examples for the usage of the scripts, good results should not be expected with these.

nextPARS Scores

To obtain the scores from nextPARS experiments, use the script get_combined_score.py. Sample data for the 5 PDB control structures can be found in the folder nextPARS/data/

There are a number of different command line options in the script, many of which were experimental or exploratory and are not relevant here. The useful ones in this context are the following:

  • Use the -i option [REQUIRED] to indicate the molecule for which you want scores (all available data files will be included in the calculations -- molecule name must match that in the data file names)

  • Use the -inDir option to indicate the directory containing the .tab files with read counts for each V1 and S1 enzyme cuts

  • Use the -f option to indicate the path to the fasta file for the input molecule

  • Use the -s option to produce an output Structure Preference Profile (SPP) file. Values for each position are separated by semi-colons. Here 0 = paired position, 1 = unpaired position, and NA = position with a score too low to determine its configuration.

  • Use the -o option to output the calculated scores, again with values for each position separated by semi-colons.

  • Use the --nP_only option to output the calculated nextPARS scores before incorporating the RNN classifier, again with values for each position separated by semi-colons.

  • Use the option {-V nextPARS} to produce an output with the scores that is compatible with the structure visualization program VARNA1

  • Use the option {-V spp} to produce an output with the SPP values that is compatible with VARNA.

  • Use the -t option to change the threshold value for scores when determining SPP values [default = 0.8, or -0.8 for negative scores]

  • Use the -c option to change the percentile cap for raw values at the beginning of calculations [default = 95]

  • Use the -v option to print some statistics in the case that there is a reference CT file available ( as with the example molecules, found in nextPARS/data/STRUCTURES/REFERENCE_STRUCTURES ). If not, will still print nextPARS scores and info about the enzyme .tab files included in the calculations.

Example usage:

# to produce an SPP file for the molecule TETp4p6
python get_combined_score.py -i TETp4p6 -s
# to produce a Varna-compatible output with the nextPARS scores for one of the 
# randomly generated example molecules
python get_combined_score.py -i test_37 -inDir nextPARS/data/PARSParser_outputs/test1 \
  -f nextPARS/data/PARSParser_outputs/test1/test1.fasta -V nextPARS

RNN classifier (already incorporated into the nextPARS scores above)

To run the RNN classifier separately, using a different experimental score input (in .tab format), it can be run like so with the predict2.py script:

python predict2.py -f molecule.fasta -p scoreFile.tab -o output.tab

Where the command line options are as follows:

  • the -f option [REQUIRED] is the input fasta file
  • the -p option [REQUIRED] is the input Score tab file
  • the -o option [REQUIRED] is the final Score tab output file.
  • the -w1 option is the weight for the RNN score. [default = 0.5]
  • the -w2 option is the weight for the experimental data score. [default = 0.5]

References:

  1. Darty,K., Denise,A. and Ponty,Y. (2009) VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinforma. Oxf. Engl., 25, 1974–197
Owner
Jesse Willis
Jesse Willis
The project covers common metrics for super-resolution performance evaluation.

Super-Resolution Performance Evaluation Code The project covers common metrics for super-resolution performance evaluation. Metrics support The script

xmy 10 Aug 03, 2022
Reproducible research and reusable acyclic workflows in Python. Execute code on HPC systems as if you executed them on your personal computer!

Reproducible research and reusable acyclic workflows in Python. Execute code on HPC systems as if you executed them on your machine! Motivation Would

Joeri Hermans 15 Sep 11, 2022
Back to Basics: Efficient Network Compression via IMP

Back to Basics: Efficient Network Compression via IMP Authors: Max Zimmer, Christoph Spiegel, Sebastian Pokutta This repository contains the code to r

IOL Lab @ ZIB 1 Nov 19, 2021
The official PyTorch code for NeurIPS 2021 ML4AD Paper, "Does Thermal data make the detection systems more reliable?"

MultiModal-Collaborative (MMC) Learning Framework for integrating RGB and Thermal spectral modalities This is the official code for NeurIPS 2021 Machi

NeurAI 12 Nov 02, 2022
Official page of Struct-MDC (RA-L'22 with IROS'22 option); Depth completion from Visual-SLAM using point & line features

Struct-MDC (click the above buttons for redirection!) Official page of "Struct-MDC: Mesh-Refined Unsupervised Depth Completion Leveraging Structural R

Urban Robotics Lab. @ KAIST 37 Dec 22, 2022
Unofficial Tensorflow 2 implementation of the paper Implicit Neural Representations with Periodic Activation Functions

Siren: Implicit Neural Representations with Periodic Activation Functions The unofficial Tensorflow 2 implementation of the paper Implicit Neural Repr

Seyma Yucer 2 Jun 27, 2022
Code reproduce for paper "Vehicle Re-identification with Viewpoint-aware Metric Learning"

VANET Code reproduce for paper "Vehicle Re-identification with Viewpoint-aware Metric Learning" Introduction This is the implementation of article VAN

EMDATA-AILAB 23 Dec 26, 2022
Explainability for Vision Transformers (in PyTorch)

Explainability for Vision Transformers (in PyTorch) This repository implements methods for explainability in Vision Transformers

Jacob Gildenblat 442 Jan 04, 2023
Code for unmixing audio signals in four different stems "drums, bass, vocals, others". The code is adapted from "Jukebox: A Generative Model for Music"

Status: Archive (code is provided as-is, no updates expected) Disclaimer This code is a based on "Jukebox: A Generative Model for Music" Paper We adju

Wadhah Zai El Amri 24 Dec 29, 2022
[2021][ICCV][FSNet] Full-Duplex Strategy for Video Object Segmentation

Full-Duplex Strategy for Video Object Segmentation (ICCV, 2021) Authors: Ge-Peng Ji, Keren Fu, Zhe Wu, Deng-Ping Fan*, Jianbing Shen, & Ling Shao This

Daniel-Ji 55 Dec 22, 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders

ConvMAE ConvMAE: Masked Convolution Meets Masked Autoencoders Peng Gao1, Teli Ma1, Hongsheng Li2, Jifeng Dai3, Yu Qiao1, 1 Shanghai AI Laboratory, 2 M

Alpha VL Team of Shanghai AI Lab 345 Jan 08, 2023
Generate images from texts. In Russian

ruDALL-E Generate images from texts pip install rudalle==1.1.0rc0 🤗 HF Models: ruDALL-E Malevich (XL) ruDALL-E Emojich (XL) (readme here) ruDALL-E S

AI Forever 1.6k Dec 31, 2022
CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields

CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields Paper | Supplementary | Video | Poster If you find our code or paper useful, please

26 Nov 29, 2022
Y. Zhang, Q. Yao, W. Dai, L. Chen. AutoSF: Searching Scoring Functions for Knowledge Graph Embedding. IEEE International Conference on Data Engineering (ICDE). 2020

AutoSF The code for our paper "AutoSF: Searching Scoring Functions for Knowledge Graph Embedding" and this paper has been accepted by ICDE2020. News:

AutoML Research 64 Dec 17, 2022
Keras Image Embeddings using Contrastive Loss

Image to Embedding projection in vector space. Implementation in keras and tensorflow of batch all triplet loss for one-shot/few-shot learning.

Shravan Anand K 5 Mar 21, 2022
Small utility to demangle Nim symbols in callgrind files

nim_callgrind A small utility to demangle Nim symbols from callgrind files. Usage Run your (Nim) program with something like this: valgrind --tool=cal

kraptor 3 Feb 15, 2022
Real-Time-Student-Attendence-System - Real Time Student Attendence System

Real-Time-Student-Attendence-System The Student Attendance Management System Pro

Rounak Das 1 Feb 15, 2022
This repository contains the re-implementation of our paper deSpeckNet: Generalizing Deep Learning Based SAR Image Despeckling

deSpeckNet-TF-GEE This repository contains the re-implementation of our paper deSpeckNet: Generalizing Deep Learning Based SAR Image Despeckling publi

Adugna Mullissa 16 Sep 07, 2022
A fast and easy to use, moddable, Python based Minecraft server!

PyMine PyMine - The fastest, easiest to use, Python-based Minecraft Server! Features Note: This list is not always up to date, and doesn't contain all

PyMine 144 Dec 30, 2022
Technical Analysis library in pandas for backtesting algotrading and quantitative analysis

bta-lib - A pandas based Technical Analysis Library bta-lib is pandas based technical analysis library and part of the backtrader family. Links Main P

DRo 393 Dec 20, 2022