nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures.

Related tags

Deep LearningnextPARS
Overview

nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures.

Here you will find the scripts necessary to produce the scores described in our paper from fastq files obtained during the experiment.

Install Prerequisites

First install git:

sudo apt-get update
sudo apt-get install git-all

Then clone this repository

git clone https://github.com/jwill123/nextPARS.git

Now, ensure the necessary python packages are installed, and can be found in the $PYTHONPATH environment variable by running the script packages_for_nextPARS.sh in the nextPARS directory.

cd nextPARS/conf
chmod 775 packages_for_nextPARS.sh
./packages_for_nextPARS.sh

Convert fastq to tab

In order to go from the fastq outputs of the nextPARS experiments to a format that allows us to calculate scores, first map the reads in the fastq files to a reference using the program of your choice. Once you have obtained a bam file, use PARSParser_0.67.b.jar. This program counts the number of reads beginning at each position (which indicates a cut site for the enzyme in the file name) and outputs it in .tab format (count values for each position are separated by semi-colons).

Example usage:

java -jar PARSParser_0.67.b.jar -a bamFile -b bedFile -out outFile -q 20 -m 5

where the required arguments are:

  • -a gives the bam file of interest
  • -b is the bed file for the reference
  • -out is the name given to the output file in .tab format

Also accepts arguments:

  • -q for minimum mapping quality for reads to be included [default = 0]
  • -m for minimum average counts per position for a given transcript [default = 5.0]

Sample Data

There are sample data files found in the folder nextPARS/data, as well as the necessary fasta files in nextPARS/data/SEQS/PROBES, and the reference structures obtained from PDB in nextPARS/data/STRUCTURES/REFERENCE_STRUCTURES There are also 2 folders of sample output files from the PARSParser_0.67.b.jar program that can be used as further examples of the nextPARS score calculations described below. These folders are found in nextPARS/data/PARSParser_outputs. NOTE: these are randomly generated sequences with random enzyme values, so they are just to be used as examples for the usage of the scripts, good results should not be expected with these.

nextPARS Scores

To obtain the scores from nextPARS experiments, use the script get_combined_score.py. Sample data for the 5 PDB control structures can be found in the folder nextPARS/data/

There are a number of different command line options in the script, many of which were experimental or exploratory and are not relevant here. The useful ones in this context are the following:

  • Use the -i option [REQUIRED] to indicate the molecule for which you want scores (all available data files will be included in the calculations -- molecule name must match that in the data file names)

  • Use the -inDir option to indicate the directory containing the .tab files with read counts for each V1 and S1 enzyme cuts

  • Use the -f option to indicate the path to the fasta file for the input molecule

  • Use the -s option to produce an output Structure Preference Profile (SPP) file. Values for each position are separated by semi-colons. Here 0 = paired position, 1 = unpaired position, and NA = position with a score too low to determine its configuration.

  • Use the -o option to output the calculated scores, again with values for each position separated by semi-colons.

  • Use the --nP_only option to output the calculated nextPARS scores before incorporating the RNN classifier, again with values for each position separated by semi-colons.

  • Use the option {-V nextPARS} to produce an output with the scores that is compatible with the structure visualization program VARNA1

  • Use the option {-V spp} to produce an output with the SPP values that is compatible with VARNA.

  • Use the -t option to change the threshold value for scores when determining SPP values [default = 0.8, or -0.8 for negative scores]

  • Use the -c option to change the percentile cap for raw values at the beginning of calculations [default = 95]

  • Use the -v option to print some statistics in the case that there is a reference CT file available ( as with the example molecules, found in nextPARS/data/STRUCTURES/REFERENCE_STRUCTURES ). If not, will still print nextPARS scores and info about the enzyme .tab files included in the calculations.

Example usage:

# to produce an SPP file for the molecule TETp4p6
python get_combined_score.py -i TETp4p6 -s
# to produce a Varna-compatible output with the nextPARS scores for one of the 
# randomly generated example molecules
python get_combined_score.py -i test_37 -inDir nextPARS/data/PARSParser_outputs/test1 \
  -f nextPARS/data/PARSParser_outputs/test1/test1.fasta -V nextPARS

RNN classifier (already incorporated into the nextPARS scores above)

To run the RNN classifier separately, using a different experimental score input (in .tab format), it can be run like so with the predict2.py script:

python predict2.py -f molecule.fasta -p scoreFile.tab -o output.tab

Where the command line options are as follows:

  • the -f option [REQUIRED] is the input fasta file
  • the -p option [REQUIRED] is the input Score tab file
  • the -o option [REQUIRED] is the final Score tab output file.
  • the -w1 option is the weight for the RNN score. [default = 0.5]
  • the -w2 option is the weight for the experimental data score. [default = 0.5]

References:

  1. Darty,K., Denise,A. and Ponty,Y. (2009) VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinforma. Oxf. Engl., 25, 1974–197
Owner
Jesse Willis
Jesse Willis
NeuralCompression is a Python repository dedicated to research of neural networks that compress data

NeuralCompression is a Python repository dedicated to research of neural networks that compress data. The repository includes tools such as JAX-based entropy coders, image compression models, video c

Facebook Research 297 Jan 06, 2023
This repo is the official implementation of "L2ight: Enabling On-Chip Learning for Optical Neural Networks via Efficient in-situ Subspace Optimization".

L2ight is a closed-loop ONN on-chip learning framework to enable scalable ONN mapping and efficient in-situ learning. L2ight adopts a three-stage learning flow that first calibrates the complicated p

Jiaqi Gu 9 Jul 14, 2022
Pytorch Implementation of Spiking Neural Networks Calibration, ICML 2021

SNN_Calibration Pytorch Implementation of Spiking Neural Networks Calibration, ICML 2021 Feature Comparison of SNN calibration: Features SNN Direct Tr

Yuhang Li 60 Dec 27, 2022
PyTorch Implementation of Backbone of PicoDet

PicoDet-Backbone PyTorch Implementation of Backbone of PicoDet Original Implementation is implemented on PaddlePaddle. Example picodet_l_backbone = ES

Yonghye Kwon 7 Jul 12, 2022
Robustness between the worst and average case

Robustness between the worst and average case A repository that implements intermediate robustness training and evaluation from the NeurIPS 2021 paper

CMU Locus Lab 16 Dec 02, 2022
Continuum Learning with GEM: Gradient Episodic Memory

Gradient Episodic Memory for Continual Learning Source code for the paper: @inproceedings{GradientEpisodicMemory, title={Gradient Episodic Memory

Facebook Research 360 Dec 27, 2022
Compartmental epidemic model to assess undocumented infections: applications to SARS-CoV-2 epidemics in Brazil - Datasets and Codes

Compartmental epidemic model to assess undocumented infections: applications to SARS-CoV-2 epidemics in Brazil - Datasets and Codes The codes for simu

1 Jan 12, 2022
Labelbox is the fastest way to annotate data to build and ship artificial intelligence applications

Labelbox Labelbox is the fastest way to annotate data to build and ship artificial intelligence applications. Use this github repository to help you s

labelbox 1.7k Dec 29, 2022
Implementation of SegNet: A Deep Convolutional Encoder-Decoder Architecture for Semantic Pixel-Wise Labelling

Caffe SegNet This is a modified version of Caffe which supports the SegNet architecture As described in SegNet: A Deep Convolutional Encoder-Decoder A

Alex Kendall 1.1k Jan 02, 2023
DanceTrack: Multiple Object Tracking in Uniform Appearance and Diverse Motion

DanceTrack DanceTrack is a benchmark for tracking multiple objects in uniform appearance and diverse motion. DanceTrack provides box and identity anno

260 Dec 28, 2022
Llvlir - Low Level Variable Length Intermediate Representation

Low Level Variable Length Intermediate Representation Low Level Variable Length

Michael Clark 2 Jan 24, 2022
Prometheus Exporter for data scraped from datenplattform.darmstadt.de

darmstadt-opendata-exporter Scrapes data from https://datenplattform.darmstadt.de and presents it in the Prometheus Exposition format. Pull requests w

Martin Weinelt 2 Apr 12, 2022
Model search is a framework that implements AutoML algorithms for model architecture search at scale

Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale. It aims to help researchers speed up their exploration process for finding the right model a

Google 3.2k Dec 31, 2022
Official PyTorch implementation of the paper "Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory (SB-FBSDE)"

Official PyTorch implementation of the paper "Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory (SB-FBSDE)" which introduces a new class of deep generative models that gene

Guan-Horng Liu 43 Jan 03, 2023
This code is for eCaReNet: explainable Cancer Relapse Prediction Network.

eCaReNet This code is for eCaReNet: explainable Cancer Relapse Prediction Network. (Towards Explainable End-to-End Prostate Cancer Relapse Prediction

Institute of Medical Systems Biology 2 Jul 28, 2022
Code for Massive-scale Decoding for Text Generation using Lattices

Massive-scale Decoding for Text Generation using Lattices Jiacheng Xu, Greg Durrett TL;DR: a new search algorithm to construct lattices encoding many

Jiacheng Xu 37 Dec 18, 2022
Code and datasets for TPAMI 2021

SkeletonNet This repository constains the codes and ShapeNetV1-Surface-Skeleton,ShapNetV1-SkeletalVolume and 2d image datasets ShapeNetRendering. Plea

34 Aug 15, 2022
A simple, high level, easy-to-use open source Computer Vision library for Python.

ZoomVision : Slicing Aid Detection A simple, high level, easy-to-use open source Computer Vision library for Python. Installation Installing dependenc

Nurettin Sinanoğlu 2 Mar 04, 2022
Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination

Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination Pratul P. Srinivasan, Ben Mildenhall, Matthew Tancik, Jonathan T. Barron,

Pratul Srinivasan 65 Dec 14, 2022
A PyTorch implementation: "LASAFT-Net-v2: Listen, Attend and Separate by Attentively aggregating Frequency Transformation"

LASAFT-Net-v2 Listen, Attend and Separate by Attentively aggregating Frequency Transformation Woosung Choi, Yeong-Seok Jeong, Jinsung Kim, Jaehwa Chun

Woosung Choi 29 Jun 04, 2022