Discovering local read-level DNA methylation patterns and DNA methylation heterogeneity in intermediately methylated regions

Overview

MeConcord

  • MeConcord is a method for investigating local read-level DNA methylation patterns in intermediately methylated regions using bisulfite sequencing data.
  • Intermediately methylated regions occupy a significant fraction of the genome and are closely associated with epigenetic regulation and with cell-type deconvolution of bulk data. However, these regions show distinct methylation patterns that correspond to different biological mechanisms. Several metrics have been developed to investigate these regions, but their poor robustness to noise limits their utility for distinguishing these patterns.
  • We propose a method, MeConcord, with two metrics that measure local methylation concordance across reads and across CpGs, respectively, using Hamming distance. Compared with other metrics, MeConcord showed the most robust performance in distinguishing distinct methylation patterns (identical, uniform, and disordered).
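
To make the idea of Hamming-distance concordance concrete, here is a minimal Python 3 sketch. It assumes the reads of one region are encoded as lists over the region's CpGs (1 methylated, 0 unmethylated, None not covered) and averages the normalized Hamming distance over read pairs that share at least min_shared CpGs, a hypothetical threshold. This is a generic illustration, not MeConcord's exact NRC or NCC formula; the real normalization and P-value computation may differ.

```python
from itertools import combinations
from typing import List, Optional

# One read over the CpGs of a region: 1 = methylated, 0 = unmethylated, None = not covered.
Read = List[Optional[int]]


def mean_pairwise_hamming(reads: List[Read], min_shared: int = 2) -> Optional[float]:
    """Average normalized Hamming distance over read pairs sharing >= min_shared CpGs.

    Illustrative only: this is not the published NRC/NCC definition.
    Returns None when no read pair shares enough CpGs.
    """
    distances = []
    for a, b in combinations(reads, 2):
        # Compare only CpGs covered by both reads.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if len(shared) < min_shared:
            continue
        mismatches = sum(1 for x, y in shared if x != y)
        distances.append(mismatches / len(shared))
    if not distances:
        return None
    return sum(distances) / len(distances)
```

Values near 0 mean the reads agree with each other; values near 1 mean they systematically disagree. Applying the same function to the transposed matrix, list(map(list, zip(*reads))), compares CpGs with each other across reads instead, which gives an analogous CpG-level view.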

Installation

  • MeConcord is implemented in Python and is compatible with both Python 2 and Python 3.
  • Required Python modules: pysam (only if the input is .bam files), pandas, numpy, scipy, and multiprocessing (part of the standard library).
  • The scripts can be downloaded and run directly with the command python *.py -i ...
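
Before running the scripts, it can help to confirm that the listed modules are importable. The following Python 3 sketch uses importlib.util.find_spec from the standard library; the helper name missing_modules is hypothetical and not part of MeConcord.

```python
import importlib.util
from typing import List

# Modules named in the installation notes; pysam is only needed for .bam input.
REQUIRED = ["pandas", "numpy", "scipy", "multiprocessing"]
OPTIONAL_FOR_BAM = ["pysam"]


def missing_modules(names: List[str]) -> List[str]:
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing required modules: " + ", ".join(missing))
    if missing_modules(OPTIONAL_FOR_BAM):
        print("pysam not found: convert the input to .sam instead of using .bam")
```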

Usage

Input

MeConcord currently only accepts the output of Bismark (https://github.com/FelixKrueger/Bismark/blob/master/README.md), either as .bam files or converted to .sam.

Run

1. Obtaining CpG positions across the genome

Usage: python pre_cpg_pos.py -i hg38.fa -o ./cpg_pos/

  • -i, The path to the reference sequences (.fa);
  • -o, The path in which to deposit the positions of CpG sites; each chromosome is written to a separate file;
  • -h, Help information
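
Conceptually, this step scans each chromosome for CG dinucleotides. The Python 3 sketch below shows one way to do that; read_fasta and cpg_positions are hypothetical helpers, and the assumption that positions are reported 1-based (position of the C) is for illustration only, since the output format of pre_cpg_pos.py is not specified here.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            # Keep only the first word of the header, e.g. ">chr1 description" -> "chr1".
            name, chunks = (fields[0] if fields else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def cpg_positions(seq: str) -> List[int]:
    """1-based positions of the C in every CG dinucleotide (case-insensitive)."""
    seq = seq.upper()
    return [i + 1 for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]
```

Joining the sequence lines before scanning catches CpGs that span a line break (a C at the end of one line and a G at the start of the next). Writing each chromosome's list to its own file under the -o folder would mirror the per-chromosome layout described above; the actual file names are not documented here.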

2. Converting mapped BAM, SAM, or SAM.gz files from Bismark into read-by-read methylation records

Usage: python s1_bamToMeRecord.py -i test.bam -o test -c 0

  • -i, The path to the input file (.bam, .sam, or .sam.gz);
  • -o, Output prefix;
  • -c, Number of bases to clip from the read ends (default 0); useful when the sequencing quality of read ends is poor, e.g. -c 5 removes 5 bases from both ends of each read.
  • -h, Help information
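
Bismark stores per-base methylation calls in the XM tag of each alignment, where Z marks a methylated C in CpG context and z an unmethylated one. The Python 3 sketch below extracts CpG calls from such a string and applies end clipping in the spirit of -c. It is a hypothetical helper, not the code of s1_bamToMeRecord.py, and it assumes the call string is in reference orientation and the alignment has no insertions or deletions, so string offset i maps to reference position ref_start + i.

```python
from typing import List, Tuple


def cpg_calls(ref_start: int, xm: str, clip: int = 0) -> List[Tuple[int, int]]:
    """Return (reference position, call) for CpG cytosines in a Bismark XM string.

    call is 1 for 'Z' (methylated CpG) and 0 for 'z' (unmethylated CpG);
    other symbols (non-CpG contexts, '.') are ignored.
    """
    calls = []
    for i, symbol in enumerate(xm):
        # Drop `clip` bases from each end of the read, similar in spirit to -c.
        if i < clip or i >= len(xm) - clip:
            continue
        if symbol == "Z":
            calls.append((ref_start + i, 1))
        elif symbol == "z":
            calls.append((ref_start + i, 0))
    return calls
```

With pysam, the inputs could come from an AlignedSegment as read.reference_start (0-based) and read.get_tag("XM"); reads whose CIGAR contains indels would need an explicit offset-to-position mapping instead of ref_start + i.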

3. Splitting the large MeRecord files into smaller per-chromosome files to reduce memory requirements in the next step

Usage: python s2_RecordSplit.py -i ./test_ReadsMethyAndMuts.txt -o ./test -g chr1,chr2,chr3,chr4,chr5

  • -i, The path to the s1 output (ending with _ReadsMethyAndMuts.txt);
  • -o, Output prefix;
  • -g, Chromosomes to use (default: chromosomes 1-22), separated by commas;
  • -h, Help information
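
The splitting step amounts to grouping record lines by chromosome and keeping only the chromosomes passed with -g. The Python 3 sketch below illustrates this; parse_chroms and split_by_chrom are hypothetical helpers, and the assumption that the chromosome is in the first tab-separated column (the col parameter) is not confirmed by this README.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Default of -g: chromosomes 1-22, assuming hg38-style "chr" names.
DEFAULT_CHROMS = ["chr%d" % i for i in range(1, 23)]


def parse_chroms(arg: Optional[str]) -> List[str]:
    """Turn a -g value such as "chr1,chr2" into a list; empty or None means the default."""
    if not arg:
        return list(DEFAULT_CHROMS)
    return [c.strip() for c in arg.split(",") if c.strip()]


def split_by_chrom(lines: Iterable[str], chroms: List[str], col: int = 0) -> Dict[str, List[str]]:
    """Group tab-separated record lines by the chromosome found in column `col`."""
    wanted = set(chroms)
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > col and fields[col] in wanted:
            groups[fields[col]].append(line)
    return dict(groups)
```

Each group could then be written to a file named like prefix + "_ReadsMethyAndMuts_" + chrom + ".txt", matching the test output test_ReadsMethyAndMuts_chr1.txt; streaming line by line keeps memory low, which is the point of this step.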

4. Calculating concordance metrics (NRC, NCC and P-values)

Usage: python s3_RecordToMeConcord.py -p 4 -i ./test -o ./test -r ./region.bed -c ./cpgpos/ -b 150 -m 600 -z 0 -g chr1,chr2,chr3

  • -i, The s2_RecordSplit.py output, given as its path plus file-name prefix;
  • -p, Number of threads for parallel computation (default 4);
  • -o, Output prefix;
  • -r, File of genomic regions for computation, with chrom, start, and end separated by tabs;
  • -c, CpG position folder (the output of pre_cpg_pos.py);
  • -b, Bin size (default 150bp);
  • -z, Whether the region file uses 0-based coordinates: 0 (default) or 1; the output uses the same bins as the input; if -r is a BED file, -z should be 1;
  • -g, Chromosomes to use (default: chromosomes 1-22), separated by commas;
  • -m, Maximum fragment length in the sequencing library (default 600bp for paired-end reads); for single-end reads, set -m to the read length; if unsure, the default works for most cases;
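
The -z and -m options refer to two coordinate conventions, illustrated by the Python 3 sketch below. to_one_based and read_start_window are hypothetical helpers, not functions from s3_RecordToMeConcord.py, and how the script applies these conventions internally is not documented here. BED intervals are 0-based and half-open, and a fragment no longer than -m bp can only overlap a region if it starts at most -m - 1 bp upstream of it.

```python
from typing import Tuple


def to_one_based(start: int, end: int, zero_based: bool) -> Tuple[int, int]:
    """Convert a region to 1-based, closed coordinates.

    BED intervals are 0-based and half-open, so [start, end) becomes
    [start + 1, end]; regions that are already 1-based are returned unchanged.
    """
    return (start + 1, end) if zero_based else (start, end)


def read_start_window(start: int, end: int, max_fragment: int = 600) -> Tuple[int, int]:
    """1-based range of start positions for fragments that can overlap [start, end].

    A fragment of length L <= max_fragment starting at s covers s..s+L-1, so it
    overlaps the region only if s >= start - L + 1 >= start - max_fragment + 1.
    """
    return (max(1, start - max_fragment + 1), end)
```

This also suggests why -m matters: setting it too small could miss reads whose start lies far upstream of the region, while a generous value only costs some extra scanning.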

5. Converting methylation records to methylation matrices (optional)

Usage: python s4_RecordToMeMatrix.py -i ./test -o ./test -r ./p1.bed -c ./cpgpos/ -m 600 -z 0 -g chr1,chr2

  • -i, The s2_RecordSplit.py output, given as its path plus file-name prefix;
  • -o, Output prefix;
  • -r, File of genomic regions for computation, with chrom, start, and end separated by tabs;
  • -c, CpG position folder (the output of pre_cpg_pos.py);
  • -z, Whether the region file uses 0-based coordinates: 0 (default) or 1; the output uses the same bins as the input; if -r is a BED file, -z should be 1;
  • -g, Chromosomes to use (default: chromosomes 1-22), separated by commas;
  • -m, Maximum read length (default 600bp for paired-end reads); for single-end reads, set -m to the read length; if unsure, the default works for most cases;
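
This step produces a pair of files per region (..._me.txt and ..._unme.txt in the example test). One plausible reading, assumed here rather than confirmed by this README, is a pair of read-by-CpG indicator matrices. The Python 3 sketch below builds such a pair from per-read calls; me_unme_matrices is a hypothetical helper and the real file layout may differ.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def me_unme_matrices(
    reads: List[Dict[int, int]], cpg_positions: List[int]
) -> Tuple[Matrix, Matrix]:
    """Build read-by-CpG indicator matrices for one region.

    reads: one dict per read mapping CpG position -> call (1 methylated, 0 unmethylated).
    me[r][j] is 1 if read r is methylated at CpG j; unme[r][j] is 1 if it is
    unmethylated there; both are 0 where the read has no signal at that CpG.
    """
    me = [[1 if read.get(pos) == 1 else 0 for pos in cpg_positions] for read in reads]
    unme = [[1 if read.get(pos) == 0 else 0 for pos in cpg_positions] for read in reads]
    return me, unme
```

Keeping two indicator matrices, rather than one matrix with a special code for missing values, lets "no signal" be recovered as the cells where both matrices are 0.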

6. Visualization of methylation matrix (optional)

Usage: visualization_Matlab.m

  • Open this script and set

    • path_to_matrix to the path where you deposited the MeMatrix files;
    • path_to_cpgPos to the path where you deposited the CpG positions of the genome (the output of pre_cpg_pos.py);
    • name to the name of the MeMatrix, for example 'test_chr1_1287967_1288117';
  • Output: two lollipop plots, one ignoring the distances between CpGs and one taking them into account.

    • unmethylated CpGs are labeled as light blue
    • CpGs without signal are labeled as grey
    • methylated CpGs are labeled as dark red
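
The two plot layouts differ only in how CpGs are placed on the x axis. The Python 3 sketch below computes lollipop coordinates and colors for both layouts, using color names chosen to match the legend; lollipop_points is a hypothetical helper, not a port of visualization_Matlab.m.

```python
from typing import List, Optional, Tuple

# Colors matching the legend: unmethylated, no signal, methylated.
COLORS = {0: "lightblue", None: "grey", 1: "darkred"}


def lollipop_points(
    calls: List[List[Optional[int]]], cpg_positions: List[int], use_distance: bool
) -> List[Tuple[float, int, str]]:
    """Return (x, y, color) for every cell of a read-by-CpG call matrix.

    Reads are stacked on the y axis (row index). With use_distance=False the
    CpGs are evenly spaced (x = column index); with use_distance=True x is the
    CpG's genomic position, so gaps between CpGs are drawn to scale.
    """
    points = []
    for row, read in enumerate(calls):
        for col, call in enumerate(read):
            x = cpg_positions[col] if use_distance else col
            points.append((float(x), row, COLORS[call]))
    return points
```

The resulting x, y, and color lists can be passed to a scatter plot (for example matplotlib's scatter with a list of colors for c) to draw the lollipop heads.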

Testing on an example

  • STEP 1 python s1_bamToMeRecord.py -i ./test/GM12878_chr1_1286017_1294783.bam -o ./test/test -c 2, or, if the pysam module is unavailable (e.g. on Windows), python s1_bamToMeRecord.py -i ./test/GM12878_chr1_1286017_1294783.sam -o ./test/test -c 2

    • The error message "Could not retrieve index file for './test/GM12878_chr1_1286017_1294783.bam'" does not affect the results.
    • Check whether the output file test_ReadsMethyAndMuts.txt appears in the test folder. If it does, the step worked.
  • STEP 2 python s2_RecordSplit.py -i ./test/test_ReadsMethyAndMuts.txt -o ./test/test -g chr1

    • Check whether the output file test_ReadsMethyAndMuts_chr1.txt appears in the test folder. If it does, the step worked.
  • STEP 3 python s3_RecordToMeConcord.py -p 1 -i ./test/test -o ./test/test -r ./test/tmp1.bed -c ./test/ -b 150 -m 600 -z 1 -g chr1

    • Check whether the output file test_MeConcord.txt appears in the test folder. If it does, the step worked.
  • STEP 4 python s4_RecordToMeMatrix.py -i ./test/test -o ./test/test -r ./test/tmp2.bed -c ./test/ -m 600 -z 1 -g chr1

    • Check whether the two output files test_chr1_1287967_1288117_me.txt and test_chr1_1287967_1288117_unme.txt appear in the test folder. If they do, the step worked.
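
The four test steps can also be chained from one Python 3 script that runs them in order and applies the same output-file checks. This is a sketch, not a script shipped with MeConcord; STEPS and run_example are hypothetical names, and it assumes it is run from the folder containing the MeConcord scripts and the test folder.

```python
import os
import subprocess
import sys
from typing import List, Tuple

# (script arguments, files expected afterwards), copied from the test steps.
STEPS: List[Tuple[List[str], List[str]]] = [
    (["s1_bamToMeRecord.py", "-i", "./test/GM12878_chr1_1286017_1294783.bam",
      "-o", "./test/test", "-c", "2"],
     ["./test/test_ReadsMethyAndMuts.txt"]),
    (["s2_RecordSplit.py", "-i", "./test/test_ReadsMethyAndMuts.txt",
      "-o", "./test/test", "-g", "chr1"],
     ["./test/test_ReadsMethyAndMuts_chr1.txt"]),
    (["s3_RecordToMeConcord.py", "-p", "1", "-i", "./test/test", "-o", "./test/test",
      "-r", "./test/tmp1.bed", "-c", "./test/", "-b", "150", "-m", "600", "-z", "1",
      "-g", "chr1"],
     ["./test/test_MeConcord.txt"]),
    (["s4_RecordToMeMatrix.py", "-i", "./test/test", "-o", "./test/test",
      "-r", "./test/tmp2.bed", "-c", "./test/", "-m", "600", "-z", "1", "-g", "chr1"],
     ["./test/test_chr1_1287967_1288117_me.txt",
      "./test/test_chr1_1287967_1288117_unme.txt"]),
]


def run_example(python: str = sys.executable) -> None:
    """Run the steps in order, stopping when an expected output file is missing.

    Success is judged by the output files, as in the manual checks, rather than
    by exit status alone.
    """
    for argv, outputs in STEPS:
        subprocess.run([python] + argv)
        missing = [path for path in outputs if not os.path.exists(path)]
        if missing:
            raise FileNotFoundError("%s did not produce: %s" % (argv[0], ", ".join(missing)))
```

Calling run_example() executes the whole example; note that output files left over from an earlier run would also pass these checks, so clearing the outputs first gives a cleaner test.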