Discovering local read-level DNA methylation patterns and DNA methylation heterogeneity in intermediately methylated regions

Overview

MeConcord

  • MeConcord is a method used to investigate local read-level DNA methylation patterns for intermediately methylated regions with bisulfite sequencing data.
  • Intermediately methylated regions occupy a significant fraction of the genome and are closely associated with epigenetic regulation or cell-type deconvolution of bulk data. However, these regions show distinct methylation patterns that correspond to different biological mechanisms. Although several metrics have been developed for investigating these regions, their poor robustness to noise limits their utility for distinguishing distinct methylation patterns.
  • We propose a method, MeConcord, with two metrics that measure local methylation concordance across reads and across CpGs, respectively, using Hamming distance. MeConcord showed the most robust performance in distinguishing distinct methylation patterns (identical, uniform, and disordered) compared with other metrics.
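
To make the role of Hamming distance concrete, here is a minimal sketch that compares reads pairwise over the CpGs they share. It is not the NRC or NCC formula, which this README does not spell out; the function names and the encoding (1 = methylated, 0 = unmethylated, None = not covered) are assumptions made for illustration only.

```python
from itertools import combinations


def hamming_fraction(read_a, read_b):
    """Fraction of differing CpG states among CpGs covered by both reads.

    Each read is a list with 1 (methylated), 0 (unmethylated) or None
    (CpG not covered). Returns None when the reads share no CpG.
    This encoding is an illustrative assumption, not MeConcord's format.
    """
    shared = [(a, b) for a, b in zip(read_a, read_b)
              if a is not None and b is not None]
    if not shared:
        return None
    return sum(a != b for a, b in shared) / len(shared)


def mean_pairwise_distance(reads):
    """Average Hamming fraction over all read pairs that share CpGs."""
    distances = [hamming_fraction(a, b) for a, b in combinations(reads, 2)]
    distances = [d for d in distances if d is not None]
    return sum(distances) / len(distances) if distances else None


# Reads that all carry the same pattern versus reads that disagree.
identical = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
disordered = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]
```

In this toy setting, reads with the same pattern give a mean distance of 0, while disagreeing reads give a larger value; MeConcord's actual metrics additionally report P-values (see step 4).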

Installation

  • MeConcord is implemented in Python and is compatible with both Python 2 and Python 3.
  • Required Python modules: pysam (only if the input is .bam files), pandas, numpy, scipy, multiprocessing.
  • The scripts can be downloaded and run directly with the command python *.py -i ....
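
Before running the scripts, it can help to confirm that the modules listed above are importable. The following is a small sketch for Python 3 only (it relies on importlib.util.find_spec); the helper name missing_modules is hypothetical and not part of MeConcord.

```python
import importlib.util

# Module names taken from the list above; pysam is only needed for .bam input.
REQUIRED = ["pandas", "numpy", "scipy", "multiprocessing"]
OPTIONAL = ["pysam"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required modules:", missing_modules(REQUIRED))
    print("Missing optional modules:", missing_modules(OPTIONAL))
```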

Usage

Input

MeConcord currently accepts only the output of Bismark (.bam, or converted to .sam): https://github.com/FelixKrueger/Bismark/blob/master/README.md

Run

1. Obtaining CpG positions across the genome

Usage: python pre_cpg_pos.py -i hg38.fa -o ./cpg_pos/

  • i, The path to the reference sequences (.fa);
  • o, The path where the CpG positions will be written; each chromosome has a separate file;
  • h, Help information
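
As an illustration of what this step computes, the sketch below scans FASTA records for CG dinucleotides. It is not the code of pre_cpg_pos.py: the function names are hypothetical, positions are reported 0-based at the C, and the on-disk output format of the real script is not described here.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def cpg_positions(seq):
    """Return 0-based positions of the C in every CG dinucleotide."""
    seq = seq.upper()  # treat soft-masked (lowercase) bases like any other
    positions = []
    start = seq.find("CG")
    while start != -1:
        positions.append(start)
        start = seq.find("CG", start + 1)
    return positions


# Sequence lines are joined first, so a CG split across two lines is found.
example = [">chr1 toy record", "ACGTC", "GTT"]
per_chrom = {name: cpg_positions(seq) for name, seq in read_fasta(example)}
```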

2. Converting mapped .bam, .sam, or .sam.gz files from Bismark to read-by-read methylation records

Usage: python s1_bamToMeRecord.py -i test.bam -o test -c 0

  • i, The path to input files (.bam or .sam or .sam.gz);
  • o, Output prefix;
  • c, Number of bases to clip from each read end (default 0); useful when the sequencing quality at read ends is poor. For example, -c 5 removes 5 bases from both ends of each read;
  • h, Help information
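
Bismark stores per-base methylation calls in the XM tag of each alignment, where Z marks a methylated and z an unmethylated cytosine in CpG context. The sketch below shows how CpG calls could be read from such a string with end clipping applied. It is an assumption-laden illustration, not the logic of s1_bamToMeRecord.py: the function name is hypothetical, and offsets are relative to the read rather than the genome (mapping them to genomic coordinates would also need the alignment position and CIGAR).

```python
def cpg_calls(xm, clip=0):
    """Extract CpG calls from a Bismark XM methylation string.

    Returns a list of (offset_in_read, state) with state 1 for 'Z'
    (methylated CpG) and 0 for 'z' (unmethylated CpG). `clip` bases
    are ignored at both ends of the read, mirroring the idea of -c.
    """
    calls = []
    for i in range(clip, len(xm) - clip):
        if xm[i] == "Z":
            calls.append((i, 1))
        elif xm[i] == "z":
            calls.append((i, 0))
    return calls


xm_example = "Z..z.h.Z..x.z"
unclipped = cpg_calls(xm_example)        # keeps the calls at both read ends
clipped = cpg_calls(xm_example, clip=2)  # drops calls in the first/last 2 bases
```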

3. Splitting the large MeRecord file into smaller per-chromosome files to reduce memory requirements in the next step

Usage: python s2_RecordSplit.py -i ./test_ReadsMethyAndMuts.txt -o ./test -g chr1,chr2,chr3,chr4,chr5

  • i, The path to the s1 output (ends with _ReadsMethyAndMuts.txt);
  • o, Output prefix;
  • g, Chromosomes used (default: chromosomes 1-22); chromosomes should be separated by commas;
  • h, Help information
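
The idea of splitting by chromosome can be sketched as follows. The layout of the _ReadsMethyAndMuts.txt file is not documented in this README, so the sketch assumes a tab-delimited file with the chromosome name in the first column; that column layout, the "chr" prefix in the default list, and both function names are assumptions for illustration.

```python
from collections import defaultdict


def default_chroms():
    """Chromosomes 1-22, assuming UCSC-style 'chr' names."""
    return ["chr%d" % i for i in range(1, 23)]


def split_by_chrom(lines, chroms, chrom_col=0):
    """Group tab-delimited records by the chromosome in column `chrom_col`.

    Records on chromosomes not listed in `chroms` are skipped.
    """
    wanted = set(chroms)
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > chrom_col and fields[chrom_col] in wanted:
            groups[fields[chrom_col]].append(line)
    return dict(groups)


records = ["chr1\t10\tZz\n", "chr2\t5\tz\n", "chrX\t1\tZ\n", "chr1\t20\tZ\n"]
groups = split_by_chrom(records, ["chr1", "chr2"])
```

For a genuinely large input, writing each record straight to its per-chromosome file while streaming would avoid holding everything in memory; the in-memory grouping here only keeps the example short.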

4. Calculating concordance metrics (NRC, NCC and P-values)

Usage: python s3_RecordToMeConcord.py -p 4 -i ./test -o ./test -r ./region.bed -c ./cpgpos/ -b 150 -m 600 -z 0 -g chr1,chr2,chr3

  • i, The path to the s2_RecordSplit.py output, given as the file name prefix;
  • p, Threads used for parallel computation; default is 4;
  • o, Output prefix;
  • r, The file with genomic regions for computation: chrom, start, end, separated by tabs;
  • c, CpG position folder, the output of pre_cpg_pos.py;
  • b, Bin size (default 150bp);
  • z, Whether the region file is 0-based: 0 (default) or 1; output coordinates are the same as the input bins; if -r is a BED file, -z should be 1;
  • g, Chromosomes used (default: chromosomes 1-22); chromosomes should be separated by commas;
  • m, Maximum fragment length in the sequencing library (default 600bp for paired-end reads). For single-end reads, m should be set to the read length; if unsure, the default works for most cases;
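
To illustrate how a bin size relates to a region, the sketch below tiles a half-open interval into consecutive fixed-size bins. Whether s3_RecordToMeConcord.py tiles regions in exactly this way, and how it treats a shorter final bin, is not stated here, so this is an assumption; the function name is hypothetical.

```python
def tile_region(chrom, start, end, bin_size=150):
    """Split the half-open interval [start, end) into consecutive bins.

    The last bin may be shorter than `bin_size` if the region length
    is not a multiple of it.
    """
    bins = []
    pos = start
    while pos < end:
        bins.append((chrom, pos, min(pos + bin_size, end)))
        pos += bin_size
    return bins


# A 150 bp region yields a single bin; a 400 bp region yields three.
single = tile_region("chr1", 1287967, 1288117)
several = tile_region("chr1", 0, 400, bin_size=150)
```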

5. Methylation recordings to methylation matrix (optional)

Usage: python s4_RecordToMeMatrix.py -i ./test -o ./test -r ./p1.bed -c ./cpgpos/ -m 600 -z 0 -g chr1,chr2

  • i, The path to the s2_RecordSplit.py output, given as the file name prefix;
  • o, Output prefix;
  • r, The file with genomic regions for computation: chrom, start, end, separated by tabs;
  • c, CpG position folder, the output of pre_cpg_pos.py;
  • z, Whether the region file is 0-based: 0 (default) or 1; output coordinates are the same as the input bins; if -r is a BED file, -z should be 1;
  • g, Chromosomes used (default: chromosomes 1-22); chromosomes should be separated by commas;
  • m, Maximum read length (default 600bp for paired-end reads). For single-end reads, m should be set to the read length; if unsure, the default works for most cases;
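
A methylation matrix has one row per read and one column per CpG in the region. The sketch below builds two binary matrices, one flagging methylated calls and one flagging unmethylated calls, so that a CpG a read does not cover is 0 in both. This layout is only a guess suggested by the _me.txt and _unme.txt file names in the test example; the input encoding and the function name are hypothetical.

```python
def build_matrices(reads, cpg_positions):
    """Build methylated and unmethylated indicator matrices.

    `reads` is a list of dicts mapping CpG position -> 1 (methylated)
    or 0 (unmethylated); positions a read does not cover are absent.
    Returns (me, unme), each a list of rows aligned to `cpg_positions`.
    """
    me, unme = [], []
    for calls in reads:
        me.append([1 if calls.get(p) == 1 else 0 for p in cpg_positions])
        unme.append([1 if calls.get(p) == 0 else 0 for p in cpg_positions])
    return me, unme


positions = [100, 120, 145]
reads = [{100: 1, 120: 0}, {120: 1, 145: 1}]
me, unme = build_matrices(reads, positions)
```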

6. Visualization of methylation matrix (optional)

Usage: visualization_Matlab.m

  • Open this script and set:

    • path_to_matrix to the path where the MeMatrix is stored;
    • path_to_cpgPos to the path where the CpG positions of the genome are stored, which is the output of pre_cpg_pos.py;
    • name to the name of the MeMatrix, for example 'test_chr1_1287967_1288117';
  • Output: two lollipop plots, one that ignores the distance between CpGs and one that takes it into account.

    • unmethylated CpGs are labeled as light blue
    • CpGs without signal are labeled as grey
    • methylated CpGs are labeled as dark red

Test for an example

  • STEP 1 python s1_bamToMeRecord.py -i ./test/GM12878_chr1_1286017_1294783.bam -o ./test/test -c 2 or, if the pysam module is not available (for example on Windows), python s1_bamToMeRecord.py -i ./test/GM12878_chr1_1286017_1294783.sam -o ./test/test -c 2

    • The error Could not retrieve index file for './test/GM12878_chr1_1286017_1294783.bam' does not affect the results.
    • Check that the output file test_ReadsMethyAndMuts.txt exists in the test folder. If it does, the step worked.
  • STEP 2 python s2_RecordSplit.py -i ./test/test_ReadsMethyAndMuts.txt -o ./test/test -g chr1

    • Check that the output file test_ReadsMethyAndMuts_chr1.txt exists in the test folder. If it does, the step worked.
  • STEP 3 python s3_RecordToMeConcord.py -p 1 -i ./test/test -o ./test/test -r ./test/tmp1.bed -c ./test/ -b 150 -m 600 -z 1 -g chr1

    • Check that the output file test_MeConcord.txt exists in the test folder. If it does, the step worked.
  • STEP 4 python s4_RecordToMeMatrix.py -i ./test/test -o ./test/test -r ./test/tmp2.bed -c ./test/ -m 600 -z 1 -g chr1

    • Check that the two output files test_chr1_1287967_1288117_me.txt and test_chr1_1287967_1288117_unme.txt exist in the test folder. If they do, the step worked.