Sequence Dependent Methylation Analysis

This repository contains tools and scripts for analyzing sequence-dependent DNA methylation patterns using whole-genome bisulfite sequencing (WGBS) data.

Overview

This pipeline processes aligned methylation data (BAM) to extract sequence features and correlate them with observed fragment-level methylation patterns.


WGBSTOOLS Requirement

This pipeline depends on wgbstools (version 0.2.0 or later). Install it by following the instructions in the wgbstools repository. Typical install time is a few minutes, plus a few more minutes to download the reference genome FASTA files.

Other Requirements

    • Command-line tools: moreutils, bedtools (v2.30.0), samtools (1.21), htslib (1.21), tabix (1.13+ds), blat (37x1), zsh 5.8.1 (or convert the scripts to bash)
    • R packages: coloc (6.0.0), TwoSampleMR (0.6.29)
    • Python packages: numpy (1.25.2), pandas (2.3.3), scipy (1.12.0), scikit-learn (1.1.3), pybedtools (0.9.0)
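
Before running anything, it can help to confirm that the requirements are visible from your shell. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the executable names in `tools` are assumptions that may differ on your system (moreutils and htslib, for example, install several differently named binaries and are not checked here).

```python
import shutil
from importlib import metadata


def missing_executables(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def installed_versions(packages):
    """Map each Python distribution name to its installed version, or None if absent."""
    versions = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


if __name__ == "__main__":
    # Executable names are assumptions; adjust them to match your installation.
    tools = ["wgbstools", "bedtools", "samtools", "tabix", "blat", "zsh"]
    packages = ["numpy", "pandas", "scipy", "scikit-learn", "pybedtools"]

    for tool in missing_executables(tools):
        print(f"missing executable: {tool}")
    for package, version in installed_versions(packages).items():
        print(f"{package}: {version or 'not installed'}")
```

The script only reports what it finds; compare the printed versions against the list above by eye, since exact pins may not be required for every step.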

Data Sources

The analysis is based on the human DNA methylation atlas.


DEMO

This repository includes publicly available WGBS data for a demo, located in the "atlas_data" folder. To run the demo, follow the instructions below, in particular the commands in execute_pipeline.txt, on the demo data. The expected outputs are the bimodal regions identified in the BAM files and the WGBS reads split by allele at the one specified SNP (sd_asm_analysis/homog/homog_aligned/all_snps/all_gnom_ad_in_bimodal.snps_file.txt.gz).

Expected run time of the demo is 10 minutes. Expected run time on a large-scale atlas is ~4 hours per WGBS sample.

Running the full pipeline on many SNPs and many WGBS files grouped by tissue identifies tissue-specific sequence-dependent allele-specific methylation (SD-ASM).
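
To make the idea of splitting reads by allele concrete, here is a toy sketch. It is not the pipeline's implementation: the function name, the read representation, and the data are all made up for illustration. Each read is assumed to carry the base observed at the SNP and a CpG pattern in which 'C' marks a methylated CpG and 'T' an unmethylated one.

```python
from collections import defaultdict


def methylation_by_allele(reads):
    """Group reads by SNP allele and return the fraction of methylated CpGs per allele.

    Each read is an (allele, pattern) pair, where pattern uses 'C' for a
    methylated CpG and 'T' for an unmethylated CpG (a toy encoding).
    """
    counts = defaultdict(lambda: [0, 0])  # allele -> [methylated, total]
    for allele, pattern in reads:
        methylated = pattern.count("C")
        counts[allele][0] += methylated
        counts[allele][1] += methylated + pattern.count("T")
    return {
        allele: methylated / total
        for allele, (methylated, total) in counts.items()
        if total > 0
    }


# Made-up reads for illustration only: (base observed at the SNP, CpG pattern).
toy_reads = [
    ("A", "CCCC"),
    ("A", "CCTC"),
    ("G", "TTTT"),
    ("G", "TTCT"),
]

print(methylation_by_allele(toy_reads))  # {'A': 0.875, 'G': 0.125}
```

In this toy data, reads carrying A are mostly methylated and reads carrying G are mostly unmethylated, which is the kind of allele-linked difference the pipeline is looking for within bimodal regions.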

This demo was tested using zsh 5.8.1 (x86_64-ubuntu-linux-gnu) and Python 3.10.12. For the full Python environment it was tested on, see demo/requirements.txt. Cloning the git repository typically takes less than a minute.

Usage Instructions

All processing steps and command-line instructions are documented in the file execute_pipeline.txt. To reproduce the analysis or run it on new data, please follow the sequence of commands provided in that file.

Steps for Execution:

  1. Clone the Repository

    • Use git clone https://github.com/yonniejon/sequence_dependent_methylation_analysis.git to download the project.

  2. Data Preparation

    • Download the required BAM files from the EGA study EGAS00001006791.
    • Ensure they are placed in the expected directory structure as defined in the scripts.
  3. Run the Pipeline

    • Open the file execute_pipeline.txt.
    • Execute the steps sequentially. This includes:
      • Pre-processing and filtering of BAM files.
      • Extracting methylation states at specific CpG sites.
      • Calculating sequence-dependent features.
      • Downstream statistical analysis.
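
As a pre-flight check for the data preparation step, the sketch below lists BAM files that have no index beside them. It is an illustration only: the helper name is hypothetical, and whether the pipeline needs `.bai` indexes at all is an assumption, so consult execute_pipeline.txt for what is actually required.

```python
from pathlib import Path


def bams_missing_index(bam_dir):
    """Return BAM files under `bam_dir` that have no .bai index next to them."""
    missing = []
    for bam in sorted(Path(bam_dir).rglob("*.bam")):
        # samtools index writes "sample.bam.bai"; some tools write "sample.bai".
        candidates = (bam.with_name(bam.name + ".bai"), bam.with_suffix(".bai"))
        if not any(index.exists() for index in candidates):
            missing.append(bam)
    return missing


if __name__ == "__main__":
    # "atlas_data" is the demo folder named in this README; replace as needed.
    for bam in bams_missing_index("atlas_data"):
        print(f"no index found for {bam}")
```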

Citation

If you use this code or the provided data, please cite:

Rosenski, J., Sabag, O., et al. The genetic basis for DNA methylation variation across tissues and development. (2025). https://doi.org/10.1101/2025.09.15.675351
