This repository contains the code and data used in the accompanying paper. The goal of this project is to analyze Mycobacterium tuberculosis Hi-C data. The code presented here should make it possible to reproduce the graphs and figures from the main text and the supplementary data.
- Dependencies
- Raw data extraction and alignment
- Build the contact map
- Construct the 3D genome model
- Find cooperative operon hubs
- Search for homologous genes
- Measure the chromatin order and the structural plasticity
- Predict KO-latent Hi-C contact matrix
Scripts and code can be run on OS X and other Unix-based systems, and require:
- CoolBox
- Pandas
- Numpy
- Matplotlib
- Scipy
- Seaborn
- scikit-learn (sklearn)
We use the H37Rv reference genome (NCBI:txid83332, total length 4,411,532 bp).
To build the contact map, we use
The scale map tool visualizes the dispersion of contact signals along spatial scales. This function is implemented by the code.
The Directionality Index (DI) quantifies the bias of chromatin interactions toward upstream or downstream regions. It is widely used to identify TAD boundaries in 3D genome organization. We use the code to calculate it.
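As an illustration of the quantity described above, the following is a minimal sketch of a DI calculation, not the repository's own script. It assumes the commonly used formulation DI = sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E), where A and B are the summed contacts of a bin with its upstream and downstream windows and E = (A + B) / 2. The function name `directionality_index`, the `window` parameter (in bins), and the toy matrix are hypothetical, and the chromosome is treated as linear (no wrap-around for the circular genome).

```python
import numpy as np


def directionality_index(contacts, window):
    """Return one DI value per bin of a square contact matrix.

    Assumption: the chromosome is treated as linear, so windows are
    truncated at the matrix edges instead of wrapping around.
    """
    contacts = np.asarray(contacts, dtype=float)
    n = contacts.shape[0]
    di = np.zeros(n)
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        a = contacts[i, lo:i].sum()        # contacts with upstream bins
        b = contacts[i, i + 1:hi].sum()    # contacts with downstream bins
        e = (a + b) / 2.0                  # expected value under no bias
        if e == 0 or a == b:
            continue                       # no signal or no bias: DI stays 0
        di[i] = np.sign(b - a) * ((a - e) ** 2 / e + (b - e) ** 2 / e)
    return di


# Toy example: two 3-bin domains that only contact themselves.
toy = np.zeros((6, 6))
toy[:3, :3] = 1
toy[3:, 3:] = 1
print(directionality_index(toy, window=2))
```

In the toy example the DI switches from negative (upstream bias) at bin 2 to positive (downstream bias) at bin 3, which is the signature usually read as a domain boundary.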
To study the three-dimensional structure of chromosomes, we use [method] to directly simulate the positional relationships between nucleotides.
We obtained all homologous genes of Rv0047c at the bacterial level from OrthoDB, then used NCBI BLAST to compute the homology relationships:
    blastp -query protein.faa -out rv0047c.txt -db fasta.fa -outfmt 6 -evalue 1e-5 -num_threads 2 -max_target_seqs 10000

To measure chromatin order and structural plasticity, we selected Shannon entropy and Moran’s I as the evaluation metrics.
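The two metrics can be sketched as follows. This is a hedged illustration rather than the repository's implementation: it assumes Shannon entropy is taken over the contact matrix normalized to a probability distribution, and that Moran's I uses rook adjacency (each matrix cell is a neighbour of the cells directly above, below, left, and right, with weight 1). The function names `shannon_entropy` and `morans_i` are hypothetical.

```python
import numpy as np


def shannon_entropy(matrix):
    """Shannon entropy (bits) of a matrix normalized to sum to 1."""
    x = np.asarray(matrix, dtype=float)
    total = x.sum()
    if total <= 0:
        raise ValueError("matrix must contain positive signal")
    p = x[x > 0] / total               # zero cells contribute nothing
    return float(-(p * np.log2(p)).sum())


def morans_i(matrix):
    """Moran's I of a 2D grid with rook (4-neighbour) adjacency weights."""
    x = np.asarray(matrix, dtype=float)
    z = x - x.mean()                   # deviations from the global mean
    denom = (z ** 2).sum()
    if denom == 0:
        return float("nan")            # undefined for a constant grid
    # Products over horizontally and vertically adjacent cell pairs.
    horiz = (z[:, :-1] * z[:, 1:]).sum()
    vert = (z[:-1, :] * z[1:, :]).sum()
    rows, cols = x.shape
    n_pairs = rows * (cols - 1) + cols * (rows - 1)
    # Each pair appears twice in the symmetric weight matrix (i->j and j->i).
    num = 2.0 * (horiz + vert)
    w_total = 2.0 * n_pairs
    return (x.size / w_total) * num / denom


# Toy grids: a checkerboard (dispersed) and a half/half split (clustered).
checker = np.indices((4, 4)).sum(axis=0) % 2
split = np.zeros((4, 4))
split[:, 2:] = 1
print(shannon_entropy(np.ones((4, 4))), morans_i(checker), morans_i(split))
```

Under these assumptions, higher entropy indicates contact signal spread more evenly across the matrix, while Moran's I near 1 indicates spatially clustered signal and values near -1 indicate an alternating, dispersed pattern.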