DeepBSA User Guide

Dependencies

Python libraries

  • pandas
  • matplotlib
  • statsmodels
  • tensorflow
  • pyinstaller
  • tqdm

Installation using conda

Create an environment (optional).

conda create -n deepbsa
conda activate deepbsa
conda install -c conda-forge python=3.11

Then install the packages using conda.

conda install -c conda-forge pandas matplotlib statsmodels tensorflow=2.15.0 pyinstaller tqdm
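
Before running main.py, it can help to confirm that every library listed above is visible in the active environment. The following is a minimal sketch, not part of DeepBSA: the helper name missing_packages is hypothetical, and it assumes each package installs a module of the same name, except pyinstaller, whose module is PyInstaller.

```python
from importlib.util import find_spec

# Import names for the libraries listed under "Python libraries".
# Assumption: module name == package name, except pyinstaller -> "PyInstaller".
REQUIRED_MODULES = [
    "pandas", "matplotlib", "statsmodels", "tensorflow", "PyInstaller", "tqdm",
]


def missing_packages(modules=REQUIRED_MODULES):
    """Return the modules from `modules` that cannot be located (hypothetical helper)."""
    # find_spec only locates a top-level module; it does not import it.
    return [name for name in modules if find_spec(name) is None]


print("Missing:", missing_packages() or "none")
```

An empty result means all six libraries were found; any name it reports can be installed with the conda command above.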

Clone this repository

Change to the directory where you want to place this repo and run

git clone https://github.com/Brycealong/DeepBSA.git

to clone it. The default folder name is DeepBSA.

Models

Please download this directory entirely and place it in the repo directory you just cloned.

Usage

python main.py -h

usage: main.py [-h] --i I [--m M [M ...]] [--p P] [--p1 P1] [--p2 P2] [--p3 P3] [--chromosomes CHROMOSOMES [CHROMOSOMES ...]]
               [--samples SAMPLES [SAMPLES ...]] [--s S] [--w W] [--t T]

options:
  -h, --help            show this help message and exit
  --i I                 The input file path(vcf/csv).
  --m M [M ...]         List of algorithms to use(DL/K/ED4/SNP/SmoothG/SmoothLOD/Ridit) used. Default is DL.
  --p P                 Whether to pretreatment data(1[True] or 0[False]). Default is True.
  --p1 P1               Pretreatment step 1: Number of read thread, the SNP whose number lower than it will be filtered. Default is 0.
  --p2 P2               Pretreatment step 2: Chi-square test(1[True] or 0[False]). Default is 1[True].
  --p3 P3               Pretreatment step 3: Continuity test(1[True] or 0[False]). Default is 1[True].
  --chromosomes CHROMOSOMES [CHROMOSOMES ...]
                        List of chromosomes to select.
  --samples SAMPLES [SAMPLES ...]
                        List of samples to select.
  --s S                 The function to smooth the result(Tri-kernel-smooth/LOWESS/Moving Average), Defalut is LOWESS
  --w W                 Windows size of LOESS. The number is range from 0-1. 0 presents the best size for minimum AICc. Default is
                        0(auto).
  --t T                 The threshold to find peaks(float). Default is 0(auto)

Example:

Move into the repo directory.

cd DeepBSA

An example VCF file is included in the repo at data/hq.vcf.gz.

python main.py --i data/hq.vcf.gz \
               --m DL K ED4 SNP SmoothG SmoothLOD Ridit \
               --p 1 \
               --p1 15 \
               --chromosomes 1A 1B 1D 2A 2B 2D 3A 3B 3D 4A 4B 4D 5A 5B 5D 6A 6B 6D 7A 7B 7D \
               --samples B-WT B-m \
               --s Tri-kernel-smooth \
               --w 0.75 \
               --t 0

This example:

  • runs all the algorithms
  • enables pretreatment
  • filters out SNPs with a read number below 15
  • selects chromosomes 1A to 7D
  • selects bulks B-WT and B-m
  • smooths with Tri-kernel-smooth
  • uses a window size fraction of 0.75
  • calculates the threshold automatically for each algorithm
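
When the same run has to be repeated for several inputs, the long command line can be assembled in Python instead of typed by hand. This is a minimal sketch: build_command is a hypothetical helper, not part of DeepBSA, and it uses only the flags shown in the help text above. It builds the argument list but does not run it.

```python
import sys


def build_command(vcf_path, methods, min_reads, chromosomes, samples,
                  smooth="LOWESS", window=0, threshold=0):
    """Assemble the main.py argument list (hypothetical helper)."""
    return [
        sys.executable, "main.py",
        "--i", vcf_path,
        "--m", *methods,
        "--p", "1",                      # pretreatment on
        "--p1", str(min_reads),          # minimum read number
        "--chromosomes", *chromosomes,
        "--samples", *samples,
        "--s", smooth,
        "--w", str(window),
        "--t", str(threshold),           # 0 = automatic threshold
    ]


# Chromosome names 1A, 1B, 1D, ..., 7D, in the same order as the example.
chromosomes = [f"{number}{genome}" for number in range(1, 8) for genome in "ABD"]

cmd = build_command(
    "data/hq.vcf.gz",
    methods=["DL", "K", "ED4", "SNP", "SmoothG", "SmoothLOD", "Ridit"],
    min_reads=15,
    chromosomes=chromosomes,
    samples=["B-WT", "B-m"],
    smooth="Tri-kernel-smooth",
    window=0.75,
)
print(" ".join(cmd))
```

To execute the command from inside the DeepBSA directory, pass the list to subprocess.run(cmd, check=True).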

Outputs

The program creates several directories: Excel_Files, NoPretreatment, Pretreated_Files, Results, and some __pycache__ folders. Inside Results:

Results
└── hq
    ├── 15-DL-Tri-kernel-smooth-0.75-0.1250.png
    ├── 15-DL-Tri-kernel-smooth-0.75-0.1250.pdf
    ├── 15-DL-Tri-kernel-smooth-0.75-0.1250.csv
    └── ...

The program can be rerun on different datasets. Each run creates a folder inside the Results directory in the format Results/{basename_of_file}/; for example, the input data/hq.vcf.gz produces the folder Results/hq. It is therefore essential that input files have unique basenames to avoid conflicts.
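
The basename rule can be checked before launching a batch of runs. The sketch below is an illustration only: result_dir and conflicting_inputs are hypothetical helpers, and the way the extension is stripped (.vcf.gz, .vcf or .csv) is an assumption inferred from the hq example, not taken from the DeepBSA source.

```python
from collections import Counter
from pathlib import Path


def result_dir(input_path, results_root="Results"):
    """Guess the output folder for an input, e.g. data/hq.vcf.gz -> Results/hq."""
    name = Path(input_path).name
    # Assumption: only these known extensions are stripped from the file name.
    for suffix in (".vcf.gz", ".vcf", ".csv"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return Path(results_root) / name


def conflicting_inputs(input_paths):
    """Return the inputs whose guessed result folder is shared with another input."""
    counts = Counter(result_dir(path) for path in input_paths)
    return [path for path in input_paths if counts[result_dir(path)] > 1]


print(result_dir("data/hq.vcf.gz"))
```

Any path returned by conflicting_inputs should be renamed before running, since two such inputs would write into the same Results folder.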

Naming convention: {read_number}-{func_name}-{smooth_func}-{smooth_window_size}-{threshold}.pdf (the .png and .csv files share the same name, differing only in extension)

  • read_number: --p1
  • func_name: --m
  • smooth_func: --s
  • smooth_window_size: --w (auto if set to 0)
  • threshold: --t (auto calculated if set to 0)
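
Because the smoothing function name can itself contain hyphens (Tri-kernel-smooth), a plain split on "-" does not recover the fields. One way to parse a result file name is to take the first two fields from the left and the last two from the right. This is a sketch with assumptions: parse_output_name and OutputName are hypothetical names, algorithm names are assumed to contain no hyphen (true for the seven listed under --m), and all fields are kept as strings because the exact text written for automatic values is not specified here.

```python
from pathlib import Path
from typing import NamedTuple


class OutputName(NamedTuple):
    read_number: str
    func_name: str
    smooth_func: str
    smooth_window_size: str
    threshold: str


def parse_output_name(filename):
    """Split a result file name into its five fields (hypothetical helper).

    The name must carry its extension (.png, .pdf or .csv); otherwise the
    decimals of the threshold would be mistaken for an extension.
    """
    stem = Path(filename).stem                           # drop the extension
    read_number, func_name, rest = stem.split("-", 2)    # first two fields
    smooth_func, window, threshold = rest.rsplit("-", 2) # last two fields
    return OutputName(read_number, func_name, smooth_func, window, threshold)


print(parse_output_name("15-DL-Tri-kernel-smooth-0.75-0.1250.png"))
```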

Result files:

  • 15-DL-Tri-kernel-smooth-0.75-0.1177.csv : QTL table with the following columns, in this order.

    • QTL: Identifier for the Quantitative Trait Locus.
    • Chr: Chromosome where the QTL is located.
    • Left: Left boundary of the QTL interval.
    • Peak: Peak position of the QTL.
    • Right: Right boundary of the QTL interval.
    • Value: Smoothed value at the peak position.
  • 15-DL-Tri-kernel-smooth-0.75-0.1177.png : plot with the following elements.

    • dots: variants
    • orange line: smoothed data
    • blue dashed line: threshold

  • 15-DL-Tri-kernel-smooth-0.75-0.1177.pdf : same as png.

The other methods produce the same set of files.
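
The QTL table described above can be loaded with the Python standard library for further filtering. This sketch rests on assumptions that should be checked against a real output file: that the CSV starts with a header row using exactly the column names QTL, Chr, Left, Peak, Right and Value, and that positions are numeric. read_qtl_rows is a hypothetical helper, not part of DeepBSA.

```python
import csv


def read_qtl_rows(lines):
    """Parse QTL CSV lines into dicts with numeric positions (hypothetical helper).

    `lines` is any iterable of text lines, such as an open file handle.
    Assumption: a header row names the columns QTL, Chr, Left, Peak, Right, Value.
    """
    rows = []
    for row in csv.DictReader(lines):
        for key in ("Left", "Peak", "Right"):
            # int(float(...)) also accepts positions written like "150.0".
            row[key] = int(float(row[key]))
        row["Value"] = float(row["Value"])
        # Interval width, derived from the Left and Right boundaries.
        row["Width"] = row["Right"] - row["Left"]
        rows.append(row)
    return rows
```

Usage: open the file with open(path, newline="") and pass the handle to read_qtl_rows; the returned list can then be sorted, for example by Value or by Width.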

Citation

  • Li, Zhao, et al. "DeepBSA: A deep-learning algorithm improves bulked segregant analysis for dissecting complex traits." Molecular Plant 15.9 (2022): 1418-1427.
  • Dong, Jianke, et al. "QTL analysis for low temperature tolerance of wild potato species Solanum commersonii in natural field trials." Scientia Horticulturae 310 (2023): 111689.
