DIncalciLab/SAMURAI_paper_scripts

SAMURAI_paper_scripts

This repository includes all the scripts used for the paper "SAMURAI: Shallow Analysis of copy nuMber Using a Reproducible And Integrated bioinformatics pipeline".

Case Study 1 - Evaluation of SAMURAI on simulated data: Download and dilution of Test data from Smolander et al.

Step 1: Download Simulated Sample

The original simulated sample files (simulated_L001_R1_001.fastq.gz, simulated_L001_R2_001.fastq.gz) can be downloaded from Zenodo.

Step 2: Align FASTQ Files

Align the downloaded FASTQ files to hg38 using BWA-MEM, which can be run either from a local installation or from a Singularity container.
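As a rough sketch of this step, the snippet below prints the alignment commands for the simulated sample so they can be reviewed before running. It assumes bwa and samtools are available (directly or through `singularity exec`) and that the reference has already been indexed with `bwa index`. The reference path `hg38.fa`, the thread count, and the output BAM name are illustrative and not taken from the paper.

```shell
#!/bin/sh
# Sketch only: prints the alignment commands instead of running them.
# REF and THREADS are illustrative defaults, not values from the paper.
REF="${REF:-hg38.fa}"        # hg38 FASTA, assumed already indexed with `bwa index`
THREADS="${THREADS:-8}"

# align_cmds PREFIX
# Emits the commands for PREFIX_R1_001.fastq.gz / PREFIX_R2_001.fastq.gz:
# paired-end alignment piped into a coordinate sort, then BAM indexing.
align_cmds() {
    printf 'bwa mem -t %s %s %s_R1_001.fastq.gz %s_R2_001.fastq.gz | samtools sort -@ %s -o %s.bam -\n' \
        "$THREADS" "$REF" "$1" "$1" "$THREADS" "$1"
    printf 'samtools index %s.bam\n' "$1"
}

align_cmds simulated_L001
```

Once the printed commands look right, the output can be piped to `sh` to execute them. Paths containing spaces would need extra quoting.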

Step 3: Downsample BAM Files

Downsampling is performed using Picard DownsampleSam. You can install Picard locally or use a Singularity container.

Step 4: Produce Diluted Samples

To produce diluted samples, use the following command, changing the parameter P (the probability of keeping each read) to simulate different coverages (e.g., 0.1, 0.3, 0.5, 0.7):

java -jar picard.jar DownsampleSam \
            I=input.bam \
            O=downsampled.bam \
            P=0.5
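To cover all four example values of P in one go, a small loop can generate one DownsampleSam command per value. This is a minimal sketch: the input name and the `downsampled_P<value>.bam` output naming are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: emit one Picard DownsampleSam command per keep-probability.
# INPUT and the output naming scheme are illustrative assumptions.
INPUT="${INPUT:-input.bam}"

# downsample_cmds P1 [P2 ...]
downsample_cmds() {
    for p in "$@"; do
        printf 'java -jar picard.jar DownsampleSam I=%s O=downsampled_P%s.bam P=%s\n' \
            "$INPUT" "$p" "$p"
    done
}

downsample_cmds 0.1 0.3 0.5 0.7
```

As with any command generator, inspect the printed lines first and pipe them to `sh` to run them.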

Case Study 1 - Evaluation of SAMURAI on simulated data: Dilution of normal samples to build the Panel of normals (PoN) for liquid biopsy test

The script download_normal_gatk.sh downloads the GATK data used to build a simulated panel of normals. The data need to be downsampled to different coverages.

The script automatically downloads three Singularity images (sambamba, samtools, and bedtools) that are needed for the in-silico dilution.

The function Subsample takes as input:

  1. input_bam: Original downloaded normal BAM file (SM-74NEG.bam)
  2. desired_read_count: Desired read count for subsampling
  3. output_bam: Final diluted normal BAM file
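The three inputs above suggest a helper along the following lines. This is a hypothetical re-implementation for illustration only: the actual Subsample function lives in download_normal_gatk.sh and may work differently. The sketch counts the reads with `samtools view -c`, converts the desired read count into a fraction, and passes that fraction to `sambamba view -s`. The lowercase names `subsample` and `subsample_fraction` are invented here.

```shell
#!/bin/sh
# Hypothetical sketch of a Subsample-like helper; not the code from download_normal_gatk.sh.

# subsample_fraction TOTAL DESIRED
# Prints the fraction of reads to keep, capped at 1. Fails if TOTAL is not positive.
subsample_fraction() {
    awk -v total="$1" -v desired="$2" 'BEGIN {
        if (total <= 0) exit 1
        f = desired / total
        if (f > 1) f = 1
        printf "%.6f\n", f
    }'
}

# subsample INPUT_BAM DESIRED_READ_COUNT OUTPUT_BAM
# Requires samtools and sambamba; CORES defaults to 1 if unset.
subsample() {
    total=$(samtools view -c "$1") || return 1
    fraction=$(subsample_fraction "$total" "$2") || return 1
    sambamba view -f bam -t "${CORES:-1}" -s "$fraction" -o "$3" "$1"
}

# Example call (file names and read count are illustrative):
# subsample SM-74NEG.bam 5000000 SM-74NEG.diluted.bam
```

Subsampling by fraction is random, so the output contains approximately, not exactly, the desired number of reads.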

Within the script, you can adjust the following parameters:

  • CORES: Number of cores to use
  • READ_COUNT: Number of reads for subsampling
  • NUM_SAMPLES: Number of samples to generate

The script then converts the diluted samples from BAM to FASTQ format.
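For readers who want to run the conversion separately, it can be sketched as below, assuming paired-end reads. `bedtools bamtofastq` expects the two mates of a pair to be adjacent in the BAM, so the sketch name-sorts the file with `samtools sort -n` first. The file naming is illustrative, and the script itself may perform this step differently.

```shell
#!/bin/sh
# Sketch: print the commands that turn one diluted BAM into paired FASTQ files.
# Output names are illustrative assumptions.

# bam_to_fastq_cmds FILE.bam
bam_to_fastq_cmds() {
    base="${1%.bam}"    # strip the .bam extension
    # Name-sort so that mates are adjacent, as bedtools bamtofastq expects.
    printf 'samtools sort -n -o %s.qsort.bam %s\n' "$base" "$1"
    printf 'bedtools bamtofastq -i %s.qsort.bam -fq %s_R1.fastq -fq2 %s_R2.fastq\n' \
        "$base" "$base" "$base"
}

bam_to_fastq_cmds diluted_1.bam
```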

You can run the script with bash download_normal_gatk.sh after adjusting the parameters as needed. Alternatively, you can download the data and Singularity images yourself and use the different parts of the script separately.
