This repository includes all the scripts used for the paper "SAMURAI: Shallow Analysis of copy nuMber Using a Reproducible And Integrated bioinformatics pipeline".
Case Study 1 - Evaluation of SAMURAI on simulated data: Download and dilution of Test data from Smolander et al.
The original simulated sample files (simulated_L001_R1_001.fastq.gz, simulated_L001_R2_001.fastq.gz) can be downloaded from Zenodo.
Align the downloaded FASTQ files to hg38 using BWA-MEM. You can use the following Singularity container for BWA-MEM.
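As a purely illustrative sketch of this alignment step (the container image name, reference FASTA path, thread count, and output names below are placeholders, not values from this repository; samtools is assumed to be available for sorting and indexing):
singularity exec bwa-mem.sif bwa mem -t 8 hg38.fa \
    simulated_L001_R1_001.fastq.gz simulated_L001_R2_001.fastq.gz > simulated.sam
samtools sort -@ 8 -o input.bam simulated.sam
samtools index input.bam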
Downsampling is performed using Picard DownsampleSam. You can install Picard locally or use a Singularity container:
To produce diluted samples, use the following command, changing the parameter P to simulate different coverages (e.g., 0.1, 0.3, 0.5, 0.7):
java -jar picard.jar DownsampleSam \
I=input.bam \
O=downsampled.bam \
P=0.5
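If Picard is run from a container rather than a local install, the equivalent call could look like the sketch below; the image name is a placeholder, and it is assumed to provide a picard wrapper on its PATH (otherwise invoke the bundled picard.jar with java -jar):
singularity exec picard.sif picard DownsampleSam \
    I=input.bam \
    O=downsampled.bam \
    P=0.5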
Case Study 1 - Evaluation of SAMURAI on simulated data: Dilution of normal samples to build the Panel of normals (PoN) for liquid biopsy test
The script download_normal_gatk.sh can be used to download GATK data to build a simulated panel of normals. The data need to be downsampled to different coverages.
The script automatically downloads three Singularity images (sambamba, samtools, and bedtools) that are needed for the in-silico dilution.
The function Subsample takes as input:
- input_bam: Original downloaded BAM normal file (SM-74NEG.bam)
- desired_read_count: Desired read count for subsampling
- output_bam: Final diluted BAM normal file
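The script's actual implementation relies on the sambamba/samtools images it downloads; as a rough, hypothetical sketch of what such a function can look like (using plain samtools on the PATH, which is an assumption, not the repository's code):
subsample () {
    local input_bam=$1            # original normal BAM (e.g. SM-74NEG.bam)
    local desired_read_count=$2   # target number of reads after dilution
    local output_bam=$3           # final diluted BAM

    # Count the reads in the original BAM
    local total_reads
    total_reads=$(samtools view -c "$input_bam")

    # Fraction of reads to keep; samtools view -s takes SEED.FRACTION,
    # so a random seed is prepended to make repeated dilutions independent
    local fraction
    fraction=$(awk -v d="$desired_read_count" -v t="$total_reads" 'BEGIN { printf "%.6f", d / t }')

    # Randomly subsample and index the result
    samtools view -b -s "${RANDOM}${fraction#0}" "$input_bam" > "$output_bam"
    samtools index "$output_bam"
}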
Within the script, you can adjust the following parameters:
- CORES: Number of cores to use
- READ_COUNT: Number of reads for subsampling
- NUM_SAMPLES: Number of samples to generate
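For illustration only, these parameters could drive a dilution loop like the one below, calling a subsampling function such as the sketch above; the values and the loop itself are assumptions, not the script's actual defaults:
CORES=4              # number of cores (unused in this minimal sketch; presumably forwarded to the tools in the real script)
READ_COUNT=10000000  # target reads per diluted normal
NUM_SAMPLES=5        # how many diluted normals to generate

for i in $(seq 1 "$NUM_SAMPLES"); do
    subsample SM-74NEG.bam "$READ_COUNT" "normal_${i}.bam"
done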
The script then converts the diluted samples from BAM to FASTQ format.
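The exact conversion command lives in the script; a generic samtools-based sketch of this step, assuming paired-end reads and samtools on the PATH, would be:
# Name-sort the diluted BAM, then split it into paired FASTQ files
samtools sort -n -@ 4 -o normal_1.namesorted.bam normal_1.bam
samtools fastq -@ 4 \
    -1 normal_1_R1.fastq.gz \
    -2 normal_1_R2.fastq.gz \
    -0 /dev/null -s /dev/null -n \
    normal_1.namesorted.bam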
You can use the script by launching bash download_normal_gatk.sh after adjusting the parameters as you like. Alternatively, you can download the data and Singularity images on your own and use the different parts of the script separately.