update tm01
t-whalley committed Mar 6, 2024
1 parent 8a17c66 commit ff84e65
Showing 19 changed files with 1,255 additions and 75 deletions.
19 changes: 13 additions & 6 deletions README.md
@@ -1,7 +1,7 @@
# Lodestone Testing

## Aims
This repositry is designed to perform automated testing of [Lodestone](https://github.com/Pathogen-Genomics-Cymru/tb-pipeline) pipeline. It takes test datasets help in an [S3 bucket](https://s3.console.aws.amazon.com/s3/buckets/sp3testdata), downloads them along with the requisite Kraken2 and Bowtie databases and runs the nextflow pipeline against them based on changes to the module module files.
This repository is designed to perform automated testing of the [Lodestone](https://github.com/Pathogen-Genomics-Cymru/lodestone) pipeline. It takes test datasets held in an [S3 bucket](https://microbial-bioin-sp3.s3.climb.ac.uk/Lodestone_Testing_1.0/), downloads them along with the requisite Kraken2 and Bowtie databases, and runs the Nextflow pipeline against them in response to changes to the module files.

## Usage
This repo is primarily intended to be automated. However, the test scripts can still be run as follows:
@@ -11,11 +11,18 @@ This repo is primarily intended to be automated. However, the test scripts can s
Where ```_test_ids``` refers to the test ID, ranging from TM01 to TM10. Multiple test IDs can be supplied. Additional arguments can be supplied to point to the correct Kraken2 and Bowtie databases, as well as to run the script with or without Singularity. See ```bash run_test.sh -h``` for more information:

```
bash run_test.h -t <test_ids> -k <kraken_db> -b <bowtie_db> -i <bowtie_index>
-t <test_ids>: string of of test IDs. These can be TM[01-10].
-k <kraken_db>: Kraken2 database directory. Default = pluspf_16gb
-b <bowtie_db>: Bowtie database directory. Default = hg19_1kgmaj
-i <index>: Bowtie index prefix. Default = hg19_1kgmaj
bash run_test.sh -t <test_ids> -d <data> -k <kraken_db> -b <bowtie_db> -i <bowtie_index> -a <afanc_db> -r <resource_db> -p <profile>
-t <test_ids>: string of test IDs. These can be TM[01-10].
Multiple test IDs can be supplied, but each must be preceded
by its own -t flag, e.g. -t TM01 -t TM02
-d <data>: Directory containing test data. Default = s3://microbial-bioin-sp3/Lodestone_Testing_1.0/
-k <kraken_db>: Kraken2 database directory. Default = <empty>
-b <bowtie_db>: Bowtie database directory. Default = <empty>
-i <bowtie_index>: Bowtie index prefix. Default = <empty>
-a <afanc_db>: Afanc database directory. Default = <empty>
-r <resource_db>: Resource directory. Default = <empty>
-p <profile>: Nextflow profile. Default = climb
```

## Output
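The README's usage block says each test ID needs its own `-t` flag. As a minimal sketch, assuming the script parses its options with bash's `getopts` (the real parser is outside the hunks shown), repeated `-t` values can be collected into an array like this. `parse_args` is a hypothetical name; `test_args`, `data`, `kraken_db`, `bowtie_db`, `bowtie_index` and `profile` mirror variables visible in `run_test.sh`, while `afanc_db` and `resource_db` are assumed names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the committed parser: gather every -t value into
# the test_args array and store the other options in plain variables.
parse_args() {
    local OPTIND=1 opt
    test_args=()
    while getopts "t:d:k:b:i:a:r:p:" opt; do
        case "$opt" in
            t) test_args+=("$OPTARG") ;;   # each -t appends exactly one test ID
            d) data="$OPTARG" ;;
            k) kraken_db="$OPTARG" ;;
            b) bowtie_db="$OPTARG" ;;
            i) bowtie_index="$OPTARG" ;;
            a) afanc_db="$OPTARG" ;;       # assumed variable name
            r) resource_db="$OPTARG" ;;    # assumed variable name
            p) profile="$OPTARG" ;;
            *) return 1 ;;                 # getopts sets opt to "?" on an unknown flag
        esac
    done
}

parse_args -t TM01 -t TM02 -p climb
printf '%s\n' "${test_args[@]}"   # prints TM01, then TM02
```

Because `t:` takes exactly one argument per occurrence, `-t TM01 TM02` would leave `TM02` unparsed, which is why the flag has to be repeated.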
10 changes: 7 additions & 3 deletions run_test.sh
@@ -44,6 +44,10 @@ done
shift $((OPTIND -1))

#recursively copy and update any scripts into new location so we don't have to override basedir in NF
cp -R -u -p lodestone/nextflow.config test_scripts/mainscripts
cp -R -u -p lodestone/config test_scripts/mainscripts
cp -R -u -p lodestone/singularity test_scripts/mainscripts
cp -R -u -p lodestone/docker test_scripts/mainscripts
cp -R -u -p lodestone/bin test_scripts/mainscripts

######
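The five `cp` calls above can also be written as a loop. This is a sketch rather than the committed code, and `sync_lodestone_files` is a hypothetical name. `-R` copies directories recursively, `-u` copies only when the source is newer than the destination or the destination is missing, and `-p` preserves mode, ownership and timestamps. `-u` is not specified by POSIX `cp`, so this assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical loop form of the cp -R -u -p calls in run_test.sh.
# Unlike the original lines, it creates the destination first if needed.
sync_lodestone_files() {
    local src="$1" dest="$2" item
    mkdir -p "$dest"
    for item in nextflow.config config singularity docker bin; do
        cp -R -u -p "$src/$item" "$dest"
    done
}

# Example call, matching the paths used in run_test.sh:
# sync_lodestone_files lodestone test_scripts/mainscripts
```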
@@ -76,9 +80,9 @@ fi

if [[ $profile == "" ]]; then
#go to our default
profile="--profile climb"
profile="-profile climb"
else
profile="--profile $profile"
profile="-profile $profile"
fi

if [[ $data == "" ]]; then
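The profile hunk switches `--profile` to `-profile`. Nextflow's own run options take a single dash, whereas a double-dash option is handed to the pipeline as a parameter (`params.profile`), so the old form did not select a configuration profile. A sketch of the corrected logic, wrapped in a hypothetical `build_profile_arg` function for illustration:

```shell
#!/usr/bin/env bash
# Sketch of the post-commit profile handling (function name is hypothetical).
# Single dash: Nextflow option. Double dash: pipeline parameter.
build_profile_arg() {
    local profile="$1"
    if [[ -z "$profile" ]]; then
        profile="climb"          # default profile used by run_test.sh
    fi
    printf '%s\n' "-profile $profile"
}

build_profile_arg ""        # prints: -profile climb
build_profile_arg docker    # prints: -profile docker
```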
@@ -97,7 +101,7 @@ $bowtie_db $bowtie_index $kraken_db $profile --pattern '*_{1,2}.fq.gz' -with-rep
for id in "${test_args[@]}"; do
#set input and output and find test script
script="test_scripts/mainscripts/${id}_main.nf"
input_dir="$data/$id"
input_dir="${data}${id}"
output_dir="${id}_out"

#run it
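After this commit the input path is built as `${data}${id}` rather than `$data/$id`, which presumably avoids a doubled slash but means a `-d` value must end with `/` (the default `s3://microbial-bioin-sp3/Lodestone_Testing_1.0/` does). A hedged sketch of a helper that accepts either form by stripping one trailing slash before joining; `join_input_dir` is a hypothetical name, not part of `run_test.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join the data location and a test ID whether or not
# the data location ends in "/". ${data%/} removes at most one trailing slash.
join_input_dir() {
    local data="$1" id="$2"
    printf '%s/%s\n' "${data%/}" "$id"
}

join_input_dir "s3://microbial-bioin-sp3/Lodestone_Testing_1.0/" TM01
join_input_dir "/data/lodestone" TM02
```

Both calls yield a single separator between the directory and the test ID, so a user-supplied `-d` path without a trailing slash would still resolve correctly.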
120 changes: 120 additions & 0 deletions test_scripts/docker/Dockerfile.clockwork-0.9.8
@@ -0,0 +1,120 @@
FROM debian:buster

LABEL maintainer="[email protected]" \
about.summary="container for the clockwork workflow"

ENV samtools_version=1.12 \
htslib_version=1.12 \
bcftools_version=1.12 \
minimap2_version=2.17 \
picard_version=2.18.16 \
gramtools_version=8af53f6c8c0d72ef95223e89ab82119b717044f2 \
vt_version=2187ff6347086e38f71bd9f8ca622cd7dcfbb40c \
minos_version=0.11.0 \
cortex_version=3a235272e4e0121be64527f01e73f9e066d378d3 \
vcftools_version=0.1.15 \
mccortex_version=97aba198d632ee98ac1aa496db33d1a7a8cb7e51 \
stampy_version=1.0.32r3761 \
python_version=3.6.5 \
clockwork_version=2364dec4cbf25c844575e19e8fe0a319d10721b5

ENV PACKAGES="procps curl git build-essential wget zlib1g-dev pkg-config jq r-base-core rsync autoconf libncurses-dev libbz2-dev liblzma-dev libcurl4-openssl-dev cmake tabix libvcflib-tools libssl-dev software-properties-common perl locales locales-all" \
PYTHON="python2.7 python-dev"

COPY bin/ /opt/bin/
ENV PATH=/opt/bin:$PATH


RUN apt-get update \
&& apt-get install -y $PACKAGES $PYTHON \
&& curl -fsSL https://www.python.org/ftp/python/${python_version}/Python-${python_version}.tgz | tar -xz \
&& cd Python-${python_version} \
&& ./configure --enable-optimizations \
&& make altinstall \
&& cd .. \
&& ln -s /usr/local/bin/python3.6 /usr/local/bin/python3 \
&& ln -s /usr/local/bin/pip3.6 /usr/local/bin/pip3 \
&& pip3 install --upgrade pip \
&& pip3 install 'cluster_vcf_records==0.13.1' pysam setuptools awscli \
&& wget -qO - https://adoptopenjdk.jfrog.io/adoptopenjdk/api/gpg/key/public | apt-key add - \
&& add-apt-repository --yes https://adoptopenjdk.jfrog.io/adoptopenjdk/deb/ \
&& apt-get update && apt-get install -y adoptopenjdk-8-hotspot


RUN curl -fsSL https://github.com/samtools/samtools/archive/${samtools_version}.tar.gz | tar -xz \
&& curl -fsSL https://github.com/samtools/htslib/releases/download/${htslib_version}/htslib-${htslib_version}.tar.bz2 | tar -xj \
&& make -C samtools-${samtools_version} -j HTSDIR=../htslib-${htslib_version} \
&& make -C samtools-${samtools_version} -j HTSDIR=../htslib-${htslib_version} prefix=/usr/local install \
&& rm -r samtools-${samtools_version} \
&& curl -fsSL https://github.com/samtools/bcftools/archive/refs/tags/${bcftools_version}.tar.gz | tar -xz \
&& make -C bcftools-${bcftools_version} -j HTSDIR=../htslib-${htslib_version} \
&& make -C bcftools-${bcftools_version} -j HTSDIR=../htslib-${htslib_version} prefix=/usr/local install \
&& rm -r bcftools-${bcftools_version}


RUN curl -fsSL https://github.com/lh3/minimap2/archive/v${minimap2_version}.tar.gz | tar -xz \
&& cd minimap2-${minimap2_version} \
&& make \
&& chmod +x minimap2 \
&& mv minimap2 /usr/local/bin \
&& cd .. \
&& rm -r minimap2-${minimap2_version} \
&& wget https://github.com/broadinstitute/picard/releases/download/${picard_version}/picard.jar -O /usr/local/bin/picard.jar


RUN git clone https://github.com/atks/vt.git vt-git \
&& cd vt-git \
&& git checkout ${vt_version} \
&& make \
&& cd .. \
&& mv vt-git/vt /usr/local/bin \
&& pip3 install tox "six>=1.14.0" \
&& git clone https://github.com/iqbal-lab-org/gramtools \
&& cd gramtools \
&& git checkout ${gramtools_version} \
&& pip3 install . \
&& cd .. \
&& pip3 install cython \
&& pip3 install git+https://github.com/iqbal-lab-org/minos@v${minos_version}


RUN git clone --recursive https://github.com/iqbal-lab/cortex.git \
&& cd cortex \
&& git checkout ${cortex_version} \
&& bash install.sh \
&& make NUM_COLS=1 cortex_var \
&& make NUM_COLS=2 cortex_var \
&& cd .. \
&& mkdir bioinf-tools \
&& cd bioinf-tools \
&& curl -fsSL http://www.well.ox.ac.uk/~gerton/software/Stampy/stampy-${stampy_version}.tgz | tar -xz \
&& make -C stampy-* \
&& cp -s stampy-*/stampy.py . \
&& curl -fsSL https://github.com/vcftools/vcftools/releases/download/v${vcftools_version}/vcftools-${vcftools_version}.tar.gz | tar -xz \
&& cd vcftools-${vcftools_version} \
&& ./configure --prefix $PWD/install \
&& make && make install \
&& ln -s src/perl/ . \
&& cd .. \
&& git clone --recursive https://github.com/mcveanlab/mccortex \
&& cd mccortex \
&& git checkout ${mccortex_version} \
&& make all \
&& cd .. \
&& cp -s mccortex/bin/mccortex31 . \
&& cd .. \
&& git clone https://github.com/iqbal-lab-org/clockwork \
&& cd clockwork \
&& git checkout ${clockwork_version} \
&& cd python \
&& pip3 install . \
&& chmod +x scripts/clockwork

ENV CLOCKWORK_CORTEX_DIR=/cortex \
PATH=${PATH}:/clockwork/python/scripts \
PICARD_JAR=/usr/local/bin/picard.jar

ENV LC_ALL=en_US.UTF-8 \
LANG=en_US.UTF-8 \
LANGUAGE=en_US.UTF-8

131 changes: 131 additions & 0 deletions test_scripts/docker/Dockerfile.preprocessing-0.9.8
@@ -0,0 +1,131 @@
FROM ubuntu:focal

LABEL maintainer="[email protected]" \
about.summary="container for the preprocessing workflow"

ENV samtools_version=1.12 \
bcftools_version=1.12 \
htslib_version=1.12 \
bedtools_version=2.29.2 \
bowtie2_version=2.4.2 \
fastp_version=0.20.1 \
fastqc_version=0.11.9 \
fqtools_version=2.3 \
kraken2_version=2.1.1 \
afanc_version=0.10.2 \
mykrobe_version=0.12.1 \
bwa_version=0.7.17 \
mash_version=2.3 \
fastani_version=1.33

ENV PACKAGES="procps curl git wget build-essential zlib1g-dev libncurses-dev libz-dev libbz2-dev liblzma-dev libcurl4-openssl-dev libgsl-dev rsync unzip ncbi-blast+ pigz jq libtbb-dev openjdk-11-jre-headless autoconf r-base-core locales locales-all" \
PYTHON="python3 python3-pip python3-dev" \
PYTHON_PACKAGES="biopython awscli boto3"

ENV PATH=${PATH}:/usr/local/bin/mccortex/bin:/usr/local/bin/bwa-${bwa_version}:/opt/edirect \
LD_LIBRARY_PATH=/usr/local/lib

RUN export DEBIAN_FRONTEND="noninteractive"

COPY bin/ /opt/bin/
ENV PATH=/opt/bin:$PATH

RUN apt-get update \
&& DEBIAN_FRONTEND="noninteractive" apt-get install -y $PACKAGES $PYTHON \
&& pip3 install --upgrade pip \
&& pip3 install $PYTHON_PACKAGES \
&& ln -s /usr/bin/python3 /usr/bin/python

RUN curl -fsSL https://github.com/samtools/samtools/archive/${samtools_version}.tar.gz | tar -xz \
&& curl -fsSL https://github.com/samtools/htslib/releases/download/${htslib_version}/htslib-${htslib_version}.tar.bz2 | tar -xj \
&& make -C samtools-${samtools_version} -j HTSDIR=../htslib-${htslib_version} \
&& make -C samtools-${samtools_version} -j HTSDIR=../htslib-${htslib_version} prefix=/usr/local install \
&& rm -r samtools-${samtools_version} \
&& curl -fsSL https://github.com/samtools/bcftools/archive/refs/tags/${bcftools_version}.tar.gz | tar -xz \
&& make -C bcftools-${bcftools_version} -j HTSDIR=../htslib-${htslib_version} \
&& make -C bcftools-${bcftools_version} -j HTSDIR=../htslib-${htslib_version} prefix=/usr/local install \
&& rm -r bcftools-${bcftools_version}

RUN curl -fsSL https://github.com/alastair-droop/fqtools/archive/v${fqtools_version}.tar.gz | tar -xz \
&& mv htslib-${htslib_version} fqtools-${fqtools_version} \
&& cd fqtools-${fqtools_version} \
&& mv htslib-${htslib_version} htslib \
&& cd htslib \
&& autoreconf -i \
&& ./configure \
&& make \
&& make install \
&& cd .. \
&& make \
&& mv bin/* /usr/local/bin \
&& chmod +x /usr/local/bin/fqtools \
&& cd .. \
&& rm -r fqtools-${fqtools_version}

RUN curl -fsSL https://github.com/arq5x/bedtools2/releases/download/v${bedtools_version}/bedtools-${bedtools_version}.tar.gz | tar -xz \
&& make -C bedtools2 \
&& mv bedtools2/bin/* /usr/local/bin \
&& rm -r bedtools2

RUN curl -fsSL https://sourceforge.net/projects/bowtie-bio/files/bowtie2/${bowtie2_version}/bowtie2-${bowtie2_version}-source.zip -o bowtie2-${bowtie2_version}-source.zip \
&& unzip bowtie2-${bowtie2_version}-source.zip \
&& make -C bowtie2-${bowtie2_version} prefix=/usr/local install \
&& rm -r bowtie2-${bowtie2_version} \
&& rm bowtie2-${bowtie2_version}-source.zip

RUN curl -fsSL https://github.com/OpenGene/fastp/archive/v${fastp_version}.tar.gz | tar -xz \
&& cd fastp-${fastp_version} \
&& make \
&& make install \
&& cd .. \
&& rm -r fastp-${fastp_version}

RUN wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v${fastqc_version}.zip \
&& unzip fastqc_v${fastqc_version}.zip \
&& chmod +x FastQC/fastqc \
&& mv FastQC/* /usr/local/bin \
&& rm fastqc_v${fastqc_version}.zip \
&& rm -r FastQC

RUN curl -fsSL https://github.com/DerrickWood/kraken2/archive/v${kraken2_version}.tar.gz | tar -xz \
&& cd kraken2-${kraken2_version} \
&& ./install_kraken2.sh /usr/local/bin \
&& cd ..

RUN curl -fsSL https://github.com/ArthurVM/Afanc/archive/refs/tags/v${afanc_version}-alpha.tar.gz | tar -xz \
&& cd Afanc-${afanc_version}-alpha \
&& pip3 install ./ \
&& cd .. \
&& curl -fsSL "https://github.com/marbl/Mash/releases/download/v${mash_version}/mash-Linux64-v${mash_version}.tar" | tar -x \
&& mv mash-Linux64-v${mash_version}/mash /usr/local/bin \
&& rm -r mash-Linux* \
&& wget https://github.com/ParBLiSS/FastANI/releases/download/v${fastani_version}/fastANI-Linux64-v${fastani_version}.zip \
&& unzip fastANI-Linux64-v${fastani_version}.zip \
&& mv fastANI /usr/local/bin

RUN sh -c "$(curl -fsSL ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh)" \
&& mkdir -p /opt/edirect \
&& mv /root/edirect/* /opt/edirect

RUN git clone --recursive -b geno_kmer_count https://github.com/phelimb/mccortex \
&& make -C mccortex \
&& mv mccortex /usr/local/bin \
&& curl -fsSL https://github.com/Mykrobe-tools/mykrobe/archive/v${mykrobe_version}.tar.gz | tar -xz \
&& cd mykrobe-${mykrobe_version} \
&& pip3 install requests \
&& pip3 install . \
&& ln -s /usr/local/bin/mccortex/bin/mccortex31 /usr/local/lib/python3.8/dist-packages/mykrobe/cortex/mccortex31 \
&& mykrobe panels update_metadata \
&& mykrobe panels update_species all \
&& cd ..

RUN curl -fsSL https://github.com/lh3/bwa/archive/v${bwa_version}.tar.gz | tar -C /usr/local/bin -xz \
&& make -C /usr/local/bin/bwa-${bwa_version} \
&& chmod +x /usr/local/bin/bwa-${bwa_version}/bwa

RUN unset DEBIAN_FRONTEND

ENV LC_ALL=en_US.UTF-8 \
LANG=en_US.UTF-8 \
LANGUAGE=en_US.UTF-8

54 changes: 54 additions & 0 deletions test_scripts/docker/Dockerfile.tbprofiler-0.9.8
@@ -0,0 +1,54 @@
FROM mambaorg/micromamba:1.3.0 as app

#copy the reference genome to pre-compute our index
COPY resources/tuberculosis.fasta /data/tuberculosis.fasta

USER root
WORKDIR /

ARG TBPROFILER_VER="5.0.1"

# this version is the shortened commit hash on the `master` branch here https://github.com/jodyphelan/tbdb/
# commits are found on https://github.com/jodyphelan/tbdb/commits/master
# this was the latest commit as of 2023-10-26
ARG TBDB_VER="e25540b"

# LABEL instructions tag the image with metadata that might be important to the user
LABEL base.image="micromamba:1.3.0"
LABEL dockerfile.version="1"
LABEL software="tbprofiler"
LABEL software.version="${TBPROFILER_VER}"
LABEL description="The pipeline aligns reads to the H37Rv reference using bowtie2, BWA or minimap2 and then calls variants using bcftools. These variants are then compared to a drug-resistance database."
LABEL website="https://github.com/jodyphelan/TBProfiler/"
LABEL license="https://github.com/jodyphelan/TBProfiler/blob/master/LICENSE"
LABEL maintainer="John Arnn"
LABEL maintainer.email="[email protected]"
LABEL maintainer2="Curtis Kapsak"
LABEL maintainer2.email="[email protected]"

# Install dependencies via apt-get; cleanup apt garbage
RUN apt-get update && apt-get install -y --no-install-recommends \
wget \
ca-certificates \
procps && \
apt-get autoclean && rm -rf /var/lib/apt/lists/*

# install tb-profiler via bioconda; install into 'base' conda env
RUN micromamba install --yes --name base --channel conda-forge --channel bioconda \
tb-profiler=${TBPROFILER_VER}

RUN micromamba install --yes --name base --channel conda-forge --channel bioconda gatk4
RUN micromamba install --yes --name base --channel conda-forge --channel bioconda samtools
RUN micromamba install --yes --name base --channel conda-forge jq
RUN micromamba clean --all --yes

# hardcode 'base' env bin into PATH, so conda env does not have to be "activated" at run time
ENV PATH="/opt/conda/bin:${PATH}"

# Version of database can be confirmed at /opt/conda/share/tbprofiler/tbdb.version.json
# can also run 'tb-profiler list_db' to find the same version info
# In 5.0.1 updating_tbdb does not work with tb-profiler update_tbdb --commit ${TBDB_VER}
RUN tb-profiler update_tbdb --commit ${TBDB_VER}

WORKDIR /data
RUN tb-profiler update_tbdb --match_ref tuberculosis.fasta
