New architecture (2019)
Support multiple tooling systems (Lmod, Conda, Singularity containers). The tooling system will take care of adding the tool to the PATH.
- NGSpipe2go/config/tools.groovy (new) --> contains 1) the run strings that add each tool to the PATH, 2) the default run environment and version for each tool, and 3) a function that returns the run string for a given tool, version and run environment.
```groovy
// locations
def conda_tools = "/fsimb/common/conda_tools"
def singularity_tools = "/fsimb/common/singularity_tools"

// defaults
tools_defaults = [
    picard: [ runenv: "lmod", version: "2.7" ],
    <...>
]

// prepare environment
tools_prepare_env = [
    <...>
    picard: [
        "2.7": [
            lmod: "module load picard/2.7.0"
        ],
        "2.18": [
            conda: "source activate ${conda_tools}/picard/2.18.26",
            singularity: "alias picard=\"singularity run --app picard ${singularity_tools}/picard/2.18.17r0/picard.simg\""
        ]
    ]
    <...>
```
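The page lists the lookup function as the third component of config/tools.groovy but does not show its body. To give a rough idea of what such a lookup involves, here is a plain-Groovy sketch (no bpipe needed) that resolves the run string from a nested map shaped like `tools_prepare_env`. The name `prepare_tool_env_sketch`, passing the map in as a parameter, and throwing on a missing combination are assumptions for illustration, not the project's actual `prepare_tool_env()`.

```groovy
// Hypothetical sketch, not NGSpipe2go's prepare_tool_env():
// resolve the run string for a (tool, version, runenv) triple.
def conda_tools = "/fsimb/common/conda_tools"

def tools_prepare_env = [
    picard: [
        "2.7" : [ lmod:  "module load picard/2.7.0" ],
        "2.18": [ conda: "source activate ${conda_tools}/picard/2.18.26" ]
    ]
]

String prepare_tool_env_sketch(Map envs, String tool, String version, String runenv) {
    // Safe navigation (?.) yields null instead of throwing when a level is missing
    def runString = envs[tool]?.get(version)?.get(runenv)
    if (runString == null) {
        // Assumption: fail early rather than emit an empty run string
        throw new IllegalArgumentException(
            "No '" + runenv + "' environment defined for " + tool + " " + version)
    }
    return runString.toString()   // the conda entry is a GString; return a plain String
}

println prepare_tool_env_sketch(tools_prepare_env, "picard", "2.18", "conda")
```

Failing loudly makes a typo in a pipeline's tools.groovy (for example asking for `conda` where only `lmod` is defined) visible at load time instead of producing a command that silently runs without the tool on the PATH.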
- NGSpipe2go/pipelines/<pipeline>/tools.groovy --> contains the pipeline-specific run environments and versions, which override the defaults defined in the file above.
```groovy
load PIPELINE_ROOT + "/config/tools.groovy" // load defaults

tools_custom = [
    picard: [ runenv: "conda", version: "2.18" ]
]

tools = new LinkedHashMap(tools_defaults) // create new tools map
tools.putAll(tools_custom)                // merge: custom entries override the defaults
```
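Because `putAll` works at the top level only, each custom entry replaces the whole per-tool map rather than individual fields. The plain-Groovy snippet below illustrates this; the `bowtie` entry and its version are placeholders, and the per-key merge at the end is a hypothetical alternative, not what NGSpipe2go does.

```groovy
// Placeholder defaults (the bowtie entry is made up for illustration)
def tools_defaults = [
    picard: [ runenv: "lmod", version: "2.7" ],
    bowtie: [ runenv: "lmod", version: "1.2" ]
]

// Full override, as in the pipeline's tools.groovy
def tools = new LinkedHashMap(tools_defaults)   // copy: tools_defaults itself is not modified
tools.putAll([ picard: [ runenv: "conda", version: "2.18" ] ])

// Partial override: putAll replaces the whole picard entry, so runenv is lost
def partial_custom = [ picard: [ version: "2.18" ] ]
def shallow = new LinkedHashMap(tools_defaults)
shallow.putAll(partial_custom)

// Hypothetical alternative: merge field by field so unspecified fields keep their defaults
def deep = tools_defaults.collectEntries { name, cfg ->
    [ (name): cfg + (partial_custom[name] ?: [:]) ]
}

println tools
```

With the shallow merge, a custom entry must therefore always specify both `runenv` and `version`.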
- NGSpipe2go/modules/<group>/<module>.groovy --> calls the function to get the run string to add the tool to the path.
```groovy
bowtie_se = {
    <...>
    def TOOL_ENV = prepare_tool_env("bowtie",
                                    tools["bowtie"]["version"],
                                    tools["bowtie"]["runenv"])

    transform(".fastq.gz") to (".bam") {
        exec """
            ${TOOL_ENV} &&
            <...>
        ""","bowtie_se"
    }
}
```
Remove any trace of tools being called with absolute paths. Tools shipped with NGSpipe2go are now called relative to the pipeline root folder (see example).
- NGSpipe2go/modules/<group>/<module>.groovy
```groovy
dupradar = {
    <...>
    transform(".bam") to("_dupRadar.png") {
        exec """
            <...>
            Rscript ${PIPELINE_ROOT}/tools/dupRadar/dupRadar.R bam=$input $DUPRADAR_FLAGS
            <...>
        ""","dupradar"
    }
}
```
- conditionally run stages (with a message)
- collect different files
- behavior defined in essential vars
- NGSpipe2go/pipelines/<pipeline>/essential.vars.groovy --> define some global variables at the end of the file:
```groovy
<...>
// optional pipeline stages to include
RUN_TRACKHUB=false
RUN_IN_PAIRED_END_MODE=(ESSENTIAL_PAIRED == "yes")
```
- NGSpipe2go/pipelines/<pipeline>/<pipeline>.groovy
```groovy
<...>
// Main pipeline task
dontrun = { println "didn't run $module" }

run {
    "%.fastq.gz" * [ FastQC ] +
    (RUN_IN_PAIRED_END_MODE ? "%.R*.fastq.gz" * [ BWA_pe ] : "%.fastq.gz" * [ BWA_se ] ) +
    "%.bam" * [ RmDups + BAMindexer + IndelRealignment + BaseRecalibration + [ VariantCallHC, VariantCallUG ] ] +
    "%.vcf.gz" * [ VariantEval ] +
    (RUN_TRACKHUB ? trackhub_config + trackhub : dontrun.using(module:"trackhub")) +
    collectBpipeLogs + shinyReports
}
```
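The run block picks stages with plain ternary expressions, so the selection logic can be tried outside bpipe. In the sketch below, closures stand in for bpipe stages and `curry()` stands in for bpipe's `using(module: ...)`; both substitutions and the variable values are assumptions made only to illustrate the branching.

```groovy
// bpipe-free sketch of the stage selection above (values are examples)
def ESSENTIAL_PAIRED = "no"
def RUN_TRACKHUB = false
def RUN_IN_PAIRED_END_MODE = (ESSENTIAL_PAIRED == "yes")

def executed = []
def BWA_pe   = { executed << "BWA_pe" }
def BWA_se   = { executed << "BWA_se" }
def trackhub = { executed << "trackhub" }
// Stand-in for dontrun: records a message naming the skipped module
def dontrun  = { String module -> executed << ("didn't run " + module) }

def stages = [
    RUN_IN_PAIRED_END_MODE ? BWA_pe : BWA_se,
    RUN_TRACKHUB ? trackhub : dontrun.curry("trackhub")   // curry() plays the role of using(module: ...)
]
stages.each { stage -> stage() }
println executed
```

Replacing a skipped stage with `dontrun` keeps the pipeline expression well-formed and leaves a message in the log instead of silently dropping the stage.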
Keep the commands that all modules need to run in a single place.
- get_preamble(): defined in config/preambles.groovy
- returns custom preamble for module (if exists), default preamble otherwise
- NGSpipe2go/config/preambles.groovy --> definition of the default preamble:
```groovy
default_preamble="""
    export TMP="$TMP";
    if [ ! -d "\$TMP" ]; then
        mkdir -p "\$TMP";
    fi;
    if [ -n "\$SLURM_JOBID" ]; then
        export TMP="/jobdir/\$SLURM_JOBID";
    fi
"""
```
Definition of the module-specific preambles that replace the default preamble:
```groovy
module_preambles=[
    default: default_preamble,
    "bowtie": default_preamble + " && echo Running bowtie version: \$(bowtie --version | grep version)"
]
```
Function that picks the right preamble for a module:
```groovy
String get_preamble (String module) {
    return (module_preambles.containsKey(module) ? module_preambles[module] : module_preambles.default)
}
```
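The fallback in `get_preamble()` can be checked in isolation with the plain-Groovy sketch below. The preamble bodies are shortened placeholders, and `get_preamble_sketch` is a hypothetical name that expresses the same logic with Java's `Map.getOrDefault`. Like the original, the maps are binding variables (no `def`), which is what lets a script method see them. Note that the lookup key must match the string passed in exactly: with the keys as written above (`default` and `"bowtie"`), a call to `get_preamble("bowtie_se")` finds no entry and returns the default preamble.

```groovy
// Placeholder preambles; single quotes avoid Groovy interpolation of $TMP
default_preamble = 'mkdir -p "$TMP"'
module_preambles = [
    "default"  : default_preamble,
    "bowtie_se": default_preamble + ' && echo "custom bowtie_se preamble"'
]

// Same fallback as get_preamble(): custom preamble if the key exists, default otherwise
String get_preamble_sketch(String module) {
    return module_preambles.getOrDefault(module, module_preambles["default"])
}

println get_preamble_sketch("bowtie_se")
println get_preamble_sketch("bowtie")    // no such key here, so the default is returned
```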
- NGSpipe2go/modules/<group>/<module>.groovy
```groovy
bowtie_se = {
    <...>
    def TOOL_ENV = prepare_tool_env("bowtie",
                                    tools["bowtie"]["version"],
                                    tools["bowtie"]["runenv"])
    def PREAMBLE = get_preamble("bowtie_se")

    transform(".fastq.gz") to (".bam") {
        exec """
            ${TOOL_ENV} &&
            ${PREAMBLE} &&
            <...>
        ""","bowtie_se"
    }
}
```
- NGSpipe2go/pipelines/<pipeline>/<pipeline>.groovy
```groovy
PIPELINE_ROOT="/fsimb/groups/imb-bioinfocf/projects/cfb_internal/tmp/ngspipe2go_chipseq_test/NGSpipe2go"
load PIPELINE_ROOT + "/pipelines/ChIPseq/essential.vars.groovy"
load PIPELINE_ROOT + "/pipelines/ChIPseq/tools.groovy"
load PIPELINE_ROOT + "/config/preambles.groovy"
<...>
```
Reduce the number of global variables and avoid name collisions between module.vars files by grouping each module's parameters in a single map.
- NGSpipe2go/modules/<group>/<module>.vars.groovy
```groovy
def dupradar_flags = [
    stranded: "stranded=" + ESSENTIAL_STRANDED,                // strandedness
    paired  : "paired="   + ESSENTIAL_PAIRED,                  // is the data paired-end?
    outdir  : "outdir="   + QC + "/dupRadar",                  // output dir. If
    threads : "threads="  + Integer.toString(ESSENTIAL_THREADS),
    gtf     : "gtf="      + ESSENTIAL_GENESGTF,                // gene model
    extra   : ""                                               // extra params passed to the tool
]
```
- NGSpipe2go/modules/<group>/<module>.groovy
```groovy
def DUPRADAR_FLAGS = dupradar_flags.gtf      + " " +
                     dupradar_flags.stranded + " " +
                     dupradar_flags.paired   + " " +
                     dupradar_flags.outdir   + " " +
                     dupradar_flags.threads  + " " +
                     dupradar_flags.extra
<...>
Rscript ${PIPELINE_ROOT}/tools/dupRadar/dupRadar.R bam=$input $DUPRADAR_FLAGS
```
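The explicit concatenation above leaves a trailing space whenever `extra` is empty (harmless on a shell command line) and has to be edited for every new flag. A slightly more compact alternative, sketched here with placeholder values standing in for the `ESSENTIAL_*` variables and `QC`, keeps the same order and drops empty entries:

```groovy
// Placeholder values instead of ESSENTIAL_STRANDED, ESSENTIAL_PAIRED, QC, ...
def dupradar_flags = [
    stranded: "stranded=" + "no",
    paired  : "paired="   + "yes",
    outdir  : "outdir="   + "qc" + "/dupRadar",
    threads : "threads="  + Integer.toString(4),
    gtf     : "gtf="      + "genes.gtf",
    extra   : ""
]

// Same order as the concatenation above; findAll drops "" (false under Groovy truth)
def order = ["gtf", "stranded", "paired", "outdir", "threads", "extra"]
def DUPRADAR_FLAGS = order.collect { dupradar_flags[it] }.findAll { it }.join(" ")

println DUPRADAR_FLAGS
```

Keeping an explicit `order` list preserves the original argument order; if the R script only parses `key=value` pairs, `dupradar_flags.values()` could be used directly, but that is an assumption about dupRadar.R.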
Remove the program options from module.vars. The rationale is that users of the pipeline don't need to know how the parameters are named in the programs called within the module.
- NGSpipe2go/modules/<group>/<module>.vars.groovy
```groovy
def dupradar_flags = [
    stranded: ESSENTIAL_STRANDED,                   // strandedness
    paired  : ESSENTIAL_PAIRED,                     // is the data paired-end?
    outdir  : QC + "/dupRadar",                     // output dir. If
    threads : Integer.toString(ESSENTIAL_THREADS),
    gtf     : ESSENTIAL_GENESGTF                    // gene model
]
```
- NGSpipe2go/modules/<group>/<module>.groovy
```groovy
def DUPRADAR_FLAGS =
    (dupradar_flags.gtf      ? " gtf="      + dupradar_flags.gtf      : "") +
    (dupradar_flags.stranded ? " stranded=" + dupradar_flags.stranded : "") +
    (dupradar_flags.paired   ? " paired="   + dupradar_flags.paired   : "") +
    (dupradar_flags.outdir   ? " outdir="   + dupradar_flags.outdir   : "") +
    (dupradar_flags.threads  ? " threads="  + dupradar_flags.threads  : "") +
    <...>
Rscript ${PIPELINE_ROOT}/tools/dupRadar/dupRadar.R bam=$input $DUPRADAR_FLAGS
```
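The ternaries above emit an option only when its value is set. The same idea can be written as a loop over the option names owned by the module; the sketch below uses placeholder values, with `gtf` left empty to show that unset options are skipped. One caveat of relying on Groovy truth: a Boolean `false` or an integer `0` would be skipped as well, which matters if a value is ever stored as something other than a string.

```groovy
// Placeholder values; gtf is left empty on purpose
def dupradar_flags = [
    stranded: "no",
    paired  : "yes",
    outdir  : "qc/dupRadar",
    threads : Integer.toString(4),
    gtf     : ""
]

// Option names live in the module, not in module.vars
def names = ["gtf", "stranded", "paired", "outdir", "threads"]
// Keep only options whose value is Groovy-true (non-null, non-empty)
def setNames = names.findAll { dupradar_flags[it] }
def DUPRADAR_FLAGS = setNames.collect { " " + it + "=" + dupradar_flags[it] }.join("")

println DUPRADAR_FLAGS
```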
Files that a module needs (its module.vars or anything else) must be loaded by the module itself, not by the pipeline.
- NGSpipe2go/modules/<group>/<module>.groovy
```groovy
load PIPELINE_ROOT + "/modules/RNAseq/dupradar.vars.groovy"

dupRadar = {
    doc title: "dupRadar",
        desc: "analysis of duplication rate on RNAseq analysis",
        constraints: "",
        bpipe_version: "tested with bpipe 0.9.8.7",
        author: "Sergi Sayols"
    <...>
```
- NGSpipe2go/pipelines/<pipeline>/<pipeline>.groovy
```groovy
PIPELINE_ROOT="./NGSpipe2go/" // may need adjustment for some projects
load PIPELINE_ROOT + "/pipelines/RNAseq/essential.vars.groovy"
load PIPELINE_ROOT + "/pipelines/RNAseq/tools.groovy"
load PIPELINE_ROOT + "/modules/NGS/bam2bw.module.groovy"
load PIPELINE_ROOT + "/modules/NGS/bamcoverage.module.groovy"
<...>
```
Essential vars are now pipeline-specific and are stored alongside the pipeline in NGSpipe2go/pipelines/<pipeline>/.
Modules, especially reports, may aggregate results from multiple pipelines, which do not necessarily run identical modules. Therefore, they may need to be customized based on the pipeline (and its version).
- NGSpipe2go/pipelines/<pipeline>/<pipeline>.groovy
```groovy
PIPELINE="scRNAseq_smartseq2"
PIPELINE_VERSION="1.0"
PIPELINE_ROOT="./NGSpipe2go"
<...>
```
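As an illustration of how a shared module could use these two variables, the hypothetical sketch below has a report module choose its sections from `PIPELINE` and `PIPELINE_VERSION`, falling back to a minimal set for unknown combinations. The section names, the `report_sections` map and the fallback list are all made up; the page does not describe how NGSpipe2go's reports actually do this.

```groovy
// Values as set in the pipeline file
PIPELINE = "scRNAseq_smartseq2"
PIPELINE_VERSION = "1.0"

// Hypothetical: report sections per pipeline and version (names are made up)
def report_sections = [
    "scRNAseq_smartseq2": [ "1.0": ["fastqc", "mapping", "dupradar"] ],
    "RNAseq"            : [ "1.0": ["fastqc", "mapping", "dupradar", "dge"] ]
]
def default_sections = ["fastqc"]

// Unknown pipeline or version: fall back to the minimal set
def sections = report_sections[PIPELINE]?.get(PIPELINE_VERSION) ?: default_sections
println "Report sections for " + PIPELINE + " " + PIPELINE_VERSION + ": " + sections.join(", ")
```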