
New architecture (2019)

ssayols edited this page Oct 14, 2019 · 12 revisions

multiple tooling systems

Aim

Support multiple tooling systems (Lmod, Conda, Singularity containers). The tooling system will take care of adding the tool to the PATH.

Files affected:

  • NGSpipe2go/config/tools.groovy (new) --> contains 1) the definition of all run strings to add the tool to the path, 2) a list with the default running environments and versions for each tool, and 3) a function to retrieve the run string to add the tool to the path.
// locations
def conda_tools       = "/fsimb/common/conda_tools"
def singularity_tools = "/fsimb/common/singularity_tools"

// defaults
tools_defaults = [
    picard: [ runenv: "lmod", version: "2.7" ],
<...>
]

// prepare environment
tools_prepare_env = [
<...>
    picard: [
        "2.7": [
            lmod: "module load picard/2.7.0"
        ],
        "2.18": [
            conda: "source activate ${conda_tools}/picard/2.18.26",
            singularity: "alias picard=\"singularity run --app picard ${singularity_tools}/picard/2.18.17r0/picard.simg\""
        ]
    ]
<...>
]
  • NGSpipe2go/pipelines/<pipeline>/tools.groovy --> contains the custom run environments and versions for tools, which overwrite the defaults defined in the file above.
load PIPELINE_ROOT + "/config/tools.groovy" // load defaults
tools_custom = [ 
    picard: [ runenv: "conda", version: "2.18" ]
]

tools = new LinkedHashMap(tools_defaults)   // create new tools map
tools.putAll(tools_custom)                  // merge user and defaults
  • NGSpipe2go/modules/<group>/<module>.groovy --> calls the function to get the run string to add the tool to the path.
bowtie_se = {
<...>
    def TOOL_ENV = prepare_tool_env("bowtie",
                                    tools["bowtie"]["version"],
                                    tools["bowtie"]["runenv"])

    transform(".fastq.gz") to (".bam") {
        exec """
            ${TOOL_ENV} &&
<...>
        ""","bowtie_se"
    }
}
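The lookup that modules perform through prepare_tool_env can be sketched as follows. This is a minimal sketch, not NGSpipe2go's actual function (its body is not shown on this page). It assumes the function simply indexes tools_prepare_env by tool, version and run environment, and fails loudly when that combination is undefined. The maps are trimmed to the picard example above.

```groovy
// Minimal sketch; the real prepare_tool_env in config/tools.groovy may differ.
def conda_tools = "/fsimb/common/conda_tools"

// Binding variables (no "def"), so the method below can see them, as in a loaded script
tools_defaults = [
    picard: [ runenv: "lmod", version: "2.7" ]
]

tools_prepare_env = [
    picard: [
        "2.7" : [ lmod: "module load picard/2.7.0" ],
        "2.18": [ conda: "source activate ${conda_tools}/picard/2.18.26" ]
    ]
]

tools_custom = [ picard: [ runenv: "conda", version: "2.18" ] ]

// Shallow merge: a custom entry replaces the tool's whole default entry
tools = new LinkedHashMap(tools_defaults)
tools.putAll(tools_custom)

// Return the command that puts <tool> <version> on the PATH under <runenv>
String prepare_tool_env(String tool, String version, String runenv) {
    def cmd = tools_prepare_env[tool]?.get(version)?.get(runenv)
    if (cmd == null) {
        throw new IllegalArgumentException("no ${runenv} environment defined for ${tool} ${version}")
    }
    return cmd.toString()   // GString -> String
}

def TOOL_ENV = prepare_tool_env("picard", tools["picard"]["version"], tools["picard"]["runenv"])
println TOOL_ENV   // prints: source activate /fsimb/common/conda_tools/picard/2.18.26
```

Because putAll is a shallow merge, an entry in tools_custom has to give both runenv and version: an entry with only runenv would leave tools["picard"]["version"] set to null.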

no hardcoded tool locations

Aim

Remove every call to a tool through an absolute path. Tools shipped with NGSpipe2go are now called relative to the pipeline root folder, ${PIPELINE_ROOT} (see example).

Files affected:

  • NGSpipe2go/modules/<group>/<module>.groovy
dupradar = {
<...>
  transform(".bam") to("_dupRadar.png") {
    exec """
<...>
      Rscript ${PIPELINE_ROOT}/tools/dupRadar/dupRadar.R bam=$input $DUPRADAR_FLAGS
<...>
        ""","dupradar"
    }
}

conditional stages

Aim

  • conditionally run stages (with a message)
  • collect different files
  • behavior defined in essential vars

Files affected:

  • NGSpipe2go/pipelines/<pipeline>/essential.vars.groovy --> define some global variables at the end of the file:
<...>

// optional pipeline stages to include
RUN_TRACKHUB=false
RUN_IN_PAIRED_END_MODE=(ESSENTIAL_PAIRED == "yes")
  • NGSpipe2go/pipelines/<pipeline>/<pipeline>.groovy
<...>

// Main pipeline task
dontrun = { println "didn't run $module" }

run {
    "%.fastq.gz" * [ FastQC ] +
    (RUN_IN_PAIRED_END_MODE ? "%.R*.fastq.gz" * [ BWA_pe ] : "%.fastq.gz" * [ BWA_se ] ) +
    "%.bam" * [ RmDups + BAMindexer + IndelRealignment + BaseRecalibration + [ VariantCallHC, VariantCallUG ] ] +
    "%.vcf.gz" * [ VariantEval ] +
    (RUN_TRACKHUB ? trackhub_config + trackhub : dontrun.using(module:"trackhub")) +
    collectBpipeLogs + shinyReports
}
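The stage selection relies on ordinary Groovy conditionals over the essential vars. A bpipe-free sketch of that logic, using a hypothetical build_stage_list helper with stage names as plain strings (the variant-calling stages are left out for brevity):

```groovy
// Hypothetical helper; stage names are plain strings standing in for bpipe stages
List<String> build_stage_list(String essentialPaired, boolean runTrackhub) {
    boolean pairedEnd = (essentialPaired == "yes")      // same test as RUN_IN_PAIRED_END_MODE
    List<String> stages = ["FastQC"]
    stages << (pairedEnd ? "BWA_pe" : "BWA_se")          // exactly one aligner is selected
    stages.addAll(["RmDups", "BAMindexer", "IndelRealignment", "BaseRecalibration"])
    // with RUN_TRACKHUB=false the pipeline runs dontrun, which only prints a message
    stages.addAll(runTrackhub ? ["trackhub_config", "trackhub"] : ["dontrun(trackhub)"])
    stages.addAll(["collectBpipeLogs", "shinyReports"])
    return stages
}

println build_stage_list("no", false)
```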

module preambles

Aim

Gather in a single place the commands that all modules need to run.

  • get_preamble(): defined in config/preambles.groovy
  • returns the module's custom preamble if one exists, the default preamble otherwise

Files affected:

  • NGSpipe2go/config/preambles.groovy --> definition of the default preamble:
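// $TMP is interpolated by Groovy; \$TMP and \$SLURM_JOBID are passed to bash literally
default_preamble="""
    export TMP="$TMP";
    if [ ! -d "\$TMP" ]; then
        mkdir -p "\$TMP";
    fi;

    if [ -n "\$SLURM_JOBID" ]; then
        export TMP="/jobdir/\$SLURM_JOBID";
    fi
""".trim()   // no trailing newline: a line starting with "&&" is a bash syntax error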

Definition of the module-specific preambles, which replace (or extend) the default preamble:
module_preambles=[
    default: default_preamble,
    "bowtie_se": default_preamble + " && echo Running bowtie version: \$(bowtie --version | grep version)"
]

Function that picks the right preamble for a module:

String get_preamble (String module) {
    return (module_preambles.containsKey(module) ? module_preambles[module] : module_preambles.default)
}
  • NGSpipe2go/modules/<group>/<module>.groovy
bowtie_se = {
<...>
  def TOOL_ENV = prepare_tool_env("bowtie",
                                  tools["bowtie"]["version"],
                                  tools["bowtie"]["runenv"])
  def PREAMBLE = get_preamble("bowtie_se")
  transform(".fastq.gz") to (".bam") {
    exec """
      ${TOOL_ENV} &&
      ${PREAMBLE} &&
<...>
        ""","bowtie_se"
    }
}
  • NGSpipe2go/pipelines/<pipeline>/<pipeline>.groovy
PIPELINE_ROOT="/fsimb/groups/imb-bioinfocf/projects/cfb_internal/tmp/ngspipe2go_chipseq_test/NGSpipe2go"

load PIPELINE_ROOT + "/pipelines/ChIPseq/essential.vars.groovy"
load PIPELINE_ROOT + "/pipelines/ChIPseq/tools.groovy"
load PIPELINE_ROOT + "/config/preambles.groovy"

<...>
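How get_preamble and the exec block fit together can be sketched without bpipe. Assumptions: the default preamble is abbreviated and its TMP path is hypothetical; get_preamble is rewritten with Java's Map.getOrDefault (same behavior as the ternary above); build_command is a hypothetical helper that joins the pieces with "&&" the way the exec string does. The trim() keeps the preamble from ending in a newline, so "&&" can follow it.

```groovy
// Abbreviated stand-in for default_preamble (hypothetical TMP path); trimmed so "&&" can follow it
default_preamble = """
    export TMP="/tmp/ngspipe2go";
    mkdir -p "\$TMP"
""".trim()

module_preambles = [
    "default": default_preamble,
    bowtie_se: default_preamble + " && echo Running bowtie version: \$(bowtie --version | grep version)"
]

// Same behavior as get_preamble above, written with Map.getOrDefault
String get_preamble(String module) {
    return module_preambles.getOrDefault(module, module_preambles["default"])
}

// Hypothetical helper: assemble the shell command the way the exec block does
String build_command(String toolEnv, String module, String toolCall) {
    return [toolEnv, get_preamble(module), toolCall].join(" &&\n")
}

println build_command("module load bowtie", "bowtie_se", "bowtie --help")
```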

module parms in a Map

Aim

Reduce the number of global variables and avoid name collisions between module.vars files.

Files affected:

  • NGSpipe2go/modules/<group>/<module>.vars.groovy
def dupradar_flags = [
  stranded:"stranded=" + ESSENTIAL_STRANDED,          // strandedness
  paired  :"paired="   + ESSENTIAL_PAIRED,            // paired-end data?
  outdir  :"outdir="   + QC + "/dupRadar",            // output dir
  threads :"threads="  + Integer.toString(ESSENTIAL_THREADS),
  gtf     :"gtf="      + ESSENTIAL_GENESGTF,          // gene model
  extra   :""                                         // extra parms sent to the tool
]
  • NGSpipe2go/modules/<group>/<module>.groovy
def DUPRADAR_FLAGS = dupradar_flags.gtf      + " " +
                     dupradar_flags.stranded + " " + 
                     dupradar_flags.paired   + " " +
                     dupradar_flags.outdir   + " " +
                     dupradar_flags.threads  + " " +
                     dupradar_flags.extra

<...>

Rscript ${PIPELINE_ROOT}/tools/dupRadar/dupRadar.R bam=$input $DUPRADAR_FLAGS
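Once the map entries are separated by commas, the flag string can also be built from the map values instead of naming each key. A sketch with hypothetical values for the essential variables; findAll { it } drops empty entries such as extra, because Groovy treats an empty string as false. The result follows the map's insertion order rather than the explicit order above, which we assume does not matter for dupRadar.R's key=value arguments.

```groovy
// Hypothetical values standing in for the essential vars
ESSENTIAL_STRANDED = "2"
ESSENTIAL_PAIRED   = "no"
ESSENTIAL_THREADS  = 4
ESSENTIAL_GENESGTF = "genes.gtf"
QC                 = "qc"

def dupradar_flags = [
    stranded: "stranded=" + ESSENTIAL_STRANDED,
    paired  : "paired="   + ESSENTIAL_PAIRED,
    outdir  : "outdir="   + QC + "/dupRadar",
    threads : "threads="  + Integer.toString(ESSENTIAL_THREADS),
    gtf     : "gtf="      + ESSENTIAL_GENESGTF,
    extra   : ""                        // empty: skipped by findAll below
]

// Keep non-empty values and join them with single spaces
def DUPRADAR_FLAGS = dupradar_flags.values().findAll { it }.join(" ")
println DUPRADAR_FLAGS
```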

module parms don’t include program options

Aim

Remove the program options from module.vars. The rationale is that users of the pipeline shouldn't need to know how the parameters are named in the programs called within the module.

Files affected:

  • NGSpipe2go/modules/<group>/<module>.vars.groovy
def dupradar_flags = [
  stranded: ESSENTIAL_STRANDED,                 // strandedness
  paired  : ESSENTIAL_PAIRED,                   // paired-end data?
  outdir  : QC + "/dupRadar",                   // output dir
  threads : Integer.toString(ESSENTIAL_THREADS),
  gtf     : ESSENTIAL_GENESGTF                  // gene model
]
  • NGSpipe2go/modules/<group>/<module>.groovy
def DUPRADAR_FLAGS = 
  (dupradar_flags.gtf      ? " gtf="      + dupradar_flags.gtf      : "") +
  (dupradar_flags.stranded ? " stranded=" + dupradar_flags.stranded : "") +
  (dupradar_flags.paired   ? " paired="   + dupradar_flags.paired   : "") +
  (dupradar_flags.outdir   ? " outdir="   + dupradar_flags.outdir   : "") +
  (dupradar_flags.threads  ? " threads="  + dupradar_flags.threads  : "")

<...>

Rscript ${PIPELINE_ROOT}/tools/dupRadar/dupRadar.R bam=$input $DUPRADAR_FLAGS
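With the option names kept inside the module, the repeated ternaries could be collapsed into a lookup table. A sketch, assuming a hypothetical build_flags helper and dupradar_option_names map (neither is part of NGSpipe2go) and hypothetical parameter values; like the ternaries, it omits parameters whose value is empty.

```groovy
// Module-internal knowledge: dupRadar.R argument name for each parameter (hypothetical table)
def dupradar_option_names = [gtf: "gtf", stranded: "stranded", paired: "paired",
                             outdir: "outdir", threads: "threads"]

// Emit " name=value" for every parameter that has a non-empty value
String build_flags(Map flags, Map optionNames) {
    return optionNames.findAll { key, opt -> flags[key] }
                      .collect { key, opt -> " ${opt}=${flags[key]}" }
                      .join("")
}

// Hypothetical values standing in for the essential vars
def dupradar_flags = [stranded: "2", paired: "no", outdir: "qc/dupRadar",
                      threads: "4", gtf: "genes.gtf"]

def DUPRADAR_FLAGS = build_flags(dupradar_flags, dupradar_option_names)
println DUPRADAR_FLAGS
```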

modules load their own .vars files (and others)

Aim

Files that a module needs (its module.vars or anything else) must be loaded by the module itself, not by the pipeline.

Files affected:

  • NGSpipe2go/modules/<group>/<module>.groovy
load PIPELINE_ROOT + "/modules/RNAseq/dupradar.vars.groovy"

dupRadar = {
    doc title: "dupRadar",
        desc:  "analysis of duplication rate on RNAseq analysis",
        constraints: "",
        bpipe_version: "tested with bpipe 0.9.8.7",
        author: "Sergi Sayols"
<...>
  • NGSpipe2go/pipelines/<pipeline>/<pipeline>.groovy
PIPELINE_ROOT="./NGSpipe2go/"    // may need adjustment for some projects

load PIPELINE_ROOT + "/pipelines/RNAseq/essential.vars.groovy"
load PIPELINE_ROOT + "/pipelines/RNAseq/tools.groovy"

load PIPELINE_ROOT + "/modules/NGS/bam2bw.module.groovy"
load PIPELINE_ROOT + "/modules/NGS/bamcoverage.module.groovy"
<...>

pipeline-specific essential vars

Aim

Essential vars are now pipeline-specific and are stored alongside the pipeline, in NGSpipe2go/pipelines/<pipeline>/essential.vars.groovy.

Pipelines have an identification and a version string

Aim

Modules, especially reports, may collect results from multiple pipelines, which do not necessarily run the same modules. Such modules may therefore need to be customized depending on the pipeline (and its version).

Files affected:

  • NGSpipe2go/pipelines/<pipeline>/<pipeline>.groovy
PIPELINE="scRNAseq_smartseq2"
PIPELINE_VERSION="1.0"
PIPELINE_ROOT="./NGSpipe2go"
<...>
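A shared report module could branch on these two variables. Hypothetical sketch: report_sections and every section name below are illustrative, not NGSpipe2go code; the version check only looks at the major number of PIPELINE_VERSION.

```groovy
PIPELINE         = "scRNAseq_smartseq2"
PIPELINE_VERSION = "1.0"

// Hypothetical: pick the report sections for a given pipeline and version
List<String> report_sections(String pipeline, String version) {
    List<String> sections = ["fastqc", "mapping"]           // hypothetical common sections
    int major = version.tokenize(".")[0].toInteger()        // "1.0" -> 1
    if (pipeline == "scRNAseq_smartseq2") {
        sections << "cell_qc"                               // hypothetical pipeline-specific section
    }
    if (pipeline == "RNAseq" && major >= 2) {
        sections << "dupradar"                              // hypothetical: only from version 2 on
    }
    return sections
}

println report_sections(PIPELINE, PIPELINE_VERSION)
```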