Style_guide

NGSpipe2go Style Rules

General comments

The platform organizes pipelines into separate, well-defined parts, which are meant to be combined as building blocks for new pipelines.

The platform should stick to the UNIX philosophy and the KISS principle. Thus, try to keep modules and pipelines as simple and atomic as possible. Do one thing and do it well.

Modules

Filenames: all lowercase, ending with .groovy
Always 2 files: module logic ending with .module.groovy, module config ending with .vars.groovy
The module logic should be made out of 4 parts:
1. metadata: description, author, constraints
2. groovy code to pick config variables
3. bpipe specific promises and expectations (input-->output)
4. bash script doing the actual tool call. We use && between commands to stop the pipeline as soon as a command fails (scripts are automatically preceded by set -o errexit).
General coding style:
- Line Length: maximum 80 characters
- Indentation: four (4) spaces, no tabs
- Curly Braces: first on same line, last on own line
- else: Surround else with braces
- Semicolons: don't use them
- Stages start with capital letters (e.g. Bowtie_se = { vs. bowtie_se = {).
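
As a minimal sketch of the indentation, brace, else and semicolon rules above, consider the following helper. The function threadFlag is hypothetical and not part of NGSpipe2go; it only illustrates the layout.

```groovy
// Hypothetical helper, used here only to illustrate the coding style
def threadFlag(int threads) {
    // opening brace on the same line, closing brace on its own line
    if (threads > 1) {
        return "-p " + Integer.toString(threads)
    } else {
        // else is surrounded by braces; no semicolons anywhere
        return ""
    }
}

println(threadFlag(4))
```

The flag is returned together with its value, which matches the convention used for module configuration variables.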

Example (bowtie1.module.groovy):

//rule for task bowtie_pe from catalog ChIPseq, version 1
//desc: Align paired end reads
Bowtie_pe = {
    doc title: "Bowtie PE alignment",
        desc:  "Align paired end reads",
        constraints: "Only works with compressed input. Samtools multithreaded version expected (>=0.1.19).",
        bpipe_version: "tested with bpipe 0.9.8.7",
        author: "Sergi Sayols modified for paired end by Nastasja Kreim"

    output.dir = BOWTIE_MAPPED
    def OUTPUTFILE = input1
    int path_index = OUTPUTFILE.lastIndexOf("/")
    OUTPUTFILE = OUTPUTFILE.substring(path_index+1)
    println(OUTPUTFILE)
    OUTPUTFILE = (OUTPUTFILE =~ /_R1\.fastq\.gz/).replaceFirst("")

    def BOWTIE_FLAGS = "-q --sam "  +
                       BOWTIE_QUALS    + " " + 
                       BOWTIE_BEST     + " " + 
                       BOWTIE_MM_SEED  + " " + 
                       BOWTIE_INSERT   + " " + 
                       BOWTIE_MAQERR   + " " + 
                       BOWTIE_MULTIMAP + " " + 
                       BOWTIE_THREADS  + " " + 
                       BOWTIE_EXTRA
    def SAMTOOLS_VIEW_FLAGS = "-bhSu "
    def SAMTOOLS_SORT_FLAGS = "-O bam " + BOWTIE_SAMTOOLS_THREADS

    produce(OUTPUTFILE + ".bam") {
        exec """
            module load bowtie/${BOWTIE_VERSION}     &&
            module load samtools/${SAMTOOLS_VERSION} &&         

            if [ -n "\$LSB_JOBID" ]; then
                export TMPDIR=/jobdir/\${LSB_JOBID};
            fi                                       &&
            
            bowtie $BOWTIE_FLAGS $BOWTIE_REF -1 $input1 -2 $input2 | samtools view $SAMTOOLS_VIEW_FLAGS - | samtools sort $SAMTOOLS_SORT_FLAGS -T $TMPDIR/\$(basename $output.prefix) - > $output
        ""","bowtie_pe"
    }
}

Module configuration

Variables must start with the module prefix (e.g., for MACS2 --> MACS2_TARGETS="targets.txt"). This avoids collisions: no warning or error is issued, and the variable is simply overwritten (only the last inclusion persists).
Do not reuse tool parameters between modules (e.g., if MACS2 calls samtools --> MACS2_SAMTOOLS_THREADS=4)
Always add a description of what the parameter is supposed to do and, if possible, add a default (e.g., MACS2_TARGETS="targets.txt" // targets file describing the samples)
Always include the tool parameter together with the value in the variable (e.g., MACS2_BWIDTH="--bw " + Integer.toString(ESSENTIAL_FRAGLEN))
When possible, include a variable with free arguments to send to the tool (e.g., MACS2_EXTRA="" // other parms sent to macs2)
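
The reason for the prefix rule can be illustrated with a small, self-contained Groovy sketch. It evaluates hypothetical config snippets into one shared Binding, which is only an analogy for loading several vars files into the same pipeline and is not a description of how bpipe implements load. The unprefixed name THREADS and all values are made up for illustration.

```groovy
// Sketch only: a shared Binding stands in for the pipeline's variable scope
def binding = new Binding()
def shell = new GroovyShell(binding)

// Two hypothetical modules define the same unprefixed variable
shell.evaluate('THREADS = "-p 4"')   // meant for the aligner
shell.evaluate('THREADS = "-@ 8"')   // meant for samtools

// No warning or error is raised; only the last definition persists
assert binding.getVariable("THREADS") == "-@ 8"

// With module prefixes, both values survive side by side
shell.evaluate('BOWTIE_THREADS = "-p 4"')
shell.evaluate('MACS2_SAMTOOLS_THREADS = "-@ 8"')
assert binding.getVariable("BOWTIE_THREADS") == "-p 4"
assert binding.getVariable("MACS2_SAMTOOLS_THREADS") == "-@ 8"
```

The same reasoning explains why a tool shared by several modules gets one variable per module instead of a single global one.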

Example of a variable definition file (bowtie1.vars.groovy):

// bowtie parameters with suggested typical defaults
BOWTIE_THREADS=" -p" + Integer.toString(ESSENTIAL_THREADS) // threads to use
BOWTIE_SAMTOOLS_THREADS="-@" + Integer.toString(ESSENTIAL_THREADS) // threads to use by samtools
BOWTIE_REF=ESSENTIAL_BOWTIE_REF // prefix of the bowtie reference genome
BOWTIE_INSERT="-l 40"     // seed length, the optimum depends on the read length and quality
BOWTIE_MM_SEED="-n 2"     // maximum number of mismatches allowed in the seed sequence
BOWTIE_MAQERR="-e 70"     // maximum permitted total of quality values at all mismatched positions throughout the entire alignment
BOWTIE_MULTIMAP="-m 1"      // discard (-m 1) or keep one random alignment (-M 1) of all reads mapping to multiple locations
BOWTIE_BEST="--best --strata --tryhard --chunkmbs 256"  // bowtie best mapping mode
BOWTIE_QUALS="--phred33-quals"  // phred33-quals. Use --phred64-quals for old sequencing runs
BOWTIE_EXTRA=""                 // extra parms to be passed to bowtie, e.g. for trimming barcodes

Pipeline definition files

Name pipeline definition files as some_pipe.pipeline.groovy. This makes it clear that the file is a pipeline, and the groovy extension tells the code editor how to highlight the code.
Include a bpipe.config file, specific for SLURM.
Start with the definition of where the modules are located (e.g. MODULE_FOLDER="/project/NGSpipe2go/modules/")
Load the essential variables for the pipeline, tool versions and tool locations:

  load MODULE_FOLDER + "RNAseq/essential.vars.groovy"
  load MODULE_FOLDER + "RNAseq/tool.locations.groovy"
  load MODULE_FOLDER + "RNAseq/tool.versions.groovy"

Load the modules using the bpipe load command (e.g. load MODULE_FOLDER + "NGS/fastqc.vars.groovy"). Avoid including any other bpipe or groovy code in the pipeline definition.
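
Put together, a pipeline definition following these rules might look like the sketch below. It is meant to be run by bpipe, not as a standalone Groovy script. The module file name fastqc.module.groovy and the stage name FastQC are assumptions derived from the naming rules in this guide, and the input pattern is only an example.

```groovy
// some_pipe.pipeline.groovy (sketch, executed by bpipe)

// 1. where the modules are located
MODULE_FOLDER="/project/NGSpipe2go/modules/"

// 2. essential variables, tool locations and tool versions
load MODULE_FOLDER + "RNAseq/essential.vars.groovy"
load MODULE_FOLDER + "RNAseq/tool.locations.groovy"
load MODULE_FOLDER + "RNAseq/tool.versions.groovy"

// 3. module config and module logic
//    (fastqc.module.groovy is an assumed file name)
load MODULE_FOLDER + "NGS/fastqc.vars.groovy"
load MODULE_FOLDER + "NGS/fastqc.module.groovy"

// 4. the pipeline itself: run the assumed FastQC stage
//    once per matching input file
run { "%.fastq.gz" * [ FastQC ] }
```

Apart from the load statements and the final run block, the file contains no logic, which keeps the pipeline definition readable as a plain list of building blocks.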

Tools

Tools should only be simple wrappers. Complex tools are outside the scope of the pipeline manager and perhaps belong in imbforge.
