DSL2 - emit tuples with optional values #2678

@rcannood

Usage scenario

I'd like to be able to return a tuple with optional elements. For example, by defining the output as tuple val(id), path("output.txt"), path("output2.txt", optional: true), I'd like a process to be able to emit an event ["foo", path("output.txt"), null] when output2.txt is not created.

The process and its downstream processes can take a long time to run, so using a multi-channel output in combination with groupTuple() (see Attempt 3) is very undesirable.

Suggested implementation

Probably this would require:

Reproducible examples

I made several attempts at getting this to run with the current implementation of Nextflow. To summarise:

  • Attempt 1: optional path in non-optional tuple → errors
  • Attempt 2: optional tuple → tuple with missing file is not emitted
  • Attempt 3: multi-channel output followed by a groupTuple → introduces a bottleneck in workflows with long execution times
  • Attempt 4: a messy workaround solution to this problem

Attempt 1: optional path in tuple

Because of TupleOutParam.groovy#L103-L105, the optional value set on the path is overridden by the tuple's own value for 'optional', namely false.

If I run the following code, Nextflow produces an error when output2.txt is missing.

Attempt 1 reprex
nextflow.enable.dsl=2

process test_process1 {
  input:
    tuple val(id)
  output:
    tuple val(id), path("output.txt"), path("output2.txt", optional: true)
  script:
    """
    echo $id > output.txt
    if [[ "$id" == "foo" ]]; then
      echo $id > output2.txt
    fi
    """
}

workflow {
  Channel.fromList( ["foo", "bar"] )
    | view { "input: ${it}" }
    | test_process1
    | view { "output: ${it}" }
}

$ NXF_VER=21.10.6 nextflow run test_outputs_opt1.nf
input: foo
input: bar
output: [foo, work/81/e866d5e329c9ac9980a0c9313d347b/output.txt, work/81/e866d5e329c9ac9980a0c9313d347b/output2.txt]
[8c/e39e04] NOTE: Missing output file(s) `output2.txt` expected by process `test_process1 (2)` -- Error is ignored

Attempt 2: make the whole tuple optional

By making the whole tuple optional, Nextflow doesn't produce an error anymore, but my whole tuple is removed, which is undesirable.

Attempt 2 reprex
nextflow.enable.dsl=2

process test_process1 {
  input:
    tuple val(id)
  output:
    tuple val(id), path("output.txt"), path("output2.txt") optional true
  script:
    """
    echo $id > output.txt
    if [[ "$id" == "foo" ]]; then
      echo $id > output2.txt
    fi
    """
}

workflow {
  Channel.fromList( ["foo", "bar"] )
    | view { "input: ${it}" }
    | test_process1
    | view { "output: ${it}" }
}

$ NXF_VER=21.10.6 nextflow run test_outputs_opt2.nf
input: foo
input: bar
output: [foo, work/95/0e07ee0b94834d4587509b152aa354/output.txt, /home/rcannoodwork/95/0e07ee0b94834d4587509b152aa354/output2.txt]

Attempt 3: multichannel output

This approach is what is proposed in #1980. However, having to use groupTuple() to merge the multichannel output back into a single event is also undesirable, because all tasks feeding the Channel must finish before any event can be emitted downstream. Note that setting size: 2 doesn't work in this case, since some groups should have one element and others two.

Attempt 3 reprex
nextflow.enable.dsl=2

process test_process2 {
  input:
    tuple val(id)
  output:
    tuple val(id), val("output1"), path("output.txt")
    tuple val(id), val("output2"), path("output2.txt") optional true
  script:
    """
    echo $id > output.txt
    if [[ "$id" == "foo" ]]; then
      echo $id > output2.txt
    fi
    """
}

workflow {
  Channel.fromList( ["foo", "bar"] )
    | view { "input: ${it}" }
    | test_process2
    | mix
    | groupTuple(by: 0)
    | map{ [ it[0], [it[1], it[2]].transpose().collectEntries() ]}
    | view { "output: ${it}" }
}

$ NXF_VER=21.10.6 nextflow run test_outputs_opt3.nf
input: foo
input: bar
output: [bar, [output1:work/9c/97b3a2884f97594532a19923e6c748/output.txt]]
output: [foo, [output1:work/60/984231826c9a9cc2a1e1cf29e16fdb/output.txt, output2:work/60/984231826c9a9cc2a1e1cf29e16fdb/output2.txt]]
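
A possible partial mitigation is to merge the two output channels with join(remainder: true) instead of mix plus groupTuple(). The sketch below is untested against 21.10.6. The process name test_process_join and the emit names required and extra are made up for illustration. As far as I understand the operator, ids that have both files are emitted as soon as they match, but ids without output2.txt are only emitted (padded with null) once the channels have completed, so the bottleneck is reduced rather than removed.

```groovy
nextflow.enable.dsl=2

// Hypothetical variant of test_process2; the emit names `required` and
// `extra` are illustrative and not part of the original reprex.
process test_process_join {
  input:
    val(id)
  output:
    tuple val(id), path("output.txt"), emit: required
    tuple val(id), path("output2.txt"), optional: true, emit: extra
  script:
    """
    echo $id > output.txt
    if [[ "$id" == "foo" ]]; then
      echo $id > output2.txt
    fi
    """
}

workflow {
  test_process_join(Channel.fromList( ["foo", "bar"] ))

  // Ids present in both channels yield [id, output.txt, output2.txt].
  // With remainder: true, ids missing from `extra` yield
  // [id, output.txt, null], but only after the channels have completed.
  test_process_join.out.required
    .join(test_process_join.out.extra, remainder: true)
    .view { "output: ${it}" }
}
```

This gives the tuple shape asked for in the usage scenario, but it still delays exactly those events whose optional file is missing.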

Attempt 4: add junk to output

By adding a file known to exist (e.g. ".command.sh") to the output, I can force the Channel to always return a tuple. This works, but the code looks quite messy and I need to do postprocessing to remove the additional file.

Attempt 4 reprex
nextflow.enable.dsl=2

process test_process3 {
  input:
    tuple val(id)
  output:
    tuple val(id), path{[".command.sh", "output.txt"]}, path{[".command.sh", "output2.txt"]}
  script:
    """
    echo $id > output.txt
    if [[ "$id" == "foo" ]]; then
      echo $id > output2.txt
    fi
    """
}

workflow {
  Channel.fromList( ["foo", "bar"] )
    | view { "input: ${it}" }
    | test_process3
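    | map { output ->
      // pair each output slot with a name; 'def' keeps the variables local to the closure
      def map = [["output1", "output2"], output.drop(1)].transpose()
      def map_without_dummy = map.collectEntries{ key, out ->
        if (out instanceof List && out.size() > 2) {
          // several real files: drop the dummy .command.sh
          [ key, out.drop(1) ]
        } else if (out instanceof List && out.size() == 2) {
          // dummy plus one real file: keep only the real file
          [ key, out[1] ]
        } else {
          // only the dummy was found: the optional output is missing
          [ key, null ]
        }
      }
      [ output[0], map_without_dummy ]
    }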
    | view { "output: ${it}" }
}

$ NXF_VER=21.10.6 nextflow run test_outputs_opt4.nf
input: foo
input: bar
output: [foo, [output1:work/96/a51f95280ee3332f50b6b05a12596b/output.txt, output2:work/96/a51f95280ee3332f50b6b05a12596b/output2.txt]]
output: [bar, [output1:work/ec/87149bfea74975d37307d6a115c812/output.txt, output2:null]]
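
A somewhat tidier variant of the same idea is to have the script always create output2.txt as an empty placeholder and to map empty files to null afterwards. This is only a sketch under assumptions: the process name test_process_placeholder is made up, and the approach only works if a genuinely empty output2.txt never needs to be distinguished from a missing one.

```groovy
nextflow.enable.dsl=2

// Hypothetical process name; not part of the original reprex.
process test_process_placeholder {
  input:
    val(id)
  output:
    tuple val(id), path("output.txt"), path("output2.txt")
  script:
    """
    echo $id > output.txt
    # always create an empty placeholder so the tuple is emitted
    touch output2.txt
    if [[ "$id" == "foo" ]]; then
      echo $id > output2.txt
    fi
    """
}

workflow {
  Channel.fromList( ["foo", "bar"] )
    | test_process_placeholder
    // assumption: a zero-byte output2.txt means "no optional output"
    | map { id, out1, out2 -> [ id, out1, out2.size() > 0 ? out2 : null ] }
    | view { "output: ${it}" }
}
```

Events are emitted per task without any grouping step, but the placeholder convention has to be known by every process that writes or reads the optional file, which is why built-in support would still be preferable.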
