diff --git a/adr/20250825-workflow-params.md b/adr/20250825-workflow-params.md new file mode 100644 index 0000000000..17a41ef626 --- /dev/null +++ b/adr/20250825-workflow-params.md @@ -0,0 +1,229 @@ +# Workflow params + +- Authors: Ben Sherman +- Status: accepted +- Date: 2025-08-25 +- Tags: lang, static-types, params + +## Summary + +Introduce a unified, statically typed way to declare the top-level inputs (i.e. parameters) of a workflow. + +## Problem Statement + +Pipeline parameters in Nextflow are currently declared using property assignments: + +```groovy +params.reads = "$baseDir/data/ggal/ggal_gut_{1,2}.fq" +params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa" +params.multiqc = "$baseDir/multiqc" +``` + +This approach has several limitations: + +- **No type annotations**: Parameter types cannot be expressed in the script. The type of a parameter can only be inferred from its default value, which may be ambiguous (e.g., a default value of `null`, a `String` that should be treated as a `Path`). + +- **Heuristic type coercion**: When a parameter is supplied on the command line, Nextflow attempts to coerce the string value to the appropriate type using heuristics (e.g., `'true'` → boolean `true`, `'42'` → integer `42`). These heuristics are not always correct and can lead to unexpected behavior. + +- **No built-in validation**: There is no built-in way to validate that a parameter is required, or that a parameter value has the correct type. Validation must be done manually in the script, or through an external JSON Schema file (`nextflow_schema.json`). + +- **Scattered declarations and usage**: Parameters may be declared anywhere in the script or across multiple scripts, making it difficult to get a single view of the pipeline parameters. Parameters can be used anywhere in the pipeline, even outside the script where they are declared, making it impossible to validate params usage at compile-time. 
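+As a hedged illustration of the coercion problem (parameter names and values are hypothetical):
+
+```groovy
+// Legacy declarations: the type is only implied by the default value.
+params.reads = "$baseDir/data/reads.fq"   // a String, even though it is used as a Path
+params.batch = null                       // no type can be inferred at all
+
+// nextflow run main.nf --batch 007
+// The CLI string '007' may be heuristically coerced to the integer 7,
+// which is wrong if a zero-padded identifier was intended.
+```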
+ +## Goals + +- Declare all parameters in one place in the script, with documentation. + +- Provide explicit type annotations for parameters, enabling compile-time validation and IDE support. + +- Clearly distinguish between required and optional parameters. + +- Coerce CLI parameter values based on declared types, rather than relying on heuristics. + +## Non-goals + +- Removing the legacy `params.foo = bar` syntax -- legacy parameters must continue to work without modification. + +- Changing the `params` config scope -- params can still be declared in the config file, although some best practices apply. + +- Replacing `nextflow_schema.json` -- while the `params` block addresses many of the same needs, existing pipelines that use a JSON Schema should not be required to migrate. A native integration with `nextflow_schema.json` can be explored in the future. + +- Supporting nested params -- the `params` block only supports a flat list of params. Nested params can still be used in the config, but they do not have first-class support at this time. + +## Decision + +Introduce the `params` block for declaring pipeline parameters. Each parameter is declared with a name, a type, and an optional default value: + +```groovy +params { + // Path to the input samplesheet + input: Path + + // Whether to save intermediate files + save_intermeds: Boolean = false +} +``` + +Typed parameters are used to validate parameter usage in the script, and to coerce CLI parameter values at runtime. + +## Core Capabilities + +### Parameter declaration + +The `params` block consists of parameter *declarations*. Each parameter is declared as `name: Type` (required) or `name: Type = default` (optional with default): + +```groovy +params { + input: Path // required + extra_file: Path? 
// optional (defaults to null) + db_file: Path = 'db.json' // optional with default + flag: Boolean // boolean params default to false +} +``` + +All standard Nextflow types except `Channel` and `Value` can be used for parameter type annotations. + +### Required and optional parameters + +A parameter without a default value is *required*. If a required parameter is not supplied at runtime (via the command line, a params file, or the config), the run fails immediately with an informative error. + +A parameter with the `?` suffix on its type is *optional* and will be `null` if not supplied. Boolean parameters without a default value implicitly default to `false`. + +### Type-based CLI coercion + +When a parameter is supplied on the command line, Nextflow converts the string value to the declared type: + +| Declared type | String input | Resolved value | +|---|---|---| +| `Boolean` | `'true'` | `true` | +| `Integer` | `'42'` | `42` | +| `Float` | `'3.14'` | `3.14` | +| `Duration` | `'1h'` | `Duration.of('1h')` | +| `MemoryUnit` | `'8 GB'` | `MemoryUnit.of('8 GB')` | +| `Path` | `'/data'` | `Path.of('/data')` | + +This replaces the heuristic type detection used for legacy parameters. + +### Compile-time validation + +Legacy parameters can be accessed globally by all scripts in the pipeline. While this approach is flexible, it prevents compile-time validation and breaks modularity. + +When a module references a param, it implicitly assumes that the param will always be defined by the workflow that uses it. This assumption cannot be validated at compile-time, so if the param is missing, an error will occur only at runtime. + +The `params` block solves this problem by defining all params in one place. It serves as the inputs for the entry workflow, similar to the `take:` section in named workflows. Parameters should be passed to processes and workflows as explicit inputs, so that every variable reference can be validated against local declarations. 
+ +For example, the following workflow: + +```groovy +// main.nf +params.input = '...' + +workflow { + HELLO() +} + +// hello.nf +workflow HELLO { + println "input = ${params.input}" +} +``` + +Can be rewritten as follows: + +```groovy +// main.nf +params { + input: String +} + +workflow { + HELLO(params.input) +} + +// hello.nf +workflow HELLO { + take: + input: String + + main: + println "input = ${input}" +} +``` + +Typed parameters can still be used globally by all scripts for backwards compatibility. However, the type checker will only validate params used in the entry workflow and `output` block. Users should eventually migrate their pipelines as shown above for effective type checking. + +### Script and config params + +Parameters can also be defined in config files: + +```groovy +params { + outdir = 'results' + publish_dir_mode = 'copy' +} +``` + +Config params continue to work as before. As a best practice, they should be used only to "configure the configuration." + +Some config params can be replaced with native functionality, e.g., `outputDir` and `workflow.output.mode` for the above. The nf-core [institutional configs](https://github.com/nf-core/configs), which enable users to run a pipeline with their institutional config entirely from the command line, cannot be easily replaced and provide a clear use case for config params. + +Config params are also propagated to the script since the config file can overwrite script params (e.g. in a profile). However, since the script `params` block only allows params that were explicitly declared, it needs to be able to distinguish between config params and invalid params (e.g. command line param with a typo). + +To prevent a circular dependency between the script execution and config resolution, parameters are resolved as follows: + +1. Load *CLI params* from command line, params file + +2. 
Load config files + - Params declared in the `params` scope are *config params* + - If a config setting references an undeclared param, report an error + - Params assigned in a profile are also marked as config params -- they should be used to overwrite existing params or potential script params + - CLI params override config params + +3. Execute script, resolve `params` block + - CLI params and config params override default values in `params` block + - If a required script param is undefined, report an error + - If a CLI param is not declared in the `params` block and is not a config param, report an error + +In other words, params are applied in the following order (lowest to highest precedence): + +1. Default value in the `params` block +2. Config file (`params { param = value }`) +3. Params file (`-params-file params.json`) +4. Command-line arguments (`--param value`) + +Any parameter supplied via command line or params file must be declared in the script or config. Supplying an undeclared parameter is an error. + +## Links + +- Community issue: [#4669](https://github.com/nextflow-io/nextflow/issues/4669) +- [Workflow outputs ADR](./20251020-workflow-outputs.md) +- [Record types ADR](./20260306-record-types.md) + +## Appendix + +### Runtime type analysis via reflection + +Validating and converting params against declared types requires the type annotations to be fully available at runtime. Parameterized types such as `List<String>` must provide both the type (`List`) and the generic type arguments (`[String]`). + +During compilation, type annotations are modeled using `ClassNode`, which provides the "raw" type and type arguments via `getTypeClass() -> Class` and `getGenericsTypes() -> GenericsType[]`. 
+ +At runtime, type annotations are modeled using `Type`, for which there are two primary cases: + +- If the type is parameterized, it is a `ParameterizedType`, which provides the "raw" type and type arguments via `getRawType() -> Class` and `getActualTypeArguments() -> Type[]`. + +- Otherwise, the type is a `Class` corresponding to the raw type. + +This type information can be obtained at runtime from the following entities: + +- Class fields via `Field::getGenericType() -> Type` +- Method parameters via `Parameter::getParameterizedType() -> Type` + +For this reason, the `params` block is compiled as a class, so that each parameter declaration is a field which can model a parameterized type. + +Type annotations can be marked as nullable using the `?` suffix. This marker is compiled as a custom `@Nullable` annotation on the corresponding field, so that the runtime can use this information. + +For example, when loading a JSON file as a collection of records, Nextflow uses the given record type to validate each JSON object in the collection: + +- String values that map to a record field with type `Path` are converted to Path values +- If a JSON object is missing a record field that is marked as nullable, it is considered valid + +While type annotations are used only at compile-time in all other contexts, they are needed at runtime for pipeline parameters in order to validate and convert external input data to the expected type. diff --git a/adr/20251020-workflow-outputs.md b/adr/20251020-workflow-outputs.md new file mode 100644 index 0000000000..65da201faf --- /dev/null +++ b/adr/20251020-workflow-outputs.md @@ -0,0 +1,198 @@ +# Workflow outputs + +- Authors: Ben Sherman +- Status: accepted +- Date: 2025-10-20 +- Tags: lang, workflows + +## Summary + +Introduce a unified, dataflow-centric way to declare the top-level outputs of a workflow. 
+ +## Problem Statement + +In Nextflow DSL1, each process used `publishDir` to copy output files from the work directory to an external location. Nextflow DSL2 inherited this approach but it became increasingly problematic as pipelines grew larger and more modular: + +- **Mismatch with reusable modules**: Publishing rules often depend on how a process is used in a given pipeline. Setting `publishDir` inside a module process makes the module less reusable, since the publish path and mode are baked into the process definition. Using process selectors in configuration is verbose and fragile. + +- **Fragmented outputs**: Publishing logic is scattered across many module files. There is no single place to see what a pipeline produces or to reason about the output structure. + +- **Redundant configuration**: Common settings like the base output directory and publish mode must be repeated in every `publishDir` declaration, leading to duplication. + +- **Mismatch with channels**: Channels carry both files and structured metadata (e.g., sample IDs, quality flags). The `publishDir` directive matches files with glob patterns and cannot capture metadata unless it happens to be written to a file. This mismatch makes it difficult to produce structured, self-describing outputs. + +## Goals + +- Declare all pipeline outputs in a single location alongside the entry workflow. + +- Assign outputs from channels rather than from individual process definitions, decoupling pipeline-specific publishing rules from reusable modules. + +- Support dynamic and fine-grained file publishing to match common publishing patterns (e.g. directory per sample, directory per pipeline step). + +- Support structured index files (CSV, JSON, YAML) that preserve output files with associated metadata. + +- Define publishing behavior (mode, overwrite, storage class, etc.) globally in the config. + +- Support type annotations on output declarations for documentation and compile-time validation. 
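+To make the modularity mismatch concrete, here is a sketch of the legacy pattern these goals replace (paths and process body are illustrative):
+
+```groovy
+// Legacy DSL2 module: the publish path and mode are baked into the
+// process, so every pipeline that reuses this module also inherits
+// its publishing rules.
+process FASTQC {
+    publishDir "${params.outdir}/fastqc", mode: 'copy'
+
+    input:
+    tuple val(id), path(reads)
+
+    output:
+    tuple val(id), path('*_fastqc.zip')
+
+    script:
+    """
+    fastqc ${reads}
+    """
+}
+```
+
+Moving these rules out of the module lets the process expose its files as plain channels and leaves the publishing decision to the calling pipeline.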
+ +## Non-goals + +- Removing support for `publishDir` immediately -- `publishDir` should continue to work without modification, although it may eventually be phased out as users migrate away from it. + +- Publishing outputs from processes or named workflows -- only the entry workflow has a `publish:` section. + +- Defining a JSON schema for workflow outputs -- schema/spec generation will be explored in the future. + +## Decision + +Introduce the `output` block for declaring workflow outputs. Each output defines how files are published to the output directory, and the format of the index file (if defined). + +Introduce the output directory as a first-class concept in Nextflow, as well as the `workflow.output` config scope for controlling publishing behavior. + +## Core Capabilities + +### Output definition + +Workflow outputs consist of an `output` block, which declares each output, and a `publish:` section in the entry workflow, which assigns a dataflow source (channel or value) to each output: + +```groovy +workflow { + main: + ch_fastqc = FASTQC(ch_reads) + ch_report = MULTIQC(ch_fastqc.collect()) + + publish: + fastqc = ch_fastqc + report = ch_report +} + +output { + fastqc: Channel { + path 'fastqc' + } + report: Path { + path '.' + } +} +``` + +Every output assigned in `publish:` must be declared in the `output` block, and vice versa. A mismatch is a compile-time error. + +Each output declaration can specify a type annotation for documentation and type checking support. Type annotations are optional and do not change runtime behavior. They are used by the type checker to validate the `publish:` section and the `path` directive. + +### Output directory + +The top-level output directory defaults to `results` in the launch directory. 
It can be overridden from the command line or config file: + +```bash +nextflow run main.nf -output-dir my-results +``` + +```groovy +// nextflow.config +outputDir = 'my-results' +``` + +All publish paths declared in the `output` block are relative to this directory. Absolute paths are not allowed. + +### Static and dynamic publish paths + +The `path` directive accepts a string for a fixed path, or a closure for per-value paths: + +```groovy +output { + // static: all files go to results/fastq/ + reads { + path 'fastq' + } + + // dynamic: results are organized by sample id + samples { + path { sample -> "${sample.id}" } + } +} +``` + +Nextflow recursively scans channel values for files, including files nested inside lists, maps, records, and tuples. Files that did not originate from the work directory are not published. + +### Fine-grained file publishing with `>>` + +Within a `path` closure, individual files can be published to different locations using the `>>` operator. Only files explicitly captured with `>>` are published; other files in the value are ignored. + +```groovy +output { + samples { + path { sample -> + sample.fastqc >> "fastqc/" + sample.bam >> (params.save_bams ? "align/" : null) + sample.bam_index >> (params.save_bams ? "align/" : null) + } + } +} +``` + +The *publish source* (left-hand side) should be a file or collection of files. The *publish target* (right-hand side) should be a relative path. If the target has a trailing slash, then the source is published *into* the target directory; otherwise the source is published *as* the target name. + +A `null` target suppresses publishing for that file, and a `null` source is also a no-op. This way, publishing of individual files can be disabled by either setting the record field to `null` in workflow logic or using a param in the publish statement. + +### Index files + +Each output can generate a structured index file that records each published channel value along with its metadata. 
Supported formats are CSV, JSON, and YAML. + +```groovy +output { + samples { + path 'fastq' + index { + path 'samples.csv' + header true + } + } +} +``` + +The index file is essentially a *samplesheet* -- it preserves the structure of files and metadata in the published channel, and can be easily passed as input to downstream pipelines. Metadata fields (sample IDs, quality flags, etc.) do not need to be written to a separate metadata file or encoded into file paths. + +Files that did not originate from the work directory are not published, but are still included in the index. + +### Global defaults via configuration + +Common publish settings can be set globally under the `workflow.output` config scope: + +```groovy +// nextflow.config +workflow { + output { + mode = 'copy' + overwrite = 'lenient' + } +} +``` + +These defaults can be overridden per-output in the `output` block: + +```groovy +// main.nf +output { + fastqc { + mode = 'symlink' + overwrite = true + } +} +``` + +## Alternatives + +### Publishing from processes and subworkflows + +Earlier iterations allowed for workflow outputs to be published from subworkflows or processes, instead of requiring all workflow outputs to be propagated up to the entry workflow. + +While this approach is less verbose, it breaks the modularity of processes and subworkflows. Publishing behavior is inherent to the pipeline, not the individual subcomponents which could be shared across many pipelines. The process or subworkflow should expose all of its outputs as channels, and the calling pipeline should decide whether and how to publish these outputs. + +On the other hand, propagating all workflow outputs to the top will make pipelines more verbose, especially when using "skinny tuple" channels. This issue will be alleviated by migrating from tuples to records -- for this reason, it is recommended that large pipelines be migrated to records before being migrated to workflow outputs. 
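+As a hedged sketch of the tuple-to-record migration recommended here (field and channel names are illustrative):
+
+```groovy
+// "Skinny tuple": elements are positional, so every take:/emit: hop
+// must repeat the full shape of the value.
+ch_samples = channel.of(['gut', file('gut_1.fq'), file('gut_2.fq')])
+
+// Record: the same value travels as named fields, which the output
+// block can publish and index directly.
+record Sample {
+    id: String
+    fastq_1: Path
+    fastq_2: Path
+}
+
+ch_samples = channel.of(record(id: 'gut', fastq_1: file('gut_1.fq'), fastq_2: file('gut_2.fq')))
+```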
+ +## Links + +- Community issues: [#4042](https://github.com/nextflow-io/nextflow/issues/4042), [#4661](https://github.com/nextflow-io/nextflow/issues/4661), [#4670](https://github.com/nextflow-io/nextflow/issues/4670) +- [Workflow params ADR](./20250825-workflow-params.md) +- [Record types ADR](./20260306-record-types.md) diff --git a/docs/tutorials/static-types.md b/docs/tutorials/static-types.md index 7c524a069b..8f96de8e21 100644 --- a/docs/tutorials/static-types.md +++ b/docs/tutorials/static-types.md @@ -166,47 +166,6 @@ read_pairs_ch = channel.of(params.reads) } ``` -You can simplify the code further by modeling `params.reads` as a collection of records instead of a file path. - -Add a header row to the samplesheet: - -``` -id,fastq_1,fastq_2 -gut,... -liver,... -lung,... -spleen,... -``` - -Refactor `params.reads` as a collection of records: - -```nextflow -params { - // The input samplesheet of paired-end reads - reads: List<Sample> = "${projectDir}/data/allreads.csv" - - // ... -} - -record Sample { - id: String - fastq_1: Path - fastq_2: Path -} -``` - -In the above, `Sample` is a *record type* based on the samplesheet structure. When a file path is supplied to a collection-type parameter (e.g., `List<Sample>`), the file path is automatically loaded and parsed into a collection. - -Refactor the `read_pairs_ch` to load the collection into a channel: - -```nextflow -read_pairs_ch = channel.fromList(params.reads) -``` - -:::{note} -Collection-type params can also be loaded from JSON and YAML samplesheets. See {ref}`workflow-typed-params` for more information. -::: - ### Migrating processes See {ref}`process-typed-page` for an overview of typed processes. 
@@ -434,7 +393,11 @@ You can infer the type of each workflow input by examining how the workflow is c ```nextflow workflow { - read_pairs_ch = channel.fromList(params.reads) + read_pairs_ch = channel.of(params.reads) + .flatMap { csv -> csv.splitCsv() } + .map { row -> + record(id: row[0], fastq_1: file(row[1]), fastq_2: file(row[2])) + } RNASEQ(read_pairs_ch, params.transcriptome) @@ -444,7 +407,7 @@ workflow { You can determine the type of each input as follows: -- The channel `read_pairs_ch` has type `Channel<E>`, where `E` is the type of each value in the channel. It is loaded from `params.reads` which has type `List<Sample>`. Therefore `read_pairs_ch` has type `Channel<Sample>`. +- The channel `read_pairs_ch` has type `Channel<Record>`, where each record contains the fields `id`, `fastq_1`, `fastq_2`. - The parameter `params.transcriptome` has type `Path` as defined in the `params` block. @@ -460,6 +423,12 @@ workflow RNASEQ { // ... } + +record Sample { + id: String + fastq_1: Path + fastq_2: Path +} ``` The `read_pairs_ch` channel also needs to provide all of the record fields required by downstream processes. 
It is used by `FASTQC` and `QUANT`, which both declare the following record input: @@ -549,7 +518,11 @@ The entry workflow is defined as follows: ```nextflow workflow { - read_pairs_ch = channel.fromFilePairs(params.reads, checkIfExists: true, flat: true) + read_pairs_ch = channel.of(params.reads) + .flatMap { csv -> csv.splitCsv() } + .map { row -> + record(id: row[0], fastq_1: file(row[1]), fastq_2: file(row[2])) + } (fastqc_ch, quant_ch) = RNASEQ(read_pairs_ch, params.transcriptome) @@ -565,7 +538,11 @@ Rewrite this workflow based on the updated params, processes, and subworkflows: nextflow.enable.types = true workflow { - read_pairs_ch = channel.fromList(params.reads) + read_pairs_ch = channel.of(params.reads) + .flatMap { csv -> csv.splitCsv() } + .map { row -> + record(id: row[0], fastq_1: file(row[1]), fastq_2: file(row[2])) + } samples_ch = RNASEQ(read_pairs_ch, params.transcriptome) @@ -577,7 +554,7 @@ workflow { } ``` -The `reads` param was refactored as a collection of records, so it is loaded into a channel using `channel.fromList`. It is compatible with the records expected by `RNASEQ`. +The `reads` param was refactored as a `Path`, so it is loaded into a channel of records using `splitCsv`. It is compatible with the records expected by `RNASEQ`. The `RNASEQ` workflow now returns a single combined channel, so the `mix` operation is no longer needed. The `flatMap` operator is used to extract the files from each record in `samples_ch`. diff --git a/docs/workflow-typed.md b/docs/workflow-typed.md index b384a90407..bf902c1fb4 100644 --- a/docs/workflow-typed.md +++ b/docs/workflow-typed.md @@ -52,12 +52,6 @@ A parameter that doesn't specify a default value is a *required* parameter. If a Boolean parameters that don't specify a default value will default to `false`. -Parameters with a collection type (i.e., `List`, `Set`, or `Bag`) can be supplied a file path instead of a literal collection. The file must be CSV, JSON, or YAML. 
Nextflow will parse the file contents and assign the resulting collection to the parameter. An error is thrown if the file contents do not match the parameter type. - -:::{note} -When supplying a CSV file to a collection parameter, the CSV file must contain a header row and must use a comma (`,`) as the column separator. -::: - ## Typed outputs :::{versionadded} 25.10.0 diff --git a/modules/nextflow/src/main/groovy/nextflow/script/BaseScript.groovy b/modules/nextflow/src/main/groovy/nextflow/script/BaseScript.groovy index d4d1109d51..6aaaa213b6 100644 --- a/modules/nextflow/src/main/groovy/nextflow/script/BaseScript.groovy +++ b/modules/nextflow/src/main/groovy/nextflow/script/BaseScript.groovy @@ -116,15 +116,16 @@ abstract class BaseScript extends Script implements ExecutionContext { /** * Define a params block. * + * @param clazz * @param body */ - protected void params(Closure body) { + protected void params(Class clazz, Closure body) { if( entryFlow ) throw new IllegalStateException("Workflow params definition must be defined before the entry workflow") if( ExecutionStack.withinWorkflow() ) throw new IllegalStateException("Workflow params definition is not allowed within a workflow") - this.paramsDef = new ParamsDef(body) + this.paramsDef = new ParamsDef(clazz, body) } /** diff --git a/modules/nextflow/src/main/groovy/nextflow/script/ParamsDef.groovy b/modules/nextflow/src/main/groovy/nextflow/script/ParamsDef.groovy index c66e46b4f0..18914cdd58 100644 --- a/modules/nextflow/src/main/groovy/nextflow/script/ParamsDef.groovy +++ b/modules/nextflow/src/main/groovy/nextflow/script/ParamsDef.groovy @@ -26,14 +26,17 @@ import nextflow.Session @CompileStatic class ParamsDef { + private Class clazz + private Closure closure - ParamsDef(Closure closure) { + ParamsDef(Class clazz, Closure closure) { + this.clazz = clazz this.closure = closure } void apply(Session session) { - final dsl = new ParamsDsl() + final dsl = new ParamsDsl(clazz) final cl = 
(Closure)closure.clone() cl.setDelegate(dsl) cl.setResolveStrategy(Closure.DELEGATE_FIRST) diff --git a/modules/nextflow/src/main/groovy/nextflow/script/ParamsDsl.groovy b/modules/nextflow/src/main/groovy/nextflow/script/ParamsDsl.groovy index 68f28def61..8e35dbf0e5 100644 --- a/modules/nextflow/src/main/groovy/nextflow/script/ParamsDsl.groovy +++ b/modules/nextflow/src/main/groovy/nextflow/script/ParamsDsl.groovy @@ -16,23 +16,18 @@ package nextflow.script +import java.lang.reflect.Type import java.nio.file.Path -import groovy.json.JsonSlurper -import groovy.yaml.YamlSlurper import groovy.transform.Canonical import groovy.transform.CompileStatic import groovy.util.logging.Slf4j import nextflow.Session -import nextflow.file.FileHelper import nextflow.exception.ScriptRuntimeException import nextflow.script.dsl.Types -import nextflow.script.types.Bag -import nextflow.splitter.CsvSplitter -import nextflow.util.ArrayBag import nextflow.util.Duration import nextflow.util.MemoryUnit -import org.codehaus.groovy.runtime.typehandling.DefaultTypeTransformation +import nextflow.util.TypeHelper import org.codehaus.groovy.runtime.typehandling.GroovyCastException /** * Implements the DSL for defining workflow params @@ -43,9 +38,16 @@ import org.codehaus.groovy.runtime.typehandling.GroovyCastException @CompileStatic class ParamsDsl { + private Class clazz + private Map declarations = [:] - void declare(String name, Class type, boolean optional, Object defaultValue = null) { + ParamsDsl(Class clazz) { + this.clazz = clazz + } + + void declare(String name, boolean optional, Object defaultValue = null) { + final type = clazz.getField(name).getGenericType() if( defaultValue == null && type == Boolean ) defaultValue = false declarations[name] = new Param(name, type, optional, defaultValue) @@ -79,8 +81,9 @@ class ParamsDsl { throw new ScriptRuntimeException("Parameter `$name` is required but was not specified on the command line, params file, or config") } + final expectedType = 
TypeHelper.getRawType(decl.type) final actualType = params[name]?.getClass() - if( actualType != null && !isAssignableFrom(decl.type, actualType) ) + if( actualType != null && !isAssignableFrom(expectedType, actualType) ) throw new ScriptRuntimeException("Parameter `$name` with type ${Types.getName(decl.type)} cannot be assigned to ${params[name]} [${Types.getName(actualType)}]") } @@ -95,10 +98,13 @@ class ParamsDsl { } } - private Object resolveFromCli(Param decl, Object value) { + private static Object resolveFromCli(Param decl, Object value) { if( value == null ) return null + if( value instanceof Collection || value instanceof Map ) + return asType(value, decl) + if( value !instanceof CharSequence ) return value @@ -129,57 +135,42 @@ class ParamsDsl { return MemoryUnit.of(str) } - if( Collection.class.isAssignableFrom(decl.type) ) { - return resolveFromFile(decl.name, decl.type, FileHelper.asPath(str)) - } - if( decl.type == Path ) { - return FileHelper.asPath(str) + return TypeHelper.asPathType(str) } return value } - private Object resolveFromCode(Param decl, Object value) { + private static Object resolveFromCode(Param decl, Object value) { if( value == null ) return null + if( value instanceof Collection || value instanceof Map ) + return asType(value, decl) + if( value !instanceof CharSequence ) return value final str = value.toString() - if( Collection.class.isAssignableFrom(decl.type) ) - return resolveFromFile(decl.name, decl.type, FileHelper.asPath(str)) - if( decl.type == Path ) - return FileHelper.asPath(str) + return TypeHelper.asPathType(str) return value } - private Object resolveFromFile(String name, Class type, Path file) { - final ext = file.getExtension() - final value = switch( ext ) { - case 'csv' -> new CsvSplitter().options(header: true, sep: ',').target(file).list() - case 'json' -> new JsonSlurper().parse(file) - case 'yaml' -> new YamlSlurper().parse(file) - case 'yml' -> new YamlSlurper().parse(file) - default -> throw new 
ScriptRuntimeException("Unrecognized file format '${ext}' for input file '${file}' supplied for parameter `${name}` -- should be CSV, JSON, or YAML") - } - + private static Object asType(Object value, Param decl) { try { - if( Bag.class.isAssignableFrom(type) && value instanceof Collection ) - return new ArrayBag(value) - return DefaultTypeTransformation.castToType(value, type) + return TypeHelper.asType(value, decl.type) } - catch( GroovyCastException e ) { + catch( GroovyCastException | UnsupportedOperationException e ) { final actualType = value.getClass() - throw new ScriptRuntimeException("Parameter `${name}` with type ${Types.getName(type)} cannot be assigned to contents of '${file}' [${Types.getName(actualType)}]") + throw new ScriptRuntimeException("Parameter `${decl.name}` with type ${Types.getName(decl.type)} cannot be assigned to ${value} [${Types.getName(actualType)}]") } } - private boolean isAssignableFrom(Class target, Class source) { + private static boolean isAssignableFrom(Class target, Class source) { if( target == Float.class ) return Number.class.isAssignableFrom(source) @@ -192,7 +183,7 @@ class ParamsDsl { @Canonical private static class Param { String name - Class type + Type type boolean optional Object defaultValue } diff --git a/modules/nextflow/src/test/groovy/nextflow/script/ParamsDslTest.groovy b/modules/nextflow/src/test/groovy/nextflow/script/ParamsDslTest.groovy index 37a7e2e2a5..e2ceee469c 100644 --- a/modules/nextflow/src/test/groovy/nextflow/script/ParamsDslTest.groovy +++ b/modules/nextflow/src/test/groovy/nextflow/script/ParamsDslTest.groovy @@ -20,11 +20,15 @@ import java.nio.file.Files import java.nio.file.Path import nextflow.Session -import nextflow.file.FileHelper +import nextflow.exception.AbortOperationException import nextflow.exception.ScriptRuntimeException +import nextflow.file.FileHelper import nextflow.script.types.Bag +import nextflow.script.types.Record import spock.lang.Specification import spock.lang.Unroll + 
+import static test.ScriptHelper.* /** * * @author Ben Sherman @@ -33,30 +37,43 @@ class ParamsDslTest extends Specification { def 'should declare workflow params with CLI overrides'() { given: - def cliParams = [input: './data', chunk_size: '3'] + def inputFile = Files.createTempFile('test', '.csv') + def cliParams = [input: inputFile.toString(), chunk_size: '3'] def configParams = [outdir: 'results'] - def session = new Session([params: configParams + cliParams]) - session.init(null, null, cliParams, configParams) when: - def dsl = new ParamsDsl() - dsl.declare('input', Path, false) - dsl.declare('chunk_size', Integer, false, 1) - dsl.declare('save_intermeds', Boolean, false) - dsl.apply(session) + def result = runScript( + '''\ + params { + input: Path + chunk_size: Integer = 1 + save_intermeds: Boolean + } + + workflow { params } + ''', + config: [params: configParams + cliParams], + params: cliParams, + configParams: configParams + ) then: - session.binding.getParams() == [input: FileHelper.asPath('./data'), chunk_size: 3, save_intermeds: false, outdir: 'results'] + result == [input: inputFile, chunk_size: 3, save_intermeds: false, outdir: 'results'] + + cleanup: + inputFile?.delete() } def 'should allow optional param'() { - given: - def session = new Session() - session.init(null) - when: - def dsl = new ParamsDsl() - dsl.declare('input', Path, true) - dsl.apply(session) + runScript( + '''\ + params { + input: Path? 
+ } + + workflow { params } + ''' + ) then: noExceptionThrown() } @@ -65,14 +82,20 @@ class ParamsDslTest extends Specification { given: def cliParams = [:] def configParams = [outdir: 'results'] - def session = new Session() - session.init(null, null, cliParams, configParams) when: - def dsl = new ParamsDsl() - dsl.declare('input', Path, false) - dsl.declare('save_intermeds', Boolean, false) - dsl.apply(session) + runScript( + '''\ + params { + input: Path + save_intermeds: Boolean + } + + workflow { params } + ''', + params: cliParams, + configParams: configParams + ) then: def e = thrown(ScriptRuntimeException) e.message == 'Parameter `input` is required but was not specified on the command line, params file, or config' @@ -82,251 +105,186 @@ class ParamsDslTest extends Specification { given: def cliParams = [inputs: './data'] def configParams = [outdir: 'results'] - def session = new Session() - session.init(null, null, cliParams, configParams) when: - def dsl = new ParamsDsl() - dsl.declare('input', Path, false) - dsl.declare('save_intermeds', Boolean, false) - dsl.apply(session) + runScript( + '''\ + params { + input: Path + save_intermeds: Boolean + } + + workflow { params } + ''', + params: cliParams, + configParams: configParams + ) then: def e = thrown(ScriptRuntimeException) e.message == 'Parameter `inputs` was specified on the command line or params file but is not declared in the script or config' } def 'should report error for invalid type'() { - given: - def cliParams = [input: './data', save_intermeds: 42] - def configParams = [:] - def session = new Session() - session.init(null, null, cliParams, configParams) - when: - def dsl = new ParamsDsl() - dsl.declare('input', Path, false) - dsl.declare('save_intermeds', Boolean, false) - dsl.apply(session) + runScript( + '''\ + params { + save_intermeds: Boolean + } + + workflow { params } + ''', + params: [save_intermeds: 42] + ) then: def e = thrown(ScriptRuntimeException) e.message == 'Parameter 
`save_intermeds` with type Boolean cannot be assigned to 42 [Integer]' } - @Unroll - def 'should validate float param with default value'() { + def 'should report error for missing input file'() { given: - def session = new Session() - session.init(null) + def cliParams = [input: 'input.csv'] + def configParams = [:] when: - def dsl = new ParamsDsl() - dsl.declare('factor', Float, false, DEF_VALUE) - dsl.apply(session) + runScript( + '''\ + params { + input: Path + } + + workflow { params } + ''', + params: cliParams, + configParams: configParams + ) then: - noExceptionThrown() - - where: - DEF_VALUE << [ 0.1f, 0.1d, 0.1g ] + def e = thrown(AbortOperationException) + e.message == "Input file 'input.csv' does not exist" } @Unroll - def 'should validate integer param with default value'() { - given: - def session = new Session() - session.init(null) - + def 'should validate float param with default value'() { when: - def dsl = new ParamsDsl() - dsl.declare('factor', Integer, false, DEF_VALUE) - dsl.apply(session) + runScript( + """\ + params { + factor: Float = ${DEF_VALUE} + } + + workflow { params } + """ + ) then: noExceptionThrown() where: - DEF_VALUE << [ 100i, 100l, 100g ] + DEF_VALUE << [ '0.1f', '0.1d', '0.1g' ] } - def 'should load collection param from CSV file'() { - given: - def csvFile = Files.createTempFile('test', '.csv') - csvFile.text = '''\ - id,name,value - 1,sample1,100 - 2,sample2,200 - 3,sample3,300 - '''.stripIndent() - def cliParams = [samples: csvFile.toString()] - def session = new Session() - session.init(null, null, cliParams, [:]) - + @Unroll + def 'should validate integer param with default value'() { when: - def dsl = new ParamsDsl() - dsl.declare('samples', List, false) - dsl.apply(session) - + runScript( + """\ + params { + factor: Integer = ${DEF_VALUE} + } + + workflow { params } + """ + ) then: - def samples = session.binding.getParams().samples - samples instanceof List - samples.size() == 3 - samples[0].id == '1' - 
samples[0].name == 'sample1' - samples[0].value == '100' - samples[1].id == '2' - samples[2].id == '3' + noExceptionThrown() - cleanup: - csvFile?.delete() + where: + DEF_VALUE << [ '100i', '100l', '100g' ] } - def 'should load collection param from JSON file'() { + def 'should validate record collection param'() { given: - def jsonFile = Files.createTempFile('test', '.json') - jsonFile.text = '''\ - [ - {"id": 1, "name": "sample1", "value": 100}, - {"id": 2, "name": "sample2", "value": 200}, - {"id": 3, "name": "sample3", "value": 300} - ] - '''.stripIndent() - def cliParams = [ - samplesList: jsonFile.toString(), - samplesBag: jsonFile.toString(), - samplesSet: jsonFile.toString() + def samples = [ + [id: 1, name: "sample1", value: 100], + [id: 2, name: "sample2", value: 200], + [id: 3, name: "sample3", value: 300] ] - def session = new Session() - session.init(null, null, cliParams, [:]) - - when: - def dsl = new ParamsDsl() - dsl.declare('samplesList', List, false) - dsl.declare('samplesBag', Bag, false) - dsl.declare('samplesSet', Set, false) - dsl.apply(session) - - then: - def samplesList = session.binding.getParams().samplesList - samplesList instanceof List - samplesList.size() == 3 - samplesList[0].id == 1 - samplesList[0].name == 'sample1' - samplesList[0].value == 100 - samplesList[1].id == 2 - samplesList[2].id == 3 - - def samplesBag = session.binding.getParams().samplesBag - samplesBag instanceof Bag - samplesBag.size() == 3 - - def samplesSet = session.binding.getParams().samplesSet - samplesSet instanceof Set - samplesSet.size() == 3 - - cleanup: - jsonFile?.delete() - } - - def 'should load collection param from YAML file'() { - given: - def yamlFile = Files.createTempFile('test', '.yml') - yamlFile.text = '''\ - - id: 1 - name: sample1 - value: 100 - - id: 2 - name: sample2 - value: 200 - - id: 3 - name: sample3 - value: 300 - '''.stripIndent() - def cliParams = [samples: yamlFile.toString()] - def session = new Session() - session.init(null, 
null, cliParams, [:]) when: - def dsl = new ParamsDsl() - dsl.declare('samples', List, false) - dsl.apply(session) - + def result = runScript( + '''\ + params { + samples: List<Sample> + } + + record Sample { + id: Integer + name: String + value: Integer + } + + workflow { + params.samples + } + ''', + params: [samples: samples] + ) then: - def samples = session.binding.getParams().samples - samples instanceof List - samples.size() == 3 - samples[0].id == 1 - samples[0].name == 'sample1' - samples[0].value == 100 - samples[1].id == 2 - samples[2].id == 3 - - cleanup: - yamlFile?.delete() + result instanceof List + result.size() == 3 + result[0] instanceof Record + result[0].id == 1 + result[0].name == 'sample1' + result[0].value == 100 + result[1].id == 2 + result[2].id == 3 } - def 'should load collection param from file specified in config'() { - given: - def jsonFile = Files.createTempFile('test', '.json') - jsonFile.text = '[{"x": 1}, {"x": 2}]' - def configParams = [items: jsonFile.toString()] - def session = new Session() - session.init(null, null, [:], configParams) - + def 'should report error for invalid record type'() { when: - def dsl = new ParamsDsl() - dsl.declare('items', List, false) - dsl.apply(session) - - then: - def items = session.binding.getParams().items - items instanceof List - items.size() == 2 - items[0].x == 1 - items[1].x == 2 - - cleanup: - jsonFile?.delete() - } - - def 'should report error for unrecognized file format'() { - given: - def txtFile = Files.createTempFile('test', '.txt') - txtFile.text = 'some text' - def cliParams = [items: txtFile.toString()] - def session = new Session() - session.init(null, null, cliParams, [:]) - - when: - def dsl = new ParamsDsl() - dsl.declare('items', List, false) - dsl.apply(session) - + runScript( + '''\ + params { + items: List + } + + workflow { params.items } + ''', + params: [items: [not: 'a list']] + ) then: def e = thrown(ScriptRuntimeException) - e.message.contains("Unrecognized file format 
'txt'") - e.message.contains("supplied for parameter `items` -- should be CSV, JSON, or YAML") - - cleanup: - txtFile?.delete() + e.message.contains('Parameter `items` with type List cannot be assigned to') } - def 'should report error for invalid file content type'() { + def 'should report error for missing record field'() { given: - def jsonFile = Files.createTempFile('test', '.json') - jsonFile.text = '{"not": "a list"}' - def cliParams = [items: jsonFile.toString()] - def session = new Session() - session.init(null, null, cliParams, [:]) + def samples = [ + [id: 1, name: "sample1"] + ] when: - def dsl = new ParamsDsl() - dsl.declare('items', List, false) - dsl.apply(session) + runScript( + '''\ + params { + samples: List<Sample> + } + + record Sample { + id: Integer + name: String + value: Integer + } + + workflow { + params.samples + } + ''', + params: [samples: samples] + ) then: - def e = thrown(ScriptRuntimeException) - e.message.contains('Parameter `items` with type List cannot be assigned to contents of') - - cleanup: - jsonFile?.delete() + def e = thrown(AbortOperationException) + e.message == "Input record [id:1, name:sample1] is missing field 'value' required by record type 'Sample'" } } diff --git a/modules/nextflow/src/test/groovy/nextflow/script/ScriptRunnerTest.groovy b/modules/nextflow/src/test/groovy/nextflow/script/ScriptRunnerTest.groovy index 0e37c7017e..a9e87339cf 100644 --- a/modules/nextflow/src/test/groovy/nextflow/script/ScriptRunnerTest.groovy +++ b/modules/nextflow/src/test/groovy/nextflow/script/ScriptRunnerTest.groovy @@ -57,7 +57,7 @@ class ScriptRunnerTest extends Dsl2Spec { """ when: - def result = runScript(script, config) + def result = runScript(script, config: config) then: result instanceof DataflowVariable @@ -91,7 +91,7 @@ class ScriptRunnerTest extends Dsl2Spec { } ''' when: - runScript(script, config) + runScript(script, config: config) def processor = TaskProcessor.currentProcessor() then: processor.name == 'simpleTask' @@ -227,7 
+227,7 @@ class ScriptRunnerTest extends Dsl2Spec { when: def config = [process:[executor: 'nope']] - runScript(script, config) + runScript(script, config: config) then: def e = thrown(ScriptCompilationException) @@ -259,7 +259,7 @@ class ScriptRunnerTest extends Dsl2Spec { def config = [process: [executor: 'nope']] when: - def result = runScript(script, config) + def result = runScript(script, config: config) then: result.val == 'cat filename' @@ -291,7 +291,7 @@ class ScriptRunnerTest extends Dsl2Spec { ''' when: - runScript(script, config) + runScript(script, config: config) def process = TaskProcessor.currentProcessor() then: @@ -335,7 +335,7 @@ class ScriptRunnerTest extends Dsl2Spec { ''' when: - runScript(script, config) + runScript(script, config: config) def process = TaskProcessor.currentProcessor() then: @@ -368,7 +368,7 @@ class ScriptRunnerTest extends Dsl2Spec { ''' when: - runScript(script, config) + runScript(script, config: config) def process = TaskProcessor.currentProcessor() then: @@ -404,7 +404,7 @@ class ScriptRunnerTest extends Dsl2Spec { ''' when: - runScript(script, config) + runScript(script, config: config) def process = TaskProcessor.currentProcessor() then: @@ -433,7 +433,7 @@ class ScriptRunnerTest extends Dsl2Spec { ''' when: - runScript(script, config) + runScript(script, config: config) def process = TaskProcessor.currentProcessor() then: @@ -478,7 +478,7 @@ class ScriptRunnerTest extends Dsl2Spec { ''' when: - def result = runScript(script, config) + def result = runScript(script, config: config) .getVal() .toString() .stripIndent() @@ -526,7 +526,7 @@ class ScriptRunnerTest extends Dsl2Spec { def config = [process: [executor:'nope']] when: - def result = runScript(script, config) + def result = runScript(script, config: config) .getVal() .toString() .stripIndent() @@ -628,7 +628,7 @@ class ScriptRunnerTest extends Dsl2Spec { ''' when: - def result = runScript(script, config) + def result = runScript(script, config: config) then: 
result instanceof DataflowVariable diff --git a/modules/nextflow/src/testFixtures/groovy/test/ScriptHelper.groovy b/modules/nextflow/src/testFixtures/groovy/test/ScriptHelper.groovy index 85bea9167d..43ed4be041 100644 --- a/modules/nextflow/src/testFixtures/groovy/test/ScriptHelper.groovy +++ b/modules/nextflow/src/testFixtures/groovy/test/ScriptHelper.groovy @@ -71,7 +71,7 @@ class ScriptHelper { def session = opts.config ? new MockSession(opts.config as Map) : new MockSession() session.setBinding(new ScriptBinding()) - session.init(null) + session.init(null, null, opts.params, opts.configParams) session.start() def loader = ScriptLoaderFactory.create(session) @@ -154,14 +154,14 @@ class ScriptHelper { * last statement of the entry workflow, which can be used to pass * output channels to a test. * + * @param opts * @param text - * @param config */ - static Object runScript(String text, Map config = null) { - def session = config ? new MockSession(config) : new MockSession() + static Object runScript(Map opts = [:], String text) { + def session = opts.config ? 
new MockSession(opts.config) : new MockSession() session.setBinding(new ScriptBinding()) - session.init(null) + session.init(null, null, opts.params, opts.configParams) session.start() def loader = ScriptLoaderFactory.create(session) diff --git a/modules/nextflow/src/main/groovy/nextflow/util/HashBag.java b/modules/nf-commons/src/main/nextflow/util/HashBag.java similarity index 100% rename from modules/nextflow/src/main/groovy/nextflow/util/HashBag.java rename to modules/nf-commons/src/main/nextflow/util/HashBag.java diff --git a/modules/nf-commons/src/main/nextflow/util/TypeHelper.groovy b/modules/nf-commons/src/main/nextflow/util/TypeHelper.groovy index 94a9ba7dfc..74858cee63 100644 --- a/modules/nf-commons/src/main/nextflow/util/TypeHelper.groovy +++ b/modules/nf-commons/src/main/nextflow/util/TypeHelper.groovy @@ -16,31 +16,40 @@ package nextflow.util +import java.lang.reflect.Field import java.lang.reflect.ParameterizedType import java.lang.reflect.Type +import java.nio.file.Files +import java.nio.file.Path import groovy.transform.CompileStatic +import groovy.transform.Memoized +import nextflow.exception.AbortOperationException +import nextflow.file.FileHelper +import nextflow.script.dsl.Nullable +import nextflow.script.types.Bag +import nextflow.script.types.Record +import nextflow.util.HashBag +import nextflow.util.RecordMap +import org.codehaus.groovy.runtime.typehandling.DefaultTypeTransformation +import org.codehaus.groovy.runtime.typehandling.GroovyCastException /** - * A utility class that provides helper methods for working with generic types at runtime. - *
<p>
- * This class is designed to extract type information from objects that have generic superclasses. - *
</p>
+ * Utility functions for working with types at runtime. * * @author Paolo Di Tommaso + * @author Ben Sherman */ @CompileStatic class TypeHelper { /** - * Retrieves the generic type at the specified index from the given object's superclass. + * Returns the given type argument from the given value's superclass. * - *
<p>
This method assumes that the object's class extends a parameterized type, - * and it returns the type argument at the given index.
</p>
+ * This method assumes that the value's class extends a parameterized type. * - * @param object the object whose generic type is to be retrieved - * @param index the index of the generic type parameter (starting from 0) - * @return the {@link Type} representing the generic type at the specified index + * @param value + * @param index * * @example *
@@ -50,9 +59,137 @@ class TypeHelper {
      * System.out.println(type); // Output: class java.lang.Integer
      * 
*/ - static Type getGenericType(Object object, int index) { - final params = (ParameterizedType) (object.getClass().getGenericSuperclass()) + static Type getGenericType(Object value, int index) { + final params = (ParameterizedType) value.getClass().getGenericSuperclass() return params.getActualTypeArguments()[index] } + /** + * Get the concrete Java class for a type. + * + * @param type + */ + static Class getRawType(Type type) { + return \ + type instanceof Class ? type : + type instanceof ParameterizedType ? type.getRawType() : + Object + } + + /** + * Determine whether a type is a collection type (Bag, List, Set). + * + * @param type + */ + static boolean isCollectionType(Type type) { + return Collection.class.isAssignableFrom(getRawType(type)) + } + + /** + * Determine whether a type is a record type. + * + * @param type + */ + static boolean isRecordType(Type type) { + return type instanceof Class && Record.class.isAssignableFrom(type) + } + + /** + * Convert a value to the given type. + * + * @param value + * @param type + */ + static Object asType(Object value, Type type) { + if( value == null ) + return null + + if( isCollectionType(type) ) + return asCollectionType(value as Collection, type) + + if( isRecordType(type) ) + return asRecordType(value as Map, (Class) type) + + if( type == Path ) + return TypeHelper.asPathType(value.toString()) + + return DefaultTypeTransformation.castToType(value, getRawType(type)) + } + + /** + * Convert a collection to the given collection type. + * + * If the type specifies an element type, each element in the + * collection is converted to that type. 
+ * + * @param collection + * @param type + */ + static Collection asCollectionType(Collection collection, Type type) { + if( type instanceof ParameterizedType ) { + final elementType = type.getActualTypeArguments()[0] + collection = collection.collect { el -> asType(el, elementType) } + } + + return switch( getRawType(type) ) { + case Bag.class -> new HashBag<>(collection) + case List.class -> collection as List + case Set.class -> collection as Set + default -> collection + } + } + + /** + * Convert a string representing a file path to a Path. + * Report an error if the path does not exist. + * + * @param str + */ + static Path asPathType(String str) { + final result = FileHelper.asPath(str) + if( !Files.exists(result) ) + throw new AbortOperationException("Input file '${str}' does not exist") + return result + } + + /** + * Convert a map to a record, validating it against the given + * record type. + * + * @param map + * @param type + */ + static Record asRecordType(Map map, Class type) { + final fields = recordFields(type) + + for( final field : fields.values() ) { + if( field.isAnnotationPresent(Nullable.class) ) + continue + if( map.get(field.getName()) == null ) + throw new AbortOperationException("Input record ${map} is missing field '${field.getName()}' required by record type '${type.getSimpleName()}'") + } + + final result = new HashMap(map.size()) + for( final entry : map.entrySet() ) { + final name = entry.key + final value = fields.containsKey(name) + ? 
asType(entry.value, fields[name].getGenericType()) + : entry.value + result.put(name, value) + } + return new RecordMap(result) + } + + @Memoized + private static Map<String, Field> recordFields(Class type) { + final fields = type.getDeclaredFields() + final result = new HashMap(fields.size()) + for( final field : fields ) { + if( field.isSynthetic() ) + continue + result.put(field.getName(), field) + } + return result + } + } diff --git a/modules/nf-commons/src/test/nextflow/serde/GsonEncoderTest.groovy b/modules/nf-commons/src/test/nextflow/serde/GsonEncoderTest.groovy index 871755a7d2..dea59b13f3 100644 --- a/modules/nf-commons/src/test/nextflow/serde/GsonEncoderTest.groovy +++ b/modules/nf-commons/src/test/nextflow/serde/GsonEncoderTest.groovy @@ -64,7 +64,7 @@ class GsonEncoderTest extends Specification { animal == dog } - def 'should encode and decode polymorphic class/1'() { + def 'should encode and decode polymorphic class/2'() { given: def encoder = new MyEncoder() def dog = new Cat("bau", true) diff --git a/modules/nextflow/src/test/groovy/nextflow/util/HashBagTest.groovy b/modules/nf-commons/src/test/nextflow/util/HashBagTest.groovy similarity index 100% rename from modules/nextflow/src/test/groovy/nextflow/util/HashBagTest.groovy rename to modules/nf-commons/src/test/nextflow/util/HashBagTest.groovy diff --git a/modules/nf-commons/src/test/nextflow/util/TypeHelperTest.groovy b/modules/nf-commons/src/test/nextflow/util/TypeHelperTest.groovy new file mode 100644 index 0000000000..f53232a44f --- /dev/null +++ b/modules/nf-commons/src/test/nextflow/util/TypeHelperTest.groovy @@ -0,0 +1,209 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package nextflow.util + +import java.nio.file.Files +import java.nio.file.Path + +import nextflow.exception.AbortOperationException +import nextflow.script.dsl.Nullable +import nextflow.script.types.Bag +import nextflow.script.types.Record +import org.codehaus.groovy.runtime.typehandling.GroovyCastException +import spock.lang.Specification + +/** + * @author Ben Sherman + */ +class TypeHelperTest extends Specification { + + // helper classes + + static class Sample implements Record { + String name + Integer count + @Nullable String optional + } + + static class Params implements Record { + List<Sample> samples + } + + // ---- getRawType ---- + + def 'should return the Class itself as raw type'() { + expect: + TypeHelper.getRawType(String) == String + TypeHelper.getRawType(Integer) == Integer + } + + def 'should extract raw type from ParameterizedType'() { + given: + def type = Params.getDeclaredField('samples').getGenericType() + + expect: + TypeHelper.getRawType(type) == List + } + + // ---- isCollectionType ---- + + def 'should determine whether a type is a collection type'() { + expect: + TypeHelper.isCollectionType(List) + TypeHelper.isCollectionType(Set) + TypeHelper.isCollectionType(Bag) + !TypeHelper.isCollectionType(String) + !TypeHelper.isCollectionType(Integer) + !TypeHelper.isCollectionType(Path) + !TypeHelper.isCollectionType(Record) + !TypeHelper.isCollectionType(Sample) + } + + // ---- isRecordType ---- + + def 'should determine whether a type is a record type'() { + expect: + TypeHelper.isRecordType(Record) + 
TypeHelper.isRecordType(Sample) + !TypeHelper.isRecordType(String) + !TypeHelper.isRecordType(Integer) + !TypeHelper.isRecordType(Path) + !TypeHelper.isRecordType(List) + !TypeHelper.isRecordType(Set) + !TypeHelper.isRecordType(Bag) + } + + // ---- asType ---- + + def 'should convert raw value to target type'() { + given: + def inputFile = Files.createTempFile('test', '.txt') + + expect: 'primitive type' + TypeHelper.asType(null, String) == null + TypeHelper.asType('42', String) == '42' + TypeHelper.asType(42, Integer) == 42 + + when: 'list type' + def result = TypeHelper.asType([1, 2, 3], List) + then: + result instanceof List + result == [1, 2, 3] + + when: 'set type' + result = TypeHelper.asType([1, 2, 2], Set) + then: + result instanceof Set + result == [1, 2] as Set + + when: 'path type' + result = TypeHelper.asType(inputFile.toString(), Path) + then: + result instanceof Path + result == inputFile + + when: 'record type' + result = TypeHelper.asType([name: 'Alice', count: 5], Sample) + then: + result instanceof RecordMap + result.name == 'Alice' + result.count == 5 + + cleanup: + inputFile?.delete() + } + + def 'should convert raw data structure to lists and records'() { + when: + def params = [ + samples: [ + [name: 'Alice', count: 3, extra: 'value'] + ] + ] + def result = TypeHelper.asRecordType(params, Params) + then: + result instanceof RecordMap + result.samples instanceof List + result.samples[0] instanceof RecordMap + result.samples[0].name == 'Alice' + result.samples[0].count == 3 + result.samples[0].optional == null + result.samples[0].extra == 'value' + } + + // ---- asCollectionType ---- + + def 'should convert raw collection to collection type'() { + when: + def result = TypeHelper.asCollectionType([1, 2, 2], Bag) + then: + result == new HashBag<>([1, 2, 2]) + + when: + result = TypeHelper.asCollectionType([3, 1, 2], List) + then: + result instanceof List + result == [3, 1, 2] + + when: + result = TypeHelper.asCollectionType([1, 2, 2], Set) + then: 
+ result instanceof Set + result == [1, 2] as Set + } + + def 'should convert each collection element to the element type if specified'() { + given: + def type = Params.getDeclaredField('samples').getGenericType() + + when: + def result = TypeHelper.asCollectionType([ [name: 'Alice', count: 3] ], type) + then: + result instanceof List + result.size() == 1 + result[0] instanceof RecordMap + result[0].name == 'Alice' + result[0].count == 3 + } + + // ---- asRecordType ---- + + def 'should convert raw map to a record'() { + when: + def result = TypeHelper.asRecordType([name: 'Alice', count: 3, extra: 'value'], Sample) + then: + result instanceof RecordMap + result.name == 'Alice' + result.count == 3 + result.optional == null + result.extra == 'value' + } + + def 'should report error when a required field is null'() { + when: + TypeHelper.asRecordType([name: null, count: 5], Sample) + then: + thrown(AbortOperationException) + } + + def 'should report error when a required field is absent'() { + when: + TypeHelper.asRecordType([name: 'Alice'], Sample) + then: + thrown(AbortOperationException) + } + +} diff --git a/modules/nf-lang/src/main/java/nextflow/script/control/ScriptToGroovyVisitor.java b/modules/nf-lang/src/main/java/nextflow/script/control/ScriptToGroovyVisitor.java index cc0a50f089..dc354c051d 100644 --- a/modules/nf-lang/src/main/java/nextflow/script/control/ScriptToGroovyVisitor.java +++ b/modules/nf-lang/src/main/java/nextflow/script/control/ScriptToGroovyVisitor.java @@ -15,6 +15,7 @@ */ package nextflow.script.control; +import java.lang.reflect.Modifier; import java.util.Arrays; import java.util.Comparator; import java.util.List; @@ -36,9 +37,12 @@ import nextflow.script.ast.ScriptNode; import nextflow.script.ast.ScriptVisitorSupport; import nextflow.script.ast.WorkflowNode; +import nextflow.script.dsl.Nullable; import org.codehaus.groovy.ast.ASTNode; +import org.codehaus.groovy.ast.ClassHelper; import org.codehaus.groovy.ast.ClassNode; import 
org.codehaus.groovy.ast.CodeVisitorSupport; +import org.codehaus.groovy.ast.FieldNode; import org.codehaus.groovy.ast.MethodNode; import org.codehaus.groovy.ast.Parameter; import org.codehaus.groovy.ast.VariableScope; @@ -160,22 +164,39 @@ public void visitInclude(IncludeNode node) { @Override public void visitParams(ParamBlockNode node) { + var paramsType = new RecordNode(packageName(moduleNode) + "." + "__Params"); + for( var param : node.declarations ) { + var fn = new FieldNode( + param.getName(), + Modifier.PUBLIC, + param.getType(), + paramsType, + param.getInitialExpression() + ); + paramsType.addField(fn); + } + moduleNode.addClass(paramsType); + var statements = Arrays.stream(node.declarations) .map((param) -> { - var name = constX(param.getName()); - var type = classX(param.getType()); - var optional = constX(param.getType().getNodeMetaData(ASTNodeMarker.NULLABLE) != null); + var name = param.getName(); + var optional = param.getType().getNodeMetaData(ASTNodeMarker.NULLABLE) != null; var arguments = param.hasInitialExpression() - ? args(name, type, optional, param.getInitialExpression()) - : args(name, type, optional); + ? 
args(constX(name), constX(optional), param.getInitialExpression()) + : args(constX(name), constX(optional)); return stmt(callThisX("declare", arguments)); }) .toList(); var closure = closureX(block(new VariableScope(), statements)); - var result = stmt(callThisX("params", args(closure))); + var result = stmt(callThisX("params", args(classX(paramsType), closure))); moduleNode.addStatement(result); } + private static String packageName(ScriptNode moduleNode) { + var scriptClass = moduleNode.getClasses().get(0); + return scriptClass.getNameWithoutPackage(); + } + @Override public void visitParamV1(ParamNodeV1 node) { var result = stmt(assignX(node.target, node.value)); @@ -299,8 +320,15 @@ public void visitOutputs(OutputBlockNode node) { moduleNode.addStatement(result); } + private static final ClassNode NULLABLE = ClassHelper.makeCached(Nullable.class); + @Override public void visitRecord(RecordNode node) { + for( var fn : node.getFields() ) { + if( fn.getType().getNodeMetaData(ASTNodeMarker.NULLABLE) != null ) + fn.addAnnotation(NULLABLE); + } + var result = stmt(callThisX("declareType", args(classX(node)))); moduleNode.addStatement(result); } diff --git a/modules/nf-lang/src/main/java/nextflow/script/dsl/Nullable.java b/modules/nf-lang/src/main/java/nextflow/script/dsl/Nullable.java new file mode 100644 index 0000000000..263f7f1cfa --- /dev/null +++ b/modules/nf-lang/src/main/java/nextflow/script/dsl/Nullable.java @@ -0,0 +1,32 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package nextflow.script.dsl; + +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +/** + * Annotation for denoting that a field or method return value + * can be null (equivalent to `?` suffix in a Nextflow type annotation). + * + * @author Ben Sherman + */ +@Retention(RetentionPolicy.RUNTIME) +@Target({ ElementType.FIELD, ElementType.METHOD }) +public @interface Nullable { +}
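The required-field check that this patch adds in `TypeHelper.asRecordType` (skip fields marked `@Nullable`, reject any other field that is missing or null) can be sketched in plain Java. This is an illustrative sketch only: `RecordCoercionDemo`, its `Sample` class, and the local `Nullable` annotation are hypothetical stand-ins, not the actual Nextflow types or API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordCoercionDemo {

    // Stand-in for the Nullable annotation introduced by the patch
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Nullable {}

    // Stand-in for a user-declared record type
    public static class Sample {
        public String name;
        public Integer count;
        @Nullable public String note;
    }

    // Mirrors the validation loop in asRecordType: every declared field
    // not marked @Nullable must be present and non-null in the input map.
    public static Map<String, Object> asRecord(Map<String, Object> map, Class<?> type) {
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic() || f.isAnnotationPresent(Nullable.class))
                continue;
            if (map.get(f.getName()) == null)
                throw new IllegalArgumentException(
                    "Input record " + map + " is missing field '" + f.getName()
                    + "' required by record type '" + type.getSimpleName() + "'");
        }
        // the real implementation also coerces each value to the field's
        // declared type; this sketch just returns a validated copy
        return new LinkedHashMap<>(map);
    }

    public static void main(String[] args) {
        Map<String, Object> ok = new LinkedHashMap<>();
        ok.put("name", "sample1");
        ok.put("count", 3);
        // 'note' is @Nullable, so it may be absent
        System.out.println(asRecord(ok, Sample.class));

        Map<String, Object> bad = new LinkedHashMap<>();
        bad.put("name", "sample2"); // 'count' is required but missing
        try {
            asRecord(bad, Sample.class);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Reflection over `getDeclaredFields()` with a runtime-retained annotation is the same mechanism the patch uses; the `@Memoized` field cache and the per-field type coercion are omitted here for brevity.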