diff --git a/CHANGELOG.md b/CHANGELOG.md
index a526970d8..74fdfa05e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,27 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Enhancements and fixes
- [PR #1663](https://github.com/nf-core/rnaseq/pull/1663) - Bump version after release 3.22.2
+- [PR #1616](https://github.com/nf-core/rnaseq/pull/1616) - Add Sylph for contamination detection.
+
+### Parameters
+
+| Old parameter | New parameter |
+| ------------- | ------------------ |
+| | `--sylph_db` |
+| | `--sylph_taxonomy` |
+
+### Software dependencies
+
+| Dependency | Old version | New version |
+| ----------- | ----------- | ----------- |
+| `sylph` | | 0.7.0 |
+| `sylph-tax` | | 1.2.0 |
+
+> **NB:** Dependency has been **updated** if both old and new version information is present.
+>
+> **NB:** Dependency has been **added** if just the new version information is present.
+>
+> **NB:** Dependency has been **removed** if new version information isn't present.
+
## [[3.22.2](https://github.com/nf-core/rnaseq/releases/tag/3.22.2)] - 2025-12-11
diff --git a/CITATIONS.md b/CITATIONS.md
index 5eaeea7f8..79a31ca13 100644
--- a/CITATIONS.md
+++ b/CITATIONS.md
@@ -88,6 +88,10 @@
> Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2 Genome Biol. 2019 Dec 16;20(1):278. doi: 10.1186/s13059-019-1910-1. PubMed PMID: 31842956; PubMed Central PMCID: PMC6912988.
+- [Sylph](https://pubmed.ncbi.nlm.nih.gov/39379646/)
+
+ > Shaw J, Yu YW. Rapid species-level metagenome profiling and containment estimation with sylph. Nat Biotechnol. 2025 Aug;43(8):1348-1359. doi: 10.1038/s41587-024-02412-y. Epub 2024 Oct 8. PMID: 39379646; PMCID: PMC12339375.
+
- [Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)
- [UMI-tools](https://pubmed.ncbi.nlm.nih.gov/28100584/)
diff --git a/README.md b/README.md
index 79619a722..4ae4c97a1 100644
--- a/README.md
+++ b/README.md
@@ -49,7 +49,9 @@
3. [`dupRadar`](https://bioconductor.org/packages/release/bioc/html/dupRadar.html)
4. [`Preseq`](http://smithlabresearch.org/software/preseq/)
5. [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)
- 6. [`Kraken2`](https://ccb.jhu.edu/software/kraken2/) -> [`Bracken`](https://ccb.jhu.edu/software/bracken/) on unaligned sequences; _optional_
+ 6. Contamination detection on unaligned sequences; _optional_
+ 1. [`Kraken2`](https://ccb.jhu.edu/software/kraken2/) -> [`Bracken`](https://ccb.jhu.edu/software/bracken/)
+ 2. [`Sylph`](https://sylph-docs.github.io/) -> [`sylph-tax`](https://sylph-docs.github.io/sylph-tax/)
15. Pseudoalignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/) or ['Kallisto'](https://pachterlab.github.io/kallisto/); _optional_)
16. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))
diff --git a/docs/images/sylphtax-top-n-plot.png b/docs/images/sylphtax-top-n-plot.png
new file mode 100644
index 000000000..1a0c6460e
Binary files /dev/null and b/docs/images/sylphtax-top-n-plot.png differ
diff --git a/docs/output.md b/docs/output.md
index 04b9c71cd..40dacb7e6 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -57,6 +57,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [featureCounts](#featurecounts)
- [DESeq2](#deseq2)
- [Kraken2/Bracken](#kraken2bracken)
+ - [Sylph](#sylph)
- [MultiQC](#multiqc)
- [Pseudoalignment and quantification](#pseudoalignment-and-quantification)
- [Pseudoalignment](#pseudoalignment)
@@ -751,6 +752,21 @@ The plot on the left hand side shows the standard PC plot - notice the variable

+### Sylph
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `<ALIGNER>/contaminants/sylph/`
+  - `*.tsv`: Summary of the containment ANI and abundance of each species detected in the sample. See the [Sylph documentation](https://sylph-docs.github.io/Output-format/) for full details of the output format.
+  - `*.sylphmpa`: Taxonomic report of the unaligned reads generated by `sylph-tax`. See the [sylph-tax documentation](https://sylph-docs.github.io/sylph-tax-output-format/) for full details of the output format.
+
+</details>
+
+[Sylph](https://sylph-docs.github.io/) is a metagenomic profiler that identifies the species present in a set of reads by statistically estimating the containment average nucleotide identity (ANI) between the reads and each reference genome in the database. Its companion tool, [sylph-tax](https://sylph-docs.github.io/sylph-tax/), converts these per-genome estimates into taxonomic abundance profiles. In this pipeline, both tools are run on the unaligned reads to detect potential contamination of the samples. MultiQC shows the top 10 strains in the sylph-tax abundance estimates, with toggles for higher taxonomic levels.
+
+![MultiQC - Sylph-tax top N plot](images/sylphtax-top-n-plot.png)
+
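+As a quick alternative to the MultiQC report, the highest-abundance hits in a Sylph profile can be listed from the command line. The shell function below is a hypothetical helper, not part of the pipeline. It is a sketch that assumes the column order of the Sylph 0.7.0 profile header (`Sample_file`, `Genome_file`, `Taxonomic_abundance`, ...).
+
+```bash
+# Hypothetical helper (not part of nf-core/rnaseq): print the N genomes with the
+# highest Taxonomic_abundance in a Sylph profile TSV (default N = 10).
+# Assumes column 2 is Genome_file and column 3 is Taxonomic_abundance.
+sylph_top_hits() {
+    local profile="$1" n="${2:-10}" tab
+    tab="$(printf '\t')"
+    # Skip the header, sort numerically (descending) on column 3, keep columns 2-3.
+    # LC_ALL=C makes sort use '.' as the decimal point regardless of locale.
+    tail -n +2 "$profile" \
+        | LC_ALL=C sort -t "$tab" -k3,3gr \
+        | cut -f2,3 \
+        | head -n "$n"
+}
+```
+
+For example, `sylph_top_hits <SAMPLE>.tsv 5` would print the five genomes with the highest taxonomic abundance for that sample.
+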
### MultiQC
diff --git a/docs/usage.md b/docs/usage.md
index 39706c3ac..b8160a543 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -414,11 +414,19 @@ By default, the input GTF file will be filtered to ensure that sequence names co
## Contamination screening options
-The pipeline provides the option to scan unaligned reads for contamination from other species using [Kraken2](https://ccb.jhu.edu/software/kraken2/), with the possibility of applying corrections from [Bracken](https://ccb.jhu.edu/software/bracken/). Since running Bracken is not computationally expensive, we recommend always using it to refine the abundance estimates generated by Kraken2.
+:::note
+The `--contaminant_screening` option is not currently available on ARM architectures (`-profile arm`).
+:::
+
+The pipeline provides the option to scan unaligned reads for contamination from other species using either [Sylph](https://sylph-docs.github.io/) or [Kraken2](https://ccb.jhu.edu/software/kraken2/), selected with `--contaminant_screening` (`sylph`, `kraken2` or `kraken2_bracken`). Kraken2 results can optionally be refined with corrections from [Bracken](https://ccb.jhu.edu/software/bracken/).
+
+Sylph is [faster and much more memory-efficient](https://doi.org/10.1038/s41587-024-02412-y) than Kraken2/Bracken, with comparable precision in species detection and a lower false-positive rate. However, Sylph does not assign individual reads to species; it only provides overall abundance estimates. Sylph's abundance estimates also [do not report a percentage of reads as unclassified](https://github.com/bluenote-1577/sylph/issues/49).
+
+Pre-built Sylph databases are listed in the [Sylph documentation](https://sylph-docs.github.io/pre%E2%80%90built-databases/), and the corresponding taxonomies in the [sylph-tax documentation](https://sylph-docs.github.io/sylph-tax/), which also explains how to create custom databases and taxonomies. Because Sylph is a newer tool, the effect of database choice on its performance has been studied less thoroughly than for Kraken2/Bracken. Nevertheless, the following advice on choosing Kraken2 databases is very likely to apply, at least in part, to Sylph.
-It is important to note that the accuracy of Kraken2 is [highly dependent on the database](https://doi.org/10.1099/mgen.0.000949) used. Specifically, it is [crucial](https://doi.org/10.1128/mbio.01607-23) to ensure that the host genome is included in the database. If you are particularly concerned about certain contaminants, it may be beneficial to use a smaller, more focused database containing primarily those contaminants instead of the full standard database. Various pre-built databases [are available for download](https://benlangmead.github.io/aws-indexes/k2), and instructions for building a custom database can be found in the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Additionally, genomes of contaminants detected in previous sequencing experiments are available on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php).
+The accuracy of Kraken2 is [highly dependent on the database](https://doi.org/10.1099/mgen.0.000949) used. In particular, it is [crucial](https://doi.org/10.1128/mbio.01607-23) to ensure that the host genome/transcriptome is included in the database (note that the pre-built Sylph databases do _not_ appear to contain the human genome/transcriptome). If you are particularly concerned about certain contaminants, it may be beneficial to use a smaller, more focused database containing primarily those contaminants instead of the full standard database. Various pre-built databases [are available for download](https://benlangmead.github.io/aws-indexes/k2), and instructions for building a custom database can be found in the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Additionally, genomes of contaminants detected in previous sequencing experiments are available on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php).
-While Kraken2 is capable of detecting low-abundance contaminants in a sample, false positives can occur. Therefore, if only a very small number of reads from a contaminating species are detected, these results should be interpreted with caution.
+While Kraken2 is capable of detecting low-abundance contaminants in a sample, false positives can occur. Therefore, if only a very small number of reads from a contaminating species are detected, these results should be interpreted with caution. Finally, although Kraken2 can be run without Bracken, running Bracken is not computationally expensive, so we recommend always using it (`--contaminant_screening kraken2_bracken`) to refine the abundance estimates generated by Kraken2.
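+
+A minimal sketch of a run that screens unaligned reads with Sylph is shown below. All file paths are placeholders, and `-profile docker` is only an example. Both `--sylph_db` and `--sylph_taxonomy` accept comma-separated lists. The taxonomy file(s) must correspond to the database(s) used for profiling, because `sylph-tax` relies on them to assign the genomes reported by Sylph to taxa.
+
+```bash
+# Hypothetical example: every path below is a placeholder for your own files
+nextflow run nf-core/rnaseq \
+    --input samplesheet.csv \
+    --outdir results \
+    --fasta genome.fa \
+    --gtf genes.gtf \
+    --contaminant_screening sylph \
+    --sylph_db /path/to/database_1.syldb,/path/to/database_2.syldb \
+    --sylph_taxonomy /path/to/taxonomy_1.tsv.gz,/path/to/taxonomy_2.tsv.gz \
+    -profile docker
+```
+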
## Running the pipeline
diff --git a/modules.json b/modules.json
index 4589b12db..533c4e456 100644
--- a/modules.json
+++ b/modules.json
@@ -262,6 +262,16 @@
"git_sha": "1f008221e451e7a4738226c49e69aaa2eb731369",
"installed_by": ["modules", "quantify_pseudo_alignment"]
},
+ "sylph/profile": {
+ "branch": "master",
+ "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
+ "installed_by": ["modules"]
+ },
+ "sylphtax/taxprof": {
+ "branch": "master",
+ "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
+ "installed_by": ["modules"]
+ },
"trimgalore": {
"branch": "master",
"git_sha": "05954dab2ff481bcb999f24455da29a5828af08d",
diff --git a/modules/nf-core/sylph/profile/environment.yml b/modules/nf-core/sylph/profile/environment.yml
new file mode 100644
index 000000000..ae8337cc8
--- /dev/null
+++ b/modules/nf-core/sylph/profile/environment.yml
@@ -0,0 +1,7 @@
+---
+# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
+channels:
+ - conda-forge
+ - bioconda
+dependencies:
+ - bioconda::sylph=0.7.0
diff --git a/modules/nf-core/sylph/profile/main.nf b/modules/nf-core/sylph/profile/main.nf
new file mode 100644
index 000000000..1231bab39
--- /dev/null
+++ b/modules/nf-core/sylph/profile/main.nf
@@ -0,0 +1,51 @@
+process SYLPH_PROFILE {
+ tag "$meta.id"
+ label 'process_high'
+
+ conda "${moduleDir}/environment.yml"
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/sylph:0.7.0--h919a2d8_0' :
+ 'biocontainers/sylph:0.7.0--h919a2d8_0' }"
+
+ input:
+ tuple val(meta), path(reads)
+ path(database)
+
+ output:
+ tuple val(meta), path('*.tsv'), emit: profile_out
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ def input = meta.single_end ? "${reads}" : "-1 ${reads[0]} -2 ${reads[1]}"
+ """
+ sylph profile \\
+ -t $task.cpus \\
+ $args \\
+ $database \\
+ $input \\
+ -o ${prefix}.tsv
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ sylph: \$(sylph -V | awk '{print \$2}')
+ END_VERSIONS
+ """
+
+ stub:
+ def args = task.ext.args ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ def input = meta.single_end ? "${reads}" : "-1 ${reads[0]} -2 ${reads[1]}"
+ """
+ touch ${prefix}.tsv
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ sylph: \$(sylph -V | awk '{print \$2}')
+ END_VERSIONS
+ """
+
+}
diff --git a/modules/nf-core/sylph/profile/meta.yml b/modules/nf-core/sylph/profile/meta.yml
new file mode 100644
index 000000000..c78b0f33c
--- /dev/null
+++ b/modules/nf-core/sylph/profile/meta.yml
@@ -0,0 +1,59 @@
+name: "sylph_profile"
+description: Sylph profile command for taxonomic profiling
+keywords:
+ - profile
+ - metagenomics
+ - sylph
+ - classification
+tools:
+ - sylph:
+ description: Sylph quickly enables querying of genomes against even low-coverage
+ shotgun metagenomes to find nearest neighbour ANI.
+ homepage: https://github.com/bluenote-1577/sylph
+ documentation: https://github.com/bluenote-1577/sylph
+ tool_dev_url: https://github.com/bluenote-1577/sylph
+ doi: 10.1038/s41587-024-02412-y
+ licence: ["MIT"]
+ identifier: biotools:sylph
+input:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. `[ id:'test', single_end:false ]`
+ - reads:
+ type: file
+ description: |
+ List of input FastQ/FASTA files of size 1 and 2 for single-end and paired-end data,
+ respectively. They are automatically sketched to .sylsp/.syldb
+ ontologies: []
+ - database:
+ type: file
+ description: Pre-sketched *.syldb/*.sylsp files. Raw single-end fastq/fasta are
+ allowed and will be automatically sketched to .sylsp/.syldb.
+ pattern: "*.{syldb,sylsp,fasta,fastq}"
+ ontologies:
+ - edam: http://edamontology.org/format_1930 # FASTQ
+output:
+ profile_out:
+ - - meta:
+ type: map
+ description: Groovy Map containing sample information
+ - "*.tsv":
+ type: file
+ description: Output file of species-level taxonomic profiling with abundances
+ and ANIs.
+ pattern: "*.tsv"
+ ontologies: []
+ versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+ ontologies:
+ - edam: http://edamontology.org/format_3750 # YAML
+authors:
+ - "@jiahang1234"
+ - "@sofstam"
+maintainers:
+ - "@sofstam"
diff --git a/modules/nf-core/sylph/profile/nextflow.config b/modules/nf-core/sylph/profile/nextflow.config
new file mode 100644
index 000000000..f54f711c0
--- /dev/null
+++ b/modules/nf-core/sylph/profile/nextflow.config
@@ -0,0 +1,12 @@
+if (!params.skip_qc) {
+ if (params.contaminant_screening in ['sylph']) {
+ process {
+ withName: 'SYLPH_PROFILE' {
+ publishDir = [
+ path: { "${params.outdir}/${params.aligner}/contaminants/sylph" },
+ mode: params.publish_dir_mode
+ ]
+ }
+ }
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/sylph/profile/tests/main.nf.test b/modules/nf-core/sylph/profile/tests/main.nf.test
new file mode 100644
index 000000000..cfdddf685
--- /dev/null
+++ b/modules/nf-core/sylph/profile/tests/main.nf.test
@@ -0,0 +1,80 @@
+nextflow_process {
+
+ name "Test Process SYLPH_PROFILE"
+ script "../main.nf"
+ process "SYLPH_PROFILE"
+ tag "modules"
+ tag "modules_nfcore"
+ tag "sylph"
+ tag "sylph/profile"
+
+ test("sarscov2 illumina single-end [fastq_gz]") {
+ when {
+ process {
+ """
+ input[0] = [ [ id:'test',single_end:true ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = file(params.modules_testdata_base_path +'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ """
+ }
+ }
+
+ then {
+ assert process.success
+ assert snapshot(
+ process.out.versions,
+ file(process.out.profile_out[0][1]).readLines()[0]
+ ).match()
+ }
+ }
+
+ test("sarscov2 illumina paired-end [fastq_gz]") {
+ when {
+ process {
+ """
+ input[0] = [ [ id:'test' ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = file(params.modules_testdata_base_path +'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ """
+ }
+ }
+
+ then {
+ assert process.success
+ assert snapshot(
+ process.out.versions,
+ file(process.out.profile_out[0][1]).readLines()[0]
+ ).match()
+ }
+ }
+
+ test("sarscov2 illumina paired-end [fastq_gz]-stub") {
+ options "-stub"
+
+ when {
+ process {
+ """
+ input[0] = [ [ id:'test' ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = file(params.modules_testdata_base_path +'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ """
+ }
+ }
+
+ then {
+ assert process.success
+ assert snapshot(process.out).match()
+ }
+ }
+}
diff --git a/modules/nf-core/sylph/profile/tests/main.nf.test.snap b/modules/nf-core/sylph/profile/tests/main.nf.test.snap
new file mode 100644
index 000000000..5541ce615
--- /dev/null
+++ b/modules/nf-core/sylph/profile/tests/main.nf.test.snap
@@ -0,0 +1,61 @@
+{
+ "sarscov2 illumina paired-end [fastq_gz]": {
+ "content": [
+ [
+ "versions.yml:md5,7b5a545483277cc0ff9189f8891e737f"
+ ],
+ "Sample_file\tGenome_file\tTaxonomic_abundance\tSequence_abundance\tAdjusted_ANI\tEff_cov\tANI_5-95_percentile\tEff_lambda\tLambda_5-95_percentile\tMedian_cov\tMean_cov_geq1\tContainment_ind\tNaive_ANI\tkmers_reassigned\tContig_name"
+ ],
+ "meta": {
+ "nf-test": "0.9.2",
+ "nextflow": "24.10.4"
+ },
+ "timestamp": "2025-03-05T11:07:00.061876287"
+ },
+ "sarscov2 illumina single-end [fastq_gz]": {
+ "content": [
+ [
+ "versions.yml:md5,7b5a545483277cc0ff9189f8891e737f"
+ ],
+ "Sample_file\tGenome_file\tTaxonomic_abundance\tSequence_abundance\tAdjusted_ANI\tEff_cov\tANI_5-95_percentile\tEff_lambda\tLambda_5-95_percentile\tMedian_cov\tMean_cov_geq1\tContainment_ind\tNaive_ANI\tkmers_reassigned\tContig_name"
+ ],
+ "meta": {
+ "nf-test": "0.9.2",
+ "nextflow": "24.10.4"
+ },
+ "timestamp": "2025-03-05T11:05:21.230604092"
+ },
+ "sarscov2 illumina paired-end [fastq_gz]-stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test"
+ },
+ "test.tsv:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,7b5a545483277cc0ff9189f8891e737f"
+ ],
+ "profile_out": [
+ [
+ {
+ "id": "test"
+ },
+ "test.tsv:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,7b5a545483277cc0ff9189f8891e737f"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.9.2",
+ "nextflow": "24.10.4"
+ },
+ "timestamp": "2025-03-05T11:08:35.882851964"
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/sylphtax/taxprof/environment.yml b/modules/nf-core/sylphtax/taxprof/environment.yml
new file mode 100644
index 000000000..517edcad5
--- /dev/null
+++ b/modules/nf-core/sylphtax/taxprof/environment.yml
@@ -0,0 +1,7 @@
+---
+# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
+channels:
+ - conda-forge
+ - bioconda
+dependencies:
+ - "bioconda::sylph-tax=1.2.0"
diff --git a/modules/nf-core/sylphtax/taxprof/main.nf b/modules/nf-core/sylphtax/taxprof/main.nf
new file mode 100644
index 000000000..d7508b3a5
--- /dev/null
+++ b/modules/nf-core/sylphtax/taxprof/main.nf
@@ -0,0 +1,53 @@
+
+process SYLPHTAX_TAXPROF {
+ tag "$meta.id"
+ label 'process_medium'
+
+ conda "${moduleDir}/environment.yml"
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/sylph-tax:1.2.0--pyhdfd78af_0':
+ 'biocontainers/sylph-tax:1.2.0--pyhdfd78af_0' }"
+
+ input:
+ tuple val(meta), path(sylph_results)
+ path taxonomy
+
+ output:
+ tuple val(meta), path("*.sylphmpa"), emit: taxprof_output
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+
+ """
+ export SYLPH_TAXONOMY_CONFIG="/tmp/config.json"
+ sylph-tax \\
+ taxprof \\
+ $sylph_results \\
+ $args \\
+ -t $taxonomy
+
+ mv *.sylphmpa ${prefix}.sylphmpa
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ sylph-tax: \$(sylph-tax --version)
+ END_VERSIONS
+ """
+
+ stub:
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ """
+ export SYLPH_TAXONOMY_CONFIG="/tmp/config.json"
+ touch ${prefix}.sylphmpa
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ sylph-tax: \$(sylph-tax --version)
+ END_VERSIONS
+ """
+}
diff --git a/modules/nf-core/sylphtax/taxprof/meta.yml b/modules/nf-core/sylphtax/taxprof/meta.yml
new file mode 100644
index 000000000..c254b608b
--- /dev/null
+++ b/modules/nf-core/sylphtax/taxprof/meta.yml
@@ -0,0 +1,57 @@
+name: sylphtax_taxprof
+description: Incorporates taxonomy into sylph metagenomic classifier
+keywords:
+ - taxonomy
+ - sylph
+ - metagenomics
+tools:
+ - sylphtax:
+ description: Integrating taxonomic information into the sylph metagenome profiler.
+ homepage: https://github.com/bluenote-1577/sylph-tax?tab=readme-ov-file
+ documentation: https://sylph-docs.github.io/sylph-tax/
+ licence: ["MIT"]
+ identifier: ""
+input:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. `[ id:'sample1', single_end:false ]`
+ - sylph_results:
+ type: file
+ description: Output results from sylph classifier. The database file(s) used
+ to create this file with sylph must be the same as those of the taxonomy input
+ channel of this module.
+ pattern: "*.{tsv}"
+ ontologies:
+ - edam: http://edamontology.org/format_3475 # TSV
+ - taxonomy:
+ type: file
+ description: A list of sylph-tax identifiers (e.g. GTDB_r220 or IMGVR_4.1). Multiple
+ taxonomy metadata files can be input. Custom taxonomy files are also possible.
+ ontologies: []
+output:
+ taxprof_output:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - "*.sylphmpa":
+ type: file
+ description: Taxonomic profile of the sample generated by sylph-tax
+ pattern: "*.sylphmpa"
+ ontologies: []
+ versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+ ontologies:
+ - edam: http://edamontology.org/format_3750 # YAML
+authors:
+ - "@sofstam"
+maintainers:
+ - "@sofstam"
diff --git a/modules/nf-core/sylphtax/taxprof/nextflow.config b/modules/nf-core/sylphtax/taxprof/nextflow.config
new file mode 100644
index 000000000..505f70dc2
--- /dev/null
+++ b/modules/nf-core/sylphtax/taxprof/nextflow.config
@@ -0,0 +1,12 @@
+if (!params.skip_qc) {
+ if (params.contaminant_screening in ['sylph']) {
+ process {
+ withName: 'SYLPHTAX_TAXPROF' {
+ publishDir = [
+ path: { "${params.outdir}/${params.aligner}/contaminants/sylph" },
+ mode: params.publish_dir_mode
+ ]
+ }
+ }
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/sylphtax/taxprof/tests/main.nf.test b/modules/nf-core/sylphtax/taxprof/tests/main.nf.test
new file mode 100644
index 000000000..0f0f9724d
--- /dev/null
+++ b/modules/nf-core/sylphtax/taxprof/tests/main.nf.test
@@ -0,0 +1,91 @@
+nextflow_process {
+
+ name "Test Process SYLPHTAX_TAXPROF"
+ script "../main.nf"
+ process "SYLPHTAX_TAXPROF"
+
+ tag "modules"
+ tag "modules_nfcore"
+ tag "sylph"
+ tag "sylph/profile"
+ tag "sylphtax"
+ tag "sylphtax/taxprof"
+
+
+ test("sarscov2 illumina single-end [fastq_gz]") {
+ setup {
+ run("SYLPH_PROFILE") {
+ script "../../../sylph/profile/main.nf"
+ process {
+ """
+ input[0] = [ [ id:'test', single_end:true ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = file(params.modules_testdata_base_path +'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ """
+ }
+ }
+ }
+ when {
+ process {
+ """
+ input[0] = SYLPH_PROFILE.out.profile_out
+ input[1] = file('https://github.com/nf-core/test-datasets/raw/taxprofiler/data/database/sylph/test_taxonomy.tsv.gz', checkIfExists: true)
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(
+ process.out.versions,
+ process.out.taxprof_output
+ ).match() }
+ )
+ }
+
+ }
+
+ test("stub sarscov2 illumina single-end [fastq_gz]") {
+
+ options '-stub'
+
+ setup {
+ run("SYLPH_PROFILE") {
+ script "../../../sylph/profile/main.nf"
+ process {
+ """
+ input[0] = [ [ id:'test' ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = file(params.modules_testdata_base_path +'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ """
+ }
+ }
+ }
+ when {
+ process {
+ """
+ input[0] = SYLPH_PROFILE.out.profile_out
+ input[1] = file('https://github.com/nf-core/test-datasets/raw/taxprofiler/data/database/sylph/test_taxonomy.tsv.gz', checkIfExists: true)
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(
+ process.out.versions,
+ process.out.taxprof_output
+ ).match() }
+ )
+ }
+ }
+
+}
diff --git a/modules/nf-core/sylphtax/taxprof/tests/main.nf.test.snap b/modules/nf-core/sylphtax/taxprof/tests/main.nf.test.snap
new file mode 100644
index 000000000..3c26e75ec
--- /dev/null
+++ b/modules/nf-core/sylphtax/taxprof/tests/main.nf.test.snap
@@ -0,0 +1,43 @@
+{
+ "stub sarscov2 illumina single-end [fastq_gz]": {
+ "content": [
+ [
+ "versions.yml:md5,bdbbd22b3e721ba2027d3e6cb1dc4bb4"
+ ],
+ [
+ [
+ {
+ "id": "test"
+ },
+ "test.sylphmpa:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ]
+ ],
+ "meta": {
+ "nf-test": "0.9.2",
+ "nextflow": "24.10.5"
+ },
+ "timestamp": "2025-04-07T15:28:04.026470884"
+ },
+ "sarscov2 illumina single-end [fastq_gz]": {
+ "content": [
+ [
+ "versions.yml:md5,bdbbd22b3e721ba2027d3e6cb1dc4bb4"
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.sylphmpa:md5,a9743c21a53ba766226e57d2a25f6167"
+ ]
+ ]
+ ],
+ "meta": {
+ "nf-test": "0.9.2",
+ "nextflow": "24.10.5"
+ },
+ "timestamp": "2025-04-07T15:27:55.45776116"
+ }
+}
\ No newline at end of file
diff --git a/nextflow.config b/nextflow.config
index 07b3a5a26..a6a3080b9 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -100,6 +100,8 @@ params {
save_kraken_assignments = false
save_kraken_unassigned = false
bracken_precision = "S"
+ sylph_db = null
+ sylph_taxonomy = null
skip_rseqc = false
skip_biotype_qc = false
skip_deseq2_qc = false
diff --git a/nextflow_schema.json b/nextflow_schema.json
index 852b5bdd4..41f232250 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -600,15 +600,15 @@
},
"contaminant_screening": {
"type": "string",
- "description": "Tool to use for detecting contaminants in unaligned reads - available options are 'kraken2' and 'kraken2_bracken'",
+ "description": "Tool to use for detecting contaminants in unaligned reads - available options are 'sylph', 'kraken2', or 'kraken2_bracken'",
"fa_icon": "fas fa-virus-slash",
- "enum": ["kraken2", "kraken2_bracken"]
+ "enum": ["kraken2", "kraken2_bracken", "sylph"]
},
"kraken_db": {
"type": "string",
"format": "directory-path",
"description": "Database when using Kraken2/Bracken for contaminant screening.",
- "help_text": "See the usage documentation for more information on setting up and using Kraken2 databases.",
+ "help_text": "See the usage documentation for more information on setting up and using Kraken2 databases. Requires the --contaminant_screening option to be set to 'kraken2' or 'kraken2_bracken' to have an effect.",
"fa_icon": "fas fa-fish"
},
"bracken_precision": {
@@ -616,8 +616,22 @@
"default": "S",
"fa_icon": "fas fa-tree",
"description": "Taxonomic level for Bracken abundance estimations.",
- "help_text": "Use the first letter of taxonomic levels: Domain, Phylum, Class, Order, Family, Genus, or Species.",
+ "help_text": "Use the first letter of taxonomic levels: Domain, Phylum, Class, Order, Family, Genus, or Species. Requires the --contaminant_screening option to be set to 'kraken2_bracken' to have an effect.",
"enum": ["D", "P", "C", "O", "F", "G", "S"]
+ },
+ "sylph_db": {
+ "type": "string",
+ "format": "file-path",
+ "description": "Comma-separated list of Sylph databases to profile against when using Sylph for contaminant screening.",
+ "help_text": "See the usage documentation for more information on setting up and using Sylph databases. Requires the --contaminant_screening option to be set to 'sylph' to have an effect.",
+ "fa_icon": "fas fa-database"
+ },
+ "sylph_taxonomy": {
+ "type": "string",
+ "description": "Comma-separated list of sylph-tax taxonomies to use when Sylph is selected for contaminant screening.",
+ "help_text": "See the usage documentation for more information on Sylph taxonomies. The taxonomies must correspond to the databases given with --sylph_db. Requires the --contaminant_screening option to be set to 'sylph' to have an effect.",
+ "fa_icon": "fas fa-tree",
+ "format": "file-path"
}
}
},
diff --git a/subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf b/subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf
index a94cac101..7d1b4fc4a 100644
--- a/subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf
+++ b/subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf
@@ -349,18 +349,17 @@ def validateInputParameters() {
}
// Check that Kraken/Bracken database provided if using kraken2/bracken
- if (params.contaminant_screening in ['kraken2', 'kraken2_bracken']) {
- if (!params.kraken_db) {
- error("Contaminant screening set to kraken2 but not database is provided. Please provide a database with the --kraken_db option.")
- }
- // Check that Kraken/Bracken parameters are not provided when Kraken2 is not being used
- } else {
- if (!params.bracken_precision.equals('S')) {
- brackenPrecisionWithoutKrakenDBWarn()
- }
+ if (params.contaminant_screening in ['kraken2', 'kraken2_bracken'] && !params.kraken_db) {
+ error("Contaminant screening set to kraken2 but no database was provided. Please provide a database with the --kraken_db option.")
+ }
- if (params.save_kraken_assignments || params.save_kraken_unassigned || params.kraken_db) {
- krakenArgumentsWithoutKrakenDBWarn()
+ // Check that a Sylph database and taxonomy are provided if using Sylph
+ if (params.contaminant_screening == 'sylph') {
+ if (!params.sylph_db) {
+ error("Contaminant screening is set to Sylph but no database was provided. Please provide a database with the --sylph_db option.")
+ }
+ if (!params.sylph_taxonomy) {
+ error("Contaminant screening is set to Sylph but no taxonomy was provided. Please provide a taxonomy with the --sylph_taxonomy option.")
}
}
@@ -602,26 +601,6 @@ def additionaFastaIndexWarn(index) {
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
}
-//
-// Print a warning if --save_kraken_assignments or --save_kraken_unassigned is provided without --kraken_db
-//
-def krakenArgumentsWithoutKrakenDBWarn() {
- log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" +
- " 'Kraken2 related arguments have been provided without setting contaminant\n" +
- " screening to Kraken2. Kraken2 is not being run so these will not be used.\n" +
- "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
-}
-
-///
-/// Print a warning if --bracken-precision is provided without --kraken_db
-///
-def brackenPrecisionWithoutKrakenDBWarn() {
- log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" +
- " '--bracken-precision' parameter has been provided without Kraken2 contaminant screening.\n" +
- " Bracken will not run so precision will not be set.\n" +
- "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
-}
-
//
// Function to generate an error if contigs in genome fasta file > 512 Mbp
//
diff --git a/workflows/rnaseq/assets/multiqc/multiqc_config.yml b/workflows/rnaseq/assets/multiqc/multiqc_config.yml
index 841935f53..5edc2ba99 100644
--- a/workflows/rnaseq/assets/multiqc/multiqc_config.yml
+++ b/workflows/rnaseq/assets/multiqc/multiqc_config.yml
@@ -98,6 +98,7 @@ run_modules:
- rseqc
- qualimap
- kraken
+ - sylphtax
# Order of modules
@@ -168,6 +169,8 @@ sp:
kraken:
- fn: "*.kraken2.report.txt"
- fn: "*.kraken2.report_bracken.txt"
+ sylphtax:
+ - fn: "*.sylphmpa"
rseqc/bam_stat:
fn: "*.bam_stat.txt"
rseqc/gene_body_coverage:
diff --git a/workflows/rnaseq/main.nf b/workflows/rnaseq/main.nf
index 80f6ae605..25fb731b3 100755
--- a/workflows/rnaseq/main.nf
+++ b/workflows/rnaseq/main.nf
@@ -43,6 +43,8 @@ include { STRINGTIE_STRINGTIE } from '../../modules/nf-core/stringtie/str
include { SUBREAD_FEATURECOUNTS } from '../../modules/nf-core/subread/featurecounts'
include { KRAKEN2_KRAKEN2 as KRAKEN2 } from '../../modules/nf-core/kraken2/kraken2/main'
include { BRACKEN_BRACKEN as BRACKEN } from '../../modules/nf-core/bracken/bracken/main'
+include { SYLPH_PROFILE } from '../../modules/nf-core/sylph/profile/main'
+include { SYLPHTAX_TAXPROF } from '../../modules/nf-core/sylphtax/taxprof/main'
include { MULTIQC } from '../../modules/nf-core/multiqc'
include { BEDTOOLS_GENOMECOV as BEDTOOLS_GENOMECOV_FW } from '../../modules/nf-core/bedtools/genomecov'
include { BEDTOOLS_GENOMECOV as BEDTOOLS_GENOMECOV_REV } from '../../modules/nf-core/bedtools/genomecov'
@@ -674,6 +676,24 @@ workflow RNASEQ {
ch_versions = ch_versions.mix(BRACKEN.out.versions)
ch_multiqc_files = ch_multiqc_files.mix(BRACKEN.out.txt.collect{it[1]})
}
+ } else if (params.contaminant_screening == 'sylph') {
+ // Convert the comma-separated --sylph_db value into a list of database files
+ def sylph_databases = params.sylph_db ? params.sylph_db.split(',').collect{ file(it.trim()) } : []
+ ch_sylph_databases = channel.value(sylph_databases)
+ SYLPH_PROFILE (
+ ch_unaligned_sequences,
+ ch_sylph_databases
+ )
+ // Drop empty Sylph profile files before passing them to sylph-tax
+ ch_sylph_profile = SYLPH_PROFILE.out.profile_out.filter{!it[1].isEmpty()}
+ ch_versions = ch_versions.mix(SYLPH_PROFILE.out.versions)
+
+ // Convert the comma-separated --sylph_taxonomy value into a list of taxonomy files
+ def sylph_taxonomies = params.sylph_taxonomy ? params.sylph_taxonomy.split(',').collect{ file(it.trim()) } : []
+ ch_sylph_taxonomies = channel.value(sylph_taxonomies)
+ SYLPHTAX_TAXPROF (
+ ch_sylph_profile,
+ ch_sylph_taxonomies
+ )
+ ch_versions = ch_versions.mix(SYLPHTAX_TAXPROF.out.versions)
+ ch_multiqc_files = ch_multiqc_files.mix(SYLPHTAX_TAXPROF.out.taxprof_output.collect{it[1]})
}
}
diff --git a/workflows/rnaseq/nextflow.config b/workflows/rnaseq/nextflow.config
index d6f674c7a..a3cd7ca8f 100644
--- a/workflows/rnaseq/nextflow.config
+++ b/workflows/rnaseq/nextflow.config
@@ -10,6 +10,8 @@ includeConfig "../../modules/nf-core/stringtie/stringtie/nextflow.config"
includeConfig "../../modules/nf-core/subread/featurecounts/nextflow.config"
includeConfig "../../modules/nf-core/kraken2/kraken2/nextflow.config"
includeConfig "../../modules/nf-core/bracken/bracken/nextflow.config"
+includeConfig "../../modules/nf-core/sylph/profile/nextflow.config"
+includeConfig "../../modules/nf-core/sylphtax/taxprof/nextflow.config"
includeConfig "../../subworkflows/local/align_star/nextflow.config"
includeConfig "../../subworkflows/local/quantify_rsem/nextflow.config"
includeConfig "../../subworkflows/nf-core/quantify_pseudo_alignment/nextflow.config"