SortMeRNA low mem params #105

JackCurragh · 2025-04-30T13:53:51Z

…. Any rRNA hit is enough to characterise a read as rRNA rather than RPF. This should lower memory usage and address some user-reported errors on Slack and in [#81].

nf-core-bot · 2025-04-30T13:54:28Z

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.2.0.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

github-actions · 2025-04-30T13:55:57Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 4962134

✅ 232 tests passed
❔ 6 tests were ignored
❗ 6 tests had warnings

Details

❗ Test warnings:

nextflow_config - Config manifest.version should end in dev: 1.1.0
pipeline_todos - TODO string in ro-crate-metadata.json: the "description" field embeds the README text, which still contains an unresolved "TODO nf-core:" placeholder (the 2-3 sentence pipeline summary)
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
schema_lint - Input mimetype is missing or empty

❔ Tests ignored:

nextflow_config - Config default ignored: params.ribo_database_manifest
files_unchanged - File ignored due to lint config: assets/nf-core-riboseq_logo_light.png
files_unchanged - File ignored due to lint config: docs/images/nf-core-riboseq_logo_light.png
files_unchanged - File ignored due to lint config: docs/images/nf-core-riboseq_logo_dark.png
files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
actions_ci - actions_ci

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-riboseq_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-riboseq_logo_light.png
files_exist - File found: docs/images/nf-core-riboseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: conf/igenomes_ignored.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: modules.json
files_exist - File found: ro-crate-metadata.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-riboseq_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowRiboseq.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Found nf-schema plugin
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: validation.help.enabled
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable found: validation.help.beforeText
nextflow_config - Config variable found: validation.help.afterText
nextflow_config - Config variable found: validation.help.command
nextflow_config - Config variable found: validation.summary.beforeText
nextflow_config - Config variable found: validation.summary.afterText
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config variable (correctly) not found: params.max_cpus
nextflow_config - Config variable (correctly) not found: params.max_memory
nextflow_config - Config variable (correctly) not found: params.max_time
nextflow_config - Config variable (correctly) not found: params.validationFailUnrecognisedParams
nextflow_config - Config variable (correctly) not found: params.validationLenientMode
nextflow_config - Config variable (correctly) not found: params.validationSchemaIgnoreParams
nextflow_config - Config variable (correctly) not found: params.validationShowHiddenParams
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.gtf_extra_attributes= gene_name
nextflow_config - Config default value correct: params.gtf_group_features= gene_id
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
nextflow_config - Config default value correct: params.trimmer= trimgalore
nextflow_config - Config default value correct: params.min_trimmed_reads= 10000
nextflow_config - Config default value correct: params.extra_fqlint_args= --disable-validator P001
nextflow_config - Config default value correct: params.remove_ribo_rna= true
nextflow_config - Config default value correct: params.umi_dedup_tool= umitools
nextflow_config - Config default value correct: params.umitools_extract_method= string
nextflow_config - Config default value correct: params.umitools_grouping_method= directional
nextflow_config - Config default value correct: params.aligner= star
nextflow_config - Config default value correct: params.pseudo_aligner_kmer_size= 31
nextflow_config - Config default value correct: params.min_mapped_reads= 5.0
nextflow_config - Config default value correct: params.stranded_threshold= 0.8
nextflow_config - Config default value correct: params.unstranded_threshold= 0.1
nextflow_config - Config default value correct: params.save_align_intermeds= true
nextflow_config - Config default value correct: params.skip_bbsplit= true
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.test_data_base= https://raw.githubusercontent.com/nf-core/test-datasets/riboseq/testdata/
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - docs/README.md matches the template
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 24.04.2, Config: 24.04.2
readme - README Zenodo placeholder was replaced with DOI.
plugin_includes - No wrong validation plugin imports have been found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (0 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: template_version_comment.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: awstest.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains a matching 'report_comment'.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
local_component_structure - local subworkflows directory structure is correct 'subworkflows/local/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
modules_config - conf/modules.config found and not ignored.
modules_config - GUNZIP_ found in conf/modules.config and Nextflow scripts.
modules_config - UNTAR_ found in conf/modules.config and Nextflow scripts.
modules_config - UNTAR_ found in conf/modules.config and Nextflow scripts.
modules_config - GFFREAD found in conf/modules.config and Nextflow scripts.
modules_config - SALMON_INDEX found in conf/modules.config and Nextflow scripts.
modules_config - GTF2BED found in conf/modules.config and Nextflow scripts.
modules_config - CUSTOM_CATADDITIONALFASTA found in conf/modules.config and Nextflow scripts.
modules_config - GTF_FILTER found in conf/modules.config and Nextflow scripts.
modules_config - CUSTOM_GETCHROMSIZES found in conf/modules.config and Nextflow scripts.
modules_config - CAT_FASTQ found in conf/modules.config and Nextflow scripts.
modules_config - SORTMERNA_INDEX found in conf/modules.config and Nextflow scripts.
modules_config - FQ_SUBSAMPLE found in conf/modules.config and Nextflow scripts.
modules_config - FQ_LINT found in conf/modules.config and Nextflow scripts.
modules_config - FQ_LINT_AFTER_TRIMMING found in conf/modules.config and Nextflow scripts.
modules_config - FQ_LINT_AFTER_BBSPLIT found in conf/modules.config and Nextflow scripts.
modules_config - FQ_LINT_AFTER_SORTMERNA found in conf/modules.config and Nextflow scripts.
modules_config - UMITOOLS_EXTRACT found in conf/modules.config and Nextflow scripts.
modules_config - BBMAP_BBSPLIT found in conf/modules.config and Nextflow scripts.
modules_config - NFCORE_RIBOSEQ found in conf/modules.config and Nextflow scripts.
modules_config - NFCORE_RIBOSEQ found in conf/modules.config and Nextflow scripts.
modules_config - NFCORE_RIBOSEQ found in conf/modules.config and Nextflow scripts.
modules_config - NFCORE_RIBOSEQ found in conf/modules.config and Nextflow scripts.
modules_config - NFCORE_RIBOSEQ found in conf/modules.config and Nextflow scripts.
modules_config - STAR_ALIGN found in conf/modules.config and Nextflow scripts.
modules_config - NFCORE_RIBOSEQ found in conf/modules.config and Nextflow scripts.
modules_config - NFCORE_RIBOSEQ found in conf/modules.config and Nextflow scripts.
modules_config - NFCORE_RIBOSEQ found in conf/modules.config and Nextflow scripts.
modules_config - NFCORE_RIBOSEQ found in conf/modules.config and Nextflow scripts.
modules_config - NFCORE_RIBOSEQ found in conf/modules.config and Nextflow scripts.
modules_config - MULTIQC found in conf/modules.config and Nextflow scripts.
modules_config - RIBOTISH_QUALITY found in conf/modules.config and Nextflow scripts.
modules_config - RIBOTISH_PREDICT_INDIVIDUAL found in conf/modules.config and Nextflow scripts.
modules_config - RIBOTISH_PREDICT_ALL found in conf/modules.config and Nextflow scripts.
modules_config - RIBOTRICER_PREPAREORFS found in conf/modules.config and Nextflow scripts.
modules_config - RIBOTRICER_DETECTORFS found in conf/modules.config and Nextflow scripts.
modules_config - ANOTA2SEQ_ANOTA2SEQRUN found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.2.0

Run details

nf-core/tools version 3.2.0
Run at 2025-05-07 07:23:16

pinin4fjords

Sounds reasonable, approving. Perhaps we can see how well this works in this pipeline for a while, and then deploy in rnaseq if we don't have issues.

iraiosub · 2025-05-07T10:17:28Z

Thanks for looking into this. `--no-best` does avoid the exhaustive search and ranking even with `--num_alignments 1`, which explains the memory savings. That said, this issue seems extremely rare despite the widespread use of SortMeRNA in nf-core/rnaseq, and since `--no-best` changes alignment behavior, it might be better as a documented option for filtering or large datasets rather than a default change for everyone? It could also affect rRNA subtype reporting in MultiQC, so unless the performance gains turn out to be consistently big, it might be worth holding off on changing the default? It feels like a case where a config override would be the better path unless we start seeing this come up more frequently.

JackCurragh · 2025-05-07T15:42:47Z

Hi @iraiosub, I take your point that the rRNA subtype might change, but my intuition (which is often wrong) is that the high identity required for an alignment means we would get the right subtype regardless. I also compared this branch against dev using `-profile test`, and the multiqc_sortmerna.txt files are identical. I don't currently have an alternative large dataset at hand to test this further.

I also suspect the reason this has come up less frequently for nf-core/rnaseq is that the rRNA content differs significantly between the two data types. Also, though perhaps this goes against my point above, since the reads are longer in RNA-Seq, multimapping of rRNA reads is less common?

Regardless, I'm not going to merge until we are in agreement. I can of course make this an option if it's preferred, but I don't have a tonne of time at the moment.
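For anyone wanting to repeat the output comparison described above on their own data, a minimal sketch follows. It assumes only that each run produced a `multiqc_sortmerna.txt` file; the stand-in file names, column header and values are hypothetical and exist purely so the example runs on its own.

```python
import filecmp
import tempfile
from pathlib import Path


def reports_identical(path_a: Path, path_b: Path) -> bool:
    """Return True when the two files have byte-identical contents."""
    # shallow=False forces a content comparison instead of trusting os.stat().
    return filecmp.cmp(path_a, path_b, shallow=False)


if __name__ == "__main__":
    # Stand-in files with made-up contents. In practice these would be the
    # multiqc_sortmerna.txt outputs of the dev run and the branch run.
    with tempfile.TemporaryDirectory() as tmp:
        dev = Path(tmp) / "dev_multiqc_sortmerna.txt"
        branch = Path(tmp) / "branch_multiqc_sortmerna.txt"
        dev.write_text("Sample\trRNA_pct\nS1\t12.5\n")
        branch.write_text("Sample\trRNA_pct\nS1\t12.5\n")
        print(reports_identical(dev, branch))
```

A byte-level comparison is deliberately strict: any difference in the reported table, including reordered rows, counts as a mismatch, which is the conservative choice when checking that a parameter change left the summary untouched.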

pinin4fjords · 2025-05-07T16:10:27Z

FWIW I do get queries along these lines fairly frequently for SortMeRNA in rnaseq

iraiosub · 2025-05-07T19:35:44Z

Thanks for the thoughtful reply @JackCurragh! Just to clarify: this is already configurable via `-params-file <file>`, so users can enable `--no-best` themselves without any extra work on your/our end. I wasn’t suggesting making it optional within the pipeline itself. I’m not against merging if that’s preferred or if it's a common issue, but if so, it would be helpful to document in the pipeline docs that `--no-best` is set by default because the main goal is rRNA filtering and it improves performance, with the tradeoff that the reported alignment may not always be the best match.
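The tradeoff discussed in this thread can be illustrated with a toy sketch. This is not SortMeRNA's algorithm; the reference names, scores and threshold below are invented. It only shows why stopping at the first acceptable hit gives the same rRNA/non-rRNA call as an exhaustive best-hit search, while the reported reference may differ.

```python
from typing import Optional

# Hypothetical (reference, score) pairs for one read; values are invented.
Hits = list[tuple[str, float]]


def first_hit(hits: Hits, threshold: float) -> Optional[str]:
    """Return the first reference that passes the threshold, then stop."""
    for ref, score in hits:
        if score >= threshold:
            return ref
    return None


def best_hit(hits: Hits, threshold: float) -> Optional[str]:
    """Score every reference and return the highest-scoring one that passes."""
    passing = [(ref, score) for ref, score in hits if score >= threshold]
    if not passing:
        return None
    return max(passing, key=lambda hit: hit[1])[0]


read_hits: Hits = [("5S", 0.10), ("18S", 0.97), ("28S", 0.99)]
first = first_hit(read_hits, threshold=0.95)
best = best_hit(read_hits, threshold=0.95)

# Both strategies flag the read as rRNA, so filtering is unaffected.
print(first is not None, best is not None)
# The reported reference can differ, which is the subtype-reporting concern.
print(first, best)
```

With these made-up numbers the read is filtered either way, but the first-hit strategy reports `18S` while the exhaustive search reports `28S`.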

pinin4fjords and others added 8 commits February 3, 2025 09:09

Merge pull request #95 from nf-core/dev

235e008

Dev -> master for v1.1.0

changed sortMeRNA params to not look for best alignment of rRNA reads…

872054e

…. Any rRNA hit is enough to characterise read as rRNA not RPF. This should lower memory usage and address some user reported errors.

update nextflow_schema.json and schema_input.json

8c4ea4d

update nextflow_schema.json

bf47009

Bump changelog

3d5ac8a

update nextflow_schema.json

880ffd3

bump changelog

71919b3

updated changelog

66a4049

JackCurragh requested a review from pinin4fjords April 30, 2025 13:53

Merge branch 'dev' into sortMeRNA_low_mem_params

20f5bfe

JackCurragh added 2 commits May 7, 2025 08:18

bump Cache pdiff actions/cache to v4 due to CI complaint

3feb510

remove empty line in CHANGELOG.md

4962134

pinin4fjords approved these changes May 7, 2025
