Conversation

@JackCurragh
Contributor

…. Any rRNA hit is enough to characterise a read as rRNA rather than RPF. This should lower memory usage and address some user-reported errors on Slack and in [#81].

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/riboseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.2.0.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@github-actions

github-actions bot commented Apr 30, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 4962134

+| ✅ 232 tests passed       |+
#| ❔   6 tests were ignored |#
!| ❗   6 tests had warnings |!
Details

❗ Test warnings:

  • nextflow_config - Config manifest.version should end in dev: 1.1.0
  • pipeline_todos - TODO string in ro-crate-metadata.json: the "description" field embeds the template README.md, which still contains unresolved nf-core TODO placeholders (pipeline summary, workflow figure, list of default steps, minimal usage example, contributor list, citation DOI, tool bibliography).
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • schema_lint - Input mimetype is missing or empty

❔ Tests ignored:

  • nextflow_config - Config default ignored: params.ribo_database_manifest
  • files_unchanged - File ignored due to lint config: assets/nf-core-riboseq_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-riboseq_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-riboseq_logo_dark.png
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
  • actions_ci - actions_ci

✅ Tests passed:

Run details

  • nf-core/tools version 3.2.0
  • Run at 2025-05-07 07:23:16

Member

@pinin4fjords pinin4fjords left a comment

Sounds reasonable, approving. Perhaps we can see how well this works in this pipeline for a while, and then deploy in rnaseq if we don't have issues.

@iraiosub
Contributor

iraiosub commented May 7, 2025

Thanks for looking into this — --no-best does avoid the exhaustive search and ranking even with --num_alignments 1, which explains the memory savings. That said, this issue seems extremely rare, despite the widespread use of SortMeRNA in nf-core/rnaseq, and since --no-best changes alignment behavior, it might be better as a documented option for filtering or large datasets rather than a default change for everyone? It could also affect rRNA subtype reporting in MultiQC, so unless the performance gains turn out to be consistently big, it might be worth holding off on changing the default? It feels like a case where a config override would be the better path unless we start seeing this come up more frequently.
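
The trade-off described here can be illustrated with a toy sketch. This is not SortMeRNA's algorithm: the reference names, sequences, the ungapped identity score and the 0.8 threshold are all invented for illustration. It only shows that stopping at the first passing hit and ranking every hit agree on whether a read is rRNA, while they may disagree on which reference gets reported.

```python
# Toy reference "database"; names and sequences are hypothetical.
REFERENCES = {
    "rRNA_18S_like": "ACGTACGTTTGA",
    "rRNA_28S_like": "ACGTACGTAAGA",
}


def identity(read: str, ref: str) -> float:
    """Fraction of read positions matching the reference (ungapped, offset 0)."""
    matches = sum(1 for a, b in zip(read, ref) if a == b)
    return matches / len(read)


def first_hit(read: str, threshold: float = 0.8):
    """Stop at the first reference that passes the threshold.

    Returns (reference name or None, number of references compared).
    """
    comparisons = 0
    for name, ref in REFERENCES.items():
        comparisons += 1
        if identity(read, ref) >= threshold:
            return name, comparisons
    return None, comparisons


def best_hit(read: str, threshold: float = 0.8):
    """Score every reference, then report the highest-scoring one.

    Returns (reference name or None, number of references compared).
    """
    scored = [(identity(read, ref), name) for name, ref in REFERENCES.items()]
    score, name = max(scored)
    return (name if score >= threshold else None), len(scored)


read = "ACGTACGTAAGA"
print(first_hit(read))  # classified as rRNA after one comparison
print(best_hit(read))   # classified as rRNA after comparing all references
```

For filtering, only the "is it rRNA or not" answer matters, and both functions give the same answer for that. The reported reference is where they can differ, which is the concern about rRNA subtype reporting.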

@JackCurragh
Contributor Author

Hi @iraiosub, I take your point that the rRNA subtype might change, but my intuition (which is often wrong) is that the high identity required for an alignment means we would get the right subtype regardless. I also compared this branch against dev using -profile test, and the multiqc_sortmerna.txt files are identical. I don't currently have an alternative large dataset at hand to test this further.

I also suspect this has come up less often for nf-core/rnaseq because the rRNA content differs significantly between the two data types. Also, although this perhaps goes against my point above, multimapping of rRNA reads may be less common in RNA-Seq because the reads are longer?

Regardless, I'm not going to merge until we are in agreement. I can of course make this an option if it's preferred, but I don't have a tonne of time at the moment.
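
For anyone who wants to repeat the multiqc_sortmerna.txt comparison on a larger dataset, a minimal sketch follows. It assumes two completed runs (one from dev, one from this branch) whose report files are available locally; the paths in the usage comment are hypothetical and depend on the --outdir used for each run.

```python
import difflib
from pathlib import Path


def diff_reports(path_a, path_b):
    """Return unified-diff lines between two text reports (empty list if identical)."""
    lines_a = Path(path_a).read_text().splitlines()
    lines_b = Path(path_b).read_text().splitlines()
    return list(
        difflib.unified_diff(
            lines_a,
            lines_b,
            fromfile=str(path_a),
            tofile=str(path_b),
            lineterm="",
        )
    )


# Usage (hypothetical paths, adjust to your own output directories):
#   changes = diff_reports("results_dev/multiqc_sortmerna.txt",
#                          "results_branch/multiqc_sortmerna.txt")
#   print("\n".join(changes) if changes else "reports are identical")
```

An empty result means the two runs reported the same SortMeRNA summary; any returned lines show exactly which samples or values changed.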

@pinin4fjords
Member

FWIW I do get queries along these lines fairly frequently for SortMeRNA in rnaseq

@iraiosub
Contributor

iraiosub commented May 7, 2025

Thanks for the thoughtful reply @JackCurragh! Just to clarify — this is already configurable via -params-file <file>, so users can enable --no-best themselves without any extra work on your/our end. I wasn’t suggesting making it optional within the pipeline itself. I’m not against merging if that’s preferred or it's a common issue, but if so, it would be helpful to document in the pipeline docs that --no-best is set by default because the main goal is rRNA filtering, and it improves performance — with the tradeoff that the reported alignment may not always be the best match.

6 participants