Longread only functionality #718

Open
wants to merge 35 commits into
base: dev

muabnezor
Contributor

@muabnezor muabnezor commented Nov 28, 2024

This PR adds long-read only functionality to mag.

closes #662, #659, #275

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@muabnezor muabnezor added the WIP Work in progress label Nov 28, 2024
@muabnezor muabnezor requested a review from jfy133 November 28, 2024 12:28

github-actions bot commented Nov 28, 2024

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.1.0.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@muabnezor muabnezor changed the base branch from master to dev November 28, 2024 12:29
Collaborator

@d4straub d4straub left a comment

I only had a short look, sorry, but I just wanted to express my gratitude that you tackle long read assembly!

nextflow.config (review thread resolved)
conf/modules.config (outdated; review thread resolved)
@muabnezor
Contributor Author

> I only had a short look, sorry, but I just wanted to express my gratitude that you tackle long read assembly!

My pleasure hehe. Still WIP. I have to tidy up the code and run some validation on real data, but we're getting there!

@muabnezor
Contributor Author

So far in this PR I have

  • changed the validation schema to allow long-read-only input
  • added a host removal track for long reads, using minimap2 as the aligner
  • added Flye and metaMDBG as long-read assemblers
  • refactored the code so that all the assembly code lives in a subworkflow

Downstream from the assembly, except for the binning preparation, the long read and short read assemblies are treated the same.

@muabnezor muabnezor removed the WIP Work in progress label Dec 12, 2024
@muabnezor
Contributor Author

Some colleagues and I are running tests on ONT data, and I think this works as expected. We will continue testing and play around with the parameters, but if anyone wants to have a look through the code and suggest improvements, that would be awesome!

@jfy133 jfy133 linked an issue Jan 20, 2025 that may be closed by this pull request
Member

@jfy133 jfy133 left a comment

OK, I'm not seeing any obvious problems! But I would still like to run a test (can't do that today).

Once the comments (mostly questions) are addressed, I will start the manual tests :)

@@ -24,10 +24,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#707](https://github.com/nf-core/mag/pull/707) - Make Bin QC a subworkflow (added by @dialvarezs)
- [#707](https://github.com/nf-core/mag/pull/707) - Added CheckM2 as an alternative bin completeness and QC tool (added by @dialvarezs)
- [#708](https://github.com/nf-core/mag/pull/708) - Added `--exclude_unbins_from_postbinning` parameter to exclude unbinned contigs from post-binning processes, speeding up Prokka in some cases (added by @dialvarezs)
- [#718](https://github.com/nf-core/mag/pull/718) - Added metaMDBG and Flye as longread assemblers (added by @muabnezor)
Member

Suggested change
- [#718](https://github.com/nf-core/mag/pull/718) - Added metaMDBG and Flye as longread assemblers (added by @muabnezor)
- [#718](https://github.com/nf-core/mag/pull/718) - Added metaMDBG and (meta)Flye as long read assemblers (suggested by ljmesi [and many others] added by @muabnezor)

- [#732](https://github.com/nf-core/mag/pull/732) - Added support for Prokka's compliance mode with `--prokka_with_compliance --prokka_compliance_centre <xyz>` (reported by @audy and @Thomieh73, added by @jfy133)

### `Changed`

- [#718](https://github.com/nf-core/mag/pull/718) - Longread only input (added by @muabnezor)
Member

Suggested change
- [#718](https://github.com/nf-core/mag/pull/718) - Longread only input (added by @muabnezor)
- [#718](https://github.com/nf-core/mag/pull/718) - Longread only input is now an option (added by @muabnezor)

@@ -37,16 +40,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#716](https://github.com/nf-core/mag/pull/692) - Make short read processing a subworkflow (added by @muabnezor)
- [#708](https://github.com/nf-core/mag/pull/708) - Fixed channel passed as GUNC input (added by @dialvarezs)
- [#729](https://github.com/nf-core/mag/pull/729) - Fixed misspecified multi-FASTQ input for single-end data in MEGAHIT (reported by John Richards, fix by @jfy133)
- [#718](https://github.com/nf-core/mag/pull/718) - refactoring assembly into subworkflow (added by @muabnezor)
Member

Suggested change
- [#718](https://github.com/nf-core/mag/pull/718) - refactoring assembly into subworkflow (added by @muabnezor)
- [#718](https://github.com/nf-core/mag/pull/718) - refactoring assembly steps into subworkflow (added by @muabnezor)

@@ -66,6 +66,10 @@

- [Filtlong](https://github.com/rrwick/Filtlong)

- [Flye](https://www.nature.com/articles/s41592-020-00971-x)

> Kolmogorov, M., Bickhart, D.M., Behsaz, B. et al. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat Methods 17, 1103–1110 (2020). https://doi.org/10.1038/s41592-020-00971-x
Member

Suggested change
> Kolmogorov, M., Bickhart, D.M., Behsaz, B. et al. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat Methods 17, 1103–1110 (2020). https://doi.org/10.1038/s41592-020-00971-x
> Kolmogorov, M., Bickhart, D.M., Behsaz, B. et al. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat Methods 17, 1103–1110 (2020). doi: 10.1038/s41592-020-00971-x

@@ -106,6 +110,14 @@

> Levy Karin, E., Mirdita, M. & Söding, J. MetaEuk—sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics. Microbiome 8, 48 (2020). 10.1186/s40168-020-00808-x

- [metaMDBG](https://www.nature.com/articles/s41587-023-01983-6)

> Benoit, G., Raguideau, S., James, R. et al. High-quality metagenome assembly from long accurate reads with metaMDBG. Nat Biotechnol 42, 1378–1383 (2024). https://doi.org/10.1038/s41587-023-01983-6
Member

Suggested change
> Benoit, G., Raguideau, S., James, R. et al. High-quality metagenome assembly from long accurate reads with metaMDBG. Nat Biotechnol 42, 1378–1383 (2024). https://doi.org/10.1038/s41587-023-01983-6
> Benoit, G., Raguideau, S., James, R. et al. High-quality metagenome assembly from long accurate reads with metaMDBG. Nat Biotechnol 42, 1378–1383 (2024). doi:10.1038/s41587-023-01983-6

BOWTIE2_ASSEMBLY_BUILD ( assemblies )
ch_versions = Channel.empty()
ch_multiqc_files = Channel.empty()
// multiple symlinks to the same assembly -> use first of sorted list
Member

Suggested change
// multiple symlinks to the same assembly -> use first of sorted list
// multiple symlinks to the same assembly -> use first of sorted list

What is this comment referring to? It sounds a bit scary.

ch_minimap2_input_idx = ch_minimap2_input
.map { meta_idx, index, meta, reads -> [ meta_idx, index ] }

MINIMAP2_ASSEMBLY_ALIGN ( ch_minimap2_input_reads, ch_minimap2_input_idx, true, 'bai', false, false )
Member

What do the positional `true`, `'bai'`, `false`, `false` arguments mean? Should any of them be parameterisable by the user?
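
One way to make the positional flags self-documenting is to bind them to named variables before the call. The following is a minimal sketch, not the PR's code. It assumes `MINIMAP2_ASSEMBLY_ALIGN` wraps the nf-core `minimap2/align` module and that the module's value inputs are, in order, `bam_format`, `bam_index_extension`, `cigar_paf_format` and `cigar_bam`. The variable names and their described meanings are assumptions taken from that module and should be checked against the installed version.

```nextflow
// Sketch only: variable names are assumed to mirror the value inputs of the
// nf-core minimap2/align module; verify against modules/nf-core/minimap2/align.
def bam_format          = true    // write sorted BAM rather than PAF
def bam_index_extension = 'bai'   // index type created alongside the BAM
def cigar_paf_format    = false   // do not add CIGAR strings to PAF output
def cigar_bam           = false   // do not enable long-CIGAR handling for BAM

MINIMAP2_ASSEMBLY_ALIGN (
    ch_minimap2_input_reads,
    ch_minimap2_input_idx,
    bam_format,
    bam_index_extension,
    cigar_paf_format,
    cigar_bam
)
```

If any of these should become user-tunable, the corresponding variable could be replaced by a pipeline parameter; no such parameter exists in the code shown here.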

ch_short_reads_spades

main:

Member

Suggested change

@@ -244,12 +247,6 @@ def validateInputParameters(hybrid) {
if (params.host_fasta && params.host_genome) {
error('[nf-core/mag] ERROR: Both host fasta reference and iGenomes genome are specified to remove host contamination! Invalid combination, please specify either --host_fasta or --host_genome.')
}
if (hybrid && (params.host_fasta || params.host_genome)) {
log.warn('[nf-core/mag]: Host read removal is only applied to short reads. Long reads might be filtered indirectly by Filtlong, which is set to use read qualities estimated based on k-mer matches to the short, already filtered reads.')
if (params.longreads_length_weight > 1) {
Member

Can you verify that this warning is no longer necessary?

Member

(the one before is fine obviously)

@@ -112,6 +107,8 @@ workflow MAG {

if (!params.keep_phix) {
ch_phix_db_file = Channel.value(file("${params.phix_reference}"))
} else {
Member

Is this the fix for the phix bug? If so, I would appreciate it being split out into a separate PR:

@muabnezor
Contributor Author

Thank you @jfy133. I'm away this week, but I'll try to find time to go through your comments asap.

Development

Successfully merging this pull request may close these issues.

Add separate Nanopore input option
4 participants