Longread only functionality #718
base: dev
… 97 to 90 when dealing with longreads
Warning: A newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.1.0. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
I only had a short look, sorry, but I just wanted to express my gratitude that you tackle long read assembly!
My pleasure hehe. Still WIP. I have to tidy up the code and run some validation on real data, but we're getting there!
…rtread assemblers, when assembly input is given
…empty if no files are given
…d also return remainder in case the ch_short_reads channel is empty. Change config for FILTLONG to only use '--trim' option if shortreads are passed
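The remainder handling mentioned in the commit message above can be sketched roughly as follows. This is a minimal, standalone Nextflow DSL2 illustration, not the code from this PR: the sample ID, the file name, and the reordering in the `map` step are hypothetical. It only shows that `join` with `remainder: true` still emits long-read entries when `ch_short_reads` is empty, with `null` in place of the missing short reads.

```groovy
// Hypothetical standalone sketch; names and file paths are illustrative only.
workflow {
    // One long-read sample and no short reads (long-read-only input)
    ch_long_reads  = Channel.of( [ [id: 'sample1'], 'sample1_ont.fastq.gz' ] )
    ch_short_reads = Channel.empty()

    ch_long_reads
        // remainder: true keeps unmatched long-read entries; the missing
        // short-read element arrives as null instead of the item being dropped
        .join( ch_short_reads, by: 0, remainder: true )
        // replace the null placeholder with an empty list for downstream use
        .map { meta, long_reads, short_reads -> [ meta, short_reads ?: [], long_reads ] }
        .view()
}
```

Without `remainder: true`, the join would emit nothing here, so every downstream long-read step would silently receive an empty channel.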
So far in this PR I have
Downstream from the assembly, except for the binning preparation, the long read and short read assemblies are treated the same.
Some colleagues and I are running tests on ONT data, and I think this works as expected. We will continue testing and playing around with the parameters, but if anyone wants to look through the code and suggest improvements, that would be awesome!
OK I'm not seeing anything obvious! But I would still like to run a test (can't do that today).
Once the comments (mostly questions) are addressed, I will start the manual tests :)
@@ -24,10 +24,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#707](https://github.com/nf-core/mag/pull/707) - Make Bin QC a subworkflow (added by @dialvarezs)
- [#707](https://github.com/nf-core/mag/pull/707) - Added CheckM2 as an alternative bin completeness and QC tool (added by @dialvarezs)
- [#708](https://github.com/nf-core/mag/pull/708) - Added `--exclude_unbins_from_postbinning` parameter to exclude unbinned contigs from post-binning processes, speeding up Prokka in some cases (added by @dialvarezs)
- [#718](https://github.com/nf-core/mag/pull/718) - Added metaMDBG and Flye as longread assemblers (added by @muabnezor)
Suggested change:
- [#718](https://github.com/nf-core/mag/pull/718) - Added metaMDBG and (meta)Flye as long read assemblers (suggested by ljmesi [and many others] added by @muabnezor)
- [#732](https://github.com/nf-core/mag/pull/732) - Added support for Prokka's compliance mode with `--prokka_with_compliance --prokka_compliance_centre <xyz>` (reported by @audy and @Thomieh73, added by @jfy133)

### `Changed`

- [#718](https://github.com/nf-core/mag/pull/718) - Longread only input (added by @muabnezor)
Suggested change:
- [#718](https://github.com/nf-core/mag/pull/718) - Longread only input is now an option (added by @muabnezor)
@@ -37,16 +40,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#716](https://github.com/nf-core/mag/pull/692) - Make short read processing a subworkflow (added by @muabnezor)
- [#708](https://github.com/nf-core/mag/pull/708) - Fixed channel passed as GUNC input (added by @dialvarezs)
- [#729](https://github.com/nf-core/mag/pull/729) - Fixed misspecified multi-FASTQ input for single-end data in MEGAHIT (reported by John Richards, fix by @jfy133)
- [#718](https://github.com/nf-core/mag/pull/718) - refactoring assembly into subworkflow (added by @muabnezor)
Suggested change:
- [#718](https://github.com/nf-core/mag/pull/718) - refactoring assembly steps into subworkflow (added by @muabnezor)
@@ -66,6 +66,10 @@

- [Filtlong](https://github.com/rrwick/Filtlong)

- [Flye](https://www.nature.com/articles/s41592-020-00971-x)

> Kolmogorov, M., Bickhart, D.M., Behsaz, B. et al. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat Methods 17, 1103–1110 (2020). https://doi.org/10.1038/s41592-020-00971-x
Suggested change:
> Kolmogorov, M., Bickhart, D.M., Behsaz, B. et al. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat Methods 17, 1103–1110 (2020). doi: 10.1038/s41592-020-00971-x
@@ -106,6 +110,14 @@

> Levy Karin, E., Mirdita, M. & Söding, J. MetaEuk—sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics. Microbiome 8, 48 (2020). 10.1186/s40168-020-00808-x

- [metaMDBG](https://www.nature.com/articles/s41587-023-01983-6)

> Benoit, G., Raguideau, S., James, R. et al. High-quality metagenome assembly from long accurate reads with metaMDBG. Nat Biotechnol 42, 1378–1383 (2024). https://doi.org/10.1038/s41587-023-01983-6
Suggested change:
> Benoit, G., Raguideau, S., James, R. et al. High-quality metagenome assembly from long accurate reads with metaMDBG. Nat Biotechnol 42, 1378–1383 (2024). doi:10.1038/s41587-023-01983-6
BOWTIE2_ASSEMBLY_BUILD ( assemblies )
ch_versions = Channel.empty()
ch_multiqc_files = Channel.empty()
// multiple symlinks to the same assembly -> use first of sorted list
What is this comment referring to? It sounds a bit scary.
ch_minimap2_input_idx = ch_minimap2_input
.map { meta_idx, index, meta, reads -> [ meta_idx, index ] }

MINIMAP2_ASSEMBLY_ALIGN ( ch_minimap2_input_reads, ch_minimap2_input_idx, true, 'bai', false, false )
What are the `true`/`false`/`false` arguments? Should any of them be parameterisable by the user?
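One way to make the positional arguments in that call self-describing is to bind them to named variables first. The fragment below is a sketch meant for the existing workflow body, not a standalone script. It assumes that `MINIMAP2_ASSEMBLY_ALIGN` is an alias of the nf-core `minimap2/align` module and that the four trailing value inputs are `bam_format`, `bam_index_extension`, `cigar_paf_format` and `cigar_bam`; that mapping is an assumption and is not confirmed anywhere in this PR. The local variable names are hypothetical.

```groovy
// Sketch only: assumes the trailing inputs of the aliased module are, in order,
// bam_format, bam_index_extension, cigar_paf_format, cigar_bam (unconfirmed).
// Variable names are hypothetical and chosen to mirror that assumed order.
bam_format          = true    // emit BAM rather than PAF
bam_index_extension = 'bai'   // index type created alongside the BAM
cigar_paf_format    = false   // no CIGAR strings in PAF output
cigar_bam           = false   // no special handling of long CIGARs in BAM

MINIMAP2_ASSEMBLY_ALIGN (
    ch_minimap2_input_reads,
    ch_minimap2_input_idx,
    bam_format,
    bam_index_extension,
    cigar_paf_format,
    cigar_bam
)
```

If any of these should become user-configurable, the corresponding variable could be set from a pipeline parameter instead of a literal; no such parameter exists in the code shown here.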
ch_short_reads_spades

main:
@@ -244,12 +247,6 @@ def validateInputParameters(hybrid) {
if (params.host_fasta && params.host_genome) {
error('[nf-core/mag] ERROR: Both host fasta reference and iGenomes genome are specified to remove host contamination! Invalid combination, please specify either --host_fasta or --host_genome.')
}
if (hybrid && (params.host_fasta || params.host_genome)) {
log.warn('[nf-core/mag]: Host read removal is only applied to short reads. Long reads might be filtered indirectly by Filtlong, which is set to use read qualities estimated based on k-mer matches to the short, already filtered reads.')
if (params.longreads_length_weight > 1) {
Can you verify this warning is no longer necessary?
(the one before is fine obviously)
@@ -112,6 +107,8 @@ workflow MAG {

if (!params.keep_phix) {
ch_phix_db_file = Channel.value(file("${params.phix_reference}"))
} else {
Is this the fix for the PhiX bug? If so, I would appreciate it if you split this out into a separate PR:
Thank you @jfy133. I'm away this week, but I'll try to find time to go through your comments ASAP.
This PR adds long-read only functionality to mag.
closes #662, #659, #275
PR checklist
- `nf-core pipelines lint`
- `nextflow run . -profile test,docker --outdir <OUTDIR>`
- `nextflow run . -profile debug,test,docker --outdir <OUTDIR>`
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).