Description
Unless I have overlooked something, fgumi currently has no equivalent tool to the umi_tools prepare-for-rsem command.
Since this command often takes a long time, e.g. in the nf-core rnaseq pipeline, a faster rewrite in Rust would be desirable. Since fgumi is expected to supplement or replace umi_tools in that pipeline, adding this functionality to fgumi rather than publishing a standalone tool seems reasonable.
I have an AI-generated draft for that feature sitting on a branch of my fork as a preview. However, I have not yet reviewed it extensively and am currently comparing both tools on real-world data.
Please let me know if you are generally open to this addition. If so, I will spend more time and effort on human curation to get it into shape for a proper PR.
Related Issues
No response
Version
0.2.0
Impact
Inside the rnaseq pipeline, prepare-for-salmon/rsem is frequently the longest-running task, taking up to 12 h per sample, far longer than alignment and also longer than umi_tools extract and umi_tools dedup. Replacing this script could thus significantly shorten the pipeline's runtime, and it would be great if fgumi covered that step as well.
Suggestions
AI-generated draft for that feature
Test input
I am still looking for suitable test data. Neither umi_tools itself nor the nf-core module seems to use a test dataset that truly covers all edge cases.
Checklist
CONTRIBUTING guide