
feat: Reimplementation of umi_tools prepare-for-rsem #333

@MatthiasZepper

Description

Unless I have overlooked something, fgumi currently has no equivalent of the umi_tools prepare-for-rsem command.

Because this command often takes a significant amount of time, e.g. in the nf-core rnaseq pipeline, a faster rewrite in Rust would be desirable. Since fgumi is expected to supplement or replace umi_tools in that pipeline, adding this functionality to fgumi rather than publishing a standalone tool seems reasonable.

I have an AI-generated draft of this feature on a branch of my fork as a preview. However, I have not yet reviewed it extensively, and I am currently comparing both tools on real-world data.

Please let me know whether you are generally open to this addition. If so, I will invest more time and effort in curating it by hand to get it into shape for a proper PR.

Related Issues

No response

Version

0.2.0

Impact

In the nf-core rnaseq pipeline, the prepare-for-salmon/rsem step is frequently the longest-running task, taking up to 12 h per sample. That is far longer than alignment, and also longer than umi_tools extract and umi_tools dedup. Replacing this script could therefore shorten pipeline runtimes significantly, and it would be great if fgumi covered this step as well.

Suggestions

AI-generated draft for that feature

Test input

I am still looking for suitable test data. Neither umi_tools itself nor the nf-core module seems to use a test dataset that truly covers all edge cases.

Checklist

  • Read the CONTRIBUTING guide
  • Checked there are no existing Issues for this feature request
  • Checked there are no Discussions that address this issue
  • Checked you have installed at least the minimum required version of Rust
  • Ensured that the latest release does not contain this feature
  • Provided all information necessary to reproduce the current behavior

Metadata

Assignees

No one assigned

Labels

question: Further information is requested

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
