Better selection of sequences if more than --seqs-per-taxon #30

@iimog

Currently, if a taxon has more than --seqs-per-taxon sequences (default 3; #28 suggests 9), only the longest ones are kept, with ties broken arbitrarily. Andreas Kolter suggested two improvements: use voucher information from NCBI to avoid taking multiple sequences from the same specimen, and shuffle NCBI IDs to sample more diverse studies, because sequences from the same study often get similar IDs.
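
A rough sketch of how the two suggestions could be combined, not a description of the current implementation. Everything here is hypothetical: the `SeqRecord` type, its fields, and the `select_for_taxon` function are made up for illustration, and it assumes voucher information has already been extracted from the NCBI records. Sequences are shuffled first so that ties in length are no longer resolved by ID order, then sorted by length, and a second sequence from an already-seen voucher is only used when nothing else is left.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SeqRecord:
    """Hypothetical record; field names are assumptions, not the tool's data model."""
    accession: str
    length: int
    voucher: Optional[str] = None  # None when NCBI lists no specimen voucher


def select_for_taxon(records: List[SeqRecord],
                     seqs_per_taxon: int = 3,
                     seed: int = 0) -> List[SeqRecord]:
    """Pick up to seqs_per_taxon records, preferring long sequences
    from distinct vouchers."""
    rng = random.Random(seed)  # seeded so that a run is reproducible
    pool = list(records)
    # Shuffle first: neighbouring IDs (often the same study) get mixed.
    rng.shuffle(pool)
    # Python's sort is stable, so equal lengths keep the shuffled order.
    pool.sort(key=lambda r: r.length, reverse=True)

    chosen: List[SeqRecord] = []
    skipped: List[SeqRecord] = []
    seen_vouchers = set()
    for rec in pool:
        if rec.voucher is not None and rec.voucher in seen_vouchers:
            # Same specimen as an earlier pick: keep only as a fallback.
            skipped.append(rec)
            continue
        chosen.append(rec)
        if rec.voucher is not None:
            seen_vouchers.add(rec.voucher)

    # Fall back to same-voucher sequences only if the taxon would
    # otherwise end up with fewer than seqs_per_taxon sequences.
    return (chosen + skipped)[:seqs_per_taxon]
```

Records without a voucher are treated as distinct specimens in this sketch; whether that is the right default is an open design question.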

Labels: enhancement (New feature or request)
