Use type strains for GTDB to NCBI taxonomy translation #552

@fplazaonate

Hi,

The gtdb_to_ncbi_majority_vote.py script is great, but it is subject to bias when multiple genomes are incorrectly annotated at NCBI.

Have you considered implementing more complex rules such as:

  1. Give more weight to genomes representative of type strains?
  2. Give more weight to genomes included in RefSeq?

I have run some tests, and this kind of weighting helped a lot in recovering the correct NCBI taxonomy at the species level.
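
To make the idea concrete, here is a minimal sketch of what a weighted vote could look like. It is not taken from gtdb_to_ncbi_majority_vote.py: the function names, the input tuple layout, and the weight values are all hypothetical and only meant to illustrate the two rules above.

```python
from collections import defaultdict

# Hypothetical weights, chosen only for illustration (not tuned values).
TYPE_STRAIN_WEIGHT = 3.0
REFSEQ_WEIGHT = 2.0
DEFAULT_WEIGHT = 1.0


def genome_weight(is_type_strain, in_refseq):
    """Return the vote weight of one genome; the highest applicable weight wins."""
    if is_type_strain:
        return TYPE_STRAIN_WEIGHT
    if in_refseq:
        return REFSEQ_WEIGHT
    return DEFAULT_WEIGHT


def weighted_majority_vote(genomes):
    """Pick the NCBI species name with the largest total weight.

    `genomes` is a list of (ncbi_species, is_type_strain, in_refseq) tuples
    describing the genomes of one GTDB species cluster (assumed layout).
    """
    totals = defaultdict(float)
    for ncbi_species, is_type_strain, in_refseq in genomes:
        totals[ncbi_species] += genome_weight(is_type_strain, in_refseq)
    if not totals:
        return None
    # Iterating over sorted names makes ties resolve alphabetically.
    return max(sorted(totals), key=lambda name: totals[name])


# Two genomes carry one label, a single type-strain genome carries another.
cluster = [
    ("Species A", False, False),
    ("Species A", False, False),
    ("Species B", True, True),
]
print(weighted_majority_vote(cluster))
```

With these example weights, the single type-strain genome (weight 3.0) outvotes the two unweighted genomes (total 2.0), whereas a plain majority vote would pick the other label.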

Best,
Florian
