Hi,
The gtdb_to_ncbi_majority_vote.py script is great, but its results can be biased when multiple genomes are incorrectly annotated at NCBI.
Have you considered implementing more complex voting rules, such as:
- giving more weight to genomes that represent type strains?
- giving more weight to genomes included in RefSeq?
I have run some tests, and these rules helped a lot in recovering the correct NCBI taxonomy at the species level.
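
To illustrate the idea, here is a minimal sketch of a weighted vote. The weights, the dictionary keys, the function name, and the species names are all assumptions made up for this example; none of them come from the existing script.

```python
from collections import defaultdict

# Hypothetical weights, chosen for illustration only (not tuned values).
BASE_WEIGHT = 1.0
TYPE_STRAIN_BONUS = 2.0
REFSEQ_BONUS = 1.0


def weighted_majority_vote(genomes):
    """Return the NCBI species with the largest total weight, or None if empty.

    Each genome is a dict with hypothetical keys:
    'ncbi_species' (str), 'is_type_strain' (bool), 'is_refseq' (bool).
    """
    totals = defaultdict(float)
    for genome in genomes:
        weight = BASE_WEIGHT
        if genome["is_type_strain"]:
            weight += TYPE_STRAIN_BONUS
        if genome["is_refseq"]:
            weight += REFSEQ_BONUS
        totals[genome["ncbi_species"]] += weight
    if not totals:
        return None
    # Iterate over sorted names so that ties are broken deterministically.
    return max(sorted(totals), key=lambda name: totals[name])


# Made-up cluster: three mislabelled GenBank-only genomes against two
# correctly labelled RefSeq genomes, one of which is a type strain.
cluster = [
    {"ncbi_species": "s__Examplea erronea", "is_type_strain": False, "is_refseq": False},
    {"ncbi_species": "s__Examplea erronea", "is_type_strain": False, "is_refseq": False},
    {"ncbi_species": "s__Examplea erronea", "is_type_strain": False, "is_refseq": False},
    {"ncbi_species": "s__Examplea correcta", "is_type_strain": True, "is_refseq": True},
    {"ncbi_species": "s__Examplea correcta", "is_type_strain": False, "is_refseq": True},
]

print(weighted_majority_vote(cluster))
```

In this made-up cluster, an unweighted vote would pick the mislabelled name (3 genomes against 2), while the weighted vote picks the other one (total weight 6.0 against 3.0).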
Best,
Florian