Unexpected splitting of high-AF genomes into different secondary clusters in dRep #281

@Pranjal-Bioinfo

Hi dRep team,

I’m encountering a possible issue with how dRep assigns secondary clusters even when pairwise ANI and alignment fraction (AF) values are high.

Here’s an example from my dataset:

SPAdes-CONCOCTRefined-1UOYPL_HR.28.fa,SPAdes-CONCOCTRefined-9TDUTH_S296.109.fa,0.964682,0.3767441860465116,13
SPAdes-CONCOCTRefined-9TDUTH_S296.109.fa,SPAdes-CONCOCTRefined-1UOYPL_HR.28.fa,0.969092,0.6694214876033058,13

| Comparison Direction | ANI (%) | Coverage (%) | Primary Cluster |
| --- | --- | --- | --- |
| 1UOYPL_HR.28 → 9TDUTH_S296.109 | 96.47 | 37.67 | 13 |
| 9TDUTH_S296.109 → 1UOYPL_HR.28 | 96.91 | 66.94 | 13 |

These two genomes belong to the same primary cluster (13), and in both comparison directions the ANI is at least 95% and the coverage is at least 0.30.

Despite this, dRep splits them into different secondary clusters (13_1 and 13_2):

| Field | 1UOYPL_HR.28 | 9TDUTH_S296.109 |
| --- | --- | --- |
| primary_cluster | 13 | 13 |
| secondary_cluster | 13_1 | 13_2 |
| centrality | 0.05 | 0.05 |
| ANI and Coverage | pass (≥ 95%, ≥ 0.30) | pass (≥ 95%, ≥ 0.30) |
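
To make the "pass" claim easy to reproduce, here is a minimal sketch that checks the two rows quoted above against the 95% ANI and 0.30 coverage cutoffs. The column names (`genome_a`, `genome_b`, `ani`, `coverage`, `primary_cluster`) and the helper `check_pairs` are assumptions for illustration only, since the rows were pasted without a header. This is not dRep's own code and does not reproduce its clustering logic.

```python
import csv
import io

# The two comparison rows quoted in this issue, copied verbatim.
ROWS = """\
SPAdes-CONCOCTRefined-1UOYPL_HR.28.fa,SPAdes-CONCOCTRefined-9TDUTH_S296.109.fa,0.964682,0.3767441860465116,13
SPAdes-CONCOCTRefined-9TDUTH_S296.109.fa,SPAdes-CONCOCTRefined-1UOYPL_HR.28.fa,0.969092,0.6694214876033058,13
"""

# Assumed column names; the original rows have no header line.
FIELDS = ["genome_a", "genome_b", "ani", "coverage", "primary_cluster"]

ANI_MIN = 0.95  # ANI cutoff stated in the issue
COV_MIN = 0.30  # coverage cutoff stated in the issue


def check_pairs(text, ani_min=ANI_MIN, cov_min=COV_MIN):
    """Return one dict per row with a flag for whether both cutoffs are met."""
    results = []
    for row in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        ani = float(row["ani"])
        cov = float(row["coverage"])
        results.append({
            "genome_a": row["genome_a"],
            "genome_b": row["genome_b"],
            "ani": ani,
            "coverage": cov,
            "passes": ani >= ani_min and cov >= cov_min,
        })
    return results


results = check_pairs(ROWS)
for r in results:
    print(r["genome_a"], "->", r["genome_b"], "passes:", r["passes"])
```

Both directions satisfy the two cutoffs taken on their own, which is why the split into 13_1 and 13_2 is unexpected.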

Is there a reason dRep's secondary clustering would behave this way even when the ANI/AF thresholds are satisfied?

Thanks for the great tool, and I’d really appreciate clarification!
