The fact that foldseek is even able to make MSAs based on the clustering result itself is extremely impressive. However, the following issue makes the alignments unusable:
Expected Behavior
Sequences within result2msa a3m files should align well to each other, since they belong to the same cluster and have been structurally aligned with conservative clustering criteria (e.g. -c 0.8).
Current Behavior
Currently, most alignments in MSAs for some clusters are unusable because they look like this:
with very few aligned residues (way below -c 0.8) and e-values well above the specified cutoff (-e 0.001).
Steps to Reproduce (for bugs)
1. Perform foldseek clustering with foldseek cluster
2. Create a clustering tsv file with foldseek createtsv on the clustering result db
3. Create an a3m db with foldseek result2msa DB DB DB_C a3m --msa-format-mode 6 on the clustering result db
4. Browse the a3m db ffdata file with less until one encounters e-values above 1
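Instead of paging through the ffdata file by hand, the last step could be roughly automated. The sketch below is only an illustration: the function name `flag_high_evalue_headers` is made up, and the position of the e-value within each header line (`evalue_field`) and the field separator are assumptions that must be checked against the actual `--msa-format-mode 6` output, since the header layout is not confirmed here.

```python
from typing import Iterable, Iterator, Tuple


def flag_high_evalue_headers(
    lines: Iterable[str],
    evalue_field: int,
    cutoff: float = 1e-3,
    sep: str = "\t",
) -> Iterator[Tuple[int, str, float]]:
    """Yield (line_number, header, evalue) for a3m headers above the cutoff.

    ASSUMPTION: header lines start with '>' and carry the e-value as the
    `evalue_field`-th separator-delimited field (0-based). Verify this
    against real result2msa output before relying on the result.
    """
    for lineno, raw in enumerate(lines, start=1):
        # Drop NUL bytes, which may separate entries in the data file.
        line = raw.replace("\x00", "").rstrip("\n")
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split(sep)
        if evalue_field >= len(fields):
            continue  # header has fewer fields than assumed
        try:
            evalue = float(fields[evalue_field])
        except ValueError:
            continue  # field is not numeric; the layout assumption is off
        if evalue > cutoff:
            yield lineno, line, evalue
```

To use it, one would pass an open text handle on the a3m ffdata file (for example opened with `errors="replace"`) together with the field index observed in the real headers, and compare the number of flagged headers to the total number of headers per cluster.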
Context
One would expect cluster members to match each other better than the above example, as the e-value cutoff was set to 0.001 and -c was set to 0.8. While it is understandable that not all members of a cluster can match each other equally well, it's still hard to reconcile why the cluster representative itself matches no other cluster member within the specified thresholds. Selecting a better cluster representative would be a low-effort way to vastly improve alignment quality.
Your Environment
foldseek Version: 9.427df8a