Expected Behavior
Given a directory containing the PDB files with the following PDB IDs:
8G2V, 7UZI
Among them, the chain instances (A, B, C, D, E, F, G, H, I, J) of 8G2V share nearly identical structures and thus should be clustered into the same group.
Current Behavior
Each chain instance of 8G2V ends up in its own independent cluster.
Steps to Reproduce (for bugs)
foldseek easy-cluster 8G2V.cif.gz 7UZI.cif.gz result tmp --tmscore-threshold 0.5
Context
A second problem appears when clustering 8G2V on its own:
foldseek easy-cluster 8G2V.cif.gz result tmp --tmscore-threshold 0.5
giving:
easy-cluster 8G2V.cif.gz result tmp --tmscore-threshold 0.5
MMseqs Version: 9.427df8a
Substitution matrix aa:3di.out,nucl:3di.out
Seed substitution matrix aa:3di.out,nucl:3di.out
Sensitivity 4
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Max sequence length 65535
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Coverage threshold 0
Coverage mode 0
Compositional bias 1
Compositional bias 1
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 1
Minimum diagonal score 30
Selected taxa
Spaced k-mers 1
Preload mode 0
Spaced k-mer pattern
Local temporary path
Threads 20
Compressed 0
Verbosity 3
TMscore threshold 0.5
LDDT threshold 0
Sort by structure bit score 1
Alignment type 2
Exact TMscore 0
Add backtrace false
Alignment mode 0
Alignment mode 0
E-value threshold 10
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max reject 2147483647
Max accept 2147483647
Gap open cost aa:10,nucl:10
Gap extension cost aa:1,nucl:1
TMalign hit order 0
TMalign fast 1
Cluster mode 0
Max connected component depth 1000
Similarity type 2
Weight file name
Cluster Weight threshold 0.9
Single step clustering false
Cascaded clustering steps 3
Cluster reassign false
Remove temporary files true
Force restart with latest tmp false
MPI runner
k-mers per sequence 21
Scale k-mers per sequence aa:0.000,nucl:0.200
Adjust k-mer length false
Shift hash 67
Include only extendable false
Skip repeating k-mers false
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Path to ProstT5
Chain name mode 0
Write mapping file 0
Mask b-factor threshold 0
Coord store mode 2
Write lookup file 1
Input format 0
File Inclusion Regex .*
File Exclusion Regex ^$
cluster tmp/7126666531623036926/input tmp/7126666531623036926/clu tmp/7126666531623036926/clu_tmp --tmscore-threshold 0.5 --remove-tmp-files 1
Set cluster sensitivity to -s 8.000000
Set cluster mode SET COVER
Set cluster iterations to 3
tmp/7126666531623036926/clu_tmp/4050237725070610072/input_step_redundancy_ca exists and will be overwritten
createsubdb tmp/7126666531623036926/clu_tmp/4050237725070610072/clu_redundancy tmp/7126666531623036926/input_ca tmp/7126666531623036926/clu_tmp/4050237725070610072/input_step_redundancy_ca -v 3 --subdb-mode 1
Time for merging to input_step_redundancy_ca: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 1ms
prefilter tmp/7126666531623036926/clu_tmp/4050237725070610072/input_step_redundancy_ss tmp/7126666531623036926/clu_tmp/4050237725070610072/input_step_redundancy_ss tmp/7126666531623036926/clu_tmp/4050237725070610072/pref_step0 --sub-mat 'aa:3di.out,nucl:3di.out' --seed-sub-mat 'aa:3di.out,nucl:3di.out' -s 1 -k 0 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 100 --split 0 --split-mode 2 --split-memory-limit 0 -c 0.8 --cov-mode 0 --comp-bias-corr 0 --comp-bias-corr-scale 1 --diag-score 0 --exact-kmer-matching 0 --mask 0 --mask-prob 0.9 --mask-lower-case 1 --min-ungapped-score 0 --add-self-matches 1 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 20 --compressed 0 -v 3
Query database size: 10 type: Aminoacid
Estimated memory consumption: 977M
Target database size: 10 type: Aminoacid
Index table k-mer threshold: 154 at k-mer size 6
Index table: counting k-mers
[=================================================================] 100.00% 10 0s 0ms
Index table: Masked residues: 0
No k-mer could be extracted for the database tmp/7126666531623036926/clu_tmp/4050237725070610072/input_step_redundancy_ss.
Maybe the sequences length is less than 14 residues.
Error: Prefilter step 0 died
Error: Search died
Your Environment
Which foldseek version was used (Statically-compiled, self-compiled, Conda, etc.): conda 9.427df8a
I'm not on the team, but I have had similar issues. The two things I suspect are the prefiltering step and the very low TM-score threshold you are using.
The prefiltering step groups similar proteins before the full alignment to save time, and it can be less stringent than the threshold you want to group by. Since you're still working with a small number of proteins, you can just add '--exhaustive-search' to skip this step entirely, though it may become quite slow if you move to larger datasets.
In this case you should probably also use a much stricter TM-score cutoff. I would start with something like 0.8, which I think is a reasonable threshold for homologous proteins, but you'll probably need some trial and error.
One last thought: check the qtmscore and ttmscore output. The proteins in the first PDB file are quite short, so normalizing the TM-score by one protein rather than the other might give you the signal you want.
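To illustrate why the two normalizations can disagree for short chains, here is a minimal Python sketch of the standard TM-score formula, TM = (1/L) Σ 1 / (1 + (d_i / d0(L))²), with d0(L) = 1.24·(L − 15)^(1/3) − 1.8. The function names, the toy distances, and the 0.5 Å floor on d0 (borrowed from what TM-align does for short chains) are assumptions for illustration only; this is not Foldseek's actual code, and its internal computation may differ in detail.

```python
def d0(length: int) -> float:
    """Distance scale d0(L) from the standard TM-score definition.

    Assumption: floor the value at 0.5 Angstrom for short chains,
    where the cube-root formula would be tiny or undefined.
    """
    if length <= 15:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, norm_length: int) -> float:
    """TM-score of one alignment, normalized by a chosen chain length.

    distances: C-alpha distances (Angstrom) of the aligned residue pairs
               after superposition (hypothetical toy values here).
    norm_length: length of the chain used for normalization
                 (query length -> qtmscore-like, target -> ttmscore-like).
    """
    scale = d0(norm_length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / norm_length


if __name__ == "__main__":
    # Toy case: a 20-residue chain aligned perfectly onto part of a
    # 100-residue chain. All 20 aligned pairs superpose exactly.
    aligned = [0.0] * 20
    print("normalized by short chain:", tm_score(aligned, 20))
    print("normalized by long chain: ", tm_score(aligned, 100))
```

In this toy case the same alignment scores far higher when normalized by the short chain than by the long one, so a single threshold can pass under one normalization and fail under the other.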