
Question regarding "aatype_pred_num_tokens" #9

Open
smiles724 opened this issue Oct 17, 2024 · 1 comment

Comments


smiles724 commented Oct 17, 2024

Hi Jason,

Asking another question; thanks in advance for your patience.

In your prediction model, you always set aatype_pred_num_tokens to 21: the 20 standard amino acid types plus an additional mask token. Meanwhile, in the loss computation, you manually set the mask-token dimension to 1e-9, a value close to 0.

So I wonder why the model cannot directly predict 20 tokens, without the mask token. Is there any advantage to the current implementation?

The only reason I can guess is that, since you use an extra padding token (the same as the mask token), the loss calculation would raise an error if you only predicted 20 types.
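That guess can be illustrated with a minimal PyTorch sketch (assumed names like MASK_IDX are for illustration only, not taken from the repository): if the targets can contain the mask/pad index 20, cross-entropy over only 20 logits is out of range, while 21 logits work, and the mask class can still be suppressed at prediction time.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical setup: 20 canonical amino acids plus one mask/pad token.
NUM_AA = 20
MASK_IDX = 20  # index of the mask token in a 21-class vocabulary

logits = torch.randn(4, NUM_AA + 1)           # model predicts 21 classes
targets = torch.tensor([3, 7, MASK_IDX, 12])  # one position holds the mask token

# With only 20 output classes, a target index of 20 would be out of range
# and F.cross_entropy would raise an error; 21 classes avoid that.
loss = F.cross_entropy(logits, targets)

# Forcing the mask class's probability toward 1e-9 (as the question
# describes) keeps argmax predictions on the 20 real amino acid types.
probs = F.softmax(logits, dim=-1)
probs[:, MASK_IDX] = 1e-9
pred = probs.argmax(dim=-1)
assert (pred != MASK_IDX).all()
```

An alternative with the same effect would be `F.cross_entropy(..., ignore_index=MASK_IDX)` over 21 classes, which drops masked positions from the loss entirely.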

@jasonkyuyim
Owner

Both approaches are fine. I think we were going for shape compatibility by keeping the aatype dimension at 21, which simplified some torch operations.
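The shape-compatibility point can be sketched as follows (a hypothetical illustration, not the repository's actual code): with a fixed 21-class vocabulary, one-hot encodings of masked input sequences and the model's output logits share the same last dimension, so tensor operations compose without special-casing the mask token.

```python
import torch
import torch.nn.functional as F

# Hypothetical illustration of keeping the aatype dimension at 21.
NUM_TOKENS = 21  # 20 amino acids + mask token at index 20

seq = torch.tensor([[5, 20, 11]])             # a sequence with a masked position
one_hot = F.one_hot(seq, NUM_TOKENS).float()  # shape (1, 3, 21)
logits = torch.randn(1, 3, NUM_TOKENS)        # model output, also (1, 3, 21)

# Mixing the input distribution with the predicted distribution requires
# matching shapes; with 20-class logits this would need padding or slicing.
mixed = 0.5 * one_hot + 0.5 * F.softmax(logits, dim=-1)
assert mixed.shape == (1, 3, NUM_TOKENS)
```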
