
Question regarding "aatype_pred_num_tokens" #9

Open
smiles724 opened this issue Oct 17, 2024 · 1 comment

Comments


smiles724 commented Oct 17, 2024

Hi Jason,

Asking another question; thanks in advance for your patience.

In your prediction model, you always set aatype_pred_num_tokens to 21: the 20 standard amino acid types plus an additional mask token. Meanwhile, in the loss computation, you manually set the mask-token dimension to 1e-9, a value close to 0.

So I wonder why the model cannot directly predict 20 tokens, without the mask token. Is there any advantage to the current implementation?

The only reason I can guess is that, since you use an extra padding token (the same as the mask token), the loss calculation would raise an error if you only predicted 20 types.
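That guess can be illustrated with a minimal PyTorch sketch (assumed names like MASK_IDX are for illustration only, not taken from the repository): if the targets can contain the mask/pad index 20, cross-entropy over only 20 logits is out of range, while 21 logits work, and the mask class can still be suppressed at prediction time.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical setup: 20 canonical amino acids plus one mask/pad token.
NUM_AA = 20
MASK_IDX = 20  # index of the mask token in a 21-class vocabulary

logits = torch.randn(4, NUM_AA + 1)           # model predicts 21 classes
targets = torch.tensor([3, 7, MASK_IDX, 12])  # one position holds the mask token

# With only 20 output classes, a target index of 20 would be out of range
# and F.cross_entropy would raise an error; 21 classes avoid that.
loss = F.cross_entropy(logits, targets)

# Forcing the mask class's probability toward 1e-9 (as the question
# describes) keeps argmax predictions on the 20 real amino acid types.
probs = F.softmax(logits, dim=-1)
probs[:, MASK_IDX] = 1e-9
pred = probs.argmax(dim=-1)
assert (pred != MASK_IDX).all()
```

An alternative with the same effect would be `F.cross_entropy(..., ignore_index=MASK_IDX)` over 21 classes, which drops masked positions from the loss entirely.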

@jasonkyuyim
Owner

Both approaches are fine. I think we were going for shape compatibility by keeping the aatype dimension at 21, which simplified some torch operations.
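The shape-compatibility point can be sketched as follows (a hypothetical illustration, not the repository's actual code): with a fixed 21-class vocabulary, one-hot encodings of masked input sequences and the model's output logits share the same last dimension, so tensor operations compose without special-casing the mask token.

```python
import torch
import torch.nn.functional as F

# Hypothetical illustration of keeping the aatype dimension at 21.
NUM_TOKENS = 21  # 20 amino acids + mask token at index 20

seq = torch.tensor([[5, 20, 11]])             # a sequence with a masked position
one_hot = F.one_hot(seq, NUM_TOKENS).float()  # shape (1, 3, 21)
logits = torch.randn(1, 3, NUM_TOKENS)        # model output, also (1, 3, 21)

# Mixing the input distribution with the predicted distribution requires
# matching shapes; with 20-class logits this would need padding or slicing.
mixed = 0.5 * one_hot + 0.5 * F.softmax(logits, dim=-1)
assert mixed.shape == (1, 3, NUM_TOKENS)
```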
