Add tcr epitope binding dataset #67
base: main
data/tcr_epitope_binding/meta.yaml
  - tcr binding affinity
  - binding affinity
  - binding
I think it would be better to also include the binding site in all the synonyms, e.g., "epitope binding affinity".
@kjappelbaum I included "epitope binding affinity" and also "epitope binding" as synonyms.
data/tcr_epitope_binding/meta.yaml
- id: epitope_smiles
  type: SMILES
  description: 'epitope smiles '
- id: epitope_aa
  type: amino acid
  description: epitope amino acid sequence
- id: tcr_aa
  type: amino acid
  description: tcr amino acid sequence
Do I understand the dataset correctly that binding only makes sense if we specify both the TCR and the epitope?
That is right: given an epitope and a TCR, the task is to predict whether the pair binds.
In this case, we will need to add templates to sample this data correctly. There are examples of templates in the Contribution Guide. Let me know if you want a hand with this.
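The key constraint for such templates is that every sampled text must mention both the epitope and the TCR, because the label only describes the pair. The sketch below illustrates that idea with plain Python `str.format`; the template strings, the `render` helper, and the `label` key are hypothetical, and this is not the project's own template syntax (see the Contribution Guide for that).

```python
import random

# Hypothetical templates: each mentions BOTH the epitope and the TCR, since
# the binding label is only meaningful for the pair. Placeholder names reuse
# the ids from meta.yaml; the syntax is plain str.format.
TEMPLATES = [
    "The T-cell receptor {tcr_aa} {verb} the epitope {epitope_aa}.",
    "Does the TCR with sequence {tcr_aa} bind the epitope {epitope_aa}? {answer}",
]


def render(row: dict, rng: random.Random) -> str:
    """Fill a randomly chosen template with one (epitope, TCR, label) row."""
    binds = bool(row["label"])
    template = rng.choice(TEMPLATES)
    # str.format ignores keyword arguments a template does not use.
    return template.format(
        tcr_aa=row["tcr_aa"],
        epitope_aa=row["epitope_aa"],
        verb="binds" if binds else "does not bind",
        answer="Yes" if binds else "No",
    )
```

Passing a seeded `random.Random` keeps the sampled texts reproducible across runs.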
@kjappelbaum Thanks for the feedback. I attempted to add a template, but I am not sure I fully understand what to do here. Could you please have a look and help me with this?
Thanks for your contribution 💯
I do not think I fully understand the dataset yet; perhaps you can help me?
…strubeyj/chemnlp into Add-TCR-epitope-binding-dataset
Thanks again for your contribution. Before we merge, we should add the templates for sampling, as mentioned in one of my comments.
I added meta.yaml, transform.py, and example_processing_and_templates.ipynb for the TCR-epitope binding data from Therapeutics Data Commons (TDC). The dataset contains epitope (SMILES and amino acid sequence) and TCR (amino acid sequence) pairs, each with a binary binding label. The data were used in the paper by Weber et al.
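A transform step of this kind typically renames raw columns to the ids in meta.yaml and coerces the label to 0/1. The following is a minimal, stdlib-only sketch of that idea, not the PR's actual transform.py; the raw column names in `COLUMN_MAP` are placeholders, not the real TDC column names, and should be checked against the downloaded file.

```python
import csv
import io

# Hypothetical mapping from raw column names to the ids used in meta.yaml.
# The raw names are placeholders, NOT the actual TDC column names.
COLUMN_MAP = {
    "epitope_smi": "epitope_smiles",
    "epitope_seq": "epitope_aa",
    "tcr_seq": "tcr_aa",
    "y": "label",
}


def transform(raw_csv: str) -> list[dict]:
    """Rename columns, drop incomplete rows, and cast the label to 0/1."""
    rows = []
    for raw in csv.DictReader(io.StringIO(raw_csv)):
        # Missing or blank cells become "" so they can be filtered out below.
        row = {new: (raw.get(old) or "").strip() for old, new in COLUMN_MAP.items()}
        if not all(row.values()):
            continue  # skip rows missing any field
        row["label"] = int(float(row["label"]))  # e.g. "1" or "0.0" -> 1 / 0
        rows.append(row)
    return rows
```

Dropping incomplete rows rather than imputing them is a design choice here: a pair with a missing sequence cannot be rendered into a template that mentions both partners.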