-
Notifications
You must be signed in to change notification settings - Fork 710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Covalent ligands: (non)leaving atoms and unrealistic bond lengths #159
Comments
Here is a different test case: 5AQD, PEB bound to cystein (no atoms leaving).
"CAA - SG" is the bond between the ligand and Cys82 and "SG - CB" is the bond between the linking sulphur atom and the next carbon atom in the side chain. The "CAA - SG" bond is reasonably close to the expectation but the "SG - CB" is missing ~0.5A - roughly the same difference as in the case of the "NZ - CE" bond above. |
Hi, I am recently performing AF3 for covalent interaction prediction, could you share that how you build your CCD file for you ligand? Mannually or by some tool? |
All of the ligands I'm interested in are either available in ccd or can be derived from those. For the task of removing the leaving atoms, I use pdbeccdutils. |
thanks a lot! I tried a ligand which has ccd file,I am sure that the program runs well according to the log information,but af3 did not give me any result. Do you think it is due to that the leaving atoms have not been removed in the ccd file?Beilei LeiNorthwest A&F University在 2024年12月3日,17:59,Andrey Rozenberg ***@***.***> 写道:
All of the ligands I'm interested in are either available in ccd or can be derived from those. For the task of removing the leaving atoms, I use pdbeccdutils.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: ***@***.***>
|
This comment was marked as off-topic.
This comment was marked as off-topic.
ok,so sorry about it.Beilei LeiNorthwest A&F University在 2024年12月3日,20:17,Andrey Rozenberg ***@***.***> 写道:
@sunshine20241203 This is not the right place to discuss this, sorry. I've opened this issue to bring attention to the specific problem in question.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>
|
I am getting similar results in terms of unrealistic bond lengths. In the following example, I am predicting a covalent complex (PDB id: 5P9J). In my case, I downloaded the cif of the covalent ligand, and then provided it as a user-provided CCD. The issue is similar to what @alephreish described - the length of the bond within the cysteine sidechain (CB-SG bond) is predicted with an unrealistic length of 1.1A. The covalent attachment bond (connecting cysteine SG with atom CAA of the ligand) has a reasonable length. I should note that the prediction complex and the ground truth overlay very well in terms of RMSD (see figure), so it is possible that this is simply the performance of AF3 on covalent complexes - good RMSD, with a single unrealistic bond length: |
Thanks for these detailed reports. AF3 only handles leaving atoms properly for Glycans - for general polymer-ligand covalent bonds we did not drop the leaving atoms from the predictions during training or inference. Because during training no position losses were applied for the excess atom, and in some cases the leaving atom is not inlcuded in the molecule definition, it is not too surprising that the positions of the other non-leaving atoms are not greatly affected by inclusion of exclusion of the leaving atom. Overall, manually removing the atom after running the prediction is probably fine. Regarding the bad bond lengths around the covalent bond, this might be resolvable by running a short relaxation or idealisation procedure on the output. We tried this by manually removing the O1 atom and idealising in REFMAC5 and it resolved the issue (note that we had to convert the output from cif to pdb format for it to work in REFMAC5). |
@joshabramson Thanks for the response. Sure, minimization does fix the bonds (we've doing it with gromacs). A fix on the side of alphafold3 itself would be cleaner though. |
Apologies, we do not currently plan to make model upgrades for this repository. |
However, one can try the following to resolve things without changing the model: define a custom ccd code in the input and use that as a modified residue ptm. Two things to try:
|
@joshabramson The second option (to represent the entire part of residue+ligand as a single modified residue using @YoavShamir5 and anyone else interested in implementing this: Some residue+ligand combinations are directly available in ccd (this is the case e.g. with LYS+RET=LYR). alphafold3 does remove the leaving atom (OXT) in this case. In other cases: If some existing structures have the ligand of interest, the easiest is probably to take the coordinates and the bonds for the ligand and the residue from the real structures. Otherwise, one has to connect the ligand with its residue manually or programmatically. |
Thanks a lot, @alephreish and @joshabramson! Really helpful. |
Note that msa and templates can be given directly as input via the input json, which stops automatic computation of the msa/templates, so one only needs to compute them once then can reuse. However there are currently issues with msa for modified residues that may get in the way, see #54 |
@joshabramson thanks a lot for your input, I am not sure I am totally clear about your point - I suggested running run_inference on the unmodified sequence, and then modifying the generated output json by mentioning a different modification each time we run inference on a new ligand. Is this what you suggest to do in cases where multiple covalent ligands are to be predicted with the same protein? |
Yes, you can use the output from the first inference to give you msa and templates for future runs where you are only switching one modified residue in the input. |
I've been checking out the first option (to force the ligand-binding residue to be represented as atoms by renaming it with |
@alephreish I fixed #54 today. |
Regarding the second option (user-provided CCD of the ligand + residue) - is there any documentation regarding preparation of this CCD? I guess it is the same as the general documentation for user-provided CCDs, but for example, is it important for atom names of the residues to be preserved (residue's CB must be named CB etc.)? |
I personally use pymol API (e.g. to fuse the ligand with the amino acid or to remove leaving atoms) but the cif it outputs is oversimplified. To make to it work, I have to combine it with gemmi to update the cif with new coordinates, atoms and bonds. There are probably better ways of doing it. I do not think atom names matter at all as long as the elements are correct. They might be important for downstream analyses though. E.g. in my pipeline, I have a post-processing step which separates the ligand from its amino acid into a residue of its own, at which point I also rename the atoms of the ligand if needed. |
Thanks a lot for the advice, @alephreish! Finally, do you know if the user-provided PTM-residue should be in its original form prior to formation of the polypeptide (e.g. it should include the OH group) or not? I guess that it should, because CCD residue entries do have this form. In any case, I'd recommend to the AF3 team to add a few lines in the documentation for users interested in user-provided PTMs. |
@YoavShamir5 The modified residue should have the N, CA, C, O and OXT atoms, the rest is up to you. The bond with the ligand should be as you want it to be. If it might be interesting to you or anyone else, I can release the code I use to fuse the ligands with the amino acids and convert it to ccd cif. |
Thanks a lot @alephreish, I can generate the fused residue-ligand on my side. Regarding minimization of the original covalent option via GROMACS or REFMAC5 - did you enforce distance restraints between the covalently bound atom of the ligand and the covalent residue of the protein? |
I'm trying to fold proteins with covalent ligands and face the following problems:
A patched version of helixfold3 also supports covalent bonds: it suffers from the first, less severe problem but it does produce realistic bond lengths.
I'm playing with bacteriorhodopsin as a test case. The leaving atom of the RET ligand is O1 and the linking atom is C15 which covalently binds to the NZ atom of a specific lysine residue. The affected bonds are: "C15 - NZ" which is the bond between the ligand and the linking atom of the amino acid and "NZ - CE" which is the bond between the linking atom of the amino acid and the next carbon of the side chain (see the snapshots below). The resulting bond lengths are as follows:
As can be seen, both bonds are affected but the bond within the side chain of the amino acid ("NZ - CE") is affected even more which is really weird.
Below are snapshots from pymol. The ligand is in red, the protein in orange and the linking atom of the lysine residue is in blue. Notice that , the structure of ligand+lysine is much closer to reality in the case of helixfold3. (Notice that pymol automatically connects the atoms C15 and CE in the AF3 structures because they are way too close to each either. Weird things happen with other software as well: e.g. if you try to convert the structures with Open Babel, it incorrectly interprets the "NZ - CE" bond in the AF3 structures as double):
Attached is AF3.json which has
userCCD
containing the ligand with the leaving atoms (O1) and hydrogens removed. For the default behavior of AF3, removeuserCCD
.The text was updated successfully, but these errors were encountered: