Covalent ligands: (non)leaving atoms and unrealistic bond lengths #159

alephreish · 2024-12-02T15:16:57Z

I'm trying to fold proteins with covalent ligands and face the following problems:

_chem_comp_atom.pdbx_leaving_atom_flag is ignored, such that the resulting structures contain all atoms of the ligand including those should have been removed - this can be remedied by removing them beforehand
the resulting bond lengths are incorrect (see below) - I haven't found a remedy for this

A patched version of helixfold3 also supports covalent bonds: it suffers from the first, less severe problem but it does produce realistic bond lengths.

I'm playing with bacteriorhodopsin as a test case. The leaving atom of the RET ligand is O1 and the linking atom is C15 which covalently binds to the NZ atom of a specific lysine residue. The affected bonds are: "C15 - NZ" which is the bond between the ligand and the linking atom of the amino acid and "NZ - CE" which is the bond between the linking atom of the amino acid and the next carbon of the side chain (see the snapshots below). The resulting bond lengths are as follows:

	C15 - NZ	NZ - CE	structure
5ZIM	1.350	1.482	ground truth
HF3	1.228	1.497	helixfold3 with the O1 atom removed
AF3	1.110	0.960	alphafold3 with the O1 atom removed
AF3 (with O1)	1.143	0.968	alphafold with the O1 in place (the default behavior of AF3)

As can be seen, both bonds are affected but the bond within the side chain of the amino acid ("NZ - CE") is affected even more which is really weird.

Below are snapshots from pymol. The ligand is in red, the protein in orange and the linking atom of the lysine residue is in blue. Notice that , the structure of ligand+lysine is much closer to reality in the case of helixfold3. (Notice that pymol automatically connects the atoms C15 and CE in the AF3 structures because they are way too close to each either. Weird things happen with other software as well: e.g. if you try to convert the structures with Open Babel, it incorrectly interprets the "NZ - CE" bond in the AF3 structures as double):

5ZIM	HF3

AF3	AF3 with O1

Attached is AF3.json which has userCCD containing the ligand with the leaving atoms (O1) and hydrogens removed. For the default behavior of AF3, remove userCCD.

The text was updated successfully, but these errors were encountered:

alephreish · 2024-12-02T22:01:17Z

Here is a different test case: 5AQD, PEB bound to cystein (no atoms leaving).

	CAA - SG	SG - CB
5AQD	1.801	1.819
HF3	1.775	1.827
AF3	1.773	1.345

"CAA - SG" is the bond between the ligand and Cys82 and "SG - CB" is the bond between the linking sulphur atom and the next carbon atom in the side chain. The "CAA - SG" bond is reasonably close to the expectation but the "SG - CB" is missing ~0.5A - roughly the same difference as in the case of the "NZ - CE" bond above.

sunshine20241203 · 2024-12-03T09:42:55Z

Hi, I am recently performing AF3 for covalent interaction prediction, could you share that how you build your CCD file for you ligand? Mannually or by some tool?

alephreish · 2024-12-03T09:58:58Z

All of the ligands I'm interested in are either available in ccd or can be derived from those. For the task of removing the leaving atoms, I use pdbeccdutils.

sunshine20241203 · 2024-12-03T11:21:14Z

thanks a lot! I tried a ligand which has ccd file，I am sure that the program runs well according to the log information,but af3 did not give me any result. Do you think it is due to that the leaving atoms have not been removed in the ccd file?Beilei LeiNorthwest A&F University在 2024年12月3日，17:59，Andrey Rozenberg ***@***.***> 写道： All of the ligands I'm interested in are either available in ccd or can be derived from those. For the task of removing the leaving atoms, I use pdbeccdutils. —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: ***@***.***>

sunshine20241203 · 2024-12-03T13:09:22Z

ok，so sorry about it.Beilei LeiNorthwest A&F University在 2024年12月3日，20:17，Andrey Rozenberg ***@***.***> 写道： @sunshine20241203 This is not the right place to discuss this, sorry. I've opened this issue to bring attention to the specific problem in question. —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>

YoavShamir5 · 2024-12-05T09:54:59Z

I am getting similar results in terms of unrealistic bond lengths. In the following example, I am predicting a covalent complex (PDB id: 5P9J). In my case, I downloaded the cif of the covalent ligand, and then provided it as a user-provided CCD. The issue is similar to what @alephreish described - the length of the bond within the cysteine sidechain (CB-SG bond) is predicted with an unrealistic length of 1.1A. The covalent attachment bond (connecting cysteine SG with atom CAA of the ligand) has a reasonable length. I should note that the prediction complex and the ground truth overlay very well in terms of RMSD (see figure), so it is possible that this is simply the performance of AF3 on covalent complexes - good RMSD, with a single unrealistic bond length:

joshabramson · 2024-12-06T16:44:54Z

Thanks for these detailed reports.

AF3 only handles leaving atoms properly for Glycans - for general polymer-ligand covalent bonds we did not drop the leaving atoms from the predictions during training or inference. Because during training no position losses were applied for the excess atom, and in some cases the leaving atom is not inlcuded in the molecule definition, it is not too surprising that the positions of the other non-leaving atoms are not greatly affected by inclusion of exclusion of the leaving atom. Overall, manually removing the atom after running the prediction is probably fine.

Regarding the bad bond lengths around the covalent bond, this might be resolvable by running a short relaxation or idealisation procedure on the output. We tried this by manually removing the O1 atom and idealising in REFMAC5 and it resolved the issue (note that we had to convert the output from cif to pdb format for it to work in REFMAC5).

alephreish · 2024-12-06T16:55:28Z

@joshabramson Thanks for the response. Sure, minimization does fix the bonds (we've doing it with gromacs). A fix on the side of alphafold3 itself would be cleaner though.

joshabramson · 2024-12-06T17:49:13Z

Apologies, we do not currently plan to make model upgrades for this repository.

joshabramson · 2024-12-07T10:54:10Z

However, one can try the following to resolve things without changing the model: define a custom ccd code in the input and use that as a modified residue ptm. Two things to try:

Replace the bonded LYS residue by a custom version that is actually identical to the original (must use a different non-existing code, e.g. LYS-mod) - this will force the model to view the residue as a modified residue and thus use one model 'token' per atom, rather than putting the whole residue into one model token. This might affect the quality of the bond.
Replace the whole LYS+RET combination by a single large molecule and set that as the modified residue (replacing the original LYS). The covalent bond will need to be internal to the molecule definition, and the O1 atom should not be present.

alephreish · 2024-12-07T19:02:40Z

@joshabramson The second option (to represent the entire part of residue+ligand as a single modified residue using modifications) works correctly: the bond lengths are fairly close to the ground truth. Thanks a lot for the idea! This is a very straightforward approach for ligands covalently bound to a single residue. For ligand moieties attached to multiple residues simultaneously (e.g. /5aqd/KC/R/PEB`188/ in 5AQD), one has to explore how modifications plays out with bondedAtomPairs.

@YoavShamir5 and anyone else interested in implementing this:

Some residue+ligand combinations are directly available in ccd (this is the case e.g. with LYS+RET=LYR). alphafold3 does remove the leaving atom (OXT) in this case. In other cases: If some existing structures have the ligand of interest, the easiest is probably to take the coordinates and the bonds for the ligand and the residue from the real structures. Otherwise, one has to connect the ligand with its residue manually or programmatically.

YoavShamir5 · 2024-12-08T18:37:29Z

Thanks a lot, @alephreish and @joshabramson! Really helpful.
Regarding the second method suggested - I wonder if this enables one to run MSA and template search (run_data_pipeline) just once for a protein, and then predict (run_inference) for multiple covalent ligands using those results. I guess one can use the unmodified protein sequence for MSA, and then for each ligand of interest, one can modify the json generated by run_data_pipeline (manually modifying the "modifications" field), as input to the inference step. Does this sound reasonable? Or should residue modifications be included in the data pipeline (in which case - the data step should be run for each covalent ligand of choice).

joshabramson · 2024-12-08T18:49:28Z

Note that msa and templates can be given directly as input via the input json, which stops automatic computation of the msa/templates, so one only needs to compute them once then can reuse.

However there are currently issues with msa for modified residues that may get in the way, see #54

YoavShamir5 · 2024-12-08T19:01:41Z

@joshabramson thanks a lot for your input, I am not sure I am totally clear about your point - I suggested running run_inference on the unmodified sequence, and then modifying the generated output json by mentioning a different modification each time we run inference on a new ligand. Is this what you suggest to do in cases where multiple covalent ligands are to be predicted with the same protein?

joshabramson · 2024-12-08T19:10:18Z

Yes, you can use the output from the first inference to give you msa and templates for future runs where you are only switching one modified residue in the input.

alephreish · 2024-12-17T20:25:50Z

I've been checking out the first option (to force the ligand-binding residue to be represented as atoms by renaming it with modifications and to connect it to the ligand with bondedAtomPairs) as it would be more flexible. From my very preliminary experiments, it seems to work to some degree: The bond lengths are only a bit shorter but the angles between the bonds around the residue-ligand connection are off. I'm planning to stick with the second option, but if someone wants to continue exploring this: to avoid #54 (before it is properly fixed), use a residue name in "ptmType" which maps to the original amino acid in CCD_NAME_TO_ONE_LETTER (e.g. for arginine, pick e.g. "modifications": [ { "ptmType": "AR7" } ] and re-define AR7 in userCCD).

Augustin-Zidek · 2024-12-17T21:22:01Z

@alephreish I fixed #54 today.

YoavShamir5 · 2024-12-24T08:43:06Z

Regarding the second option (user-provided CCD of the ligand + residue) - is there any documentation regarding preparation of this CCD? I guess it is the same as the general documentation for user-provided CCDs, but for example, is it important for atom names of the residues to be preserved (residue's CB must be named CB etc.)?

alephreish · 2024-12-24T09:15:17Z

I personally use pymol API (e.g. to fuse the ligand with the amino acid or to remove leaving atoms) but the cif it outputs is oversimplified. To make to it work, I have to combine it with gemmi to update the cif with new coordinates, atoms and bonds. There are probably better ways of doing it.

I do not think atom names matter at all as long as the elements are correct. They might be important for downstream analyses though. E.g. in my pipeline, I have a post-processing step which separates the ligand from its amino acid into a residue of its own, at which point I also rename the atoms of the ligand if needed.

YoavShamir5 · 2024-12-24T09:28:33Z

Thanks a lot for the advice, @alephreish! Finally, do you know if the user-provided PTM-residue should be in its original form prior to formation of the polypeptide (e.g. it should include the OH group) or not? I guess that it should, because CCD residue entries do have this form. In any case, I'd recommend to the AF3 team to add a few lines in the documentation for users interested in user-provided PTMs.

alephreish · 2024-12-24T17:38:02Z

@YoavShamir5 The modified residue should have the N, CA, C, O and OXT atoms, the rest is up to you. The bond with the ligand should be as you want it to be. If it might be interesting to you or anyone else, I can release the code I use to fuse the ligands with the amino acids and convert it to ccd cif.

YoavShamir5 · 2024-12-25T08:45:35Z

Thanks a lot @alephreish, I can generate the fused residue-ligand on my side. Regarding minimization of the original covalent option via GROMACS or REFMAC5 - did you enforce distance restraints between the covalently bound atom of the ligand and the covalent residue of the protein?

Augustin-Zidek added the question Further information is requested label Dec 2, 2024

This comment was marked as off-topic.

Sign in to view

alephreish closed this as completed Dec 6, 2024

joshabramson reopened this Dec 7, 2024

alephreish closed this as completed Dec 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Covalent ligands: (non)leaving atoms and unrealistic bond lengths #159

Covalent ligands: (non)leaving atoms and unrealistic bond lengths #159

alephreish commented Dec 2, 2024 •

edited

Loading

alephreish commented Dec 2, 2024 •

edited

Loading

sunshine20241203 commented Dec 3, 2024

alephreish commented Dec 3, 2024

sunshine20241203 commented Dec 3, 2024 via email

This comment was marked as off-topic.

sunshine20241203 commented Dec 3, 2024 via email

YoavShamir5 commented Dec 5, 2024

joshabramson commented Dec 6, 2024

alephreish commented Dec 6, 2024

joshabramson commented Dec 6, 2024

joshabramson commented Dec 7, 2024

alephreish commented Dec 7, 2024 •

edited

Loading

YoavShamir5 commented Dec 8, 2024

joshabramson commented Dec 8, 2024 •

edited

Loading

YoavShamir5 commented Dec 8, 2024

joshabramson commented Dec 8, 2024

alephreish commented Dec 17, 2024

Augustin-Zidek commented Dec 17, 2024

YoavShamir5 commented Dec 24, 2024

alephreish commented Dec 24, 2024

YoavShamir5 commented Dec 24, 2024

alephreish commented Dec 24, 2024

YoavShamir5 commented Dec 25, 2024

Covalent ligands: (non)leaving atoms and unrealistic bond lengths #159

Covalent ligands: (non)leaving atoms and unrealistic bond lengths #159

Comments

alephreish commented Dec 2, 2024 • edited Loading

alephreish commented Dec 2, 2024 • edited Loading

sunshine20241203 commented Dec 3, 2024

alephreish commented Dec 3, 2024

sunshine20241203 commented Dec 3, 2024 via email

This comment was marked as off-topic.

sunshine20241203 commented Dec 3, 2024 via email

YoavShamir5 commented Dec 5, 2024

joshabramson commented Dec 6, 2024

alephreish commented Dec 6, 2024

joshabramson commented Dec 6, 2024

joshabramson commented Dec 7, 2024

alephreish commented Dec 7, 2024 • edited Loading

YoavShamir5 commented Dec 8, 2024

joshabramson commented Dec 8, 2024 • edited Loading

YoavShamir5 commented Dec 8, 2024

joshabramson commented Dec 8, 2024

alephreish commented Dec 17, 2024

Augustin-Zidek commented Dec 17, 2024

YoavShamir5 commented Dec 24, 2024

alephreish commented Dec 24, 2024

YoavShamir5 commented Dec 24, 2024

alephreish commented Dec 24, 2024

YoavShamir5 commented Dec 25, 2024

alephreish commented Dec 2, 2024 •

edited

Loading

alephreish commented Dec 2, 2024 •

edited

Loading

alephreish commented Dec 7, 2024 •

edited

Loading

joshabramson commented Dec 8, 2024 •

edited

Loading