Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make charge symmetrization stereo-aware #738

Closed
wants to merge 8 commits into from
Closed

Conversation

maxentile
Copy link
Collaborator

@maxentile maxentile commented May 10, 2022

  • Disables passing symmetrize=True to the AM1 routine (since this appears to average over stereo-oblivious atom classes, and occasionally introduces very small asymmetries in cases like benzene)
  • Symmetrizes by averaging over RDKit stereo-sensitive atom classes
  • Tests on benzene (where 2 distinct charges are expected, but 3 were produced)
  • Tests on a case where symmetry is broken by stereo configuration (where n_atoms distinct charges are expected, but ~n_atoms/2 were produced)

Initial PR description:

symmetrize=True flag was already being passed to toolkit functions but there were still small asymmetries in some cases (see e.g. #737 (comment) )

Issue magnitude: very very small. (In 82 of 642 FreeSolv molecules, the partial charges within a symmetry class varied up to ~0.0002 e. However, it's still nice for len(set(assigned_charges)) for benzene to be 2 rather than 3.)

symmetrize=True flag was already being passed to toolkit functions but there were still small asymmetries in some cases
@maxentile maxentile marked this pull request as ready for review May 11, 2022 13:59
@maxentile maxentile requested a review from jkausrelay May 11, 2022 14:20
@maxentile maxentile requested review from badisa and proteneer May 11, 2022 20:39
from openeye import oechem

oemol = convert_to_oe(rdmol)
oechem.OEPerceiveSymmetry(oemol)
Copy link
Owner

@proteneer proteneer May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. is this stereo-aware? (i.e. a chiral graph may be asymmetric, but may be falsely perceived to be symmetric unless the generators of the automorphism group is aware of this)

  2. bool OEPerceiveSymmetry(OEMolBase &, bool includeH=true, bool automorph=false), what does automorph do here? API docs unfortunately are quite sparse. @bp-kelley @badisa might know...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After talking to Paul Hawkins, automorph will find symmetries within the molecule and exclude them.
Screen Shot 2022-05-23 at 8 15 47 AM
In the above image, it will account for the symmetry of the molecule over 7, 5, 4, 8. Note that this can blow up (IE take hours) in the case of large cyclic molecules

@proteneer
Copy link
Owner

To clarify on symmetry perception under stereochemistry, consider this molecule:

F\C=C\C=C\C=C/F
Screen Shot 2022-05-11 at 4 46 47 PM

@maxentile
Copy link
Collaborator Author

maxentile commented May 13, 2022

Sorry, not sure what the correct behavior is for that molecule.
Should the symmetry classes (or charges) be asymmetric within this molecule?

Numbered by symmetry class
image

Enumerating all configurations around the double bonds

F/C=C/C=C/C=C/F
F/C=C/C=C/C=C\F
F/C=C/C=C\C=C/F
F/C=C/C=C\C=C\F
F/C=C\C=C/C=C/F
F/C=C\C=C/C=C\F
F/C=C\C=C\C=C/F
F/C=C\C=C\C=C\F
F\C=C/C=C/C=C/F
F\C=C/C=C/C=C\F
F\C=C/C=C\C=C/F
F\C=C/C=C\C=C\F
F\C=C\C=C/C=C/F
F\C=C\C=C/C=C\F
F\C=C\C=C\C=C/F
F\C=C\C=C\C=C\F

The symmetry classes assigned by this method always appear the same

[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]

The charges currently assigned (without applying this extra symmetrize step) vary by stereoisomer, but are already exactly or nearly constant within each symmetry class.

Charges assigned to each stereoisomer

[-2.081, 0.824, -2.241, -1.359, -1.359, -2.241, 0.824, -2.081, 1.696, 1.673, 1.487, 1.487, 1.673, 1.696]
[-2.082, 0.794, -2.22, -1.398, -1.398, -2.22, 0.794, -2.081, 1.686, 1.69, 1.53, 1.53, 1.69, 1.686]
[-2.093, 0.804, -2.245, -1.346, -1.346, -2.245, 0.804, -2.092, 1.697, 1.649, 1.533, 1.533, 1.649, 1.697]
[-2.083, 0.818, -2.25, -1.333, -1.333, -2.25, 0.818, -2.082, 1.696, 1.689, 1.462, 1.462, 1.689, 1.696]
[-2.087, 0.801, -2.228, -1.355, -1.355, -2.228, 0.801, -2.086, 1.694, 1.654, 1.521, 1.521, 1.654, 1.694]
[-2.088, 0.788, -2.229, -1.376, -1.376, -2.229, 0.788, -2.087, 1.681, 1.623, 1.601, 1.601, 1.623, 1.681]
[-2.104, 0.757, -2.194, -1.419, -1.419, -2.194, 0.757, -2.103, 1.694, 1.693, 1.572, 1.572, 1.693, 1.694]
[-2.091, 0.789, -2.221, -1.389, -1.389, -2.221, 0.789, -2.09, 1.691, 1.687, 1.533, 1.533, 1.687, 1.691]
[-2.091, 0.789, -2.221, -1.389, -1.389, -2.221, 0.789, -2.09, 1.691, 1.687, 1.533, 1.533, 1.687, 1.691]
[-2.104, 0.757, -2.194, -1.419, -1.419, -2.194, 0.757, -2.103, 1.694, 1.693, 1.572, 1.572, 1.693, 1.694]
[-2.088, 0.788, -2.229, -1.376, -1.376, -2.229, 0.788, -2.087, 1.681, 1.623, 1.601, 1.601, 1.623, 1.681]
[-2.087, 0.801, -2.228, -1.355, -1.355, -2.228, 0.801, -2.086, 1.694, 1.654, 1.521, 1.521, 1.654, 1.694]
[-2.083, 0.818, -2.25, -1.333, -1.333, -2.25, 0.818, -2.082, 1.696, 1.689, 1.462, 1.462, 1.689, 1.696]
[-2.093, 0.804, -2.245, -1.346, -1.346, -2.245, 0.804, -2.092, 1.697, 1.649, 1.533, 1.533, 1.649, 1.697]
[-2.082, 0.794, -2.22, -1.398, -1.398, -2.22, 0.794, -2.081, 1.686, 1.69, 1.53, 1.53, 1.69, 1.686]
[-2.081, 0.824, -2.241, -1.359, -1.359, -2.241, 0.824, -2.081, 1.696, 1.673, 1.487, 1.487, 1.673, 1.696]

Deviation within each symmetry class for each stereoisomer

(max_c q_c - min_c q_c)
for c in sym_classes

0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00047
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00094
0 0 0 0 0 0 0.00094
0 0 0 0 0 0 0.00094
0 0 0 0 0 0 0.00094
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00047
0 0 0 0 0 0 0.00071

The effect of the change proposed in this PR on this molecule would be to eliminate the ~ 0.0001 e asymmetry in the charge assigned to the 2 fluorines.

@proteneer
Copy link
Owner

proteneer commented May 13, 2022

The two fluorines should not be in the same symmetry class (for the stereoisomer depicted in the diagram). Considering all conformations that satisfy the depicted stereo, running AM1 calculations should've resulted in different charges and should not be symmetrized. Although the diff is very small [-2.082 vs -2.081] shown below, they should be different!

[-2.082, 0.794, -2.22, -1.398, -1.398, -2.22, 0.794, -2.081, 1.686, 1.69, 1.53, 1.53, 1.69, 1.686]

I suspect it may be because AM1 is not sensitive enough on the raw charges for the example I provided, maybe try this one, where the environmental differences around the Fs are more drastic:

O/C(=C(/O)\C(\Cl)=C(\F)Cl)/C(/Cl)=C(\F)Cl

Screen Shot 2022-05-13 at 11 27 42 AM

@maxentile
Copy link
Collaborator Author

Ahh, thanks for the more dramatic example!

Enumerating all configurations around the double-bonds in the molecule O/C(=C(/O)\C(\Cl)=C(\F)Cl)/C(/Cl)=C(\F)Cl, again the picture is similar:
image

  • Proposed symmetry classes are stereo invariant (same assignment of 8 classes to 16 atoms for all stereoisomers)
  • Currently assigned charges are stereo sensitive (differences up to ~0.03 e across stereoisomers)
  • Currently assigned charges within each symmetry class within each stereoisomer are exactly or nearly constant (exactly constant for 7 classes, exception: difference of ~0.00003 e for the class containing oxygen)

If we want the charges to be stereo asymmetric, I wonder if we should modify these lines

AM1: oequacpac.OEAM1Charges(symmetrize=True),
AM1ELF10: oequacpac.OEELFCharges(oequacpac.OEAM1Charges(symmetrize=True), 10),
AM1BCC: oequacpac.OEAM1BCCCharges(symmetrize=True),

@maxentile
Copy link
Collaborator Author

Toggling symmetrize=True to symmetrize=False in AM1ELF10

AM1: oequacpac.OEAM1Charges(symmetrize=True),
AM1ELF10: oequacpac.OEELFCharges(oequacpac.OEAM1Charges(symmetrize=True), 10),
AM1BCC: oequacpac.OEAM1BCCCharges(symmetrize=True),
causes large differences of up to 0.1 e within each graph symmetry class within each stereoisomer of O/C(=C(/O)\C(\Cl)=C(\F)Cl)/C(/Cl)=C(\F)Cl (as should be expected).

@maxentile maxentile changed the title Add extra symmetrize step to oe_assign_charges Make charge symmetrization stereo-aware May 13, 2022
@maxentile
Copy link
Collaborator Author

Based on offline discussion with @proteneer -- looked into using the RDKit alternative https://www.rdkit.org/docs/source/rdkit.Chem.rdmolfiles.html#rdkit.Chem.rdmolfiles.CanonicalRankAtoms . This appears to have the desired behavior!

The PR now:

  • Disables passing symmetrize=True to the AM1 routine (since this appears to average over stereo-oblivious atom classes)
  • Symmetrizes the result by averaging over RDKit stereo-sensitive atom classes

The number of distinct partial charges is reduced on benzene (from 3 on master to 2 on a220996), and increased on O/C(=C(/O)\C(\Cl)=C(\F)Cl)/C(/Cl)=C(\F)Cl (from 9 on master to 16 on a220996)

Copy link
Owner

@proteneer proteneer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! would be good sanity check that this doesn't drastically affect FreeSolv results but LGTM!

@maxentile
Copy link
Collaborator Author

would be good sanity check that this doesn't drastically affect FreeSolv results

Checking effect on computed charges for FreeSolv molecules (ref: master (451803e) -> proposed: most recent PR commit (9ce6cb9))

Histogram of signed charge changes per atom (note log y scale)

image

For ~99.2% of atoms, the effect on charge is < 0.001 e, and ~99.7% of atoms, the effect on charge is < 0.01 e.

Histogram of unsigned charge changes per molecule (max over atoms within each molecule, note log x scale)
image

Displaying the molecules where the difference exceeds arbitrary threshold of 0.01 e
image

Interesting that these all have one or more nitrate groups...

Haven't yet checked the effect of the proposed change on computed hydration free energies, and I'd be curious also to compare the assigned symmetry classes, esp. for the plotted molecules.

@proteneer
Copy link
Owner

Can this be merged?

@proteneer
Copy link
Owner

Can you try this (thanks @bp-kelley for the suggestion on ResonanceMolSupplier!):

from rdkit import Chem
import numpy as np

def resonance_aware_ranking(mol):

    flags = Chem.ResonanceFlags() | Chem.ResonanceFlags.ALLOW_CHARGE_SEPARATION
    permutations = []

    for m in Chem.ResonanceMolSupplier(mol, flags):
        res = Chem.CanonicalRankAtoms(m, breakTies=False, includeChirality=True)
        permutations.append(list(res))

    permutations = np.array(permutations).T

    # identify which group each orbit belongs to
    iota = 0
    kv = dict()
    classes = []
    for a_idx, orbit in enumerate(permutations):
        canon_orbit = tuple(sorted(orbit))
        if canon_orbit not in kv:
            kv[canon_orbit] = iota
            iota += 1

        classes.append(kv[canon_orbit])

    return classes
        
print(resonance_aware_rank(Chem.AddHs(Chem.MolFromSmiles("[O-]C(=O)C"))))
print(resonance_aware_rank(Chem.AddHs(Chem.MolFromSmiles("CO[N+]([O-])=O"))))

@maxentile
Copy link
Collaborator Author

Thanks @proteneer and @bp-kelley -- will try this out!

(Just want to strengthen the test to assert that the difference between old and new symmetrization behavior is indeed limited to stereo-awareness.)

@maxentile
Copy link
Collaborator Author

Stale -- migrated to #817

@maxentile maxentile closed this Sep 7, 2022
@maxentile maxentile deleted the symmetrize-am1 branch November 7, 2022 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants