Foyer allows users to describe atomtypes using a modified version of SMARTS You may already be familiar with SMILES representations for describing chemical structures. SMARTS is a straightforward extension of this notation.
Consider the following example defining the OPLS-AA atomtypes for a methyl group carbon and its hydrogen atoms:
<ForceField>
<AtomTypes>
<Type name="opls_135" class="CT" element="C" mass="12.01100" def="[C;X4](C)(H)(H)H" desc="alkane CH3"/>
<Type name="opls_140" class="HC" element="H" mass="1.00800" def="H[C;X4]" desc="alkane H"/>
</AtomTypes>
</ForceField>
This .xml
format is an extension of the OpenMM force field format
The above example utilizes two additional .xml
attributes supported by foyer:
def
and desc
. The atomtype that we are attempting to match is always the
first token in the SMARTS string, in the above example, [C;X4]
and H
.
The opls_135
(methyl group carbon) is defined by a SMARTS
string indicated a carbon with 4 bonds, a carbon neighbor and 3
hydrogen neighbors. The opls_140
(alkane hydrogen) is defined simply as a
hydrogen atom bonded to a carbon with 4 bonds.
When multiple atomtype definitions can apply to a given atom, we must establish precedence between those definitions. Many other atomtypers determine rule precedence by placing more specific rules first and evaluate those in sequence, breaking out of the loop as soon as a match is found.
While this approach works, it becomes more challenging to maintain the correct ordering of rules as the number of atomtypes grows. Foyer iteratively runs all rules on all atoms and each atom maintains a whitelist (rules that apply) and a blacklist (rules that have been superceded by another rule). The set difference between the white- and blacklists yields the correct atomtype if the force field is implemented correctly.
We can add a rule to a blacklist using the overrides
attribute in the .xml
file. To illustrate an example where overriding can be used consider the
following types describing alkenes and benzene:
<ForceField>
<AtomTypes>
<Type name="opls_141" class="CM" element="C" mass="12.01100" def="[C;X3](C)(C)C" desc="alkene C (R2-C=)"/>
<Type name="opls_142" class="CM" element="C" mass="12.01100" def="[C;X3](C)(C)H" desc="alkene C (RH-C=)"/>
<Type name="opls_144" class="HC" element="H" mass="1.00800" def="[H][C;X3]" desc="alkene H"/>
<Type name="opls_145" class="CA" element="C" mass="12.01100" def="[C;X3;r6]1[C;X3;r6][C;X3;r6][C;X3;r6][C;X3;r6][C;X3;r6]1" overrides="opls_141,opls_142"/>
<Type name="opls_146" class="HA" element="H" mass="1.00800" def="[H][C;%opls_145]" overrides="opls_144" desc="benzene H"/>
</AtomTypes>
</ForceField>
If we're atomtyping a benzene molecule, the carbon atoms will match the SMARTS
patterns for both opls_142
and opls_145
. Without the overrides
attribute,
foyer will notify you that multiple atomtypes were found for each carbon.
Providing the overrides
indicates that if the opls_145
pattern matches, it
should supercede the specified rules.
We currently do not (yet) support all of SMARTS' features. Here we're keeping track of which portions are supported.