Native Standardiser. #55

Open
lewisacidic opened this issue Aug 29, 2016 · 9 comments

@lewisacidic
Owner

A native standardis(z)er would be a great addition to the library, as currently the only way to standardise molecules is using the ChemAxon Standardizer wrapper.

The implementation should provide a similar API to the current Standardizer, namely by inheriting from TransformFilter. It should be configurable in code, like the rest, which should also make it configurable with YAML and JSON.

>>> std = skchem.standardizers.Standardizer(remove_fragments=True, disconnect_metals=True, neutralize=True)
>>> m = skchem.Mol.from_smiles('CC.CCC', name='ethane_n_propane')
>>> std.transform(m).to_smiles()
'CCC'

>>> std.transform([m, skchem.Mol.from_smiles('CCO[Na]', name='sodium_ethoxide')])
batch
ethane_n_propane    CCC
sodium_ethoxide     CCO
Name: structure, dtype: object

Standardisation may be thought of as a series of elementary operations applied to molecules. These could be implemented as mini transformers, and the Standardizer could just be a Pipeline (this would probably require work on the Pipeline class!).

from skchem.standardizers import FragmentRemover, MetalDisconnector  # ...
from skchem.pipeline import Pipeline

std = Pipeline([
    FragmentRemover(remove_smallest=True),
    MetalDisconnector(keep_grignards=True),
    # ...
])

The issues with this are:

  1. This makes for a painfully verbose API.
  2. There is a predetermined 'most sensible'/'correct' order in which to perform the transforms (for example, it's probably better to remove fragments before tautomerizing, as tautomerizing fragments that will be discarded is wasted effort).
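
Point 2 can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not skchem code: the transformers operate on plain SMILES strings rather than Mol objects, fragment size is approximated by string length, and CountingStep merely counts the fragments it would have to process in place of a costly step such as tautomerizing.

```python
class FragmentRemover:
    """Stand-in: keep only the largest '.'-separated SMILES fragment."""

    def transform(self, smiles):
        # Character count is a crude proxy for fragment size (assumption).
        return max(smiles.split('.'), key=len)


class CountingStep:
    """Stand-in for an expensive per-fragment step such as tautomerizing."""

    def __init__(self):
        self.fragments_seen = 0

    def transform(self, smiles):
        # Record how many fragments this step was asked to process.
        self.fragments_seen += len(smiles.split('.'))
        return smiles  # a real step would rewrite each fragment here


class Pipeline:
    """Apply each transformer in turn, feeding its output to the next."""

    def __init__(self, objects):
        self.objects = list(objects)

    def transform(self, smiles):
        for obj in self.objects:
            smiles = obj.transform(smiles)
        return smiles


early, late = CountingStep(), CountingStep()
remove_first = Pipeline([FragmentRemover(), early])
remove_last = Pipeline([late, FragmentRemover()])

result_first = remove_first.transform('CC.CCC.O')
result_last = remove_last.transform('CC.CCC.O')

print(result_first, early.fragments_seen)
print(result_last, late.fragments_seen)
```

Both orderings return the same structure, but the expensive step handles one fragment when fragments are removed first and three when they are removed last.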

Perhaps it would be best to have a Standardiser object (that could possibly inherit from Pipeline) that in turn creates the smaller objects, and keeps sensible defaults.

class Standardizer(Pipeline):
    def __init__(self, remove_fragments=True, disconnect_metals=True, ...):
        # add in the 'sensible order'
        if remove_fragments:
            self.objects.append(FragmentRemover())
        # etc.

This makes it harder to have fine-grained control over these smaller objects, though (maybe we want to 'keep_grignards' or something), so perhaps we could pass the actual transformer when we want that control:

class Standardizer(Pipeline):
    def __init__(self, remove_fragments=True, disconnect_metals=True, ...):
        if remove_fragments:
            # a plain True means 'use the default transformer'
            if not isinstance(remove_fragments, Transformer):
                remove_fragments = FragmentRemover()
            self.objects.append(remove_fragments)
        # etc.

This would (with luck) serialise to JSON and YAML for free, and be easily configurable in a manner consistent with the rest of the library.
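
A self-contained sketch of this design follows, under stated assumptions. The Transformer, Pipeline, FragmentRemover and MetalDisconnector classes are simplified stand-ins that work on SMILES strings, and the to_dict method is a hypothetical serialisation hook, not the library's existing mechanism. The step order simply mirrors the argument order; the real 'sensible order' would still need to be decided.

```python
import json


class Transformer:
    """Stand-in base class holding only what this sketch needs."""

    def to_dict(self):
        # Hypothetical hook: class name mapped to constructor options.
        return {type(self).__name__: dict(vars(self))}


class FragmentRemover(Transformer):
    def transform(self, smiles):
        # Crude proxy: keep the longest '.'-separated fragment.
        return max(smiles.split('.'), key=len)


class MetalDisconnector(Transformer):
    def __init__(self, keep_grignards=False):
        # The option is stored but not acted upon in this sketch.
        self.keep_grignards = keep_grignards

    def transform(self, smiles):
        # Crude proxy: drop a bracketed sodium atom from the string.
        return smiles.replace('[Na]', '')


class Pipeline(Transformer):
    def __init__(self, objects=()):
        self.objects = list(objects)

    def transform(self, smiles):
        for obj in self.objects:
            smiles = obj.transform(smiles)
        return smiles

    def to_dict(self):
        return {type(self).__name__: [o.to_dict() for o in self.objects]}


class Standardizer(Pipeline):
    def __init__(self, remove_fragments=True, disconnect_metals=True):
        super().__init__()
        steps = [(remove_fragments, FragmentRemover),
                 (disconnect_metals, MetalDisconnector)]
        for option, default_cls in steps:
            if not option:
                continue  # falsy flag: skip this step entirely
            if not isinstance(option, Transformer):
                option = default_cls()  # plain True: use the defaults
            self.objects.append(option)


# Default fragment removal, customised metal disconnection.
std = Standardizer(disconnect_metals=MetalDisconnector(keep_grignards=True))
out = std.transform('CC.CCO[Na]')
config = json.dumps(std.to_dict())
print(out)
print(config)
```

Because each step is either a flag or a configured transformer, the common case stays a one-liner while the nested options still appear in the serialised form.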

@lewisacidic
Owner Author

A list of ChemAxon Standardizer 'Actions' can be found here:

https://docs.chemaxon.com/display/docs/Standardizer+Actions.

A list of the features is below. As the features are developed, we can tick or cross off (if they are unnecessary, impractical or impossible). I bolded the most desirable features in my eyes.

  • Add Explicit Hydrogens
  • Alias to Atom
  • Alias to Group
  • Aromatize
  • Clean 2D
  • Clean 3D
  • Clear Isotopes
  • Clear Stereo
  • Contract S-groups
  • Convert Double Bonds
  • Convert Pi-metal Bonds
  • Convert to Enhanced Stereo
  • Create Group
  • Dearomatize
  • Disconnect Metal Atoms
  • Expand S-groups
  • Expand Stoichiometry
  • Map
  • Map Reaction
  • Mesomerize
  • Neutralize
  • Remove Absolute Stereo
  • Remove Atom Values
  • Remove Attached Data
  • Remove Explicit Hydrogens
  • Remove Fragment
  • Remove R-group definitions
  • Remove Solvents
  • Remove Stereo Care Box
  • Replace Atoms
  • Set Absolute Stereo
  • Set Hydrogen Isotope Symbol
  • Strip Salts
  • Tautomerize
  • Transform
  • Ungroup S-groups
  • Unmap
  • Wedge Clean

@lewisacidic
Owner Author

One project that provides similar functionality is @mcs07's MolVS (in fact MolVS is close to implementing much of the functionality - Matt, would you mind if we used any of the code?). Others are listed in the MolVS README.

@lewisacidic
Owner Author

There is also @flatkinson's https://github.com/flatkinson/standardiser, which I am told is being actively used in the eTox project.

Both these projects look good and battle-tested. Perhaps we should write a wrapper rather than reimplement the functionality for now?

@mwojcikowski

mwojcikowski commented Sep 6, 2016

I think most of them are trivial in RDKit. There are a few, though, like "Tautomerize", which are far from easy (although I think Paolo Tosco might have done something in that direction, judging from the last UGM presentation).

Shouldn't SanitizeMol = Mesomerise? I think so.

@mwojcikowski

mwojcikowski commented Sep 6, 2016

If you want, I'm happy to help with this one. I'm assembling a list of the corresponding RDKit functions (or short implementation comments).

[Still being updated]

  • Add Explicit Hydrogens: AddHs
  • Alias to Atom [RDKit does not support aliases]
  • Alias to Group [RDKit does not support aliases]
  • Aromatize: SetAromaticity
  • Clean 2D: AllChem.Compute2DCoords
  • Clean 3D: AllChem.EmbedMolecule, AllChem.MMFFOptimizeMolecule or AllChem.UFFOptimizeMolecule
  • Clear Isotopes [Isotopes are only supported on Query Mols]
  • Clear Stereo [manually clear Stereo and Chirality tags for atoms and bonds]
  • Contract S-groups
  • Convert Double Bonds [RDKit relies on the Stereo flag; ChemAxon seems to have an additional argument for drawing]
  • Convert Pi-metal Bonds
  • Convert to Enhanced Stereo
  • Create Group
  • Dearomatize: Kekulize
  • Disconnect Metal Atoms: FragmentOnBonds
  • Expand S-groups
  • Expand Stoichiometry
  • Map
  • Map Reaction
  • Mesomerise: SanitizeMol (?)
  • Neutralize
  • Remove Absolute Stereo
  • Remove Atom Values
  • Remove Attached Data
  • Remove Explicit Hydrogens: RemoveHs
  • Remove Fragment: DeleteSubstruct
  • Remove R-group definitions
  • Remove Solvents: DeleteSubstruct + Solvents list
  • Remove Stereo Care Box
  • Replace Atoms: ReplaceSubstructs
  • Set Absolute Stereo
  • Set Hydrogen Isotope Symbol
  • Strip Salts: SaltRemover
  • Tautomerize: ResonanceMolSupplier
  • Transform
  • Ungroup S-groups
  • Unmap
  • Wedge Clean

@lewisacidic
Owner Author

Hi @mwojcikowski, thanks a lot for this! @MichaelLampe is currently looking into this - his branch is here - I'm unfortunately too busy with my PhD to really have much input at the moment, so perhaps you two could discuss/work on it?

@lewisacidic
Owner Author

I also had a chat with @mcs07 at the recent Cambridge Cheminformatics Network Meeting. He is hoping to continue to work on MolVS when he gets some free time (he is also super busy with his PhD!). One extra feature he mentioned he is interested in, which it doesn't look like ChemAxon offers, is ring opening/closing (e.g. linear vs cyclic glucose).

He also suggested looking at @russodanielp's fork of MolVS, which shows some recent work, specifically around pipelining.

@russodanielp

russodanielp commented Sep 6, 2016

Hi @mwojcikowski and @richlewis42. I started working on the pipeline and got it working for my purposes. I still need to clean up a bit of the code.

I am also involved in a few PhD research projects, but would be happy to contribute to this project or add to MolVS in my free time.

@mcs07

mcs07 commented Sep 8, 2016

It also looks like the Avalon Struct Checker may soon be properly integrated into RDKit: rdkit/rdkit#1054
Might be useful for many standardization tasks.
