Native Standardiser. #55
Comments
A list of ChemAxon Standardizer 'Actions' can be found here: https://docs.chemaxon.com/display/docs/Standardizer+Actions. A list of the features is below. As the features are developed, we can tick them off, or cross them off if they are unnecessary, impractical or impossible. I have bolded the features that I consider most desirable.
Projects that provide similar functionality include @mcs07's MolVS (in fact MolVS is close to implementing much of the functionality - Matt, would you mind if we used any of the code?). Others are listed in the MolVS README.
There is also @flatkinson's https://github.com/flatkinson/standardiser, which I am told is being actively used in the eTox project. Both these projects look good and battle-tested. Perhaps we should write a wrapper rather than reimplement the functionality for now?
I think most of them are trivial in RDKit. There are a few, though, like "Tautomerize", which are far from easy (although I think Paolo Tosco might have done something in that direction, judging from the last UGM presentation). Shouldn't SanitizeMol = Mesomerise? I think so.
If you want, I'm happy to help with this one. I'm assembling a list of RDKit functions (or short implementation comments) [still being updated]
Hi @mwojcikowski, thanks a lot for this! @MichaelLampe is currently looking into this - his branch is here - I'm unfortunately too busy with my PhD to really have much input at the moment, so perhaps you both could discuss/work on it?
I also had a chat with @mcs07 at the recent Cambridge Cheminformatics Network Meeting; he is hoping to continue working on MolVS when he gets some free time (he is also super busy with his PhD!). One extra feature he mentioned being interested in, which ChemAxon does not appear to offer, is ring opening/closing (e.g. linear vs cyclic glucose). He also suggested looking at @russodanielp's fork of MolVS, which shows some recent work, specifically around pipelining.
Hi @mwojcikowski and @richlewis42. I started working on the pipeline and got it working for my purposes. I still need to clean up a bit of the code. I am also involved in a few PhD research projects, but would be happy to contribute to this project or add to MolVS in my free time.
It also looks like the Avalon Struct Checker may soon be properly integrated into RDKit: rdkit/rdkit#1054
A native standardis(z)er would be a great addition to the library, as currently the only way to standardise molecules is using the ChemAxon Standardizer wrapper.
The implementation should provide a similar API to the current Standardizer, namely by inheriting from `TransformFilter`. It should be configurable in code, like the rest of the library, which should also make it configurable with YAML and JSON.

Standardisation may be thought of as a series of elemental operations applied to molecules. These could be implemented as mini transformers, and the Standardizer could just be a Pipeline (this would probably require work on the Pipeline class!).
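The "series of elemental operations" idea could look roughly like the sketch below. It is only an illustration: `Step`, `StripWhitespace`, `KeepLongestFragment` and this toy `Pipeline` are hypothetical names, not the library's actual classes, and the steps work on plain SMILES strings rather than real molecule objects. Picking the fragment with the longest SMILES string is a crude stand-in for a proper "keep largest fragment" operation.

```python
class Step:
    """Hypothetical mini transformer: one elemental operation on a SMILES string."""

    def transform(self, smiles):
        raise NotImplementedError


class StripWhitespace(Step):
    """Remove leading and trailing whitespace from the input string."""

    def transform(self, smiles):
        return smiles.strip()


class KeepLongestFragment(Step):
    """Keep the '.'-separated fragment whose SMILES string is longest.

    String length is only a rough proxy for fragment size (assumption).
    """

    def transform(self, smiles):
        return max(smiles.split("."), key=len)


class Pipeline:
    """Toy pipeline: apply each step in order, feeding the output forward."""

    def __init__(self, steps):
        self.steps = list(steps)

    def transform(self, smiles):
        for step in self.steps:
            smiles = step.transform(smiles)
        return smiles


# A "standardiser" is then just a pipeline with a chosen set of steps.
standardiser = Pipeline([StripWhitespace(), KeepLongestFragment()])
result = standardiser.transform("  CC(=O)[O-].[Na+] \n")
```

Each step stays small and testable on its own, and the order of the steps is explicit in the list passed to the pipeline.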
The issue with this is:
Perhaps it would be best to have a `Standardiser` object (that could possibly inherit from `Pipeline`) that in turn creates the smaller objects and keeps sensible defaults.

This makes it harder to have fine-grained control over these smaller objects, though (maybe we want to 'keep_grignards' or something), so perhaps we could pass the actual transformer if we wanted control over this:
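A minimal sketch of that idea, assuming hypothetical class and argument names (`RemoveFragments`, `Neutralise`, `remove_fragments`, `neutralise`); only `keep_grignards` comes from the text above, and none of this is the library's actual API. Each argument accepts `True` (use the default step), `False` (skip the step), or a pre-configured step instance:

```python
class RemoveFragments:
    """Hypothetical mini transformer with one tunable option."""

    def __init__(self, keep_grignards=False):
        self.keep_grignards = keep_grignards


class Neutralise:
    """Hypothetical mini transformer with no options."""


class Standardiser:
    """Builds its steps from flags, or from ready-made step instances."""

    # Default class used when a step is simply switched on with True.
    DEFAULT_STEPS = {
        "remove_fragments": RemoveFragments,
        "neutralise": Neutralise,
    }

    def __init__(self, remove_fragments=True, neutralise=True):
        options = (
            ("remove_fragments", remove_fragments),
            ("neutralise", neutralise),
        )
        self.steps = []
        for name, option in options:
            if option is True:
                # Sensible default: instantiate the step with no arguments.
                self.steps.append(self.DEFAULT_STEPS[name]())
            elif option is False or option is None:
                # Step disabled.
                continue
            else:
                # Caller passed the actual transformer for fine control.
                self.steps.append(option)


default = Standardiser()
custom = Standardiser(
    remove_fragments=RemoveFragments(keep_grignards=True),
    neutralise=False,
)
```

Here `default` gets both steps with their default settings, while `custom` keeps only the fragment remover, configured by the caller.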
This would (with luck) serialise to JSON and YAML for free, and be easily configurable in a manner consistent with the rest of the library.
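One way the serialisation could fall out "for free" is if every step can report its own parameters, so the whole standardiser reduces to a plain list of dicts. The sketch below is an assumption about how that might work, not the library's existing mechanism: `get_params`, `REGISTRY`, `to_config` and `from_config` are hypothetical names. Only JSON from the standard library is shown; the same plain structure could be handed to a YAML library.

```python
import json


class RemoveFragments:
    """Hypothetical step that exposes its settings via get_params()."""

    def __init__(self, keep_grignards=False):
        self.keep_grignards = keep_grignards

    def get_params(self):
        return {"keep_grignards": self.keep_grignards}


class Neutralise:
    """Hypothetical step with no settings."""

    def get_params(self):
        return {}


# Maps the class name stored in the config back to the class itself.
REGISTRY = {cls.__name__: cls for cls in (RemoveFragments, Neutralise)}


def to_config(steps):
    """Describe each step as a plain dict: its type name and its parameters."""
    return [{"type": type(s).__name__, "params": s.get_params()} for s in steps]


def from_config(config):
    """Rebuild the steps by looking up each type and passing its parameters."""
    return [REGISTRY[entry["type"]](**entry["params"]) for entry in config]


steps = [RemoveFragments(keep_grignards=True), Neutralise()]
text = json.dumps(to_config(steps))        # serialise to a JSON string
restored = from_config(json.loads(text))   # and rebuild equivalent steps
```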