A GitHub repository that compares models exported from KNIME and Python. Python provides flexibility, while KNIME provides simplicity through its codeless system.
Things to compare:
- Train deep learning models using Keras in Python and export to KNIME.
- Load the deep learning models exported from KNIME to evaluate the results in Python.
- Repeat steps 1-2 for a conventional machine learning model (e.g. Decision Tree)
- Compare the evaluation results done in Python and KNIME.
- Export the Python models to test them in KNIME too.
NOTE: The models should have exactly the same parameters if possible for the best results.
There are two notebooks: Decision Tree Comparison.ipynb for comparing Decision Tree models, and MNIST Digit Classification.ipynb for comparing CNN models trained in Keras in both KNIME and Python.
The TensorFlow library is used for loading and training deep learning models. Refer to the YouTube video here for detailed instructions to install TensorFlow with GPU support.
The sklearn-pmml-model library from here is used to load PMML models.
Follow the installation steps explained in that GitHub repo.
The sklearn2pmml library from here is used to create PMML models.
You may refer to the results folder for the individual confusion matrices of the different models.
- The MNIST images obtained from KNIME are named differently than usual, i.e. `Row0.png`, `Row1.png`, ... , `Row9999.png`. Be careful that sorting the filenames by using only the vanilla `sorted` function (without using a custom sorting method with the `key` argument) would result in an incorrect order of the images when compared to their corresponding labels.
- When comparing deep learning models, be careful that the default parameters/config in KNIME could be different from Python, such as in the case of the Adadelta optimizer: the default learning rate in Python is 0.001, but 1.0 in KNIME.
- The training and evaluation results for Adadelta were noticeably different (about 2-3%) between KNIME and Python for unknown reasons, most likely because a different Keras version is used in the KNIME backend.
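
The filename ordering issue described above can be illustrated with a minimal sketch. The helper name `row_index` and the sample filenames are made up for illustration; the only assumption taken from the notes is the `RowN.png` naming scheme.

```python
import re


def row_index(filename):
    """Extract the integer N from a name of the form 'RowN.png'."""
    match = re.fullmatch(r"Row(\d+)\.png", filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    return int(match.group(1))


# A small illustrative sample; the real folder holds Row0.png ... Row9999.png.
filenames = ["Row0.png", "Row1.png", "Row2.png", "Row10.png", "Row11.png"]

# Plain sorted() compares strings character by character,
# so 'Row10.png' is placed before 'Row2.png'.
lexicographic = sorted(filenames)

# Sorting with a numeric key keeps the images aligned with their labels.
numeric = sorted(filenames, key=row_index)

print(lexicographic)
print(numeric)
```

For a real folder, the same key can be passed when sorting a directory listing, e.g. `sorted(os.listdir(image_dir), key=row_index)`, where `image_dir` is a hypothetical path to the exported images.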
About the Exported Model from KNIME
- The exported deep learning models from KNIME can only be in either the `SavedModel` format or the Keras `h5` format. The `h5` format retains all the functionality associated with the `Keras` framework, such as being able to check all the layers by calling the `model.summary()` method, whereas the `SavedModel` format has limited functionality, and you need to extract the model function from the `model.signatures['serve']` key to be able to run inference. Please refer to the last two sections of the `MNIST Digit Classification.ipynb` notebook for the detailed steps.
- You might need to downgrade NumPy to a version below `1.20` if you encounter an error like the one below when trying to load the Keras `h5` model exported from KNIME. Link to the issue here.
NotImplementedError: Cannot convert a symbolic Tensor (lstm_1/strided_slice:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported
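
The two loading paths described above could look roughly like the sketch below. It assumes TensorFlow 2.x is installed. The function names and file paths are hypothetical, and the `'serve'` key is taken from the note above. The notebook remains the reference for the exact steps.

```python
def load_keras_h5(path):
    """Load a Keras .h5 model; the usual Keras API (e.g. model.summary()) is available."""
    import tensorflow as tf  # imported lazily; assumes TensorFlow 2.x is installed

    return tf.keras.models.load_model(path)


def select_signature(signatures, key="serve"):
    """Return the inference function stored under `key`, listing alternatives on failure."""
    if key not in signatures:
        raise KeyError(f"signature {key!r} not found; available: {sorted(signatures)}")
    return signatures[key]


def load_saved_model_fn(path, key="serve"):
    """Load a SavedModel directory and return its inference function."""
    import tensorflow as tf  # imported lazily; assumes TensorFlow 2.x is installed

    loaded = tf.saved_model.load(path)
    return select_signature(loaded.signatures, key)


# Example usage (paths are hypothetical):
# model = load_keras_h5("knime_model.h5")
# model.summary()
# infer = load_saved_model_fn("knime_saved_model")
# outputs = infer(tf.constant(batch))  # returns a dict of output tensors
```

The `select_signature` helper only exists to give a readable error if the exported model stores its function under a different key.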
- You must install the `sklearn-pmml-model` library from here to load PMML models.
- You must also install the `sklearn2pmml` library from here to convert an `sklearn` pipeline to a PMML model.
- Both of these libraries are the best solutions that I have found for now; things might change, and you might need to find other alternatives in the future.
- For the detailed steps to load and export PMML models, please refer to the `Decision Tree Comparison.ipynb` notebook.
- Be careful that KNIME might use different model parameters/implementations from `sklearn` models, and many parameters cannot be changed easily in `sklearn` to match the parameters used by KNIME.
- The `Decision Tree Learner` node used in KNIME also automatically encodes the nominal/categorical columns, and I couldn't find any reference on how exactly it encodes them. Therefore, in the experiment done in this repo, I used `OrdinalEncoder` to encode the columns before using KNIME's Decision Tree model for inference, but the performance result was slightly different.
- On the other hand, the Decision Tree model that I trained in Python had to drop the categorical columns in order to achieve performance similar to KNIME's model.
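
As a rough sketch of the PMML round trip described above, the two libraries can be used as follows. The function names, file paths and tree parameters are hypothetical, and the notebook remains the reference for the exact steps. It assumes `scikit-learn`, `sklearn-pmml-model` and `sklearn2pmml` are installed. `sklearn2pmml` also needs a Java runtime for the conversion.

```python
def load_knime_tree(pmml_path):
    """Load a Decision Tree exported from KNIME as PMML into an sklearn-style classifier."""
    from sklearn_pmml_model.tree import PMMLTreeClassifier  # lazy import

    return PMMLTreeClassifier(pmml=pmml_path)


def export_tree_to_pmml(X, y, pmml_path, **tree_params):
    """Train a Decision Tree in sklearn and write it to a PMML file for use in KNIME."""
    from sklearn.tree import DecisionTreeClassifier  # lazy imports
    from sklearn2pmml import sklearn2pmml
    from sklearn2pmml.pipeline import PMMLPipeline

    # sklearn2pmml converts PMMLPipeline objects, so the estimator is wrapped in one.
    pipeline = PMMLPipeline([("classifier", DecisionTreeClassifier(**tree_params))])
    pipeline.fit(X, y)
    sklearn2pmml(pipeline, pmml_path)  # requires a Java runtime
    return pipeline


# Example usage (paths and data are hypothetical):
# knime_tree = load_knime_tree("knime_decision_tree.pmml")
# predictions = knime_tree.predict(X_test)
# export_tree_to_pmml(X_train, y_train, "python_decision_tree.pmml", max_depth=5)
```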

