# Introduction
The goal of processpredictR is to perform prediction tasks on processes using event logs and Transformer models.
The 5 process monitoring tasks are defined as follows:
* _outcome_: predict the case outcome, which can be the last activity, or a manually defined variable
* _next activity_: predict the next activity instance
* _remaining trace_: predict the sequence of all next activity instances
* _next time_: predict the start time of the next activity instance
* _remaining time_: predict the remaining time until the end of the case
The overall approach using `processpredictR` is shown in the figure below. `prepare_examples()` transforms logs into a dataset that can be used for training and prediction, which is thereafter split into a train and a test set. Subsequently, a model is created, compiled and fit. Finally, the model can be used to predict and can be evaluated.

Different levels of customization are offered. Using `create_model()`, a standard off-the-shelf model can be created for each of the supported tasks, including standard features.
A first customization is to include additional features, such as case or event attributes. These can be configured in the `prepare_examples()` step, and they will be processed automatically (normalized for numerical features, or hot-encoded for categorical features).
A further way to customize your model is to only generate the input layer of the model with `create_model()`, and define the remainder of the model yourself by adding `keras` layers using the provided `stack_layers()` function.
Going beyond that, you can also create the model entirely yourself using `keras`, including the preprocessing of the data. Auxiliary functions are provided to help you with, e.g., tokenizing activity sequences.
In the remainder of this tutorial, each of the steps and possible avenues for customization will be described in more detail.
# Preprocessing
As a first step in the process prediction workflow, we use `prepare_examples()` to obtain a dataset where:
* each row/observation is a unique activity instance id
* the prefix(_list) column stores the sequence of activities already executed in the case
* necessary features and target variables are calculated and/or added
The returned object is of class `ppred_examples_df`, which inherits from `tbl_df`.
In this tutorial, we will use the `traffic_fines` event log from `eventdataR`. Note that both `eventlog` and `activitylog` objects, as defined by `bupaR`, are supported.

We split the transformed dataset `df` into train and test sets for later use in `fit()` and `predict()`, respectively. The proportion of the train set is configured with the `split` argument.
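As an illustration, these two steps could look as follows. This is a sketch: the `task` value is illustrative, and the `split_train_test()` name and signature are assumptions inferred from the `split$train_df` and `split$test_df` objects used later on.

```{r}
# sketch (argument names assumed, not verified):
# transform the log into an example dataset for a prediction task
df <- prepare_examples(traffic_fines, task = "outcome")

# split into train and test sets; `split` sets the proportion of the train set
split <- df %>% split_train_test(split = 0.8)
```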
The next step in the workflow is to build a model. `processpredictR` provides a default set of functions that are wrappers of generics provided by `keras`. For ease of use, preprocessing steps such as tokenizing sequences and normalizing numerical features happen within the `create_model()` function and are abstracted from the user.
## Define model
Based on the train set, we define the default transformer model using `create_model()`.
```{r}
model <- split$train_df %>% create_model(name = "my_model")
```

Note that `create_model()` returns a list, in which the actual keras model is stored under the element name `model`. Thus, we can use functions from the `keras` package as follows:

```{r}
model$model$name # get the name of a model
model$model$non_trainable_variables # list of non-trainable parameters of a model
#> list()
```
The result of `create_model()` is assigned its own class (`ppred_model`), for which `processpredictR` provides the methods _compile()_, _fit()_, _predict()_ and _evaluate()_.
## Compilation
```{r}
model %>% compile() # model compilation
```
## Training
Training of the model is done with the `fit()` function. During training, a visualization window will open in the Viewer pane to show the progress in terms of loss. Optionally, the result of `fit()` can be assigned to an object to access the training metrics specified in _compile()_.
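For example (a sketch; the `train_data` and `epochs` argument names are assumptions, not taken from this excerpt):

```{r}
# sketch: train the model and keep the training metrics
metrics <- model %>% fit(train_data = split$train_df, epochs = 10)
```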
The method _predict()_ can return 3 types of output, by setting the argument `output` to "append", "y_pred" or "raw".
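Obtaining predictions could look as follows (a sketch; the `test_data` argument name is an assumption):

```{r}
# sketch: predict on the test set, appending predictions to the data
predictions <- model %>% predict(test_data = split$test_df, output = "append")
```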
Test dataset with appended predicted values (output = "append")
```{r}
predictions %>% head(5)
```

</p>
### Visualize predictions
For the classification tasks outcome and next activity, a `confusion_matrix()` function is provided.

```{r}
predictions %>% class
```
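A minimal usage sketch (assuming `confusion_matrix()` accepts the predictions object directly):

```{r}
# sketch: confusion matrix of predicted vs. actual classes
predictions %>% confusion_matrix()
```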
Finally, the model can be evaluated on the test set:

```{r}
model %>% evaluate(split$test_df)
```
# Add extra features
Next to the activity prefixes in the data, and standard features defined for each task, additional features can be defined when using `prepare_examples()`. The example below shows how the month in which a case is started can be added as a feature.
```{r}
# preprocessed dataset with categorical hot encoded features
```
Additional features can be either numerical variables or factors. Numerical variables will be automatically normalized. Factors will automatically be converted to hot-encoded variables. A few important notes:
- Character values are not accepted, and should be transformed to factors.
- We assume that no features have missing values. If there are any, these should be imputed or removed before using `prepare_examples()`.
- Finally, in case the data is an event log, features should have a single value for each activity instance. Start and complete events should thus have a single unique value of a variable for it to be used as a feature.
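For instance, character attributes can be converted to factors up front. This is a generic `dplyr` sketch, not a `processpredictR` function:

```{r}
library(dplyr)

# sketch: convert all character attributes to factors
# before calling prepare_examples()
traffic_fines <- traffic_fines %>%
  mutate(across(where(is.character), as.factor))
```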
# Customize your transformer model
Instead of using the standard off-the-shelf transformer model that comes with `processpredictR`, you can customize the model. One way to do this is by using the `custom` argument of the `create_model()` function. The resulting model will then only contain the input layers of the model, as shown below.
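For example (a sketch; passing `custom = TRUE` as the way to request only the input layers is an assumption based on the `custom` argument, and the name is illustrative):

```{r}
# sketch: create a model containing only the input layers
custom_model <- split$train_df %>%
  create_model(custom = TRUE, name = "custom_model")
```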
You can then stack layers on top of your custom model as you prefer, using the `stack_layers()` function. This function abstracts away some of the extra code that would otherwise be needed when working with `keras` directly (see later).
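A usage sketch (assumptions: `stack_layers()` accepts keras layer calls, and whether several layers can be passed in one call is not verified here):

```{r}
# sketch: add a dropout and a dense layer on top of the custom model
custom_model <- custom_model %>%
  stack_layers(layer_dropout(rate = 0.1)) %>%
  stack_layers(layer_dense(units = 32, activation = "relu"))
```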
Once you have finalized your model with an appropriate output layer (which should have the correct number of outputs, as recorded in `custom_model$num_outputs`, and an appropriate activation function), you can use the `compile()`, `fit()`, `predict()` and `evaluate()` functions as before.
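Put together, the remaining workflow could look as follows (a sketch; the `fit()` argument names are assumptions):

```{r}
# sketch: compile, train and evaluate the customized model
custom_model %>% compile()
custom_model %>% fit(train_data = split$train_df, epochs = 10)
custom_model %>% evaluate(split$test_df)
```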
# Custom training and prediction
We can also opt for setting up and training our model manually, instead of using the provided methods. Note that after defining a model with `keras::keras_model()`, the model is no longer of class `ppred_model`.
```{r}
new_outputs <- custom_model$model$output %>% # custom_model$model to access a model and $output to access the outputs of that model
  layer_dense(units = custom_model$num_outputs, activation = "softmax") # illustrative output layer completing the pipe
```