
Commit 66b1137

committed Jan 4, 2023
2 parents 1f3a2cb + 9699260 commit 66b1137

File tree

1 file changed, +37 -46 lines


‎vignettes/process-prediction-workflow.Rmd

@@ -27,46 +27,47 @@ library(purrr)

# Introduction

The goal of `processpredictR` is to perform prediction tasks on processes using event logs and Transformer models.
The 5 process monitoring tasks are defined as follows:

* _outcome_: predict the case outcome, which can be the last activity or a manually defined variable
* _next activity_: predict the next activity instance
* _remaining trace_: predict the sequence of all remaining activity instances
* _next time_: predict the start time of the next activity instance
* _remaining time_: predict the remaining time until the end of the case
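The task is selected through the `task` argument of `prepare_examples()`. For example, the two tasks used later in this tutorial are requested as follows:

```r
library(processpredictR)
library(eventdataR)

# build example datasets for two of the supported tasks
df_outcome  <- prepare_examples(traffic_fines, task = "outcome")
df_next_act <- prepare_examples(traffic_fines, task = "next_activity")
```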

The overall approach using `processpredictR` is shown in the figure below. `prepare_examples()` transforms logs into a dataset that can be used for training and prediction, which is thereafter split into a train and a test set. Subsequently, a model is created, compiled and fitted. Finally, the model can be used to make predictions and can be evaluated.

```{r echo = F, eval = T, out.width = "60%", fig.align = "center"}
knitr::include_graphics("framework.PNG")
```

Different levels of customization are offered. Using `create_model()`, a standard off-the-shelf model can be created for each of the supported tasks, including standard features.

A first customization is to include additional features, such as case or event attributes. These can be configured in the `prepare_examples()` step, and they will be processed automatically (normalized for numerical features, or hot-encoded for categorical features).

A further way to customize your model is to generate only the input layers of the model with `create_model()`, and define the remainder of the model yourself by adding `keras` layers using the provided `stack_layers()` function.

Going beyond that, you can also create the model entirely yourself using `keras`, including the preprocessing of the data. Auxiliary functions are provided to help you with, e.g., tokenizing activity sequences.

In the remainder of this tutorial, each of the steps and possible avenues for customization will be described in more detail.

# Preprocessing

As a first step in the process prediction workflow, we use `prepare_examples()` to obtain a dataset in which:

* each row/observation is a unique activity instance id
* the prefix(_list) column stores the sequence of activities already executed in the case
* necessary features and target variables are calculated and/or added

The returned object is of class `ppred_examples_df`, which inherits from `tbl_df`.

In this tutorial we will use the `traffic_fines` event log from `eventdataR`. Note that both `eventlog` and `activitylog` objects, as defined by `bupaR`, are supported.

```{r, eval = T}
df <- prepare_examples(traffic_fines, task = "outcome")
df
```

We split the transformed dataset `df` into train and test sets, for later use in `fit()` and `predict()`, respectively. The proportion of the train set is configured with the `split` argument.

```{r, eval = T}
set.seed(123)
@@ -86,11 +87,11 @@ n_distinct(split$train_df$case_id) / n_distinct(df$case_id)

# Transformer model

The next step in the workflow is to build a model. `processpredictR` provides a default set of functions that are wrappers around generics provided by `keras`. For ease of use, preprocessing steps such as tokenizing sequences and normalizing numerical features happen within the `create_model()` function and are abstracted away from the user.

## Define model

Based on the train set, we define the default transformer model using `create_model()`.

```{r}
model <- split$train_df %>% create_model(name = "my_model")
@@ -122,28 +123,18 @@ model # is a list
#> ________________________________________________________________________________
```

Some useful information and metrics are stored in the returned object for traceability and easy extraction when needed.

```{r}
model %>% names() # elements of the returned list
```
```
#> [1] "model" "max_case_length" "number_features" "task"
#> [5] "num_outputs" "vocabulary"
```

Note that `create_model()` returns a list, in which the actual keras model is stored under the element `model`. Thus, we can use functions from the `keras` package as follows:

```{r}
model$model$name # get the name of the model
@@ -159,7 +150,7 @@ model$model$non_trainable_variables # list of non-trainable parameters of a mode
#> list()
```

The result of `create_model()` is assigned its own class (`ppred_model`), for which `processpredictR` provides the methods _compile()_, _fit()_, _predict()_ and _evaluate()_.

## Compilation

@@ -175,7 +166,7 @@ model %>% compile() # model compilation

## Training

Training of the model is done with the `fit()` function. During training, a visualization window opens in the Viewer pane to show the progress in terms of loss. Optionally, the result of `fit()` can be assigned to an object to access the training metrics specified in _compile()_.

```{r}
hist <- fit(object = model, train_data = split$train_df, epochs = 5)
@@ -214,7 +205,7 @@ hist$metrics

## Make predictions

The method _predict()_ can return 3 types of output, by setting the argument `output` to `"append"`, `"y_pred"` or `"raw"`.

Test dataset with appended predicted values (`output = "append"`):

@@ -275,7 +266,7 @@ predictions %>% head(5)
</p>

### Visualize predictions

For the classification tasks _outcome_ and _next activity_, a `confusion_matrix()` function is provided.

```{r}
predictions %>% class
@@ -333,7 +324,7 @@ model %>% evaluate(split$test_df)

# Add extra features

In addition to the activity prefixes in the data and the standard features defined for each task, extra features can be defined when using `prepare_examples()`. The example below shows how the month in which a case started can be added as a feature.

```{r}
# preprocessed dataset with hot-encoded categorical features
@@ -374,12 +365,12 @@ df_next_time$train_df %>% attr("hot_encoded_categorical_features")
Additional features can be either numerical variables or factors. Numerical variables will be automatically normalized. Factors will automatically be converted to hot-encoded variables. A few important notes:

- Character values are not accepted, and should be transformed to factors.
- We assume that no features have missing values. If there are any, these should be imputed or removed before using `prepare_examples()`.
- Finally, in case the data is an event log, features should have a single value for each activity instance. The start and complete events of an activity instance should thus share a single unique value of a variable for it to be used as a feature.
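Because character values are rejected and missing values are assumed absent, a small cleaning step before `prepare_examples()` can help. A minimal sketch on toy data (the column names are illustrative, not taken from `eventdataR`):

```r
library(dplyr)

attrs <- data.frame(
  case_id = c("A", "A", "B"),
  vehicle = c("car", "car", NA), # character attribute with a missing value
  amount  = c(35, 35, 70)        # numerical attribute
)

clean <- attrs %>%
  mutate(across(where(is.character), as.factor)) %>% # characters -> factors
  filter(!if_any(everything(), is.na))               # drop (or first impute) incomplete rows

clean
```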

# Customize your transformer model

Instead of using the standard off-the-shelf transformer model that comes with `processpredictR`, you can customize the model. One way to do this is by using the `custom` argument of the `create_model()` function. The resulting model will then only contain the input layers, as shown below.

```{r}
df <- prepare_examples(traffic_fines, task = "next_activity") %>% split_train_test()
@@ -407,7 +398,7 @@ custom_model
#> ________________________________________________________________________________
```

You can then stack layers on top of your custom model as you prefer, using the `stack_layers()` function. This function saves some of the extra coding work required when using `keras` directly (see later).
```{r}
custom_model <- custom_model %>%
  stack_layers(layer_dropout(rate = 0.1)) %>%
@@ -443,11 +434,11 @@ custom_model %>%
  stack_layers(layer_dropout(rate = 0.1), layer_dense(units = 64, activation = 'relu'))
```

Once you have finalized your model with an appropriate output layer (which should have the correct number of outputs, as recorded in `custom_model$num_outputs`, and an appropriate activation function), you can use the `compile()`, `fit()`, `predict()` and `evaluate()` functions as before.

# Custom training and prediction

We can also opt to set up and train our model manually, instead of using the provided methods. Note that after defining a model with `keras::keras_model()`, the model is no longer of class `ppred_model`.

```{r}
new_outputs <- custom_model$model$output %>% # custom_model$model accesses the model; $output accesses its outputs
@@ -505,7 +496,7 @@ compile(object=custom_model, optimizer = "adam",
```

Before training the model, we first must prepare the data using the `tokenize()` function.

```{r}
# the trace of activities must be tokenized
@@ -551,8 +542,8 @@ map(tokens_train, head) # the output of tokens is a list
x <- tokens_train$token_x %>% pad_sequences(maxlen = max_case_length(df$train_df), value = 0)
y <- tokens_train$token_y
```

We are now ready to train our custom model (the code below is not evaluated).

```{r, eval=F}
# train
fit(object = custom_model, x, y, epochs = 10, batch_size = 10) # see also ?keras::fit.keras.engine.training.Model
```

0 commit comments
