-
Notifications
You must be signed in to change notification settings - Fork 12
/
Copy pathevent_filters.Rmd
150 lines (100 loc) · 4.67 KB
/
event_filters.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
---
title: bupaR Docs | Filter events
---
```{r echo = F, out.width="25%", fig.align = "right"}
knitr::include_graphics("images/icons/filter.png")
```
***
# Event filters
```{r}
library(bupaR)
```
## Activity
The filter activity function can be used to filter activities by name. It has three arguments
* the event log
* a vector of activities
* the reverse argument (FALSE or TRUE)
```{r}
patients %>%
filter_activity(c("X-Ray", "Blood test")) %>%
activities
```
As one can see, there are only 2 distinct activities left in the event log.
## Activity Frequency
#### Relative filtering - using percentage
It is also possible to filter on activity frequency. This filter uses a percentile cut off, and will look at those activities which are most frequent until the required percentage of events has been reached. Thus, a percentile cut off of 80% will look at the activities needed to represent 80% of the events. In the example below, the __least__ frequent activities covering 50% of the event log are selected, since the reverse argument is true.
```{r}
patients %>%
filter_activity_frequency(percentage = 0.5, reverse = T) %>%
activities
```
#### Absolute filtering - using interval
Instead of providing a target percentage, we can provide a target frequency interval. For example, only retain the activities which occur more than 300 times.
```{r}
patients %>%
filter_activity_frequency(interval = c(300,500)) %>%
activities
```
When we don't now the maximal frequency - 500 in this case, we can use an open interval by using NA.
```{r}
patients %>%
filter_activity_frequency(interval = c(300, NA)) %>%
activities
```
## Activity Instance
Specific activity instances can be selected using the `filter_activity_instance`.
```{r}
patients %>%
filter_activity_instance(activity_instances = 10)
```
## Lifecycle
`filter_lifecycle` can be used to select events with a specific lifecycle
```{r}
patients %>%
filter_lifecycle("complete")
```
## Lifecycle Presence
We can select activity instances that contain a specific status with `filter_lifecycle_presence`. Its workings are comparable to `filter_activity_presence`.
## Resource Labels
Similar to the activity filter, the resource filter can be used to filter events by listing on or more resources.
```{r}
patients %>%
filter_resource(c("r1","r4")) %>%
resource_frequency("resource")
```
## Resource Frequency
Instead of filtering events by the resource that performed the activity, we can also filter event by the frequency of the resource. This happens in the same way as for the activity frequency filter. The filter below gives us the 80% activity instances performed by the most common resources.
```{r}
patients %>%
filter_resource_frequency(perc = 0.80) %>%
resources()
```
Alternatively, using the interval argument, we can select resources who perform between 200 and 300 activity instances.
```{r}
patients %>%
filter_resource_frequency(interval = c(200,300)) %>%
resources()
```
## Trim to Endpoints
The trim filter is a special event filter, as it also take into account the notion of cases. In fact, it _trim_ cases such that they start with a certain activities until they end with a certain activity. It requires two list: one for possible start activities and one for end activities. The cases will be trimmed from the first appearance of a start activity till the last appearance of an end activity. When reversed, these _slices_ of the event log will be removed instead of preserved.
```{r}
patients %>%
filter_trim(start_activities = "Registration", end_activities = c("MRI SCAN","X-Ray")) %>%
process_map(type = performance())
```
## Trim to Time Window
Instead of triming cases to a particular start and/or end activity, we can also trim cases to a particular time window. For this we use the function `filter_time_period` with filter_method `trim`. This filter needs a time interval, which is a vector of length 2 containing data/datetime values. These can be created easily using [lubridate](https://lubridate.tidyverse.org/) function, e.g. `ymd` for year-month-day formats.
This example takes only activity instances which happened (at least partly, i.e. some events) in December of 2017.
```{r}
library(lubridate)
patients %>%
filter_time_period(interval = ymd(c(20171201, 20171231)), filter_method = "trim") %>%
summary()
```
Using a different filter method (start, complete, contained or intersecting), this filter can also act as a case filter (see below).
```{r footer, results = "asis", echo = F}
CURRENT_PAGE <- stringr::str_replace(knitr::current_input(), ".Rmd",".html")
res <- knitr::knit_expand("_button_footer.Rmd", quiet = TRUE)
res <- knitr::knit_child(text = unlist(res), quiet = TRUE)
cat(res, sep = '\n')
```