Filters the log based on frequency of activities.

filter_activity_frequency(eventlog, interval, percentage, reverse, ...)

# S3 method for eventlog
filter_activity_frequency(eventlog, interval = NULL,
  percentage = NULL, reverse = FALSE, ...)

# S3 method for grouped_eventlog
filter_activity_frequency(eventlog,
  interval = NULL, percentage = NULL, reverse = FALSE, ...)

ifilter_activity_frequency(eventlog)

Arguments

eventlog

The dataset to be used. Should be a (grouped) eventlog object.

interval

An activity frequency interval (numeric vector of length 2). A half-open interval can be created by using NA as one of the bounds.

percentage

The target coverage of activity instances. A percentage of 0.9 will return the most common activity types of the eventlog, which together account for at least 90% of the activity instances.

reverse

Logical, indicating whether the selection should be reversed.

...

Deprecated arguments.
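The following base R sketch makes the `interval` and `reverse` arguments concrete by applying the same selection rule to a named vector of activity frequencies. `select_by_interval` is a hypothetical helper, not part of the package, and the frequencies are made-up. The sketch assumes both bounds are inclusive, as the `c(10, NA)` example in the Details ("10 times or more") suggests. It also assumes that `reverse = TRUE` returns the complement of the selection.

```r
# Hypothetical helper (not part of the package): return the activity labels
# whose absolute frequency lies in `interval`; an NA bound leaves that side open.
select_by_interval <- function(freq, interval, reverse = FALSE) {
  stopifnot(length(interval) == 2)
  lower <- if (is.na(interval[1])) -Inf else interval[1]
  upper <- if (is.na(interval[2])) Inf else interval[2]
  # Assumption: both bounds are inclusive
  keep <- freq >= lower & freq <= upper
  # reverse = TRUE keeps exactly the labels that would otherwise be dropped
  if (reverse) keep <- !keep
  names(freq)[keep]
}

# Made-up absolute frequencies per activity label
freq <- c(register = 50, triage = 40, xray = 12, surgery = 3)

select_by_interval(freq, c(10, NA))                  # labels occurring 10 times or more
select_by_interval(freq, c(10, NA), reverse = TRUE)  # the complement
select_by_interval(freq, c(NA, 12))                  # labels occurring at most 12 times
```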

Value

When given an eventlog, it will return a filtered eventlog. When given a grouped eventlog, the filter will be applied in a stratified way (i.e. separately for each group). The returned eventlog will be grouped on the same variables as the original event log.
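The following base R sketch shows how a stratified filter differs from a global one, using a plain data frame. It illustrates the idea under stated assumptions and is not the package's implementation. `filter_stratified` is hypothetical and the data are made-up. Activity frequencies are counted within each group before the half-open interval `c(min_freq, NA)` is applied.

```r
# Made-up event log: one row per activity instance
events <- data.frame(
  group    = c("A", "A", "A", "A", "B", "B", "B"),
  activity = c("x", "x", "x", "y", "x", "y", "y"),
  stringsAsFactors = FALSE
)

# Hypothetical stratified filter: keep rows whose activity occurs at least
# `min_freq` times within its own group. The grouping column is kept as-is.
filter_stratified <- function(events, min_freq) {
  parts <- lapply(split(events, events$group), function(g) {
    freq <- table(g$activity)  # frequencies counted in this group only
    keep <- names(freq)[freq >= min_freq]
    g[g$activity %in% keep, , drop = FALSE]
  })
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

filter_stratified(events, 2)
```

Globally, both `x` (4 instances) and `y` (3 instances) occur at least twice, so a non-stratified filter with the same threshold would keep every row. Within each group, only `x` survives in group A and only `y` survives in group B.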

Details

Filtering the event log based on activity frequency can be done in two ways: using an interval of allowed frequencies, or specifying a coverage percentage.

  • percentage: When filtering using a percentage p%, the filter will retain the most frequent activity labels that together account for at least p% of the activity instances. Activity labels are added in order of decreasing frequency until the retained activity instances reach the percentage threshold.

  • interval: When filtering using an interval, activity labels will be retained when their absolute frequency falls in this interval. The interval is specified using a numeric vector of length 2. Half-open intervals can be created by using NA. E.g., `c(10, NA)` will select activity labels which occur 10 times or more.
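The following base R sketch applies the percentage rule to made-up frequencies. The hypothetical helper `select_by_percentage` (not part of the package) sorts the labels by decreasing frequency. It then keeps each label whose preceding labels still cover less than the threshold share of activity instances, which yields the "at least p%" coverage described for the `percentage` argument. Whether the package handles ties or boundary cases in exactly this way is an assumption.

```r
# Hypothetical helper (not part of the package): keep the most frequent labels
# until at least `percentage` of all activity instances is covered.
select_by_percentage <- function(freq, percentage, reverse = FALSE) {
  freq <- sort(freq, decreasing = TRUE)
  share <- cumsum(freq) / sum(freq)
  # Share already covered *before* each label is added (lagged cumulative share)
  covered_before <- c(0, head(share, -1))
  keep <- covered_before < percentage
  # reverse = TRUE returns the labels the percentage rule would drop
  if (reverse) keep <- !keep
  names(freq)[keep]
}

# Made-up absolute frequencies per activity label (105 instances in total)
freq <- c(register = 50, triage = 40, xray = 12, surgery = 3)

select_by_percentage(freq, 0.9)
select_by_percentage(freq, 0.9, reverse = TRUE)
```

With these frequencies, the first two labels cover 90 of 105 instances (about 86%), so a third label is needed. The three retained labels cover 102 of 105 instances (about 97%).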

Methods (by class)

  • eventlog: Filter eventlog on activity frequency

  • grouped_eventlog: Stratified filter for grouped eventlog

See also