Business Process Event Log Anomaly Detection based on Statistical Leverage ⋆ Jonghyeon Ko and Marco Comuzzi Ulsan National Institute of Science and Technology (UNIST) Ulsan, Republic of Korea {whd1gus2, mcomuzzi}@unist.ac.kr Abstract. We present a novel information-theoretic framework to de- tect anomalous traces in business process event logs. Although information- theoretic approaches to anomaly detection are considered fundamental in data analytics, they have not been considered in the context of event logs. The proposed framework combines a trace-level anomaly score based on statistical leverage, which also gives an indication of the severity of an anomaly, and different ways of setting the value of a threshold to detect anomalous traces. The framework has been first proposed in a traditional offline setting, but we also discuss its extension in the online setting, i.e., when events in a log are considered as a stream. · · · · Keywords: Anomaly Detection Event Log Business Process Information- theoretic Measure Statistical Leverage 1 Introduction Business processes span across all departments in an organization, and informa- tion logged during the execution of business processes is available in so-called event logs [2]. The events in an event log are the ones that capture the execu- tion of activities that punctuate the flow of each process execution, labeled with their own case identifier. As such, event logs have three key attributes (a case identifier, a timestamp, and an activity label) and additional attributes relevant in a specific domain, such as an identifier of the (human) resource in charge of the execution or supervision of an activity. Event logs are prone to errors due to different root causes affecting the process execution and/or the logging process, such as system malfunctioning or sub- optimal resource behaviour [2–4]. Low quality event logs can crucially disrupt process mining-based analyses. The process models discovered from a low quality log, in fact, may be highly inaccurate [5] and, more generally, low quality event logs may give a falsified perception of the process that is analysed to stakeholders, resulting in financial loss. In this context, the research field of event log anomaly detection (or anomalous behavior detection) has emerged recently with the aim ⋆ Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 2 J. Ko & M. Comuzzi of developing methods to identify and possibly correct the anomalies recorded in an event log. Our work focuses on detecting anomalies at the trace-level in event logs, i.e., anomalous sequences of activities for a case. This problem has been ap- proached initially in the literature relying on a model of correct behaviour, cap- tured for instance by a process model or labelled instances from which a model of positive/negative behaviour can be extracted. More recently, however, machine learning-based approaches have tackled this problem more generally as an unsu- pervised problem, without any reliance on an existing process model or labelled data. In this context, this paper gives a brief overview of our recent and ongoing work on trace-level anomaly detection using statistical leverage. We propose a novel information-theoretic measure of anomaly score of a trace, based on the definition of statistical leverage. Besides supporting the anomaly detection task – through the definition of an appropriate anomaly threshold – an anomaly score is also able to represent the severity of anomalies, which is an aspect that cannot be handled naturally by classification-based approaches. The approach has been customised to both offline and online settings, i.e., treating an event log as a batch of events accumulated over a relatively long period, and as a stream of events that become available as soon as they captured, respectively. The content of this paper is compiled from our previous publications on offline [2] and online [3] anomaly detection. This paper is organised as follows. Section 2 introduces the definition of the proposed anomaly score, the anomaly detection method and the experimental results in the offline settings, whereas Section 3 discusses the same in the online settings. Conclusions are drawn in Section 4. 2 Offline settings: anomaly score and anomaly detection 2.1 Anomaly score In statistics, the leverage is a measure capturing how far away one observation is from other observations in a dataset [1]. It has been used as a key support coefficient to develop statistical measures of anomaly scores for tabular numeric data such as the Cook’s distance or the Welsch-Kuh distance. The main idea underpinning our approach is to define an anomaly score for each trace in a log based on the notion of statistical leverage. To calculate the leverage of traces in an event log E, this has to be appropriately pre-processed to obtain a numeric matrix of observations X(E). Similarly to other approaches in the literature [5, 6], we consider one-hot sequence encoding of activity labels and zero-padding for trace-level aggregation of events (see Figure 1). Given an event log E and its activity labels AE = {a1 , . . . , aK }, K dummy attributes di,j,k are created for each event ei,j in a trace σj ⊆ E. In this way, using one-hot encoding, each trace σj of length Nj is encoded into Nj × K attributes. Then, for trace-level aggregation, cases are aggregated in a J × (N max × K) matrix Title Suppressed Due to Excessive Length 3 Case_ID Event_ID Activity CompleteTimeStamp CaseID EventID 𝒂𝟏 𝒂𝟐 𝒂𝟑 𝒂𝟒 CompleteTimeStamp 𝐸𝑋1 𝑎1 2019/10/27 10:25:21 Step 1. One-hot 𝐸𝑋1 1 0 0 0 2019/10/27 10:25:21 CaseX 𝐸𝑋2 𝑎2 2019/10/29 12:31:48 encoding CaseX 𝐸𝑋2 0 1 0 0 2019/10/29 12:31:48 𝐸𝑋3 𝑎3 2019/10/29 13:01:13 𝐸𝑋3 0 0 1 0 2019/10/29 13:01:13 𝐸𝑌1 𝑎2 2019/10/28 09:41:53 𝐸𝑌1 0 1 0 0 2019/10/28 09:41:53 CaseY CaseY 𝐸𝑌2 𝑎4 2019/10/30 14:25:51 𝐸𝑌2 0 0 0 1 2019/10/30 14:25:51 Step 2. Concatenate grouped events & 0- padding CaseID 𝒂𝟏𝟏 𝒂𝟏𝟐 𝒂𝟏𝟑 𝒂𝟏𝟒 𝒂𝟐𝟏 𝒂𝟐𝟐 𝒂𝟐𝟑 𝒂𝟐𝟒 𝒂𝟑𝟏 𝒂𝟑𝟐 𝒂𝟑𝟑 𝒂𝟑𝟒 CaseX 1 0 0 0 0 1 0 0 0 0 1 0 CaseY 0 1 0 0 0 0 0 1 0 0 0 0 0-padding Fig. 1. One-hot encoding and 0-padding X(E) by zero-padding, where N max = maxσj ∈E (Nj ) is the length of the longest trace(s) in E. Then, the leverage ˆl(σj ) of a trace σj is calculated using the matrix X(E). The leverage of the j-th trace in a log is equal to j-th diagonal element of the projection matrix H(E) = X(E) · (X(E)T · X(E))−1 · X(E)T : ˆl(σj ) = hj,j ∈ H(E) (1) However, the leverage ˆl(σj ) defined above suffers from a bias due to zero- padding, which increases the leverage of longer traces and decrease the one of shorter traces (which are more likely to be considered similar because they con- tain many zero-padded values). In order to counter this issue affecting ˆl(σj ), the anomaly score that we propose is a weighted version of the leverage ˆlw (σj ), which considers a trace-length weighting factor wj that can optimally be adjusted using a robustly-fitted model on different real-life event logs. The detailed procedure of defining the weight wj is described in our published paper [2]. 2.2 Thresholding methods The anomaly score defined above ranges between 0 and 1 and the higher its value the more likely a trace to be anomalous. Therefore, to support anomaly detection, a threshold has to be chosen that discriminates between anomalous (above threshold) and normal (below threshold) traces. We propose 3 different methods to define such a threshold: (i) a deviation- based threshold (M 1), (ii) a statistical distribution-based threshold (M 2), and (iii) a distributional gap-based threshold (M 3). In M 1, the threshold T is defined by the mean and standard deviation of the anomaly scores of all traces in a log. In particular, a trace is considered anoma- lous if its score is greater than the sum of the average and the standard devia- tion of the anomaly scores of all traces in a log, i.e., T = meanσj ⊆E [ˆlw (σj )] + stdevσj ⊆E (ˆlw (σj )). In M 2, the threshold is defined as the 0.9 quantile of the probability density function of a gamma distribution fitted using the anomaly 4 J. Ko & M. Comuzzi scores of all traces in a log. Finally, in M 3 the threshold is chosen empirically as the first stationary point of the expected cumulative denisty function of the anomaly score values. The latter method M 3, in particular, is robust because it does not require any prior knowledge about the ratio of anomalies in an event log, and therefore it is easy to use due to its non-parametric design. 2.3 Evaluation The proposed anomaly detection framework, i.e., the anomaly score and the anomaly threshold to support anomaly detection, has been evaluated on artificial and real life event logs against state of the art anomaly detection techniques (heuristic methods, classification-based and deep learning-based methods). In a nutshell, the evaluation has found that: – For real logs, the proposed framework shows the highest average F1-score, compared with the state of the art baselines, while, for artificial logs, the performance is comparable with the one of best baselines; – The proposed framework is better than the state of the art baselines at maintaining a good performance particularly with low anomaly ratios in event logs; it also shows robust performance regardless of the anomaly ratio, while the performance of the baselines tends to degrade for lower anomaly ratios; – The proposed framework shows a balanced performance in respect of recall and precision, with recall/1.5 ≤ precision ≤ 1.5·recall in most experiments, while other baselines often show skewed performance towards one of the two performance measures. More details about the evaluation are reported in our original publication [2]. 3 Adapting the framework to online settings 3.1 Online anomaly detection Anomaly detection in online settings can be crucial for discovering anomalies in process execution as soon as they occur and, consequently, allowing to promptly take early corrective actions. However, the online settings introduce additional challenges such as requirements of adaptability to concept drift, and the finite memory usage. To cope with these challenges, we have adapted the framework described in the previous section to online settings by considering the two basic concepts of grace period and (trace-based) sliding window. The parameter Grace Period (GP ) specifies a minimum number of traces to be completed, i.e., for which all the events have to be received, before the framework can be evaluated, which prevents running the anomaly detection model at early stages with an insufficient number of received events. The Sliding Window (SW ) aims at keeping the finite Title Suppressed Due to Excessive Length 5 Event_ID Activity Timestamp Anomaly score (𝓵෠ 𝒘 ) Case.01 Event_1 A 2019/09/01 09:10 0.00 (●) Case.02 Event_2 A 2019/09/01 09:25 0.01 Case.70 (●) Case.70 Event_300 Case.03 Event_3 O 2019/09/02 09:05 0.01 (●) Case.01 Event_4 B 2019/09/02 15:51 0.00 (●) Case.02 Event_5 Y 2019/09/03 09:13 0.26 (●) Case.02 Event_6 C 2019/09/04 16:23 0.25 (●) Case.01 Event_7 C 2019/09/04 17:02 0.01 (●) … … … … Case.70 Event_297 A 2020/08/08 11:23 0.05 (●) Case.72 Event_298 E 2020/08/09 13:10 0.16 (●) Case.72 Event_299 B 2020/08/10 14:44 0.19 (●) Case.70 Event_300 Z 2020/08/10 16:36 ? Step 2. Preprocess zero padding Step 1. A new 300th event in event streams the event log of total 300 events (one-hot encoding and zero padding) Event_ID Activity Timestamp Anomaly score (𝓵෠ 𝒘 ) Anomaly score (𝓵෠ 𝒘 ) Case.01 Event_1 A 2019/09/01 09:10 0.00 (●) Case.01 0.09 Case.02 Event_2 A 2019/09/01 09:25 0.01 (●) Case.02 0.19 Case.03 Event_3 O 2019/09/02 09:05 0.01 (●) Case.03 0.04 Case.01 Event_4 B 2019/09/02 15:51 0.00 (●) Case.04 0.02 Case.02 Event_5 Y 2019/09/03 09:13 0.26 (●) Case.05 0.32 Case.02 Event_6 C 2019/09/04 16:23 0.25 (●) Case.06 0.01 Case.01 Event_7 C 2019/09/04 17:02 0.01 (●) Case.07 0.06 … … … … Case.08 0.08 Case.70 Event_297 A 2020/08/08 11:23 0.05 (●) Case.09 0.05 Case.72 Event_298 E 2020/08/09 13:10 0.16 (●) Case.10 0.21 Case.72 Event_299 B 2020/08/10 14:44 0.19 (●) … … Case.70 Event_300 Z 2020/08/10 16:36 0.83 (●) Case.70 0.83 Case.71 0.07 Case.72 0.16 Step 4. Allocate the anomaly score of case.70 to corresponding event (Event_300) Step 3. Calculate anomaly score Fig. 2. Example of anomaly detection using a fixed anomaly threshold T = 0.1 memory usage by maintaining only the events of a finite number of recent traces for the analysis. After having introduced the basic concepts of grace period and (trace-based) sliding window, we now introduce the main procedure for the online anomaly detection in Figure 1. Each time a new event is received, we get a sliding window data Et satisfied with the two parameters GP and W . Then, using the anomaly scoe defined in the previous Section 2, after applying one-hot encoding & zero- padding to Et , we calculate an anomaly score for all traces ˆlw (σj ) in the sliding window. For a quicker and fully-automated way of detecting anomalies in online set- tings, as far as thresholds are concerned, we consider three constant thresholds and one variable threshold. We consider the constant values Tc1 = 0.1, Tc2 = 0.15, and Tc3 = 0.2 and, as variable threshold, the value Tv = meanσj ⊆Et [ˆlw (σj )]+ stdevσj ⊆Et (ˆlw (σj )), which calculates the threshold based on the mean and stan- dard deviation of the leverage scores of all traces in the sliding window. Based on a set threshold, a trace σj is labelled as anomalous if ˆlw (σj ) > T . This procedure is replicated each time a new event is received. 6 J. Ko & M. Comuzzi 3.2 Evaluation The proposed online framework for anomaly detection has been evaluated on artificial and a real life event log. The results obtained can be summarised as follows: – Since event logs are structured by instance observations, not point observa- tions, the performance of the proposed framework becomes high and stable when a sufficient number of events have been received for all traces in the window SW ; – The performance (accuracy) of the framework tends to improve as the size of the sliding window increases, although this also leads to increased run time; – The proposed framework shows better recall than precision or F1-score, i.e., the framework is good at recognising correctly the anomalous traces, but often mistakes non-anomalous traces for anomalous ones, therefore creating false positives. More details about this evaluation are available in our original paper [3]. 4 Conclusions This paper has presented a novel information-theoretic framework for anomaly detection of anomalous traces in business process event logs. The proposed frame- work has been applied to both online and offline event log settings. The results have shown proposed framework is robust on performance across different event logs and even with low anomaly ratios, as well as it can be easily used even by less experienced data analysts owing to its non-parametric design. References 1. Everitt, B.: The cambridge dictionary of statistics cambridge university press. Cam- bridge, UK Google Scholar (1998) 2. Ko, J., Comuzzi, M.: Detecting anomalies in business process event logs using sta- tistical leverage. Information Sciences 549, 53–67 (2021) 3. Ko, J., Comuzzi, M.: Online anomaly detection using statistical leverage for stream- ing business process events. In: Process Mining Workshops: ICPM 2020 International Workshops, Padua, Italy, October 5–8, 2020, Revised Selected Papers. vol. 406, p. 193. Springer Nature (2021) 4. Ko, J., Lee, J., Comuzzi, M.: Air-bagel: An interactive root cause-based anomaly generator for event logs. In: Proceedings of International Conference on Process Mining (ICPM) Demo Track (2020) 5. Nguyen, H.T.C., Lee, S., Kim, J., Ko, J., Comuzzi, M.: Autoencoders for improving quality of process event logs. Expert Systems with Applications 131, 132–147 (2019) 6. Nolle, T., Luettgen, S., Seeliger, A., Mühlhäuser, M.: Binet: Multi-perspective busi- ness process anomaly classification. Information Systems p. 101458 (2019)