=Paper=
{{Paper
|id=Vol-3139/paper04
|storemode=property
|title=Complex Behaviours Detection and Analysis in Process Mining
|pdfUrl=https://ceur-ws.org/Vol-3139/paper04.pdf
|volume=Vol-3139
|authors=Yang Lu
|dblpUrl=https://dblp.org/rec/conf/caise/Lu22
}}
==Complex Behaviours Detection and Analysis in Process Mining==
Complex Behaviours Detection and Analysis in Process Mining Yang Lu School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia Abstract Process mining builds a bridge between traditional process modelling and data mining. Process discovery algorithms aim at constructing process models automatically from event logs. Existing process discovery algorithms perform well with simple event logs, but their performance can be affected when the input event logs contain complex behaviours, and the discovered process models may not represent the real behaviours of the business processes. In this research, we plan to develop methods to automatically detect different complex behaviours in event logs. The detection of complex behaviours can be based on the extensions of existing process discovery algorithms or stand-alone complex behaviours detection tools. The paper presents our research questions, methodologies, current research progress and potential challenges. Keywords Process Mining, Process Discovery, Complex Behaviours Detection 1. Introduction Process mining is a relatively new subject which builds a bridge between traditional process modeling and data mining [1]. Process discovery is the most critical part of process mining which aims at extracting process insights of a system. The discovered process models should not only have high quality measures (i.e., fitness, precision and generalization), but also be an accurate representation of the real process behaviours. Many process discovery algorithms have been proposed, and some can return process mod- els with high fitness and precision values [2]. However, when input logs contain complex behaviours, the discovered models may not describe the process behaviours accurately. Firstly, some process discovery algorithms cannot guarantee the soundness of discovered models. When complex behaviours are included in event logs, they may return process models which are not sound [2]. Secondly, in order to accommodate complex process behaviours in a single process model, process discovery algorithms may return a complex and incomprehensible process models [3]. For example, when the event log contains information of several micro- processes, it is hard to use a single process model to describe the processes [4]. Thirdly, due to the representation bias [5] of process modeling languages (eg., Petri nets, BPMNs, Process Proceedings of the Doctoral Consortium Papers Presented at the 34th International Conference on Advanced Information Systems Engineering (CAiSE 2022), June 06–10, 2022, Leuven, Belgium Envelope-Open yalu8986@uni.sydney.edu.au (Y. Lu) GLOBE https://www.sydney.edu.au/engineering/about/our-people/research-students/yang-lu-503.html (Y. Lu) Orcid 0000-0002-9002-8650 (Y. Lu) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) trees), some process behaviours may not be able to be accurately presented. For example, if an activity happens in the context rather than in the control flow of the process (i.e., the activity can happen at anytime during the process), it cannot be accurately described by traditional process modeling languages [6]. Lastly, even though a comprehensive and structured process model is discovered, it assumes the process to be static and may ignore the changes of the process during the execution time [7]. To discover comprehensive process models, many event log filtering tools have been proposed [8] to simplify the event logs. Although event logs can be simplified by using log filtering tools, important knowledge about the processes can be ignored. As a goal of process mining is to facilitate business process improvement taking the advantage of event logs, it is important for us to get deep understanding about business processes. Filtering out and ignoring complex behaviours in event logs is inappropriate. To this end, the proposed PhD project focuses on the detection and analysis of different complex behaviours in event logs. The research questions are presented below: • Can we extend existing process modeling languages to present different complex be- haviours? • Can we extend existing process discovery algorithms to discover different complex be- haviours? • Can we develop stand-alone tools to discover different complex behaviours directly from event logs? On the one hand, existing process discovery algorithms can be extended to handle complex behaviours. By extending existing process discovery algorithms, their advantages can be inherited. On the other hand, stand-alone tools to discover different complex process behaviours can be developed. These tools can be applied before performing data pre-processing and process discovery algorithms to help us understand the complex behaviours ignored by the discovered model. The rest of this paper is structured as follows: Section 2 is a literature review of related work. Section 3 presents our research approach. Section 4 presents our current research progress. Section 5 presents potential challenges of our research project and concludes the paper. 2. Related Work 2.1. Extending Existing Process Modeling Languages and Discovery Algorithms to Handle Complex Behaviours The original alpha algorithm [9] is one of the earliest process discovery algorithms. It guarantees to rediscover the process model when the event log satisfies certain conditions. However, the process behaviours which can be discovered by the original alpha algorithm are limited. The original alpha algorithm has been extended to handle complex behaviours such as the short loops [10], invisible tasks [11, 12] and non-free-choice behaviours [11, 13]. These algorithms allow complex behaviours to be discovered while maintaining the guarantees of the original alpha algorithm. The inductive miner [14, 15] is one of the most popular process discovery algorithms which guarantees to return sound process models. Given the direct outcome of the inductive miner is a process tree, the behaviours being represented are limited. A so-called ”flow model” (i.e., process model with high fitness but very low precision) can be returned when the process behaviours are unable to be represented by process trees. To allow more behaviours to be represented and discovered, both the inductive miner and process tree are extended. For example, Leemans et al. [16] extend the inductive miner to discover cancellation behaviours, Lu et al. [17] extend the inductive miner to discover switch behaviours (i.e., change of paths on exclusive choice branches). Leemans et al. [18] also extend the inductive miner to discover recursive behaviours in the execution of software source code. In all these extensions, the discovered models are always sound. 2.2. Stand-alone Tools to Discover Complex Behaviours Many methods have been developed in order to cluster traces in event logs [4]. Traces with similar behaviours are put into the same cluster. Instead of using one process model to represent the process in the event log, a separate process model is discovered for each cluster. Besides trace clustering algorithms, some methods also focus on the discovery of hierarchical process models to handle complex event logs with many activities and sub-processes (eg., [19, 20]). A high-level process model can be used to represent the relationship between different sub-processes while a low-level model can be used to represent the process model of each single sub-process. To deal with changes within business processes, various methods have been proposed to deal with concept drifts in event logs [7]. Some algorithms focus on detecting the time points of concept drifts (eg., [21, 22]). Others focus on providing comprehensive results to users (eg., visualise process changes [23]). When time points of concept drifts are detected, a separate process model can be discovered between each pair of points to help us understand the evolving of processes. Some research also focuses on detecting other complex behaviours. For example, Lu et al. [24] propose a method to handle duplicate activities in event logs. More specifically, the same activity may execute in different stages of a process. A more comprehensive process model can be discovered if we split such an activity into multiple activities. Dees et al. [6] propose a method to visualise the behaviours of context activities on the edges of process models (i.e., activities which can happen at anytime during the execution of the process). Dubinsky et al. [25] propose a method to detect the ”split-cases” behaviours in event logs (i.e., when a case is illegally split into multiple cases). In a nutshell, stand-alone tools are independent from process discovery algorithms. They can discover more insights about the process from event logs which usually cannot be discovered directly by process discovery algorithms. These methods can be used together with process discovery algorithms to enhance our understanding of business processes described in event logs. Figure 1: An example switch process tree and its corresponding workflow net 3. Research Approach Our research approach is adapted from the Design Science Research (DSR) [26] methodology. Our research contains the following stages: • Identifying Problems: At this stage, literature review is conducted. The literature review focuses on research related to detecting process behaviours from event logs (eg., process discovery algorithms, concept drift detection algorithms, trace clustering algorithms, etc.). Based on the literature review, gaps can be identified among existing literature (eg., existing concept drift detection algorithms cannot distinguish process drifts from noises). • Designing and Developing Solutions: Once a gap has been identified among existing literature, a solution can then be proposed to solve the problem. Then a software can be built to implement the proposed solution. Existing open-source software packages can be modified for faster development process. • Demonstration and Evaluation: When a potential solution is implemented, evaluation should be conducted to show the capabilities of the proposed solution. The evaluation usually contains two parts: firstly, the method is evaluated empirically using a big number of synthetic datasets. The performance of the proposed method can also be compared with existing methods. Secondly, the method is evaluated using real-life datasets. • Communication: If a valid solution is built, the work can be presented to the research community through academic conferences, workshops or journals. 4. Current Results In this section, we briefly introduce some of our work to discover complex process behaviours in event logs. Please refer to the corresponding references for the details of our proposed methods and evaluation results. It has to be noted that all work presented in this section has been evaluated using publicly available datasets. The implementations of these methods are also publicly available. 4.1. Discovering Switch Behaviours in Process Mining In [17], we propose a novel method to extend the inductive miner to discover switch behaviours. Assume there is an exclusive decision choice in a process model, and the decision point is split into multiple branches. In real-life event logs, it is possible to switch between different Table 1 Comparing the extended inductive miner with the original inductive miner branches after the decision has been made. However, due to the limitation of process trees, the original inductive miner is unable to discover such behaviours. A ”follower model” with very low precision can be returned when such behaviours exist in event logs. To solve the problem, we firstly extend the process tree notation to allow the presentation of switch behaviours. The new notation is called ”switch process tree”. Each switch process tree can be translated into an equivalent workflow net. In addition, with some constraints of switch process trees, their corresponding workflow nets can be guaranteed to be sound. Fig. 1 shows an example process tree and its corresponding workflow net. The extension of process trees allows us to extend the inductive miner to discover switch behaviours. As shown in Table 1, when switch behaviours exist in event logs, our proposed method can significantly improve the precision of the discovered process models. The discovered process models by the extended inductive miner can accurately present the process behaviours in event logs. Finally, the method is also evaluated using a real-life dataset (”BPIC13-incident” event log from the ”4TU Center for Research Data”). As shown in Table 2, a more accurate and simpler process model is discovered by our proposed method. Table 2 Evaluation results with the publicly-available dataset (IMs refers to our approach, SM refers to the Split Miner [27], and IMf refers to the inductive miner infrequent [15]) Figure 2: Comparing our drift detection algorithm with the baseline [22] under different settings when noises are inserted in event logs. Please refer to [22] for details of the evaluation results 4.2. Discovering Concept Drifts in Process Mining In [21], we propose a method to accurately detect time points of process drifts in event logs. A process drift point is defined as the time point when there is a statistically significant difference among the observed process behaviours before and after the change point. Most existing process drift detection algorithms assume the input event logs to be clean and cannot differentiate process drifts from noises. Our proposed method can not only accurately detect process drift points, but can also be robust to noises. Evaluation results show that our proposed method can consistently perform better than existing methods. Fig. 2 shows the evaluation results comparing to the baseline [22] when 20% of noises are inserted into the event logs. In [28], we propose a method to detect branching frequency changes in process models. Branching frequency changes refer to changes in frequencies between different options when there is an exclusive choice. Our method is evaluated using a publicly available event log (”Italian Help Desk” event log from the ”4TU Center for Research Data”). The process model is presented in Fig. 3 and the results are presented in Fig. 4. Two frequency changes are detected in the event log, and the frequencies between choosing activity ”Wait”, ”Require upgrade” and ”Resolve ticket” after ”Take in charge ticket” change significantly after each drift point. 5. Conclusions and Future Work In this paper, we explain the need for developing new techniques to discover complex behaviours in event logs for better understanding of business processes. On the one hand, existing process Figure 3: Process model of the “Italian help desk” log Figure 4: Visualisation of the detected branching frequency changes in the ”Italian Help Desk” log discovery algorithms can be extended to handle complex behaviours. On the other hand, stand-alone tools can also be constructed for the detection of specific types of behaviours. Although we have provided some solutions for the research problem, there are still some challenges. Firstly, there is the lack of methods to validate the results. For example, in [17], although a process model with higher precision value can be obtained from the publicly available event log, it may still be insufficient to conclude that switch behaviours actually exist in the process. We plan to evaluate this work empirically to evaluate its capabilities in the future. Secondly, some methods heavily rely on user-defined parameters. For example, In [21], a window size is required from users. The window size defines the number of events in each sample when performing statistical tests. Different process drift points can be reported with different window sizes. In [28], the detection of drift points relies on an external change detection algorithm. The detection results are dependent on the parameters of the external algorithm. Finally, we also plan to develop methods to discover other complex behaviours such as the automatic detection of cancellation behaviours and non-free-choice behaviours in event logs. Acknowledgments This research project is supervised by Associate Professor Simon K. Poon from the School of Computer Science, the University of Sydney. References [1] W. van der Aalst, Process Mining, Springer Berlin Heidelberg, Berlin, Heidelberg, 2016. doi:1 0 . 1 0 0 7 / 9 7 8 - 3 - 6 6 2 - 4 9 8 5 1 - 4 . [2] A. Augusto, R. Conforti, M. Dumas, M. La Rosa, F. M. Maggi, A. Marrella, M. Mecella, A. Soo, Automated discovery of process models from event logs: review and benchmark, IEEE transactions on knowledge and data engineering 31 (2018) 686–705. [3] W. M. van der Aalst, What makes a good process model?, Software & Systems Modeling 11 (2012) 557–569. [4] F. Zandkarimi, J.-R. Rehse, P. Soudmand, H. Hoehle, A generic framework for trace clustering in process mining, in: 2020 2nd International Conference on Process Mining (ICPM), 2020, pp. 177–184. doi:1 0 . 1 1 0 9 / I C P M 4 9 6 8 1 . 2 0 2 0 . 0 0 0 3 4 . [5] W. M. van der Aalst, On the representational bias in process mining, in: 2011 IEEE 20th International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, IEEE, 2011, pp. 2–7. [6] M. Dees, B. Hompes, W. M. van der Aalst, Events put into context (epic), in: 2020 2nd International Conference on Process Mining (ICPM), 2020, pp. 65–72. doi:1 0 . 1 1 0 9 / ICPM49681.2020.00020. [7] D. M. V. Sato, S. C. De Freitas, J. P. Barddal, E. E. Scalabrin, A survey on concept drift in process mining, ACM Computing Surveys (CSUR) 54 (2021) 1–38. [8] H. M. Marin-Castro, E. Tello-Leal, Event Log Preprocessing for Process Mining: A Review, Applied Sciences 11 (2021) 10556. doi:1 0 . 3 3 9 0 / a p p 1 1 2 2 1 0 5 5 6 . [9] W. Van der Aalst, T. Weijters, L. Maruster, Workflow mining: Discovering process models from event logs, IEEE transactions on knowledge and data engineering 16 (2004) 1128–1142. [10] A. A. De Medeiros, B. F. Van Dongen, W. M. Van der Aalst, A. Weijters, Process mining: Extending the 𝛼-algorithm to mine short loops (2004). [11] Q. Guo, L. Wen, J. Wang, Z. Yan, P. S. Yu, Mining invisible tasks in non-free-choice constructs, in: International Conference on Business Process Management, Springer, 2016, pp. 109–125. [12] L. Wen, J. Wang, W. M. van der Aalst, B. Huang, J. Sun, Mining process models with prime invisible tasks, Data & Knowledge Engineering 69 (2010) 999–1021. [13] L. Wen, W. M. Van Der Aalst, J. Wang, J. Sun, Mining process models with non-free-choice constructs, Data Mining and Knowledge Discovery 15 (2007) 145–180. [14] S. J. J. Leemans, D. Fahland, W. M. P. van der Aalst, Discovering block-structured process models from event logs - a constructive approach, in: J.-M. Colom, J. Desel (Eds.), Ap- plication and Theory of Petri Nets and Concurrency, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 311–329. [15] S. J. J. Leemans, D. Fahland, W. M. P. van der Aalst, Discovering block-structured pro- cess models from event logs containing infrequent behaviour, in: N. Lohmann, M. Song, P. Wohed (Eds.), Business Process Management Workshops, Springer International Pub- lishing, Cham, 2014, pp. 66–78. [16] M. Leemans, W. M. P. van der Aalst, Modeling and discovering cancelation behavior, in: H. Panetto, C. Debruyne, W. Gaaloul, M. Papazoglou, A. Paschke, C. A. Ardagna, R. Meersman (Eds.), On the Move to Meaningful Internet Systems. OTM 2017 Conferences, Springer International Publishing, Cham, 2017, pp. 93–113. [17] Y. Lu, Q. Chen, S. Poon, A novel approach to discover switch behaviours in process mining, in: S. Leemans, H. Leopold (Eds.), Process Mining Workshops, Springer International Publishing, Cham, 2021, pp. 57–68. [18] M. Leemans, W. M. Van Der Aalst, M. G. Van Den Brand, Recursion aware modeling and discovery for hierarchical software event log analysis, in: 2018 IEEE 25th international conference on software analysis, evolution and reengineering (SANER), IEEE, 2018, pp. 185–196. [19] X. Lu, A. Gal, H. A. Reijers, Discovering hierarchical processes using flexible activity trees for event abstraction, in: 2020 2nd International Conference on Process Mining (ICPM), 2020, pp. 145–152. doi:1 0 . 1 1 0 9 / I C P M 4 9 6 8 1 . 2 0 2 0 . 0 0 0 3 0 . [20] S. J. Leemans, K. Goel, S. J. van Zelst, Using multi-level information in hierarchical process mining: Balancing behavioural quality and model complexity, in: 2020 2nd International Conference on Process Mining (ICPM), 2020, pp. 137–144. doi:1 0 . 1 1 0 9 / ICPM49681.2020.00029. [21] Y. Lu, Q. Chen, S. Poon, A robust and accurate approach to detect process drifts from event streams, in: A. Polyvyanyy, M. T. Wynn, A. Van Looy, M. Reichert (Eds.), Business Process Management, Springer International Publishing, Cham, 2021, pp. 383–399. [22] A. Ostovar, A. Maaradji, M. La Rosa, A. H. M. ter Hofstede, B. F. V. van Dongen, Detect- ing drift from event streams of unpredictable business processes, in: I. Comyn-Wattiau, K. Tanaka, I.-Y. Song, S. Yamamoto, M. Saeki (Eds.), Conceptual Modeling, Springer Inter- national Publishing, Cham, 2016, pp. 330–346. [23] A. Yeshchenko, C. Di Ciccio, J. Mendling, A. Polyvyanyy, Visual drift detection for sequence data analysis of business processes, IEEE Transactions on Visualization and Computer Graphics (2021). [24] X. Lu, D. Fahland, F. J. H. M. van den Biggelaar, W. M. P. van der Aalst, Handling duplicated tasks in process discovery by refining event labels, in: M. La Rosa, P. Loos, O. Pastor (Eds.), Business Process Management, Springer International Publishing, Cham, 2016, pp. 90–107. [25] Y. Dubinsky, P. Soffer, Detecting the “split-cases” workaround in event logs, in: A. Augusto, A. Gill, S. Nurcan, I. Reinhartz-Berger, R. Schmidt, J. Zdravkovic (Eds.), Enterprise, Business- Process and Information Systems Modeling, Springer International Publishing, Cham, 2021, pp. 47–61. [26] A. R. Hevner, S. T. March, J. Park, S. Ram, Design science in information systems research, MIS Quarterly 28 (2004) 75–105. [27] A. Augusto, R. Conforti, M. Dumas, M. La Rosa, A. Polyvyanyy, Split miner: automated discovery of accurate and simple business process models from event logs, Knowledge and Information Systems 59 (2019) 251–284. [28] Y. Lu, Q. Chen, S. Poon, Detecting and understanding branching frequency changes in process models, in: A. Augusto, A. Gill, S. Nurcan, I. Reinhartz-Berger, R. Schmidt, J. Zdravkovic (Eds.), Enterprise, Business-Process and Information Systems Modeling, Springer International Publishing, Cham, 2021, pp. 39–46.