Interactive Process Drift Detection: A Framework for Visual Analysis of Process Drifts (Extended Abstract)

Denise Maria Vecino Sato
Graduate Program in Informatics, Pontifícia Universidade Católica do Paraná and Instituto Federal do Paraná
Curitiba, Brazil
0000-0001-6350-4167

Rafaela Mantovani Fontana
Department of Professional and Technological Education, Universidade Federal do Paraná
Curitiba, Brazil
0000-0001-9928-854X

Jean Paul Barddal
Graduate Program in Informatics, Pontifícia Universidade Católica do Paraná
Curitiba, Brazil
0000-0002-3918-179

Edson Emilio Scalabrin
Graduate Program in Informatics, Pontifícia Universidade Católica do Paraná
Curitiba, Brazil
0000-0003-1117-7082

Abstract: Interactive Process Drift Detection (IPDD) is a framework for visual analysis of process drifts. A process drift indicates that a change in the process model occurred at some point in time. IPDD first generates process models for subparts of the event log using a sliding window approach. Then, it detects the drifts by evaluating similarity metrics calculated between adjacent process models; a difference in some of the metrics indicates a drift. The current implementation of IPDD generates the process models using the directly-follows graph (DFG) and applies two metrics: nodes and edges similarity. The user interface shows the drifts in the process models over time, allowing the user to visually understand the model changes. The user can also easily change the hyperparameters for the analysis and verify the results on the interface. In addition, the user interface allows the user to evaluate the detected drifts by calculating the F-score metric, which is useful when using artificial datasets. The underlying idea is to ease the choice of a “good” value for the hyperparameter configuration, which is critical for almost any drift detection tool.

Keywords: process drift detection, visual process analysis, process drift, concept drift

I. INTRODUCTION

Process mining aims at creating valuable knowledge about business processes from the event data of information systems. Usually, process mining techniques assume the processes to be in a steady state, i.e., the event data contains information from a single version of the process. However, this assumption does not reflect the reality of business processes, which constantly adapt to new regulations, improve performance, or enhance user experience. The situation where a process changes while being analyzed is named concept drift or process drift [1].

A change in the process can affect the ongoing instances suddenly or gradually. A sudden drift occurs when all the ongoing instances start to follow the new process model immediately. In a gradual drift, there is a period of time in which instances from both versions of the process model coexist. Process drifts can also follow recurrent or incremental patterns. A recurrent drift indicates that a replaced process model can occur again. In an incremental drift, minor changes to the process model are implemented over some time. Sudden, gradual, incremental, and recurring are considered the process drift types [2]. A process drift can also affect one or more perspectives of the process model. However, the most common perspective considered in the available tools is the control flow. Identifying and understanding process drifts is relevant for business analysts because it improves their knowledge about the processes and enhances the quality of process mining analysis. Even when analysts perform offline process mining analysis, process drift detection can provide benefits, e.g., avoiding overly complex discovered process models, improving conformance checking, or enhancing processes based on their current state.

Different tools for detecting process drifts from event logs have been proposed, but the accuracy of the detection usually depends on the hyperparameter configuration [3]. The ProDrift plugin in Apromore [4], [5] and the ConceptDrift plugin in ProM [2] can detect different types of drifts (sudden and gradual); however, their focus is the change point and information about it. The user has to complement the drift analysis with further exploratory mining, slicing the event log at the reported change points, to understand the evolution of the process. A more recent tool, named VDD [6], detects the four types of drifts and allows the user to explore the drift using the process model. However, the tool is based on constraints mined over Declare models, and it mixes DFGs with the constraints to explain the dynamics of the process over time. None of the identified tools calculates an accuracy metric in the user interface.

Tuning the hyperparameter configuration to enhance the detection accuracy is a challenge for the proposed tools because the different approaches are all affected by the hyperparameter configuration. IPDD aims to overcome this issue by providing an interactive user interface where the user can quickly change the parameters and visually evaluate the results. The tool provides visual process drift detection analysis by showing the distinct process models over time, in what we can consider a “replay” of the process models. For each process model, IPDD also provides information about the differences against the previous model, enhancing the analysis. IPDD’s current implementation detects only sudden drifts in the control-flow perspective, and only offline, which is a limitation.

II. IPDD MAIN FEATURES

The IPDD framework detects process drifts by analyzing the event log using a sliding window strategy. First, the user defines the window size based on the number of traces, and IPDD splits the log using tumbling windows. Then, it generates a model for each window and calculates the similarity metrics between adjacent models. The idea is to compare models mined from adjacent time slots using similarity metrics; when they are not similar, IPDD identifies a drift and characterizes the change based on the information provided by the metric. IPDD’s current implementation mines the DFGs (process maps) from the traces in the time slots using PM4Py [7]. Then, the adjacent derived graphs are compared using the Nodes similarity (NS) and Edges similarity (ES) metrics. NS is calculated using Eq. 1 [2], where $n_p$ and $n_q$ are the number of activities in the process maps $P$ and $Q$ (derived from adjacent windows), respectively, and $n_{cs}$ indicates the number of common activities between $P$ and $Q$. ES is calculated using Eq. 2, analogously to NS: $e_p$ is the number of edges in $P$, $e_q$ is the number of edges in $Q$, and $e_{cs}$ indicates the number of common edges in both $P$ and $Q$.

$$NS = \frac{2 \cdot n_{cs}}{n_p + n_q} \tag{1}$$

$$ES = \frac{2 \cdot e_{cs}}{e_p + e_q} \tag{2}$$

IPDD calculates both metrics, and if one or both are less than 1 (that is, the adjacent models differ in their activities or edges), it marks the window as a drift. The F-score metric uses the True Positives (TP), False Positives (FP), and False Negatives (FN). A TP is a window reported as a drift that contains a trace given as a real drift; an FP is counted when a window reported as a drift does not contain any trace given as a real drift; and an FN is counted when a window not reported as a drift contains a trace given as a real drift.

Fig. 1 shows the tool’s main screen, allowing users to easily change parameters and visually check the results. The parameter configuration panel is on top, where users must define the hyperparameter configuration before starting the analysis. After clicking on “Analyze Process Drifts”, users can follow the current status in the “Status” area below the parameters panel. When the analysis finishes, IPDD shows the process drift analysis panel. There is a timeline of windows in the upper part of this panel, where users can click to inspect the process model of a specific window. The similarity metrics information (on the left side) is updated for each window selected, providing information about the differences between the current and the previous model. In the example, the ES indicates a drift that is characterized by two added edges. After IPDD finishes the analysis, the user can open the evaluation panel to calculate the F-score metric by clicking “Evaluate results”. The IPDD framework is described in more detail in [8]. Its source code is available in a public repository¹, the deployed application is available in a public node², and a demo video is available on YouTube³.

Fig. 1. Screenshot from the main window.

III. CASE STUDIES

Authors have proposed different tools for process drift detection. However, the methods are usually sensitive to the hyperparameter configuration. Moreover, almost all approaches apply windowing strategies, and defining a “good” value for the window size is still a challenge. The adaptive approaches also have drawbacks, since other parameters affect the detected drifts [3]. Our IPDD approach addresses this challenge visually by giving users the freedom to check different hyperparameter configurations.

The tool was presented to our research group in Curitiba (Brazil), including researchers from three post-graduate programs (Informatics, Production and Systems Engineering, and Health Technology). First, we conducted a usability assessment for redesigning the user interface. Currently, we are working on a case study on a manufacturing scenario. The idea is to detect drifts in the temporal perspective of the process (sojourn time). The information about drifts will be used as input for planning the maintenance intervals on the production line.

ACKNOWLEDGMENT

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 – Grant No.: 88887.321450/2019-00.

REFERENCES

[1] W. M. P. van der Aalst et al., “Process Mining Manifesto,” in International Conference on Business Process Management BPM 2011: Business Process Management Workshops, 2011, vol. 99, pp. 169–194.
[2] R. P. J. C. Bose, W. M. P. van der Aalst, I. Zliobaite, and M. Pechenizkiy, “Dealing With Concept Drifts in Process Mining,” IEEE Trans. Neural Networks Learn. Syst., vol. 25, no. 1, pp. 154–171, Jan. 2014.
[3] D. M. V. Sato, S. C. De Freitas, J. P. Barddal, and E. E. Scalabrin, “A Survey on Concept Drift in Process Mining,” ACM Comput. Surv., vol. 54, no. 9, pp. 1–38, Oct. 2021.
[4] A. Maaradji, M. Dumas, M. La Rosa, and A. Ostovar, “Fast and Accurate Business Process Drift Detection,” in International Conference on Business Process Management BPM 2015: Business Process Management, 2015, pp. 406–422.
[5] A. Maaradji, M. Dumas, M. La Rosa, and A. Ostovar, “Detecting Sudden and Gradual Drifts in Business Processes from Execution Traces,” IEEE Trans. Knowl. Data Eng., vol. 29, no. 10, pp. 2140–2154, 2017.
[6] A. Yeshchenko, C. Di Ciccio, J. Mendling, and A. Polyvyanyy, “Visual Drift Detection for Sequence Data Analysis of Business Processes,” IEEE Trans. Vis. Comput. Graph., pp. 1–1, 2021.
[7] A. Berti and S. van Zelst, “Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science,” 2019.
[8] D. M. V. Sato, J. P. Barddal, and E. E. Scalabrin, “Interactive Process Drift Detection Framework,” in International Conference on Artificial Intelligence and Soft Computing (ICAISC), 2021, pp. 192–204.

¹ https://github.com/denisesato/InteractiveProcessDriftDetectionFW
² http://visual-pro-drift.com.br:8050/
³ Demonstration video at: https://youtu.be/8feKd6jr8Gs

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
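As an illustration of the detection step described in Section II, the following minimal sketch splits a log into tumbling windows, derives the activities and directly-follows edges of each window, and compares adjacent windows with the Nodes and Edges similarity of Eqs. 1 and 2. It is not IPDD's source code: the function names, the representation of a trace as a list of activity labels, and the handling of two empty graphs are assumptions made for this example, and the DFG is built in plain Python rather than with PM4Py. A window is flagged when either similarity is below 1, following the abstract's statement that a difference in some of the metrics indicates a drift.

```python
def tumbling_windows(traces, window_size):
    """Split a list of traces into consecutive, non-overlapping windows."""
    return [traces[i:i + window_size] for i in range(0, len(traces), window_size)]


def directly_follows_graph(window):
    """Return (activities, edges) of the DFG mined from the traces of one window."""
    nodes, edges = set(), set()
    for trace in window:
        nodes.update(trace)
        # Each pair of consecutive activities in a trace is a directly-follows edge.
        edges.update(zip(trace, trace[1:]))
    return nodes, edges


def similarity(p, q):
    """Eq. 1 / Eq. 2 applied to two sets: 2 * |common| / (|P| + |Q|)."""
    total = len(p) + len(q)
    # Assumption: two empty sets are treated as identical (similarity 1).
    return 1.0 if total == 0 else 2 * len(p & q) / total


def detect_drifts(traces, window_size):
    """Return the indices of windows whose DFG differs from the previous window's."""
    models = [directly_follows_graph(w) for w in tumbling_windows(traces, window_size)]
    drifts = []
    for i in range(1, len(models)):
        prev_nodes, prev_edges = models[i - 1]
        nodes, edges = models[i]
        ns = similarity(prev_nodes, nodes)   # Nodes similarity (Eq. 1)
        es = similarity(prev_edges, edges)   # Edges similarity (Eq. 2)
        if ns < 1 or es < 1:
            drifts.append(i)
    return drifts


# Hypothetical log: the order of activities b and c is swapped from the fifth trace on.
log = [["a", "b", "c"]] * 4 + [["a", "c", "b"]] * 4
print(detect_drifts(log, window_size=2))  # [2]
```

In this toy log the activities never change, so NS stays at 1, while the edge sets of windows 1 and 2 share no edge, so ES drops to 0 and window 2 is reported.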
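The window-based evaluation described in Section II can be sketched in the same way. The paper defines how TP, FP, and FN are counted but does not spell out the F-score formula, so this example assumes the standard balanced F-score, F = 2TP / (2TP + FP + FN). The function name, the 0-based trace indices, and the value returned when there is nothing to score are likewise assumptions for illustration only.

```python
def window_f_score(reported_windows, real_drift_traces, window_size):
    """Count TP/FP/FN per window and return (tp, fp, fn, f_score).

    reported_windows: indices of windows reported as drifts.
    real_drift_traces: 0-based indices of the traces given as real drifts.
    window_size: number of traces per tumbling window.
    """
    reported = set(reported_windows)
    # Map each real drift trace to the window that contains it.
    real = {trace_index // window_size for trace_index in real_drift_traces}

    tp = len(reported & real)   # reported windows that contain a real drift
    fp = len(reported - real)   # reported windows without any real drift
    fn = len(real - reported)   # unreported windows that contain a real drift

    denominator = 2 * tp + fp + fn
    # Assumption: with no real and no reported drifts the score is set to 0.
    f_score = 0.0 if denominator == 0 else 2 * tp / denominator
    return tp, fp, fn, f_score


# Hypothetical run: windows 2 and 5 are reported; real drifts start at traces 4 and 13.
print(window_f_score([2, 5], [4, 13], window_size=2))  # (1, 1, 1, 0.5)
```

With a window size of 2, traces 4 and 13 fall in windows 2 and 6, so window 2 is a true positive, window 5 a false positive, and window 6 a false negative.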