Towards Semantic-driven, Declarative and Interactive Process Mining

Christian Dormagen
Otto-Friedrich Universität Bamberg, Kapuzinerstraße 16, 96047 Bamberg, Germany

Abstract
Semantic knowledge about organizational processes, available either explicitly in information systems, extracted as event log traces, or implicitly as expert knowledge, is underutilized in process mining. This PhD proposal aims to advance process mining by integrating implicit and explicit business process knowledge into a semantically enriched event log that links data from different sources. The challenges of noisy event logs and data integration require an optimal level of abstraction. We aim to extend the scope of the semantic event log by incorporating different levels of abstraction, hierarchical structures and process perspectives such as macroscopic system-level behavior. Central to our strategy is the development of interactive declarative process discovery methods using the semantic event log, combining inductive learning, logical reasoning as well as explainable and interactive machine learning. The goal is to enable incremental correction and refinement of process models by domain experts, with feedback formalized in the semantic event log as background knowledge. This research aims to bridge theoretical advances with practical applications, evaluated in collaboration with industry partners, to promote more effective process discovery, exploration and understanding.

Keywords
Semantic Process Mining, Explanatory Interactive Machine Learning, Event Log Abstraction

1. Introduction
Classical process mining methods typically treat information within event logs as abstract tokens, lacking a deeper exploitation of the underlying semantics governing process behavior. This leads to discovered process models potentially being semantically unsound, for example allowing a Send Order before a Create Order event in a purchase-to-pay (P2P) process due to artifacts in the data.
Explicit incorporation of semantic knowledge about cause-effect relations between events can identify and mitigate such errors. Another challenge arises when unifying possibly heterogeneous data from various sources. Different sites of one company may use varied namespaces or record data at different granularity levels. Integrating this information requires a semantic understanding of the underlying entities and their hierarchical relationships.

These problems exist despite the fact that semantic information is usually available in some form, either encoded in the ERP system supporting process execution, made explicit in technical documents [1], or implicit as expert knowledge of the human actors executing the process. This highlights the need to move beyond classical process mining and towards semantic process mining, which has recently been recognized by the process mining community [2, 1, 3, 4]. These considerations motivate our research questions:

RQ1: Can we extract and encode existing semantic knowledge in a semantic event log?
RQ2: Can we further enrich the semantic event log with additional information to provide different perspectives on the process behavior beyond typical event sequences?
RQ3: Can we use the information encoded in the semantic event log to enable new types of process analysis and exploration beyond the traditional workflow perspective?
RQ4: Can we develop declarative process discovery methods on the foundation of the semantic event log to discover more accurate process models?

ICPM Doctoral Consortium and Demo Track 2023
christian.dormagen@uni-bamberg.de (C. Dormagen)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
We outline our approach to extend current event log abstraction methods to explicitly capture system behavior and enrich the event log with contextual background knowledge, transforming it into a semantic event log. We then discuss the semantic foundations for formalizing and encoding this knowledge. Finally, we discuss process discovery methods based on the semantic event log and propose a practical evaluation concept for our approach.

2. Methodology

2.1. Event log abstraction
Traditional event log abstraction methods transform event logs from a lower to a higher level of granularity [5]. This is necessary because the granularity of raw data from ERP systems often exceeds the level of detail required for analysis and contains artifacts from human error or interactions between different information systems. These methods typically aim to cluster multiple events representing an activity or sub-process within the target process. Note that this inherently assigns semantics to the data through a hierarchy of events. If we cannot infer this information from available context or expert knowledge, existing discovery methods can still uncover it.

We are particularly interested in event log abstraction methods that go beyond adjusting log granularity and instead transform the information to capture a novel view of process behavior. An example of this is the system-level event discovery method proposed by Bakullari et al. [6]. This method abstracts a classical event log into a log of system-level events that highlight patterns of interest across multiple cases, such as high resource workload or atypical delays in specific steps. We want to explicitly incorporate this system-level behavior into our semantic event log. To this end, we plan to extend the method by introducing new types of system-level patterns for discovery, such as batching, the simultaneous execution of an event type across multiple cases.
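To illustrate what a batching pattern might look like as a system-level event, the following minimal Python sketch groups events into fixed-size time windows and reports every window in which the same event type is executed for several distinct cases. This is an illustration under our own simplifying assumptions, not the method of [6]: the function name `detect_batches`, the tuple layout `(case_id, activity, timestamp)`, the integer timestamps and the `min_cases` threshold are all hypothetical choices.

```python
from collections import defaultdict


def detect_batches(events, window_size, min_cases=2):
    """Return (window_start, activity, case ids) for each detected batch.

    Assumption: a batch is the same activity executed for at least
    `min_cases` distinct cases within one fixed-size time window.
    """
    cases_per_slot = defaultdict(set)
    for case_id, activity, timestamp in events:
        # Map the timestamp to the start of its fixed-size window.
        window_start = (timestamp // window_size) * window_size
        cases_per_slot[(window_start, activity)].add(case_id)
    return sorted(
        (window_start, activity, sorted(cases))
        for (window_start, activity), cases in cases_per_slot.items()
        if len(cases) >= min_cases
    )


# Toy event log (hypothetical data): (case_id, activity, timestamp in seconds).
log = [
    ("c1", "Create Order", 5),
    ("c2", "Create Order", 40),
    ("c3", "Create Order", 70),
    ("c1", "Send Order", 130),
    ("c2", "Send Order", 135),
    ("c3", "Send Order", 150),
]

batches = detect_batches(log, window_size=60, min_cases=2)
for window_start, activity, cases in batches:
    print(window_start, activity, cases)
```

Each reported tuple could be stored as one system-level event in the semantic event log. Richer window semantics would replace the fixed-size bucketing used here.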
Since batching behavior can lead to delays or spikes in workload, it is of interest when explaining delays in a process [7, 8]. We expect that explicitly discovering such patterns and enriching the semantic log with them will enable us to reason more easily about the dependencies between individual cases and the overall systemic behavior within a process.

2.2. Semantic grounding
To capture the semantics of an event log and of contextual background knowledge, we aim to formally define meaningful representations inspired by [3, 9, 1, 10, 2, 11] that encode causal, hierarchical and temporal relationships between events and attribute types typical of event logs. We are interested in both structural knowledge about the domain of the process and knowledge of control-flow dependencies.

Structural knowledge refers to the types of entities and relations which the process manipulates. It includes, for example, hierarchical sub/superclass relationships among events or resources. This knowledge facilitates seamless integration of diverse data sources and grants flexibility in adjusting the level of event or resource granularity during analysis.

Control-flow dependencies are, for example, self-imposed via business rules or exist as laws of nature [12]. These relations enforce a strict order of event execution, and any trace violating the imposed event order is inherently erroneous. Our earlier example, in which a Send Order event cannot occur before the corresponding Create Order event, is such a law-of-nature relation, while a business rule might dictate that "Any bank transfer above a certain value must first be checked by a resource with a sufficiently high level of authorization."

We can derive such knowledge from specific details about the organization in which the process is executed. In addition, we can refer to generic knowledge available about the type of process.
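The ordering rule from the Send Order example can be checked mechanically once it is stated explicitly. The Python sketch below flags traces in which Send Order occurs before any Create Order. It is only an illustration: `violates_precedence`, the toy traces and the `Approve Order` activity are hypothetical, and a semantic event log would express such rules in a logical formalism rather than in ad hoc code.

```python
def violates_precedence(trace, first, then):
    """Return True if `then` occurs before any occurrence of `first`."""
    seen_first = False
    for activity in trace:
        if activity == first:
            seen_first = True
        elif activity == then and not seen_first:
            # `then` was executed although `first` has not happened yet.
            return True
    return False


# Toy traces (hypothetical data), one list of activities per case.
traces = {
    "c1": ["Create Order", "Approve Order", "Send Order"],
    "c2": ["Send Order", "Create Order"],
    "c3": ["Create Order"],
}

violating = sorted(
    case_id
    for case_id, trace in traces.items()
    if violates_precedence(trace, "Create Order", "Send Order")
)
print(violating)
```

Traces reported in this way could either be treated as erroneous or be handed to a domain expert for review.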
Process types such as P2P and Order-to-Cash (O2C), for instance, are ubiquitous and come with a wealth of typical or expected process behavior to reference.

Various logical formalisms are commonly used to specify knowledge in this context. For expressing time-variant properties, LTLf is a frequently used formalism; it is widespread in declarative process mining as the underlying logic of DECLARE [13]. Relational knowledge, on the other hand, is often specified using Description Logics (DL) [14]. DLs are a family of knowledge representation languages widely used in the Semantic Web to formalize and specify ontologies, which we plan to use in order to encode our extracted knowledge. Ontologies are closely related to knowledge graphs through their role in providing a logical framework for representing structured knowledge in such graphs. We plan to realize our semantic event log in the form of a knowledge graph encoding, integrating contextual background knowledge from linked data sources such as formal ontologies from the Semantic Web. Furthermore, a knowledge graph representation may offer solutions to recognized issues with classical event logs, including convergence and divergence and the limitation in capturing multiple case concepts or process views within a single data structure [15].

Of particular interest to us is the causal and temporal specification of system-level event patterns. Our focus will be on investigating extended semantics of window operators. Bakullari et al. [6] utilize a simple window function for pattern discovery that divides a log into fixed-size windows. Our plan is to develop generalized window semantics to enable the representation and discovery of more complex behavior patterns. Consequently, we are particularly interested in languages that support expressive temporal window operators, such as LARS [16], which supports time-based, tuple-based, partition-based, and filter-based window operators.

2.3. Declarative interactive process mining
Based on the enriched semantic log, we plan to develop declarative process discovery and analysis methods, combining inductive learning (e.g. Inductive Logic Programming (ILP) [17]), logical inference methods and explanatory interactive machine learning [18].

We plan to use ILP for discriminative mining in the analysis of system-level behavior, made explicit as system-level events, by splitting the log into traces affected and unaffected by a system-level event of interest. This can allow us to learn a discriminative process model and gain further insight into the factors that lead to a particular system-level event and its consequences. For example, we can split the log into traces that are affected and unaffected by a high workload of a particular resource and use the learned discriminative model to guide the experts using the system in restructuring the process to prevent such events from occurring.

Our declarative process discovery algorithm will leverage the enhanced semantic representation and contextual background knowledge. This knowledge constrains the set of discoverable process models, aiming to improve process discovery. Furthermore, it facilitates the identification of new and implicitly hidden relationships shared between event patterns or process instances. Attempts at incorporating richer constraints over data, classes or relations of events into declarative process mining have already been made, most notably in the context of object-centric behavioral constraints [19, 20] for the DECLARE [13] language. The ideas presented there can serve as a starting point and inform our own declarative miner.

Furthermore, the semantic information is crucial as a foundation for the realization of Explainable and Interactive Machine Learning (XIML) within process mining. This enables a human-in-the-loop approach that incorporates explanations and considers the collaborative engagement of domain experts.
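The log split used for discriminative mining, described earlier in this section, could look roughly as follows in Python. We assume, purely for illustration, that a system-level event is a labeled time interval `(start, end, label)` and that a trace counts as affected if at least one of its events falls into such an interval. The function `split_by_system_event`, the labels and the toy data are hypothetical.

```python
def split_by_system_event(traces, system_events, label):
    """Partition case ids into (affected, unaffected) for one event label.

    Assumption: a trace is affected if any of its events has a timestamp
    inside the half-open interval [start, end) of a matching system event.
    """
    windows = [(start, end) for start, end, name in system_events if name == label]
    affected, unaffected = [], []
    for case_id, events in traces.items():
        hit = any(
            start <= timestamp < end
            for _, timestamp in events
            for start, end in windows
        )
        (affected if hit else unaffected).append(case_id)
    return sorted(affected), sorted(unaffected)


# Toy data (hypothetical): each trace is a list of (activity, timestamp).
traces = {
    "c1": [("Create Order", 5), ("Send Order", 130)],
    "c2": [("Create Order", 40), ("Send Order", 200)],
    "c3": [("Create Order", 70), ("Send Order", 90)],
}
system_events = [(120, 180, "high_workload"), (0, 60, "batching")]

affected, unaffected = split_by_system_event(traces, system_events, "high_workload")
print(affected, unaffected)
```

The two groups could then serve as positive and negative examples for a discriminative learner such as ILP.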
While initial approaches to implementing XIML methods in process mining exist, they either lack the ability to make additional knowledge explicit for later process discovery [21] or are post-hoc methods that modify an existing process model without integration into the discovery process itself [22].

Our plan is to implement an XIML method inspired by the CAIPI algorithm [18, 23], enabling expert users to provide feedback that is then fed back into the semantic log. This approach facilitates incremental correction and refinement of process models during process discovery. Consequently, the contextual background is progressively extended by domain experts, enhancing the discovery, analysis, and exploration of process models.

2.4. Evaluation
We will evaluate our method on synthetic logs for correctness of implementation. For comparison with established methods we will also apply it to existing event logs, including the BPIC19 log [24], as it is a P2P log with substantial contextual knowledge. Working with an industry partner, we have access to real-world P2P and O2C event logs from production ERPs. In addition, a case study involving industry experts will provide feedback on our XIML-based declarative process discovery method, validating the correctness and practicality of our semantically enriched event log and discovery approach.

Acknowledgments
This project is funded by the Bavarian research program (BayVFP) under KIGA (DIK0313).

References
[1] O. Nykänen, A. Rivero-Rodriguez, P. Pileggi, P. A. Ranta, M. Kailanto, J. Koro, Associating event logs with ontologies for semantic process mining and analysis, in: Proceedings of the 19th International Academic Mindtrek Conference, ACM, 2015. doi:10.1145/2818187.2818273.
[2] A. K. A. de Medeiros, W. van der Aalst, C. Pedrinaci, Semantic process mining tools: core building blocks, in: 16th European Conference on Information Systems, 2008. URL: https://oro.open.ac.uk/23397/.
[3] S. Esser, D. Fahland, Multi-dimensional event data in graph databases, Journal on Data Semantics 10 (2021).
[4] A. K. A. de Medeiros, C. Pedrinaci, W. M. P. van der Aalst, J. Domingue, M. Song, A. Rozinat, B. Norton, L. Cabral, An outlook on semantic business process mining and monitoring, in: On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops, Springer Berlin Heidelberg, 2007, pp. 1244–1255.
[5] S. J. van Zelst, F. Mannhardt, M. de Leoni, A. Koschmider, Event abstraction in process mining: literature review and taxonomy, Granular Computing 6 (2020) 719–736. doi:10.1007/s41066-020-00226-2.
[6] B. Bakullari, W. M. P. van der Aalst, High-level event mining: A framework, 2022.
[7] P. Waibel, C. Novak, S. Bala, K. Revoredo, J. Mendling, Analysis of business process batching using causal event models, in: Lecture Notes in Business Information Processing, Springer International Publishing, 2021.
[8] E. L. Klijn, D. Fahland, Performance mining for batch processing using the performance spectrum, in: Business Process Management Workshops, Springer International Publishing, 2019.
[9] K. Okoye, A.-R. Tawil, U. Naeem, E. Lamine, Semantic Process Mining Towards the Discovery And Enhancement Of Learning Models Analysis, 2016, pp. 121–164. doi:10.13140/RG.2.1.4793.4325.
[10] A. Rebmann, H. van der Aa, Extracting semantic process information from the natural language in event logs, in: Advanced Information Systems Engineering, Springer International Publishing, 2021.
[11] O. Kingsley, A.-R. H. Tawil, U. Naeem, S. Islam, E. Lamine, Using semantic-based approach to manage perspectives of process mining: Application on improving learning process domain data, in: 2016 IEEE International Conference on Big Data (Big Data), 2016, pp. 3529–3538. doi:10.1109/BigData.2016.7841016.
[12] G. Adamo, C. D. Francescomarino, C. Ghidini, F. M. Maggi, Beyond arrows in process models: A user study on activity dependences and their rationales, Information Systems 100 (2021).
[13] M. Pesic, H. Schonenberg, W. M. van der Aalst, DECLARE: Full support for loosely-structured processes, in: 11th IEEE International Enterprise Distributed Object Computing Conference (EDOC 2007), IEEE, 2007.
[14] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, P. F. Patel-Schneider (Eds.), The Description Logic Handbook, 2 ed., Cambridge University Press, Cambridge, UK, 2007.
[15] W. M. P. van der Aalst, Object-centric process mining: Dealing with divergence and convergence in event data, in: Software Engineering and Formal Methods, Springer International Publishing, 2019.
[16] H. Beck, M. Dao-Tran, T. Eiter, LARS: A logic-based framework for analytic reasoning over streams, Artificial Intelligence 261 (2018) 16–70. doi:10.1016/j.artint.2018.04.003.
[17] S. Muggleton, Inductive logic programming, New Generation Computing 8 (1991).
[18] S. Teso, K. Kersting, Explanatory Interactive Machine Learning, in: V. Conitzer, G. K. Hadfield, S. Vallor (Eds.), Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, AIES 2019, Honolulu, HI, USA, January 27-28, 2019, ACM, 2019, pp. 239–245. doi:10.1145/3306618.3314293.
[19] G. Li, R. M. de Carvalho, W. M. P. van der Aalst, Automatic discovery of object-centric behavioral constraint models, in: Business Information Systems, Springer International Publishing, 2017.
[20] A. Artale, A. Kovtunova, M. Montali, W. M. P. van der Aalst, Modeling and reasoning over declarative data-aware processes with object-centric behavioral constraints, in: Lecture Notes in Computer Science, Springer International Publishing, 2019, pp. 139–156. doi:10.1007/978-3-030-26619-6_11.
[21] P. M. Dixit, H. M. W. Verbeek, J. C. A. M. Buijs, W. M. P. van der Aalst, Interactive data-driven process model construction, in: Conceptual Modeling, Springer International Publishing, 2018.
[22] P. M. Dixit, J. C. A. M. Buijs, W. M. P. van der Aalst, B. F. A. Hompes, J. Buurman, Using domain knowledge to enhance process mining results, in: Lecture Notes in Business Information Processing, Springer International Publishing, 2017, pp. 76–104. doi:10.1007/978-3-319-53435-0_4.
[23] E. Slany, Y. Ott, S. Scheele, J. Paulus, U. Schmid, CAIPI in practice: Towards explainable interactive medical image classification, 2022. doi:10.48550/ARXIV.2204.02661.
[24] B. van Dongen, BPI Challenge 2019, 2019. doi:10.4121/UUID:D06AFF4B-79F0-45E6-8EC8-E19730C248F1.