Multi-perspective path semantics in process mining based on causal process knowledge Lukas Pfahlsberger1,˚ , Christoffer Rubensson1 , Steven Knoblich1 , Maxim Vidgof2 and Jan Mendling1 1 Humboldt-Universität zu Berlin (HU Berlin), Rudower Chaussee 25, 12489 Berlin-Adlershof, Germany 2 Wirtschaftsuniversität Wien (WU), Welthandelsplatz 1, 1020 Wien, Austria Abstract Process mining allows process analysts to investigate business processes with the help of algorithms and event log data. To better identify and understand inefficiencies in discovered process models, various visualization techniques have been proposed to enhance these models with further information, such as displaying the duration of execution time between activities using sequential color schemes or integrating statistical metrics into the model through textual annotations. However, it remains a challenge for analysts to identify interesting behavioral patterns in directly follows graphs. Consequently, this may lead process analysts to draw incorrect conclusions or be unable to identify the root causes for answering their analytical questions. This paper proposes a novel set of path semantics based on causal knowledge. We further examine how several combined path semantics, referred to as pattern types, may provide analysts with additional information on the underlying behavior. By examining an order-to-cash process in the real world, we demonstrate the usefulness and additional benefits of these path semantics for process analysts. Keywords Process mining, visual analytics, path semantics, causal process knowledge, directly follows 1. Introduction Over the last two decades, a plethora of different techniques, methods, and approaches for visually representing business processes discovered from event-log data has been developed to support analysts in making better and faster decisions [1]. However, an essential part of these proposed process representations hardly differs in the semantic meaning of the visual components. In particular, the semantics of the paths are almost without exception limited to a single meaning, namely, a directly follows relationship. Companion Proceedings of the 16th IFIP WG 8.1 Working Conference on the Practice of Enterprise Modeling and the 13th Enterprise Design and Engineering Working Conference, November 28 – December 1, 2023, Vienna, Austria ˚ Corresponding author. $ lukas.pfahlsberger@hu-berlin.de (L. Pfahlsberger); christoffer.rubensson@hu-berlin.de (C. Rubensson); steven.knoblich@hu-berlin.de (S. Knoblich); maxim.vidgof@wu.ac.at (M. Vidgof); jan.mendling@hu-berlin.de (J. Mendling) € https://www.informatik.hu-berlin.de/en/forschung-en/gebiete/promis-en/team/pfahlsbl (L. Pfahlsberger); http://hu.berlin/rubensson (C. Rubensson); https://nm.wu.ac.at/nm/vidgof (M. Vidgof); http://hu.berlin/mendling (J. Mendling)  0000-0002-1367-9441 (L. Pfahlsberger); 0009-0004-4940-5866 (C. Rubensson); 0009-0002-8509-7042 (S. Knoblich); 0000-0003-2394-2247 (M. Vidgof); 0000-0002-7260-524X (J. Mendling) © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings In this paper, we propose our vision for multi-perspective path semantics. To this end, we integrate causal process knowledge [2] that allows differentiating between eight different path semantics. Furthermore, we abstract the individual path semantics into distinguishable pattern types that are categorized into allowed and prohibited behavior. This allows linking pattern types directly to generic use cases for process analysis. Thereby, analysts can identify and discover interesting behavior that indicates problems in visual process representations more precisely. We contribute to the field of visual analytics for process mining by proposing multi- perspective path semantics based on causal process knowledge. Our approach improves the precision and speed of the process analysis due to a linkage between the meaning of the paths and suitable use cases. Previous studies have presented visualization frameworks for process mining [3] that investigate the effectiveness of different ways of representing process mining outcomes. However, these works often neglect the aspect of path semantics and instead focus on alternative visual forms of aggregating, clustering, or sorting the process data. We further contribute to the integration of prior domain-specific knowledge into existing process discovery techniques [4, 5, 6, 7]. We thereby show that prior knowledge can not only help to improve the structural aspects of the model (e.g., by reducing complexity) but also the visual representation of its elements, such as paths. The remainder of the paper is structured as follows. Section 2 presents the theoretical background focusing on visual analytics, graph types in process mining, with a focus on arc semantics, and causal process knowledge. Section 3 introduces our vision for multi-perspective path semantics. Section 4 derives pattern types based on distinct combinations of path semantics on a process instance level. Section 5 links the pattern types to use cases, thus evaluating our vision for multi-perspective path semantics based on a real-world order-to-cash process. Section 6 points out implications for future research and limitations. Finally, Section 7 concludes this paper. 2. Background In this section, we discuss three distinct areas of research. First, we briefly introduce the topic of visualization and how it is utilized for analytical purposes. Second, we introduce previous works on the semantics of arcs in process models. Finally, we define causal process knowledge and explore its applications in process mining. 2.1. Visualization & visual analytics Visualization can be seen as the process of converting data into graphical representations [8, p. 3], enabling the derivation of insights otherwise difficult to discern from raw data sets. Visual analytics is a multidisciplinary field that employs visualization techniques for graphically representing knowledge and enhancing analytical reasoning [9, p. 4]. Many visualization techniques are available to accomplish this objective (cf., [10, 11]). Visual analytics relies on active interaction between users and data [9, p. 4], making human judgment a crucial component. Munzner’s nested model [12], a design and validation framework for visualizations, emphasizes a human-centered approach with a four-layer design process. The framework takes the domain problem and its intended user as a starting point. This is followed by translating the problem into a computer science context, designing the visualization, and finally creating its rendering mechanism. McKenna et al. [13] extend the nested model [12] with the overlapping activities understand, ideate, make, and deploy that further emphasize the user-centric motivation and their design outcomes for each step in the process. In another framework, Moere and Purchase [14] define roles with domain-specific needs for visualizations, which could facilitate the quality of the design solutions when met. These roles comprise the visualization studies (researchers), with the aim for utility and soundness; the visualization prac- tice (businesses), with the need for market-oriented solutions; and the visualization exploration (artists), who strive to create visually appealing yet workable designs [14, pp. 366-368]. Lastly, Moody [15] developed the Physics of Notations, a theory comprising a set of design princi- ples to support the creation and validation of cognitively effective visualizations in software engineering. An example principle is the Principle of Semiotic Clarity that ascribes designers to ensure a one-to-one relationship between graphical symbols and the semantic construct they represent [15, pp. 762-763]. Failure to adhere to this principle may result in ineffective visualizations, such as a symbol deficit whenever a semantic is not represented by any symbol or a symbol redundancy whenever multiple symbols refer to the same semantic [15, pp. 762-763]. The process of visualizing the data involves a range of dimensions to consider (cf., [15]). While we only provide a few examples, one aspect involves using geometrical objects to depict data, such as glyphs in different shapes [8]. A well-known depiction of data in statistics is the boxplot (e.g., [16, pp. 45-46]), in which numerical data is grouped into a single box with extending lines (whiskers) to indicate, i.a., data variability. In process science, a standard for visualizing processes is the Business Process Model and Notation (BPMN) standard1 , which, i.a., uses boxes, rhombuses, and arrows to, respectively, depict activities, gateways (decision points), and sequence flow between activities. Another aspect to consider is color, or color mapping, which can be used to map data to certain attributes [8, p.5]. Despite appearing trivial, color in visualization is a highly complex topic as it is closely linked to human perception through various channels, such as color properties (e.g., hue and saturation), color combinations, and geometrical patterns (e.g., [17]). Brewer [18] makes an important contribution that provides insights and guidelines on color mapping based on data types and human perception, further demonstrating the topic’s complexity. 2.2. Arc semantics in process models Some of the existing process modeling languages already provide arcs with different semantics. However, the purpose and concepts behind such differentiation is not the same across languages. On the other hand, some languages do not distinguish between different arc semantics. In directly-follows graphs [1], for instance, there is only a single type of arc connecting two activities. Its arc semantics simply considers observed events that follow the path between the two activities. These arcs can store additional information (e.g., the number of instances following the arc) or durations of transitions. In BPMN, there are two kinds of arcs, namely, control flow and message flow. The semantics of the former is that a process instance transitions 1 https://www.bpmn.org (Last accessed: 2023-12-10) from one activity to another. The semantics of the message flow is that a message from an external pool (e.g., sent from an external party such as an organization) is received by an activity or event in another pool or vice versa [19]. To the best of our knowledge, Petri nets have the largest variety of arc semantics. While the primary purpose of arcs in Petri nets is to represent the movement of the tokens between places and transitions, the exact semantics may differ. First, there are differences in terms of whether and how many tokens are moved. Traditionally, one token moves along the arc when it is consumed or produced by a transition. However, there are also read arcs [20] requiring tokens to be present in a place for a connected transition to fire but not consuming or producing the tokens. There are inhibitor arcs [21] that, on the contrary, prevent an otherwise enabled transition to fire if there are tokens in a specified place. Weighted arcs [22] specify exactly how many tokens are consumed or produced by a transaction, allowing more than one token to be moved along an arc. Finally, reset arcs [23] remove all tokens from respective places when a transition fires. Furthermore, the type of tokens being moved can also differ. Colored Petri nets [24] allow distinguishing between different types of objects or object instances by assigning colors to tokens. For the places, one can then define different capacities for tokens of different colors. The firing semantics of the transitions can also depend on the color. Finally, the arcs ultimately specify the colors of tokens to be consumed or produced. 2.3. Causal process knowledge in process mining As a forefather of the philosophy of causation, Hume [25] noted that all knowledge comes from experience and that it is based on associations between perceived events. Waldmann [26] adopted this idea in his work on knowledge-based causal induction, indicating causal directionality as the fundamental factor for determining how statistical correlations are understood. The term causation can be further differentiated by Pearl’s [27] three-level causal hierarchy highlighting the role of causal knowledge in helping to associate, intervene, or counterargue. Regarding causal knowledge concerning business processes, experts with years of acquired domain-specific experience represent a valuable resource for process improvement. Experience provides process experts with a precise understanding of causal relationships between individual activities of business processes. For instance, a process owner of an order-to-cash process might readily understand that a customer order eventually leads to an invoice being created. Intuitively, it is clear to the process owner that, oppositely, an invoice followed by the customer order would contradict the causal logic of the process [2]. In the process mining research field, most discovery algorithms do not leverage causal process knowledge [2, 4]. Instead, they consider data as the “single source of truths” to behaviors while overlooking domain-specific reasons. In an experimental setting, Rembert et al. [4] develop and test a process discovery algorithm that integrates prior knowledge. The results indicate that prior knowledge increases the robustness against noise, subsequently reducing the likelihood of measurement and ordering errors, particularly for processes with a higher degree of infrequent behavior. Similarly, Diamantini et al. [5] exploit knowledge in complex domains with highly variable processes as a means to repair event logs and produce more realistic models. Waibel et al. [2] use a causal template that helps process analysts integrate a causal order into discovering Conformance behavior type Path the processAllowed combinations structures withProhibited a focus on control-flow. Compared to approaches that do not integrate Skip-reverse/ Backjump domain knowledge, the approach by [2] generates much simpler models with higher 2 2 conformance to the defined causality by reducing the number of self-loops and spurious arcs. 1,3 1,3 Conformance Lu et al. [6] propose a semi-automated approach to detecting log patterns in process discovery, Refinement Rework using human reasoning to evaluate, modify, and extend pattern types. 2 2 1 1 3 3 Adjustment Correction 3. Multi-perspective path semantics (Variation) 2 2 Skip-reverse/ In this section, we present our vision of multi-perspective path semantics. To this end, we Backjump Hypothetical/ develop eight path semantics through visual characteristics related to shape and color. The 3 1 Conformance 1 3 semantics differ Reorder based on whether Disarray the path indicates desired or undesired behavior, observed or unobserved Shortcut behavior, and its flow direction. 1 1 Hypothetical/ Omitted 3.1. Conformance Simplification pathNegligence (Sloppiness) The semantics of the conformance path relates to a combination of desired and Conformance behavior type Path combinations Allowed observed behavior. On the one hand, desired behavior connects to the causal Prohibited Skip-reverse/ 2 process knowledge, which the analyst predefines as a working hypothesis. By that, 2 Backjump Conformance 1,3 analysts presuppose, based on their own expertise, that within the very specific 1,3 Refinement context, the process is supposed to flow through this particular path. It is implicitly assumed Rework 2 that process behavior not flowing through this path is understood as a deviation or at least as 2 1 3 an unexpected behavior. On the other hand, the conformance path additionally incorporates the 1 3 Adjustment observed behavior Correction that is recorded in the data source. Hence, the semantics of the conformance (Variation) path can be considered both an actually observed behavior in the source data and a desired Skip-reverse/ 2 behavior, as intended by the analyst. 2 Backjump Hypothetical/ The visual representation takes the form of an arrow with a solid gray line connected to a 3 filled arrowhead. The structure of this path runs angularly and has no curvatures. The intention 1 Conformance 1 3 Reorder behind the visualization is to convey the impression of desired behavior that is not explicitly Disarray Shortcut 1 prominent. 1 Hypothetical/ Omitted Simplification 3.2. Hypothetical Negligence path (Sloppiness) The semantics of the hypothetical path relates to an unobserved yet desired behavior. This means that the causal process knowledge allows the process to flow through this particular path with no record in the data confirming this behavior. If a hypothetical path occurs, this always implies that there is at least one other path option over which the process can flow as well. For instance, this can be attributable to the causal process knowledge allowing for the parallel execution of two activities with a time offset or a passage in the process that allows for an arbitrary choice of follow-up options. The visual representation takes the form of an arrow with a gray dashed line connected to a filled arrowhead with an identical angular course. The visualization is supposed to convey the indefinite characteristics of a non-conforming behavior. 2 2 Skip-reverse/ Backjump Hypothetical/ 3 1 Conformance Conformance behavior type 3 1 Allowed Reorder Disarray Prohibited 2 2 Shortcut 1 1 1,3 Hypothetical/ 1,3 Omitted Refinement Rework 3.3. Omitted path Simplification Negligence 2 2 (Sloppiness) 1 3 1 3 The semantics of the omitted path also relates to unobserved yet desired behavior. Adjustment Correction As opposed to the hypothetical path, here, the intended sequence flow is considered (Variation) mandatory, but with no data recorded that confirms its execution. If an omitted 2 2 path occurs, it can be concluded that certain process activities have been skipped unintentionally or that the order of activities has been reversed. 3 The visual representation takes the form of an arrow with a gray dashed line connected to an 1 1 3 Reorder unfilled (empty) arrowhead. It also runs in an angular course. The unfilled arrowhead should Disarray r type 1 give the impression that the path is mandatory, thus its exaggerated appearance. 1 Prohibited 2 Simplification 3.4. Allowed shortcut path Negligence 1,3 (Sloppiness) Rework The semantics of the allowed shortcut path relates to desired and observed behavior. 2 Therefore, the causal process knowledge indicates that the process is allowed to 1 3 skip one or more process activities without following up on them at later points, and the data recorded indicates that this, in fact, happened. If this path appears in Correction the model, a hypothetical path can be linked to it because a circuitous route via the activities 2 skipped, in reality, would also have been possible. 3 The visual representation takes the form of an arrow with a solid gray line connected to a 1 filled arrowhead. The structural appearance, again, follows an angular course with the intention Disarray rmance behavior type to convey expected and desired behavior. However, this path can often be recognized as flowing 1 Prohibited in parallel to the direction of other conformance paths. 2 Negligence 1,3 3.5. Prohibited shortcut path (Sloppiness) Rework 2 The semantics of the prohibited shortcut path relates to undesired yet observed 1 behavior. In this case, the causal process knowledge explicitly does not allow the process to jump over a specific activity but is recorded in the data. Here, at least one 3 Correction omitted path can be linked to the prohibited shortcut path because other activities 2 not intended to be executed were left out and not followed up on. The visual representation takes the form of an arrow with a solid red line connected to a filled arrowhead. In this case, as opposed to the allowed shortcut, it has a curvilinear course. 3 1 Disarray Here, contrast is to be conveyed in relation to the allowed shortcut path, which indicates an 1 undesirable behavior through the round and less structured-appearing course. n Negligence 3.6. Allowed backjump path (Sloppiness) The semantics of the allowed backjump path relates to observed and desired behavior. The causal process knowledge indicates that the process can jump back to already executed process activities. Once an allowed backjump occurs, there are multiple follow-up options leading to an increased level of complexity. The visual representation takes the form of an arrow with a solid gray line connected to a filled arrowhead. Thereby, the path runs in an angular course in the opposite direction of other conformance paths. Even though backjumps in processes may be negatively conjugated, this path semantics emphasizes the acceptance to repeat an activity already executed before. 3.7. Prohibited backjump path The semantics of the prohibited backjump path relates to observed yet undesired behavior. In this case, the causal process knowledge restricts the process from not returning to a previously performed activity despite the recorded data indicating otherwise. With these paths, a large variety of follow-up options becomes possible, which usually leads to higher degrees of complexity. The visual representation takes the form of an arrow with a solid red line connected to a filled arrowhead with a curvilinear course. In most cases, this path runs in the opposite direction of other conformance paths, indicating an undesired behavior. 3.8. Skip-reverse path The semantics of the skip-reverse path relates to observed yet undesired behavior, which inevitably occurs in combination. This means that the recorded behavior indicates that the process activities were executed in an order that the causal process knowledge forbids. This triggers at least one path that skips a considered follow-up activity and at least one reverse path that continues where another activity was left out. However, even though all intended process activities were executed, the order of activity execution was incorrect. If this path is shown in the model, an omitted path can always be linked to a pair of reverse and skipped paths. The visual representation takes the form of an arrow with a solid red line connected to a filled arrowhead with a curvilinear course. Together with the inseparably connected path to the follow-up activity and the omitted path between the activities in the original order, the path semantics intends to create a complex-appearing and slightly chaotic impression that conveys that something is not going as desired. 4. Pattern types This section briefly demonstrates how the context, meaning various arrangements of different path semantics, may emphasize an underlying behavior that is otherwise hard to distinguish when examining one single path semantics in isolation. We refer to these path semantics ensembles as patterns or pattern types on a process instance level. An example of this is the allowed backjump path, which could denote a wrong-order execution of events if it is proceeded by a hypothetical path rather than when proceeded by a conformance path. We identify eight possible patterns, all listed in Table 1. The patterns are divided into two categories, allowed and prohibited, whether they include allowed or prohibited path semantics. Each pattern is given a name for easy recognition of an underlying behavior. We also exemplify the patterns from the perspective of an order-to-cash process. To further enhance understandability, we define a pattern as a sequence of directly follows relations x𝑎1 Ñ 𝑎2 ... Ñ 𝑎𝑛 y that is part of a process instance, where 𝑎1 ...𝑎𝑛 are activities of the Conformance behavior type Path combinations Allowed Prohibited 2 2 Skip-reverse/ Backjump 1,3 1,3 2 2 Conformance behavior type Conformance Path Conformance behavior type 1,3 1,3 combinations Path Allowed Refinement Prohibited Rework combinations Allowed Prohibited 2 2 2 2 Skip-reverse/ 2 Conformance behavior 1,3type Backjump Skip-reverse/ 2 2 2 Path Conformance Backjump 1,3 1 1 2 2 1,3 3 3 1 1 combinations Table 1 Allowed ConformanceProhibited Refinement Rework 1,3 1,3 2 1,3 3 2 3 Conformance behavior type 1,3 1,3 Path An2 overview of some possible 2 pattern types that Refinement result Reworkfrom a unique Adjustment set Correctionof path semantics. The numbers Skip-reverse/ combinations 2 (Variation) Allowed 2 Backjump indicate 1,3 the execution Prohibited order, 1,3 with 2 "1" denoting the 2 first 2 execution. 22 2 Conformance 2 2 1 1 2 2 2 Skip-reverse/ Path combinations3 1Skip-reverse/ 3 1,3 2 1 1 1,3 1 Backjump 1,3 Refinement 1,3 Rework 3Backjump 2 3 3 2 1 3 1 2 2 3 Conformance nformance behavior type Adjustment Hypothetical/ Correction 3 3 Conformance behavior type Conformance1,3 1 Refinement 2 Backjump Rework Backjump/Shortcut (Variation)2 Adjustment 3 Correction 1,3 Backjump1 Shortcut 1 3 1 3 Allowed Prohibited (Variation) Prohibited 2 1 Conformance 1 2 Conformance Reorder 2 Hypothetical/Omitted 2 2 Disarray Hypothetical/Omitted 2 2 2 3 Skip-reverse/ 3 1 1 2 2 2 Backjump Skip-reverse/ 3 2 3 2 2 2 1 Backjump Hypothetical/ 1 Shortcut 1 1 2 2 1,3 3 Adjustment 2 23 Hypothetical/ 1 Correction 1,3 1,3 2 2 1 1 3 Hypothetical/ 1 1 Conformance 3 ior type (Variation) Conformance 1,3 3 1,3 1Omitted 1,3 3 1 3 1 3 1 3 1,3 ent Refinement ReworkRework AdjustmentAllowed Correction 1 1 1 3 Reorder 3 Disarray 3 Prohibited (Variation) 2 Refinement 2 Reorder Adjustment Disarray Simplification Reorder Negligence Simplification 2 Skip-reverse/ 2 2 2 (Sloppiness) Backjump Shortcut 1 2 2 12 2 2 2 1 Hypothetical/ Skip-reverse/ 2 1 1 Shortcut 2 Hypothetical/ 1 1 1 1 3 1,3 2 3 2 1 1 3 BackjumpConformance 3 1 Omitted Hypothetical/1 1 1 1 3 1 3 1 3 Hypothetical/ 3 1,3 Omitted 3 1,3 3 1 3 Rework 3 Adjustment Conformance ent 1 Correction Prohibited Correction Reorder Disarray 1 Simplification Negligence (Variation) n) 3 Rework Simplification Correction (Sloppiness)Negligence Disarray Negligence 2 Reorder Disarray (Sloppiness) 2 2 2 Shortcut 1 2 1 1 2 1 Hypothetical/ 3 1 same sequence with different activity types so that 𝑎 ‰ ... ‰ 𝑎𝑛 holds. The timestamps of two 1 1 ShortcutOmitted 1 3 1 Hypothetical/ 3 3 1 activities in a directly follows relation 𝑎 Ñ 𝑎 must be different so that 𝑡p𝑎1 q ă 𝑡p𝑎2 q holds. If 1 Correction Omitted 3 1 1 2 3 Simplification 1 Negligence Reorder an activity (type) Disarray is executed twice in the instance, the timestamps must be different so that 𝑎1(Sloppiness) Negligence Simplification Disarray 2 holds. The arrow symbol Ñ denotes a directly follows relations between two 𝑡p𝑎1 q ă 𝑡p𝑎11 q (Sloppiness) activities 𝑎1 and 𝑎2 in a pattern, which is further assigned a letter to indicate its path semantics 1 1 1 as either a backjump path Ñ𝐵 , a shortcut path Ñ𝑆 , a conformance path Ñ𝐶 , a hypothetical 3 1 Disarray Simplification path Ñ𝐻 , or an omitted path Ñ𝑂 . A crossed-out arrow Û emphasizes a prohibited path. Note Negligence ation that we do not differentiate between designed and executed sequences since the path semantics (Sloppiness) Negligence (Sloppiness) 1 imply it. Negligence 4.1. Allowed pattern types (Sloppiness) The allowed pattern types comprise path combinations, which are allowed by design. These include: refinement, adjustment, reorder, and simplification. Refinement is characterized by a backjump path that follows and is followed by two identical conformance paths leading to the emerging sequence x𝑎1 Ñ𝐶 𝑎2 Ñ𝐵 𝑎1 Ñ𝐶 𝑎2 y. Since the same directly follows relations 𝑎1 Ñ𝐶 𝑎2 is executed twice, this pattern indicates a revising behavior (e.g., when a customer proofreads an order detail before purchase). Adjustment is characterized by a conformance path followed by a backjump path, which sequentially is followed by a shortcut path, leading to the emerging sequence: x𝑎1 Ñ𝐶 𝑎2 Ñ𝐵 𝑎1 Ñ𝑆 𝑎3 y. Here, only the first activity 𝑎1 is executed twice (cf., refinement), thus indicating a re-routing of an intended activity sequence (e.g., when a customer cancels a requested credit card payment and, instead, decides to pay in installments). Reorder is characterized by a backjump path that follows a hypothetical path, which sequen- tially is followed by a shortcut path leading to the emerging sequence: x𝑎1 Ñ𝐻 𝑎2 Ñ𝐵 𝑎1 Ñ𝑆 𝑎3 y. This means that the intended activity sequence is executed in reverse (e.g., when an order foresees a purchase-for-delivery procedure when, in fact, customers purchase items (through bill) after delivery). Simplification is characterized by a shortcut path that skips a sequence of hypothetical paths, such that the following pattern occurs: x𝑎1 pÑ𝐻 ... Ñ𝐻 𝑎𝑛 , Ñ𝑆 𝑎𝑛`1 qy. This pattern suggests redundancies in the process design (e.g., when customers use an autofill function to fill out their demographic details before purchase). 4.2. Prohibited pattern types The prohibited pattern types are analogous to the allowed patterns but contain at least one prohibited path, thus indicating violations of an intended process design. These include: rework, correction, disarray, and negligence. Rework is the analog to refinement yet with prohibited paths, such that the following pattern emerges: x𝑎1 Ñ𝐶 𝑎2 Û𝐵 𝑎1 Ñ𝐶 𝑎2 y. Here, a refining behavior is instead a source of frustration or an unnecessary emendation (e.g., when a customer has to re-purchase an order after discovering the purchase of the wrong items). Correction is the analog to adjustment yet with prohibited paths, such that the following pattern emerges: x𝑎1 Ñ𝐶 𝑎2 Û𝐵 𝑎1 Û𝑆 𝑎3 y. In comparison, whereas an adjustment may improve a process towards a better outcome, in correction, an avoidable mistake is adjusted to prevent harm (e.g., when a customer must be contacted after they were able to purchase an order with a suspended credit card successfully). Disarray is the analog to reorder yet with prohibited paths, such that the following pattern emerges: x𝑎1 Ñ𝑂 𝑎2 Û𝐵 𝑎1 Û𝑆 𝑎3 y. Here, a rearrangement of (strict) protocol procedure is executed (e.g., when an order is marked as successful before verification). Negligence is the analog to simplification yet with prohibited paths, such that the following pattern emerges: x𝑎1 pÑ𝑂 ... Ñ𝑂 𝑎𝑛 , Û𝑆 𝑎𝑛`1 qy. Here, the complete skipping of an intended activity sequence is considered wrong rather than as an improvement (e.g., when a customer is warranted a replacement item after a filed complaint without the warranty not being properly inspected by the company). 5. Use cases Allowed patterns reflect expected behavior and give insights into how well a process is adopted. Prohibited patterns are more complex. In this section, we focus on prohibited pattern types as we expect more business value from their analysis. Therefore, we articulate three assumptions and review the pattern according to four performance dimensions relevant to business processes. In the second part of this section, we examine what a technical solution can look like and explain the business impacts that can be derived from a specific process instance of an order-to-cash process. We then further apply heuristics to improve the process. 5.1. Assumptions Previous research addresses the speed of technical development and its adoption in business, which leads to the clear call to action of transferring new developments in real-life use cases [28]. Some concepts are highly adopted in business, such as the Balanced Scorecard with its four Table 2 Impact of prohibited pattern types on the performance perspectives time, cost, and quality Performance Perspective Time Cost Quality Rework ÓÓ ÓÓ Ò Pattern Correction Ó l Ò Disarray l l Ó Negligence Ò Ò ÓÓ Legend: Ò Ò high positive impact, Ò medium positive impact, l no impact, Ó medium negative impact, Ó Ó high negative impact perspectives financial, customer, learning, and growth as well as internal business process [29]2 . The latter, namely the internal business process perspective, can be measured by the four performance dimensions of the devil’s quadrangle, namely, (1) time, (2) cost, (3) quality, and (4) flexibility [19]. We apply this to emphasize that the improvement of one or multiple perspectives results in less performance of at least one other perspective [19]. For this paper, we review only the prohibited pattern types of Section 4 and the impact on the four performance dimensions with three assumptions, knowing that under realistic conditions, there are cases that will not fulfill them. We use the following assumption: 1. With increasing complexity due to path variants and activity numbers, the average process instance duration increases 2. Every activity has a cost, mainly labor costs, resulting in a negative financial impact per executed activity. 3. Every activity adds value and, therefore, enhances the quality of the process outcome, resulting in better quality the more (planned) activities are performed. Based on these assumptions, the impact on process performance is summarized in Table 2. As no impact on flexibility is identified, we do not address this perspective. • The rework pattern repeats two events and adds two connections, resulting in a high negative impact on cost. The quality is benefiting from the rework, as it repairs an error. • The correction pattern repeats one event and adds one additional connection, resulting in a medium negative impact on time and costs. The quality is benefiting as an unexpected result is prohibited. • The disarray pattern impacts time and cost under respecting the assumption, but a medium negative impact on quality as the sequence of events is not followed. • The negligence pattern is skipping one event and having one connection less, resulting in a positive effect on costs and time. On the other hand, as an event is skipped, a high negative effect on quality is expected. 2 https://www.bain.com/insights/management-tools-and-trends-2023/ (Last accessed: 2023-12-10) 5.2. Example To illustrate the added value of multi-perspective path semantics for business process analysis, we showcase its application using a real-world example of an order-to-cash process observed at a German mid-sized company. To visualize this process, we use a tool of Noreja3 . This order-to- cash process starts with placing a customer order, which is followed by the preparation and shipping of digital or physical goods or services and ends with financial processing, including the posting of an invoice and receiving of cash. This first extract of a process instance in Figure 1 represents the pattern type disarray4 . After the event Create Delivery Note (left of Figure 1), the process continues with Receive Payment, skipping the actually desired follow-up activity Post Invoice. This leads to an undesired order of the events Post Invoice and Receive Payment that contradicts the causal process knowledge. Due to the particular semantics of the paths and their highlighting in red, process analysts can now directly identify this pattern in order to derive actions. The omitted paths indicate the desired process relations. In this case, the pattern indicates that the organization takes a financial risk, as matching the payment against the actual invoice is not secured. In this example, the payment terms for each customer order are highly different and optimized by the sales department in terms of discounts, overdue fines, etc. The lack of invoice-payment matching, therefore, causes significant problems. When receiving the payment before the invoice, the potential of overpayment or underpayment is given, resulting in lower customer satisfaction and more inaccurate financial planning. Over the last decades, several redesign methodologies have been developed to improve process performance, including redesign heuristics [19]. One of these heuristics is the case-base work that removes the processing of cases in batches or at specific points in time (e.g., every Monday). With handling individual cases, the time between two events can be shortened. Applying this heuristic on the event post invoice can significantly reduce the time from creating the delivery note to the invoice posting and reduce the risk of receiving payment with an invoice reference. Applying this heuristic will, under assumption 2, result in additional costs, as the invoice posting will happen more frequently. Another heuristic that can solve the downside of the case-based work is the activity automation. Automating the posting of the invoice and executing this event directly after the creation of the delivery note will reduce the time between these events and the costs for the posting. 6. Limitations and future research This paper presents multi-path semantics for process mining, pattern types, and possible applications. This research stream is novel and promising, yet covering it entirely in one study is infeasible. We acknowledge the limitations of our paper and use them to outline directions for future research. First, we must note the limitations of the scope of our paper. We explicitly exclude so-called change activities5 . In addition, as the primary goal of this paper is to highlight the necessity for 3 https://noreja.com (Last accessed: 2023-12-10) 4 The blue symbols of the events themselves are not part of our proposed concept and therefore not explained. 5 Change activities cannot be sorted into the causal logic of a process, as they may appear randomly due to unplanned occurrences. In an order-to-cash process, this takes the form of cancellations or price/quantity changes. Figure 1: Visual representation of the pattern type disarray in a single process instance of an order-to- cash process multi-path semantics rather than to provide an exhaustive list, we note that other semantics, including domain-specific ones, may be defined by future extensions. The same applies to the patterns presented in this paper. While we identified some critical patterns using the proposed multi-path semantics, additional patterns might also be observed, especially if new path semantics are added. Second, it must be noted that in this paper, we are only considering possible path semantics for single cases. However, new challenges will arise as we try to raise the level of abstraction (e.g., to a variant or even process level). Aggregating cases with paths of different semantics between the same activities or aggregating path patterns is a non-trivial task. For instance, if, in one case, activities 𝑎1 and 𝑎2 are connected via a conforming path. In another case, the same activities are only connected via an omitted path. Then, it remains unclear which semantics (and which visual representation) should be chosen when aggregating on a process level. Third, while we do provide some visual descriptions of the paths with different semantics, it must be noted that these descriptions should be treated as preliminary proposals for visualization rather than fixed recommendations or guidelines. We explicitly leave the specific visualization (e.g., a more differentiated coloring of paths) out of the scope of this paper. We also note that further visual additions can be made (e.g., additional icons near the arc ends) for improved visual differentiation and reduced cognitive load. Ultimately, we highlight that a user evaluation is required to validate the visualization approach. In future work, we plan to tackle the identified limitations. First, empirical research is needed to evaluate both the relevance of the proposed path semantics in practice as well as the suitability of various visualization approaches. Second, further path semantics and pattern types can be obtained using both empirical and explorative studies. Finally, the impact of the observed paths on flexibility – the fourth dimension of process performance – is yet to be studied. 7. Conclusion In this paper, we envisioned a foundation for multi-perspective path semantics in process mining. First, we presented eight distinct path semantics that can be derived by integrating causal process knowledge. Second, we defined pattern types as unique sets of path semantics on a process instance level to provide additional meaning about underlying behaviors. We demonstrated the benefits of our approach by exemplifying the semantics in a use-case scenario, examining various pattern types according to their business values using the devil’s quadrangle. In addition, we gave some example visualizations from an existing process mining system. Our work facilitates the interpretability and applicability of process mining outcomes by providing a more fine-grained view of path semantics. By linking causal process knowledge to visual elements, we further contribute by extending the graphical capabilities of process models. In combination, both aspects facilitate the fast and purposeful acquisition of process-related insights for analysts. In this way, our objective is to inspire future research to challenge the still prevailing representational bias [30] in process mining, which oftentimes distorts the true nature of the underlying process. Acknowledgments This work was supported by the Einstein Foundation Berlin [grant number EPP-2019-524, 2022] and Deutsche Forschungsgemeinschaft [grant number ME 3711/2-1]. References [1] W. M. P. van der Aalst, Process Mining - Data Science in Action, Second Edition, Springer, 2016. doi:10.1007/978-3-662-49851-4. [2] P. Waibel, L. Pfahlsberger, K. Revoredo, J. Mendling, Causal process mining from relational databases with domain knowledge, CoRR abs/2202.08314 (2022). URL: https://arxiv.org/ abs/2202.08314. [3] A. Yeshchenko, J. Mendling, A survey of approaches for event sequence analysis and visualization, Information Systems 120 (2024) 102283. doi:10.1016/j.is.2023.102283. [4] A. J. Rembert, A. Omokpo, P. Mazzoleni, R. Goodwin, Process discovery using prior knowledge, in: S. Basu, C. Pautasso, L. Zhang, X. Fu (Eds.), Service-Oriented Computing - 11th International Conference, ICSOC 2013, Berlin, Germany, December 2-5, 2013, Pro- ceedings, volume 8274 of Lecture Notes in Computer Science, Springer, 2013, pp. 328–342. doi:10.1007/978-3-642-45005-1\_23. [5] C. Diamantini, L. Genga, D. Potena, W. M. P. van der Aalst, Building instance graphs for highly variable processes, Expert Syst. Appl. 59 (2016) 101–118. [6] X. Lu, D. Fahland, R. Andrews, S. Suriadi, M. T. Wynn, A. H. M. ter Hofstede, W. M. P. van der Aalst, Semi-supervised log pattern detection and exploration using event concurrence and contextual information, in: OTM Conferences (1), volume 10573 of Lecture Notes in Computer Science, Springer, 2017, pp. 154–174. [7] W. Song, H. Jacobsen, C. Ye, X. Ma, Process discovery from dependence-complete event logs, IEEE Trans. Serv. Comput. 9 (2016) 714–727. doi:10.1109/TSC.2015.2426181. [8] W. J. Schroeder, K. M. Martin, 1 - Overview of visualization, in: C. D. Hansen, C. R. Johnson (Eds.), Visualization Handbook, Butterworth-Heinemann, Burlington, 2005, pp. 3–35. doi:10.1016/B978-012387582-2/50003-4. [9] J. J. Thomas, K. A. Cook (Eds.), Illuminating the Path: The Research and Development Agenda for Visual Analytics, National Visualization and Analytics Center, 2005. ISBN: 0-7695-2323-4. [10] M. C. F. de Oliveira, H. Levkowitz, From visual data exploration to visual data mining: A survey, IEEE Trans. Vis. Comput. Graph. 9 (2003) 378–394. doi:10.1109/TVCG.2003. 1207445. [11] W. Cui, Visual analytics: A comprehensive overview, IEEE Access 7 (2019) 81555–81573. doi:10.1109/ACCESS.2019.2923736. [12] T. Munzner, A nested model for visualization design and validation, IEEE Trans. Vis. Comput. Graph. 15 (2009) 921–928. doi:10.1109/TVCG.2009.111. [13] S. McKenna, D. Mazur, J. Agutter, M. D. Meyer, Design activity framework for visualization design, IEEE Trans. Vis. Comput. Graph. 20 (2014) 2191–2200. doi:10.1109/TVCG.2014. 2346331. [14] A. V. Moere, H. C. Purchase, On the role of design in information visualization, Inf. Vis. 10 (2011) 356–371. doi:10.1177/1473871611415996. [15] D. L. Moody, The “physics" of notations: Toward a scientific basis for constructing visual notations in software engineering, IEEE Trans. Software Eng. 35 (2009) 756–779. doi:10.1109/TSE.2009.67. [16] K. Backhaus, B. Erichson, S. Gensler, R. Weiber, T. Weiber, Multivariate Analysis: An Application-Oriented Introduction, 1 ed., Springer Gabler Wiesbaden, Wiesbaden, 2021. [17] L. Bartram, A. Patra, M. C. Stone, Affective color in visualization, in: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, Denver, CO, USA, May 06-11, 2017, ACM, 2017, pp. 1364–1374. doi:10.1145/3025453.3026041. [18] C. A. Brewer, Chapter 7 - Color use guidelines for mapping and visualization, in: Modern Cartography Series, volume 2, Elsevier, 1994, pp. 123–147. doi:10.1016/ B978-0-08-042415-6.50014-4. [19] M. Dumas, M. L. Rosa, J. Mendling, H. A. Reijers, Fundamentals of Business Process Management, Springer, 2013. [20] W. Vogler, A. L. Semenov, A. Yakovlev, Unfolding and finite prefix for nets with read arcs, in: D. Sangiorgi, R. de Simone (Eds.), CONCUR ’98: Concurrency Theory, 9th International Conference, Nice, France, September 8-11, 1998, Proceedings, volume 1466 of Lecture Notes in Computer Science, Springer, 1998, pp. 501–516. doi:10.1007/BFb0055644. [21] M. H. T. Hack, Petri net language, Technical Report, USA, 1976. [22] W. Reisig, Petri Nets: An Introduction, volume 4 of EATCS Monographs on Theoretical Computer Science, Springer, 1985. doi:10.1007/978-3-642-69968-9. [23] T. Araki, T. Kasami, Some decision problems related to the reachability problem for petri nets, Theor. Comput. Sci. 3 (1976) 85–104. doi:10.1016/0304-3975(76)90067-0. [24] K. Jensen, Coloured petri nets and the invariant-method, Theor. Comput. Sci. 14 (1981) 317–336. doi:10.1016/0304-3975(81)90049-9. [25] D. Hume, An Enquiry Concerning Human Understanding, Hackett Publishing Company, Indianapolis, IN, 1977. Original work published 1748. [26] M. R. Waldmann, Knowledge-based causal induction, in: Psychology of Learning and Motivation, volume 34, Elsevier, 1996, pp. 47–88. [27] J. Pearl, The seven tools of causal inference, with reflections on machine learning, Commun. ACM 62 (2019) 54–60. doi:10.1145/3241036. [28] J. vom Brocke, M. Jans, J. Mendling, H. A. Reijers, Call for papers, Issue 5/2021, Business & Information Systems Engineering 62 (2020) 185–187. [29] R. S. Kaplan, D. P. Norton, The Balanced Scorecard: Translating strategy into action, Harvard Business School Press, Brighton, MA 02135, 1996. [30] W. M. P. van der Aalst, J. C. A. M. Buijs, B. F. van Dongen, Towards improving the representational bias of process mining, in: K. Aberer, E. Damiani, T. S. Dillon (Eds.), Data-Driven Process Discovery and Analysis - First International Symposium, SIMPDA 2011, Campione d’Italia, Italy, June 29 - July 1, 2011, Revised Selected Papers, volume 116 of Lecture Notes in Business Information Processing, Springer, 2012, pp. 39–54. doi:10.1007/ 978-3-642-34044-4\_3.