
Beyond Temporal Relationships: Causal Support in Declarative Process Modeling

Luca Giuliani

Andrea Zecchini

University of Bologna, Department of Computer Science and Engineering (DISI)

Process discovery algorithms extract knowledge about processes by analyzing temporal relationships only, often disregarding any additional data available in the log. Following recent trends in causally-enhanced Business Process Mining, we propose an approach that leverages causal discovery techniques to detect the underlying relationships between data features and events. The acquired knowledge is then used to complement an existing declarative process model by measuring the causal support between pairs of events, thereby enhancing the model's robustness and clarity. We use an example on synthetic data to discuss the advantages and limitations of the approach.

Keywords: Declarative Process Modeling, Process Discovery, Causal Discovery

1. Introduction

2. Background and State of the Art

Recent progress in the causal discovery and causal inference community has led to the incorporation of such techniques into process mining applications as well. Polyvyanyy et al. introduce causality mining [1], a framework aimed at uncovering causal relationships among events based on their proximity in terms of temporal, spatial, and feature similarity. In [2], Hompes et al. propose a method that employs time series analysis to identify causal relationships between the aggregate properties of a process and other performance indicators. Lastly, Dasht Bozorgi et al. present a technique that leverages uplift trees to assess the causal impact of interventions within the domain of prescriptive process monitoring [3], while Alaee et al. also focus on prescriptive (what-if) analysis, but employ constraint-based causal discovery algorithms along with temporal constraints mined from the event log [4].

A different perspective is pursued in [5], where Fournier et al. concentrate strictly on causal modeling rather than causal inference, with the aim of constructing what they refer to as a Causal Business Process Model. Our methodology aligns with both their goals and their techniques, although we highlight three main differences: (a) instead of relying exclusively on execution times, we also leverage the supplementary data available in the log; (b) our approach is applied to declarative rather than procedural process mining; and (c) we employ constraint-based instead of noise-based causal discovery algorithms, as they allow us to aggregate multivariate data via ad-hoc independence tests while still ensuring the convergence of edge orientations, thanks to prior temporal knowledge drawn from the log.

Figure 1: Causal graphs over the events access, login, purchase, shipping, and feedback. Panels: (a) True, (b) Plain, (c) Max, (d) Dif.

Declarative Process Mining. The DECLARE framework represents processes in terms of high-level constraints mapped into Linear Temporal Logic (LTL) formulae. As opposed to procedural models, declarative ones do not impose a rigid structure on the sequence of events, which makes them advantageous for processes that are loosely defined or demand constant adaptation. Several algorithms exist to extract declarative models from event logs. For example, Declare Miner [6] selects frequently occurring sets of activities based on their support, and later examines each candidate set to find the optimal model. Similarly, in NegDis [7], constraint supports are retrieved through linear verification of regular expressions in Go, exploiting the theoretical equivalence between Linear Temporal Logic on Finite Traces (LTLf) and Regular Expressions (RegEx), which allows for much greater scalability.

Causal Discovery. Causal discovery is concerned with the identification of causal relationships among variables. As these relationships are unidirectional, they are often represented as a Directed Acyclic Graph (DAG) with one node per variable and one edge for each direct relationship between them. Some of the best-known algorithms are Peter-Clark (PC), Fast Causal Inference (FCI), Greedy Equivalence Search (GES), and the Linear Non-Gaussian Acyclic Model (LiNGAM). In certain algorithms, some edges may remain undirected when the available data do not offer enough evidence to determine their orientation; in such cases, domain knowledge about the process can help discard inconsistent orientations [8].
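To make the LTLf-RegEx equivalence exploited by NegDis more concrete, the sketch below computes the support of a Response(a,b) constraint by plain string matching. It is our own illustration, not NegDis's Go implementation: activities are mapped to hypothetical one-character codes, Response(a,b) is written as the regular expression `[^a]*(a.*b)*[^a]*`, which accepts exactly the traces in which every a is eventually followed by b, and support is taken as the fraction of traces that fully match it. Python's backtracking `re` module is used only for brevity.

```python
import re

# Hypothetical one-character codes for the activities of the running example.
CODES = {"access": "A", "login": "L", "purchase": "P", "shipping": "S", "feedback": "F"}


def response(a, b):
    """Regex accepting exactly the traces where every `a` is eventually followed by `b`."""
    x, y = re.escape(CODES[a]), re.escape(CODES[b])
    return re.compile(f"[^{x}]*({x}.*{y})*[^{x}]*")


def support(pattern, log):
    """Fraction of traces whose whole encoded string matches the constraint's regex."""
    encoded = ["".join(CODES[event] for event in trace) for trace in log]
    return sum(pattern.fullmatch(t) is not None for t in encoded) / len(log)


# Toy log (made up): each trace is a list of activity names in temporal order.
log = [
    ["access", "login", "purchase", "shipping"],
    ["access", "purchase", "login", "feedback"],
    ["access", "purchase"],
]
print(support(response("access", "purchase"), log))    # 1.0
print(support(response("purchase", "shipping"), log))  # 0.3333333333333333
```

Traces without a satisfy Response(a,b) vacuously under this encoding: in the toy log, the third trace contains no login and therefore counts toward the support of Response(login,feedback), which is 2/3.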

3. Causal Support for DECLARE Rules

To illustrate our approach, we use a synthetically generated log depicting user actions on an e-commerce platform. Every trace begins with an access event, which only includes time information. Subsequently, in 80% of the cases users log in (with their age and gender recorded along with the timestamp), or they may directly proceed to a purchase, during which the number of items and the total price are recorded. Users may also log in after a purchase, which allows them to submit feedback; a feedback event follows these two events in 60% of the traces where both are present. We stress that login and purchase are not causally related and may be interleaved in the log traces. Finally, a shipping event is appended only if the price is below 50 euros; otherwise, free shipping applies and no entry is stored.

Table 1: Causal support of the Response constraints mined by NegDis under the Plain, Max, and Dif strategies, sorted by decreasing NegDis support (↓). Constraints, in order: Response(access,purchase), Response(access,login), Response(login,purchase), Response(access,shipping), Response(purchase,shipping), Response(access,feedback), Response(login,feedback), Response(purchase,feedback), Response(login,shipping), Response(shipping,feedback), Response(purchase,login), Response(shipping,login), Response(feedback,shipping).

Our approach starts from a previously mined set of declarative rules, obtained using NegDis [7]. For simplicity, we focus exclusively on Response(a,b) constraints, which naturally carry causal semantics, as they link two events a and b through a causal relationship. The algorithm then proceeds as follows:

1. For each distinct trace in the log, the data of the involved events are collected into a tabular dataset.
2. A tailored variant of the PC algorithm [9] is employed to uncover the underlying causal structure within each dataset. Specifically, we customize the PC algorithm as follows:
   a) we build one node for each event rather than one for each data feature, and conduct independence tests directly on these multivariate variables, measuring the correlation between two events as the highest correlation among pairs of their individual features;
   b) once the independence tests are carried out and before the edges are oriented, we inject temporal prior knowledge by forbidding directions that contradict the event sequence.
3. The obtained causal graphs are aggregated into a single one by computing the strength of each edge as the ratio between the number of instances in which a causal link was detected and the overall number of occurrences of that event pair in a trace. The aggregated graph is then processed according to one of three strategies:
   Plain: no further processing is applied, and bidirectional edges are retained as they are;
   Max: for each bidirectional edge, only the stronger direction is retained (both directions if their strengths are equal);
   Dif: for each bidirectional edge, only the stronger direction is retained, with its strength reduced by that of the weaker one (neither direction if their strengths are equal).
4. Finally, the causal support for each event pair is the maximum weight among the paths connecting them, where the weight of a path is the minimum strength among its edges.
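Steps 3 and 4 can be sketched in plain Python as follows. This is a minimal sketch under our own assumptions rather than the authors' code: each per-trace PC result is a hypothetical `(events, edges, weight)` tuple in which `weight` counts the log traces sharing that trace, the strategies are selected by the hypothetical names `"plain"`, `"max"`, and `"dif"`, and the maximum-weight path of step 4 (a widest, or maximum-bottleneck, path) is computed with a Floyd-Warshall-style recurrence in which (max, min) replaces the usual (min, +). The toy input is made up and unrelated to Table 1.

```python
from itertools import permutations


def aggregate(results):
    """Step 3: strength(a -> b) = instances with edge a -> b / instances containing a and b.

    `results` is a hypothetical list of (events, edges, weight) tuples, one per distinct
    trace: the events it contains, the directed edges PC found for it, and the number
    of log traces sharing it (used here to count instances).
    """
    detected, together = {}, {}
    for events, edges, weight in results:
        for pair in permutations(set(events), 2):
            together[pair] = together.get(pair, 0) + weight
        for edge in edges:
            detected[edge] = detected.get(edge, 0) + weight
    return {edge: detected[edge] / together[edge] for edge in detected}


def post_process(strength, strategy):
    """Apply the Plain, Max, or Dif strategy to the aggregated edge strengths."""
    if strategy not in ("plain", "max", "dif"):
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "plain":
        return dict(strength)  # keep bidirectional edges as they are
    out = {}
    for (a, b), s in strength.items():
        reverse = strength.get((b, a), 0.0)
        if reverse == 0.0:
            out[(a, b)] = s  # not bidirectional: nothing to resolve
        elif strategy == "max" and s >= reverse:
            out[(a, b)] = s  # stronger direction only, both on ties
        elif strategy == "dif" and s > reverse:
            out[(a, b)] = s - reverse  # stronger direction minus the weaker, none on ties
    return out


def causal_supports(graph):
    """Step 4: best path weight for every ordered pair, a path weighing its weakest edge."""
    nodes = {n for edge in graph for n in edge}
    w = {(i, j): graph.get((i, j), 0.0) for i in nodes for j in nodes}
    for k in nodes:  # Floyd-Warshall with (max, min) in place of (min, +)
        for i in nodes:
            for j in nodes:
                w[i, j] = max(w[i, j], min(w[i, k], w[k, j]))
    return w


# Toy input (made-up numbers): login and purchase are linked in both directions.
results = [
    ({"login", "purchase"}, {("login", "purchase")}, 3),
    ({"login", "purchase"}, {("purchase", "login")}, 1),
    ({"purchase", "shipping"}, {("purchase", "shipping")}, 2),
]
strength = aggregate(results)
for strategy in ("plain", "max", "dif"):
    support = causal_supports(post_process(strength, strategy))
    print(strategy, support[("login", "shipping")])
```

On this toy input the loop prints `plain 0.75`, `max 0.75`, and `dif 0.5`: the weak reverse edge purchase→login is kept by Plain, dropped by Max, and subtracted from the forward edge by Dif, which lowers the support of every path through login→purchase. The cubic loop is adequate for the handful of event types in a log like this one.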

Figure 1(a) shows the true causal graph of the process, while Figures 1(b) to 1(d) illustrate the causal graphs retrieved with the three aggregation strategies. Notably, the Dif strategy appears to be the most reliable, as it eliminates the spurious correlations between many event pairs. This observation is also reflected in Table 1, which reports the causal supports computed with each strategy, sorted by decreasing NegDis support, computed from the traces alone. Although all three strategies assign low causal support to non-causal relationships (marked in red in Table 1), only Dif assigns a score of zero to all of them except Response(login,shipping), whose support is nonetheless noticeably lower than under the other strategies. Overall, an appropriate threshold could be determined to differentiate between causal and non-causal relationships, which would allow, for example, replacing the Response constraint with another binary constraint that implies no causal semantics, such as CoExistence.
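The threshold-based refinement mentioned above can be sketched as follows, assuming the causal supports are available as a mapping from event pairs to values in [0, 1]. The function name `refine`, the threshold, and the numbers are hypothetical: the text does not fix a threshold, and the values below are not those of Table 1.

```python
def refine(supports, threshold):
    """Keep Response(a, b) when its causal support reaches the threshold; otherwise
    fall back to CoExistence(a, b), which carries no causal semantics."""
    refined = []
    for (a, b), s in supports.items():
        name = "Response" if s >= threshold else "CoExistence"
        refined.append(f"{name}({a},{b})")
    return refined


# Made-up supports and threshold, for illustration only.
supports = {("purchase", "shipping"): 0.9, ("login", "purchase"): 0.0}
print(refine(supports, threshold=0.5))
# ['Response(purchase,shipping)', 'CoExistence(login,purchase)']
```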

4. Conclusions

We introduced a method for refining declarative process models by computing the support of a DECLARE rule based on the discovered causal relationship among its events. Our technique employs the PC algorithm to identify the causal structure of a process using the additional data available in its log file, and then exploits this knowledge to rank each previously mined rule according to its causal support. Examples on a synthetically generated log demonstrated that such an integration of declarative process mining and causal discovery may offer significant advantages and is worth further investigation.

Acknowledgments

Funded by the European Union – Next Generation EU within the framework of the National Recovery and Resilience Plan NRRP – Mission 4 “Education and Research” – Component 2 - Investment 1.1 “National Research Program and Projects of Significant National Interest Fund (PRIN)” - Call PRIN 2022 - D.D. n. 104 of 02/02/2022 - Project: “Probabilistic Declarative Process Mining” (PRODE).

Declaration on Generative AI

The authors used Writefull for paraphrasing and rewording. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content.

References

[1] A. Polyvyanyy, A. Pika, M. T. Wynn, A. H. ter Hofstede, A systematic approach for discovering causal dependencies between observations and incidents in the health and safety domain, Safety Science 118 (2019) 345-354.

[2] B. F. A. Hompes, A. Maaradji, M. La Rosa, M. Dumas, J. C. A. M. Buijs, W. M. P. van der Aalst, Discovering causal factors explaining business process performance variation, 2017, pp. 177-192.

[3] Z. Dasht Bozorgi, I. Teinemaa, M. Dumas, M. La Rosa, A. Polyvyanyy, Prescriptive process monitoring based on causal effect estimation, Information Systems 116 (2023) 102198.

[4] A. J. Alaee, M. Weidlich, A. Senderovich, Data-driven decision support for business processes: Causal reasoning and discovery, in: International Conference on Business Process Management, Springer, 2024, pp. 90-106.

[5] F. Fournier, L. Limonad, I. Skarbovsky, Y. David, The why in business processes: Discovery of causal execution dependencies, KI-Künstliche Intelligenz (2025) 1-23.

[6] F. M. Maggi, C. Di Ciccio, C. Di Francescomarino, T. Kala, Parallel algorithms for the automated discovery of declarative process models, Information Systems 74 (2018) 136-152.

[7] F. Chesani, C. Di Francescomarino, C. Ghidini, D. Loreti, F. M. Maggi, P. Mello, M. Montali, S. Tessaris, Process discovery on deviant traces and other stranger things, IEEE Trans. Knowl. Data Eng. 35 (2023) 11784-11800.

[8] A. R. Nogueira, A. Pugnana, S. Ruggieri, D. Pedreschi, J. Gama, Methods and tools for causal discovery and causal inference, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12 (2022) e1449.

[9] P. Spirtes, C. Glymour, R. Scheines, Causation, Prediction, and Search, The MIT Press, 2001.