<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Beyond Temporal Relationships: Causal Support in Declarative Process Modeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luca Giuliani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Zecchini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bologna, Department of Computer Science and Engineering</institution>
          ,
          <addr-line>DISI</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Process discovery algorithms extract knowledge about processes by analyzing temporal relationships only, often disregarding any additional data available in the log. Following recent trends in causally enhanced Business Process Mining, we propose an approach that leverages causal discovery techniques to detect the underlying relationships between data features and events. The acquired knowledge is then used to complement an existing declarative process model by measuring the causal support between pairs of events, which enhances the model's robustness and clarity. We use an example on synthetic data to discuss the advantages and limitations of the approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Declarative Process Modeling</kwd>
        <kwd>Process Discovery</kwd>
        <kwd>Causal Discovery</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Background and State Of The Art</title>
      <p>
        Recent progress in the causal discovery and causal inference community has led to incorporating such
techniques into process mining applications as well. Polyvyanyy et al. introduce causality mining [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
a framework aimed at uncovering causal relationships among events based on their proximity in terms of temporal,
spatial, and feature similarity. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Hompes et al. propose a method that employs time series analysis
to identify causal relationships between the aggregate properties of a process and other performance
indicators. Lastly, Dasht Bozorgi et al. present a technique leveraging uplift trees to assess the causal
impact of interventions within the domain of prescriptive process monitoring [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], while Alaee et al.
also focus on prescriptive (what-if) analysis, but employ constraint-based causal discovery
algorithms along with temporal constraints mined from the event log [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        A different perspective is pursued in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where Fournier et al. strictly concentrate on causal modeling
rather than causal inference, aimed at constructing what is referred to as Causal Business Process
Model. Our methodology aligns with both their goals and techniques, although we underline three
primary distinctions: (a) instead of exclusively depending on execution times, we also leverage the
supplementary data available in the log; (b) our approach is applied to Declarative Process Mining
rather than Procedural; and (c) we employ constraint-based instead of noise-based causal discovery
algorithms, as they allow us to aggregate multivariate data via ad-hoc independence tests, while still
ensuring the convergence of edge orientations thanks to prior temporal knowledge drawn from the log.
      </p>
      <fig id="fig-1">
        <label>Figure 1</label>
        <caption>
          <p>Causal graphs over the events access, login, purchase, shipping, and feedback: (a) True, (b) Plain, (c) Max, (d) Dif.</p>
        </caption>
      </fig>
      <p>
<bold>Declarative Process Mining.</bold> The DECLARE framework represents processes in terms
of high-level constraints mapped into Linear Temporal Logic (LTL) formulae. As opposed to
procedural models, declarative ones do not impose a rigid structure on the sequence of events, which makes them
advantageous for processes that are loosely defined or demand constant adaptation. Several
algorithms exist to extract declarative models from event logs. For example, Declare Miner [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] selects frequently
occurring sets of activities based on their support, later examining each possible set to find the optimal
model. Similarly, in NegDis [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], constraint supports are retrieved through linear verification of regular
expressions in Go, exploiting the theoretical equivalence between Linear Temporal Logic on Finite
Traces (LTLf) and Regular Expressions (RegEx), which allows for much greater scalability.
      </p>
      <p>
<bold>Causal Discovery.</bold> Causal discovery is concerned with the identification of causal relationships
among variables. As these relationships are unidirectional, they are often represented as a Directed
Acyclic Graph (DAG) with one node per variable and one edge for each direct relationship between
them. Among the best-known algorithms are Peter-Clark (PC), Fast Causal Inference (FCI), Greedy
Equivalence Search (GES), and Linear Non-Gaussian Acyclic Model (LiNGAM). In certain algorithms,
some arcs may lack a direction when the available data do not offer enough evidence to determine it; in
such cases, domain knowledge about the process can be useful to discard inconsistent orientations [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
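      <p>As a hedged illustration of the equivalence between LTLf and regular expressions mentioned above, the following Python sketch checks Response(a,b), i.e. every occurrence of a is eventually followed by b, on traces encoded as strings. The one-character activity codes, the function names, the toy log, and the definition of support as the fraction of satisfying traces are assumptions made for illustration; the sketch does not reproduce the NegDis implementation.</p>
      <preformatted><![CDATA[
```python
import re

# Hypothetical one-character encoding of the activities (an assumption).
CODES = {"access": "A", "login": "L", "purchase": "P",
         "shipping": "S", "feedback": "F"}


def encode(trace):
    """Turn a list of activity names into a string, one character per event."""
    return "".join(CODES[event] for event in trace)


def response_pattern(a, b):
    """Regular expression accepting exactly the traces that satisfy Response(a,b)."""
    ca, cb = re.escape(CODES[a]), re.escape(CODES[b])
    # Any prefix without a, then blocks "a ... b", then a suffix without a.
    return re.compile(f"[^{ca}]*({ca}.*{cb})*[^{ca}]*")


def response_support(log, a, b):
    """Fraction of traces in a non-empty log that satisfy Response(a,b)."""
    pattern = response_pattern(a, b)
    satisfied = sum(1 for trace in log if pattern.fullmatch(encode(trace)))
    return satisfied / len(log)


# Toy log, not taken from the synthetic log used in the experiments.
toy_log = [
    ["access", "login", "purchase", "shipping"],
    ["access", "purchase", "login", "feedback"],
    ["access", "purchase"],
    ["access"],
]
```
]]></preformatted>
      <p>On this toy log, response_support(toy_log, "purchase", "shipping") evaluates to 0.5: the first trace satisfies the constraint, the last one satisfies it vacuously because no purchase occurs, and the other two contain a purchase that is never followed by a shipping event.</p>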
    </sec>
    <sec id="sec-3">
      <title>3. Causal Support for DECLARE Rules</title>
      <p>To illustrate our approach, we use a synthetically generated log depicting user actions on an e-commerce
platform. Every trace begins with an access event, which carries only time information. Subsequently,
in 80% of the cases users log in (their age and gender are recorded along with the timestamp), or
they may proceed directly to a purchase, during which the number of items and the total price are noted.
Users can also log in after a purchase, which allows them to submit feedback; the feedback event takes place after
these two events in 60% of the scenarios where both are present. We underline that login and purchase
are not causally related and may be interleaved in the log traces. Finally, a shipping event is appended
only if the price is below 50 euros; otherwise, free shipping applies and no entry is stored.</p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>Response constraints sorted by NegDis support (↓). Column headers: NegDis Support (↓); Causal Support (Max, Dif).</p>
        </caption>
        <table>
          <thead>
            <tr><th>Constraint</th></tr>
          </thead>
          <tbody>
            <tr><td>Response(access,purchase)</td></tr>
            <tr><td>Response(access,login)</td></tr>
            <tr><td>Response(login,purchase)</td></tr>
            <tr><td>Response(access,shipping)</td></tr>
            <tr><td>Response(purchase,shipping)</td></tr>
            <tr><td>Response(access,feedback)</td></tr>
            <tr><td>Response(login,feedback)</td></tr>
            <tr><td>Response(purchase,feedback)</td></tr>
            <tr><td>Response(login,shipping)</td></tr>
            <tr><td>Response(shipping,feedback)</td></tr>
            <tr><td>Response(purchase,login)</td></tr>
            <tr><td>Response(shipping,login)</td></tr>
            <tr><td>Response(feedback,shipping)</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        Our approach starts from a previously mined set of declarative rules, obtained using NegDis [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For
simplicity, we exclusively focus on Response(a,b) constraints, which naturally carry causal semantics
by connecting two events a and b with a causal relationship. The algorithm then proceeds as follows:
      </p>
      <list list-type="order">
        <list-item>
          <p>For each distinct trace in the log, the data of the involved events is collected into a tabular dataset.</p>
        </list-item>
        <list-item>
          <p>A tailored variant of the PC algorithm [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] is employed to uncover the underlying causal structure
within each dataset. Specifically, we customize the PC algorithm as follows:</p>
          <list list-type="alpha-lower">
            <list-item>
              <p>we build one node for each event rather than one for each data feature, conducting
independence tests directly on multivariate variables by assessing the correlation between two
events as the highest correlation among pairs of their individual features;</p>
            </list-item>
            <list-item>
              <p>once independence tests are carried out and before edges are oriented, we inject temporal
prior knowledge by forbidding directions that contradict the event sequence.</p>
            </list-item>
          </list>
        </list-item>
        <list-item>
          <p>The obtained causal graphs are aggregated into a single one by computing the strength of each
edge as the ratio of instances where a causal link was detected to the overall occurrences of that
event pair in a trace, and later processed according to three strategies:</p>
          <def-list>
            <def-item>
              <term>Plain</term>
              <def><p>no further processing is applied, retaining bidirectional edges as they are;</p></def>
            </def-item>
            <def-item>
              <term>Max</term>
              <def><p>for each bidirectional edge, we retain the stronger direction only, or both for equal strengths;</p></def>
            </def-item>
            <def-item>
              <term>Dif</term>
              <def><p>for each bidirectional edge, we retain the stronger direction only but adjust its strength by
subtracting that of the weaker edge, or none for equal strengths.</p></def>
            </def-item>
          </def-list>
        </list-item>
        <list-item>
          <p>Finally, the causal support for each event pair is determined as the maximum weight among the
paths connecting them, where the weight of a path is defined as the minimum strength among its edges.</p>
        </list-item>
      </list>
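      <p>Steps 3 and 4 can be sketched in Python as follows. The sketch assumes that edge strengths have already been aggregated into a dictionary keyed by directed event pairs; the function names and the numeric strengths are hypothetical and are not taken from the experiments. Treating a missing reverse edge as having strength zero is also an assumption of the sketch. The causal support is computed with a Floyd-Warshall-style recursion over the max-min (bottleneck) path weight.</p>
      <preformatted><![CDATA[
```python
def apply_strategy(strength, strategy):
    """Post-process aggregated edge strengths with the Plain, Max or Dif strategy.

    strength maps a directed edge (u, v) to its aggregated strength.
    """
    if strategy == "plain":
        return dict(strength)  # bidirectional edges are kept as they are
    result = {}
    for (u, v), s in strength.items():
        reverse = strength.get((v, u), 0.0)  # assumption: missing edge = 0
        if strategy == "max":
            if s >= reverse:  # stronger direction only, both on ties
                result[(u, v)] = s
        elif strategy == "dif":
            if s > reverse:  # stronger direction minus weaker, none on ties
                result[(u, v)] = s - reverse
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return result


def causal_support(edges, nodes):
    """Maximum over all paths of the minimum edge strength along the path."""
    support = {(u, v): edges.get((u, v), 0.0)
               for u in nodes for v in nodes if u != v}
    for k in nodes:  # intermediate node, as in Floyd-Warshall
        for u in nodes:
            for v in nodes:
                if len({u, v, k}) == 3:
                    via_k = min(support[(u, k)], support[(k, v)])
                    if via_k > support[(u, v)]:
                        support[(u, v)] = via_k
    return support


# Hypothetical strengths, chosen only to exercise the three strategies.
nodes = ["access", "login", "purchase", "shipping", "feedback"]
strength = {
    ("access", "login"): 0.75,
    ("login", "feedback"): 0.5,
    ("login", "purchase"): 0.5,
    ("purchase", "login"): 0.25,
    ("purchase", "shipping"): 1.0,
}
```
]]></preformatted>
      <p>With these hypothetical values, the Plain strategy keeps both directions between login and purchase, so purchase reaches feedback through login with support 0.25, whereas Max and Dif remove the weaker purchase-to-login edge and the same pair receives support 0.</p>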
      <p>Figure 1(a) shows the true causal graph of the process, while Figures 1(b) to 1(d) illustrate the
causal graphs retrieved with the three aggregation strategies. Notably, the Dif method appears
to be the most reliable, successfully eliminating spurious correlations between many event pairs. This
observation is also reflected in Table 1, which reports the causal supports computed for each strategy,
sorted by the supports that NegDis yields based solely on the traces. Although all three
strategies indicate low causal supports for non-causal relationships (marked in red), only Dif
reports zero scores for all of them apart from Response(login, shipping), which still has a noticeably
lower support than under the other strategies. Overall, an appropriate threshold could be determined to
differentiate between causal and non-causal relationships, making it possible, for example, to replace the Response
constraint with another binary constraint that implies no causal semantics, such as CoExistence.</p>
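      <p>The threshold-based refinement suggested above could look like the following minimal Python sketch. The function name, the threshold of 0.3, and the support values are hypothetical; how an appropriate threshold should be chosen is left open in the text.</p>
      <preformatted><![CDATA[
```python
def refine_rules(causal_supports, threshold):
    """Keep Response(a,b) when its causal support reaches the threshold,
    otherwise fall back to CoExistence(a,b), which carries no causal semantics.
    """
    refined = []
    for (a, b), support in causal_supports.items():
        template = "Response" if support >= threshold else "CoExistence"
        refined.append(f"{template}({a},{b})")
    return refined


# Hypothetical causal supports, for illustration only.
example_supports = {
    ("access", "login"): 0.8,
    ("login", "purchase"): 0.0,
}
refined_model = refine_rules(example_supports, threshold=0.3)
```
]]></preformatted>
      <p>Here refined_model contains Response(access,login) and CoExistence(login,purchase), in that order.</p>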
    </sec>
    <sec id="sec-5">
      <title>4. Conclusions</title>
      <p>We introduced a method for refining declarative process models by computing the support of a DECLARE
rule based on the discovered causal relationship among its events. Our technique employs the PC
algorithm to identify the causal structure of a process using additional data available in its log file,
and then exploits this knowledge to rank each previously mined rule according to its causal support.
An example on a synthetically generated log showed that such an integration between declarative
process mining and causal discovery may offer significant advantages and is worth further investigation.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Funded by the European Union – Next Generation EU within the framework of the National Recovery
and Resilience Plan NRRP – Mission 4 “Education and Research” – Component 2 - Investment 1.1
“National Research Program and Projects of Significant National Interest Fund (PRIN)” - Call PRIN 2022
- D.D. n. 104 of 02/02/2022 - Project: “Probabilistic Declarative Process Mining” (PRODE).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors used Writefull for paraphrasing and rewording. After using this tool, the authors reviewed and
edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Polyvyanyy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Wynn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>ter Hofstede</surname>
          </string-name>
          ,
          <article-title>A systematic approach for discovering causal dependencies between observations and incidents in the health and safety domain</article-title>
          ,
          <source>Safety Science</source>
          <volume>118</volume>
          (
          <year>2019</year>
          )
          <fpage>345</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hompes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maaradji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Buijs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <article-title>Discovering causal factors explaining business process performance variation</article-title>
          ,
          <year>2017</year>
          , pp.
          <fpage>177</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dasht Bozorgi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Teinemaa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polyvyanyy</surname>
          </string-name>
          ,
          <article-title>Prescriptive process monitoring based on causal effect estimation</article-title>
          ,
          <source>Information Systems</source>
          <volume>116</volume>
          (
          <year>2023</year>
          )
          <fpage>102198</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Alaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weidlich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Senderovich</surname>
          </string-name>
          ,
          <article-title>Data-driven decision support for business processes: Causal reasoning and discovery</article-title>
          , in: International Conference on Business Process Management, Springer,
          <year>2024</year>
          , pp.
          <fpage>90</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Fournier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Limonad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Skarbovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>David</surname>
          </string-name>
          ,
          <article-title>The why in business processes: Discovery of causal execution dependencies</article-title>
          ,
          <source>KI-Künstliche Intelligenz</source>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Maggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Ciccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Francescomarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kala</surname>
          </string-name>
          ,
          <article-title>Parallel algorithms for the automated discovery of declarative process models</article-title>
          ,
          <source>Information Systems</source>
          <volume>74</volume>
          (
          <year>2018</year>
          )
          <fpage>136</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chesani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Francescomarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ghidini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Loreti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Maggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tessaris</surname>
          </string-name>
          ,
          <article-title>Process discovery on deviant traces and other stranger things</article-title>
          ,
          <source>IEEE Trans. Knowl. Data Eng.</source>
          <volume>35</volume>
          (
          <year>2023</year>
          )
          <fpage>11784</fpage>
          -
          <lpage>11800</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pugnana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruggieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gama</surname>
          </string-name>
          ,
          <article-title>Methods and tools for causal discovery and causal inference</article-title>
          ,
          <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          <volume>12</volume>
          (
          <year>2022</year>
          )
          <fpage>e1449</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Spirtes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Glymour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Scheines</surname>
          </string-name>
          ,
          <source>Causation, Prediction, and Search</source>
          , The MIT Press,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>