<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.1007/978-3-642-40176-3_26</article-id>
      <title-group>
        <article-title>Conformance Checking Beyond Replay Methods: Closing The Gap To Real-World Adoption</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eduardo Goulart Rocha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Celonis Labs GmbH</institution>
          ,
          <addr-line>Munich</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Process And Data Science (PADS) Chair - RWTH Aachen University</institution>
          ,
          <addr-line>Aachen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <volume>8094</volume>
      <fpage>26</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>Conformance checking is a key subfield of process mining. Despite its importance, advanced (procedural) conformance checking methods see relatively little adoption in industry. This can be attributed to two limitations of state-of-the-art techniques: first, their poor scalability, which results from their inherent worst-case exponential nature; second, the high cognitive load needed to interpret their results, which renders them inaccessible to non-experts. This Ph.D. project aims to close these gaps by developing conformance checking techniques suitable for industrial settings. Our research is structured into two workstreams. First, we investigate efficient techniques for the case where only a single number is needed to quantify the degree of discrepancy between an event log and a process model. In parallel, we investigate how to generate conformance diagnostics that can be easily understood without technical enablement. In both workstreams, there has been progress in terms of accepted papers that validate the problem and our proposed solutions.</p>
      </abstract>
      <kwd-group>
        <kwd>Process Mining</kwd>
        <kwd>Conformance Checking</kwd>
        <kwd>Conformance Diagnostics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Conformance checking is concerned with identifying and quantifying discrepancies between event logs
and process models. It is a central task in process mining, being an enabler for tasks such as process
discovery (by measuring the degree of discrepancy between the log and the model) and enhancement
(by correlating patterns of deviation with KPI performance). Despite that, state-of-the-art replay-based
conformance checking techniques such as token-based replay [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and trace alignments [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] are not
widely adopted in industry, with most commercial tools lacking support for them.
      </p>
      <p>This Ph.D. project is motivated by problems encountered in industry when trying to productize
conformance checking techniques. In particular, we learn from two key limitations of existing methods
(with a special focus on trace alignments) which, in our experience, explain their poor adoption: first,
their poor scalability, and second, the hard-to-digest diagnostics produced by them. Our research is
split into two workstreams, depending on the conformance task to be solved.</p>
      <p>
        Task 1 (Computing a Metric) The simplest task of conformance checking is to quantify the degree of
discrepancy between an event log and a process model as a single-number metric. On the one hand,
conformance metrics are expected to offer a series of quality guarantees such as determinism, monotonicity,
and robustness to partial mismatches. On the other hand, applications such as optimization-based
process discovery [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and online monitoring of the conformance rate require computationally efficient
metrics. Unfortunately, no existing technique meets both criteria.
      </p>
      <p>
        As shown in [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], most existing techniques do not provide sufficient quality guarantees. More
critically, in terms of scalability, existing techniques do not scale to very large event logs that are
commonly found in industry. For comparison, while the largest BPI Challenge dataset (2016) contains
fewer than 20 million events (and this is arguably one of the least studied public datasets), in industry
one often encounters event logs with more than one billion (sometimes more than ten billion) events.
Furthermore, most techniques do not offer runtime guarantees, meaning that it is hard to predict when
a computation will succeed and that one must resort to timeouts instead, which leads to poor customer
experience. This motivates our first research question:
      </p>
      <p>RQ 1: How to efficiently measure the conformance rate of very large event logs while providing
quality and runtime guarantees?</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>RQ</th><th>Task</th><th>Status / Open Challenges</th></tr>
          </thead>
          <tbody>
            <tr><td>1.1</td><td>Introduce method for process trees</td><td>Published [<xref ref-type="bibr" rid="ref9">9</xref>]</td></tr>
            <tr><td>1.2</td><td>Extend to free-choice Petri nets</td><td>Improve the approximation</td></tr>
            <tr><td>1.3</td><td>Formalize the stochastic problem</td><td>Accepted (to appear) [<xref ref-type="bibr" rid="ref10">10</xref>]</td></tr>
            <tr><td>1.4</td><td>Extend to stochastic process trees</td><td>Decompose stochastic models</td></tr>
            <tr><td>2.1</td><td>Define the diagnostics framework</td><td>Published [<xref ref-type="bibr" rid="ref11">11</xref>]</td></tr>
            <tr><td>2.2</td><td>Discover data and time patterns</td><td>Scalability and decidability</td></tr>
            <tr><td>2.3</td><td>Discover arbitrary patterns</td><td>Discovery and verbalization</td></tr>
            <tr><td>2.4</td><td>Minimize arbitrary patterns</td><td>Published [<xref ref-type="bibr" rid="ref12">12</xref>]</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        Task 2 (Providing Diagnostics) A more advanced task in conformance checking is to identify
deviation patterns that provide insights into the nature of non-conformance. For this task, alignment-based
techniques [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] are considered state-of-the-art. However, alignments suffer from multiple shortcomings:
      </p>
      <list list-type="order">
        <list-item><p>The produced diagnostics are too low-level, in the form of insertion and deletion operations,
which do not explain the nature of the deviation.</p></list-item>
        <list-item><p>The same insertion/deletion operation admits different interpretations depending on the context.</p></list-item>
        <list-item><p>Alignments are “non-deterministic”. A trace often admits multiple optimal alignments inducing
different diagnostics.</p></list-item>
      </list>
      <p>
        Correctly interpreting the result of alignment techniques requires awareness of the above limitations,
raising the entry bar for non-experts. This problem has been acknowledged in the literature [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ], but
not thoroughly researched. This motivates our second research question:
RQ 2: How to identify patterns of non-conformance that are easily interpretable by non-expert users?
      </p>
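      <p>The non-determinism of alignments can be illustrated with a minimal sketch. It assumes, purely for illustration, a model that admits the single trace ⟨a, b, c⟩, unit costs for log and model moves, and zero cost for synchronous moves. The function name optimal_alignments and the skip symbol ">>" are hypothetical and not taken from the cited works; real alignment techniques search the state space of a Petri net rather than a single model trace.</p>
      <preformat>
```python
from functools import lru_cache

SKIP = ">>"  # hypothetical marker for "no move on this side"


def optimal_alignments(trace, model_trace):
    """Return (cost, alignments) for one log trace against one model trace.

    Assumptions of this sketch: log moves and model moves cost 1,
    synchronous moves cost 0, and the model admits exactly one trace.
    """
    n, m = len(trace), len(model_trace)

    @lru_cache(maxsize=None)
    def cost(i, j):
        # Cheapest way to align trace[i:] with model_trace[j:].
        if i == n:
            return m - j  # only model moves remain
        if j == m:
            return n - i  # only log moves remain
        best = 1 + min(cost(i + 1, j), cost(i, j + 1))
        if trace[i] == model_trace[j]:
            best = min(best, cost(i + 1, j + 1))
        return best

    def expand(i, j):
        # Enumerate every move sequence that achieves cost(i, j).
        if i == n and j == m:
            yield ()
            return
        target = cost(i, j)
        if i != n and j != m and trace[i] == model_trace[j]:
            if cost(i + 1, j + 1) == target:
                for rest in expand(i + 1, j + 1):
                    yield ((trace[i], model_trace[j]),) + rest
        if i != n and 1 + cost(i + 1, j) == target:
            for rest in expand(i + 1, j):
                yield ((trace[i], SKIP),) + rest
        if j != m and 1 + cost(i, j + 1) == target:
            for rest in expand(i, j + 1):
                yield ((SKIP, model_trace[j]),) + rest

    return cost(0, 0), list(expand(0, 0))


best_cost, alignments = optimal_alignments(("a", "c", "b"), ("a", "b", "c"))
for alignment in alignments:
    print(best_cost, alignment)
```
      </preformat>
      <p>For the trace ⟨a, c, b⟩, the sketch finds two optimal alignments of cost 2. One reports c as an inserted event and a later c as skipped, whereas the other reports b as skipped and a later b as inserted. Both explain the same trace, yet they induce different diagnostics.</p>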
    </sec>
    <sec id="sec-2">
      <title>2. Research Agenda and Methodology</title>
      <p>Our research agenda is summarized in Table 1.</p>
      <sec id="sec-2-1">
        <title>2.1. Efficiently Measuring Conformance Via Subtraces</title>
        <p>
          To answer the first research question, we build on the technique introduced in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The proposed
conformance metrics are based on comparing the Markovian abstraction of the event log and the
process model. The technique offers a series of quality guarantees (see [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]). Furthermore, it brings
two advantages in terms of scalability. First, it avoids computing the synchronous product between
both artifacts. Second, the Markovian abstraction of event logs can be computed with a linear pass
over the log, which scales to very large event logs. However, the method proposed in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is worst-case
exponential in the size of the process model. Our research focuses on how to efficiently compute this
abstraction for large process models. A first milestone (1.1), a compositional polynomial-time
approach for process trees, has been reached [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Additionally, we extended (1.3) the abstraction to
the stochastic perspective [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
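        <p>The linear pass over the log can be sketched as follows. This is a simplified illustration and not the metric defined in [<xref ref-type="bibr" rid="ref5">5</xref>]: it collects the subtraces of length k of every trace, padded with hypothetical start and end markers, and reports the share of log subtraces that also occur in a model abstraction. The model abstraction is assumed to be given as a set; computing it efficiently from the model is the hard part that this research addresses. All names (log_abstraction, subtrace_fitness, START, END) are illustrative.</p>
        <preformat>
```python
from collections import Counter

START, END = "[", "]"  # hypothetical artificial start/end markers


def log_abstraction(log, k):
    """Count every subtrace of length k in a single pass over the log."""
    counts = Counter()
    for trace in log:
        padded = (START,) + tuple(trace) + (END,)
        # range() is empty when the padded trace is shorter than k.
        for i in range(len(padded) - k + 1):
            counts[padded[i:i + k]] += 1
    return counts


def subtrace_fitness(log_counts, model_subtraces):
    """Share of log subtraces (weighted by frequency) allowed by the model."""
    total = sum(log_counts.values())
    if total == 0:
        return 1.0
    covered = sum(c for s, c in log_counts.items() if s in model_subtraces)
    return covered / total


# Toy data: the model is assumed to admit only the trace (a, b, c).
log = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "c")]
model_subtraces = set(log_abstraction([("a", "b", "c")], k=2))

log_counts = log_abstraction(log, k=2)
fitness = subtrace_fitness(log_counts, model_subtraces)
print(fitness)
```
        </preformat>
        <p>For a fixed k, each event is visited a constant number of times, so the cost on the log side grows linearly with the number of events. In the toy data, 9 of the 12 subtraces of length 2 are covered by the model abstraction.</p>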
      </sec>
      <sec id="sec-2-2">
        <title>Extend to free-choice Petri nets (1.2) and stochastic process trees (1.4)</title>
        <p>For free-choice Petri nets, we can already provide a polynomial-time approximation method based on decomposing the
net, and we are currently working on improving the approximation. For stochastic process trees, we
might need different approaches since compositional methods do not work well with their stochastic
perspective.</p>
        <p>Evaluation: This research focuses on a well-studied problem in process mining. We resort to standard
evaluation setups consisting of real-world and synthetic datasets to measure the runtime of the approach
and to compare the induced model ranks with existing state-of-the-art techniques. The scalability of
the approach must be measured across multiple dimensions: the size of (the state space of) the model,
the size of the log, the number of distinct event types, and the degree of discrepancy between the model
and the log.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.2. Mining Behavioral Patterns for Conformance Diagnostics</title>
        <p>
          To answer the second research question, we first observe that diagnostics provided by declarative
methods are often more understandable to non-expert users. However, declarative models are unsuitable
for modeling certain types of processes [
          <xref ref-type="bibr" rid="ref13">13, 14</xref>
          ]. Our proposed solution is to combine both paradigms. The
approach is sketched in Figure 1. First, (control-flow) declarative constraints are discovered from a
user-provided procedural process model (step A), producing a declarative model that is "equivalent" to
the original model. Next, this model is minimized to remove redundancies (B) and used to verify the
event log (C).
        </p>
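        <p>Step C can be illustrated with a minimal sketch, assuming two DECLARE-style templates (response and precedence) with simplified finite-trace semantics. The activity names, the template table, and the message wording are hypothetical. The constraints are supplied by hand here, whereas in the approach they are discovered from the procedural model.</p>
        <preformat>
```python
def holds_response(trace, a, b):
    """Every occurrence of a is eventually followed by an occurrence of b."""
    pending = False
    for activity in trace:
        if activity == a:
            pending = True
        elif activity == b:
            pending = False
    return not pending


def holds_precedence(trace, a, b):
    """b may occur only after a has occurred at least once."""
    seen_a = False
    for activity in trace:
        if activity == a:
            seen_a = True
        elif activity == b and not seen_a:
            return False
    return True


# Hypothetical template table: checker plus a message shown to the user.
TEMPLATES = {
    "response": (holds_response, "'{a}' must eventually be followed by '{b}'"),
    "precedence": (holds_precedence, "'{b}' may only occur after '{a}'"),
}


def diagnose(trace, constraints):
    """Return one readable message per violated constraint, in input order."""
    messages = []
    for template, a, b in constraints:
        check, text = TEMPLATES[template]
        if not check(trace, a, b):
            messages.append(text.format(a=a, b=b))
    return messages


constraints = [
    ("response", "create order", "pay"),
    ("precedence", "pay", "ship"),
]
for trace in [("create order", "ship", "pay"), ("create order", "ship")]:
    print(trace, diagnose(trace, constraints))
```
        </preformat>
        <p>Because every constraint is checked independently, a given trace and constraint set always yield the same messages, in contrast to the choice among several optimal alignments.</p>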
        <fig id="fig1">
          <label>Figure 1</label>
          <caption>
            <p>Overview of the approach. Labels recovered from the figure: Constraint Templates, Procedural Model, A. Discover Constraints, Declarative Model, B. Constraint Minimization, C. Verify Log, Log, Diagnostics.</p>
          </caption>
        </fig>
        <sec id="sec-2-3-2">
          <p>
            Using the discovered constraints to verify the event log ensures that the diagnostics are on a higher
level and, thus, more understandable. Furthermore, the approach enjoys good quality properties such
as determinism and monotonicity of the generated diagnostics. Our first work [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] introduces the
framework above (2.1) and demonstrates how to use it to derive understandable conformance diagnostics.
The next steps in this research involve refining each framework step. One of them, namely extending
the constraint minimization step (B) to arbitrary models (2.4), is already completed [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ].
          </p>
          <p>Discover Time and Data Patterns (2.2): A natural extension of the framework above is to consider
the time and data perspectives [15]. However, this raises feasibility questions: reasoning tasks
involving the data perspective are computationally expensive, and reasoning tasks involving time
constraints can quickly become undecidable [16, 17].
          </p>
          <p>Discover Arbitrary Patterns (2.3): Currently, the framework is limited to a set of constraint template
patterns (such as the set of DECLARE patterns), which might not best capture a deviation. We would like
to discover arbitrary behavioral patterns, i.e., arbitrary logical formulae beyond a fixed set of templates.
For that, we must solve two challenges. First, we must devise an algorithm to discover arbitrary logical
formulae (step A). Second, we must devise a mechanism to verbalize the discovered formulae to the
user (step C). We believe that the latter can be tackled to some extent using NLP techniques such as
rule-based methods or large language models. Therefore, we focus our attention on the first problem,
for which our current idea is to extend existing methods [18] to work in our setting. In parallel, we are
also evaluating ad hoc approaches that work by analyzing the structure of the process models [19].</p>
          <p>Evaluation: Our evaluation will be primarily qualitative. To validate that our diagnostics are
understandable, we plan to conduct a user study. However, user studies suffer from reproducibility issues.
Therefore, we also aim to develop a set of reference process models for public datasets and demonstrate
our method on top of them. With that, we aim to develop a “benchmark” for conformance diagnostics.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>This Ph.D. project is a cooperation between the RWTH Aachen University and Celonis Labs GmbH. It
is supervised by Prof. Dr. Ir. Wil van der Aalst and fully funded by Celonis.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>A.</given-names> <surname>Rozinat</surname></string-name>,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>,
          <article-title>Conformance checking of processes based on monitoring real behavior</article-title>,
          <source>Inf. Syst.</source>
          <volume>33</volume>
          (<year>2008</year>)
          <fpage>64</fpage>-<lpage>95</lpage>.
          doi:<pub-id pub-id-type="doi">10.1016/J.IS.2007.07.001</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>A.</given-names> <surname>Adriansyah</surname></string-name>,
          <string-name><given-names>B. F.</given-names> <surname>van Dongen</surname></string-name>,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>,
          <article-title>Conformance checking using cost-based fitness analysis</article-title>,
          in: <source>IEEE 15th International Enterprise Distributed Object Computing Conference</source>,
          IEEE Computer Society,
          <year>2011</year>, pp.
          <fpage>55</fpage>-<lpage>64</lpage>.
          doi:<pub-id pub-id-type="doi">10.1109/EDOC.2011.12</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. C. A. M.</given-names>
            <surname>Buijs</surname>
          </string-name>
          ,
          <article-title>Flexible evolutionary algorithms for mining structured process models</article-title>
          ,
          <source>Technische Universiteit Eindhoven</source>
          <volume>57</volume>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>N.</given-names> <surname>Tax</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Lu</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Sidorova</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Fahland</surname></string-name>,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>,
          <article-title>The imprecisions of precision measures in process mining</article-title>,
          <source>Inf. Proc. Lett.</source>
          <volume>135</volume>
          (<year>2018</year>)
          <fpage>1</fpage>-<lpage>8</lpage>.
          doi:<pub-id pub-id-type="doi">10.1016/J.IPL.2018.01.013</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>A.</given-names> <surname>Augusto</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Armas-Cervantes</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Conforti</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Dumas</surname></string-name>,
          <string-name><given-names>M. L.</given-names> <surname>Rosa</surname></string-name>,
          <article-title>Measuring fitness and precision of automatically discovered process models: A principled and scalable approach</article-title>,
          <source>IEEE Trans. Knowl. Data Eng.</source>
          <volume>34</volume>
          (<year>2022</year>)
          <fpage>1870</fpage>-<lpage>1888</lpage>.
          doi:<pub-id pub-id-type="doi">10.1109/TKDE.2020.3003258</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>A.</given-names> <surname>Adriansyah</surname></string-name>,
          <string-name><given-names>B. F.</given-names> <surname>van Dongen</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Zannone</surname></string-name>,
          <article-title>Controlling break-the-glass through alignment</article-title>,
          in: <source>International Conference on Social Computing, SocialCom 2013</source>,
          IEEE Computer Society,
          <year>2013</year>, pp.
          <fpage>606</fpage>-<lpage>611</lpage>.
          doi:<pub-id pub-id-type="doi">10.1109/SOCIALCOM.2013.91</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>A. S. M.</given-names> <surname>Mehr</surname></string-name>,
          <string-name><given-names>R. M.</given-names> <surname>de Carvalho</surname></string-name>,
          <string-name><given-names>B. F.</given-names> <surname>van Dongen</surname></string-name>,
          <article-title>Explainable conformance checking: Understanding patterns of anomalous behavior</article-title>,
          <source>Engineering Applications of Artificial Intelligence</source>
          <volume>126</volume>
          (<year>2023</year>)
          <elocation-id>106827</elocation-id>.
          doi:<pub-id pub-id-type="doi">10.1016/J.ENGAPPAI.2023.106827</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>M.</given-names> <surname>Grohs</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>van der Aa</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Rehse</surname></string-name>,
          <article-title>Beyond log and model moves in conformance checking: Discovering process-level deviation patterns</article-title>,
          in: <source>Business Process Management - 22nd International Conference, BPM 2024, Krakow, Poland, September 1-6, 2024, Proceedings</source>,
          volume <volume>14940</volume> of Lecture Notes in Computer Science, Springer,
          <year>2024</year>, pp.
          <fpage>381</fpage>-<lpage>399</lpage>.
          doi:<pub-id pub-id-type="doi">10.1007/978-3-031-70396-6_22</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>E. G.</given-names> <surname>Rocha</surname></string-name>,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>,
          <article-title>Polynomial-time conformance checking for process trees</article-title>,
          in: <source>Business Process Management - 21st International Conference, BPM 2023, Utrecht, The Netherlands, September 11-15, 2023, Proceedings</source>,
          volume <volume>14159</volume> of Lecture Notes in Computer Science, Springer,
          <year>2023</year>, pp.
          <fpage>109</fpage>-<lpage>125</lpage>.
          doi:<pub-id pub-id-type="doi">10.1007/978-3-031-41620-0_7</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Goulart Rocha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J. J.</given-names>
            <surname>Leemans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <article-title>Stochastic conformance checking based on expected sub-trace frequency</article-title>
          ,
          <source>in: International Conference on Process Mining</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>E. G.</given-names> <surname>Rocha</surname></string-name>,
          <string-name><given-names>S. J.</given-names> <surname>van Zelst</surname></string-name>,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>,
          <article-title>Mining behavioral patterns for conformance diagnostics</article-title>,
          in: <source>Business Process Management - 22nd International Conference, BPM 2024, Krakow, Poland, September 1-6, 2024, Proceedings</source>,
          volume <volume>14940</volume> of Lecture Notes in Computer Science, Springer,
          <year>2024</year>, pp.
          <fpage>291</fpage>-<lpage>308</lpage>.
          doi:<pub-id pub-id-type="doi">10.1007/978-3-031-70396-6_17</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>E. G.</given-names> <surname>Rocha</surname></string-name>,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>,
          <article-title>Precision-guided minimization of arbitrary declarative process models</article-title>,
          in: <source>Enterprise, Business-Process and Information Systems Modeling - 25th International Conference, BPMDS 2024, and 29th International Conference, EMMSAD 2024</source>,
          volume <volume>511</volume> of LNBIP, Springer,
          <year>2024</year>, pp.
          <fpage>48</fpage>-<lpage>56</lpage>.
          doi:<pub-id pub-id-type="doi">10.1007/978-3-031-61007-3_5</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Reijers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Slaats</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stahl</surname>
          </string-name>
          ,
          <article-title>Declarative modeling-an academic dream or the future for bpm?</article-title>
          ,
          <source>in: Business Process Management - 11th International Conference, BPM 2013</source>
          , Beijing, China,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>