Conformance Checking Beyond Replay Methods: Closing The Gap To Real-World Adoption

Eduardo Goulart Rocha

Celonis Labs GmbH, Munich, Germany
Process and Data Science (PADS) Chair, RWTH Aachen University, Aachen, Germany

Conformance checking is a key subfield of process mining. Despite its importance, there is relatively little adoption of advanced (procedural) conformance checking methods in industry, which can be attributed to two limitations of state-of-the-art techniques: first, the poor scalability resulting from their inherent worst-case exponential nature; second, the high cognitive load needed to interpret their results, which renders them inaccessible to non-experts. This Ph.D. project aims to close these gaps by developing conformance checking techniques suitable for industrial settings. Our research is structured into two workstreams. First, we investigate efficient techniques for the case where one only needs to compute a single number that quantifies the degree of discrepancy between an event log and a process model. In parallel, we investigate how to generate conformance diagnostics that can be easily understood without technical enablement. In both workstreams, there has been progress in terms of accepted papers that validate the problem and our proposed solutions.

Keywords: Process Mining, Conformance Checking, Conformance Diagnostics

1. Introduction

Conformance checking is concerned with identifying and quantifying discrepancies between event logs and process models. It is a central task in process mining, being an enabler for tasks such as process discovery (by measuring the degree of discrepancy between the log and the model) and enhancement (by correlating patterns of deviation with KPI performance). Despite that, state-of-the-art replay-based conformance checking techniques such as token-based replay [1] and trace alignments [2] are not widely adopted in industry, with most commercial tools lacking support for them.

This Ph.D. project is motivated by problems encountered in industry when trying to productize conformance checking techniques. In particular, we learn from two key limitations of existing methods (with a special focus on trace alignments) which, in our experience, explain their poor adoption: first, their poor scalability, and second, the hard-to-digest diagnostics produced by them. Our research is split into two workstreams, depending on the conformance task to be solved.

Task 1 (Computing a Metric) The simplest task of conformance checking is to quantify the degree of discrepancy between an event log and a process model as a single-number metric. On the one hand, conformance metrics are expected to offer a series of quality guarantees such as determinism, monotonicity, and robustness to partial mismatches. On the other hand, applications such as optimization-based process discovery [3] and online monitoring of the conformance rate require computationally efficient metrics. Unfortunately, none of the existing techniques meets both criteria.

As shown in [4, 5], most existing techniques do not provide sufficient quality guarantees. More critically, existing techniques do not scale to the very large event logs that are commonly found in industry. For comparison, while the largest BPI Challenge dataset (2016) contains fewer than 20 million events (and this is arguably one of the least studied public datasets), in industry one often encounters event logs with more than one billion (sometimes more than ten billion) events. Furthermore, most techniques do not offer runtime guarantees, meaning that it is hard to predict when a computation will succeed and that one must resort to timeouts instead, which leads to poor customer experience. This motivates our first research question:

RQ 1: How to efficiently measure the conformance rate of very large event logs while providing quality and runtime guarantees?

Table 1. Research agenda.

RQ   Task                                       Status / Open Challenges
1    1.1 Introduce method for process trees     Published [9]
1    1.2 Extend to free-choice Petri nets       Improve the approximation
1    1.3 Formalize the stochastic problem       Accepted (to appear) [10]
1    1.4 Extend to stochastic process trees     Decompose stochastic models
2    2.1 Define the diagnostics framework       Published [11]
2    2.2 Discover data and time patterns        Scalability and decidability
2    2.3 Discover arbitrary patterns            Discovery and verbalization
2    2.4 Minimize arbitrary patterns            Published [12]

Task 2 (Providing Diagnostics) A more advanced task in conformance checking is to identify deviation patterns that provide insights into the nature of non-conformance. For this task, alignment-based techniques [2] are considered state-of-the-art. However, alignments suffer from multiple shortcomings:

1. The produced diagnostics are too low-level, in the form of insertion and deletion operations, which do not explain the nature of the deviation.
2. The same insertion/deletion operation admits different interpretations depending on the context.
3. Alignments are "non-deterministic". A trace often admits multiple optimal alignments inducing different diagnostics.

Correctly interpreting the result of alignment techniques requires awareness of the above limitations, raising the entry bar for non-experts. This problem has been acknowledged in the literature [6, 7, 8], but not thoroughly researched. This motivates our second research question:

RQ 2: How to identify patterns of non-conformance that are easily interpretable by non-expert users?
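The non-determinism of alignments can be illustrated with a small sketch. Assuming a model that accepts only the single trace <a, b, c> and unit costs for log moves and model moves (both assumptions made purely for this example), the following Python snippet enumerates all cost-optimal alignments of the observed trace <a, c, b>. It is a plain dynamic program over two sequences, not the Petri-net-based search of [2], and the names `optimal_alignments` and `SKIP` are hypothetical.

```python
from functools import lru_cache

SKIP = ">>"  # placeholder for "no move" on one side (illustrative notation)


def optimal_alignments(trace, model_trace):
    """Return (optimal cost, list of all optimal alignments).

    An alignment is a tuple of (log_step, model_step) pairs. Synchronous
    moves cost 0; log moves and model moves cost 1 each (assumed costs).
    """
    n, m = len(trace), len(model_trace)

    @lru_cache(maxsize=None)
    def cost(i, j):
        # Cheapest way to align trace[i:] with model_trace[j:].
        if i == n:
            return m - j  # only model moves remain
        if j == m:
            return n - i  # only log moves remain
        best = 1 + min(cost(i + 1, j), cost(i, j + 1))
        if trace[i] == model_trace[j]:
            best = min(best, cost(i + 1, j + 1))
        return best

    def expand(i, j):
        # Follow every move that preserves the optimal remaining cost.
        if i == n and j == m:
            yield ()
            return
        c = cost(i, j)
        if i < n and j < m and trace[i] == model_trace[j] and cost(i + 1, j + 1) == c:
            for rest in expand(i + 1, j + 1):
                yield ((trace[i], model_trace[j]),) + rest
        if i < n and 1 + cost(i + 1, j) == c:
            for rest in expand(i + 1, j):
                yield ((trace[i], SKIP),) + rest  # log move
        if j < m and 1 + cost(i, j + 1) == c:
            for rest in expand(i, j + 1):
                yield ((SKIP, model_trace[j]),) + rest  # model move

    return cost(0, 0), list(expand(0, 0))


best_cost, alignments = optimal_alignments(("a", "c", "b"), ("a", "b", "c"))
for alignment in alignments:
    print(best_cost, alignment)
```

In this toy setting there are two optimal alignments of equal cost: one treats c as the deviating activity (executed too early, then missing), the other treats b as the deviating activity (skipped, then executed as an extra step). Which of the two a tool reports determines the diagnostics the user sees.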

2. Research Agenda and Methodology

Our research agenda is summarized in Table 1.

2.1. Efficiently Measuring Conformance Via Subtraces

To answer the first research question, we build on the technique introduced in [5]. The proposed conformance metrics are based on comparing the Markovian abstractions of the event log and of the process model. The technique offers a series of quality guarantees (see [5]). Furthermore, it brings two advantages in terms of scalability. First, it avoids computing the synchronous product between both artifacts. Second, the Markovian abstraction of an event log can be computed with a linear pass over the log, which scales to very large event logs. However, the method proposed in [5] is worst-case exponential in the size of the process model. Our research focuses on how to efficiently compute this abstraction for large process models. A first milestone (1.1), a compositional polynomial-time approach for process trees, has been reached [9]. Additionally, we extended the abstraction to the stochastic perspective (1.3) [10].
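As a minimal sketch of the linear pass mentioned above, the following Python snippet collects the subtraces of length k of an event log, which is roughly the idea behind the Markovian abstraction of [5]. The start/end markers, the handling of traces shorter than k, the toy model language, and the set-overlap ratio are assumptions made for illustration. In particular, the ratio is a simplified stand-in and not the fitness or precision metric defined in [5], and the model abstraction is obtained here by enumerating a tiny finite language, which sidesteps exactly the part that is hard for real models.

```python
START, END = "<start>", "<end>"  # artificial boundary markers (assumed)


def markovian_abstraction(traces, k):
    """Collect all subtraces of length k in one pass over the traces.

    Each trace is padded with START/END so that the beginning and the
    end of a trace are represented in the abstraction as well.
    """
    subtraces = set()
    for trace in traces:
        padded = (START,) + tuple(trace) + (END,)
        if len(padded) < k:
            subtraces.add(padded)  # short trace: keep it whole (assumption)
        for i in range(len(padded) - k + 1):
            subtraces.add(padded[i:i + k])
    return subtraces


# Toy event log and toy model language (both hypothetical).
log = [("a", "b", "c"), ("a", "b", "b", "c")]
model_language = [("a", "b", "c"), ("a", "c", "b")]

log_abs = markovian_abstraction(log, k=2)
model_abs = markovian_abstraction(model_language, k=2)

# Simplified overlap ratio: share of log subtraces the model can produce.
overlap = len(log_abs & model_abs) / len(log_abs)
unexplained = log_abs - model_abs
print(overlap, unexplained)
```

The work per trace is proportional to its length for a fixed k, which is why the log side of the comparison scales well. The open problem addressed in this workstream is computing the model side without enumerating the model's behavior.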

Extend to free-choice Petri nets (1.2) and stochastic process trees (1.4): For free-choice Petri nets, we can already provide a polynomial-time approximation method based on decomposing the net, and we are currently working on improving the approximation. For stochastic process trees, we might need different approaches, since compositional methods do not work well with the stochastic perspective.

Evaluation: This research focuses on a well-studied problem in process mining. We resort to standard evaluation setups consisting of real-world and synthetic datasets to measure the runtime of the approach and to compare the induced model ranks with existing state-of-the-art techniques. The scalability of the approach must be measured across multiple dimensions: the size of (the state space of) the model, the size of the log, the number of distinct event types, and the degree of discrepancy between the model and the log.

2.2. Mining Behavioral Patterns for Conformance Diagnostics

To answer the second research question, we first observe that diagnostics provided by declarative methods are often more understandable to non-expert users. However, declarative models are unsuitable for modeling certain types of processes [13, 14]. Our proposed solution is to combine both paradigms. The approach is sketched in Figure 1. First, (control-flow) declarative constraints are discovered from a user-provided procedural process model (step A), producing a declarative model that is "equivalent" to the original model. Next, this model is minimized to remove redundancies (step B) and used to verify the event log (step C).

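[Figure 1: Overview of the approach. Constraint templates and the procedural model are the inputs to step A (Discover Constraints), which yields a declarative model; step B (Constraint Minimization) reduces it; step C (Verify Log) checks the log against it and produces the diagnostics.]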
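To illustrate the verification step (C), the following minimal sketch checks two control-flow constraints in the style of the DECLARE templates Response and Precedence against a toy log. The constraint set, the activity names, the log, and the helper names are hypothetical; in the framework, the constraints would be discovered from the procedural model in step A rather than written by hand.

```python
def response(a, b):
    """Response(a, b): every occurrence of a is eventually followed by b."""
    def holds(trace):
        pending = False
        for activity in trace:
            if activity == a:
                pending = True
            elif activity == b:
                pending = False
        return not pending
    return f"Response({a}, {b})", holds


def precedence(a, b):
    """Precedence(a, b): b may only occur after a has occurred."""
    def holds(trace):
        seen_a = False
        for activity in trace:
            if activity == a:
                seen_a = True
            elif activity == b and not seen_a:
                return False
        return True
    return f"Precedence({a}, {b})", holds


def verify(log, constraints):
    """Map each constraint name to the indices of the traces violating it."""
    return {
        name: [i for i, trace in enumerate(log) if not holds(trace)]
        for name, holds in constraints
    }


constraints = [
    response("create order", "send invoice"),
    precedence("send invoice", "receive payment"),
]
log = [
    ("create order", "send invoice", "receive payment"),
    ("create order", "receive payment"),
    ("receive payment", "create order", "send invoice"),
]

diagnostics = verify(log, constraints)
for name, violating in diagnostics.items():
    print(name, "violated by traces", violating)
```

Each diagnostic names a violated rule rather than a sequence of insertions and deletions, and the result for a given log and constraint set is unique, which is the sense in which such diagnostics are deterministic.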

Using the discovered constraints to verify the event log ensures that the diagnostics are on a higher level and, thus, more understandable. Furthermore, the approach enjoys good quality properties such as determinism and monotonicity of the generated diagnostics. Our first work [11] introduces the framework above (2.1) and demonstrates how to use it to derive understandable conformance diagnostics. The next steps in this research involve refining each step of the framework. One of them, namely extending the constraint minimization step (B) to arbitrary models (2.4), is already completed [12].

Discover Time and Data Patterns (2.2): A natural extension of the framework above is to consider the time and data perspectives [15]. However, this raises feasibility questions: reasoning tasks involving the data perspective are computationally expensive, and reasoning about time constraints can quickly become undecidable [16, 17].

Discover Arbitrary Patterns (2.3): Currently, the framework is limited to a set of constraint templates (such as the set of DECLARE patterns), which might not best capture a deviation. We would like to discover arbitrary behavioral patterns, i.e., arbitrary logical formulae beyond a fixed set of templates. For that, we must solve two challenges. First, we must devise an algorithm to discover arbitrary logical formulae (step A). Second, we must devise a mechanism to verbalize the discovered formulae to the user (step C). We believe that the latter can be tackled to some extent using NLP techniques such as rule-based methods or large language models. Therefore, we focus our attention on the first problem, for which our current idea is to extend existing methods [18] to work in our setting. In parallel, we are also evaluating ad hoc approaches that work by analyzing the structure of the process models [19].

Evaluation: Our evaluation will be primarily qualitative. To validate that our diagnostics are understandable, we plan to conduct a user study. However, user studies suffer from reproducibility issues. Therefore, we also aim to develop a set of reference process models for public datasets and demonstrate our method on top of them. With that, we aim to develop a "benchmark" for conformance diagnostics.

Acknowledgments

This Ph.D. project is a cooperation between the RWTH Aachen University and Celonis Labs GmbH. It is supervised by Prof. Dr. Ir. Wil van der Aalst and fully funded by Celonis.

[1] A. Rozinat, W. M. P. van der Aalst, Conformance checking of processes based on monitoring real behavior, Inf. Syst. 33 (2008) 64-95. doi:10.1016/J.IS.2007.07.001.

[2] A. Adriansyah, B. F. van Dongen, W. M. P. van der Aalst, Conformance checking using cost-based fitness analysis, in: IEEE 15th International Enterprise Distributed Object Computing Conference, IEEE Computer Society, 2011, pp. 55-64. doi:10.1109/EDOC.2011.12.

[3] J. C. A. M. Buijs, Flexible evolutionary algorithms for mining structured process models, Technische Universiteit Eindhoven 57 (2014).

[4] N. Tax, X. Lu, N. Sidorova, D. Fahland, W. M. P. van der Aalst, The imprecisions of precision measures in process mining, Inf. Proc. Lett. 135 (2018) 1-8. doi:10.1016/J.IPL.2018.01.013.

[5] A. Augusto, A. Armas-Cervantes, R. Conforti, M. Dumas, M. La Rosa, Measuring fitness and precision of automatically discovered process models: A principled and scalable approach, IEEE Trans. Knowl. Data Eng. 34 (2022) 1870-1888. doi:10.1109/TKDE.2020.3003258.

[6] A. Adriansyah, B. F. van Dongen, N. Zannone, Controlling break-the-glass through alignment, in: International Conference on Social Computing, SocialCom 2013, IEEE Computer Society, 2013, pp. 606-611. doi:10.1109/SOCIALCOM.2013.91.

[7] A. S. M. Mehr, R. M. de Carvalho, B. F. van Dongen, Explainable conformance checking: Understanding patterns of anomalous behavior, Engineering Applications of Artificial Intelligence 126 (2023) 106827. doi:10.1016/J.ENGAPPAI.2023.106827.

[8] M. Grohs, H. van der Aa, J. Rehse, Beyond log and model moves in conformance checking: Discovering process-level deviation patterns, in: Business Process Management - 22nd International Conference, BPM 2024, Krakow, Poland, September 1-6, 2024, Proceedings, volume 14940 of Lecture Notes in Computer Science, Springer, 2024, pp. 381-399. doi:10.1007/978-3-031-70396-6_22.

[9] E. G. Rocha, W. M. P. van der Aalst, Polynomial-time conformance checking for process trees, in: Business Process Management - 21st International Conference, BPM 2023, Utrecht, The Netherlands, September 11-15, 2023, Proceedings, volume 14159 of Lecture Notes in Computer Science, Springer, 2023, pp. 109-125. doi:10.1007/978-3-031-41620-0_7.

[10] E. Goulart Rocha, S. J. J. Leemans, W. M. P. van der Aalst, Stochastic conformance checking based on expected sub-trace frequency, in: International Conference on Process Mining, 2024.

[11] E. G. Rocha, S. J. van Zelst, W. M. P. van der Aalst, Mining behavioral patterns for conformance diagnostics, in: Business Process Management - 22nd International Conference, BPM 2024, Krakow, Poland, September 1-6, 2024, Proceedings, volume 14940 of Lecture Notes in Computer Science, Springer, 2024, pp. 291-308. doi:10.1007/978-3-031-70396-6_17.

[12] E. G. Rocha, W. M. P. van der Aalst, Precision-guided minimization of arbitrary declarative process models, in: Enterprise, Business-Process and Information Systems Modeling - 25th International Conference, BPMDS 2024, and 29th International Conference, EMMSAD 2024, volume 511 of LNBIP, Springer, 2024, pp. 48-56. doi:10.1007/978-3-031-61007-3_5.

[13] H. A. Reijers, T. Slaats, C. Stahl, Declarative modeling - an academic dream or the future for BPM?, in: Business Process Management - 11th International Conference, BPM 2013, Beijing, China,