Advancing Decomposed Conformance Checking in Process Mining

Wai Lam Jonathan Lee
Department of Computer Science, School of Engineering, Pontificia Universidad Católica de Chile, Vicuña Mackenna 4860, Macul, Santiago, Chile
walee@uc.cl, ORCID 0000-0002-3531-3520

Abstract
As process mining gains traction as a tool for analyzing and improving processes in industry, as exemplified by companies such as Disco, Celonis, and Minit, many commercial process mining tools are extending beyond process discovery to include conformance checking features. Performing conformance checking in industrial settings means that techniques have to cope with industrial-sized processes and real-life data. In this thesis, we propose three approaches: a divide-and-conquer approach to alignment, conformance checking algorithm selection, and an HMM-based approach to online conformance checking.

Keywords
Conformance checking, Process mining, Alignment-based conformance checking

1. A divide-and-conquer alignment technique

Alignment-based techniques are currently the state of the art for measuring fitness between a given event log and model. However, alignments are computationally expensive: the algorithm has to explore a large number of states to yield an optimal alignment that provides a "best" explanation of the discrepancies between the observed and modeled behavior.

Decomposition techniques have emerged as a promising approach to reduce this computational complexity [1]. Rather than aligning the overall event log and model, decomposition techniques first partition them into a set of sub-models and sub-logs so that the alignment procedure is performed on these smaller sub-components. If a deviation is found at one of the sub-components, then it is clear that there is a deviation in the overall component; otherwise, the overall process and the overall log are perfectly fitting. As such, alignment problems can be decomposed and distributed over a network of computers. Experimental results based on large-scale applications of decomposition techniques have shown significant improvements in performance, especially in computation time.

Following this idea, existing decomposition techniques have tackled the original problem in two different ways. One is the computation of conformance at the level of the decomposed sub-components: instead of solving the overall problem, it focuses on identifying local conformance issues at individual sub-components. The other is the approximation of the overall conformance between the process and the log, for example through pseudo-alignments or approximate alignments. Current decomposition techniques therefore do not address the problem of computing the exact fitness between a given log and model. We investigate and show the required condition for merging optimal decomposed pseudo-alignments into optimal valid alignments. The merged alignments are then shown to have the same costs as the alignments computed under the monolithic approach.

Taking this merging condition, we propose an iterative approach that computes optimal alignments, and thereby fitness, in a divide-and-conquer manner. In a nutshell, this approach checks the total border agreement condition for the decomposed sub-alignments and resolves those that do not meet the condition using a coarser valid decomposition in the following iteration. This creates a full-circle approach for decomposed alignment ('there and back again'); it makes it possible to solve alignment-based conformance problems that current techniques cannot handle, and it can balance between quality and computation time.
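To make the iterative scheme concrete, the following Python sketch shows one possible shape of the loop. It is a schematic illustration under stated assumptions, not the plugin's actual API: decompose, align, border_agreement, merge, and recompose are hypothetical helpers passed in as functions.

def recomposing_alignments(log, model, decompose, align,
                           border_agreement, merge, recompose,
                           max_iter=100):
    """Illustrative sketch of the recomposing alignment loop."""
    subnets = decompose(model)        # initial valid decomposition
    results = {}                      # trace index -> valid optimal alignment
    pending = set(range(len(log)))    # traces still to be resolved
    for _ in range(max_iter):
        # Align each unresolved trace on every sub-net (projecting the
        # trace onto the sub-net is assumed to happen inside align).
        sub_aligns = {t: [align(log[t], sn) for sn in subnets]
                      for t in pending}
        for t, subs in sub_aligns.items():
            # Total border agreement: the sub-alignments agree on all
            # shared border activities, so merging them yields a valid
            # optimal alignment with the same cost as the monolithic one.
            if border_agreement(subs):
                results[t] = merge(subs)
        pending -= set(results)
        if not pending:
            break
        # Otherwise, build a coarser valid decomposition guided by the
        # conflicting border activities and try again.
        subnets = recompose(subnets, sub_aligns, pending)
    return results

Traces that never reach total border agreement within the iteration budget are exactly the expensive cases that the recomposition heuristics described next are designed to target.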
Our experimental results showed that, for the proposed recomposing conformance checking framework, the bottleneck is that some log traces require many iterations to reach the needed merging condition. To tackle this bottleneck, we present various heuristics that can be applied at the recomposition step so that the resulting coarser valid decomposition is more likely to yield valid optimal alignments. In total, we propose three net recomposition strategies and one log recomposition strategy. The net recomposition strategies focus on identifying and resolving the sets of border activities that cause alignments to fail the required merging conditions; border activities are activities that appear in multiple sub-nets of the valid decomposition. The log recomposition strategy raises the requirement for a to-be-aligned log trace to be selected for the next iteration, so that problematic log traces that are unlikely to result in valid alignments are excluded and computation is not wasted on them.

2. Conformance checking algorithm selection

There can be many conformance checking algorithms for the same task. For example, among alignment-based techniques, many target a specific strength, e.g., computation time, a specific data type, or specific conformance issues. This means that end users need to be aware of the advantages and disadvantages of each technique in order to choose the one appropriate for their data and objective. However, end users may not have the expert knowledge to always select the most appropriate algorithm. One solution is an oracle that has been configured to help users with this decision given their objectives. We apply machine learning techniques to learn a classifier that predicts the best alignment algorithm, in terms of whether or not to use decomposition, to minimize computation time.

We tackle the algorithm selection problem by encoding it as a classification task. By applying well-known classifiers, the trained predictive model can select the algorithm that is most likely to achieve the best performance among the set of available algorithms. As a first instance, we tackle the problem of deciding when to apply decomposition-based algorithms to compute exact optimal alignments using A* techniques so that computation time is minimized. We present an analysis of the trained models and identify features that can have a significant impact on the performance of different algorithms. While this tackles a specific problem in the space of alignment computation, the proposed framework extends naturally to broader scopes: more general alignment algorithm selection problems, and larger portfolios that include other alignment-based algorithms, such as planner-based approaches and approaches based on different theories.
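As a concrete illustration of this encoding, the sketch below trains an off-the-shelf scikit-learn classifier to pick an algorithm per instance. The extract_features helper and the label names are assumptions standing in for the simple, linear-time features and the algorithm portfolio described below.

from sklearn.ensemble import RandomForestClassifier

def train_selector(instances, labels, extract_features):
    """Train a model that predicts the fastest alignment algorithm.

    instances: (trace, model, sync_product) tuples
    labels: fastest algorithm per instance, e.g. 'monolithic' or 'decomposed'
    extract_features: hypothetical helper producing a numeric feature
        vector from the trace, the model, and their synchronous product
    """
    X = [extract_features(trace, net, sp) for trace, net, sp in instances]
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, labels)
    return clf

def select_algorithm(clf, trace, net, sp, extract_features):
    # Feature extraction must be cheap, since it runs before every alignment.
    return clf.predict([extract_features(trace, net, sp)])[0]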
We trained several classification algorithms on a set of synthetic data for four selected alignment algorithms. The features used as input for the classification algorithms are extracted from the log trace, the model, and the synchronous product of the trace net and the model. To ensure that the predictive models can practically perform predictions prior to alignment, the features are deliberately simple and can all be extracted in linear time with respect to the size of the synchronous product. The classifiers are then evaluated in terms of their classification performance (precision, recall, and F1-score) as well as the penalized average runtime with a penalty factor of 10 (PAR10), i.e., a timeout counts as ten times the timeout limit. Our results show that none of the classifiers were able to beat the single best (SB) solver, i.e., the algorithm with the best overall performance. Nonetheless, the trained classifiers have performance comparable to the SB solver and clearly outperform a random baseline classifier.

3. An HMM-based approach to online conformance checking

Lastly, we turn to the challenge of conformance checking in an online context. Given the volume and velocity at which event data arrives, organizations may not be able to store these data for offline analysis and have to resort to online techniques. Moreover, performing analysis in real time allows process stakeholders to react to conformance issues as they occur. Most existing conformance checking techniques require the trace of events to correspond to a completed case. This means that these techniques target offline scenarios and do not typically cater for online contexts, where it is desirable to raise alerts as soon as a significant deviation is observed for cases that have not yet reached completion. Moreover, due to the continuous increase in recorded data, it can be infeasible for organizations to store data for offline processing; Walmart, for example, is estimated to collect more than 2.5 petabytes of data every hour from its customer transactions. As such, in recent years, a new set of algorithms has been proposed for online scenarios, in which the input is assumed to be an event stream where each item relates to an observed event of a case. Here, we propose a novel online approach that performs conformance checking on an event stream under memory and time constraints.

There are several existing works on online conformance checking, but there remain areas for improvement. For example, prefix alignments [2] and a similar approach based on enriching a transition system with alignment concepts [3] have difficulties handling warm-start scenarios. Another approach that performs conformance checking on behavioral patterns can lose information due to its abstraction [4].

One fundamental challenge of explaining the conformance of a running case is balancing between making sense at the process level as the case reaches completion and, at the same time, putting emphasis on the current information. Given the trace of a completed case, alignment-based techniques excel at giving a globally optimal conformance solution. However, as a case unfolds in an online scenario, alignment techniques can be slow to arrive at such an eventual explanation, since they always seek a globally optimal one. Moreover, they offer no flexibility for warm-start scenarios. At the other end of the spectrum, focusing only on the current information given by an incoming event of a case can be insufficient for providing a conformance explanation that is coherent at the process level. For example, if we only check whether each new event forms a modeled directly-follows behavioral pattern with the previous event, deviations that only become apparent over a longer horizon go undetected, since each consecutive pair of events can still be locally conforming. As such, it is desirable to have an online framework that yields balanced conformance explanations.

To tackle this problem, we present such a framework based on Hidden Markov Models (HMMs). As new events come in for running cases, the model alternates between localizing the running case within the reference model using the observed event and computing conformance from the estimated position. In contrast to the standard HMM assumption, both the previous state and the previous observation can influence the next state due to non-conformance. This is modeled by conditioning the state transition and observation probabilities on both the previous state and the previous observation. Furthermore, rather than fixing the effects of non-conformance beforehand, an Expectation-Maximization (EM) algorithm is applied to estimate the parameters from past data.
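One filtering step of this scheme can be sketched with NumPy as follows. The array shapes and the uniform fallback for fully unmodeled events are illustrative assumptions rather than the thesis implementation; the essential point is that the transition table is indexed by the previous observation as well as the previous state.

import numpy as np

def forward_update(belief, prev_obs, obs, trans, emit):
    """One online localization step for a running case.

    belief: (S,) current distribution over model states
    trans: (O, S, S) transitions conditioned on (previous observation,
        previous state), unlike the standard HMM's (S, S) table
    emit: (S, O) observation probabilities per state
    """
    predicted = belief @ trans[prev_obs]   # where could the case be now?
    updated = predicted * emit[:, obs]     # how well does obs conform there?
    total = updated.sum()
    if total == 0.0:
        # Event is unmodeled from every plausible state; fall back to an
        # uninformative belief (an illustrative choice, not the thesis').
        return np.full_like(belief, 1.0 / belief.size)
    return updated / total

From the updated belief a conformance judgment can then be derived, for instance by checking whether the observed event is expected at the most likely state; the transition and observation tables themselves are fitted with EM as described above.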
4. Evaluation

Both synthetic and real-life datasets are used to evaluate the proposed techniques. For the recomposition framework, synthetic data was generated using the PTandLogGenerator [5] and the PLG2 tool. Fifteen synthetic nets were generated with various settings so that they include large processes containing combinations of the most common workflow patterns, such as XOR, AND, loops, and invisible transitions. Logs were generated from the models using simulation, and different operations were applied to emulate plausible noise scenarios: no noise, noise by removing events, and noise by swapping events. All generated data are publicly available. Two real-life datasets are used to evaluate the recomposition framework: the BPIC 2012 dataset and the BPIC 2018 dataset. Since no explicit process model is provided with these datasets, models were constructed taking the task semantics into consideration.

As explained previously, two publicly available synthetic datasets were used for the algorithm selection task. After verifying, cleaning, and formatting the generated data, the final dataset records performance data for over 800,000 alignment computations on 140,000 model-trace pairs generated from 20 models.

Lastly, for the HMM-based online conformance checking framework, a publicly available synthetic dataset was used to perform a stress test and a correlation analysis with existing online techniques. The framework was then evaluated on a real-life dataset of a hospital billing process.

5. Tools

All of the proposed techniques have working implementations. The recomposing conformance checking framework is implemented as the Replay with Recomposition plugin in the DecomposedReplayer package of ProM 6.9. ProM is an extensible framework that supports a wide variety of process mining techniques in the form of plug-ins; it is platform independent, as it is implemented in Java, and can be downloaded free of charge. In the plugin, users can configure the parameters of the framework. Both the algorithm selection framework and the HMM-based online framework are implemented in Python and are available on GitHub so that results can be reproduced.
Acknowledgments

This work is supported by CONICYT-PCHA / Doctorado Nacional / 2017-21170612.

References

[1] W. M. P. van der Aalst, Decomposing Petri nets for process mining: A generic approach, Distributed and Parallel Databases 31 (2013) 471-507.
[2] S. J. van Zelst, A. Bolt, M. Hassani, B. F. van Dongen, W. M. P. van der Aalst, Online conformance checking: relating event streams to process models using prefix alignments, International Journal of Data Science and Analytics (2017).
[3] A. Burattin, J. Carmona, A framework for online conformance checking, Business Process Management Workshops - BPM 2017 International Workshops (2017) 165-177.
[4] A. Burattin, S. J. van Zelst, A. Armas-Cervantes, B. F. van Dongen, J. Carmona, Online conformance checking using behavioural patterns, Business Process Management - 16th International Conference (2018) 250-267.
[5] T. Jouck, B. Depaire, PTandLogGenerator: A generator for artificial event data, BPM (Demos) 1789 (2016) 23-27.