Jan Mendling and Stefanie Rinderle-Ma, eds.: Proceedings of EMISA 2016,
                                                             Gesellschaft für Informatik, Bonn 2016

In Log and Model We Trust? (extended abstract)

Andreas Rogge-Solti1, Arik Senderovich2, Matthias Weidlich3, Jan Mendling4, and Avigdor
Gal5


Abstract: While models and event logs are readily available in modern organizations, their quality
can seldom be trusted. Raw event recordings are often noisy and incomplete, and contain erroneous
entries. The quality of process models, both conceptual and data-driven, depends heavily on the
inputs and parameters that shape these models, such as the domain expertise of the modelers and the
quality of execution data. These quality issues are a particular challenge for conformance
checking, the process mining task that aims at coping with low model or log quality by comparing
the model against the corresponding log, or vice versa. The prevalent assumption in the literature
is that at least one of the two can be fully trusted. In this work, we propose a generalized
conformance checking framework that caters for the common case in which one fully trusts neither
the log nor the model. Our experiments show that the proposed framework balances the trust in
model and log as a generalization of state-of-the-art conformance checking techniques.

Keywords: process mining, conformance checking, model repair, log repair


1    Introduction

Business process management plays an important role in modern organizations that aim
at improving the effectiveness and efficiency of their processes. To assist in reaching this
goal, the research area of process mining offers a multitude of techniques to analyze event
logs that carry data from business processes. Process mining investigates the interplay
among reality (system), its reported observations (event log), and a corresponding process
model [BvDvdA14]. While reality is typically unknown, we are left with the need to reconcile
the event log and the process model, where evidence of a certain behavior may be present in
only one of the two.
Current conformance checking techniques are not capable of defining levels of trust for
model and log to cater for uncertainty. Therefore, we consider the problem of optimally
reconciling an event log with a process model, given an input event log and a model (if
such exist) and our degree of trust in each. In this extended abstract we present the results
of our work in [Ro16], where we outline that various process mining tasks can actually
be regarded as special cases of this generic problem formulation. The problem formulation
goes beyond locating misalignments between a process model and an event log by providing
explanations of misalignments and categorizing them as one of a) anomalies in an event log,
b) modeling errors, and c) unresolvable inconsistencies. This generalized conformance
checking problem can be seen as the unification of conformance checking, model repair, and
anomaly detection.

[Figure 1 relates the input log L, the mined model M(L) obtained via mine(), the input model M,
the log L(M) obtained via simulate(), and the repaired pair L*, M*.]

                Figure 1: Conceptual sketch of the problem setting. [Ro16]

1 Wirtschaftsuniversität Wien, Vienna, Austria, firstname.lastname@wu.ac.at
2 Technion–Israel Institute of Technology, Haifa, Israel, sariks@tx.technion.ac.il
3 Humboldt University of Berlin, Germany, firstname.lastname@hu-berlin.de
4 Wirtschaftsuniversität Wien, Vienna, Austria, firstname.lastname@wu.ac.at
5 Technion–Israel Institute of Technology, Haifa, Israel, avigal@ie.technion.ac.il


2   The Generalized Conformance Checking Problem

We consider the setting in which, next to the given input event log L and input process model
M, we also have a trust level πM for the model and a trust level πL for the log. These
reflect, for instance, the trust in the event recording mechanism or the trust in the abilities
of the modeler. In this setting, we are interested in finding the pair of (repaired) log L∗
and (repaired) model M∗ that best fits the input log L and model M, and whose members also fit
each other best according to a distance measure (e.g., replay fitness).
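
One way to read this problem statement is as a weighted optimization. The following is only an
illustrative sketch and not the formalization given in [Ro16]: the distance functions d_L, d_M,
and d_LM, as well as the use of the trust levels as plain weights, are assumptions made here
for illustration.

```latex
% Illustrative only: d_L, d_M, d_{LM} and the weighting scheme are assumptions.
(L^{*}, M^{*}) \in \operatorname*{arg\,min}_{(L', M')}
  \; \pi_L \, d_L(L, L') \;+\; \pi_M \, d_M(M, M') \;+\; d_{LM}(L', M')
```

Under this reading, a higher trust level makes it more costly to move away from the
corresponding input, while the last term rewards a repaired log and model that fit each other.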

To solve this problem, we propose a two-step divide-and-conquer approach. The main idea
of this approach is to avoid the inherent complexity induced by the freedom to change the
model or the log by sequentialization: first identifying changes in the model, before turning
to changes applied to the log.

Our approach is outlined in Figure 1. We lift the problem into the model space by mining
a model M(L) from the log, that is, by representing the event log by its discovered
counterpart. Then, we approximate M∗ by applying a greedy heuristic search in the space
between the input model M and the mined model M(L). If we strongly trust our input model, we
respect that by not allowing the search to move too far away from the input model. If we do
not trust our input model at all, we would end up with the mined model M(L).
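
The search between M and M(L) can be illustrated with a minimal sketch. Everything in it is a
simplifying assumption rather than the method of [Ro16]: a model is reduced to a set of allowed
directly-follows pairs, mine() is a toy discovery function, fitness() is a toy replay measure,
and the trust level is turned into an edit budget proportional to the distance between M and
M(L).

```python
from typing import FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]
Model = FrozenSet[Edge]   # hypothetical model: the set of allowed directly-follows pairs
Trace = Sequence[str]


def mine(log: List[Trace]) -> Model:
    """Toy stand-in for process discovery: collect all observed directly-follows pairs."""
    return frozenset((a, b) for trace in log for a, b in zip(trace, trace[1:]))


def fitness(model: Model, log: List[Trace]) -> float:
    """Share of directly-follows steps in the log that the model allows."""
    steps = [(a, b) for trace in log for a, b in zip(trace, trace[1:])]
    if not steps:
        return 1.0
    return sum(step in model for step in steps) / len(steps)


def approximate_repaired_model(model: Model, log: List[Trace], trust_model: float) -> Model:
    """Greedily move from the input model towards the mined model.

    The edit budget shrinks as trust in the input model grows: trust 1 keeps
    the input model, trust 0 spends enough edits to reach the mined model.
    """
    if not 0.0 <= trust_model <= 1.0:
        raise ValueError("trust_model must lie in [0, 1]")
    mined = mine(log)
    budget = round((1.0 - trust_model) * len(model ^ mined))
    current = model
    for _ in range(budget):
        # Each candidate toggles one edge on which current and mined still disagree.
        candidates = [current ^ {edge} for edge in sorted(current ^ mined)]
        # Greedy choice: take the single edit that yields the best fitness.
        current = max(candidates, key=lambda m: fitness(m, log))
    return current
```

Calling approximate_repaired_model(model, log, 1.0) returns the input model unchanged, and a
trust level of 0.0 returns mine(log). Intermediate trust levels apply only the most beneficial
edits first, which is one simple way to realize "not moving too far away" from the input model.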

After approximating the optimal model M∗, we align the input log L to it using techniques
like [vdAAvD12], and see which deviations remain. These misalignments then need to be
classified into errors in the log and remaining non-conformance between event log and
model. Here, we use the trust in the log to determine the share that will be corrected
in the log. For example, if we trust our log to be entirely correct (trust level 1), we do not
repair the log at all. If our trust in the input log is lower (because it is based on noisy
sensors, for instance), then we repair a corresponding share of the misalignments in the event
log. Table 1 characterizes different areas of process mining and business intelligence that
this generalized framework covers for different trust levels.

 Process mining task                                                    Log Trust     Model Trust

 Classical Process Discovery finds a model that best fits the entire    πL = 1        πM = 0
 event log, e.g., the alpha algorithm [vdAWM04].

 Heuristic Process Discovery algorithms apply preprocessing to the      0 < πL < 1    πM = 0
 event log by discarding infrequent patterns [GvdA07, WvdADM06].

 Model Repair fixes deficient models due to, e.g., a change in the      πL = 1        0 < πM < 1
 system that is reflected in the log. For example, [FvdA15].

 Conformance Checking. This task tries to find misalignments between    πL = 1        πM = 1
 event log and model. Example works include [RvdA08, vdAAvD12, Se16].

 Log Repair. Given a trusted model and a noisy log, we modify the log   0 < πL < 1    πM = 1
 until it conforms to the model [Ro13, RSK14, Wa15].

 “Happy Path” Simulation is complementary to heuristic process          πL = 0        0 < πM < 1
 discovery. It is a theoretical use case where we do not trust
 infrequent parts of the model [MSS15].

 Process Simulation is complementary to process discovery, where we     πL = 0        πM = 1
 are given an untrustworthy empty log and a fully trustworthy model.

 Garbage In, Garbage Out. When both the model and the log are           πL = 0        πM = 0
 untrustworthy, the best log and model tuple that fits them is any
 pair of model and log that fits each other, including an empty log
 and an empty model.

 Generalized Conformance Checking is the focus of this paper.           0 < πL        0 < πM
 Instead of only detecting the misalignments, as in conformance
 checking, we also indicate where the model would best be adapted
 and where the log would best be adapted for a better overall fit.


          Table 1: Some process mining tasks cast as problem instances. [Ro16]
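
The trust-dependent log repair step can be sketched as follows. This is a minimal illustration
under stated assumptions, not the procedure of [Ro16]: the Misalignment record, its support
field, and the rule of repairing the least supported deviations first are all hypothetical.
Only the idea that the repaired share grows as trust in the log decreases is taken from the
text.

```python
from typing import List, NamedTuple, Tuple


class Misalignment(NamedTuple):
    """Hypothetical record of one deviation left after aligning the log to M*."""
    trace_id: str
    activity: str
    kind: str      # e.g. "log_move" or "model_move"
    support: int   # assumed evidence count; rare deviations are repaired first


def split_misalignments(
    misalignments: List[Misalignment], trust_log: float
) -> Tuple[List[Misalignment], List[Misalignment]]:
    """Split deviations into those repaired in the log and those left as non-conformance.

    The repaired share is 1 - trust_log: with trust 1 nothing is repaired,
    with trust 0 every deviation is treated as a log error.
    """
    if not 0.0 <= trust_log <= 1.0:
        raise ValueError("trust_log must lie in [0, 1]")
    n_repair = round((1.0 - trust_log) * len(misalignments))
    # Assumption: weakly supported deviations are the likeliest recording errors.
    ranked = sorted(misalignments, key=lambda m: m.support)
    return ranked[:n_repair], ranked[n_repair:]
```

The first returned list holds the deviations to be corrected in the log, and the second holds
the non-conformance that remains between the repaired log and the model. Any other ranking
criterion could be substituted for the support count without changing the role of the trust
level.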


3   Conclusion
In this work, we presented a generalization of the conformance checking problem. It strives
for a balance between two independent input parameters: the trust in the log quality and
the trust in the model quality. Specifically, when presented with an event log and a process
model, generalized conformance checking attempts to repair both according to the initial
trust levels, and returns an improved log-model pair. Generalized conformance checking is
comparable to state-of-the-art model repair techniques in terms of model quality measures.
The full formalisation, evaluation results, and further details can be found in the
original conference paper [Ro16].


References
[BvDvdA14]     Buijs, Joos C. A. M.; van Dongen, Boudewijn F.; van der Aalst, Wil M. P.: Quality
               dimensions in process discovery: The importance of fitness, precision,
               generalization and simplicity. International Journal of Cooperative Information
               Systems, 23(01):1440001, 2014.
[FvdA15]       Fahland, Dirk; van der Aalst, Wil M. P.: Model repair – aligning process models to
               reality. Information Systems, 47:220–243, 2015.
[GvdA07]       Günther, Christian W.; van der Aalst, Wil M. P.: Fuzzy mining – adaptive process
               simplification based on multi-perspective metrics. In: Business Process
               Management, pp. 328–343. Springer, 2007.
[MSS15]        Marquard, Morten; Shahzad, Muhammad; Slaats, Tijs: Web-based modelling and
               collaborative simulation of declarative processes. In: Business Process
               Management, pp. 209–225. Springer, 2015.
[Ro13]         Rogge-Solti, Andreas; Mans, Ronny S.; van der Aalst, Wil M. P.; Weske, Mathias:
               Improving documentation by repairing event logs. In: The Practice of Enterprise
               Modeling, pp. 129–144. Springer, 2013.
[Ro16]         Rogge-Solti, Andreas; Senderovich, Arik; Weidlich, Matthias; Mendling, Jan; Gal,
               Avigdor: In Log and Model We Trust? - A Generalized Conformance Checking
               Framework. In: Business Process Management (BPM’16). 2016. (to appear).
[RSK14]        Rogge-Solti, Andreas; Kasneci, Gjergji: Temporal anomaly detection in business
               processes. In: Business Process Management, pp. 234–249. Springer, 2014.
[RvdA08]       Rozinat, Anne; van der Aalst, Wil M. P.: Conformance checking of processes based
               on monitoring real behavior. Information Systems, 33(1):64–95, 2008.
[Se16]         Senderovich, Arik; Weidlich, Matthias; Yedidsion, Liron; Gal, Avigdor;
               Mandelbaum, Avishai; Kadish, Sarah; Bunnell, Craig A.: Conformance checking and
               performance improvement in scheduled processes: A queueing-network perspective.
               Information Systems, Forthcoming, 2016.
[vdAAvD12]     van der Aalst, Wil M. P.; Adriansyah, Arya; van Dongen, Boudewijn F.: Replaying
               History on Process Models for Conformance Checking and Performance Analysis.
               WIREs: Data Mining and Knowledge Discovery, 2:182–192, 2012.
[vdAWM04]      van der Aalst, Wil M. P.; Weijters, Ton; Maruster, Laura: Workflow mining:
               Discovering process models from event logs. Knowledge and Data Engineering, IEEE
               Transactions on, 16(9):1128–1142, 2004.
[Wa15]         Wang, Jianmin; Song, Shaoxu; Lin, Xuemin; Zhu, Xiaochen; Pei, Jian: Cleaning
               structured event logs: A graph repair approach. In: Data Engineering (ICDE’15).
               IEEE, pp. 30–41, 2015.
[WvdADM06]     Weijters, A. J. M. M.; van der Aalst, Wil M. P.; De Medeiros, A. K. Alves: Process
               mining with the heuristics miner-algorithm. Technical Report 166, 2006.