<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>In Log and Model We Trust? (extended abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andreas Rogge-Solti</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arik Senderovich</string-name>
          <email>sariks@tx.technion.ac.il</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Weidlich</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Mendling</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Avigdor Gal</string-name>
          <email>avigal@ie.technion.ac.il</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Humboldt University of Berlin</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technion-Israel Institute of Technology</institution>
          ,
          <addr-line>Haifa</addr-line>
          ,
          <country country="IL">Israel</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Technion-Israel Institute of Technology</institution>
          ,
          <addr-line>Haifa</addr-line>
          ,
          <country country="IL">Israel</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Wirtschaftsuniversität Wien</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Wirtschaftsuniversität Wien</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <abstract>
        <p>While models and event logs are readily available in modern organizations, their quality can seldom be trusted. Raw event recordings are often noisy and incomplete, and they contain erroneous entries. The quality of process models, both conceptual and data-driven, heavily depends on the inputs and parameters that shape them, such as the domain expertise of the modelers and the quality of execution data. These quality issues are a particular challenge for conformance checking, the process mining task that aims at coping with low model or log quality by comparing the model against the corresponding log, or vice versa. The prevalent assumption in the literature is that at least one of the two can be fully trusted. In this work, we propose a generalized conformance checking framework that caters for the common case in which one fully trusts neither the log nor the model. Our experiments show that the proposed framework, as a generalization of state-of-the-art conformance checking techniques, balances the trust in model and log.</p>
      </abstract>
      <kwd-group>
        <kwd>process mining</kwd>
        <kwd>conformance checking</kwd>
        <kwd>model repair</kwd>
        <kwd>log repair</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Business process management plays an important role in modern organizations that aim
at improving the effectiveness and efficiency of their processes. To help reach this
goal, the research area of process mining offers a multitude of techniques to analyze event
logs that carry data from business processes. Process mining investigates the interplay
among reality (the system), its reported observations (the event log), and a corresponding
process model [BvDvdA14]. Since reality is typically unknown, we are left with the need to
reconcile the event log and the process model, where evidence of a certain behavior may be
present in only one of them.</p>
      <p>Current conformance checking techniques are not capable of defining levels of trust for
model and log and therefore cannot cater for uncertainty. We thus consider the problem of optimally
reconciling an event log with a process model, given an input event log and a model (if
such exist) and our degree of trust in each. In this extended abstract, we present the results
of our work in [Ro16], where we outline how various process mining tasks can be regarded
as special cases of this generic problem formulation. The problem formulation goes beyond
locating misalignments between a process model and an event log by providing explanations
of misalignments and categorizing them as a) anomalies in the event log, b) modeling errors,
or c) unresolvable inconsistencies. This generalized conformance checking problem can be
seen as the unification of conformance checking, model repair, and anomaly detection.</p>
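      <p>To illustrate the three categories, the following minimal sketch explains the misalignments of an input log once a repaired log and a repaired model are available. It is not taken from [Ro16]: it assumes, for illustration only, that a model is a finite set of accepted traces and a log is a <monospace>collections.Counter</monospace> of traces, and the function <monospace>explain</monospace> and its decision rule are hypothetical.</p>
      <preformat preformat-type="code">
```python
from collections import Counter
from enum import Enum


class Category(Enum):
    LOG_ANOMALY = "anomaly in the event log"
    MODELING_ERROR = "modeling error"
    UNRESOLVABLE = "unresolvable inconsistency"


def explain(log, repaired_log, model, repaired_model):
    """Count the misaligned trace occurrences of the input log per category.

    log, repaired_log: Counters mapping traces (tuples of activities) to frequencies.
    model, repaired_model: finite sets of accepted traces (a toy notion of a model).
    """
    counts = Counter()
    for trace, n in log.items():
        if trace in model:
            continue                                  # fits the input model: no misalignment
        if trace in repaired_model:
            counts[Category.MODELING_ERROR] += n      # resolved by changing the model
            continue
        fixed = max(0, n - repaired_log[trace])      # occurrences removed by the log repair
        counts[Category.LOG_ANOMALY] += fixed
        counts[Category.UNRESOLVABLE] += n - fixed   # deviations neither repair resolved
    return counts
```
</preformat>
      <p>Checking the model first mirrors the order of the two repair steps; a different rule could attribute a deviation to both repairs at once.</p>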
    </sec>
    <sec id="sec-gcc">
      <title>The Generalized Conformance Checking Problem</title>
      <p>We consider a setting in which, in addition to the input event log L and the input process
model M, we are given a trust level p<sub>M</sub> for the model and a trust level p<sub>L</sub> for the log,
each ranging from 0 (no trust) to 1 (full trust). These levels reflect, for instance, the trust in the
event recording mechanism or in the abilities of the modeler. In this setting, we are interested in
finding the optimal pair of a repaired log L<sup>*</sup> and a repaired model M<sup>*</sup> that best fit the
input log L and the input model M, respectively, and also fit each other best according to a
distance measure (e.g., replay fitness).</p>
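      <p>The problem statement can be made concrete with a small brute-force sketch. The cost function below is a hypothetical stand-in, not the formalisation of [Ro16]: logs are Counters of traces, models are finite sets of accepted traces, a trust level in [0, 1] scales the penalty for moving away from the respective input (free at 0, forbidden at 1), and the number of log traces the model does not accept serves as a crude fitness term. The names <monospace>penalty</monospace> and <monospace>best_pair</monospace> are illustrative.</p>
      <preformat preformat-type="code">
```python
import math
from collections import Counter  # logs are Counters mapping traces to frequencies


def log_distance(log_a, log_b):
    """Number of trace occurrences by which two logs differ."""
    return sum(((log_a - log_b) + (log_b - log_a)).values())


def model_distance(model_a, model_b):
    """Size of the symmetric difference of two finite model languages."""
    return len(model_a ^ model_b)


def misfit(log, model):
    """Number of trace occurrences in the log that the model does not accept."""
    return sum(n for trace, n in log.items() if trace not in model)


def penalty(trust, distance):
    """Hypothetical cost of moving `distance` away from an input with the given trust:
    free at trust 0, forbidden at trust 1, weighted by trust / (1 - trust) in between."""
    if distance == 0:
        return 0.0          # unchanged input costs nothing (also avoids inf * 0)
    if trust >= 1.0:
        return math.inf     # a fully trusted input must not be changed
    return trust / (1.0 - trust) * distance


def best_pair(log, model, candidate_logs, candidate_models, p_log, p_model):
    """Brute-force the (repaired log, repaired model) pair with minimal total cost."""
    def cost(pair):
        cand_log, cand_model = pair
        return (penalty(p_log, log_distance(log, cand_log))
                + penalty(p_model, model_distance(model, cand_model))
                + misfit(cand_log, cand_model))   # crude stand-in for replay fitness

    pairs = [(cl, cm) for cl in candidate_logs for cm in candidate_models]
    return min(pairs, key=cost)
```
</preformat>
      <p>With full log trust and no model trust, the sketch keeps the log and extends the model, behaving like process discovery; with the trust levels swapped, it repairs the log instead, behaving like log repair.</p>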
      <p>To solve this problem, we propose a two-step divide-and-conquer approach. Its main idea
is to avoid the inherent complexity induced by the freedom to change both the model and the log
through sequentialization: we first identify changes to the model before turning to changes
to the log.</p>
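      <p>The second step of this sequentialization, turning to the log once the model has been fixed, could look roughly as follows. This is a minimal sketch under stated assumptions: the repaired model is given as a finite set of accepted traces, a share (1 - p<sub>L</sub>) of the deviating trace occurrences is repaired, and each repaired occurrence is replaced by its closest model trace under Levenshtein edit distance, a simplified substitute for the alignment-based techniques used in the approach. The functions <monospace>edit_distance</monospace> and <monospace>repair_log</monospace> are hypothetical.</p>
      <preformat preformat-type="code">
```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two traces (tuples of activity labels)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def repair_log(log, model, p_log):
    """Repair a share (1 - p_log) of the deviating trace occurrences of `log`.

    log: Counter of traces; model: finite set of accepted traces (toy repaired model).
    Each repaired occurrence is replaced by its closest model trace.
    """
    deviating = sorted(
        (t for t, n in log.items() if t not in model for _ in range(n)),
        key=lambda t: min(edit_distance(t, m) for m in model))  # cheapest fixes first
    budget = round((1.0 - p_log) * len(deviating))
    repaired = Counter(log)
    for trace in deviating[:budget]:
        target = min(sorted(model), key=lambda m: edit_distance(trace, m))
        repaired[trace] -= 1
        repaired[target] += 1
    return +repaired  # unary plus drops traces whose count fell to zero
```
</preformat>
      <p>Repairing the cheapest deviations first is a design choice of this sketch only; [Ro16] may select and order repairs differently.</p>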
      <p>Our approach is outlined in Figure 1. We lift the problem into the model space by
representing the event log L by its discovered counterpart, a model M(L) mined from it. Then, we
approximate M<sup>*</sup> by a greedy heuristic search in the space between the input model
M and the mined model M(L). If we strongly trust our input model, we respect this
by not allowing the search to move far away from it. If we do not trust our input
model at all, we end up with the mined model M(L).</p>
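      <p>A greedy search of this kind can be sketched in a few lines. The sketch is illustrative only and makes assumptions not found in the text: models are sets of directly-follows edges, M(L) is obtained by collecting all directly-follows pairs of the log, the search is guided by a hypothetical score (edge-level fitness plus precision), and the model trust p<sub>M</sub> caps the number of edge changes at round((1 - p<sub>M</sub>) times the number of edges on which M and M(L) disagree).</p>
      <preformat preformat-type="code">
```python
def df_pairs(trace):
    """Directly-follows pairs of a trace, padded with artificial START/END markers."""
    padded = ("START",) + tuple(trace) + ("END",)
    return list(zip(padded, padded[1:]))


def mine(log):
    """Toy discovery: the mined model M(L) is the set of all directly-follows pairs in the log."""
    return frozenset(pair for trace in log for pair in df_pairs(trace))


def score(log, model, mined):
    """Hypothetical quality score: share of observed pair occurrences the model
    allows (fitness) plus share of model edges observed in the log (precision)."""
    observed = [pair for trace in log for pair in df_pairs(trace)]
    fitness = sum(pair in model for pair in observed) / max(len(observed), 1)
    precision = len(model.intersection(mined)) / max(len(model), 1)
    return fitness + precision


def greedy_search(log, model, p_model):
    """Greedily move from the input model M towards the mined model M(L).

    At most round((1 - p_model) * |M xor M(L)|) edges are toggled: full trust
    (p_model = 1) keeps M, no trust (p_model = 0) allows reaching M(L).
    """
    mined = mine(log)
    candidates = set(model ^ mined)          # edges on which M and M(L) disagree
    budget = round((1.0 - p_model) * len(candidates))
    current = frozenset(model)
    for _ in range(budget):
        best_edge, best_score = None, score(log, current, mined)
        for edge in sorted(candidates):      # sorted for deterministic tie-breaking
            candidate_score = score(log, current ^ {edge}, mined)  # add or remove edge
            if candidate_score > best_score:
                best_edge, best_score = edge, candidate_score
        if best_edge is None:
            break                            # no single change improves the score
        current = current ^ {best_edge}
        candidates.discard(best_edge)
    return current
```
</preformat>
      <p>In this sketch, <monospace>p_model = 1</monospace> returns the input model unchanged, while <monospace>p_model = 0</monospace> permits toggling every disagreeing edge, so the search can end at M(L), matching the two extremes described above.</p>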
      <p>After approximating the optimal model M<sup>*</sup>, we align the input log L to it using techniques
such as [vdAAvD12] and determine which deviations remain. These misalignments then need to be
classified into errors in the log and remaining non-conformance between event log and
model. Here, we use the trust in the log to determine the share of misalignments that will be corrected
in the log. For example, if we trust our log to be entirely correct (trust level 1), we do not
repair the log at all. If our trust in the input log is lower (for instance, because it is based on
noisy sensors), then we repair a corresponding share of the misalignments in the event log.</p>
      <table-wrap id="tab1">
        <table>
          <thead>
            <tr>
              <th>Log trust</th>
              <th>Model trust</th>
              <th>Process mining task</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>p<sub>L</sub> = 1</td>
              <td>p<sub>M</sub> = 0</td>
              <td>Classical Process Discovery finds a model that best fits the entire event log, e.g., the alpha algorithm [vdAWM04].</td>
            </tr>
            <tr>
              <td>0 &lt; p<sub>L</sub> &lt; 1</td>
              <td>p<sub>M</sub> = 0</td>
              <td>Heuristic Process Discovery algorithms preprocess the event log by discarding infrequent patterns [GvdA07, WvdADM06].</td>
            </tr>
            <tr>
              <td>p<sub>L</sub> = 1</td>
              <td>0 &lt; p<sub>M</sub> &lt; 1</td>
              <td>Model Repair fixes models that have become deficient, e.g., due to a change in the system that is reflected in the log [FvdA15].</td>
            </tr>
            <tr>
              <td>p<sub>L</sub> = 1</td>
              <td>p<sub>M</sub> = 1</td>
              <td>Conformance Checking finds misalignments between event log and model; example works include [RvdA08, vdAAvD12, Se16].</td>
            </tr>
            <tr>
              <td>0 &lt; p<sub>L</sub> &lt; 1</td>
              <td>p<sub>M</sub> = 1</td>
              <td>Log Repair. Given a trusted model and a noisy log, we modify the log until it conforms to the model [Ro13, RSK14, Wa15].</td>
            </tr>
            <tr>
              <td>p<sub>L</sub> = 0</td>
              <td>0 &lt; p<sub>M</sub> &lt; 1</td>
              <td>“Happy Path” Simulation is complementary to heuristic process discovery. It is a theoretical use case in which we do not trust infrequent parts of the model [MSS15].</td>
            </tr>
            <tr>
              <td>p<sub>L</sub> = 0</td>
              <td>p<sub>M</sub> = 1</td>
              <td>Process Simulation is complementary to process discovery: we are given an untrustworthy empty log and a fully trustworthy model.</td>
            </tr>
            <tr>
              <td>p<sub>L</sub> = 0</td>
              <td>p<sub>M</sub> = 0</td>
              <td>Garbage In, Garbage Out. When both the model and the log are untrustworthy, any pair of a model and a log that fit each other, including an empty log and an empty model, is a best-fitting tuple.</td>
            </tr>
            <tr>
              <td>0 &lt; p<sub>L</sub></td>
              <td>0 &lt; p<sub>M</sub></td>
              <td>Generalized Conformance Checking is the focus of this paper. Instead of only detecting misalignments, as in conformance checking, we also indicate where the model and where the log would best be adapted for a better overall fit.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>
In this work, we presented a generalization of the conformance checking problem. It strives
for a balance between two independent input parameters: the trust in the log quality and
the trust in the model quality. Specifically, when presented with an event log and a
process model, generalized conformance checking attempts to repair both according to
the initial trust levels and returns an improved log-model pair. In terms of model quality
measures, generalized conformance checking is comparable to state-of-the-art model repair
techniques. The full formalisation, evaluation results, and further details can be found in the
original conference paper [Ro16].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>[BvDvdA14]</label>
        <mixed-citation>Buijs, Joos C. A. M.; van Dongen, Boudewijn F.; van der Aalst, Wil M. P.: Quality dimensions in process discovery: The importance of fitness, precision, generalization and simplicity. International Journal of Cooperative Information Systems, 23(01):1440001, 2014.</mixed-citation>
      </ref>
      <ref id="ref2">
        <label>[FvdA15]</label>
        <mixed-citation>Fahland, Dirk; van der Aalst, Wil M. P.: Model repair – aligning process models to reality. Information Systems, 47:220–243, 2015.</mixed-citation>
      </ref>
      <ref id="ref3">
        <label>[GvdA07]</label>
        <mixed-citation>Günther, Christian W.; van der Aalst, Wil M. P.: Fuzzy mining – adaptive process simplification based on multi-perspective metrics. In: Business Process Management, pp. 328–343. Springer, 2007.</mixed-citation>
      </ref>
      <ref id="ref4">
        <label>[MSS15]</label>
        <mixed-citation>Marquard, Morten; Shahzad, Muhammad; Slaats, Tijs: Web-based modelling and collaborative simulation of declarative processes. In: Business Process Management, pp. 209–225. Springer, 2015.</mixed-citation>
      </ref>
      <ref id="ref5">
        <label>[Ro13]</label>
        <mixed-citation>Rogge-Solti, Andreas; Mans, Ronny S.; van der Aalst, Wil M. P.; Weske, Mathias: Improving documentation by repairing event logs. In: The Practice of Enterprise Modeling, pp. 129–144. Springer, 2013.</mixed-citation>
      </ref>
      <ref id="ref6">
        <label>[Ro16]</label>
        <mixed-citation>Rogge-Solti, Andreas; Senderovich, Arik; Weidlich, Matthias; Mendling, Jan; Gal, Avigdor: In Log and Model We Trust? - A Generalized Conformance Checking Framework. In: Business Process Management (BPM’16). 2016. (to appear).</mixed-citation>
      </ref>
      <ref id="ref7">
        <label>[RSK14]</label>
        <mixed-citation>Rogge-Solti, Andreas; Kasneci, Gjergji: Temporal anomaly detection in business processes. In: Business Process Management, pp. 234–249. Springer, 2014.</mixed-citation>
      </ref>
      <ref id="ref8">
        <label>[RvdA08]</label>
        <mixed-citation>Rozinat, Anne; van der Aalst, Wil M. P.: Conformance checking of processes based on monitoring real behavior. Information Systems, 33(1):64–95, 2008.</mixed-citation>
      </ref>
      <ref id="ref9">
        <label>[vdAWM04]</label>
        <mixed-citation>van der Aalst, Wil M. P.; Weijters, Ton; Maruster, Laura: Workflow mining: Discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering, 16(9):1128–1142, 2004.</mixed-citation>
      </ref>
      <ref id="ref10">
        <label>[Wa15]</label>
        <mixed-citation>Wang, Jianmin; Song, Shaoxu; Lin, Xuemin; Zhu, Xiaochen; Pei, Jian: Cleaning structured event logs: A graph repair approach. In: Data Engineering (ICDE’15). IEEE, pp. 30–41, 2015.</mixed-citation>
      </ref>
      <ref id="ref11">
        <label>[WvdADM06]</label>
        <mixed-citation>Weijters, A. J. M. M.; van der Aalst, Wil M. P.; Alves De Medeiros, A. K.: Process mining with the heuristics miner-algorithm. Technical Report 166, 2006.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>