<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Replay using Recomposition: Alignment-Based Conformance Checking in the Large</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wai Lam Jonathan Lee</string-name>
          <email>walee@uc.cl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>H.M.W. Verbeek</string-name>
          <email>h.m.w.verbeek@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jorge Munoz-Gama</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wil M.P. van der Aalst</string-name>
          <email>w.m.p.v.d.aalst@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcos Sepu´lveda</string-name>
          <email>marcos@ing.puc.cl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Architecture of Information Systems Group, Department of Mathematics and Computer Science, Eindhoven University of Technology</institution>
          ,,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, School of Engineering Pontificia Universidad Cato ́lica de Chile</institution>
          ,,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the area of process mining, ecient alignment-based conformance checking is a hot topic. Existing approaches for conformance checking are typically monolithic and compute exact fitness values. One limitation with monolithic approaches is that it may take a significant amount of computation time in large processes. Alternatively, decomposition approaches run much faster but do not always compute an exact fitness value. This paper presents the tool Replay using Recomposition which returns the exact fitness value and the resulting alignments using the decomposition approach in an iterative manner. Other than computing the exact fitness value, users can configure the balance between result accuracy and computation time to get a fitness interval within set constraints, e.g., “Give me the best fitness estimation you can find within 5 minutes”.</p>
      </abstract>
      <kwd-group>
        <kwd>process mining</kwd>
        <kwd>conformance checking</kwd>
        <kwd>business process management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Recomposing Conformance for Large Processes</title>
      <p>
        In conformance checking of process mining, alignment-based approaches have
become the state-of-the-art technique due to their robustness and detailed view on
deviations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Unlike previous approaches, alignment-based conformance
checking provides an optimal analysis and can pinpoint deviations between the
observed behavior and the modeled behavior at the level of events, e.g., the
identification of an executed activity in the process that is not permitted by the
process model. However, with the growth in the volume of data and the size of
event logs, there is a need to make alignment-based conformance checking more
scalable. Case studies have shown that alignment-based conformance checking
can take excessive time or even be unfeasible in large and complex processes.
Furthermore, no result is provided if the technique is not completely feasible for
the entire dataset.
      </p>
      <p>
        One of the most promising research lines is to decompose the alignment
problem. Rather than aligning the overall process and the overall event log, the
two are decomposed into subprocesses and sublogs, and alignment is performed
on these smaller subcomponents. Experimental results based on large scale
applications of decomposition techniques have shown significant improvements in
performance, especially in computation time [
        <xref ref-type="bibr" rid="ref4 ref6">4, 6</xref>
        ]. A recent work has presented
a new technique, Recomposing Conformance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which not only applies
decomposition to the alignment problem, but merges results from subcomponents as
an overall result. Moreover, this merged result is guaranteed to correspond to
the exact overall result that would have been yielded under the overall approach
without decomposition. As such, there are three main contributions. First, the
application of decomposition techniques can lead to significant reduction in
computation time. Second, approximate interval results can be computed even if
conformance checking is not completely feasible for the whole dataset. Third,
users can configure di↵erent criteria such as the overall time threshold to adjust
to the desired balance between accuracy and computation time. In this paper we
introduce Replay using Recomposition, an implementation of the Recomposing
Conformance technique.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Recomposing Conformance: The Tool</title>
      <p>
        Replay using Recomposition has been implemented as a plugin in the ProM6.7
framework. Figure 1 provides a high level overview of the conformance checking
framework. It takes in a process model (an Accepting Petrinet, i.e., a Petri net
with initial and final markings) and an event log as input and returns a set of
alignments and fitness scores as output. Recomposing Conformance is an
iterative framework composed of three phases. First, a decomposition of the model
and log is generated either by an initial decomposition or through a
modification of the decomposition from the previous iteration. Second, subcomponents of
the decomposition are aligned separately to achieve performance gains over the
monolithic approach. Third, alignment results from subcomponents are merged
if possible. Alignment for the subset of the log that could not merged in the
current iteration is continued in the following iteration. We refer users to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for
further details on the formal guarantees, merging conditions, and other details
of the technique.
      </p>
      <p>
        As previously mentioned, the Recomposing Conformance framework is highly
configurable. Users can configure three aspects to balance the result accuracy
and the required computation time. First, users can configure the initial
decomposition of the model and log which is required to start the conformance checking
process. Second, users can opt between two decomposed replay methods:
normal decomposition or Hide&amp;Reduce [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Third, the desired level of accuracy and
computation time can be adjusted through trace rejection and termination
conditions. For trace rejection, users can set the computation time and maximum
conflict thresholds for individual traces so that traces that take a long time to
align due to conformance issues are rejected to reduce computation time. For
termination, users can set the overall computation time and maximum iteration
thresholds, the target alignment percentage of the log and the target range for
the approximate interval result (level of accuracy) so that the checking process
can be terminated once the set time or accuracy is met. As previously
mentioned, an approximate interval result is computed if conformance checking is
not feasible for the whole dataset under the set constraints. Figure 2 shows a
screenshot of a dialog in the plugin where users can configure the mentioned
parameters and the resulting alignment that can be used for deviation analysis.
To showcase the tool, we take a running example and show what the tool does
with it. Given the fact that the demo needs to be interactive, we can only show
a small example here. Of course, for such a small example, the non-decomposed
replay will also finish well within time, but if we take a larger example for which
this replay does not finish within time, the demo will not be interactive any
more. For these larger examples that show that the recomposing replay can be
much faster than the non-decomposing replay, we refer the interested reader to
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        For the small running example, we take the “a12f0n05” event log from [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
This log contains 12 di↵erent activities, and 35 di↵erent traces which contain
some (5%) noise. Figure 3 shows a graphical overview of this event log, showing
35 di↵erent traces of which only the 7 di↵erent traces in the left-most column
occur more than once in the log. Figure 4 shows the corresponding Petri net
used to replay this event log.
      </p>
      <p>In the first iteration, the recomposing replay replays all 35 di↵erent traces
on 12 clusters. For 14 out of these 35 di↵erent traces, the first iteration already
provides us with a valid alignment. For the remaining 21 di↵erent traces,
conflicts were found and another iteration is required. For the second iteration, the
activity ‘f’ is selected to be joined on. As a result, the clusters containing ‘f’ will
be merged, yielding only 10 clusters. Only 12 out of the 21 remaining di↵erent
traces had a conflict on ‘f’, so only these 12 are replayed on these 10 clusters.
From these 12, 5 provide us with a valid alignment, and 7 do not. . Table 1 shows
the complete results for the running example. This shows that only a single
different trace required a replay on the entire (non-decomposed) net, and that all
other 34 di↵erent traces could be replayed successfully on some decomposition.
4</p>
    </sec>
    <sec id="sec-3">
      <title>Download, Screencast, and Examples</title>
      <p>The tool is available as the Replay using Recomposition plugin in the
DecomposedReplayer package of ProM6.7 (www.promtools.org).The plugin is available as
a silent version (default configurations) and a visually configurable version. The
source code is also available in https://svn.win.tue.nl/trac/prom/browser/
Packages/DecomposedReplayer. A screencast showing the features of the tool
and example datasets are also available at www.processmininguc.com/tools.</p>
      <p>Iteration Join on #Cluster #Replay #Accept #Reject #Conflict</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>This paper presented Replay using Recomposition, an implementation of the
Recomposing Conformance technique. This alignment-based conformance checking
technique applies decomposition to compute an overall result that corresponds
to the exact result that would have been yielded under the monolithic approach.
Furthermore, users can adjust configurations to balance between result accuracy
and computation time. Unlike existing alignment-based conformance checking
techniques, an approximate result is given if conformance checking is not
feasible for the whole dataset under the set constraints.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Adriansyah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Aligning Observed and Modeled Behavior</article-title>
          .
          <source>Ph.D. thesis</source>
          , Technische Universiteit Eindhoven (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>W.L.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verbeek</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Munoz-Gama</surname>
            , J., van der Aalst,
            <given-names>W.M.P.</given-names>
          </string-name>
          , Sepu´lveda, M.:
          <article-title>Recomposing Conformance: Closing the Circle on Decomposed AlignmentBased Conformance Checking in Process Mining. (under review) (2017), processmininguc</article-title>
          .com/publications
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Maruster</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weijters</surname>
            ,
            <given-names>A.J.M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.,
          <string-name>
            <surname>Bosch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          v.d.:
          <article-title>A rule-based approach for process discovery: Dealing with noise and imbalance in process logs</article-title>
          .
          <source>Data Min. Knowl. Disc</source>
          .
          <volume>13</volume>
          (
          <issue>1</issue>
          ),
          <fpage>67</fpage>
          -
          <lpage>87</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Munoz-Gama</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>Conformance Checking and Diagnosis in Process Mining - Comparing Observed and Modeled Processes</source>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Verbeek</surname>
            ,
            <given-names>H.M.W.</given-names>
          </string-name>
          :
          <article-title>Decomposed replay using hiding and reduction</article-title>
          .
          <source>In: PNSE 2016 Workshop Proceedings</source>
          . pp.
          <fpage>233</fpage>
          -
          <lpage>252</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Verbeek</surname>
            ,
            <given-names>H.M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Munoz-Gama</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.:
          <article-title>Divide And Conquer: A Tool Framework for Supporting Decomposed Discovery in Process Mining</article-title>
          .
          <source>The Computer Journal</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>