<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Decomposed Replay Using Hiding and Reduction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>H.M.W. Verbeek</string-name>
          <email>h.m.w.verbeek@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science Eindhoven University of Technology</institution>
          ,
          <addr-line>Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <fpage>233</fpage>
      <lpage>252</lpage>
      <abstract>
<p>In the area of process mining, decomposed replay has been proposed to be able to deal with nets and logs containing many different activities. The main assumption behind this decomposition is that replaying many subnets and sublogs containing only some activities is faster than replaying a single net and log containing many activities. Although for many nets and logs this assumption does hold, there are also nets and logs for which it does not hold. This paper shows an example net and log for which the decomposed replay may take much more time, and provides an explanation of why this is the case. Next, to mitigate this problem, this paper proposes an alternative decomposed replay, and shows that this alternative decomposed replay is faster than the monolithic replay even for the problematic cases as identified earlier. However, the alternative decomposed replay is often slower than the original decomposed approach. An advantage of the alternative decomposed approach over the original approach is that its cost estimates are typically better.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The area of process mining [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is typically divided into process discovery, process
conformance, and process enhancement. In the context of big data, a
decomposition approach [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] has been introduced that includes both process discovery
(decomposed discovery) and process conformance (decomposed replay). In this
paper, we take this decomposed replay for process conformance as a starting
point. The main assumptions for this decomposed replay are that (1) checking
conformance using a monolithic replay (that is, by replaying a single net and
log, which contain many different activities) takes a prohibitive amount of time, and
that (2) checking conformance using a decomposed replay (that is, by replaying
a series of subnets and sublogs, which each contain far fewer different activities
than the single net and log) takes far less time. The decomposition approach as
introduced in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] splits up a single Petri net into a collection of Petri nets with
shared transitions on their borders, and guarantees that (1) the result of the
decomposed replay is perfect if and only if the result of the monolithic replay is
perfect, and (2) the result of the decomposed replay provides a lower bound for
the result of the monolithic replay otherwise.
      </p>
      <p>
        Figure 1 supports this assumption by showing typical computation times for
the nets and logs as found in a number of data sets [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7–9</xref>
        ]. These data sets contain
in total 59 cases of varying size, ranging from 12 to 429 activities and from 500 to
2000 traces, with varying numbers of mismatching traces (from 0% to 50%). This
figure shows that if the monolithic replay takes more than a second, the
decomposed replay is faster. Furthermore, it shows that some replays do
not finish within 10 minutes (see [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for why we use a timeout of 10 minutes here)
using the monolithic replay, whereas all replays finish within 10 minutes using
the decomposed replay.
      </p>
      <p>Fig. 1. Computation times, with provided nets.</p>
      <p>
        However, Figure 2 paints a different picture. It shows the typical computation
times for replaying the log as found in the data sets on the net as discovered
using the Inductive miner [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] from that log. As a result, instead of checking the
conformance of some log on some designed net, we now check the conformance of
this log on some discovered net. As for this replay we only require a log (and not a
net), we included some additional data sets [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] for this figure. This figure shows
that in some cases the decomposed replay misbehaves by requiring much more
time than the monolithic replay. For example, the monolithic replay requires less
than 10 seconds for the a22f0n05 case, whereas the decomposed replay takes more
than 10 minutes.
      </p>
      <p>Fig. 2. Computation times, with discovered nets.</p>
      <p>In this paper, we investigate the root cause of this misbehavior. Based on
these findings, we propose an alternative decomposed replay to mitigate the
misbehavior problem. This alternative decomposition approach is based on the
well-known concepts of hiding transitions and reducing nets. We show that, in almost
all cases that take more than 10 seconds for the monolithic replay, either using
designed nets or discovered nets, the hide-and-reduce replay is indeed faster.
However, we will also show that the (original) decomposed replay is often faster
than the hide-and-reduce replay. But, whereas the decomposed replay does
misbehave for some cases, the hide-and-reduce replay does not. Finally, we show
that the hide-and-reduce replay has an extra advantage, as it provides a better
estimate for the replay costs.</p>
      <p>The remainder of this paper is organized as follows. First, Section 2
introduces the necessary concepts, like accepting Petri nets and alignments. Second,
Section 3 shows that there is a possible problem with the decomposed replay,
and proposes the hide-and-reduce replay to mitigate this problem. Third,
Section 4 evaluates the hide-and-reduce replay using the existing data sets, which
shows that it can handle more cases than the monolithic and/or decomposed
replay, and that it typically returns a better lower bound for the costs than the
decomposed replay. Last, Section 5 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        Logs
In this paper, we consider activity logs, which are an abstraction of the event
logs as found in practice. An activity log is a collection of traces, where every
trace is a sequence of activities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Table 1 shows the example activity log L1,
which contains information about 20 cases, for example, 4 cases followed the
trace ha1, a2, a4, a5, a8i. In total, the log contains 13 + 17 + 9 + 2 × 9 + 9 + 4 ×
5 + 9 + 9 + 5 + 5 + 17 + 3 × 5 + 5 + 5 = 156 activities.
      </p>
      <p>Definition 1 (Universe of activities). The set A denotes the universe of
activities.</p>
      <p>To capture an activity log, we use multi-sets. If S is a set of objects, then
B(S) is a multi-set of objects, that is, if B ∈ B(S) and o ∈ S, then object o
occurs B(o) times in multi-set B.</p>
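      <p>A minimal sketch of an activity log as such a multi-set, using Python's Counter; the trace ⟨a1, a2, a4, a5, a8⟩ with 4 cases is taken from the example log L1, while the second trace is made up for illustration.</p>

```python
from collections import Counter

# An activity log as a multi-set of traces: each trace (a tuple of
# activities) is mapped to the number of cases that followed it.
log = Counter({
    ("a1", "a2", "a4", "a5", "a8"): 4,  # 4 cases, as in the example log L1
    ("a1", "a3", "a4", "a5", "a8"): 2,  # hypothetical second trace
})

# B(o): the multiplicity of an object o in the multi-set B
assert log[("a1", "a2", "a4", "a5", "a8")] == 4

# Total number of cases and of activities in the log
cases = sum(log.values())
activities = sum(len(trace) * count for trace, count in log.items())
```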
      <p>Definition 2 (Activity log). Let A ⊆ A be a set of activities. An activity log
L over A is a multi-set of activity traces over A, that is, L ∈ B(A∗).</p>
      <p>Nets</p>
      <p>In this paper, we assume that a net is an accepting Petri net, that is, a labeled
Petri net with an initial marking and a set of final markings. The transition
labels are used to denote the activity a transition corresponds to. Transitions
that do not correspond to an activity are labeled τ, and are henceforth called
invisible. The other (activity-labeled) transitions are henceforth called visible
transitions. The initial marking and final markings are needed for the replay,
which needs to start at the initial marking and needs to end in a final marking.</p>
      <p>Definition 3 (Petri net). A Petri net is a 3-tuple (P, T, F) where P is a set of
places, T is a set of transitions such that P ∩ T = ∅, and F ⊆ (P × T) ∪ (T × P)
is a set of arcs.</p>
      <p>Definition 4 (Accepting Petri net). Let A ⊆ A be a set of activities. An
accepting Petri net over the set of activities A is a 6-tuple (P, T, F, l, I, O) where
(P, T, F ) is a Petri net, l ∈ T → (A ∪ {τ }) is a labeling function that links
every transition onto an activity (possibly the dummy activity τ ), I ∈ B(P ) is
an initial marking, and O ⊆ B(P ) is a set of final markings.</p>
      <p>The example net N1 (places p1–p9, transitions t1–t11, activity labels a1–a8).</p>
      <p>Assume that on the example net N1 we want to replay the trace ⟨a1, a2, a3, a4, a5,
a6, a7, a8⟩, that is, we want to find the best transition sequence that starts in the
initial marking, ends in a final marking, and has the given trace as its activity
sequence. Basically, the activity sequence corresponds to the transition sequence
in which every transition is replaced by its label and all τ labels are removed
afterwards. Finding the best transition sequence is done by first finding a transition
sequence, and second imposing some notion of costs on every found transition
sequence. By keeping these costs minimal while finding a transition sequence, we
then obtain the best transition sequence.</p>
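      <p>The correspondence between a transition sequence and its activity sequence (replace every transition by its label, then remove all τ labels) can be sketched as follows; the labeling is a hypothetical fragment, not the full labeling of N1.</p>

```python
TAU = "tau"  # stands for the invisible label

def activity_sequence(transitions, labeling):
    """Replace every transition by its label and remove all tau labels."""
    return [labeling[t] for t in transitions if labeling[t] != TAU]

# Hypothetical labeling fragment: t2 is an invisible transition
labeling = {"t1": "a1", "t2": TAU, "t3": "a2"}
assert activity_sequence(["t1", "t2", "t3"], labeling) == ["a1", "a2"]
```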
      <p>
        To find a transition sequence, the replayer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] creates a number of possible
moves, and assigns costs to every move. A move can be a synchronous move,
a log move, a visible model move, or an invisible model move. A synchronous
move consists of a transition and its label, that is, its corresponding activity.
As an example, the move (a2, t3) is a synchronous move for net N1, indicating
that transition t3 was fired, which matched activity a2 in the trace. A log move
consists of an activity and a dummy transition (indicated by ≫). As an example,
the move (a2, ≫) is a log move for net N1, indicating that activity a2 could not
be matched to any enabled transition. A visible model move consists of a visible
transition and a dummy activity (also indicated by ≫). As an example, the move
(≫, t3) is a visible model move for net N1, indicating that visible transition t3
was fired but could not be matched to any activity in the trace. An invisible
model move consists of an invisible transition and the τ label. As an example,
the move (τ, t2) is an invisible model move for net N1, indicating that invisible
transition t2 was fired.
      </p>
      <p>Definition 5 (Legal moves). Let A ⊆ A be a set of activities, let σ ∈ A∗
be an activity trace over A, and let N = (P, T, F, l, I, O) be an accepting Petri
net over A. The set of legal moves of A and N is the union of the sets {(a,
t) | a ∈ A ∧ t ∈ T ∧ l(t) = a} (synchronous moves), {(a, ≫) | a ∈ A} (log moves),
{(≫, t) | t ∈ T ∧ l(t) ∈ A} (visible model moves), and {(τ, t) | t ∈ T ∧ l(t) = τ}
(invisible model moves).</p>
      <p>Log moves and visible model moves hint at mismatches, as these indicate
that either some activity could not be matched by a proper enabled transition,
or that the firing of the transition could not be matched to its activity in the
trace. For this reason, the costs of these moves are typically set to positive values,
whereas the costs for the other moves are set to 0. In this paper, we will use
costs of 10 for every log move, and 4 for every visible model move.</p>
      <p>Definition 6 (Cost structure). Let A ⊆ A be a set of activities, and let
N = (P, T, F, l, I, O) be an accepting Petri net over A. A cost structure $ for A
and N is a function that maps every legal move of A and N onto a (non-negative)
natural number.</p>
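      <p>As a minimal sketch of such a cost structure, with the costs used in this paper (10 per log move, 4 per visible model move, 0 otherwise); representing moves as Python pairs and writing the dummy component as ">>" is an assumption of this sketch, not notation from the paper.</p>

```python
TAU, SKIP = "tau", ">>"  # invisible label and dummy move component

def move_cost(move):
    """Cost structure $: maps every legal move (a, t) onto a natural number."""
    a, t = move
    if t == SKIP:   # log move (a, >>): activity not matched by a transition
        return 10
    if a == SKIP:   # visible model move (>>, t): transition fired unmatched
        return 4
    return 0        # synchronous move (a, t) or invisible model move (tau, t)

assert move_cost(("a2", "t3")) == 0   # synchronous move
assert move_cost(("a2", SKIP)) == 10  # log move
assert move_cost((SKIP, "t3")) == 4   # visible model move
assert move_cost((TAU, "t2")) == 0    # invisible model move
```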
      <p>Using these moves and these costs, the question for the replayer is then to find
a sequence of moves such that (1) the activities correspond to the given activity
sequence, (2) the transitions correspond to a possible transition sequence in the
net that starts in the initial marking and ends in a final marking, and (3) the
sequence of moves has minimal costs. This sequence of moves is henceforth called
an optimal trace alignment.</p>
      <p>Definition 7 (Trace alignment). Let A ⊆ A be a set of activities, let σ ∈ A∗
be an activity trace over A, and let N = (P, T, F, l, I, O) be an accepting Petri
net over A. A trace alignment h for trace σ on net N is a sequence of legal moves
(a, t) ∈ ((A ∪ {τ, ≫}) × (T ∪ {≫})) such that:
– σ = h|1A and
– for some o ∈ O it holds that I[h|2T⟩o,
where h|1A = ⟨⟩ if h = ⟨⟩; h|1A = ⟨a⟩ · h′|1A if h = ⟨(a, t)⟩ · h′ and a ∈ A;
and h|1A = h′|1A if h = ⟨(a, t)⟩ · h′ and a ∉ A;
and where h|2T = ⟨⟩ if h = ⟨⟩; h|2T = ⟨t⟩ · h′|2T if h = ⟨(a, t)⟩ · h′ and t ∈ T;
and h|2T = h′|2T if h = ⟨(a, t)⟩ · h′ and t ∉ T.</p>
      <p>Definition 8 (Costs of trace alignment). Let A ⊆ A be a set of activities,
let σ ∈ A∗ be an activity trace over A, let N = (P, T, F, l, I, O) be an accepting
Petri net over A, let h = ⟨(a1, t1), . . . , (an, tn)⟩ be a trace alignment (of length
n) for σ and N, and let $ be a cost structure for A and N. The costs of trace
alignment h, denoted $h, is defined as the sum of the costs of all legal moves in
the alignment, that is, $h = Σi∈{1,...,n} $(ai, ti).</p>
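      <p>The projections of Definition 7 and the alignment costs of Definition 8 can be sketched as follows, again under the assumption that moves are Python pairs with ">>" as the dummy component; the alignment below is hypothetical.</p>

```python
SKIP = ">>"

def proj_activities(h, A):
    """h|1A: the first components of the moves that are activities in A."""
    return [a for (a, t) in h if a in A]

def proj_transitions(h, T):
    """h|2T: the second components of the moves that are transitions in T."""
    return [t for (a, t) in h if t in T]

def alignment_cost(h, cost):
    """$h: the sum of the costs of all legal moves in the alignment."""
    return sum(cost(move) for move in h)

# Hypothetical alignment with one move of each kind
h = [("a1", "t1"), ("tau", "t2"), ("a2", SKIP), (SKIP, "t5")]
assert proj_activities(h, {"a1", "a2"}) == ["a1", "a2"]
assert proj_transitions(h, {"t1", "t2", "t5"}) == ["t1", "t2", "t5"]

# Log move costs 10, visible model move costs 4, other moves cost 0
cost = lambda m: 10 if m[1] == SKIP else 4 if m[0] == SKIP else 0
assert alignment_cost(h, cost) == 14  # 0 + 0 + 10 + 4
```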
      <p>Definition 9 (Optimal trace alignment). Let A ⊆ A be a set of activities,
let σ ∈ A∗ be an activity trace over A, let N = (P, T, F, l, I, O) be an accepting
Petri net over A, let h be a trace alignment for σ and N, and let $ be a cost
structure for A and N. The trace alignment h is called optimal if there exists no
other trace alignment h′ such that $h′ &lt; $h.</p>
      <p>Fig. 4. An optimal trace alignment for the trace ⟨a1, . . . , a8⟩ and net N1.</p>
      <p>Figure 4 shows an optimal trace alignment for the trace ⟨a1, . . . , a8⟩ and net
N1. Please note that an optimal trace alignment need not be unique. In the
example, we could also have chosen to do a synchronous move on a3 and a log
move on a2. The resulting alignment would also be optimal.</p>
      <p>
        Decomposed replay
      </p>
      <p>
        For a small number of activities (like 8 as in the example), computing an optimal
trace alignment on the entire net may be possible within 10 minutes, but for
larger numbers (say 100 or more), the replay will take considerably more time.
To alleviate this, decomposed replay has been proposed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In decomposed
replay, we first decompose the net into subnets, where the following restrictions
are taken into account:
– Every place ends up in a single subnet.
– Every invisible transition ends up in a single subnet.
– Every arc ends up in a single subnet.
– If multiple visible transitions share the same label, then these transitions
end up in a single subnet.
      </p>
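      <p>The last restriction can be prepared by grouping visible transitions per label; a minimal sketch, with a hypothetical labeling (not the labeling of N1).</p>

```python
from collections import defaultdict

TAU = "tau"  # the invisible label

def shared_label_groups(labeling):
    """Group visible transitions by label; every group with more than one
    transition must end up in a single subnet."""
    groups = defaultdict(set)
    for t, label in labeling.items():
        if label != TAU:  # invisible transitions are not grouped by label
            groups[label].add(t)
    return {label: ts for label, ts in groups.items() if len(ts) > 1}

# Hypothetical labeling: t1 and t2 share label a1, t4 is invisible
labeling = {"t1": "a1", "t2": "a1", "t3": "a2", "t4": TAU}
assert shared_label_groups(labeling) == {"a1": {"t1", "t2"}}
```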
      <p>As a result, only visible transitions that have a unique label can be distributed
over the subnets. Figure 5 shows how the example net N1 can be decomposed
into 5 subnets.</p>
      <p>Fig. 5. Decomposition of the example net N1 into the 5 subnets N1a, N1b, N1c, N1d, and N1e.</p>
      <p>
        The costs associated with the various moves are also decomposed (see also [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]).
For example, the activity a1 appears in two subnets (N1a and N1b). As a result,
a log move on a1 in N1a costs 10/2 = 5 and a log move on a1 in N1b costs 5 as
well. Note that, by coincidence, all activities appear in two subnets; therefore,
all log moves in all subnets now cost 5. If some activity would have appeared
in three subnets, then the costs for a log move in every subnet would be 10/3.
As a result of this decomposition, if all subalignments in the decomposed replay
agree on a log move for a1, then the costs are identical to a log move in the
monolithic replay. Likewise, a visible model move on transition t1 in N1a costs
4/2 = 2 and a visible model move in N1b costs 2 as well. In theory, the costs
for the synchronous moves and invisible model moves are also decomposed in a
similar way, but as both are typically 0, we typically ignore them.
      </p>
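      <p>This cost decomposition can be sketched as follows; the assignment of activities to subnets below is hypothetical (the real decomposition of N1 is the one of Figure 5), but, as in the text, a1 appears in two subnets.</p>

```python
from collections import Counter

LOG_MOVE_COST = 10  # monolithic log-move cost used in this paper

# Hypothetical assignment of activities to subnets
subnet_activities = {
    "N1a": {"a1", "a6"},
    "N1b": {"a1", "a2", "a3", "a4"},
    "N1c": {"a2", "a3", "a4", "a5"},
}

# In how many subnets does each activity appear?
occurrences = Counter(a for acts in subnet_activities.values() for a in acts)

def decomposed_log_move_cost(activity):
    """Monolithic log-move cost divided over the subnets containing the activity."""
    return LOG_MOVE_COST / occurrences[activity]

assert decomposed_log_move_cost("a1") == 10 / 2  # a1 is in N1a and N1b: costs 5
assert decomposed_log_move_cost("a5") == 10 / 1  # a5 is in one subnet only
```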
      <p>Second, we decompose the trace into subtraces, and replay every subtrace on
the corresponding subnet, which results in one subalignment per subnet (like
the subalignment h1a for subnet N1a).</p>
      <p>Third, we accumulate the costs of these subalignments, which are the costs as
reported by the decomposed replay. Note that the optimal alignment h1d includes
a model move on transition t7 instead of a log move on a7. As the model move is
cheaper than the log move, doing the log move would not be optimal. This shows
that the costs as reported by the decomposed replay (5 + 5 + 5 + 2 + 5 + 5 =
27) can indeed be lower than the (correct) costs of the monolithic alignment
(10 + 10 + 10 = 30).</p>
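      <p>The accumulation and the lower-bound property can be checked with the numbers from the example:</p>

```python
# Costs of the subalignments in the example: five log moves of 5 each, plus
# the visible model move on t7, which costs 4 / 2 = 2 after decomposition.
subalignment_costs = [5, 5, 5, 2, 5, 5]
decomposed_cost = sum(subalignment_costs)

# Correct costs of the monolithic alignment: three log moves of 10 each
monolithic_cost = 10 + 10 + 10

assert decomposed_cost == 27
assert monolithic_cost == 30
assert monolithic_cost >= decomposed_cost  # decomposed costs are a lower bound
```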
      <p>
        As indicated in the Introduction, the decomposed replay is often much faster
than the monolithic replay. Because of the formal guarantees as provided by
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we know that the decomposed replay returns costs 0 if and only if the
monolithic replay returns costs 0, and that otherwise the decomposed replay
returns lower costs than the monolithic replay (that is, the decomposed replay
provides a lower bound for the correct costs of the monolithic replay). As such,
we can use the decomposed replay as a fast way to check whether there are any
costs involved, and to obtain a lower bound for the correct costs.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Hide and Reduce</title>
      <p>
        In the Introduction, we have shown that for some cases, the decomposed replay
actually misbehaves. As an example, if we take the log from the a22f0n05 case
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], discover a net for it using the Inductive Miner, and replay the log on the
discovered net, then the monolithic replay requires less than 10 seconds, whereas
the decomposed replay fails. In this section, we use the a22f0n05 case to find
the root cause of this failure. After having found that root cause, this section
proposes an alternative decomposed replay to mitigate the root cause. The root
cause turns out to be a single subnet: in this subnet, the fact that 5 transitions
are enabled in every state is too much for the replayer.
      </p>
      <p>Apparently, by removing parts of the net in the subnet, information that was
vital for the replayer may have been lost. As an example, in the net, the transition
labeled s+complete can only fire after the transition labeled p+complete has been
fired, and both have to fire exactly once. In the subnet, both transitions can be
fired any number of times, and in any order, which leads the replay to require
more than 10 minutes.</p>
      <p>For this reason, this paper presents an alternative decomposed replay. Instead
of removing the places, transitions, and arcs that do not belong to a subnet, this
decomposition simply keeps these places, transitions, and arcs, while hiding (making
invisible) all visible transitions that do not correspond to an activity of the subnet.</p>
      <p>Definition 10 (Hidden net). Let A ⊆ A be a set of activities, let N = (P, T,
F, l, I, O) be an accepting Petri net over A, and let Ai ⊆ A be a subset of the set
of activities. Then the hidden net for N and Ai, denoted N(Ai), is the accepting
Petri net (P, T, F, l(Ai), I, O) such that l(Ai)(t) = l(t) if l(t) ∈ Ai, and
l(Ai)(t) = τ if l(t) ∉ Ai.</p>
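      <p>A minimal sketch of this hiding step, representing the labeling function l as a Python dictionary; the labeling fragment below is hypothetical.</p>

```python
TAU = "tau"  # the invisible label

def hide(labeling, Ai):
    """l(Ai): keep the label of a transition if it is in Ai, and make the
    transition invisible (label tau) otherwise. The places, transitions,
    and arcs of the net are kept as they are."""
    return {t: (label if label in Ai else TAU)
            for t, label in labeling.items()}

# Hypothetical labeling fragment: t2 is already invisible
labeling = {"t1": "a1", "t2": TAU, "t3": "a2"}
assert hide(labeling, {"a1"}) == {"t1": "a1", "t2": TAU, "t3": TAU}
```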
      <p>
        As a result, the structure of the otherwise-removed parts is maintained, and
hence the inter-transition relations as described before are maintained.
      </p>
      <p>
        Theorem 1 (Hiding preserves having costs 0). Let A ⊆ A be a set of
activities, let N = (P, T, F, l, I, O) be an accepting Petri net over A, let A1,
. . . , An be the activities of the subnets N1, . . . , Nn that result from decomposing
net N (see [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for details), and let σ ∈ A∗ be an activity trace over A. Then
an optimal trace alignment h for σ and N has costs 0 if and only if for every
1 ≤ i ≤ n an optimal trace alignment hi for σ and the hidden net N(Ai) has costs 0.
      </p>
      <p>
        Proof. We prove this by comparing the monolithic (non-decomposed) costs,
say $m, the decomposed costs, say $d, and the costs as obtained by hiding, say
$h. If h has costs 0, then mutatis mutandis (some activities are replaced by τ's)
h also has costs 0 for every hidden net N(Ai). As a result, if $m = 0 then
$h = 0. If hi has costs 0, then mutatis mutandis (some columns are removed)
hi also has costs 0 for the corresponding subnet Ni. As a result, if $h = 0 then
$d = 0. From [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] it follows that decomposition preserves perfect fitness, that is,
it preserves having costs 0. As a result, $m = 0 if and only if $d = 0. Therefore,
$m = 0 if and only if $h = 0.
      </p>
      <p>
        This hiding step is then optionally followed by a reduction step using
well-known Petri-net-reduction rules [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This makes the net as small as possible for
the replayer. Figure 9 shows an overview of these reduction rules when applied
to accepting Petri nets. From this figure, it is clear that these rules preserve the
branching activity behavior of the net:
– The rules cannot remove any visible transition.
– The FPT rule can only remove an invisible transition if there exists another
invisible transition.
– The FPP rule can only remove an unmarked place (that is, a place not
involved in the initial or any final marking) if there exists another unmarked
place.
– The ESP rule can only remove a marked place which has x tokens in the
initial and every final marking, where x &gt; 0.
– If a marked place is removed, the initial and final markings are updated
accordingly.
      </p>
      <p>Definition 11 (Reduced nets). Let A ⊆ A be a set of activities, and let
N = (P, T, F, l, I, O) be an accepting Petri net over A. Then the reduced nets for
N, denoted R(N), is the collection of accepting Petri nets (P′, T′, F′, l, I′, O′)
that can result from applying the rules as shown in Figure 9 over and over again
until no rule can be applied anymore.</p>
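      <p>Definition 11 amounts to a fixpoint computation: keep applying rules until none applies anymore. A sketch of that loop, with a toy rule standing in for the actual rules of [10]:</p>

```python
def reduce_net(net, rules):
    """Apply reduction rules until no rule can be applied anymore.
    Every rule returns a reduced net, or None if it does not apply."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            reduced = rule(net)
            if reduced is not None:
                net, changed = reduced, True
    return net

# Toy example: the "net" is just a set of place names, and the rule drops
# one redundant place (standing in for, e.g., the FPP rule).
def drop_redundant_place(net):
    return net - {"p_dup"} if "p_dup" in net else None

assert reduce_net({"p1", "p_dup"}, [drop_redundant_place]) == {"p1"}
```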
      <p>Theorem 2 (Reducing preserves costs 0). Let A ⊆ A be a set of activities,
let N = (P, T, F, l, I, O) be an accepting Petri net over A, let NR ∈ R(N) be
a reduced net for N, and let σ ∈ A∗ be an activity trace over A. Then an
optimal trace alignment h for σ and N has costs 0 if and only if an optimal
trace alignment h′ for σ and NR has costs 0.</p>
      <p>Proof. As these rules preserve the branching activity behavior of the net (see
Figure 9), they preserve the costs of an optimal alignment.</p>
      <p>Fig. 9. Overview of the reduction rules (FPT, FPP, FST, FSP, EST, and ESP) as
applied to accepting Petri nets. The legend distinguishes invisible transitions, visible
transitions, unmarked places, and places containing x tokens in the initial and every
final marking (where x &gt; 0); the rules update the initial and final markings if needed,
and preserve the branching activity behavior.</p>
      <p>As an illustration, Figure 10 shows the hidden-and-reduced subnet N′1b for the
subnet N1b as shown in Figure 5. Note that the subnet N′1b requires, for example,
the transition labeled a1 to be fired exactly once, whereas it could be fired any
number of times in the subnet N1b. This may reduce the search space for the
replayer drastically. Nevertheless, there is also a downside, as this
hiding-and-reducing may result in additional invisible transitions, which need to be replayed
as well.</p>
      <p>Fig. 10. The hidden-and-reduced subnet N′1b.</p>
      <p>We expect this hide-and-reduce replay to be faster than the monolithic replay,
and to provide a better (higher) lower bound for the correct replay costs than the
decomposed replay. In the next section, we will evaluate these assumptions.</p>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>We have evaluated the hide-and-reduce replay on the same data sets as we used
in the Introduction. First, this section will evaluate the computation times of
the hide-and-reduce replay by comparing them to the computation times of
the monolithic replay. Second, it will evaluate the lower bounds for the correct
replay costs as obtained by the hide-and-reduce replay by comparing them to
both other replays.</p>
      <p>Fig. 12. Computation times for the monolithic and hide-and-reduce replay with the
nets discovered using the Inductive Miner from the logs as provided by the data sets.</p>
      <p>Figure 12 shows the results of the hide-and-reduce replay using the nets as
discovered by the Inductive Miner. Note the contrast with Figure 2, which shows
similar times for the (original) decomposed replay.</p>
      <p>
        Recall that the decomposed replay requires more than 10 minutes for a
number of cases which required less than 10 minutes with the monolithic replay.
The hide-and-reduce replay required more than 10 minutes for only a single
case (prFm6 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]), which also requires more than 10 minutes with the monolithic
replay. If the monolithic replay takes more than 10 seconds, then the alternative
decomposed approach is likely to be faster. There is only a single case in the data
sets for which this does not hold: a32f0n50 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For this case, the monolithic
replay takes 49.1 seconds and the hide-and-reduce replay takes 49.2 seconds. The
fact that the hide-and-reduce replay is not faster for this case is caused by a
similar effect as with the a22f0n05 case: The largest subnet contains almost all
(28 of the 32) activities, and requires 48 seconds to be replayed. Of the remaining
time, half a second is required for replaying the other 15 subnets and 1.2
seconds are required for splitting the log and net and merging the alignments.
In contrast, the decomposed replay requires more than 10 minutes for this and
13 other cases, and takes much longer for another three.
      </p>
      <p>As a result we conclude that for the nets as discovered by the Inductive
Miner, the hide-and-reduce replay requires less than 10 minutes for more cases
than the other two replays. As such, it is an improvement over these other
replays. Furthermore, it is typically faster than the monolithic replay when the
latter takes more than 10 seconds. As such, it is an improvement over this replay.</p>
      <p>
        Figure 13 shows the results of the hide-and-reduce replay using the nets
as provided by the data sets. Like the decomposed replay, the hide-and-reduce
replay requires less than 10 minutes for all data sets and it is typically faster
than the monolithic replay. Exceptions to this are the prAm6 and prBm6 cases
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In both cases, the nets and logs are decomposed into more than 300 subnets
and sublogs, the decomposed replay on these subnets and sublogs requires only
2 seconds (where the monolithic replay requires almost 40 seconds), but the
reduction step takes about 60 and 40 seconds, respectively. For the other five
cases for which the alternative decomposed replay takes more than 10 seconds,
this reduction step is also the bottleneck. For all these cases, the decomposed
replay requires at most 5 seconds, but the reduction requires from 25 to 167
seconds. Apparently, the reduction step is a possible bottleneck for the
hide-and-reduce replay.
      </p>
      <p>With regard to the computation times, we can conclude that both the
decomposed and the hide-and-reduce replays can handle more cases than the monolithic
replay, that they are typically faster than the monolithic replay, and that the
net at hand largely determines whether the decomposed or the hide-and-reduce
replay should be used. If the net has been designed, then the decomposed replay
would be fastest, but this replay requires more than 10 minutes for many
discovered nets. Therefore, in case the net has been discovered, or in case one does
not know, the prudent approach would be the hide-and-reduce replay.</p>
      <p>Fig. 13. Computation times, with provided nets.</p>
      <p>Apart from evaluating the differences in computation times, we also need to
evaluate the reported costs of the three replays. We know that the monolithic
replay provides the correct costs, and that both other approaches provide only
a lower bound for these correct costs. Figure 14 shows the costs as reported by
all replays on all nets (either provided or discovered) and all logs from the data
sets. This figure shows that the costs as reported by the hide-and-reduce approach
are at least as good as the costs as provided by the decomposed replay, and
that they are often better (higher). To emphasize this, we have highlighted in the
figure those costs that are at least half (above 50%) of the correct costs. Most
of these highlighted costs are reported by the hide-and-reduce replay, only a few
by the decomposed replay. This shows that the hide-and-reduce replay typically
provides a better lower bound for the costs.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>Fig. 14. Costs of the decomposed and hide-and-reduce replays against the costs of the
monolithic replay, with provided and discovered nets; costs above 50% of the correct
costs are highlighted.</p>
      <p>
        In this paper, we have shown that, for some cases, the decomposed replay [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
may take longer than the monolithic replay. Although for designed nets this
is typically not the case, it may very well be the case for discovered nets. As
such, a user who wants to check conformance on a log and a net discovered
from that log may want to think twice before using the decomposed replay, as the
monolithic replay may be faster (less than 10 seconds) while providing correct
costs, whereas the decomposed replay may take longer (more than 10 minutes)
while only providing a lower bound for the correct costs.
      </p>
      <p>We have also shown that the root cause of this ‘misbehavior’ of the
decomposed replay is that the decomposition may result in a subnet with a fair
number of places and multiple source transitions. As a result of the fair number
of places, the state space will contain a fair number of states as well. As a result
of the multiple source transitions, the replayer needs to investigate from every
reachable state at least all source transitions, which may all lead to new states
that again need to be investigated. We have shown that the replay of a subnet
with 13 places and 5 source transitions may require more than 10 minutes for
the decomposed replayer.</p>
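      <p>To illustrate the effect described above, the following toy Python sketch (an illustration only, not the actual replayer; the function name and the tuple encoding of markings are ours) counts the distinct markings reachable within a given number of firings for a subnet whose only transitions are source transitions, each producing one token in its own output place. Because a source transition has no input places, all of them are enabled in every marking, and the number of markings grows combinatorially with the number of source transitions.</p>

```python
def count_markings(num_sources, depth):
    """Count distinct markings reachable within `depth` firings for a toy
    subnet with `num_sources` source transitions, each producing one token
    in its own output place. Source transitions have no input places, so
    every one of them is enabled in every marking."""
    initial = (0,) * num_sources       # one token counter per output place
    seen = {initial}
    frontier = {initial}
    for _ in range(depth):
        nxt = set()
        for marking in frontier:
            for t in range(num_sources):
                # firing source transition t adds a token to its place
                successor = marking[:t] + (marking[t] + 1,) + marking[t + 1:]
                if successor not in seen:
                    seen.add(successor)
                    nxt.add(successor)
        frontier = nxt
    return len(seen)

print(count_markings(1, 10))  # 11 markings
print(count_markings(5, 10))  # 3003 markings
```

      <p>With one source transition the searched state space grows linearly in the depth, but with five source transitions it already contains thousands of markings at the same depth, which matches the observed blow-up for subnets with multiple source transitions.</p>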
      <p>To mitigate this problem with the source transitions, we have proposed a
hide-and-reduce replay, which maintains the structure of the original net in the
subnets and hence does not introduce source transitions. We have shown that
this new replay requires less than 10 minutes for all but one of the cases in
the data sets used, and that this one case required more than 10 minutes for the
monolithic and decomposed replays as well. As such, the hide-and-reduce replay
can handle more situations than either of the other replays. For this reason, if it
is possible that the net at hand was discovered from the log, then using the new
hide-and-reduce replay is the best approach. Granted, it may be slower than the
decomposed replay, but chances are that the case at hand simply requires more
than 10 minutes with the decomposed replay anyway.</p>
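      <p>The hiding step can be sketched as follows (a minimal illustration under our own assumptions, not the paper's implementation; the function <monospace>hide</monospace> and the dict encoding of the net are ours): transitions whose label falls outside the subnet's activity set are made invisible, but all arcs are kept, so no transition loses its input places and no new source transitions can appear.</p>

```python
def hide(net, activities):
    """Toy sketch of the hiding step: every transition keeps its arcs,
    but transitions labeled with an activity outside `activities` become
    invisible (label None). The net structure is left untouched, so no
    new source transitions are introduced."""
    return {
        t: {"inputs": arcs["inputs"],
            "outputs": arcs["outputs"],
            "label": arcs["label"] if arcs["label"] in activities else None}
        for t, arcs in net.items()
    }

net = {
    "t1": {"inputs": {"p0"}, "outputs": {"p1"}, "label": "a"},
    "t2": {"inputs": {"p1"}, "outputs": {"p2"}, "label": "b"},
}
hidden = hide(net, {"a"})
# t2 is now invisible but still consumes from p1: it is not a source transition
```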
      <p>Furthermore, we have shown that the costs as reported by the hide-and-reduce
replay are at least as good as the costs as reported by the decomposed
replay, and possibly better. Therefore, if it is important to the user to have an
as-good-as-possible estimate for the correct costs, then using the hide-and-reduce
replay is better than using the decomposed replay.</p>
      <p>
        Finally, we have shown that in some cases the required reduction step in the
hide-and-reduce replay is the bottleneck of the entire replay. Whereas the entire
replay on the subnets would take only 2 seconds, the required reduction of these
subnets may have taken 60 seconds. As a result of this bottleneck, the hide-and-reduce
replay is sometimes slower than the monolithic replay. For this reason,
we aim to improve in the near future on the reduction rules [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] as used by the
hide-and-reduce replay. Possibly, the rules may be simplified knowing that the
reduction only needs to preserve the costs of any trace replayed on the net.
      </p>
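      <p>As an illustration of such a rule, the following Python sketch implements one pass of the classical fusion of series transitions (a toy version under our own net encoding; it is not the paper's implementation, and it ignores the transition labels and costs that a cost-preserving reduction must additionally respect).</p>

```python
def fuse_series_transitions(transitions):
    """One pass of the 'fusion of series transitions' reduction rule on a
    toy net encoded as {name: (input_places, output_places)} with sets of
    place names. If place p is produced only by t1 and consumed only by
    t2, and p is t2's sole input, then t2 always fires right after t1,
    so the pair can be merged and p removed."""
    trans = dict(transitions)          # work on a copy
    changed = True
    while changed:
        changed = False
        for t1, (in1, out1) in list(trans.items()):
            for p in out1:
                producers = [t for t, (_, o) in trans.items() if p in o]
                consumers = [t for t, (i, _) in trans.items() if p in i]
                if (len(producers) == 1 and len(consumers) == 1
                        and consumers[0] != t1
                        and trans[consumers[0]][0] == {p}):
                    # merge t2 into t1, dropping the intermediate place p
                    t2 = consumers[0]
                    _, out2 = trans.pop(t2)
                    trans[t1] = (in1, (out1 - {p}) | out2)
                    changed = True
                    break
            if changed:
                break
    return trans

net = {"a": ({"p0"}, {"p1"}), "b": ({"p1"}, {"p2"})}
print(fuse_series_transitions(net))  # {'a': ({'p0'}, {'p2'})}
```

      <p>In a cost-preserving variant, such a merge would only be allowed when it cannot change the cost of any replayed trace, which is exactly the simplification opportunity mentioned above.</p>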
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer-Verlag,
          1st
          edn. (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.:
          <article-title>Decomposing Petri nets for process mining: A generic approach</article-title>
          .
          <source>Distrib. Parallel Dat</source>
          .
          <volume>31</volume>
          (
          <issue>4</issue>
          ),
          <fpage>471</fpage>
          -
          <lpage>507</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.,
          <string-name>
            <surname>Adriansyah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dongen</surname>
            ,
            <given-names>B.F.v.</given-names>
          </string-name>
          :
          <article-title>Replaying history on process models for conformance checking and performance analysis</article-title>
          .
          <source>Data Min. Knowl. Disc</source>
          .
          <volume>2</volume>
          (
          <issue>2</issue>
          ),
          <fpage>182</fpage>
          -
          <lpage>192</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dongen</surname>
            ,
            <given-names>B.F.v.</given-names>
          </string-name>
          :
          <article-title>BPI challenge 2012 data set</article-title>
          (
          <year>2012</year>
          ), doi: 10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dongen</surname>
            ,
            <given-names>B.F.v.</given-names>
          </string-name>
          :
          <article-title>BPI challenge 2015 data set</article-title>
          (
          <year>2015</year>
          ), doi: 10.4121/uuid:31a308ef-c844-48da-948c-305d167a0ec1
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Leemans</surname>
            ,
            <given-names>S.J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fahland</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.:
          <article-title>Discovering block-structured process models from event logs - a constructive approach</article-title>
          . In:
          <string-name>
            <surname>Colom</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Desel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (eds.)
          <source>Application and Theory of Petri Nets and Concurrency. Lect. Notes Comput. Sc.</source>
          , vol.
          <volume>7927</volume>
          , pp.
          <fpage>311</fpage>
          -
          <lpage>329</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Maruster</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weijters</surname>
            ,
            <given-names>A.J.M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.,
          <string-name>
            <surname>Bosch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          v.d.:
          <article-title>A rule-based approach for process discovery: Dealing with noise and imbalance in process logs</article-title>
          .
          <source>Data Min. Knowl. Disc</source>
          .
          <volume>13</volume>
          (
          <issue>1</issue>
          ),
          <fpage>67</fpage>
          -
          <lpage>87</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Munoz-Gama</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carmona</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.:
          <article-title>Conformance checking in the large: Partitioning and topology</article-title>
          . In:
          <string-name>
            <surname>Daniel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (eds.) Business Process Management: 11th International Conference, BPM
          <year>2013</year>
          , Beijing, China, August 26-30,
          <year>2013</year>
          .
          <source>Proceedings. Lect. Notes Comput. Sc.</source>
          , vol.
          <volume>8094</volume>
          , pp.
          <fpage>130</fpage>
          -
          <lpage>145</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Munoz-Gama</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carmona</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.:
          <article-title>Single-entry single-exit decomposed conformance checking</article-title>
          .
          <source>Inf. Syst</source>
          .
          <volume>46</volume>
          ,
          <fpage>102</fpage>
          -
          <lpage>122</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Murata</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Petri nets: Properties, analysis and applications</article-title>
          .
          <source>P. IEEE</source>
          <volume>77</volume>
          (
          <issue>4</issue>
          ),
          <fpage>541</fpage>
          -
          <lpage>580</lpage>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>