<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Process Model Repair by Detecting Un tting Fragments?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexey A. Mitsyuk</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irina A. Lomazova</string-name>
          <email>ilomazovag@hse.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan S. Shugurov</string-name>
          <email>shugurov94@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wil M.P. van der Aalst</string-name>
          <email>w.m.p.v.d.aalst@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <addr-line>P.O. Box 513, NL-5600 MB, Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Research University Higher School of Economics</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Technical University of Munich, Arcisstra e 21, Munich</institution>
          ,
          <addr-line>Bavaria, 80333</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Process models often do not adequately re ect the behavior of real-life systems. In the general case, it is possible to construct a new adequate model by applying one of the discovery algorithms. At the same time, there are cases when the original model is of particular value. In such cases, it is better to apply model repair algorithms. Those algorithms construct a model which re ects real behavior according to some criteria. Moreover, the repaired model remains as similar to the original one as possible. This paper proposes a modular approach which consists of three parts: (1) decomposing the process model and event log into model fragments and sub-logs, (2) selecting the fragments which need to be repaired, (3) repairing the selected fragments using a process discovery algorithm.</p>
      </abstract>
      <kwd-group>
        <kwd>process mining</kwd>
        <kwd>process model repair</kwd>
        <kwd>process model decomposition</kwd>
        <kwd>Petri nets</kwd>
        <kwd>divide and conquer</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Modern process-aware information systems can be designed using model-driven</title>
      <p>software engineering approaches. However, the exact implementation of design
models in real-life systems is a rare case. Moreover, processes tend to change
during the system's life-cycle. A process owner usually wants relevant,
up-todate models describing the process. In this paper we suggest a method, which
can repair a model.</p>
    </sec>
    <sec id="sec-2">
      <title>Real behavior of a system can be studied by analyzing its event logs. One can discover the model of a real system from them [3]. Moreover, process engineers</title>
      <p>
        can diagnose the discrepancies between observed (event logs) and modeled
(process models) behaviors using conformance checking techniques [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It is possible
to use the results of conformance checking as an input for model repair [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Process model repair aims at improving the quality of a model (according</title>
      <p>to some quality criteria), while changing as few of its parts as possible. The
general model repair problem can be de ned as follows. The initial model N is a</p>
    </sec>
    <sec id="sec-4">
      <title>Petri net. The event log L does not conform to N according to some prede ned</title>
      <p>criteria. The general repair problem is to nd such a model N r (repaired model)
that L better conforms N r according to this criteria and N r is as similar to N
as possible.</p>
      <p>In this paper, we propose a new method for model repair. Our approach is
based on the idea of patch-ups. We nd the non-conforming fragments in a model
and replace them with conforming ones. This paper describes the constraints, as
well as pros and cons of this approach. We propose a general modular scheme
which can be implemented using di erent algorithms. In this paper, we show one
possible implementation of tness-aware repair. The general scheme employs the
divide &amp; conquer concept.
2</p>
      <sec id="sec-4-1">
        <title>Related Work</title>
        <p>
          One of the rst papers in the eld of process model repair was published in
2011 [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], where the authors formulated the problem of repair. Later in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] they
focused on tness-aware control- ow repair. Models here are labeled work ow
nets. The approach separates non-conforming and conforming behaviors from the
event log. Non-conforming traces are then grouped into non-conforming
sublogs, and for each of them the corresponding sub-process work ow model is
discovered. Then these sub-process are added to the initial model in such a way
that the repaired model successfully replays the given event log. This approach
also allows to remove infrequently used parts of the model, and hence to make
the repaired model more simple. The problem of model simpli cation is close
to model repair. [
          <xref ref-type="bibr" rid="ref17 ref9">9, 17</xref>
          ] present a method for simplifying the model structure in
such a way that it can still replay most of the behavior described in the log.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Using frequency metrics for model simpli cation is also described in [11].</title>
      <p>
        Another view on process repair, called impact-driven repair, is introduced
in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. This procedure can be applied when a number of repair operations is
limited, and various repairs have di erent values. Based on the notion of
alignments [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the paper suggests assigning costs to change operations. The approach
investigates the space of all possible repairs. Authors proposed and evaluated a
set of algorithms which can seek optimal (w.r.t. to assigned costs for repair
operations) repairs. In other words, the repair for Petri nets has been reduced to an
optimization problem. Plenty of experiments were conducted to select the best
algorithm with suitable parameters.
      </p>
    </sec>
    <sec id="sec-6">
      <title>One more contribution to the eld of model repair is [5], where the well</title>
      <p>structured process models (process trees) are improved w.r.t. the four classical
quality metrics: tness, precision, generalization, and simplicity. The authors use
a special genetic discovery algorithm, and pay attention to similarity between
the repaired and original models.</p>
      <p>
        In this paper, we use the divide &amp; conquer principle which has already been
employed within process mining. The core papers on it are [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
the author considers the task of model and log decompositions in context of
process mining. The practical questions of applying the divide &amp; conquer principle in
process mining are considered in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. In particular, the author proposed a
modular scheme for process discovery. Hompes et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] investigated the so-called
re-composition techniques. These methods help to select the most appropriate
size of decomposition fragments during process discovery [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Model and event
log decompositions have been used for boosting the performance of conformance
checking [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>In contrast to the existing approaches, this paper presents model repair based
on model decomposition, which allows keeping the initial model structure
unchanged as much as possible. This is important when a source model is readable
and well structured, i.e. it was developed by experts. If there were just minor
local changes in the process, we can repair the model by correcting its local
fragments.
3</p>
      <sec id="sec-6-1">
        <title>Preliminaries</title>
        <p>Multi-sets, Functions, and Sequences. Let IN denote the set of natural numbers
(including zero). A multi-set over a set S is a function b : S ! IN. By B(S) we
denote the set of all multi-sets over S. Further by b = [e; e; e; c; d; d] = e3; c; d2
we designate the multi-set over S = fe; c; dg, where b(e) = 3; b(c) = 1; b(d) = 2.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>By abuse of notation, we extend set operations to multi-set in the standard way.</title>
      <sec id="sec-7-1">
        <title>For a function f : X ! Y , dom(f ) denotes its domain, and f : X 9 Y denotes</title>
        <p>a partial function. A function f Q is a function projection of a (partial) function
f onto a set Q X , i dom(f Q) = dom(f ) \ Q and 8x 2 dom(f Q): f Q(x) =
f (x). This notation can be extended to multi-sets, e.g. e3; c; d2 fc;dg= c; d2 .</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>By X we denote the set of all nite sequences over a set X, and we use</title>
      <p>triangle brackets for sequences, e.g. = hx1; x2; :::; xni is a sequence of length n.</p>
    </sec>
    <sec id="sec-9">
      <title>By 1 2 we denote the concatenation of two sequences, Q is the projection</title>
      <p>of sequence onto the set Q.</p>
    </sec>
    <sec id="sec-10">
      <title>Process Models. In this paper, a process model is a labeled work ow net. A</title>
      <p>Petri net is a triple (P; T; F ), where P and T are disjoint sets of places and
transitions, and F : (P T ) [ (T P ) ! IN is a ow relation. For a transition
t 2 T a preset t and a postset t are de ned as the subsets of P such that
t = fpjF (p; t) 6= 0g and t = fpjF (t; p) 6= 0g. A labeled Petri net is a tuple
(P; T; F; l), where l: T ! UA [ f g is a labeling function, which maps transitions
to activity labels from UA. A transition t is called invisible, if l(t) = , otherwise
it is called visible. A marking of a Petri net is a function M : P ! IN, i.e. a
multiset of places. A transition t 2 T is enabled in a marking M i 8p 2 P M (p)</p>
    </sec>
    <sec id="sec-11">
      <title>F (p; t). An enabled transition t may re yielding a new marking M 0, such that</title>
      <p>M 0(p) = M (p) F (p; t) + F (t; p) for each p 2 P (denoted M !t M 0).</p>
      <p>A work ow net (WF-net) N = (P; T; F; l) is a labeled Petri net, such that
(1) there is one source place i 2 P such that i = ;, and Mi = [i],
(2) there is one sink place o 2 P such that o = ;, and Mo = [o],
(3) every node n 2 P [ T is on a path from i to o.
h
t8
c
t3
d
t4
f
t6
g
t7
a
source t1
b
t2
c1
c2
e
t5
c3
c4
sink</p>
    </sec>
    <sec id="sec-12">
      <title>An example of a WF-net is shown in Figure 1. This model is a variation of the conventional process mining example which can be found in [3]. It represents processing of compensation requests in a company.</title>
      <p>We will also use the following notations. Tv (N ) is a set of all labeled (visible)
transitions in WF-net N : l(t 2 Tv) 6= . Tvu (N ) is a set of visible transitions
in WF-net N with unique labels, that is 8t; t0 2 Tvu : l(t) 6= l(t0) i t 6= t0. The
union of two WF-nets is a WF-net N U = N 1 [ N 2 , which is built by the union
of sets of places, transitions, and ows: N 1 [N 2 = (P 1 [P 2; T 1 [T 2; F 1 [F 2; lu),
where lu 2 (T 1 [ T 2) 9 UA is a union of l1 and l2, dom(lu) = dom(l1) [ dom(l2),
lu(t) = l1(t) if t 2 dom(l1), and lu(t) = l2(t) if t 2 dom(l2) n dom(l1).</p>
      <sec id="sec-12-1">
        <title>Event Logs. Let A UA be a set of activities. A trace is a nite sequence of activities from A, i.e. 2 A . An event log L is a nite multi-set of traces, i.e.</title>
        <p>L 2 B(A ), i.e. L = ha; b; c; d; ei3; ha; b; a; d; ei5; ha; d; c; b; ei is an event log. A
projection of an event log L = [ 1; 2; : : : ; n] onto a set of activities B is a log
L B = [ 1 B ; 2 B ; : : : ; n B ], where i B is a projection of the trace i onto
B.</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>Let N be a WF-net with transition labels from A, an initial marking [i], and</title>
      <p>a nal marking [o]. Let be a trace over A. We say that a trace = a1; : : : ; ak
perfectly ts N i there exists a sequence of rings [i] = m0 !t1 : : : !tk mk+1 =
[o] in N , s.t. the sequence of activities (t1); (t2); : : : ; (tk) after deleting all
invisible activities coincides with . A log L perfectly ts N i every trace from</p>
    </sec>
    <sec id="sec-14">
      <title>L perfectly ts N . E.g. the following traces perfectly t the model in Figure 1:</title>
      <p>[ha; b; c; e; f i, ha; b; c; e; h; b; d; e; gi, ha; b; d; e; gi].
4</p>
      <sec id="sec-14-1">
        <title>Modular Repair Scheme</title>
      </sec>
    </sec>
    <sec id="sec-15">
      <title>We consider the following repair problem: given a work ow net N and an event</title>
      <p>log L, which does not perfectly t N , to construct a model N r (repaired model),
so that L perfectly ts N r. We propose a modular approach for model repair.</p>
    </sec>
    <sec id="sec-16">
      <title>In this paper, we have deliberately limited the scope of its application but it is</title>
      <p>also applicable to the general model repair problem.</p>
      <p>The main idea here is to de ne a general description for a class of model repair
algorithms, based on the divide &amp; conquer principle. This scheme can be re ned
into di erent repair algorithms by choosing appropriate methods as its actual
procedural parameters. The repair scheme consists of building blocks, each of
which executes one of the steps. A graphical representation of the model repair
scheme is shown in Figure 2. Building blocks are shown with black frames. The
following blocks are present: (1) Decomposition, (2) Projection, (3) Selection,
(4) Repair, (5) Composition, (6) Evaluation. Each block contains an applicable
algorithm. Arcs show the data transfer between building blocks.</p>
      <p>The input of the scheme is a
pair (L; N ) where L is a norma- ELvoegnt Input MInoitdiaell
tive event log and N is a
nonconforming initial model. Model N Decomposition
is decomposed into several frag- Projection
ments using one of
decomposition algorithms. In the projection
block, an algorithm splits the event Selection MPaordtesl
log into sub-logs which correspond Log
to the model fragments. The se- Parts
lmecatniocne chbelockckingcaolngtoariintshma. Tchonisfoarl-- LMothogedPe«alBrPtasadrf»tosr «PBaardts» «GPaorotds»
gorithm calculates the conformance Repair
level for each pair (Li; N i).
According to this number, all model Composition
fragments are separated into two
sets. The rst one contains con- RePpaaritrsed
forming (good ) fragments. The sec- Evaluation
ond one consists of model
fragments which do not conform
corresponding sub-logs (bad ). The repair
block contains an algorithm which EvRaelusuatltiosn Output RMepoadireeld
somehow replaces non-conforming
model fragments by conforming
ones. The composition block is Fig. 2: Modular Repair Scheme
paired with the decomposition one.</p>
    </sec>
    <sec id="sec-17">
      <title>The result can be evaluated in the evaluation block by using any of the conformance checking methods. This is a general scheme.</title>
      <p>De nition 1 (Modular Repair Scheme). Let UA denote the universal set of
activities, and UN be the set of all WF-nets with transitions labeled over UA.</p>
      <p>The modular repair scheme is a procedure, which takes an event log L 2
B(UA) and a WF-net N 2 UN as an input and outputs a repaired model N r ,
which perfectly ts L, i.e. N r = ModularRepair (L; N ).</p>
      <p>The modular repair scheme has four procedural parameters eval , split , repair ,
and compose, where eval 2 (B(UA) UN ) ! [0; 1] is an evaluation function,
repair 2 B(UA) UN ) ! UN is a repair function, split 2 UN ! P(UN ) is a
decomposition function which is always paired with a related composition function
compose 2 P(UN ) ! UN .</p>
    </sec>
    <sec id="sec-18">
      <title>In Section 5, we propose a speci c set of algorithms which can be used to</title>
      <p>
        implement the scheme. Obviously, these algorithms cannot be chosen
arbitrarily. The main constraint comes from the requirement of valid decompositions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-19">
      <title>Valid decomposition assumes that the only elements, which can be shared between di erent fragments, are visible transitions with unique (in the initial net) labels. We slightly modify the de nition in [2] for our purpose.</title>
      <p>De nition 2 (Valid Decomposition for WF-nets). Let N 2 UN be a
WFnet with a labeling function l. D = fN 1; N 2; : : : ; N ng UN is a valid
decomposition if and only if
{ N i = (P i; T i; F i; li) is a Petri net for all 1 i n,
{ li = l T i for all 1 i n,
{ P i \ P j = ; for 1 i &lt; j n,
{ T i \ T j Tvu(N ) for 1 i &lt; j n, and
{ N = S1 i n N i.</p>
      <p>D(N ) is the set of all valid decompositions of N .</p>
    </sec>
    <sec id="sec-20">
      <title>It is proved in [2] that each valid decomposition can be used to decompose</title>
      <p>
        both the process discovery and conformance checking. It means that one can split
the model into several fragments using valid decomposition and then
unambiguously compose the initial model from model fragments using the corresponding
composition function. More formally, the following statements are valid.
Theorem 1 (Checking for Perfect Fitness Can Be Decomposed [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]).
Let L 2 B(A ) be an event log with A UA and let N 2 UN be a WF-net. For
any valid decomposition D = fN 1; N 2; : : : ; N ng 2 D(N ): L is perfectly tting
WF-net N if and only if for all 1 i n: L Av(N i) is perfectly tting N i.
Theorem 2 (Discovery can be decomposed [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). Let L 2 B(A ) be an
event log with A = fa 2 j 2 Lg UA, C = A1; A2; : : : ; An with A = S C,
and disc 2 B(UA) ! UN a discovery algorithm. Let distr disc(L; C; disc) =
fN 1; N 2; : : : ; N ng and N = S1 i n N i. N 2 UN and D = fN 1; N 2; : : : ; N ng
is a valid decomposition of N , distr disc(L; C; disc) 2 D(N ).
      </p>
    </sec>
    <sec id="sec-21">
      <title>Using these results we can formulate the conditions for tness repair.</title>
      <p>De nition 3 (Perfect Fitness Repair for WF-nets). Let L 2 B(A ) be an
event log with A UA, and let N 2 UN be a WF-net such that tness (L; N ) &lt; 1.
Let N 0 = ModularRepair (L; N ) be a modular repair scheme. ModularRepair is
a perfect tness repair if tness (L; N 0) = 1.</p>
      <p>Proposition 1 (Su cient Conditions for Perfect Fitness Repair).
ModularRepair is a perfect tness repair if
1. split is a valid decomposition;
2. repair is a perfect discovery, for any event log it makes perfectly tting model;
3. compose is a transition fusion which merge all transitions with equal labels.
Proof. This proposition is a direct corollary of the Theorem 1. Let N be a
WFnet model with tness(L; N ) &lt; 1. By applying split N is decomposed into a set
of fragments D = fN 1; N 2; : : : ; N ng. Each fragment N i is then evaluated, let
fi = tness(L Av(N i); N i). By Theorem 1 9j: 1 j n; tness(L Av(N j); N j ) &lt; 1,
since the decomposition is valid. Fragments with not perfect tness should be
repaired. Then repair function repair is applied to each such fragment. For each i,
1 i n let Nri = repair (L Av(N i); N i) if tness(L Av(N i); N i) &lt; 1, and Nri = N i
if tness(L Av(N i); N i) = 1. Since we use a perfect discovery function, we get
8i: 1 i n; tness(L Av(Nri); Nri) = 1. Note that the set of activities was not
changed. By Theorem 2 the set Dr = fNr1; Nr2; : : : ; Nrng is a valid decomposition
of the net Nr = compose(Dr). Moreover, each fragment perfectly ts the
corresponding event log projection. Hence, by Theorem 1 we get tness(L; Nr) = 1 .
tu
5</p>
      <p>Modular Repair using Maximal Decomposition,
Inductive and ILP Miners</p>
    </sec>
    <sec id="sec-22">
      <title>The proposed general scheme can use speci c algorithms as building blocks. In</title>
      <p>
        particular, the maximal decomposition is a valid decomposition [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Maximal
decomposition is based on partitioning of net edges and is de ned as follows.
Each edge resides strictly in a single fragment. Transitions with unique labels
are placed on borders of fragments. This is needed to support the causal
dependencies between fragments. As a consequence, each transition, which has a
unique label in the original net, will reside in two or more fragments. And nally,
splitting saves an initial marking of the net. A place in a fragment is marked i
it was marked in the original net. It is important to mention that for a given
system net there can be one and only one maximal decomposition. Maximally
decomposed net can be easily composed back by fusion of transitions with the
same labels. Maximal decomposition is valid by construction [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>SN2
SN1</p>
      <p>a a
source t1 t1
h SN3
t8
b b
t2 t2
c1
c2
c
t3
d
t4</p>
      <p>SN4
c
t3
d
t4</p>
      <p>SN5
h
t8
c3
e e
t5 t5
c4</p>
      <p>SN6
f f
t6 t6
g g
t7 t7
sink</p>
    </sec>
    <sec id="sec-23">
      <title>We developed Algorithm 1 for constructing the maximal decomposition. It</title>
      <p>works as follows. Function decompose starts from an arbitrary place of
initial net. It calls handlePlace function while there are places to traverse. The
handlePlace function is needed to start the recursive procedure
handlePlace</p>
    </sec>
    <sec id="sec-24">
      <title>Recursively. This procedure traverses successor and predecessor nodes of a</title>
    </sec>
    <sec id="sec-25">
      <title>Algorithm 1 Maximal Decomposition Construction</title>
      <p>Input: N = (P; T; F; Minit; Mfinal) | a Petri net with markings
Output: DN | a decomposition (set of Petri nets)
place. The algorithm splits the net fragments if it nds visible transition. Note,
that all visible transitions in the initial net are assumed to have unique labels.</p>
    </sec>
    <sec id="sec-26">
      <title>The proof of this algorithm correctness is straightforward and rather technical,</title>
      <p>so we omit it here. Figure 3 shows the result of decomposing the example model
(Figure 1) according to this algorithm. It was decomposed into six net fragments.</p>
    </sec>
    <sec id="sec-27">
      <title>Note, that the result is always unique.</title>
      <p>
        A number of process discovery algorithms have been proposed in the
literature [
        <xref ref-type="bibr" rid="ref11 ref14">11, 14, 22</xref>
        ]. In this paper, we use the Inductive miner [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. This algorithm
guarantees perfect tness of a discovered model when executed with particular
settings (inductive miner infrequent with zero noise threshold ). It uses sequential
(and recursive) inductive inferences to build the so-called process trees, which
can be simply translated into well-structured work ow nets. Another discovery
algorithm, which we also use, is ILP (Integer Linear Programming)-based
algorithm [22]. It also guarantees perfect tness with particular settings. Further we
show experiments with both algorithms.
      </p>
    </sec>
    <sec id="sec-28">
      <title>There have been published a</title>
      <p>
        SN3 c SN3' d c lot of papers on log-model
contb2 c2 ttd43 s-start ttτs41 s2 tb2 s3 ttτs32 s-end fAomrmoanngcetheevamluaaitnionqu[a4l,it7y, 1d5i]-.
mensions are tness and
precision [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Fitness measures the
Fig. 4: This Model is Discovered by Inductive proportion of traces in an event
Miner log which can be successfully
replayed by the model. All traces from a perfectly tting log can be replayed.
Precision shows the proportion of model runs which have corresponding traces in an
event log. That is, perfect precision means the model allows only the behavior
presented in the log.
h
t8
e
t5 c4
the Wmeaicnocnrsiitdeerria ftonresesvaals- source ta1 c1 b tc3 c3 tgf6 sink
uFaotriontnoefssrecphaeirckrinesgulwtse. s-start tdτs1 s2 t2 s3 tτs2 s-end t7
use alignment-based t- t4
ness evaluation function
(cf. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for details). Let
us consider an example. Fig. 5: The Petri Net after Transition Fusion
      </p>
    </sec>
    <sec id="sec-29">
      <title>Suppose, the model shown in Figure 3 does not correspond exactly to the current</title>
      <p>behavior in terms of the order of implementation of events with labels b and d.</p>
    </sec>
    <sec id="sec-30">
      <title>Now in contrast to the behavior of the model d always precedes b if both appear</title>
      <p>in the trace. Namely, our log contains traces ha; d; b; e; f i and ha; b; c; e; gi, and
there are no traces in which b precedes d. Then by applying the alignment-based
tness evaluation algorithm to model fragments shown in Figure 3 we obtain that
fragment SN 3 does not perfectly t the log. So, we discover a correct WF-net
for this fragment using Inductive miner (see Figure 4).</p>
    </sec>
    <sec id="sec-31">
      <title>After discovery of the correct model fragment and fusion of all fragments</title>
      <p>we obtain the net shown in Figure 5. This model perfectly ts the log with
traces ha; d; b; e; f i and ha; b; c; e; gi. One can easily see that repaired model is
not a WF-net. It contains tokens in places source and s-start in initial marking.</p>
    </sec>
    <sec id="sec-32">
      <title>The nal marking for the net consists of two tokens in places sink and s-end.</title>
      <p>However, this model can be easily transformed into an equivalent WF-net by
adding invisible start and nish transitions. Note, that we have removed places
s-start and s-end, since they restrict the model behavior and may generate a
deadlock. However, this generates another problem | increases the number of
possible model runs, and hence and reduces the model precision. To avoid such
problems it is better not to change fragment border nodes during the repair
transformations. This can be done by considering larger fragments.</p>
      <sec id="sec-32-1">
        <title>Evaluation</title>
      </sec>
    </sec>
    <sec id="sec-33">
      <title>We have implemented the approach as a plug-in for ProM Framework [21] which</title>
      <p>
        is a well-known tool in the process mining community. We already described some
details of implementation in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. In order to evaluate our method we selected
several arti cial WF-nets. For these models we generated event logs using the
tool Gena [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Then the initial models were slightly changed. These updated
models were repaired using our approach. Selected results are shown in Table 1.
      </p>
      <p>
        Two models SM1 and LM2 were selected. Both models are larger than the
example which is shown in Figure 1. One can nd the number of nodes in each
model in Size column. Fitness and Precision columns show the corresponding
metrics (calculated using technique from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). The Sim column contains similarity
evaluation results. Calculate Graph Edit Distance Similarity ProM plug-in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
was used for calculating similarity. In turn, this plug-in is based on the
theoretical backgrounds presented in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The method returns a number between 0
(models are completely di erent) and 1 (models are equal). This plug-in takes
into consideration not only the structure of both models but also node labels.
The N-f column shows the overall number of decomposition fragments. A
number of changed (repaired) fragments is shown in N-bf column. We show Time in
milliseconds4. However, we understand the relativity of this calculation. It is also
interesting to compare the results for repair and direct discovery. For
comparison, we used two discovery algorithms which provide perfect tness: Inductive
4 Test con guration: Intel Core i7-3630QM, 2.40 GHz; 4 GB RAM; Windows 7 x64
and ILP miners. BL models were changed locally | we replaced neighbor
transitions, which were connected via one or two places. BNL models contain more
substantial changes | we replaced distant transitions.
      </p>
    </sec>
    <sec id="sec-34">
      <title>The results in Table 1 were obtained using the deletion of sink and source</title>
      <p>places. An implementation of the algorithm for alignments calculation consumes
a lot of memory in such cases, as it checks all possible behaviors of the model.</p>
    </sec>
    <sec id="sec-35">
      <title>In some cases the memory of our machine (4GB) was too small. These cases are</title>
      <p>marked with f in corresponding cell. Existing algorithm for similarity calculation
also failed in some cases. These are cases, when a repaired model contains many
silent transitions. Again, the number of possible combinations is too large.</p>
      <p>Table 1 shows how our repair approach provided perfectly tting models.</p>
    </sec>
    <sec id="sec-36">
      <title>However, we need to improve the precision of repaired models. In cases with local changes (BL) the approach shows acceptable results. It is interesting that the repair using ILP is faster than direct discovery (even with all the overhead on checking the fragments' tness).</title>
      <p>7</p>
      <sec id="sec-36-1">
        <title>Conclusion</title>
      </sec>
    </sec>
    <sec id="sec-37">
      <title>This paper proposes a modular approach for process model repair. The approach</title>
      <p>can be instantiated using di erent algorithms as building blocks. The main idea is
to use model decomposition and then replace un tting model fragments instead
of complete re-discovery. A proposed approach has been evaluated on several
arti cial process models.</p>
    </sec>
    <sec id="sec-38">
      <title>The method we propose allows to keep the initial model structure as much</title>
      <p>as possible. This is crucial when the initial model is of some special value, i.e. it
has a clear structure and/or was developed by human experts, who will continue
working with it. The experiments we made show that repairing a model based
on its decomposition can be very helpful. However, there are still open problems
and questions.</p>
    </sec>
    <sec id="sec-39">
      <title>First, our approach signi cantly reduces the precision of the model. To cope</title>
      <p>
        with this problem, we plan to propose more subtle decomposition techniques to
avoid problems with fragment border nodes. The re-composition method [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
may be also helpful for improving the quality of obtained models.
      </p>
    </sec>
    <sec id="sec-40">
      <title>Second, we plan to compare the presented approach with other existing model</title>
      <p>repair methods. Moreover, the approach evaluation on real-life cases is needed.</p>
    </sec>
    <sec id="sec-41">
      <title>It would be also interesting to consider decomposition based model repair taking into account a combination of tness, precision, generalization, and simplicity metrics, and to study the applicability of this method to the cases of non-local changes.</title>
    </sec>
    <sec id="sec-42">
      <title>Acknowledgments. The authors thank the anonymous reviewers for important and helpful remarks and suggestions.</title>
      <p>21. Verbeek, H.M.W., Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.:
ProM 6: The Process Mining Toolkit. In: La Rosa, M. (ed.) Proc. of BPM Demo
Track 2010. CEUR-WS.org, vol. 615, pp. 34{39 (2010)
22. van der Werf, J.M.E.M., van Dongen, B.F., Hurkens, C.A.J., Serebrenik, A.:
Process discovery using integer linear programming. Fundam. Inform. 94(3-4), 387{412
(2009)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Decomposing Process Mining Problems Using Passages</article-title>
          . In: Haddad,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Pomello</surname>
          </string-name>
          ,
          <string-name>
            <surname>L</surname>
          </string-name>
          . (eds.)
          <source>PETRI NETS 2012. LNCS</source>
          , vol.
          <volume>7347</volume>
          , pp.
          <volume>72</volume>
          {
          <fpage>91</fpage>
          . Springer-Verlag, Berlin (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Decomposing Petri Nets for Process Mining: A Generic Approach</article-title>
          .
          <source>Distributed and Parallel Databases</source>
          <volume>31</volume>
          (
          <issue>4</issue>
          ),
          <volume>471</volume>
          {
          <fpage>507</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          : Process Mining - Data Science in Action,
          <source>Second Edition</source>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Adriansyah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Aligning observed and modeled behavior</article-title>
          .
          <source>Ph.D. thesis</source>
          , Technische Universiteit Eindhoven (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Buijs</surname>
            ,
            <given-names>J.C.A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>La</given-names>
            <surname>Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Reijers</surname>
          </string-name>
          , H.A.,
          <string-name>
            <surname>van Dongen</surname>
            ,
            <given-names>B.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Improving Business Process Models Using Observed Behavior</article-title>
          . In: CudreMauroux,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Ceravolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Gasevic</surname>
          </string-name>
          ,
          <string-name>
            <surname>D</surname>
          </string-name>
          . (eds.)
          <article-title>SIMPDA 2012</article-title>
          .
          <article-title>LNBIP</article-title>
          , vol.
          <volume>162</volume>
          , pp.
          <volume>44</volume>
          {
          <fpage>59</fpage>
          . Springer-Verlag, Berlin (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dijkman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumas</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>van Dongen</surname>
            ,
            <given-names>B.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ka</surname>
            <given-names></given-names>
          </string-name>
          <article-title>arik</article-title>
          , R.,
          <string-name>
            <surname>Mendling</surname>
          </string-name>
          , J.:
          <source>Similarity of Business Process Models: Metrics and Evaluation. Inf. Syst</source>
          .
          <volume>36</volume>
          (
          <issue>2</issue>
          ),
          <volume>498</volume>
          {
          <fpage>516</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. van Dongen,
          <string-name>
            <given-names>B.F.</given-names>
            ,
            <surname>Carmona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Chatain</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          :
          <article-title>A uni ed approach for measuring precision and generalization based on anti-alignments</article-title>
          .
          <source>In: BPM. LNCS</source>
          , vol.
          <volume>9850</volume>
          , pp.
          <volume>39</volume>
          {
          <fpage>56</fpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fahland</surname>
            , D., van der Aalst,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Repairing Process Models to Re ect Reality</article-title>
          . In: Barros,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Gal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Kindler</surname>
          </string-name>
          , E. (eds.) In
          <source>: Proc. of the BPM 2012. LNCS</source>
          , vol.
          <volume>7481</volume>
          , pp.
          <volume>229</volume>
          {
          <fpage>245</fpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Fahland</surname>
            , D., van der Aalst,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Simplifying discovered process models in a controlled manner</article-title>
          .
          <source>Inf. Syst</source>
          .
          <volume>38</volume>
          (
          <issue>4</issue>
          ),
          <volume>585</volume>
          {
          <fpage>605</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Fahland</surname>
            , D., van der Aalst,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <string-name>
            <surname>Model Repair - Aligning Process</surname>
          </string-name>
          Models to Reality. Inf. Syst.
          <volume>47</volume>
          ,
          <issue>220</issue>
          {
          <fpage>243</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Gunther, C.W., van der Aalst,
          <string-name>
            <surname>W.M.P.</surname>
          </string-name>
          :
          <article-title>Fuzzy mining: Adaptive process simpli cation based on multi-perspective metrics</article-title>
          .
          <source>BPM</source>
          <year>2007</year>
          , pp.
          <volume>328</volume>
          {
          <fpage>343</fpage>
          ., Springer (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hompes</surname>
            ,
            <given-names>B.F.A.</given-names>
          </string-name>
          :
          <article-title>On Decomposed Process Discovery: How to Solve a Jigsaw Puzzle with Friends</article-title>
          .
          <source>Master's thesis</source>
          , Eindhoven University of Technology (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hompes</surname>
            ,
            <given-names>B.F.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verbeek</surname>
            , H.M.W., van der Aalst,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Finding suitable activity clusters for decomposed process discovery</article-title>
          .
          <source>In: SIMPDA 2014</source>
          . vol.
          <volume>1293</volume>
          , pp.
          <volume>16</volume>
          {
          <fpage>30</fpage>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Leemans</surname>
            ,
            <given-names>S.J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fahland</surname>
            , D., van der Aalst,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Discovering block-structured process models from incomplete event logs</article-title>
          . In: Ciardo,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Kindler</surname>
          </string-name>
          , E. (eds.)
          <source>PETRI NETS</source>
          <year>2014</year>
          ,
          <article-title>LNCS</article-title>
          , vol.
          <volume>8489</volume>
          , pp.
          <volume>91</volume>
          {
          <fpage>110</fpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Munoz-Gama</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Conformance Checking and Diagnosis in Process Mining - Comparing Observed and Modeled Processes, LNBIP</article-title>
          , vol.
          <volume>270</volume>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Polyvyanyy</surname>
          </string-name>
          , A.,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>ter Hofstede</surname>
            ,
            <given-names>A.H.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wynn</surname>
          </string-name>
          , M.T.:
          <article-title>Impact-driven process model repair</article-title>
          .
          <source>ACM TOSEM</source>
          .
          <volume>25</volume>
          (
          <issue>4</issue>
          ),
          <volume>28</volume>
          :1{
          <fpage>28</fpage>
          :
          <fpage>60</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. de San Pedro, J.,
          <string-name>
            <surname>Carmona</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cortadella</surname>
          </string-name>
          , J.:
          <article-title>Log-based simpli cation of process models</article-title>
          .
          <source>In: BPM. LNCS</source>
          , vol.
          <volume>9253</volume>
          , pp.
          <volume>457</volume>
          {
          <fpage>474</fpage>
          . Springer (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Shugurov</surname>
            ,
            <given-names>I.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitsyuk</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Generation of a Set of Event Logs with Noise</article-title>
          .
          <source>In: Proc. of the SYRCoSE 2014</source>
          . pp.
          <volume>88</volume>
          {
          <issue>95</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Shugurov</surname>
            ,
            <given-names>I.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitsyuk</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Iskra: A Tool for Process Model Repair</article-title>
          .
          <source>Proceedings of the Institute for System Programming of the RAS</source>
          <volume>27</volume>
          (
          <issue>3</issue>
          ),
          <volume>237</volume>
          {
          <fpage>254</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Verbeek</surname>
            ,
            <given-names>H.M.W.</given-names>
          </string-name>
          :
          <article-title>Decomposed Process Mining with Divide And Conquer</article-title>
          .
          <source>In: BPM 2014 Demos</source>
          , vol.
          <volume>1295</volume>
          , pp.
          <volume>86</volume>
          {
          <fpage>90</fpage>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>