Process Model Repair by Detecting Unfitting
                   Fragments?

        Alexey A. Mitsyuk1 , Irina A. Lomazova1 , Ivan S. Shugurov2 , and
                          Wil M.P. van der Aalst3,1
             1
                  National Research University Higher School of Economics,
                       20 Myasnitskaya ul., 101000 Moscow, Russia.
                              {amitsyuk,ilomazova}@hse.ru
                            2
                               Technical University of Munich,
                    Arcisstraße 21, Munich, Bavaria, 80333, Germany.
                                  shugurov94@gmail.com
                          3
                            Eindhoven University of Technology,
                 P.O. Box 513, NL-5600 MB, Eindhoven, The Netherlands.
                                 w.m.p.v.d.aalst@tue.nl


        Abstract. Process models often do not adequately reflect the behavior
        of real-life systems. In the general case, it is possible to construct a
        new adequate model by applying one of the discovery algorithms. At
        the same time, there are cases when the original model is of particular
        value. In such cases, it is better to apply model repair algorithms. Those
        algorithms construct a model which reflects real behavior according to
        some criteria. Moreover, the repaired model remains as similar to the
        original one as possible. This paper proposes a modular approach which
        consists of three parts: (1) decomposing the process model and event
        log into model fragments and sub-logs, (2) selecting the fragments which
        need to be repaired, (3) repairing the selected fragments using a process
        discovery algorithm.

        Keywords: process mining, process model repair, process model de-
        composition, Petri nets, divide and conquer


1     Introduction
Modern process-aware information systems can be designed using model-driven
software engineering approaches. However, the exact implementation of design
models in real-life systems is a rare case. Moreover, processes tend to change
during the system’s life-cycle. A process owner usually wants relevant, up-to-
date models describing the process. In this paper we suggest a method, which
can repair a model.
    Real behavior of a system can be studied by analyzing its event logs. One can
discover the model of a real system from them [3]. Moreover, process engineers
?
    This work is supported by the Basic Research Program of the National Research
    University Higher School of Economics.
can diagnose the discrepancies between observed (event logs) and modeled (pro-
cess models) behaviors using conformance checking techniques [3]. It is possible
to use the results of conformance checking as an input for model repair [8].
    Process model repair aims at improving the quality of a model (according
to some quality criteria), while changing as few of its parts as possible. The
general model repair problem can be defined as follows. The initial model N is a
Petri net. The event log L does not conform to N according to some predefined
criteria. The general repair problem is to find such a model N r (repaired model)
that L better conforms N r according to this criteria and N r is as similar to N
as possible.
    In this paper, we propose a new method for model repair. Our approach is
based on the idea of patch-ups. We find the non-conforming fragments in a model
and replace them with conforming ones. This paper describes the constraints, as
well as pros and cons of this approach. We propose a general modular scheme
which can be implemented using different algorithms. In this paper, we show one
possible implementation of fitness-aware repair. The general scheme employs the
divide & conquer concept.


2   Related Work

One of the first papers in the field of process model repair was published in
2011 [8], where the authors formulated the problem of repair. Later in [10] they
focused on fitness-aware control-flow repair. Models here are labeled workflow
nets. The approach separates non-conforming and conforming behaviors from the
event log. Non-conforming traces are then grouped into non-conforming sub-
logs, and for each of them the corresponding sub-process workflow model is
discovered. Then these sub-process are added to the initial model in such a way
that the repaired model successfully replays the given event log. This approach
also allows to remove infrequently used parts of the model, and hence to make
the repaired model more simple. The problem of model simplification is close
to model repair. [9, 17] present a method for simplifying the model structure in
such a way that it can still replay most of the behavior described in the log.
Using frequency metrics for model simplification is also described in [11].
    Another view on process repair, called impact-driven repair, is introduced
in [16]. This procedure can be applied when a number of repair operations is
limited, and various repairs have different values. Based on the notion of align-
ments [4], the paper suggests assigning costs to change operations. The approach
investigates the space of all possible repairs. Authors proposed and evaluated a
set of algorithms which can seek optimal (w.r.t. to assigned costs for repair oper-
ations) repairs. In other words, the repair for Petri nets has been reduced to an
optimization problem. Plenty of experiments were conducted to select the best
algorithm with suitable parameters.
    One more contribution to the field of model repair is [5], where the well-
structured process models (process trees) are improved w.r.t. the four classical
quality metrics: fitness, precision, generalization, and simplicity. The authors use
a special genetic discovery algorithm, and pay attention to similarity between
the repaired and original models.
    In this paper, we use the divide & conquer principle which has already been
employed within process mining. The core papers on it are [1, 2] and [20]. In [2]
the author considers the task of model and log decompositions in context of pro-
cess mining. The practical questions of applying the divide & conquer principle in
process mining are considered in [20]. In particular, the author proposed a mod-
ular scheme for process discovery. Hompes et al. [12] investigated the so-called
re-composition techniques. These methods help to select the most appropriate
size of decomposition fragments during process discovery [13]. Model and event
log decompositions have been used for boosting the performance of conformance
checking [15].
    In contrast to the existing approaches, this paper presents model repair based
on model decomposition, which allows keeping the initial model structure un-
changed as much as possible. This is important when a source model is readable
and well structured, i.e. it was developed by experts. If there were just minor
local changes in the process, we can repair the model by correcting its local
fragments.


3    Preliminaries

Multi-sets, Functions, and Sequences. Let IN denote the set of natural numbers
(including zero). A multi-set over a set S is a function b : S → IN. ByB(S) we
denote the set of all multi-sets over S. Further by b = [e, e, e, c, d, d] = e3 , c, d2
we designate the multi-set over S = {e, c, d}, where b(e) = 3, b(c) = 1, b(d) = 2.
By abuse of notation, we extend set operations to multi-set in the standard way.
    For a function f : X → Y , dom(f ) denotes its domain, and f : X 9 Y denotes
a partial function. A function f Q is a function projection of a (partial) function
f onto a set Q ⊆ X , iff dom(f Q ) = dom(f ) ∩ Q and ∀x ∈ dom(f         Q ): f Q
                                                                                     (x) =
                                                                                          
f (x). This notation can be extended to multi-sets, e.g. e3 , c, d2 {c,d} = c, d2 .
    By X ∗ we denote the set of all finite sequences over a set X, and we use
triangle brackets for sequences, e.g. σ = hx1 , x2 , ..., xn i is a sequence of length n.
By σ1 · σ2 we denote the concatenation of two sequences, σQ is the projection
of sequence σ onto the set Q.

Process Models. In this paper, a process model is a labeled workflow net. A
Petri net is a triple (P, T, F ), where P and T are disjoint sets of places and
transitions, and F : (P × T ) ∪ (T × P ) → IN is a flow relation. For a transition
t ∈ T a preset •t and a postset t• are defined as the subsets of P such that
•t = {p|F (p, t) 6= 0} and t• = {p|F (t, p) 6= 0}. A labeled Petri net is a tuple
(P, T, F, l), where l: T → UA ∪ {τ } is a labeling function, which maps transitions
to activity labels from UA . A transition t is called invisible, if l(t) = τ , otherwise
it is called visible. A marking of a Petri net is a function M : P → IN, i.e. a multi-
set of places. A transition t ∈ T is enabled in a marking M iff ∀p ∈ P M (p) ≥
F (p, t). An enabled transition t may fire yielding a new marking M 0 , such that
                                                                t
M 0 (p) = M (p) − F (p, t) + F (t, p) for each p ∈ P (denoted M → M 0 ).
    A workflow net (WF-net) N = (P, T, F, l) is a labeled Petri net, such that
(1) there is one source place i ∈ P such that •i = ∅, and Mi = [i],
(2) there is one sink place o ∈ P such that o• = ∅, and Mo = [o],
(3) every node n ∈ P ∪ T is on a path from i to o.
                                            h
                                            t8

                                            c                          f
                                            t3                         t6
                      a         b                        e
             source   t1   c1   t2   c2     d      c3    t5   c4       g      sink
                                            t4                         t7


Fig. 1: The Model of Compensation Requests Processing (activities have the fol-
lowing labels: a = Register request; b = Check ticket; c = Examine casually; d
= Examine thoroughly; e = Decide; f = Send rejection letter; g = Pay compen-
sation; h = Re-initiate request)
     An example of a WF-net is shown in Figure 1. This model is a variation of
the conventional process mining example which can be found in [3]. It represents
processing of compensation requests in a company.
     We will also use the following notations. Tv (N ) is a set of all labeled (visible)
transitions in WF-net N : l(t ∈ Tv ) 6= τ . Tvu (N ) is a set of visible transitions
in WF-net N with unique labels, that is ∀t, t0 ∈ Tvu : l(t) 6= l(t0 ) iff t 6= t0 . The
union of two WF-nets is a WF-net N U = N 1 ∪ N 2 , which is built by the union
of sets of places, transitions, and flows: N 1 ∪N 2 = (P 1 ∪P 2 , T 1 ∪T 2 , F 1 ∪F 2 , lu ),
where lu ∈ (T 1 ∪ T 2 ) 9 UA is a union of l1 and l2 , dom(lu ) = dom(l1 ) ∪ dom(l2 ),
lu (t) = l1 (t) if t ∈ dom(l1 ), and lu (t) = l2 (t) if t ∈ dom(l2 ) \ dom(l1 ).

Event Logs. Let A ⊆ UA be a set of activities. A trace σ is a finite sequence of
activities from A, i.e. σ ∈ A∗ . An event log L is a finite multi-set           of traces, i.e.
L ∈ B(A∗ ), i.e. L = ha, b, c, d, ei3 , ha, b, a, d, ei5 , ha, d, c, b, ei is an event log. A
                                                                          

projection of an event log L = [σ1 , σ2 , . . . , σn ] onto a set of activities B is a log
L B = [σ1 B , σ2 B , . . . , σn B ], where σi B is a projection of the trace σi onto
B.
    Let N be a WF-net with transition labels from A, an initial marking [i], and
a final marking [o]. Let σ be a trace over A. We say that a trace σ = a1 , . . . , ak
                                                                            t1     tk
perfectly fits N iff there exists a sequence of firings [i] = m0 →             ... →  mk+1 =
[o] in N , s.t. the sequence of activities λ(t1 ), λ(t2 ), . . . , λ(tk ) after deleting all
invisible activities coincides with σ. A log L perfectly fits N iff every trace from
L perfectly fits N . E.g. the following traces perfectly fit the model in Figure 1:
[ha, b, c, e, f i, ha, b, c, e, h, b, d, e, gi, ha, b, d, e, gi].

4    Modular Repair Scheme
We consider the following repair problem: given a workflow net N and an event
log L, which does not perfectly fit N , to construct a model N r (repaired model),
so that L perfectly fits N r . We propose a modular approach for model repair.
In this paper, we have deliberately limited the scope of its application but it is
also applicable to the general model repair problem.
    The main idea here is to define a general description for a class of model repair
algorithms, based on the divide & conquer principle. This scheme can be refined
into different repair algorithms by choosing appropriate methods as its actual
procedural parameters. The repair scheme consists of building blocks, each of
which executes one of the steps. A graphical representation of the model repair
scheme is shown in Figure 2. Building blocks are shown with black frames. The
following blocks are present: (1) Decomposition, (2) Projection, (3) Selection,
(4) Repair, (5) Composition, (6) Evaluation. Each block contains an applicable
algorithm. Arcs show the data transfer between building blocks.
    The input of the scheme is a
pair (L, N ) where L is a norma-                      Event
                                                       Log       Input Model Initial

tive event log and N is a non-
conforming initial model. Model N                           Decomposition
is decomposed into several frag-                         Projection
ments using one of decomposi-
tion algorithms. In the projection
block, an algorithm splits the event                         Selection                Model
                                                                                       Parts

log into sub-logs which correspond               Log
                                                Parts
to the model fragments. The se-
lection block contains a confor-                   Log Parts for
                                                                                     «Good»
                                                                      «Bad»           Parts
                                                    the «Bad»
mance checking algorithm. This al-                 Model Parts         Parts

gorithm calculates the conformance                     Repair
level for each pair (Li , N i ). Ac-
cording to this number, all model
                                                                 Composition
fragments are separated into two
                                            Repaired
sets. The first one contains con-             Parts

forming (good ) fragments. The sec-        Evaluation
ond one consists of model frag-
ments which do not conform corre-
sponding sub-logs (bad ). The repair
block contains an algorithm which         Evaluation
                                           Results          Output       Repaired
                                                                           Model

somehow replaces non-conforming
model fragments by conforming
ones. The composition block is                   Fig. 2: Modular Repair Scheme
paired with the decomposition one.
The result can be evaluated in the evaluation block by using any of the confor-
mance checking methods. This is a general scheme.

Definition 1 (Modular Repair Scheme). Let UA denote the universal set of
activities, and UN be the set of all WF-nets with transitions labeled over UA .
    The modular repair scheme is a procedure, which takes an event log L ∈
    ∗
B(UA  ) and a WF-net N ∈ UN as an input and outputs a repaired model N r ,
which perfectly fits L, i.e. N r = ModularRepair (L, N ).
   The modular repair scheme has four procedural parameters eval , split, repair ,
                                  ∗
and compose, where eval ∈ (B(UA     ) × UN ) → [0; 1] is an evaluation function,
             ∗
repair ∈ B(UA ) × UN ) → UN is a repair function, split ∈ UN → P(UN ) is a
decomposition function which is always paired with a related composition function
compose ∈ P(UN ) → UN .
     In Section 5, we propose a specific set of algorithms which can be used to
implement the scheme. Obviously, these algorithms cannot be chosen arbitrar-
ily. The main constraint comes from the requirement of valid decompositions [2].
Valid decomposition assumes that the only elements, which can be shared be-
tween different fragments, are visible transitions with unique (in the initial net)
labels. We slightly modify the definition in [2] for our purpose.
Definition 2 (Valid Decomposition for WF-nets). Let N ∈ UN be a WF-
net with a labeling function l. D = {N 1 , N 2 , . . . , N n } ⊆ UN is a valid decompo-
sition if and only if
  – N i = (P i , T i , F i , li ) is a Petri net for all 1 ≤ i ≤ n,
  – li = lT i for all 1 ≤ i ≤ n,
  – P i ∩ P j = ∅ for 1 ≤ i < j ≤ n,
  – Ti ∩ T  j      u
          S ⊆ Tv (Ni) for 1 ≤ i < j ≤ n, and
  – N = 1≤i≤n N .
D(N ) is the set of all valid decompositions of N .
   It is proved in [2] that each valid decomposition can be used to decompose
both the process discovery and conformance checking. It means that one can split
the model into several fragments using valid decomposition and then unambigu-
ously compose the initial model from model fragments using the corresponding
composition function. More formally, the following statements are valid.
Theorem 1 (Checking for Perfect Fitness Can Be Decomposed [2]).
Let L ∈ B(A∗ ) be an event log with A ⊆ UA and let N ∈ UN be a WF-net. For
any valid decomposition D = {N 1 , N 2 , . . . , N n } ∈ D(N ): L is perfectly fitting
WF-net N if and only if for all 1 ≤ i ≤ n: LAv (N i ) is perfectly fitting N i .
Theorem 2 (Discovery can be decomposed            1 [2]). Let L ∈ B(A∗ ) beSan
event log with A = {a ∈ σ | σ ∈ L} ⊆ UA , C = A , A , . . . , An with A = C,
                                                        2
                    ∗
and disc ∈ B(UA       ) → UN aSdiscovery algorithm. Let distr disc(L, C, disc) =
{N , N , . . . , N } and N = 1≤i≤n N i . N ∈ UN and D = {N 1 , N 2 , . . . , N n }
    1   2         n

is a valid decomposition of N , distr disc(L, C, disc) ∈ D(N ).
Using these results we can formulate the conditions for fitness repair.
Definition 3 (Perfect Fitness Repair for WF-nets). Let L ∈ B(A∗ ) be an
event log with A ⊆ UA , and let N ∈ UN be a WF-net such that fitness(L, N ) < 1.
Let N 0 = ModularRepair (L, N ) be a modular repair scheme. ModularRepair is
a perfect fitness repair if fitness(L, N 0 ) = 1.
Proposition 1 (Sufficient Conditions for Perfect Fitness Repair).
ModularRepair is a perfect fitness repair if
 1. split is a valid decomposition;
 2. repair is a perfect discovery, for any event log it makes perfectly fitting model;
 3. compose is a transition fusion which merge all transitions with equal labels.
Proof. This proposition is a direct corollary of the Theorem 1. Let N be a WF-
net model with fitness(L, N ) < 1. By applying split N is decomposed into a set
of fragments D = {N 1 , N 2 , . . . , N n }. Each fragment N i is then evaluated, let
fi = fitness(LAv (N i ) , N i ). By Theorem 1 ∃j: 1≤j≤n, fitness(LAv (N j ) , N j ) < 1,
since the decomposition is valid. Fragments with not perfect fitness should be re-
paired. Then repair function repair is applied to each such fragment. For each i,
1≤i≤n let Nri = repair (LAv (N i ) , N i ) if fitness(LAv (N i ) , N i ) < 1, and Nri = N i
if fitness(LAv (N i ) , N i ) = 1. Since we use a perfect discovery function, we get
∀i: 1≤i≤n, fitness(L Av (Nri ) , Nri ) = 1. Note that the set of activities was not
changed. By Theorem 2 the set Dr = {Nr1 , Nr2 , . . . , Nrn } is a valid decomposition
of the net Nr = compose(Dr ). Moreover, each fragment perfectly fits the corre-
sponding event log projection. Hence, by Theorem 1 we get fitness(L, Nr ) = 1 .
                                                                                           t
                                                                                           u


5      Modular Repair using Maximal Decomposition,
       Inductive and ILP Miners
The proposed general scheme can use specific algorithms as building blocks. In
particular, the maximal decomposition is a valid decomposition [2]. Maximal
decomposition is based on partitioning of net edges and is defined as follows.
Each edge resides strictly in a single fragment. Transitions with unique labels
are placed on borders of fragments. This is needed to support the causal de-
pendencies between fragments. As a consequence, each transition, which has a
unique label in the original net, will reside in two or more fragments. And finally,
splitting saves an initial marking of the net. A place in a fragment is marked iff
it was marked in the original net. It is important to mention that for a given
system net there can be one and only one maximal decomposition. Maximally
decomposed net can be easily composed back by fusion of transitions with the
same labels. Maximal decomposition is valid by construction [2].
                  SN2                                             SN5
                             h    SN3             SN4              h              SN6
                             t8                                   t8
    SN1                                      c     c                         f     f
                                             t3   t3                         t6   t6
             a    a          b    b                          e    e
    source   t1   t1    c1   t2   t2    c2   d    d     c3   t5   t5    c4   g    g      sink
                                             t4   t4                         t7   t7


             Fig. 3: A Maximal Decomposition of the Model from Figure 1
    We developed Algorithm 1 for constructing the maximal decomposition. It
works as follows. Function decompose starts from an arbitrary place of ini-
tial net. It calls handlePlace function while there are places to traverse. The
handlePlace function is needed to start the recursive procedure handlePlace-
Recursively. This procedure traverses successor and predecessor nodes of a
Algorithm 1 Maximal Decomposition Construction
Input: N = (P, T, F, Minit , Mf inal ) — a Petri net with markings
Output: DN — a decomposition (set of Petri nets)
 1: function decompose(N )
 2:    HandledP laces ← emptymap; DN ← ∅;
 3:    for all p ∈ P do
 4:        if p ∈/ keys(HandledP laces) then
 5:            N 0 = (P 0 , T 0 , F 0 , Minit
                                         0
                                              , Mf0 inal );                    . make new empty fragment
 6:            HandledP laces ← HandledP laces ∪ handlePlace(N , p, N 0 );
 7:            marking(N 0 ) ← CopyM arking(N );
 8:            DN ← DN ∪ {N 0 };
 9:        end if
10:     end for
11:     return DN ;
12: end function
13: function handlePlace(N , p, N 0 )
14:     P Dict, T Dict ← emptymap;               . mapping of nodes from original net to a current fragment
15:     handlePlaceRecursively(N , p, N 0 , P Dict, T Dict);
16:     return P Dict;
17: end function
18: function handlePlaceRecursively(N , p, N 0 , P Dict, T Dict)
19:     if p ∈ keys(P Dict) then
20:         return ;
21:     end if
22:     P Dict[p] ←− createP lace(N 0 , label(p));                 . create new place in N 0 with label of p
23:     handlePrecedingTransitions(N , p, N 0 , P Dict, T Dict);
24:     handleSubsequentTransitions(N , p, N 0 , P Dict, T Dict);
25: end function
26: function handle(Preseding/Subsequent)Transitions(N , p, N 0 , P Dict, T Dict)
27:     for all f ∈ (incoming/outgoing)(N, p) do
28:         t0 ←− null; t ← (source/target)(f );
29:         if t ∈ keys(T Dict) then
30:             t0 ←− T Dict[t];                                   . a transition has already been handled
31:         else
                 0                                  0
32:             t ←− createT ransition(N , label(t), visible(t));
33:             T Dict[t] ←− t0 ;                                                           . new transition
34:         end if
                        0    0                              0
35:         addArc(N , (t /P Dict[p]), (P Dict[p]/t ));
36:         if ¬visible(t0 ) then
37:             for f 0 ∈ (incoming/outgoing)(N, t) do
38:                 handlePlaceRecursively(N , (source/target)(f 0 ), N 0 , P Dict, T Dict);
39:             end for
40:         end if
41:     end for
42: end function


place. The algorithm splits the net fragments if it finds visible transition. Note,
that all visible transitions in the initial net are assumed to have unique labels.
The proof of this algorithm correctness is straightforward and rather technical,
so we omit it here. Figure 3 shows the result of decomposing the example model
(Figure 1) according to this algorithm. It was decomposed into six net fragments.
Note, that the result is always unique.
    A number of process discovery algorithms have been proposed in the litera-
ture [11, 14, 22]. In this paper, we use the Inductive miner [14]. This algorithm
guarantees perfect fitness of a discovered model when executed with particular
settings (inductive miner infrequent with zero noise threshold ). It uses sequential
(and recursive) inductive inferences to build the so-called process trees, which
can be simply translated into well-structured workflow nets. Another discovery
algorithm, which we also use, is ILP (Integer Linear Programming)-based algo-
rithm [22]. It also guarantees perfect fitness with particular settings. Further we
show experiments with both algorithms.

                                                     There have been published a
  SN3               SN3'
             c              d            c
                                                 lot of papers on log-model con-
    b
             t3             t4    b      t3      formance evaluation [4, 7, 15].
    t2  c2   d     s-start  τ  s2 t2 s3  τ s-end
             t4
                           ts1          ts2      Among the main quality di-
                                                 mensions are fitness and preci-
                                                 sion [3]. Fitness measures the
Fig. 4: This Model is Discovered by Inductive proportion of traces in an event
Miner                                            log which can be successfully re-
played by the model. All traces from a perfectly fitting log can be replayed. Pre-
cision shows the proportion of model runs which have corresponding traces in an
event log. That is, perfect precision means the model allows only the behavior
presented in the log.
                                                                                    h
                                                                                    t8
    We consider fitness as                                                    f
                                                                                              t6
                                    a             c              e
the main criteria for eval-     source
                                           bt1    c1             t3           gc3   t5   c4        sink

uation of repair results.           τ                  t2   s3         s-end                  t7
                                            ts1
                                                  τ
                                                                 ts2
For fitness checking we          s-start          s2
                                    d
use alignment-based fit-                   t4

ness evaluation function
(cf. [4] for details). Let
us consider an example.         Fig. 5: The Petri Net after Transition Fusion
Suppose, the model shown in Figure 3 does not correspond exactly to the current
behavior in terms of the order of implementation of events with labels b and d.
Now in contrast to the behavior of the model d always precedes b if both appear
in the trace. Namely, our log contains traces ha, d, b, e, f i and ha, b, c, e, gi, and
there are no traces in which b precedes d. Then by applying the alignment-based
fitness evaluation algorithm to model fragments shown in Figure 3 we obtain that
fragment SN 3 does not perfectly fit the log. So, we discover a correct WF-net
for this fragment using Inductive miner (see Figure 4).

    After discovery of the correct model fragment and fusion of all fragments
we obtain the net shown in Figure 5. This model perfectly fits the log with
traces ha, d, b, e, f i and ha, b, c, e, gi. One can easily see that repaired model is
not a WF-net. It contains tokens in places source and s-start in initial marking.
The final marking for the net consists of two tokens in places sink and s-end.
However, this model can be easily transformed into an equivalent WF-net by
adding invisible start and finish transitions. Note, that we have removed places
s-start and s-end, since they restrict the model behavior and may generate a
deadlock. However, this generates another problem — increases the number of
possible model runs, and hence and reduces the model precision. To avoid such
problems it is better not to change fragment border nodes during the repair
transformations. This can be done by considering larger fragments.
6     Evaluation

We have implemented the approach as a plug-in for ProM Framework [21] which
is a well-known tool in the process mining community. We already described some
details of implementation in [19]. In order to evaluate our method we selected
several artificial WF-nets. For these models we generated event logs using the
tool Gena [18]. Then the initial models were slightly changed. These updated
models were repaired using our approach. Selected results are shown in Table 1.
                     Table 1: Selected Experimental Results


    Two models SM1 and LM2 were selected. Both models are larger than the
example which is shown in Figure 1. One can find the number of nodes in each
model in Size column. Fitness and Precision columns show the corresponding
metrics (calculated using technique from [4]). The Sim column contains similarity
evaluation results. Calculate Graph Edit Distance Similarity ProM plug-in [12]
was used for calculating similarity. In turn, this plug-in is based on the theo-
retical backgrounds presented in [6]. The method returns a number between 0
(models are completely different) and 1 (models are equal). This plug-in takes
into consideration not only the structure of both models but also node labels.
The N-f column shows the overall number of decomposition fragments. A num-
ber of changed (repaired) fragments is shown in N-bf column. We show Time in
milliseconds4 . However, we understand the relativity of this calculation. It is also
interesting to compare the results for repair and direct discovery. For compari-
son, we used two discovery algorithms which provide perfect fitness: Inductive
4
    Test configuration: Intel Core i7-3630QM, 2.40 GHz; 4 GB RAM; Windows 7 x64
and ILP miners. BL models were changed locally — we replaced neighbor tran-
sitions, which were connected via one or two places. BNL models contain more
substantial changes — we replaced distant transitions.
    The results in Table 1 were obtained using the deletion of sink and source
places. An implementation of the algorithm for alignments calculation consumes
a lot of memory in such cases, as it checks all possible behaviors of the model.
In some cases the memory of our machine (4GB) was too small. These cases are
marked with f in corresponding cell. Existing algorithm for similarity calculation
also failed in some cases. These are cases, when a repaired model contains many
silent transitions. Again, the number of possible combinations is too large.
    Table 1 shows how our repair approach provided perfectly fitting models.
However, we need to improve the precision of repaired models. In cases with
local changes (BL) the approach shows acceptable results. It is interesting that
the repair using ILP is faster than direct discovery (even with all the overhead
on checking the fragments’ fitness).


7   Conclusion

This paper proposes a modular approach for process model repair. The approach
can be instantiated using different algorithms as building blocks. The main idea is
to use model decomposition and then replace unfitting model fragments instead
of complete re-discovery. A proposed approach has been evaluated on several
artificial process models.
    The method we propose allows to keep the initial model structure as much
as possible. This is crucial when the initial model is of some special value, i.e. it
has a clear structure and/or was developed by human experts, who will continue
working with it. The experiments we made show that repairing a model based
on its decomposition can be very helpful. However, there are still open problems
and questions.
    First, our approach significantly reduces the precision of the model. To cope
with this problem, we plan to propose more subtle decomposition techniques to
avoid problems with fragment border nodes. The re-composition method [12]
may be also helpful for improving the quality of obtained models.
    Second, we plan to compare the presented approach with other existing model
repair methods. Moreover, the approach evaluation on real-life cases is needed.
It would be also interesting to consider decomposition based model repair taking
into account a combination of fitness, precision, generalization, and simplicity
metrics, and to study the applicability of this method to the cases of non-local
changes.


Acknowledgments. The authors thank the anonymous reviewers for impor-
tant and helpful remarks and suggestions.
References

 1. van der Aalst, W.M.P.: Decomposing Process Mining Problems Using Passages. In:
    Haddad, S., Pomello, L. (eds.) PETRI NETS 2012. LNCS, vol. 7347, pp. 72–91.
    Springer-Verlag, Berlin (2012)
 2. van der Aalst, W.M.P.: Decomposing Petri Nets for Process Mining: A Generic
    Approach. Distributed and Parallel Databases 31(4), 471–507 (2013)
 3. van der Aalst, W.M.P.: Process Mining - Data Science in Action, Second Edition.
    Springer (2016)
 4. Adriansyah, A.: Aligning observed and modeled behavior. Ph.D. thesis, Technische
    Universiteit Eindhoven (2014)
 5. Buijs, J.C.A.M., La Rosa, M., Reijers, H.A., van Dongen, B.F., van der Aalst,
    W.M.P.: Improving Business Process Models Using Observed Behavior. In: Cudre-
    Mauroux, P., Ceravolo, P., Gasevic, D. (eds.) SIMPDA 2012. LNBIP, vol. 162, pp.
    44–59. Springer-Verlag, Berlin (2013)
 6. Dijkman, R., Dumas, M., van Dongen, B.F., Käärik, R., Mendling, J.: Similarity of
    Business Process Models: Metrics and Evaluation. Inf. Syst. 36(2), 498–516 (2011)
 7. van Dongen, B.F., Carmona, J., Chatain, T.: A unified approach for measuring
    precision and generalization based on anti-alignments. In: BPM. LNCS, vol. 9850,
    pp. 39–56. Springer (2016)
 8. Fahland, D., van der Aalst, W.M.P.: Repairing Process Models to Reflect Reality.
    In: Barros, A., Gal, A., Kindler, E. (eds.) In: Proc. of the BPM 2012. LNCS, vol.
    7481, pp. 229–245. Springer (2012)
 9. Fahland, D., van der Aalst, W.M.P.: Simplifying discovered process models in a
    controlled manner. Inf. Syst. 38(4), 585–605 (2013)
10. Fahland, D., van der Aalst, W.M.P.: Model Repair - Aligning Process Models to
    Reality. Inf. Syst. 47, 220–243 (2015)
11. Günther, C.W., van der Aalst, W.M.P.: Fuzzy mining: Adaptive process simplifica-
    tion based on multi-perspective metrics. BPM 2007, pp.328–343., Springer (2007)
12. Hompes, B.F.A.: On Decomposed Process Discovery: How to Solve a Jigsaw Puzzle
    with Friends. Master’s thesis, Eindhoven University of Technology (2014)
13. Hompes, B.F.A., Verbeek, H.M.W., van der Aalst, W.M.P.: Finding suitable ac-
    tivity clusters for decomposed process discovery. In: SIMPDA 2014. vol. 1293, pp.
    16–30. CEUR-WS.org (2014)
14. Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Discovering block-structured
    process models from incomplete event logs. In: Ciardo, G., Kindler, E. (eds.)
    PETRI NETS 2014, LNCS, vol. 8489, pp. 91–110. Springer (2014)
15. Munoz-Gama, J.: Conformance Checking and Diagnosis in Process Mining - Com-
    paring Observed and Modeled Processes, LNBIP, vol. 270. Springer (2016)
16. Polyvyanyy, A., van der Aalst, W.M.P., ter Hofstede, A.H.M., Wynn, M.T.:
    Impact-driven process model repair. ACM TOSEM. 25(4), 28:1–28:60 (2017)
17. de San Pedro, J., Carmona, J., Cortadella, J.: Log-based simplification of process
    models. In: BPM. LNCS, vol. 9253, pp. 457–474. Springer (2015)
18. Shugurov, I.S., Mitsyuk, A.A.: Generation of a Set of Event Logs with Noise. In:
    Proc. of the SYRCoSE 2014. pp. 88–95 (2014)
19. Shugurov, I.S., Mitsyuk, A.A.: Iskra: A Tool for Process Model Repair. Proceedings
    of the Institute for System Programming of the RAS 27(3), 237–254 (2015)
20. Verbeek, H.M.W.: Decomposed Process Mining with Divide And Conquer. In:
    BPM 2014 Demos, vol. 1295, pp. 86–90. CEUR-WS.org (2014)
21. Verbeek, H.M.W., Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.:
    ProM 6: The Process Mining Toolkit. In: La Rosa, M. (ed.) Proc. of BPM Demo
    Track 2010. CEUR-WS.org, vol. 615, pp. 34–39 (2010)
22. van der Werf, J.M.E.M., van Dongen, B.F., Hurkens, C.A.J., Serebrenik, A.: Pro-
    cess discovery using integer linear programming. Fundam. Inform. 94(3-4), 387–412
    (2009)