<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ILP-Based Process Discovery Using Hybrid Regions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S.J. van Zelst</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>B.F. van Dongen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>W.M.P. van der Aalst</string-name>
          <email>w.m.p.v.d.aalstg@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science Eindhoven University of Technology</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <fpage>47</fpage>
      <lpage>61</lpage>
      <abstract>
        <p>The language-based theory of regions, stemming from the area of Petri net synthesis, forms a fundamental basis for Integer Linear Programming (ILP)-based process discovery. Based on example behavior in an event log, a process model is derived that aims to describe the observed behavior. Building on top of the existing ILP-formulation, we present a new ILP-based process discovery formulation that unifies two existing types of language-based regions and, additionally, we present a generalized ILP objective function that captures both region-types and helps us to find suitable process discovery results.</p>
      </abstract>
      <kwd-group>
        <kwd>Process mining</kwd>
        <kwd>process discovery</kwd>
        <kwd>integer linear programming</kwd>
        <kwd>region theory</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Process Mining [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] forms the connecting element between business process modeling
and data analysis. Its main aim is to extract knowledge from business process execution
data originating from a variety of data sources, e.g., enterprise information systems,
social media, embedded systems etc. Within process mining, three main branches are
distinguished being process discovery, conformance checking and process enhancement.
Within process discovery the aim is to, given an event log, reconstruct a process model.
Within conformance checking the aim is to assess, given a process model and an event
log, the conformance of the event log to the process model. Within process
enhancement the aim is to further improve and/or align existing process models by
combining the two aforementioned disciplines, e.g., identification and removal of bottlenecks
within business processes.
      </p>
      <p>
        The area of Petri net synthesis [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is concerned with deciding whether there exists
a Petri net that exactly describes a given sequential behavioral system description. A
subclass of synthesis techniques is the area of region theory which is both defined on
transition systems, known as state-based region theory, and on languages, known as
language-based region theory.
      </p>
      <p>
        Language-based region theory forms a fundamental basis for ILP-based process
discovery [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Within process discovery however, a process model is typically valued
w.r.t. four essential process mining quality dimensions being replay-fitness, precision,
generalization and simplicity [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Applying traditional Petri net synthesis techniques
in a process discovery context will result in models that have perfect replay-fitness,
maximal precision, low generalization and often low simplicity.
      </p>
      <p>In this paper we propose a unified approach w.r.t. ILP-based process discovery,
based on the newly introduced concept of hybrid variable-based regions. Hybrid
variablebased regions unite two existing language-based region definitions and allow us to vary
the number of variables used within the ILP problems being solved. Therefore, we
assess the impact of using a different number of variables on the average computation
time of solving ILPs during ILP-based process discovery. Additionally we present a
new generalized ILP objective function that supports hybrid variable-based regions and
show that the objective function yields correct process discovery results.</p>
      <p>The outline of this paper is as follows. In Section 2 we present basic preliminaries.
In Section 3 we present language-based regions. In Section 4 we show how to adopt
language-based regions within process discovery using ILP. In Section 5 we present
a brief evaluation of the performance of the approach. Section 6 covers related work.
Section 7 concludes the paper.
2
2.1</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <sec id="sec-2-1">
        <title>Bags, Sequences and Vectors</title>
        <p>Let Z denote the set of integers, let N denote the set of positive integers including 0
and let N+ denote the set of positive integers excluding 0. Let R denote the set of real
numbers and let R+ denote the set of positive real numbers excluding 0.</p>
        <p>Let S denote a set. Let us define a bag B as a function B : S ! N. Notation-wise
we write a bag as [e1n1 ; e2n2 ; :::; e3n3 ], where ei 2 S, ni 2 N+ and eini B(ei) = ni. If
for some element e, B(e) = 1, we omit its superscript. An empty bag is denoted as ;.
As an example let S1 = fa; b; c; dg and let B1 denote a bag consisting of 3 a’s, 5 b’s,
1 c and 0 d’s, i.e. B1 = [a3; b5; c]. Element inclusion applies to bags as well, i.e. given
e 2 S and B(e) &gt; 0 then also e 2 B. Thus we have a 2 B1 whereas d 2= B1.</p>
        <p>A sequence is a function relating positions to elements e 2 S, i.e. : f1; 2; :::;
kg ! S. An empty sequence is denoted as . We write every non-empty sequence as
he1; e2; :::; eki. The set of all possible sequences over a set S is denoted as S . We define
concatenation of sequences 1 and 2 as 1 2, e.g., ha; bi hc; di = ha; b; c; di. If we
are given a set X of sequences over S, i.e., X S , we define X’s prefix-closure as
X = f 1 2 S j9 22S ( 1 2 2 X)g. Let X S , X is prefix-closed if X = X. Let
X S and let BX : X ! N be a bag, we define BX ’s prefix closure as BX : X ! N
BX ( ) = B( ) +</p>
        <p>X
hei2X</p>
        <p>BX (
hei)
Thus, for set X1 = fha; bi; ha; cig we have X1 = f ; hai; ha; bi; ha; cig whereas for
bag B2 = [ha; bi5; ha; ci3] we have B2 = [ 8; a 8
h i ; ha; bi5; ha; ci3].</p>
        <p>Let S be a set on which we can pose some total order and let R R be a range
of values. Vectors are denoted as ~z 2 RjSj, where ~z(e) 2 R and e 2 S. A vector
~z represents a column vector. For vector multiplication we assume vectors to agree
on their indices. Thus, given a totally ordered set S = fe1; e2; :::; eng and ~z1; ~z2 2
RjSj we have ~z1|~z2 = Pi2f1;2;:::;ng ~z1(ei)~z2(ei). Parikh vectors represent the number
of occurrences of a certain element within a sequence. A Parikh vector is a function
p~ : S ! NjSj with p~( ) = ( e1 ; e2 ; :::; en ) where ei = jfj j 1 j j j;
(j) = eigj. As an example consider S = fa; b; c; dg, 1 = ha; b; di and 2 = ha; c; di.
We have p~( 1)| = (1; 1; 0; 1) and p~( 2)| = (1; 0; 1; 1). Note that 1 b = 1 whereas
2 b = 0. Given a Parikh vector p~ : S ! NjSj and a set Q S, we define p~Q : S !
NjQj. Thus given S = fa; b; c; dg and 1 = ha; b; di we have p~fa;c;dg( 1)| = (1; 0; 1).
2.2</p>
      </sec>
      <sec id="sec-2-2">
        <title>Event Logs and Petri Nets</title>
        <p>The main goal of process discovery is to describe the observed behavior within an
event log by means of a process model. Thus, event logs act as the input of process
discovery whereas some process model acts as the output of process discovery. An
abstract example of an event log is presented in Table 1. Often more data attributes are
available in an event log, e.g., customer id, credit balance etc. In this paper we focus on
the sequential ordering of activities w.r.t. cases, i.e. the control-flow perspective.
Definition 1 (Event log). Let A be a set of activities, an event log L is a bag of
sequences over A, i.e., L : A ! N.1
A sequence of activities 2 L is referred to as a trace. We assume that any given set
of activities A is totally ordered (or we are able to trivially pose a total ordering on A).</p>
        <p>We consider Petri nets to describe process models. A Petri net is a bipartite graph
consisting of a set of vertices called places and a set of vertices called transitions. Arcs
(directed edges) are connecting places and transitions and have an associated positive
weight.</p>
        <p>Definition 2 (Petri net). A Petri net is a 3-tuple N = (P; T; W ), where P is a set of
places and T is a set of transitions with P \ T = ;. W is the weighted flow relation of
N , W : (P T ) [ (T P ) ! N.</p>
        <p>For a given node x 2 P [ T , the pre-set of x in N is defined as x = fy j W (y;
x) &gt; 0g. Correspondingly x = fy j W (x; y) &gt; 0g denotes the post-set of x in
N . Graphically we represent places as circles and transitions as boxes. For every (x;
y) 2 (P T ) [ (T P ) with W (x; y) &gt; 0 we draw an arc from x to y. If W (x; y) &gt; 1</p>
        <p>1In practice we extract an event log L from some information system. Consequently A is
implicitly defined by the event log, i.e., A only consists of events that are actually present L.
we denote the arc’s weight W (x; y) on top of the arc from x to y. A Petri net is pure
if it does not contain self-loops, i.e., if W (x; y) &gt; 0 then W (y; x) = 0. An example
(pure) Petri net is depicted in Figure 1.</p>
        <p>Definition 3 (Marked Petri net). Let N = (P; T; W ) be a Petri net. A marking of N
is a bag over N ’s places, i.e. M : P ! N. A marked Petri net is a pair (N; M0), where
M0 represents N ’s initial marking.</p>
        <p>Let N = (P; T; W ) be a Petri net and let M be a marking of N . A transition t 2 T
is enabled, denoted (N; M )[ti, if and only if 8p2 t(M (p) W (p; t)). Graphically
we represent a marking by drawing exactly a place’s marking-multiplicity number of
dots inside the place (e.g. p1 in Figure 1 with M0 = [p1]). If a transition t is enabled
in (N; M ), t may fire, resulting in a new marking M 0. When t fires, denoted as (N;
M )[ti(N; M 0), we have 8p2P (M 0(p) = M (p) W (p; t) + W (t; p)). For example in
Figure 1 we have (N1; [p1])[ai(N1; [p2]).</p>
        <p>Definition 4 (Firing sequences). Let N = (P; T; W ) be a Petri net and let (N; M0)
be a corresponding marked Petri net. Sequence = ht1; t2; :::; tni 2 T is a firing
sequence of (N; M0), written as (N; M0)[ i(N; Mn) if and only if for n = j j there
exist markings M0; M1; M2; :::; Mn such that (N; M0)[t1i(N; M1), (N; M1)[t2i(N;
M2); :::; (N; Mn 1)[tni(N; Mn).</p>
        <p>Note that infinitely many firing sequences can exist given a Petri net. Some example
firing sequences of the Petri net depicted in Figure 1 are: (N1; [p1])[haii(N1; [p2]),
(N1; [p1])[ha; bii(N1; [p3]) and (N1; [p1])[ha; c; d; e; f; e; f; e; gii(N1; ;). The set of all
possible firing sequences in a Petri net N is called N 0s language, i.e., all sequences
2 T s.t. (N; M0)[ i(N; Mi). N ’s language is denoted as L(N ) T and is
prefixclosed.</p>
        <p>Consider Figure 1 and event log L1 = [ha; b; d; e; gi10; ha; c; d; e; gi9; ha; b; d; e;
f; e; gi11; ha; c; d; e; f; e; gi8] over A1 = fa; b; c; d; e; f; gg. Clearly we have L1
L(N1), and thus replay-fitness of L1 on N1 is perfect. We will use N1, A1 and L1
throughout the paper as a running-example.
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Regions</title>
      <p>The concept of regions forms the basis of region theory within Petri net synthesis. Given
an event log L over a set of activities A, language-based regions are used to represent
a
p1
p2</p>
      <p>d
p3
p4
p5</p>
      <p>g
b
c
e
f
places in a resulting Petri net N = (P; A; W ). A language-based region maps every
a 2 A to an integer representing arc-weight. For such mapping over A to be a region
we pose the restriction that the corresponding place p 2 P should not block any 2 L,
i.e., 2 L ) 2 L(N ). We identify two basic definitions of language-based regions
which we classify as single variable-based regions and dual variable-based regions.
The main difference is the number of variables used to define a region.</p>
      <sec id="sec-3-1">
        <title>Definition 5 (Single variable-based regions). Let L be an event log over a set of ac</title>
        <p>tivities A. Let m 2 N and let ~v 2 ZjAj. A pair r = (m; ~v) is a single variable-based
region iff:
8 2L(m + p~( )|~v
0)
(3.1)
Let Rs(L) denote the set of all possible single variable-based regions of L. Equation 3.1
generates a set of linear inequalities, i.e. applying Equation 3.1 on L1 yields:
m
m + ~v(a)
m + ~v(a) + ~v(b)</p>
        <p>...
m + ~v(a) + ~v(b) + ~v(d) + 2~v(e) + ~v(f) + ~v(g)
m + ~v(a) + ~v(c) + ~v(d) + 2~v(e) + ~v(f) + ~v(g)
...</p>
        <p>Single variable-based regions use one single decision variable for each a 2 A,
represented by ~v 2 ZjAj. Expressing a single variable-based region r = (m; ~v) as
a place p 2 P in a marked net (N; M0) with N = (P; A; W ) is straightforward.
We have M0(p) = m, if ~v(a) &gt; 0 then W (a; p) = ~v(a), if ~v(a) &lt; 0 then W (p;
a) = ~v(a) and if ~v(a) = 0 then W (a; p) = W (p; a) = 0. Consider place p2 depicted
in Figure 1 which can be represented as a single variable-based region r = (m; (~v(a);
~v(b); :::; ~v(g))) = (0; (1; 1; 1; 0; 0; 0; 0)). Note that for each inequality generated by
Equation 3.1, place p2 has a value of at least 0 and hence is a member of Rs(L).</p>
        <p>
          Single variable-based regions only allow us to synthesize/discover pure Petri nets.
As a consequence we can not find self-loops, i.e. we can not find a place p in a resulting
net N = (P; A; W ) s.t. p 2 a\a , for any a 2 A. A lot of workflow patterns [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] in fact
exhibit places that include self-loops, e.g. milestone patterns, mutual-exclusion patterns,
priority patterns etc. Hence, we define dual variable-based regions which explicitly
allow us to distinguish between incoming and outgoing arcs.
        </p>
        <p>Definition 6 (Dual variable-based regions). Let L be an event log over a set of
activities A. Let m 2 N and ~x; ~y 2 NjAj. A triple r = (m; ~x; ~y) is a dual variable-based
region iff:
8 = 0 hai2L(m + p~( 0)|~x
p~( )|~y
0)
(3.2)
Let Rd(L) denote the set of all possible dual variable-based regions of L. Like
Definition 5, Definition 6 generates a set of linear inequalities. Applying Definition 6 on L1
yields:
m
m
y~(a) + ~x(a)
y~(a) + ~x(a)
y~(b) + ~x(b)
y~(c) + ~x(c)
2y~(e) + 2~x(e)
2y~(e) + 2~x(e)
y~(f) + ~x(f)
y~(f) + ~x(f)
y~(g)
y~(g)
...
0 hai
0 ha; bi
0 ha; ci</p>
        <p>...
0 ha; b; d; e; f; e; gi
0 ha; c; d; e; f; e; gi
m
m</p>
        <p>m y~(a)
y~(a) + ~x(a)
y~(a) + ~x(a)</p>
        <p>...
y~(d) + ~x(d)
y~(d) + ~x(d)
y~(b)
y~(c)</p>
        <p>Dual variable-based regions use two decision variables per a 2 A, represented by
~x; ~y 2 NjAj. The variables allow us to distinguish between incoming arcs and outgoing
arcs when translating regions to Petri nets. Expressing a dual variable-based region
r = (m; ~x; ~y) as a place p 2 P in marked net (N; M0) with N = (P; A; W ) is again
straightforward. We have M0(p) = m, W (a; p) = ~x(a) and W (p; a) = ~y(a). Again
consider place p2 depicted in Figure 1 which can be represented as a dual variable-based
region r = (m; (~x(a); ~x(b); :::; ~x(g)); (~y(a); ~y(b); :::; ~y(g))) = (0; (1; 0; 0; 0; 0; 0; 0);
(0; 1; 1; 0; 0; 0; 0)). Verify that for each linear in-equality generated by Definition 6,
place p2 has a value of at least 0 and hence is a member of Rd(L).
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Hybrid Variable-Based Regions in Process Discovery</title>
      <p>Using dual variable-based regions allows us to express non-pure places, i.e, self-loops,
milestones etc. However, this type of regions uses roughly twice as many variables
compared with single variable-based regions. To balance the number of variables used,
though still enhance the possibility of finding non-pure Petri net structures, we
introduce the new concept of hybrid regions, capturing both single and dual variable-based
regions.</p>
      <p>Definition 7 (Hybrid variable-based regions). Let L be an event log over a set of
activities A. Let As; Ad A be two sets of activities with As [ Ad = A and As \ Ad =
;. Let m 2 N, ~v 2 ZjAsj and ~x; ~y 2 NjAdj. A quadruple r = (m; ~v; ~x; ~y) is a hybrid
variable-based region iff:
(4.1)
(4.2)
8 = 0 hai2L(m + p~As ( )|~v + p~Ad ( 0)|~x
Equation 4.2 does not equal Equation 3.1, however, as p~( ) = ~0 and m 2 N, the
equations are equivalent in this context.</p>
      <p>Note that the set of hybrid variable-based regions, i.e. the set of variable assignments
that adhere to Definition 7 depends on the hybrid partition of A into As and Ad.
Therefore, we let As act as a parameter for the set of feasible hybrid variable-based regions.
Let RAhs (L) denote the set of all possible hybrid variable-based regions of L where As
represent a hybrid partition of A. Note RAh(L) = Rs(L) and Rh(L) = Rd(L). Region
;
r = (0; ~0; ~0; ~0) is deemed the trivial region.</p>
      <p>All three language-based region definitions, i.e. single, dual and hybrid
variablebased regions, provide means to accept and/or reject potential places in a to be
constructed Petri net. In classical Petri net synthesis approaches, one keeps looking for
feasible places until either L = L(N ), or if this is impossible, L(N ) n L is minimized.
Unfortunately, most models returned by classical Petri net synthesis techniques result
in models that are unusable from a process discovery perspective. Hence, we need to
relax the strict formal requirements posed on the relation between L and L(N ).</p>
      <sec id="sec-4-1">
        <title>Definition 8 (Hybrid variable-based process discovery ILP-formulation). Let L be</title>
        <p>an event log over a set of activities A and let As; Ad A be a hybrid partition. Let
Ms; Md; M0d be three matrices where Ms is an jLj n f g As matrix with Ms( ;
a) = p~( )(a) and Md, M0d are two jLj n f g Ad matrices with Md( ; a) = p~( )(a)
and M0d( ; a) = p~( 0)(a) (where = 0 ha0i 2 L). Let cm 2 R, c~v 2 RjAsj and
c~x; c~y 2 RjAdj. The hybrid variable-based process discovery ILP-formulation, denoted
ILPLh, is defined as:
minimize z = cmm + c~v|~v + c~x|~x + c~y|~y objective function
such that m~1 + Ms~v + M0d~x Md~y ~0 hybrid variable-based region
and ~1 ~v ~1 i.e. ~v 2 f 1; 0; 1gjAsj
~0 ~x ~1 i.e. ~x 2 f0; 1gjAdj
~0 ~y ~1 i.e. ~y 2 f0; 1gjAdj
0 m 1 i.e. m 2 f0; 1g</p>
        <p>
          The ILP-fromulation presented in Definition 8 uses the set of linear in-equalities
generated by Equation 4.1 within its constraint body. The formulation however binds
~v to f 1; 0; 1gjAsj and ~x; ~y to f0; 1gjAdj, i.e. the formulation only allows for
discovering Petri nets with arc weights restricted to f0; 1g. Additionally it defines an objective
function, i.e. z = cmm + c~v|~v + c~x|~x + c~y|~y, that maps each region to a real value,
i.e. z : RAhs (L) ! R. In general we are free to choose whatever objective function
we like, however, using different objective functions will lead to different process
discovery results. Choosing cm = 0 and ~cv = ~cx = ~cy = ~1 leads to arc minimization
whereas maximizing the same objective leads to arc maximization, resulting in
different places/regions found by the underlying ILP solver. The objective function proposed
in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], being a minimization function, tries to minimize the number of incoming arcs
and maximize the number of outgoing arcs. The objective function can be defined in
terms of cm; c~v; c~x and c~y as cm = jLj, c~v = P 2L p~As ( ), c~x = P 2L p~Ad ( ) and
c~y = c~x, i.e.:
z(r) =
        </p>
        <p>X(m + p~As ( )|~v + p~Ad ( )|(~x
~y))
(4.3)
2L
4.1</p>
      </sec>
      <sec id="sec-4-2">
        <title>Optimizing Token Throughput</title>
        <p>
          The objective function used within [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], i.e. Equation 4.3 is less generic as the one
proposed in Definition 8. As shown in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], it favors minimal regions. A minimal region is
a region that is not expressible as the sum of two other regions. The objective function
is solely based on the set-representation of the prefix-closure of an event log. However,
an event log is a bag of traces and thus consists of information on trace frequency. We
propose a generalized prefix-closure-based objective function that incorporates a
wordbased scaling function . The scaling function is required to map all sequences in the
prefix-closure of the event log to some positive real value. The actual implementation
is up to the user, although we present an instantiation of that works well for process
discovery. We show that the proposed generalized weighted prefix-closure-based hybrid
region objective function favors minimal regions, given any scaling function .
Definition 9 (Generalized weighted prefix-closure-based hybrid region objective
function). Let L be an event log over a set of activities A and let As; Ad A be a
hybrid partition. Let r = (m; ~v; ~x; ~y) 2 RAhs (L) be a hybrid variable-based region and
let be a scaling function over L, i.e. : L ! R+. The generalized weighted
prefixclosure-based hybrid region objective function is instantiated as cm = P 2L ( ),
c~v = P 2L ( )p~As ( ), c~x = P 2L ( )p~Ad ( ) and c~y = c~x, i.e.:
z (r) = X
( )(m + p~As ( )|~v + p~Ad ( )|(~x
~y))
(4.4)
Note that if we choose ( ) = 1 for all 2 L, denoted 1, we instantiate the
generalized objective function as the objective function proposed in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. We denote this
objective function as z1, i.e. Equation 4.3.
        </p>
        <p>To relate the behavior in a given event log to the objective function defined in
Definition 9 we instantiate the scaling function making use of the frequencies of the traces
present in the event log, i.e. we let ( ) = L( ) leading to:
zL(r) = X L( )(m + p~As ( )|~v + p~Ad ( )|(~x
~y))
(4.5)</p>
        <p>To assess the difference between z1 and the newly proposed objective function zL,
consider the Petri net depicted in Figure 2. Assume we are given some event log L =
[ha; b; di5; ha; c; di3]. Let r1 denote the hybrid variable-based region corresponding to
place p1, let r2 correspond to p2 and let r3 correspond to p3. In this case we have
z1(r1) = 1, z1(r2) = 1 and z1(r3) = 2. On the other hand we have zL(r1) = zL(r2) =
zL(r3) = 5 + 3 = 8. Thus, using z L leads to more intuitive objective values compared
to using z1 as zL evaluates to the absolute number of discrete time-steps a token would
remain in the corresponding place when replaying log L w.r.t. the place.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] it is shown that the objective function used favors minimal regions. The proof
cannot directly be adapted to hold for hybrid variable-based regions. Moreover it does
not provide means to show that any arbitrary instantiation of z favors minimal
hybrid variable-based regions. Here we show that any instantiation of the generalized
weighted prefix-closure-based hybrid region objective function with some scaling
function : L ! R+ favors minimal hybrid variable-based regions. We first show that the
objective value of a non-minimal hybrid region equals the sum of the two minimal
regions defining it after which we show that the given objective function maps each region
to some positive real value, i.e. rng(z ) R+.
        </p>
        <p>2L
2L
Fig. 2: A simple Petri net N with L(N ) = f ; hai; ha; bi; ha; ciha; b; di; ha; c; dig.
X
2L
X
2L</p>
      </sec>
      <sec id="sec-4-3">
        <title>Lemma 1 (Objective value composition of non-minimal regions). Let L be an event</title>
        <p>log over a set of activities A and let As; Ad A be a hybrid partition. Let r1 = (m1;
~v1; ~x1; ~y1), r2 = (m2; ~v2; ~x2; ~y2) and r3 = (m1 +m2; ~v1 +~v2; ~x1 +~x2; ~y1 +~y2) with r1;
r2; r3 2 RAhs (L). Let z : RAhs (L) ! R where z is an instantiation of the generalized
weighted objective function as defined in Definition 9, then z (r3) = z (r1) + z (r2).
Proof (By definition of z ). Let us denote z (r3):
( )((m1 + m2) + p~As ( )|(v~1 + v~2) + p~Ad ( )|((x~1 + x~2)
(y~1 + y~2)))
( )(m1 +p~As ( )|v~1 +p~Ad ( )|(x~1 y~1)+m2 +p~As ( )|v~2 +p~Ad ( )|(x~2 y~2))
( )(m1 + p~As ( )|v~1 + p~Ad ( )|(x~1</p>
        <p>y~1)) +
( )(m2 + p~As ( )|v~2 + p~Ad ( )|(x~2
y~2))
(4.6)
(4.7)
(4.8)</p>
        <p>0:
(4.9)
Clearly z (r3) = z (r1) + z (r2).</p>
        <p>Lemma 1 shows that the value of z for a non-minimal region equals the sum of the
z values of the two regions it is composed of. If we additionally show that z can
only evaluate to positive values, we show that z favors minimal hybrid variable-based
regions.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Lemma 2 (Range of z is strictly positive). Let L be an event log over a set of activ</title>
        <p>ities A and let As; Ad A be a hybrid partition. Let r = (m; ~v; ~x; ~y) be a non-trivial
region, i.e., r 2 RAhs (L). If we let z be any instantiation of the generalized weighted
objective function as defined in Definition 9, then z : RAhs (L) ! R+.
Proof (By showing z (r) &gt; 0; 8r 2 RAhs (L)). Let r = (m; ~v; ~x; ~y) be a non-trivial
hybrid variable-based region, i.e. r 2 RAhs (L). Because r 2 RAhs (L) we have
8 = 0 hai2Lnf g(m + p~As ( )|~v + p~Ad ( 0)|~x
Using Equation 4.7 we can substitute p~Ad ( 0)|~x with p~Ad ( )|~x in Equation 4.6:
8 = 0 hai2Lnf g(m + p~As ( )|~v + p~Ad ( )|(~x
~y)
0)
Combining rng( )</p>
        <p>R+, p~( ) = ~0 and m 2 N with Equation 4.8 we find z (r)
X
2L
X
2L
X
2L
( )(m + p~As ( )|~v + p~Ad ( )|(~x
~y))</p>
        <p>0
If m &gt; 0 then ( )(m+p~As ( )|~v+p~Ad ( )|(~x ~y)) &gt; 0. Combined with Equations 4.8
and 4.9 leads to z (r) &gt; 0.</p>
        <p>Observe that if m = 0 then for r to be a non-trivial hybrid variable-based region,
i.e. r 2 RAhs (L), either (I). 9a 2 As s.t. ~v(a) &gt; 0 or (II). 9a 2 Ad s.t. ~x(a) &gt; 0.</p>
        <p>(I). Let m = 0 and a 2 As s.t. ~v(a) &gt; 0. We know 9 = 0 hai 2 L. Because
r 2 RAhs (L) (using Equation 4.8) we have:
m + p~As ( )|~v + p~Ad ( )|(~x
~y)
0
If 0 = , we have p~Ad ( ) = ~0 and p~As ( )|~v = ~v(a) hence we deduce:
m + p~As ( )|~v + p~Ad ( )|(~x
~y) &gt; 0
Combining this with Equations 4.8 and 4.9 yields z (r) &gt; 0.</p>
        <p>If 0 6= we have (using Equation 4.8):
m + p~As ( 0)|~v + p~Ad ( 0)|(~x
~y)
0
Observe that p~As ( )|~v = p~As ( 0)|~v + ~v(a), together with p~Ad ( 0) = p~Ad ( ) leads us
to reformulate this to:
m + p~As ( )|~v</p>
        <p>~v(a) + p~Ad ( )|(~x
m + p~As ( )|~v + p~Ad ( )|(~x
~y)
~y)
~v(a)
Combining this with Equations 4.8 and 4.9 yields z (r) &gt; 0.</p>
        <p>(II) Let m = 0 and a 2 Ad s.t. ~x(a) &gt; 0. We know 9
r 2 RAhs (L) (using Equation 4.6) we have:
= 0 hai 2 L. Because
m + p~As ( )|~v + p~Ad ( 0)|~x
Observe that p~Ad ( )|~x = p~Ad ( 0)|~x + ~x(a) and thus:
m + p~As ( )|~v + p~Ad ( )|(~x
~y)
~x(a)
Combining this with Equations 4.8 and 4.9 yields z (r) &gt; 0.</p>
        <p>By combining Lemma 1 and Lemma 2 we can easily show that any instantiation of
z favors minimal regions.</p>
      </sec>
      <sec id="sec-4-5">
        <title>Theorem 1 (Any instantiation of z favors minimal regions). Let L be an event log</title>
        <p>over a set of activities A and let As; Ad A be a hybrid partition. Let r1 = (m1;
~v1; ~x1; ~y1), r2 = (m2; ~v2; ~x2; ~y2) and r3 = (m1 + m2; ~v1 + ~v2; ~x1 + ~x2; ~y1 + ~y2) be
three non-trivial regions, i.e., r1; r2; r3 2 RAhs (L). For any z : RAhs (L) ! R being an
instantiation of the generalized weighted objective function as defined in Definition 9:
z (r3) &gt; z (r1) and z (r3) &gt; z (r2).</p>
        <p>Proof (By composition of Lemma 1 and Lemma 2). By Lemma 1 we know z (r3) =
z (r1) + z (r2). By Lemma 2 we know that z (r1) &gt; 0, z (r2) &gt; 0 and z (r3) &gt; 0.
Thus we deduce z (r3) &gt; z (r1) and z (r3) &gt; z (r2). Consequently, any instantiation
of the objective function as defined in Definition 9 favors minimal regions.
0
0
Both objective functions presented, i.e.z1 and zL, are expressible in terms of the more
general objective function z as presented in Definition 9. As we have seen the two
objective functions may favors different regions. Combining an objective function with
the ILP-formulation presented in Definition 8 establishes means to find Petri net places.
However, solving one ILP only yields one solution and hence we need means to find a
set of places, which together form a Petri net that represents the input event log L.
The most apparent technique to find multiple places is the use of causal relations within
an event log. Within the context of this paper we define a causal relation as follows.
Definition 10 (Causal relation). Let L be an event log over a set of activities A. A
causal relation L is a function L : A A ! R where L(a; b) denotes the causality
between activity a and b exhibited in L.</p>
        <p>
          Several approaches exist to compute causalities hence we refer to [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] for an overview of
the use of causalities within different process discovery approaches. As a consequence,
        </p>
        <p>L(a; b) can have different meanings w.r.t. the causality between a and b. Some
approaches limit rng( L) to f0; 1g where L(a; b) = 1 means that there is a causal
relation between a and b whereas L((a; b)) = 0 means there is no causal relation from
a to b. Other approaches map rng( L) to the real-valued domain ( 1; 1) where a high
positive L(a; b) value (close to 1) indicates a strong causal relation from a to b and a
low negative value (close to 1) indicates a strong causal relation from b to a.</p>
        <p>When adopting a causal-based ILP process discovery strategy, we try to find places
which will be added in the resulting Petri net, each representing a causality found. A
first step is to compute L values given some causal definition. Depending on the actual
meaning of the L, whenever we find a causal relation from a to b (possibly because</p>
        <p>L(a; b) , where is some threshold value), we enrich the constraint body for the
given causal constraint as follows:</p>
        <p>m = 0 and ~v(a) = 1 and ~v(b) =
m = 0 and ~x(a) = 1 and ~v(b) =
m = 0 and ~v(a) = 1 and ~y(b) = 1; if a 2 As; b 2 Ad
m = 0 and ~x(a) = 1 and ~y(b) = 1; if a; b 2 Ad</p>
        <p>1; if a; b 2 As
1; if a 2 Ad; b 2 As
After the constraint body is enriched, we solve the ILP yielding a place having an
optimal value for the specific objective function chosen. We repeat the procedure for every
causal relation yielding a Petri net.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Performance</title>
      <p>
        The hybrid variable-based ILP-formulation is implemented as a plug-in in the
ProMFramework (http://www.promtools.org) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]2. The plug-in allows the user to
2The source-code is available at https://svn.win.tue.nl/repos/prom/
Packages/HybridILPMiner/Trunk
specify parameters of ILP-based process discovery, e.g. several different objective
functions, additional constraints and causality based mining.
      </p>
      <p>To test the performance of hybrid variable-based ILP process discovery we have
used the basic implementation within an experimentation framework3. The framework
allows for generating random process models and from these models generate event
logs. We have generated 40 models containing a minimum of 2 activities and a
maximum of 12 activities. For each model we have generated 10 event logs, each containing
5000 traces. For each log we ran three different instantiations of the process discovery
formulation, one purely single variable-based, one hybrid variant and one purely dual
variable-based. The distribution of event classes in A over As and Ad is depicted in
Table 2. For the hybrid variant the size of As was kept constant and independent of the
number of activities in A (as of jAj 3). We ran each instantiation 10 times per log
using causal relations as a process discovery strategy. For each run of an instantiation
we calculated the total time spend in solving all ILPs based on the causalities present
in the event log. These times have been aggregated based on the number of activities
present in an event log. The results of the experiment, plotted on a logarithmic scale,
are depicted in Figure 34.</p>
      <p>As shown in Figure 3, the average time to solve the hybrid formulation is in-between
the single and dual formulation. We additionally note the hybrid formulation to get
slightly closer to the dual formulation when jAj increases. This is as expected as the
number of single variables within the hybrid formulation is constant and hence the
average time of solving the ILPs within the formulation increases with respect to the single
variable formulation. Figure 3 shows that reducing the number of variables used within
the ILP-formulation has an impact on the average time of solving ILPs. The differences
are however marginal, which is to be expected as solving ILPs is exponential by nature.
Therefore the experiments show that using ILPs for the aim of process discovery in
3All framework files and results can be found at https://svn.win.tue.nl/repos/
prom/Packages/HybridILPMiner/Tags/papers/source_files/hybrid_
ilp_runtime_experiments.tar.gz</p>
      <p>4The experiments where distributed over four servers (Dell PowerEdge R520, Intel Xeon
E52407 v2 2.40GHz, 10M Cache, 8 8GB RDIMM, 1600MT/s Memory), on each server a total
of 10 models and corresponding logs where generated.
jAj
8
9
a practical setting might need incorporation of more advanced techniques in order to
reduce the average time of solving the ILPs significantly.
6</p>
    </sec>
    <sec id="sec-6">
      <title>Related Work</title>
      <p>
        The concept of hybrid variable-based regions originates from language-based region
theory, which in turn originates from the area of Petri net synthesis. We identify two
main branches of language-based region theory within Petri net synthesis being the
separating regions approach [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8–10</xref>
        ] and the minimal linear basis approach [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. In
the separating regions approach, which uses single-variable based regions, behavior not
seen in a given prefix closed language is specified as being undesired. In the minimal
linear basis approach, given a prefix closed language, a polyhedral cone of integer points
is constructed based on dual variable-based regions. Using the cone a minimal basis of
the set of regions is calculated which defines a minimal set of places to be synthesized.
Both approaches try to minimize L(N ) n L, where N represents the resulting Petri
net and L is a prefix-closed language. The approaches lead to Petri nets with perfect
replay-fitness. Moreover precision is maximized. A side effect of this property is the
fact that the synthesized net N scores low on both the generalization and the simplicity
dimension.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] a first design of an ILP-formulation was presented based on the concept of
dual variable-based regions. The work presents objective function z1 which we have
further developed in this work to zL, and more generally, z . The work also focuses on
formulation of several different net-types in terms of linear in-equalities which even go
beyond classical Petri nets, e.g. reset- and inhibitor arcs.
      </p>
      <p>
        An alternative approach is to use the concept of state-based regions for the purpose
of process discovery [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]. Within this approach an abstraction of the event log is
computed in the form of a transition system. Regions are computed within the transition
system where, like in language-based region theory, each region corresponds to a place
in the resulting Petri net.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] a process discovery algorithm is proposed based on the concept of
numerical abstract domains using Parikh vectors as a basis. Based on an event log, a
prefixclosed language is computed of which a convex polyhedron is approximated by means
of calculating a convex hull. The convex hull is then used to compute causalities within
the input log, by deducing a set of linear inequalities which represent places. The
formulation used to calculate these causalities is in essence based the concept of single
variable-based regions. The approach allows for finding pure Petri nets with arc weights
and multiple marked places.
      </p>
      <p>
        A multitude of other Petri net-based process discovery approaches exist. For a
detailed description of these approaches we refer to [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
7
      </p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>We presented a new breed of language-based regions, i.e. hybrid variable-based regions,
that captures the two existing region types being single and dual variable-based regions.
Hybrid variable-based regions allow us to decide whether we want to use one or two
variables for an activity present in the input event log. This allows us to achieve gains
in terms of performance whilst maintaining the possibility to find complex (workflow)
patterns. We have shown that within the hybrid variable-based ILP process discovery
formulation, using only one variable per activity a 2 A performs optimal in terms of
the average time spent in solving the ILPs constructed.</p>
      <p>We presented a generalized weighted objective function and showed that any
instantiation of the objective function leads to ILPs that favor minimal regions. As a
result, practitioners may vary the scaling function within the objective function under
the guarantee that the objective function favors minimal regions. We presented a
logbased scaling function that exploits trace frequencies available in the input log. Using
the log based scaling function within the objective function assigns an objective value
to each region equal to the token throughput of the corresponding place, given the event
log. Hence, as we have shown using the new objective function leads to more intuitive
objective values.</p>
      <p>
        As an interesting direction for future work we identify the assessment of the impact
of filtering, either a-priori or within the ILP itself, on the quality dimensions of the
resulting nets, i.e. are we able to leverage the perfect repaly-fitness property? Also,
an assessment of the impact of decomposition techniques [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] on the performance of
ILP-based approaches is an interesting direction for future work.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.:
          <source>Process Mining: Discovery, Conformance and Enhancement of Business Processes. 1st edn</source>
          . Springer Publishing Company, Incorporated (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Desel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reisig</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>The synthesis problem of Petri nets</article-title>
          .
          <source>Acta Inf</source>
          .
          <volume>33</volume>
          (
          <issue>4</issue>
          ) (
          <year>1996</year>
          )
          <fpage>297</fpage>
          -
          <lpage>315</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Werf</surname>
            ,
            <given-names>J.M.E.M.</given-names>
          </string-name>
          v.d.,
          <string-name>
            <surname>Dongen</surname>
            ,
            <given-names>B.F.v.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hurkens</surname>
            ,
            <given-names>C.A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serebrenik</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Process discovery using integer linear programming</article-title>
          .
          <source>Fundamenta Informaticae</source>
          <volume>94</volume>
          (
          <issue>3</issue>
          ) (
          <year>2009</year>
          )
          <fpage>387</fpage>
          -
          <lpage>412</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Buijs</surname>
            ,
            <given-names>J.C.A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dongen</surname>
            ,
            <given-names>B.F.v.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.:
          <article-title>On the role of fitness, precision, generalization and simplicity in process discovery</article-title>
          . In Meersman, R.,
          <string-name>
            <surname>Panetto</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dillon</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rinderle-Ma</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dadam</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pearson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferscha</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergamaschi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.F</given-names>
          </string-name>
          ., eds.: On the Move to Meaningful
          <source>Internet Systems: OTM 2012. Volume 7565 of Lecture Notes in Computer Science</source>
          . Springer Berlin Heidelberg (
          <year>2012</year>
          )
          <fpage>305</fpage>
          -
          <lpage>322</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.,
          <string-name>
            <surname>Hofstede</surname>
            ,
            <given-names>A.H.M.</given-names>
          </string-name>
          <year>t</year>
          .,
          <string-name>
            <surname>Kiepuszewski</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barros</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Workflow patterns</article-title>
          .
          <source>Distributed and Parallel Databases</source>
          <volume>14</volume>
          (
          <issue>1</issue>
          ) (
          <year>2003</year>
          )
          <fpage>5</fpage>
          -
          <lpage>51</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dongen</surname>
            ,
            <given-names>B.F.v.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Medeiros</surname>
            ,
            <given-names>A.K.A.d.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Process mining: Overview and outlook of Petri net discovery algorithms</article-title>
          .
          <source>T. Petri Nets and Other Models of Concurrency</source>
          <volume>2</volume>
          (
          <year>2009</year>
          )
          <fpage>225</fpage>
          -
          <lpage>242</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dongen</surname>
            ,
            <given-names>B.F.v.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Medeiros</surname>
            ,
            <given-names>A.K.A.d.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verbeek</surname>
            ,
            <given-names>H.M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weijters</surname>
            ,
            <given-names>A.J.M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.:
          <article-title>The ProM framework: A new era in process mining tool support</article-title>
          . In Ciardo, G.,
          <string-name>
            <surname>Darondeau</surname>
          </string-name>
          , P., eds.
          <source>: Applications and Theory of Petri Nets 2005. Volume 3536 of Lecture Notes in Computer Science</source>
          . Springer Berlin Heidelberg (
          <year>2005</year>
          )
          <fpage>444</fpage>
          -
          <lpage>454</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Badouel</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernardinello</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darondeau</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Polynomial algorithms for the synthesis of bounded nets</article-title>
          . In Mosses, P.D.,
          <string-name>
            <surname>Nielsen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwartzbach</surname>
          </string-name>
          , M.I., eds.
          <source>: TAPSOFT '95: Theory and Practice of Software Development. Volume 915 of Lecture Notes in Computer Science</source>
          . Springer Berlin Heidelberg (
          <year>1995</year>
          )
          <fpage>364</fpage>
          -
          <lpage>378</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Badouel</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darondeau</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Theory of regions</article-title>
          . In Reisig, W.,
          <string-name>
            <surname>Rozenberg</surname>
          </string-name>
          , G., eds.
          <source>: Lectures on Petri Nets I: Basic Models. Volume 1491 of Lecture Notes in Computer Science</source>
          . Springer Berlin Heidelberg (
          <year>1998</year>
          )
          <fpage>529</fpage>
          -
          <lpage>586</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Darondeau</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Deriving unbounded Petri nets from formal languages</article-title>
          . In Sangiorgi, D.,
          <string-name>
            <surname>Simone</surname>
          </string-name>
          , R.d., eds.
          <source>: CONCUR'98 Concurrency Theory. Volume 1466 of Lecture Notes in Computer Science</source>
          . Springer Berlin Heidelberg (
          <year>1998</year>
          )
          <fpage>533</fpage>
          -
          <lpage>548</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Bergenthum</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Desel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lorenz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mauser</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Process mining based on regions of languages</article-title>
          . In Alonso, G.,
          <string-name>
            <surname>Dadam</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosemann</surname>
          </string-name>
          , M., eds.:
          <source>Business Process Management. Volume 4714 of Lecture Notes in Computer Science</source>
          . Springer Berlin Heidelberg (
          <year>2007</year>
          )
          <fpage>375</fpage>
          -
          <lpage>383</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lorenz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mauser</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Juhas</surname>
          </string-name>
          , G.:
          <article-title>How to synthesize nets from languages - a survey</article-title>
          .
          <source>In: Simulation Conference</source>
          ,
          <source>2007 Winter. (Dec</source>
          <year>2007</year>
          )
          <fpage>637</fpage>
          -
          <lpage>647</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Sole´,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Carmona</surname>
          </string-name>
          , J.:
          <article-title>Process mining from a basis of state regions</article-title>
          .
          <source>In: Applications and Theory of Petri Nets</source>
          , 31st International Conference,
          <source>PETRI NETS</source>
          <year>2010</year>
          , Braga, Portugal, June 21-25,
          <year>2010</year>
          . Proceedings. (
          <year>2010</year>
          )
          <fpage>226</fpage>
          -
          <lpage>245</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.,
          <string-name>
            <surname>Rubin</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verbeek</surname>
            ,
            <given-names>H.M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dongen</surname>
            ,
            <given-names>B.F.v.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kindler</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , Gu¨nther, C.W.:
          <article-title>Process mining: a two-step approach to balance between underfitting and overfitting</article-title>
          .
          <source>Software and System Modeling</source>
          <volume>9</volume>
          (
          <issue>1</issue>
          ) (
          <year>2010</year>
          )
          <fpage>87</fpage>
          -
          <lpage>111</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Carmona</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cortadella</surname>
          </string-name>
          , J.:
          <article-title>Process discovery algorithms using numerical abstract domains</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>26</volume>
          (
          <issue>12</issue>
          ) (
          <year>2014</year>
          )
          <fpage>3064</fpage>
          -
          <lpage>3076</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          v.d.:
          <article-title>Decomposing Petri nets for process mining: A generic approach</article-title>
          .
          <source>Distributed and Parallel Databases</source>
          <volume>31</volume>
          (
          <issue>4</issue>
          ) (
          <year>2013</year>
          )
          <fpage>471</fpage>
          -
          <lpage>507</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>