<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ILP-Based Process Discovery Using Hybrid Regions</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Van Zelst</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Mathematics and Computer Science</orgName>
								<orgName type="institution">Eindhoven University of Technology</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">B</forename><forename type="middle">F</forename><surname>Van Dongen</surname></persName>
							<email>b.f.v.dongen@tue.nl</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Mathematics and Computer Science</orgName>
								<orgName type="institution">Eindhoven University of Technology</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Mathematics and Computer Science</orgName>
								<orgName type="institution">Eindhoven University of Technology</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ILP-Based Process Discovery Using Hybrid Regions</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">570568E1E08F9CE572AE6B8E179B03AD</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T08:01+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Process mining</term>
					<term>process discovery</term>
					<term>integer linear programming</term>
					<term>region theory</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The language-based theory of regions, stemming from the area of Petri net synthesis, forms a fundamental basis for Integer Linear Programming (ILP)-based process discovery. Based on example behavior in an event log, a process model is derived that aims to describe the observed behavior. Building on top of the existing ILP-formulation, we present a new ILP-based process discovery formulation that unifies two existing types of language-based regions and, additionally, we present a generalized ILP objective function that captures both region-types and helps us to find suitable process discovery results.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Process Mining <ref type="bibr" target="#b0">[1]</ref> forms the connecting element between business process modeling and data analysis. Its main aim is to extract knowledge from business process execution data originating from a variety of data sources, e.g., enterprise information systems, social media, embedded systems etc. Within process mining, three main branches are distinguished being process discovery, conformance checking and process enhancement. Within process discovery the aim is to, given an event log, reconstruct a process model. Within conformance checking the aim is to assess, given a process model and an event log, the conformance of the event log to the process model. Within process enhancement the aim is to further improve and/or align existing process models by combining the two aforementioned disciplines, e.g., identification and removal of bottlenecks within business processes.</p><p>The area of Petri net synthesis <ref type="bibr" target="#b1">[2]</ref> is concerned with deciding whether there exists a Petri net that exactly describes a given sequential behavioral system description. A subclass of synthesis techniques is the area of region theory which is both defined on transition systems, known as state-based region theory, and on languages, known as language-based region theory.</p><p>Language-based region theory forms a fundamental basis for ILP-based process discovery <ref type="bibr" target="#b2">[3]</ref>. Within process discovery however, a process model is typically valued w.r.t. four essential process mining quality dimensions being replay-fitness, precision, generalization and simplicity <ref type="bibr" target="#b3">[4]</ref>. Applying traditional Petri net synthesis techniques in a process discovery context will result in models that have perfect replay-fitness, maximal precision, low generalization and often low simplicity.</p><p>In this paper we propose a unified approach w.r.t. ILP-based process discovery, based on the newly introduced concept of hybrid variable-based regions. Hybrid variablebased regions unite two existing language-based region definitions and allow us to vary the number of variables used within the ILP problems being solved. Therefore, we assess the impact of using a different number of variables on the average computation time of solving ILPs during ILP-based process discovery. Additionally we present a new generalized ILP objective function that supports hybrid variable-based regions and show that the objective function yields correct process discovery results.</p><p>The outline of this paper is as follows. In Section 2 we present basic preliminaries. In Section 3 we present language-based regions. In Section 4 we show how to adopt language-based regions within process discovery using ILP. In Section 5 we present a brief evaluation of the performance of the approach. Section 6 covers related work. Section 7 concludes the paper. A sequence σ is a function relating positions to elements e ∈ S, i.e. σ : {1, 2, ..., k} → S. An empty sequence is denoted as . We write every non-empty sequence as e 1 , e 2 , ..., e k . The set of all possible sequences over a set S is denoted as S * . We define concatenation of sequences σ 1 and σ 2 as σ 1 • σ 2 , e.g., a, b • c, d = a, b, c, d . If we are given a set X of sequences over S, i.e., X ⊆ S * , we define X's prefix-closure as</p><formula xml:id="formula_0">X = {σ 1 ∈ S * |∃ σ2∈S * (σ 1 • σ 2 ∈ X)}. Let X ⊆ S * , X is prefix-closed if X = X. Let X ⊆ S * and let B X : X → N be a bag, we define B X 's prefix closure as B X : X → N B X (σ) = B(σ) + σ• e ∈X B X (σ • e ) Thus, for set X 1 = { a, b , a, c } we have X 1 = { , a , a, b , a, c } whereas for bag B 2 = [ a, b 5 , a, c 3 ] we have B 2 = [ 8 , a 8 , a, b 5 , a, c 3 ].</formula><p>Let S be a set on which we can pose some total order and let R ⊆ R be a range of values. Vectors are denoted as z ∈ R |S| , where z(e) ∈ R and e ∈ S. A vector z represents a column vector. For vector multiplication we assume vectors to agree on their indices. Thus, given a totally ordered set S = {e 1 , e 2 , ..., e n } and z 1 , z 2 ∈ R |S| we have z 1 z 2 = i∈{1,2,...,n} z 1 (e i ) z 2 (e i ). Parikh vectors represent the number of occurrences of a certain element within a sequence. A Parikh vector is a function p : S * → N |S| with p(σ) = (σ e1 , σ e2 , ..., σ en ) where σ ei = |{j | 1 ≤ j ≤ |σ|, σ(j) = e i }|. As an example consider S = {a, b, c, d}, σ 1 = a, b, d and σ 2 = a, c, d . We have p(σ 1 ) = (1, 1, 0, 1) and p(σ 2 ) = (1, 0, 1, 1). Note that σ 1 b = 1 whereas σ 2 b = 0. Given a Parikh vector p : S * → N |S| and a set Q ⊆ S, we define p Q : S * → N |Q| . Thus given S = {a, b, c, d} and σ 1 = a, b, d we have p {a,c,d} (σ 1 ) = (1, 0, 1).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Event Logs and Petri Nets</head><p>The main goal of process discovery is to describe the observed behavior within an event log by means of a process model. Thus, event logs act as the input of process discovery whereas some process model acts as the output of process discovery. An abstract example of an event log is presented in Table <ref type="table" target="#tab_0">1</ref>. Often more data attributes are available in an event log, e.g., customer id, credit balance etc. In this paper we focus on the sequential ordering of activities w.r.t. cases, i.e. the control-flow perspective.</p><p>Definition 1 (Event log). Let A be a set of activities, an event log L is a bag of sequences over A, i.e., L : A * → N. <ref type="foot" target="#foot_0">1</ref>A sequence of activities σ ∈ L is referred to as a trace. We assume that any given set of activities A is totally ordered (or we are able to trivially pose a total ordering on A).</p><p>We consider Petri nets to describe process models. A Petri net is a bipartite graph consisting of a set of vertices called places and a set of vertices called transitions. Arcs (directed edges) are connecting places and transitions and have an associated positive weight.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 2 (Petri net).</head><p>A Petri net is a 3-tuple N = (P, T, W ), where P is a set of places and T is a set of transitions with P ∩ T = ∅. W is the weighted flow relation of N , W : (P × T ) ∪ (T × P ) → N.</p><p>For a given node x ∈ P ∪ T , the pre-set of x in N is defined as •x = {y | W (y, x) &gt; 0}. Correspondingly x• = {y | W (x, y) &gt; 0} denotes the post-set of x in N . Graphically we represent places as circles and transitions as boxes. For every (x, y) ∈ (P × T ) ∪ (T × P ) with W (x, y) &gt; 0 we draw an arc from x to y. If W (x, y) &gt; 1 we denote the arc's weight W (x, y) on top of the arc from x to y. A Petri net is pure if it does not contain self-loops, i.e., if W (x, y) &gt; 0 then W (y, x) = 0. An example (pure) Petri net is depicted in Figure <ref type="figure" target="#fig_1">1</ref>.</p><p>Definition 3 (Marked Petri net). Let N = (P, T, W ) be a Petri net. A marking of N is a bag over N 's places, i.e. M : P → N. A marked Petri net is a pair (N, M 0 ), where M 0 represents N 's initial marking.</p><p>Let N = (P, T, W ) be a Petri net and let M be a marking of</p><formula xml:id="formula_1">N . A transition t ∈ T is enabled, denoted (N, M )[t , if and only if ∀ p∈•t (M (p) ≥ W (p, t)).</formula><p>Graphically we represent a marking by drawing exactly a place's marking-multiplicity number of dots inside the place (e.g. p 1 in Figure <ref type="figure" target="#fig_1">1</ref> with M 0 = [p 1 ]). If a transition t is enabled in (N, M ), t may fire, resulting in a new marking M . When t fires, denoted as (N, M )[t (N, M ), we have ∀ p∈P (M (p) = M (p) − W (p, t) + W (t, p)). For example in Figure <ref type="figure" target="#fig_1">1</ref> we have</p><formula xml:id="formula_2">(N 1 , [p 1 ])[a (N 1 , [p 2 ]).</formula><p>Definition 4 (Firing sequences). Let N = (P, T, W ) be a Petri net and let (N, M 0 ) be a corresponding marked Petri net. Sequence σ = t 1 , t 2 , ..., t n ∈ T * is a firing sequence of (N, M 0 ), written as</p><formula xml:id="formula_3">(N, M 0 )[σ (N, M n ) if and only if for n = |σ| there exist markings M 0 , M 1 , M 2 , ..., M n such that (N, M 0 )[t 1 (N, M 1 ), (N, M 1 )[t 2 (N, M 2 ), ..., (N, M n−1 )[t n (N, M n ).</formula><p>Note that infinitely many firing sequences can exist given a Petri net. Some example firing sequences of the Petri net depicted in Figure <ref type="figure" target="#fig_1">1</ref> </p><formula xml:id="formula_4">are: (N 1 , [p 1 ])[ a (N 1 , [p 2 ]), (N 1 , [p 1 ])[ a, b (N 1 , [p 3 ]) and (N 1 , [p 1 ])[ a, c, d, e, f, e, f, e, g (N 1 , ∅).</formula><p>The set of all possible firing sequences in a Petri net N is called N s language, i.e., all sequences σ ∈ T * s.t. (N, M 0 )[σ (N, M i ). N 's language is denoted as L(N ) ⊆ T * and is prefixclosed.</p><p>Consider Figure <ref type="figure" target="#fig_1">1</ref> and event log L 1 = [ a, b, d, e, g 10 , a, c, d, e, g 9 , a, b, d, e, f, e, g 11 , a, c, d, e, f, e, g 8 ] over A 1 = {a, b, c, d, e, f, g}. Clearly we have L 1 ⊂ L(N 1 ), and thus replay-fitness of L 1 on N 1 is perfect. We will use N 1 , A 1 and L 1 throughout the paper as a running-example.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Regions</head><p>The concept of regions forms the basis of region theory within Petri net synthesis. Given an event log L over a set of activities A, language-based regions are used to represent </p><formula xml:id="formula_5">N 1 = (P 1 , T 1 , W 1 ) with P 1 = {p 1 , p 2 , ..., p 5 }, T 1 = {a, b, ..., g} and W 1 (p 1 , a) = W 1 (a, p 2 ) = ... = W 1 (p 5 , g) = 1.</formula><p>places in a resulting Petri net N = (P, A, W ). A language-based region maps every a ∈ A to an integer representing arc-weight. For such mapping over A to be a region we pose the restriction that the corresponding place p ∈ P should not block any σ ∈ L, i.e., σ ∈ L ⇒ σ ∈ L(N ). We identify two basic definitions of language-based regions which we classify as single variable-based regions and dual variable-based regions.</p><p>The main difference is the number of variables used to define a region.</p><p>Definition 5 (Single variable-based regions). Let L be an event log over a set of activities</p><formula xml:id="formula_6">A. Let m ∈ N and let v ∈ Z |A| . A pair r = (m, v) is a single variable-based region iff: ∀ σ∈L (m + p(σ) v ≥ 0) (3.1)</formula><p>Let R s (L) denote the set of all possible single variable-based regions of L. Equation 3.1 generates a set of linear inequalities, i.e. applying Equation <ref type="formula">3</ref>.1 on L 1 yields: <ref type="figure" target="#fig_1">1</ref> which can be represented as a single variable-based region r = (m, ( v(a), v(b), ..., v(g))) = (0, (1, −1, −1, 0, 0, 0, 0)). Note that for each inequality generated by Equation 3.1, place p 2 has a value of at least 0 and hence is a member of R s (L).</p><formula xml:id="formula_7">m ≥ 0 m + v(a) ≥ 0 a m + v(a) + v(b) ≥ 0 a, b . . . . . . . . . m + v(a) + v(b) + v(d) + 2 v(e) + v(f ) + v(g) ≥ 0 a, b, d, e, f, e, g m + v(a) + v(c) + v(d) + 2 v(e) + v(f ) + v(g) ≥ 0 a, c, d, e, f, e, g</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Single variable-based regions use one single decision variable for each</head><formula xml:id="formula_8">a ∈ A, represented by v ∈ Z |A| . Expressing a single variable-based region r = (m, v) as a place p ∈ P in a marked net (N, M 0 ) with N = (P, A, W ) is straightforward. We have M 0 (p) = m, if v(a) &gt; 0 then W (a, p) = v(a), if v(a) &lt; 0 then W (p, a) = − v(a) and if v(a) = 0 then W (a, p) = W (p, a) = 0. Consider place p 2 depicted in Figure</formula><p>Single variable-based regions only allow us to synthesize/discover pure Petri nets. As a consequence we can not find self-loops, i.e. we can not find a place p in a resulting net N = (P, A, W ) s.t. p ∈ •a∩a•, for any a ∈ A. A lot of workflow patterns <ref type="bibr" target="#b4">[5]</ref> in fact exhibit places that include self-loops, e.g. milestone patterns, mutual-exclusion patterns, priority patterns etc. Hence, we define dual variable-based regions which explicitly allow us to distinguish between incoming and outgoing arcs.</p><p>Definition 6 (Dual variable-based regions). Let L be an event log over a set of activities A. Let m ∈ N and x, y ∈ N |A| . A triple r = (m, x, y) is a dual variable-based region iff:</p><formula xml:id="formula_9">∀ σ=σ • a ∈L (m + p(σ ) x − p(σ) y ≥ 0) (3.2)</formula><p>Let R d (L) denote the set of all possible dual variable-based regions of L. Like Definition 5, Definition 6 generates a set of linear inequalities. Applying Definition 6 on L 1 yields:</p><formula xml:id="formula_10">m − y(a) ≥ 0 a m − y(a) + x(a) − y(b) ≥ 0 a, b m − y(a) + x(a) − y(c) ≥ 0 a, c . . . . . . . . . m − y(a) + x(a) − y(b) + x(b) − y(d) + x(d) − 2 y(e) + 2 x(e) − y(f ) + x(f ) − y(g) ≥ 0 a, b, d, e, f, e, g m − y(a) + x(a) − y(c) + x(c) − y(d) + x(d) − 2 y(e) + 2 x(e) − y(f ) + x(f ) − y(g) ≥ 0 a, c, d, e, f, e, g</formula><p>Dual variable-based regions use two decision variables per a ∈ A, represented by x, y ∈ N |A| . The variables allow us to distinguish between incoming arcs and outgoing arcs when translating regions to Petri nets. Expressing a dual variable-based region r = (m, x, y) as a place p ∈ P in marked net (N, M 0 ) with N = (P, A, W ) is again straightforward. We have M 0 (p) = m, W (a, p) = x(a) and W (p, a) = y(a). Again consider place p 2 depicted in Figure <ref type="figure" target="#fig_1">1</ref> which can be represented as a dual variable-based region r = (m, ( x(a), x(b), ..., x(g)), ( y(a), y(b), ..., y(g))) = (0, (1, 0, 0, 0, 0, 0, 0), (0, 1, 1, 0, 0, 0, 0)). Verify that for each linear in-equality generated by Definition 6, place p 2 has a value of at least 0 and hence is a member of R d (L).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Hybrid Variable-Based Regions in Process Discovery</head><p>Using dual variable-based regions allows us to express non-pure places, i.e, self-loops, milestones etc. However, this type of regions uses roughly twice as many variables compared with single variable-based regions. To balance the number of variables used, though still enhance the possibility of finding non-pure Petri net structures, we introduce the new concept of hybrid regions, capturing both single and dual variable-based regions.</p><p>Definition 7 (Hybrid variable-based regions). Let L be an event log over a set of activities A. Let A s , A d ⊆ A be two sets of activities with </p><formula xml:id="formula_11">A s ∪ A d = A and A s ∩ A d = ∅. Let m ∈ N, v ∈ Z |As| and x, y ∈ N |A d | . A quadruple r = (m, v, x, y) is a hybrid variable-based region iff: ∀ σ=σ • a ∈L (m + p As (σ) v + p A d (σ ) x − p A d (σ) y ≥ 0) (<label>4</label></formula><formula xml:id="formula_12">∀ σ∈L\{ } (m + p(σ) v ≥ 0)<label>(4.2)</label></formula><p>Equation 4.2 does not equal Equation 3.1, however, as p( ) = 0 and m ∈ N, the equations are equivalent in this context. Note that the set of hybrid variable-based regions, i.e. the set of variable assignments that adhere to Definition 7 depends on the hybrid partition of A into A s and A d . Therefore, we let A s act as a parameter for the set of feasible hybrid variable-based regions. Let R h As (L) denote the set of all possible hybrid variable-based regions of L where A s represent a hybrid partition of</p><formula xml:id="formula_13">A. Note R h A (L) = R s (L) and R h ∅ (L) = R d (L)</formula><p>. Region r = (0, 0, 0, 0) is deemed the trivial region.</p><p>All three language-based region definitions, i.e. single, dual and hybrid variablebased regions, provide means to accept and/or reject potential places in a to be constructed Petri net. In classical Petri net synthesis approaches, one keeps looking for feasible places until either L = L(N ), or if this is impossible, L(N ) \ L is minimized. Unfortunately, most models returned by classical Petri net synthesis techniques result in models that are unusable from a process discovery perspective. Hence, we need to relax the strict formal requirements posed on the relation between L and L(N ). </p><formula xml:id="formula_14">z β (r) = σ∈L β(σ)(m + p As (σ) v + p A d (σ) ( x − y)) (4.4)</formula><p>Note that if we choose β(σ) = 1 for all σ ∈ L, denoted β 1 , we instantiate the generalized objective function as the objective function proposed in <ref type="bibr" target="#b2">[3]</ref>. We denote this objective function as z 1 , i.e. Equation <ref type="formula" target="#formula_11">4</ref>.3.</p><p>To relate the behavior in a given event log to the objective function defined in Definition 9 we instantiate the scaling function β making use of the frequencies of the traces present in the event log, i.e. we let β(σ) = L(σ) leading to:</p><formula xml:id="formula_15">z L (r) = σ∈L L(σ)(m + p As (σ) v + p A d (σ) ( x − y)) (4.5)</formula><p>To assess the difference between z 1 and the newly proposed objective function z L , consider the Petri net depicted in Figure <ref type="figure" target="#fig_4">2</ref>. Assume we are given some event log L = [ a, b, d 5 , a, c, d 3 ]. Let r 1 denote the hybrid variable-based region corresponding to place p 1 , let r 2 correspond to p 2 and let r 3 correspond to p 3 . In this case we have z 1 (r 1 ) = 1, z 1 (r 2 ) = 1 and z 1 (r 3 ) = 2. On the other hand we have z L (r 1 ) = z L (r 2 ) = z L (r 3 ) = 5 + 3 = 8. Thus, using z β L leads to more intuitive objective values compared to using z 1 as z L evaluates to the absolute number of discrete time-steps a token would remain in the corresponding place when replaying log L w.r.t. the place.</p><p>In <ref type="bibr" target="#b2">[3]</ref> it is shown that the objective function used favors minimal regions. The proof cannot directly be adapted to hold for hybrid variable-based regions. Moreover it does not provide means to show that any arbitrary instantiation of z β favors minimal hybrid variable-based regions. Here we show that any instantiation of the generalized weighted prefix-closure-based hybrid region objective function with some scaling function β : L → R + favors minimal hybrid variable-based regions. We first show that the objective value of a non-minimal hybrid region equals the sum of the two minimal regions defining it after which we show that the given objective function maps each region to some positive real value, i.e. rng(z β ) ⊆ R + . Lemma 1 (Objective value composition of non-minimal regions). Let L be an event log over a set of activities A and let A s , A d ⊆ A be a hybrid partition. Let</p><formula xml:id="formula_16">r 1 = (m 1 , v 1 , x 1 , y 1 ), r 2 = (m 2 , v 2 , x 2 , y 2 ) and r 3 = (m 1 +m 2 , v 1 + v 2 , x 1 + x 2 , y 1 + y 2 ) with r 1 , r 2 , r 3 ∈ R h</formula><p>As (L). Let z β : R h As (L) → R where z β is an instantiation of the generalized weighted objective function as defined in Definition 9, then z β (r 3 ) = z β (r 1 ) + z β (r 2 ).</p><p>Proof (By definition of z β ). Let us denote z β (r 3 ):</p><formula xml:id="formula_17">σ∈L β(σ)((m 1 + m 2 ) + p As (σ) ( v 1 + v 2 ) + p A d (σ) (( x 1 + x 2 ) − ( y 1 + y 2 ))) σ∈L β(σ)(m 1 + p As (σ) v 1 + p A d (σ) ( x 1 − y 1 )+m 2 + p As (σ) v 2 + p A d (σ) ( x 2 − y 2 )) σ∈L β(σ)(m 1 + p As (σ) v 1 + p A d (σ) ( x 1 − y 1 )) + σ∈L β(σ)(m 2 + p As (σ) v 2 + p A d (σ) ( x 2 − y 2 )) Clearly z β (r 3 = z β (r 1 ) + z β (r 2 ).</formula><p>Lemma 1 shows that the value of z β for a non-minimal region equals the sum of the z β values of the two regions it is composed of. If we additionally show that z β can only evaluate to positive values, we show that z β favors minimal hybrid variable-based regions.</p><p>Lemma 2 (Range of z β is strictly positive). Let L be an event log over a set of activities A and let A s , A d ⊆ A be a hybrid partition. Let r = (m, v, x, y) be a non-trivial region, i.e., r ∈ R h As (L). If we let z β be any instantiation of the generalized weighted objective function as defined in Definition 9, then z β : R h As (L) → R + . Proof (By showing z β (r) &gt; 0, ∀r ∈ R h As (L)). Let r = (m, v, x, y) be a non-trivial hybrid variable-based region, i.e. r ∈ R h As (L). Because r ∈ R h As (L) we have</p><formula xml:id="formula_18">∀ σ=σ • a ∈L\{ } (m + p As (σ) v + p A d (σ ) x − p A d (σ) y ≥ 0) (4.6)</formula><p>Note that each Parikh value of an activity a ∈ A given a sequence σ, i.e. p(σ)(a), is greater or equal than the Parikh value of a, given σ's prefix, i.e.,:</p><formula xml:id="formula_19">∀ σ=σ • a ∈L,a∈A ( p(σ)(a) ≥ p(σ )(a)) (4.7)</formula><p>Using Equation <ref type="formula" target="#formula_11">4</ref>.7 we can substitute p A d (σ ) x with p A d (σ) x in Equation <ref type="formula" target="#formula_11">4</ref>.6:</p><formula xml:id="formula_20">∀ σ=σ • a ∈L\{ } (m + p As (σ) v + p A d (σ) ( x − y) ≥ 0) (4.8)</formula><p>Combining rng(β) ⊆ R + , p( ) = 0 and m ∈ N with Equation 4.8 we find z β (r) ≥ 0:</p><formula xml:id="formula_21">σ∈L β(σ)(m + p As (σ) v + p A d (σ) ( x − y)) ≥ 0 (4.9)</formula><p>If m &gt; 0 then β( )(m+ p As ( ) v+ p A d ( ) ( x− y)) &gt; 0. Combined with Equations 4.8 and 4.9 leads to z β (r) &gt; 0.</p><p>Observe that if m = 0 then for r to be a non-trivial hybrid variable-based region, i.e. r ∈ R h As (L), either (I). ∃a ∈ A s s.t. v(a) &gt; 0 or (II). ∃a ∈ A d s.t. x(a) &gt; 0. (I). Let m = 0 and a ∈ A s s.t. v(a) &gt; 0. We know ∃σ = σ • a ∈ L. Because r ∈ R h As (L) (using Equation <ref type="formula" target="#formula_11">4</ref>.8) we have:</p><formula xml:id="formula_22">m + p As (σ) v + p A d (σ) ( x − y) ≥ 0 If σ = , we have p A d (σ) = 0 and p As (σ) v = v(a)</formula><p>hence we deduce:</p><formula xml:id="formula_23">m + p As (σ) v + p A d (σ) ( x − y) &gt; 0</formula><p>Combining this with Equations 4.8 and 4.9 yields z β (r) &gt; 0.</p><p>If σ = we have (using Equation <ref type="formula" target="#formula_11">4</ref>.8):</p><formula xml:id="formula_24">m + p As (σ ) v + p A d (σ ) ( x − y) ≥ 0 Observe that p As (σ) v = p As (σ ) v + v(a), together with p A d (σ ) = p A d (σ)</formula><p>leads us to reformulate this to:</p><formula xml:id="formula_25">m + p As (σ) v − v(a) + p A d (σ) ( x − y) ≥ 0 m + p As (σ) v + p A d (σ) ( x − y) ≥ v(a)</formula><p>Combining this with Equations 4.8 and 4.9 yields z β (r) &gt; 0.</p><p>(II) Let m = 0 and a ∈ A d s.t. x(a) &gt; 0. We know ∃σ = σ • a ∈ L. Because r ∈ R h As (L) (using Equation <ref type="formula" target="#formula_11">4</ref>.6) we have:</p><formula xml:id="formula_26">m + p As (σ) v + p A d (σ ) x − p A d (σ) y ≥ 0 Observe that p A d (σ) x = p A d (σ )</formula><p>x + x(a) and thus:</p><formula xml:id="formula_27">m + p As (σ) v + p A d (σ) ( x − y) ≥ x(a)</formula><p>Combining this with Equations 4.8 and 4.9 yields z β (r) &gt; 0.</p><p>By combining Lemma 1 and Lemma 2 we can easily show that any instantiation of z β favors minimal regions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Theorem 1 (Any instantiation of z β favors minimal regions). Let L be an event log over a set of activities A and let</head><formula xml:id="formula_28">A s , A d ⊆ A be a hybrid partition. Let r 1 = (m 1 , v 1 , x 1 , y 1 ), r 2 = (m 2 , v 2 , x 2 , y 2 ) and r 3 = (m 1 + m 2 , v 1 + v 2 , x 1 + x 2 , y 1 + y 2 ) be three non-trivial regions, i.e., r 1 , r 2 , r 3 ∈ R h</formula><p>As (L). For any z β : R h As (L) → R being an instantiation of the generalized weighted objective function as defined in Definition 9: z β (r 3 ) &gt; z β (r 1 ) and z β (r 3 ) &gt; z β (r 2 ).</p><p>Proof (By composition of Lemma 1 and Lemma 2). By Lemma 1 we know z β (r 3 ) = z β (r 1 ) + z β (r 2 ). By Lemma 2 we know that z β (r 1 ) &gt; 0, z β (r 2 ) &gt; 0 and z β (r 3 ) &gt; 0. Thus we deduce z β (r 3 ) &gt; z β (r 1 ) and z β (r 3 ) &gt; z β (r 2 ). Consequently, any instantiation of the objective function as defined in Definition 9 favors minimal regions.</p><p>Both objective functions presented, i.e.z 1 and z L , are expressible in terms of the more general objective function z β as presented in Definition 9. As we have seen the two objective functions may favors different regions. Combining an objective function with the ILP-formulation presented in Definition 8 establishes means to find Petri net places. However, solving one ILP only yields one solution and hence we need means to find a set of places, which together form a Petri net that represents the input event log L.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Finding Several Places</head><p>The most apparent technique to find multiple places is the use of causal relations within an event log. Within the context of this paper we define a causal relation as follows.</p><p>Definition 10 (Causal relation). Let L be an event log over a set of activities A. A causal relation γ L is a function γ L : A × A → R where γ L (a, b) denotes the causality between activity a and b exhibited in L.</p><p>Several approaches exist to compute causalities hence we refer to <ref type="bibr" target="#b5">[6]</ref> for an overview of the use of causalities within different process discovery approaches. As a consequence, γ L (a, b) can have different meanings w.r.t. the causality between a and b. Some approaches limit rng When adopting a causal-based ILP process discovery strategy, we try to find places which will be added in the resulting Petri net, each representing a causality found. A first step is to compute γ L values given some causal definition. Depending on the actual meaning of the γ L , whenever we find a causal relation from a to b (possibly because γ L (a, b) ≥ δ, where δ is some threshold value), we enrich the constraint body for the given causal constraint as follows: After the constraint body is enriched, we solve the ILP yielding a place having an optimal value for the specific objective function chosen. We repeat the procedure for every causal relation yielding a Petri net.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Performance</head><p>The hybrid variable-based ILP-formulation is implemented as a plug-in in the ProM-Framework (http://www.promtools.org) <ref type="bibr" target="#b6">[7]</ref> 2 . The plug-in allows the user to specify parameters of ILP-based process discovery, e.g. several different objective functions, additional constraints and causality based mining.</p><p>To test the performance of hybrid variable-based ILP process discovery we have used the basic implementation within an experimentation framework 3 . The framework allows for generating random process models and from these models generate event logs. We have generated 40 models containing a minimum of 2 activities and a maximum of 12 activities. For each model we have generated 10 event logs, each containing 5000 traces. For each log we ran three different instantiations of the process discovery formulation, one purely single variable-based, one hybrid variant and one purely dual variable-based. The distribution of event classes in A over A s and A d is depicted in Table <ref type="table" target="#tab_1">2</ref>. For the hybrid variant the size of A s was kept constant and independent of the number of activities in A (as of |A| ≥ 3). We ran each instantiation 10 times per log using causal relations as a process discovery strategy. For each run of an instantiation we calculated the total time spend in solving all ILPs based on the causalities present in the event log. These times have been aggregated based on the number of activities present in an event log. The results of the experiment, plotted on a logarithmic scale, are depicted in Figure <ref type="figure" target="#fig_8">3</ref>  4 .</p><p>As shown in Figure <ref type="figure" target="#fig_8">3</ref>, the average time to solve the hybrid formulation is in-between the single and dual formulation. We additionally note the hybrid formulation to get slightly closer to the dual formulation when |A| increases. This is as expected as the number of single variables within the hybrid formulation is constant and hence the average time of solving the ILPs within the formulation increases with respect to the single variable formulation. Figure <ref type="figure" target="#fig_8">3</ref> shows that reducing the number of variables used within the ILP-formulation has an impact on the average time of solving ILPs. The differences are however marginal, which is to be expected as solving ILPs is exponential by nature. Therefore the experiments show that using ILPs for the aim of process discovery in  3 All framework files and results can be found at https://svn.win.tue.nl/repos/ prom/Packages/HybridILPMiner/Tags/papers/source_files/hybrid_ ilp_runtime_experiments.tar.gz 4 The experiments where distributed over four servers (Dell PowerEdge R520, Intel Xeon E5-2407 v2 2.40GHz, 10M Cache, 8 × 8GB RDIMM, 1600MT/s Memory), on each server a total of 10 models and corresponding logs where generated. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Related Work</head><p>The concept of hybrid variable-based regions originates from language-based region theory, which in turn originates from the area of Petri net synthesis. We identify two main branches of language-based region theory within Petri net synthesis being the separating regions approach [8-10] and the minimal linear basis approach <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12]</ref>. In the separating regions approach, which uses single-variable based regions, behavior not seen in a given prefix closed language is specified as being undesired. In the minimal linear basis approach, given a prefix closed language, a polyhedral cone of integer points is constructed based on dual variable-based regions. Using the cone a minimal basis of the set of regions is calculated which defines a minimal set of places to synthesized. Both approaches try to minimize L(N ) \ L, where N the resulting Petri net and L is a prefix-closed language. The approaches lead to Petri nets with perfect replay-fitness. Moreover precision is maximized. A side effect of this property is the fact that the synthesized net N scores low on both the generalization and the simplicity dimension.</p><p>In <ref type="bibr" target="#b2">[3]</ref> a first design of an ILP-formulation was presented based on the concept of dual variable-based regions. The work presents objective function z 1 which we have further developed in this work to z L , and more generally, z β . The work also focuses on formulation of several different net-types in terms of linear in-equalities which even go beyond classical Petri nets, e.g. reset-and inhibitor arcs.</p><p>An alternative approach is to use the concept of state-based regions for the purpose of process discovery <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref>. Within this approach an abstraction of the event log is computed in the form of a transition system. Regions are computed within the transition system where, like in language-based region theory, each region corresponds to a place in the resulting Petri net.</p><p>In <ref type="bibr" target="#b14">[15]</ref> a process discovery algorithm is proposed based on the concept of numerical abstract domains using Parikh vectors as a basis. Based on an event log, a prefixclosed language is computed of which a convex polyhedron is approximated by means of calculating a convex hull. The convex hull is then used to compute causalities within the input log, by deducing a set of linear inequalities which represent places. The formulation used to calculate these causalities is in essence based the concept of single variable-based regions. The approach allows for finding pure Petri nets with arc weights and multiple marked places.</p><p>A multitude of other Petri net-based process discovery approaches exist. For a detailed description of these approaches we refer to <ref type="bibr" target="#b5">[6]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusion</head><p>We presented a new breed of language-based regions, i.e. hybrid variable-based regions, that captures the two existing region types being single and dual variable-based regions. Hybrid variable-based regions allow us to decide whether we want to use one or two variables for an activity present in the input event log. This allows us to achieve gains in terms of performance whilst maintaining the possibility to find complex (workflow) patterns. We have shown that within the hybrid variable-based ILP process discovery formulation, using only one variable per activity a ∈ A performs optimal in terms of the average time spent in solving the ILPs constructed.</p><p>We presented a generalized weighted objective function and showed that any instantiation of the objective function leads to ILPs that favor minimal regions. As a result, practitioners may vary the scaling function within the objective function under the guarantee that the objective function favors minimal regions. We presented a logbased scaling function that exploits trace frequencies available in the input log. Using the log based scaling function within the objective function assigns an objective value to each region equal to the token throughput of the corresponding place, given the event log. Hence, as we have shown using the new objective function leads to more intuitive objective values.</p><p>As an interesting direction for future work we identify the assessment of the impact of filtering, either a-priori or within the ILP itself, on the quality dimensions of the resulting nets, i.e. are we able to leverage the perfect repaly-fitness property? Also, an assessment of the impact of decomposition techniques <ref type="bibr" target="#b15">[16]</ref> on the performance of ILP-based approaches is an interesting direction for future work.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>, Sequences and Vectors Let Z denote the set of integers, let N denote the set of positive integers including 0 and let N + denote the set of positive integers excluding 0. Let R denote the set of real numbers and let R + denote the set of positive real numbers excluding 0. Let S denote a set. Let us define a bag B as a function B : S → N. Notation-wise we write a bag as [e n1 1 , e n2 2 , ..., e n3 3 ], where e i ∈ S, n i ∈ N + and e ni i ≡ B(e i ) = n i . If for some element e, B(e) = 1, we omit its superscript. An empty bag is denoted as ∅. As an example let S 1 = {a, b, c, d} and let B 1 denote a bag consisting of 3 a's, 5 b's, 1 c and 0 d's, i.e. B 1 = [a 3 , b 5 , c]. Element inclusion applies to bags as well, i.e. given e ∈ S and B(e) &gt; 0 then also e ∈ B. Thus we have a ∈ B 1 whereas d / ∈ B 1 .</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 1 :</head><label>1</label><figDesc>Fig. 1: A pure Petri net N 1 = (P 1 , T 1 , W 1 ) with P 1 = {p 1 , p 2 , ..., p 5 }, T 1 = {a, b, ..., g} and W 1 (p 1 , a) = W 1 (a, p 2 ) = ... = W 1 (p 5 , g) = 1.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>. 1 )</head><label>1</label><figDesc>Given a set of activities A and two sets of activitiesA s , A d ⊆ A with A s ∪ A d = A and A s ∩ A d = ∅,we refer to a hybrid partition of A. If we choose A d = A, Equation 4.1 is equal to Equation 3.2. If we choose A s = A, Equation 4.1 can be reformulated as:</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Definition 9 (</head><label>9</label><figDesc>Generalized weighted prefix-closure-based hybrid region objective function). Let L be an event log over a set of activities A and let A s , A d ⊆ A be a hybrid partition. Let r = (m, v, x, y) ∈ R h As (L) be a hybrid variable-based region and let β be a scaling function over L, i.e. β : L → R + . The generalized weighted prefixclosure-based hybrid region objective function is instantiated as c m = σ∈L β(σ), c v = σ∈L β(σ) p As (σ), c x = σ∈L β(σ) p A d (σ) and c y = − c x , i.e.:</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 2 :</head><label>2</label><figDesc>Fig. 2: A simple Petri net N with L(N ) = { , a , a, b , a, c a, b, d , a, c, d }.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head></head><label></label><figDesc>(γ L ) to {0, 1} where γ L (a, b) = 1 means that there is a causal relation between a and b whereas γ L ((a, b)) = 0 means there is no causal relation from a to b. Other approaches map rng(γ L ) to the real-valued domain (−1, 1) where a high positive γ L (a, b) value (close to 1) indicates a strong causal relation from a to b and a low negative value (close to −1) indicates a strong causal relation from b to a.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>m = 0</head><label>0</label><figDesc>and v(a) = 1 and v(b) = −1, if a, b ∈ A s m = 0 and x(a) = 1 and v(b) = −1, if a ∈ A d , b ∈ A s m = 0 and v(a) = 1 and y(b) = 1, if a ∈ A s , b ∈ A d m = 0 and x(a) = 1 and y(b) = 1, if a, b ∈ A d</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Fig. 3 :</head><label>3</label><figDesc>Fig. 3: Average time in milliseconds to solve the three ILP-based process discovery variants plotted on a logarithmic scale against the number of activities in the log.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>An abstract example of an event log.</figDesc><table><row><cell cols="3">Case id Activity id Resource id</cell><cell>Time-stamp</cell></row><row><cell>c1</cell><cell>a</cell><cell>Lucy</cell><cell>2015-01-05T09:13:37+00:00</cell></row><row><cell>c2</cell><cell>a</cell><cell>John</cell><cell>2015-01-05T13:37:25+00:00</cell></row><row><cell>c2</cell><cell>b</cell><cell>Pete</cell><cell>2015-01-06T13:14:15+00:00</cell></row><row><cell>c2</cell><cell>d</cell><cell>Lucy</cell><cell>2015-01-06T15:27:18+00:00</cell></row><row><cell>c1</cell><cell>c</cell><cell>Pete</cell><cell>2015-01-07T14:28:56+00:00</cell></row><row><cell>c1</cell><cell>d</cell><cell>John</cell><cell>2015-01-07T15:30:00+00:00</cell></row><row><cell>. . .</cell><cell>. . .</cell><cell>. . .</cell><cell>. . .</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Different settings used in performance measurements of the hybrid variablebased ILP-formulation.</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">In practice we extract an event log L from some information system. Consequently A is implicitly defined by the event log, i.e., A only consists of events that are actually present L.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">The source-code is available at https://svn.win.tue.nl/repos/prom/ Packages/HybridILPMiner/Trunk</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Definition 8 (Hybrid variable-based process discovery ILP-formulation). Let L be an event log over a set of activities A and let A s , A d ⊆ A be a hybrid partition. Let M s , M d , M d be three matrices where M s is an |L| \ { } × A s matrix with M s (σ, a) = p(σ)(a) and</p><p>The hybrid variable-based process discovery ILP-formulation, denoted ILP h L , is defined as:</p><p>The ILP-fromulation presented in Definition 8 uses the set of linear in-equalities generated by Equation <ref type="formula">4</ref>.1 within its constraint body. The formulation however binds v to {−1, 0, 1} |As| and x, y to {0, 1} |A d | , i.e. the formulation only allows for discovering Petri nets with arc weights restricted to {0, 1}. Additionally it defines an objective function, i.e. z = c m m + c v v + c x x + c y y, that maps each region to a real value, i.e. z : R h As (L) → R. In general we are free to choose whatever objective function we like, however, using different objective functions will lead to different process discovery results. Choosing c m = 0 and c v = c x = c y = 1 leads to arc minimization whereas maximizing the same objective leads to arc maximization, resulting in different places/regions found by the underlying ILP solver. The objective function proposed in <ref type="bibr" target="#b2">[3]</ref>, being a minimization function, tries to minimize the number of incoming arcs and maximize the number of outgoing arcs. The objective function can be defined in terms of c m , c v , c x and c y as c m = |L|, c v = σ∈L p As (σ), c x = σ∈L p A d (σ) and c y = − c x , i.e.:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Optimizing Token Throughput</head><p>The objective function used within <ref type="bibr" target="#b2">[3]</ref>, i.e. Equation <ref type="formula">4</ref>.3 is less generic as the one proposed in Definition 8. As shown in <ref type="bibr" target="#b2">[3]</ref>, it favors minimal regions. A minimal region is a region that is not expressible as the sum of two other regions. The objective function is solely based on the set-representation of the prefix-closure of an event log. However, an event log is a bag of traces and thus consists of information on trace frequency. We propose a generalized prefix-closure-based objective function that incorporates a wordbased scaling function β. The scaling function β is required to map all sequences in the prefix-closure of the event log to some positive real value. The actual implementation is up to the user, although we present an instantiation of β that works well for process discovery. We show that the proposed generalized weighted prefix-closure-based hybrid region objective function favors minimal regions, given any scaling function β.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Process Mining: Discovery, Conformance and Enhancement of Business Processes</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P V</forename><surname>Aalst</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<publisher>Springer Publishing Company, Incorporated</publisher>
		</imprint>
	</monogr>
	<note>1st edn</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The synthesis problem of Petri nets</title>
		<author>
			<persName><forename type="first">J</forename><surname>Desel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Reisig</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Acta Inf</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="297" to="315" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Process discovery using integer linear programming</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M E M V D</forename><surname>Werf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">F V</forename><surname>Dongen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A J</forename><surname>Hurkens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Serebrenik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Fundamenta Informaticae</title>
		<imprint>
			<biblScope unit="volume">94</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="387" to="412" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">On the role of fitness, precision, generalization and simplicity in process discovery</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C A M</forename><surname>Buijs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">F V</forename><surname>Dongen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P V</forename><surname>Aalst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">On the Move to Meaningful Internet Systems: OTM 2012</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">R</forename><surname>Meersman</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Panetto</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Dillon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Rinderle-Ma</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Dadam</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Zhou</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Pearson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Ferscha</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Bergamaschi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><forename type="middle">F</forename><surname>Cruz</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">7565</biblScope>
			<biblScope unit="page" from="305" to="322" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Workflow patterns</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P V D</forename><surname>Aalst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">H M T</forename><surname>Hofstede</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kiepuszewski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barros</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Distributed and Parallel Databases</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="5" to="51" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Process mining: Overview and outlook of Petri net discovery algorithms</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">F V</forename><surname>Dongen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K A D</forename><surname>Medeiros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">T. Petri Nets and Other Models of Concurrency</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="225" to="242" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The ProM framework: A new era in process mining tool support</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">F V</forename><surname>Dongen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K A D</forename><surname>Medeiros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M W</forename><surname>Verbeek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J M M</forename><surname>Weijters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P V</forename><surname>Aalst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Applications and Theory of Petri Nets</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">G</forename><surname>Ciardo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Darondeau</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2005">2005. 2005</date>
			<biblScope unit="volume">3536</biblScope>
			<biblScope unit="page" from="444" to="454" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Polynomial algorithms for the synthesis of bounded nets</title>
		<author>
			<persName><forename type="first">E</forename><surname>Badouel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bernardinello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Darondeau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">TAPSOFT &apos;95: Theory and Practice of Software Development</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Mosses</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Nielsen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><forename type="middle">I</forename><surname>Schwartzbach</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1995">1995</date>
			<biblScope unit="volume">915</biblScope>
			<biblScope unit="page" from="364" to="378" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Theory of regions</title>
		<author>
			<persName><forename type="first">E</forename><surname>Badouel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Darondeau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Lectures on Petri Nets I: Basic Models</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">W</forename><surname>Reisig</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Rozenberg</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="volume">1491</biblScope>
			<biblScope unit="page" from="529" to="586" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Deriving unbounded Petri nets from formal languages</title>
		<author>
			<persName><forename type="first">P</forename><surname>Darondeau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CONCUR&apos;98 Concurrency Theory</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">D</forename><surname>Sangiorgi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><forename type="middle">D</forename><surname>Simone</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="volume">1466</biblScope>
			<biblScope unit="page" from="533" to="548" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Process mining based on regions of languages</title>
		<author>
			<persName><forename type="first">R</forename><surname>Bergenthum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Desel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lorenz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mauser</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Business Process Management</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">G</forename><surname>Alonso</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Dadam</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Rosemann</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">4714</biblScope>
			<biblScope unit="page" from="375" to="383" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">How to synthesize nets from languages -a survey</title>
		<author>
			<persName><forename type="first">R</forename><surname>Lorenz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mauser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Juhas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Simulation Conference</title>
				<imprint>
			<publisher>Winter</publisher>
			<date type="published" when="2007-12">2007. Dec 2007</date>
			<biblScope unit="page" from="637" to="647" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Process mining from a basis of state regions</title>
		<author>
			<persName><forename type="first">M</forename><surname>Solé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carmona</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Applications and Theory of Petri Nets, 31st International Conference, PETRI NETS 2010</title>
				<meeting><address><addrLine>Braga, Portugal</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">June 21-25, 2010. 2010</date>
			<biblScope unit="page" from="226" to="245" />
		</imprint>
	</monogr>
	<note>Proceedings.</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Process mining: a two-step approach to balance between underfitting and overfitting</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P V D</forename><surname>Aalst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Rubin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M W</forename><surname>Verbeek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">F V</forename><surname>Dongen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kindler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">W</forename><surname>Günther</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Software and System Modeling</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="87" to="111" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Process discovery algorithms using numerical abstract domains</title>
		<author>
			<persName><forename type="first">J</forename><surname>Carmona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cortadella</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Knowl. Data Eng</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="3064" to="3076" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Decomposing Petri nets for process mining: A generic approach</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P V</forename><surname>Aalst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Distributed and Parallel Databases</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="471" to="507" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
