Introduction

Carmona

jcarmona@lsi.upc.edu 0 Computer Architecture Department , UPC 1 Software Department , UPC

175 190

The problem of synthesis of Petri nets from transition systems or languages has many applications, ranging from CAD for VLSI to medical applications, among others. The most common algorithms to accomplish this task are based on the theory of regions. However, one of the problems of such algorithms is its space requirements: for real-life or industrial instances, some of the region-based algorithms cannot handle in memory the internal representation of the input or the exploration lattice required. In this paper, the incremental derivation of a basis of regions and the later partitioned basis exploration is presented, which allows the splitting of large inputs.

Introduction

The introduction of the theory of regions [ 1 ] in the early nineties enabled a new area of research that strives for transforming language or state-based representations into event-based ones. This transformation, known as synthesis, was initially devoted to derive a Petri net whose reachability graph was bisimilar or isomorphic to the input transition system. A variant of this problem, known as mining, has weaker requirements: the language of the derived Petri net must be a superset (maybe proper) of the language of the input transition system [ 2 ]. The theory of this paper provides algorithms for mining.

Many research has been carried out since the introduction of regions, specially of theoretical nature, which has brought a better understanding of the theory [3{ 6], and has provided meaningful extensions to make it more general [7{10].

As a consequence of the aforementioned theoretical work, tools based on the theory of regions started to be available by the end of the nineties [ 11, 12 ] and still new ones are developed nowadays [ 9, 10 ]. These tools are the outcome of bridging the gap between the theory and practice, and many of them are used extensively in the academic domain, whereas few are used in industry. The reasons for the limited success of these tools in industry might be: 1. The algorithms involved are complex, i.e. in general polynomial with the size of the input [ 3 ], that might be prohibitive for large inputs, and the use of e cient data structures like BDDs or heuristics only alleviates the problem. 2. No high-level techniques are provided to cope with the inherent complexity of the problem. log L1 . .

PN Nn TS A

In this work we provide the theoretical basis for deriving high-level strategies that may allow to handle large speci cations. More concretely, as space requirements are typically the bottleneck for some of the tools listed above, in this paper we present an incremental technique that allows splitting the input into several smaller parts. Moreover, we show how the theory of regions can be extended algorithmically to combine the regions of each part in order to derive the regions of the whole input.

The theory presented will be oriented for the problem of mining: given a set of objects describing behaviors (like a Petri net, a transition systems or an event log) O1; O2; : : : ; On, we want to obtain a Petri net N such that L(N ) Sin=1 L(Oi). The traditional way to solve this problem is to generate a transition system Ai for each Oi, join all these Ai to create a single transition system, and then apply a synthesis technique to derive a Petri net from a transition system [ 12, 13 ].

In this paper we explore a di erent approach. Instead of working with a monolithic transition system, we use the fact that a transition system can be represented by a basis of regions, such that any other region is a linear combination of the regions in the basis (a recent publication shows an e cient technique to accomplish this task [ 14 ]). Then, bases can be combined to obtain a region basis for the whole system, from which we can derive the Petri net N .

The general picture of the approach of this paper is outlined on Fig. 1(a). Some arcs are labeled with an indication of the section or the reference where the conversion/operation is explained. Basically the rst step is to convert inputs that are not transition systems into a transition system, and then compute a region basis from which a PN can be generated. Another possible application is also shown in Fig. 1(b), where a large TS is split into smaller subsystems of manageable size, with the only restriction that all the subsystems must include the initial state and be connected.

Related work

In [ 16 ] an incremental approach was suggested based on the observation that any region of the union of two transition systems can be expressed as the union of regions of those systems. As in the approach presented in this paper, the approach in [ 16 ] trades space for time, since it must rst compute all the regions and then obtain the minimal ones, a slower process than nding directly the minimal regions if the whole transition system ts into memory. The main drawback of this method is that the complete set of regions of each component transition system must be stored (either in memory or disk) in order to compute the set of regions of the union. In this work we propose a faster methodology based on the fact that the complete set of regions can be succintly represented by a basis of regions. 1.2

Organization

We start by giving the necessary background in Sect. 2. The process of combining a set of bases to produce a unique basis for the whole system is explained in Sect. 3. A description of the generation of a PN from a region basis is given in Sect. 4 and, nally, Sect. 5 concludes this paper. 2 2.1

Background Finite Transition Systems and Petri Nets

De nition 1 (Transition system). A transition system (TS) is a tuple hS; ; T; s0i, where S is a set of states, is an alphabet of actions, T S S is a set of (labelled) transitions, and s0 2 S is the initial state.

We use s !es0 as a shortcut for (s; e; s0) 2 T , and we denote its transitive closure as !. A state s0 is said to be reachable from state s if s !s0. We extend the notation to transition sequences, i.e., s1 !sn+1 if = e1 : : : en and (si; ei; si+1) 2 T . We denote #( ; e) the number of times that event e occurs in . Let A = hS; ; T; s0i be a TS. We consider connected TSs that satisfy the following axioms: i) S and are nite sets, ii) every event has an occurrence and iii) every state is reachable from the initial state. The language of a TS A, L(A), is the set of transition sequences feasible from the initial state.

For two TSs A1 and A2, when L(A1) L(A2), we will say that A2 is an over-approximation of A1.

De nition 2 (Union of TSs). Given two TSs A1 = hS1; 1; T1; s0i and A1 = hS2; 2; T2; s0i, the union of A1 and A2 is the TS A1 [ A2 = hS1 [ S2; 1 [ 2; T1 [ T2; s0i.

Clearly, the TS A1 [ A2 is an over-approximation of the TSs A1 and A2, i.e.

L(A1) [ L(A2) L(A1 [ A2).

De nition 3 (Petri net [17]). A Petri net (PN) is a tuple (P; T; W; M0) where

the sets P and T represent disjoint nite sets of places and transitions, respectively, and W : (P T ) [ (T P ) ! N is the weighted ow relation. The initial marking M0 2 NP de nes the initial state of the system.

A transition t 2 T is enabled in a marking M if 8p 2 P : M (p) W (p; t). Firing an enabled transition t in a marking M leads to the marking M 0 de ned by M 0(p) = M (p) W (p; t) + W (t; p), for p 2 P , and is denoted by M !t M 0. The set of all markings reachable from the initial marking M0 is called its Reachability Set. The Reachability Graph of N , denoted RG(N ), is a transition system in which the set of states is the Reachability Set, the events are the transitions of t the net and a transition (M1; t; M2) exists if and only if M1 ! M2. We use L(N ) as a shortcut for L(RG(N )). 2.2

Generalized Regions

The theory of regions [ 1, 5 ] provides a way to derive a Petri net from a transition system. Intuitively, a region corresponds to a place in the derived Petri net. In the initial de nition, a region was de ned as a subset of states of the transition system satisfying a homogeneous relation with respect to the set of events. Later extensions [ 7, 18, 10 ] generalize this de nition to multisets, which is the notion used in this paper.

De nition 4 (Multiset, k-bounded Multiset, Subset). Given a set S, a

multiset r of S is a mapping r : S ! Z. The number r(s) is called the multiplicity of s in r. Multiset r is k-bounded if all its multiplicities are less or equal than k. Multiset r1 is a subset of r2 (r1 r2) if 8s 2 S : r1(s) r2(s).

We de ne the following operations on multisets:

De nition 5 (Multiset operations).

Maximum power pow(r) = maxs2S r(s) Minimum power minp(r) = mins2S r(s) Scalar product (k r)(s) = k r(s), for k 2 Z Scalar sum (r + k)(s) = r(s) + k, for k 2 Z Union (r1 [ r2)(s) = max(r1(s); r2(s)) Sum (r1 + r2)(s) = r1(s) + r2(s)

Subtraction (r1 r2)(s) = r1(s) r2(s) The operations described above have algebraic properties, e.g., r + r = 2 r and r1 k r2 = r1 + ( k) r2.

De nition 6 (Gradient). Let hS; ; T; s0i be a TS. Given a multiset r and a transition s !es0 2 T , its gradient is de ned as r(s !es0) = r(s0) r(s). If all the transitions of an event e 2 have the same gradient, we say that the event e has constant gradient, whose value is denoted as r(e). 0 s3

Example 1. Fig. 2(a) shows a TS. The numbers within the states correspond to the multiplicity of the multiset r shown. Multiset r is a region because both events a and b have constant gradient, i.e. r(a) = 2 and r(b) = 3. There is a direct correspondence between regions and places of a PN. The gradient of the region describes the ow relation of the corresponding place, and the multiplicity of the initial state indicates the number of initial tokens [ 10 ]. Fig. 2(b) shows the place corresponding to the region shown in Fig. 2(a).

We say that region r is normalized if minp(r) = 0. Similarly, it is nonnegative if minp(r) 0. Any region r can become normalized by subtracting minp(r) from the multiplicity of all the states.

De nition 8 (Normalization). We denote by #r the normalization of a region r, so that #r = r minp(r).

It is useful to de ne a normalized version of the sum operation between regions, since it is closed in the class of normalized regions.

De nition 9 (Normalized sum). Let r1 and r2 be normalized regions, we denote by r1 r2 their normalized sum, so that r1 r2 =#(r1 + r2).

De nition 10 (Gradient vector). Let r be a region of a TS whose set of

events is = fe1; e2; : : : ; eng. The gradient vector of r, denoted as (r), is the vector of the event gradients, i.e. (r) = ( r(e1); r(e2); : : : ; r(en)). Proposition 1. Gradient vectors have the following properties: (r1 + r2) =

(r + k) = (r1 a d s3

Regions can be partitioned into classes using their gradient vectors.

De nition 11 (Canonical region). Two regions r1 and r2 are said to be

equivalent if their gradient is the same, i.e. r1 r2 , (r1) = (r2). Given a region r, the equivalence class of r, is de ned as [r] = frij ri rg. A canonical region is the normalized region of an equivalence class, i.e. #r.

Example of canonical regions are provided in Fig. 4, where two TSs are shown in which some regions have been shadowed. For instance, the canonical region r0 = fs1; s2g has gradient vector (r0) = (1; 0). A PN built from the set of minimal canonical regions has the same language as a PN built using all the regions [ 5 ], thus it yields the smallest overapproximation with respect to the language of the TS [ 10 ].

De nition 12 (Subregion, Empty region, Minimal canonical region).

r1 is a subregion of r2, denoted as r1 v r2, if, for any state s, #r1(s) #r2(s). We denote by ; the region in which all states have zero multiplicity. A minimal canonical region r satis es that for any other region r0, if r0 v r then r0 ;. 3

Combining region bases

In this section we detail how region bases of di erent TSs can be joined, yielding a region basis of their union. We will illustrate the theory with the running example shown in Fig. 3. In this case, we assume A is a very large TS that cannot be easily handled, hence it is split into two smaller subsystems, namely A1 and A2, so that A = A1 [ A2. 3.1

Basis of regions

De nition 13 (Region basis). Given a TS, a region basis B = fr1; r2; : : : ; rng is a minimal subset of the canonical regions of TS such that any region r can be expressed as a linear combination of B ( i.e. r = Pn i=1 ci ri, with ci 2 Q; ri 2 B). r0 region r1 region

Theorem 1 ([ 18 ]). Let hS; ; T; s0i be a TS. The size of the region basis is less or equal to min(j j; jSj 1). Given two systems described by their region basis, we want to obtain the region basis of their union TS. The work more closely related is [ 16 ], in which it is described how the set of regions in the joined system can be obtained from the regions of the component systems. This is achieved by introducing the concept of compatible (standard) regions. In this section we rst review and extend this concept of compatibility to generalized regions.

De nition 14 (Compatible TSs). Two TSs A1 and A2 are compatible if they have the same initial state.

Def. 14 is more general than the one in [ 16 ], where the number of shared states is restricted to one (the initial state). For instance, systems A1 and A2 of Fig. 3 are compatible according to this de nition, but not using the de nition in [ 16 ], since they share states s0 and s2.

De nition 15 (Compatible regions, o set). Two regions r1 and r2 from

two compatible TSs A1 = hS1; 1; T1; s0i and A2 = hS2; 2; T2; s0i are said to be compatible if: { 8e 2 1\ 2; r1 (e) = r2 (e), i.e. they have the same gradient for all common events, and, { 9k 2 Z : 8s 2 S1 \ S2; r2(s) r1(s) = k, i.e. the di erence in multiplicity of each shared state is equal to a constant, that we call the o set between r1 and r2, denoted as o (r1; r2).

Two compatible regions are said to be directly compatible if o (r1; r2) = 0, a fact that we denote as r1 $ r2. Conversely, if o (r1; r2) 6= 0, we say that the regions are indirectly compatible and we use the following notation r1 ! r2.

An immediate consequence of Def. 15 is that, if there is only a single shared state, then any two regions with the same gradient for all common events are compatible. This is, for instance, the case when TSs represent execution trees and only the initial state is shared among them.

Two compatible regions can be made directly compatible by adding the o set to one of them.

De nition 16 (Directly compatible region). Given two compatible regions

r1 and r2 with o (r1; r2) > 0, the directly compatible region of r1 with respect to r2 is r1"r2 = r1 + o (r1; r2).

De nition 17 (Union of compatible regions). Given two compatible regions r1 and r2, de ned over sets of states S1 and S2, respectively, with o (r1; r2) > 0, their union, denoted r1 t r2, is (r1 t r2)(s) = ((r1"r2 )(s) r2(s) if s 2 S1 otherwise

Proposition 2 (Union of compatible regions is a region of union sys

tem). Given two compatible regions r1 and r2 of two compatible TSs A1 and A2, r1 t r2 is a region of A1 [ A2. For each shared event, its gradient is equal to the gradient in r1 or r2, which are equal. For non-shared events of A1 (A2), their gradient is the gradient in A1 (A2).

Proof. Assume o (r1; r2) 0. Region r = r1 t r2, where r1"r2 = r1 + o (r1; r2). The latter entails that, in a shared state s, the multiplicity is the same for r1"r2 and r2, which is the multiplicity assigned to r. For non-shared states, on the other hand, the multiplicity assigned in r is the multiplicity of the region whose TS contains the state. Thus any arc (either going from a shared to nonshared state or any other combination) has a constant gradient, equal to the gradient of that event in the corresponding region. Result transfers to r1 because (r1) = (r1"r2 ). tu Example 2. In Fig. 5 we show one region of each subsystem of our running example. They are compatible, since the gradient of the single shared event a is the same in both regions. More speci cally, they are indirectly compatible, since their o set is di erent than 0. s1 a b r s0 s2 s0 s2 q a c d s3 s1 a b r s0 s2 s0 s2 c q"r a d s3

Indirectly compatible

regions (r ! q)

Directly compatible

regions (r $ q "r)

Property 1. Given two compatible non-negative regions r1 and r2. If they are directly compatible then r1 t r2 is a normalized region, if and only if, one of them is normalized. If they are not directly compatible, but both of them are normalized, then again r1 t r2 is a normalized region.

Proof. If one of them is normalized and they are directly compatible no modi cation of multiplicities is performed, so the state with 0 multiplicity will remain untouched and since regions are non-negative, then minp(r1 t r2) = 0. Conversely, if r1 t r2 is a normalized region and r1 $ r2, then all multiplicities of either r1 or r2 are greater or equal to 0, and since there is at least one 0 multiplicity, at least one of them must be normalized. If both are normalized but are not directly compatible, only one of them modi es its multiplicities and we end up in the same situation. 3.3

Incremental algorithm for obtaining a basis

Given two TSs, A1 and A2. Let fr1; : : : ; rng be the region basis of A1 and fq1; : : : ; qmg the basis of A2. The set of all regions of the union system A1 [ A2 whose language satis es L(A1 [ A2) L(A1) [ L(A2) is obtained by nding all non-trivial solutions for xi variables that satisfy: 8ei 2 1 \ 2 : rj (ei) xj =

qk (ei) xn+k

X 1 j n

X 1 k m i.e. for all common events their gradient must be the same on both systems.

This system of equations can be rewritten in matrix form as M x = 0, where x is the column vector of variables xi and M is the following matrix: 0 r1 (e1) : : : rn (e1)

B r1 (e2) : : : rn (e2) M = BB@ ...

r1 (ec) : : : rn (ec) q1 (e1) : : : q1 (e2) : : : . .

. q1 (ec) : : : qm (e1)1 qm (e2)C

C C

A qm (ec) where c is the number of common events in the system, i.e. c = j 1 \ 2j. We call this matrix the gradient compatibility matrix between A1 and A2.

Compatibility of regions demands not only the gradients of common events to be the same, but also that the o set is the same for all shared states. To enforce such condition, assume that we shift all the regions in the bases fr1; : : : ; rng and fq1; : : : ; qmg so that for all ri and qj we have that ri(s0) = qj (s0) = 0. Now any region obtained by combining the regions in the bases will have a 0 multiplicity in the initial state. Thus, if the o set is the same in all shared states, their multiplicity must coincide in all remaining shared states.

If there is a path between s0 and shared state s in which the same events (and the same number of times but no matter in which order) are red in both TSs A1 and A2, then the condition is automatically satis ed. Only in the case where all paths are di erent, we say that shared state s is in con ict, and then we must explicitly enforce the equality of the multiplicity of s in both systems.

This condition can be expressed in matrix form using a row for each shared state in con ict (di erent than s0 since this state is never in con ict). For a shared state s in con ict, its corresponding row will be of the form (r1(s) : : : rn(s) q1(s) : : : qm(s)), where all regions ri and qj have been shifted, as said before, so that ri(s0) = qj (s0) = 0. Let us name as C the matrix containing such rows, we will call it the shared state con ict matrix between A1 and A2. Now since the multiplicity of all these states must be the same, their subtraction must be 0. Thus, C x = 0.

Theorem 2. Let A1 and A2 be two compatible TSs with region bases fr1; : : : ; rng and fq1; : : : ; qmg respectively. Let M be their gradient compatibility matrix, C be the shared state con ict matrix, and x the column vector of variables x1 to xn+m. Consider an assignment to the variables in x such that MC x = 0 and some xi 6= 0. This assignment identi es a region rtq = P1 i n rixi tP1 j m qj xn+j in A1 [ A2.

Proof. A non-trivial solution x de nes two compatible regions, namely r = P1 i n rixi and q = P1 j m qj xn+j , in A1 and A2 respectively. These two regions are compatible according to De nition 15 if M x = 0, because for common events the gradient is the same and the o set is the same in all shared states if C x = 0. By Proposition 2, r t q is a region of A1 [ A2. tu

So the problem reduces to nding the solutions to a homogeneous linear system. Note that the system does not require to have solutions in the integer domain. In fact all the solutions are in Q, since all the gradients are integers. Homogeneous linear systems have one trivial solution (i.e. 0) and in nite nontrivial solutions. Let yi be all the non-redundant solution vectors, then any possible solution of the system can be obtained by linear combination of these solution vectors, since adding or subtracting 0 from 0 does not change its value. These yi are a basis of the nullspace of MC , and any solution x can be written as a unique linear combination x = Pi iyi, with i 2 Q.

There are several well-known methods to obtain a basis for the nullspace [ 19 ], one of the easiest is to put the matrix in reduced row echelon form, determine the s0 s2 s0 s2 b0 b0 s0 s2 s0 s2 c c b02 a d b20 a d s3 s3 s1 s1 b11 b11 a b a b s0 s2 s0 s2 b1 b1 s0 s2 s0 s2 c c b21 a d b12 a d s3 s3 Fig. 6. Region basis fb0; b1g (and their coregions f b0; b1g) for system A1 [ A2. Regions are partitioned so that b0 = b0 [ b0, where bi0 is the part of b0 in Ai.

1 2 free variables, and then, for each free variable xi, derive a vector of the basis by assigning 1 to xi and 0 to the rest of free variables. The yi columns correspond to the combination of regions of both system that have the same gradient, and the resulting combined region is a region basis for the union TS. Example 3. We will compute the region basis of the union of TSs A1 and A2 shown in Fig. 3. Two possible region basis for these systems are fr0; r1g and fq0; q1g, show in Fig. 4. The matrix M in this case is where columns (from left to right) correspond to regions r0; r1; q0; q1 and there is only a single row that corresponds to the only shared event between A1 and A2, namely event a.

The set of shared states is fs0; s2g, thus the shared state con ict matrix C has only one row because only state s2 is reachable from s0 by ring a di erent multiset of events. Consequently C = r0(s2) r1(s2) q0(s2) q1(s2) . With matrices M and C we can now build

M C = which is an indeterminate system with 2 degrees of freedom. We can write x1 = 2x2 + x3 and x0 = x2 x3, so by changing the values of variables x2 and x3 (-1,0) (0,1) (0,-1) (1,1) (1,-1) (-1,1) (-1,-1) we can generate all the possible solutions. Given these parameters the region solution will be (( x2 x3) r0 + (2x2 + x3) r1) t (x2 q0 + x3 q1). Using the parameter values (1; 0), and (0; 1) for vector (x2; x3) we do obtain the following regions in the joined system: b0 = ( r0 + 2r1) t q0 and b1 = ( r0 + r1) t q1. The set fb0; b1g forms a region basis for the A1 [ A2 system (see Fig. 6). 4

Generating a PN from a basis

In [ 14 ] an algorithm was presented that allows nding minimal regions by careful exploration of the region space de ned by the region basis. The fundamental idea is that the regions in the basis are initially assumed to be minimal, and then combinations of these regions can only yield a minimal region if the resulting region is non-normalized, since otherwise the region is a superregion of any of its component regions.

However this approach cannot be directly used if the number of states in the monolithic TS is too high to easily perform region operations in memory. The alternative we propose in this paper is to partition the region operations into the di erent component TSs, so that each time only the information of one of the systems is accessed.

To achieve this partitioning, consider the region basis fb0; : : : ; bng, we denote by bij the part of region bi that belongs to subsystem j. All the regions in the basis are assumed to be normalized, but the di erent bij could be non-normalized, since they are de ned so that, given i, all bij are directly compatible (cf. Def. 15). For instance region b0 in Fig. 6 is normalized, however b20 it is not.

Consequently a combination of the basis P c

i i bij iis ia nboiny-ineoldrsmaalinzoend-nroegrmioanli(zbedy region only if, for all subsystem j, P c Petri Nets & Concurrency { 187 13

Partial order

in A1 (0,0)

Partial order

in A2 (0,0)

Property 1)3. Thus, the strategy would be to test the basis combinations in the rst subsystem, keeping only the combinations producing non-normalized regions. Then only these combinations will be tested in the second subsystem, discarding the ones that yield a normalized region, and the process will continue until all subsystems have been checked.

Finally, with only the surviving combinations, the subregion test will be performed to guarantee that only the minimal regions are found. Again this test can be distributed among the subsystems, since Pi ci bi Pi c0i bi if, and only if, for all subsystem j, Pi ci bij Pi c0i bij .

We will illustrate all this process using our running example. In Fig. 7 we can see a tree of combinations of the fb0; b1g basis. Each node is a tuple (c0; c1) describing the coe cients used to obtain a region r as c0 b0 + c1 b1. The second level of the tree corresponds to the regions in the basis and their coregions (as shown in Fig. 6). To bound the search space, assume we arbitrarily x that the coe cients are only allowed to take values in the set f 1; 0; 1g.

From the four possible combinations in the third level, two of them yield already normalized regions in A1, namely combinations (1; 1) and ( 1; 1). On the other hand, combinations (1; 1) and ( 1; 1) correspond to non-normalized regions in A1, as shown in the gure. Consequently only these two combinations will be checked in subsystem A2. In this case, both regions, namely, b20 + ( b12) and b20 +b21, are also non-normalized in A2. Thus these combinations correspond to non-normalized regions in A1 [ A2, which means that they might be minimal regions (once normalized). 3 Since Property 1 only holds for non-negative regions, negative ci coe cients are treated as markers for summing the normalized coregions of bi. For instance 2b1 3b2 will be actually computed as 2 b1 + 3 #( b2). This way all regions are always nonnegative and Property 1 can be safely used.

At this point we must check for minimal regions. The list of candidates includes all the regions in the basis (and their coregions), that is all the combinations in the second level, as well as combinations (1; 1) and ( 1; 1) that have been found during the exploration. To check for minimality, we create a partial order among all these combinations in each subsystem (see Fig. 8). A region is not minimal if there is a combination, di erent than (0; 0), that appears before it in all the partial orders. In our example, combinations (1; 0) and ( 1; 0) are not minimal. Conversely, for instance ( 1; 1) is minimal because, although combAi2na(ti.ieo.n (b012; 6 1) bp20re+cebd21e)s. it in A1 (i.e. b11 b10 + b11), it is not longer true in With the minimal regions found we build the mined PN of Fig. 9.

For a set of subsystems A1; : : : ; An the algorithm can be summarized as follows: { Take the rst subsystem (A1) and explore region space until either combinations are exhausted (according to the user de ned bounds on the coe cients) or the border of non-normalized regions is found. { Pass to next subsystem the set of combinations of non-normalized regions and check for normality in this other subsystem.

If region is normalized, then discard, but mark sibling combinations to be explored in the current subsystem (since all sibling combinations in the rst subsystem will remain to be non-normalized).

On the other hand, if region is non-normalized and we are not in the last subsystem, then add it to the list of regions that conform the border of non-normalized regions that will be passed to next subsystem. If we are already in the last subsystem, then add the combination to the list of candidate minimal regions, normalize it, and then check the normalization status in all subsystems. Sibling combinations are scheduled to be explored in the rst subsystem whose subpart of the region is normalized. 5

Conclusions

In this paper we have extended the theory developed in [ 14 ] to devise an incremental algorithm for process mining. Given that space requirements are often the Petri Nets & Concurrency { 189 15 real bottleneck of some of the region-based techniques in the literature, methods like the one presented in this paper might represent a crucial step into making the theory applicable in industrial scenarios.

1. Ehrenfeucht , A. , Rozenberg , G.: Partial (Set) 2-Structures. Part I, II . Acta Informatica 27 ( 1990 ) 315 { 368

2. van der Aalst , W.M.P. , Weijters , T. , Maruster , L. : Work ow mining: Discovering process models from event logs . IEEE Trans. Knowl. Data Eng . 16 ( 9 ) ( 2004 ) 1128 { 1142

3. Badouel , E. , Bernardinello , L. , Darondeau , P. : Polynomial algorithms for the synthesis of bounded nets . Lecture Notes in Computer Science 915 ( 1995 ) 364 { 383

4. Hoogers , P.W. , Kleijn , H.C.M. , Thiagarajan , P.S.: An event structure semantics for general petri nets . Theor. Comput. Sci . 153 ( 1 &2) ( 1996 ) 129 { 170

5. Desel , J. , Reisig , W. : The synthesis problem of Petri nets . Acta Inf . 33 ( 4 ) ( 1996 ) 297 { 315

6. Badouel , E. , Darondeau , P. : Theory of regions . In Reisig, W., Rozenberg , G., eds.: Petri Nets. Volume 1491 of LNCS., Springer ( 1998 ) 529 { 586

7. Mukund , M. : Petri nets and step transition systems . Int. Journal of Foundations of Computer Science 3 ( 4 ) ( 1992 ) 443 { 478

8. Darondeau , P. : Deriving unbounded petri nets from formal languages . In Sangiorgi, D., de Simone, R., eds. : CONCUR . Volume 1466 of Lecture Notes in Computer Science., Springer ( 1998 ) 533 { 548

9. Bergenthum , R. , Desel , J. , Lorenz , R. , Mauser , S. : Synthesis of petri nets from nite partial languages . Fundam. Inform . 88 ( 4 ) ( 2008 ) 437 { 468

10. Carmona , J. , Cortadella , J. , Kishinevsky , M. : New region-based algorithms for deriving bounded Petri nets . IEEE Transactions on Computers 59 ( 3 ) ( 2009 )

11. Caillaud , B. : Synet : A synthesizer of distributable bounded Petri-nets from nite automata . http://www.irisa.fr/s4/tools/synet/ ( 2002 )

12. Cortadella , J. , Kishinevsky , M. , Lavagno , L. , Yakovlev , A. : Deriving Petri nets from nite transition systems . IEEE Trans. on Computers 47 ( 8 ) ( 1998 ) 859 { 882

13. Carmona , J. , Cortadella , J. , Kishinevsky , M. , Kondratyev , A. , Lavagno , L. , Yakovlev , A. : A symbolic algorithm for the synthesis of bounded Petri nets . In: Application and Theory of Petri Nets and Other Models of Concurrency . ( 2008 )

14. Sole , M. , Carmona , J.: Process mining from a basis of state regions . In: Application and Theory of Petri Nets and other Models of Concurrency. LNCS , Springer ( 2010 ) 226 { 245

15. van der Aalst , W., Rubin , V. , Verbeek , H., van Dongen, B. , Kindler , E. , Gunther, C.: Process mining: a two-step approach to balance between under tting and over tting . Software and Systems Modeling ( 2009 )

16. Dongen , B.F.V. , Busi , N. , Pinna , G.M., van der Aalst , W.: An iterative algorithm for applying the theory of regions in process mining . In: Proceedings of the Workshop on Formal Approaches to Business Processes and Web Services (FABPWS'07) . ( 2007 ) 36 { 55

17. Murata , T. : Petri Nets: Properties, analysis and applications . Proceedings of the IEEE (April 1989 ) 541 { 580

18. Bernardinello , L. , Michelis , G.D. , Petruni , K. , Vigna , S. : On the synchronic structure of transition systems . In Desel, J., ed.: Structures in Concurrency Theory, Proc. of the Int. Workshop on Structures in Concurrency Theory . ( 1995 ) 69 { 84

19. Kalman , D. : Basic null space calculations . The College Mathematics Journal 15 ( 1 ) ( 1984 ) 42 { 47