-

Discovering Process Models from Incomplete Event Logs using Conjoint Occurrence Classes

Tonatiuh Tapia-Flores

Edelma Rodr´ıguez-P´erez

Ernesto L´opez-Mellado

0 0 CINVESTAV Unidad Guadalajara Av. del Bosque 1145 , Col. El Bajio. 45015 Zapopan Mexico

31 46

This paper addresses the problem of discovering a sound Workflow net (WFN) from sample event traces representing the behavior of a process. A current challenging problem deals with discovering a suitable model from incomplete logs since it is not possible to ensure that a log contains all executions of a process. A method that allows building WFN from logs including a reduced number of traces is presented. It is based on the concept of invariant occurrence between activities, which yields sets of activities named conjoint occurrence classes, which are used to infer the causal and concurrent behavior not exhibited in the log. Procedures derived from the method have been implemented in the ProM framework. Experimental results show that the algorithm can discover models form logs including less traces compared to other standing discovery algorithms.

Process discovery Petri nets Incomplete logs

The automated synthesis of formal models from the observation of a process execution in the form of event sequences is called process discovery [ 3 ]. Event data is provided commonly by resource planning and workflow management systems in organizations as logs of event traces, which represent the behavior of the process to be discovered. The aim is to obtain suitable models that describe clearly the causal and parallel relationships between the events in traces.

One significant problem in process discovery techniques is the fact that the log could not contain enough information to discover a process model, i.e. the log might be incomplete. This might cause that the obtained model excludes actual behavior or represents some behavior that is not in the real process. Many things can cause the incompleteness of the log; one of the most common is the presence of parallelism, due to the high number of possible parallel activities whose interleaving cannot be sampled in the traces.

The obtained model through a discovery technique must be easy to understand by representing clearly, the causal and parallel relationships between the activities. Furthermore, such a model must represent all the observed behavior (provided as input data) and the lowest amount as possible of non-observed behavior. Several metrics called quality dimensions [ 6 ] assess the features above mentioned for the discovered model with respect to the event log.

A procedure used to test whether or not a technique is able to discover models that have the same language as the real-life process by which a log was produced is called rediscoverability, which compares the obtained model directly with the real-life process. This procedure is summarized in Figure 1.

The method proposed in this paper pursues to obtain a suitable Petri net model that reveals some behavior hidden in the log, namely some causal and/or concurrent relationships. A method that allows building sound WFN from logs including a reduced number of traces is presented. The discovery technique is based on the concept of invariant occurrence between activities, which leads to obtain sets of events named conjoint occurrence classes; such classes help to infer the causal and concurrent behavior not exhibited in the log.

The paper is organized as follows. In Section 2 the related work is overviewed, In Section 3, the basic notions of PN and WFN are recalled. Section 4 states the first notions of invariance of occurrence and defines the occurrence classes. In Section 5 the synthesis of PN model from the conjoint occurrence classes is presented. Section 6 demonstrates the performance of the proposal through experiments and the results are compared with other approaches. Section 7 concludes the paper. 2

Related Work

Pioneer works on the matter, named language learning techniques, have been proposed in the field of computer sciences. The aim was to build fine representations (finite automata, grammars) of languages from samples of accepted words [ 12, 5 ]. In the field of workflow management systems, several process discovery techniques have been proposed. In an early proposal [ 4 ], a finite automaton, called conformal graph is obtained from sample traces execution. Later in [ 2 ] the alpha algorithm uses the log based ordering between activities to infer the a model. Numerous techniques present extensions To this algorithm, which are able to discover more complex behavior such as implicit dependencies [ 19 ]. In [ 8 ] a probabilistic approach to find the concurrent and direct relations between activities is presented. In [ 16 ] a formalism to represent the partial order between activities, which improve the visualizations of event logs with additional data is introduced. The discovery problem is addressed in an Integer linear programing approach in [ 20 ]. The inductive miner (IM) is presented in [ 13 ], which applies a divide-and-conquer approach that iteratively partitions the activities in process constructs as joint and splits. An adaptation of this algorithm capable to deal with incomplete logs called (IMin) [ 14 ] uses probabilistic behavioral relations to deal with incompleteness. Recently, a new formal model to represent process called process tree has been introduced in [ 1 ]. Many methods have adopted this representation; such is the case of evolutionary tree miner (ETM) [ 6 ], that uses a genetic algorithm to discover a process tree; the innovation in this technique is that it allows the user to select the preferences with respect to the quality dimensions of the final model.

In the field of Discrete Event Systems (DES), where the problem is named identification, several approaches have been proposed for building models representing the observed behavior of automated processes. The incremental approach proposed in [ 15 ] , allows building safe interpreted PN models from a continuous stream of system’s outputs. Later, a technique based on project the activities sequence on sub-alphabets to discover specific patterns is present in [ 17 ]. In [ 11 ] input-output identification of automated manufacturing process is addressed; an interpreted PN is obtained from a set of sequences of input-output vectors collected from the controller during the system cyclic operation. Recently an approach based on the computing t-invariants that allow finding implicit dependencies in the PN model is presented in [ 18 ]. The literature on this research line is vast; wide reviews on DES identification can be found in [ 10, 7 ].

In both fields, the aim in general is to discover processes models from observed behavior. However, the nature of processes leads to problem statements somehow different in terms of input data. 3

Background

This section presents the basic concepts and notations used in this paper.

Definition 1. An ordinary Petri Net structure G is a bipartite digraph repre

sented by the 4-tuple G = (P, T, I, O) where: P = {p1, p2, ..., p|P |} and T = {t1, t2, ..., t|T |} are finite sets of vertices named places and transitions respectively; I : P × T → {0, 1} (O : T × P → {0, 1}) is a function representing the arcs going from places to transitions (from transitions to places).

A marking function M : P → N represents the number of tokens residing inside each place; it is usually expressed as an |P |-entry vector.

A Petri Net system or Petri Net (P N ) is the pair N = (G, M0), where G is a P N structure and M0 is an initial marking. In a PN system, a transition tj is enabled at marking Mk if ∀pi ∈ P, Mk(pi) ≥ I(pi, tj ); an enabled transition tj can be fired reaching a new marking Mk+1. The reachability set of a PN is the set of all possible reachable markings from M0 firing only enabled transitions; this set is denoted by R(G, M0). For any tj ∈ T , •tj = {pi|I(pi, tj ) = 1}, and tj • = {pi|O(tj , pi) = 1}, similarly for any pi ∈ P , •pi = {tj |O(tj , pi) = 1}, and pi• = {tj |I(pi, tj ) = 1}.

A PN system is 1-bounded or safe iff, for any Mi ∈ R(G, M0) and any p ∈ P, Mi(p) ≤ 1. A PN system is live iff, for every reachable marking M i ∈ R(G, M0) and t ∈ T there is a reachable marking Mk ∈ R(G, M0) such that t is enabled in Mk.

Definition 2. A WorkFlow net (WFN) N is a subclass of PN owning the fol

lowing properties [ 3 ]: (1) it has two special places: i and o. Place i is a source place: •i = ∅, and place o is a sink place: o• = ∅. (2) If a transition te is added to PN connecting place o to i, then the resulting PN (called extended WFN) is strongly connected.

A consecutive activities sequence: x, d, e, a, b, y of a process executions that brings the system from initial state into the final state, corresponds to a trace; a workflow log is a multiset of traces. The set of traces that can be produced by a model N , is the language of N , is denoted by L(N ). An example workflow net is shown in Fig.2

Definition 3. A WFN N is said to be sound iff (N, M0) is safe and for any

marking Mi ∈ R(N, M0), o ∈ Mi → Mi = [o], N not contains dead transitions.

Definition 4. (Relations between activities). Let λ be a log over T . and let

a, b ∈ T :

- a →λ b, iff there is a trace σ = t1, t2, ...tn >∈ λ and 1 ≤ i < n − 1, such that ti = a, ti+1 = b, - a||λb iif a →λ b and b →λ a. - f irst(σ) = t1, last(σ) = tn, if n ≥ 1

The ordering relations has been used in [ 2, 4 ] as an abstraction of the behavior described by a log or model. The previous relations for a workflow model N , are defined similarly i.e. if . . . , a, b, . . . ∈ L(N ), then a →N b.

Definition 5. Completeness [2], Let N be a sound workflow net and λ a work

flow log of N . λ is a complete workflow log of N iff (1) for any workflow log λ of N : →λ ⊆→λ, (2) each activity of N is present in λ. 4

Invariance of occurrence

This section introduces the notion of invariance of occurrence and presents a technique for deriving a partial order relationship between activities. 4.1

Trace handling

A trace is a sequence of ordered activities, such that the immediately preceding activity of a referent activity in the sequence, is associated with a position that is less by exactly one from the position of the reference activity. i.e. < a, b, c > is a trace where the activity a is associated with position 1 the activity b is associated with the positions 2 and the activity c is associated with the position 3, we denote xi to refer the ith activity in the trace. First, useful operators for expressing the relative positions of activities in a trace and activity precedence/subsequence relationships are introduced.

Definition 6. (Trace handling operators). Let λ be a workflow log over T , let a ∈ T ; if there is a trace σk = x1, x2, ..., xn such that σk ∈ λ and a ∈ σk, • τ (xi, σk) provides the activity name of element xi in σk • P os(a, σk) = {p, q, r . . . , s} such that τ (xp, σk) = a, ..., τ (xs, σk) = a • P red(xi, σk) = τ (xi−1, σk) , i ∈ {2, 3, . . . , n} • Succ(xi, σk) = τ (xi+1, σk) , i ∈ {1, 2, . . . , n − 1}

, if 1 ∈ P os(a, σk) ⎧⎪ ∅ • P redset(a, σk) = ⎨⎪⎪

⎧⎪ ∅ • Succset(a, σk) = ⎨⎪⎪ ⎪⎩⎪⎪ ⎪⎩⎪⎪ ip=2 P red(xi, σk) ∩ · · · ∩ ir=q+2 P red(xi, σk)

s ∩ i=r+2 P red(xi, σk), if P os(a, σk) = {p, . . . , q, r, s}, p <, . . . , < r < s , if

n ∈ P os(a, σk) q−2 i=p Succ(xi, σk) ∩ ∩ in=−s1 Succ(xi, σk),

ir=−q2 Succ(xi, σk) ∩ . . .

if P os(a, σk) = {p, q, r, . . . , s}, p < q <, . . . , < s

The function P os(a, σk) provides a set of indices in which the symbol a appears in the trace σk, while the functions P red and Succ provide, respectively, the predecessor and successor activity of xi in the trace. The P redset is build from the activities sets appearing between the occurrence of the activity symbol, together with the beginning of the trace and the first occurrence of the activity symbol; the operator yields common symbols among these activities sets. The Succset is defined similarly. Example 1. Consider a log λ with three traces σ1 = x, a, b, d, e, c, a, b, y , σ2 = x, d, e, a, b, y , σ3 = x, a, b, d, c, a, e, b, y , which has been produced using the workflow net N in Fig. 2. Note that is not a complete with respect to N , since b →N c, e →N y hold in N but not in λ. Furthermore, the only knowledge about the parallel behavior generated between the cycle (a, b, c) and the sequence (d, e) is a||λe; all the other parallel relations are not in λ. The relations found in λ are the following: x →λ (a, d), a →λ (b, e), b →λ (d, y), c →λ a, d →λ (c, e), e →λ (b, a).

If we apply the P redset and Succset functions over the activity symbol a and the traces in λ we get: P redset(a, σ1) = {x} ∩ {b, d, e, c} = ∅, P redset(a, σ2) = {x, d, e}, P redset(a, σ3) = {x} ∩ {b, d, c} = ∅; Succset(a, σ1) = {b, d, e, c} ∩ {b, y} = {b}, P redset(a, σ2) = {b, y}, P redset(a, σ3) = {b, d, c} ∩ {e, b, y} = {b}. 4.2

Conjoint Occurrence Relation

Now, we are going to introduce several notions useful to determine the sets of activities that occur together in all traces of λ.

Definition 7. The Invariable precedence set of the activity symbol a in λ is

the set of activities that always precede a in every trace σk it appears, i.e. InvP red(a, λ) = σk P redset(a, σk). Similarly the Invariable subsequent set is InvSucc(a, λ) = σk Succset(a, σk).

Definition 8. The Occurrence Invariable Set of the activity symbol a in λ is the

set of activities that always occur when a occurs, that is, Oi(a) = InvP red(a, λ)∪ InvSucc(a, λ) ∪ {a}. Definition 9. Let R be a binary relation over T defined as R = {(a, b)|b ∈ Oi(a), ∀a, b ∈ T }. Let R be the coarsest equivalence relation such that R ⊆ R. R is called the Conjoint Occurrence Relation (COr) and the equivalence classes are called Conjoint Occurrence Classes (COci).

The Conjoint Occurrence Classes of the log above are COc1 = {a, b}, COc2 = {c}, COc3 = {x, d, e, y}; the Fig. 3 shows the matrix representation of the relation R in which the doted rectangles represent the COci.

Proposition 1. (Invariance). Let λ be a workflow log, COck a Conjoint Oc

currence Class induced by R , and a ∈ COck; when σi ∈ λ is executed, such that a ∈ σi all activities in COck are also executed in σi.

Proof. By definition, Oi(a) provides all the activities that invariably occurs when a occurs. Consider ∀a, b, c ∈ COck, a ∈ Oi(a), if a ∈ Oi(b) then b ∈ Oi(a) and if a ∈ Oi(b) ∧ b ∈ Oi(c) then a ∈ Oi(c), there is no activity unrelated among the rest of activities; then the set is invariable in all the traces of λ.

Proposition 2. (Order ). Let λ be a workflow log, COck a Conjoint Occurrence

Class induced by R . All the activities in COck occur in the same order in all the σi ∈ λ.

Proof. Assume that there are two activities a, b ∈ COck, in a parallel relation (a||b); then when the operators Predset and Succset are applied to one activity, the other one will not appear in its Oi(), consequently they are not in the same COck. Similarly, suppose that there are two traces in which a and b appear in a different order; then the same reasoning is applied to conclude that such activities cannot belong to the same COck.

If we take activities a, b ∈ T of Example 1, we can observe that a ∈ COc1 and the activity a is present in σ1, σ2, σ3; consequently, b is also in σ1, σ2, σ3, as shown in the traces. σ1 = x, a, b, d, e, c, a, b, y , σ2 = x, d, e, a, b, y , σ3 = x, a, b, d, c, a, e, b, y ; furthermore, they occur in the same order.

From Conjoint Occurrence Classes to Petri Net structures

Besides the Invariance and Order information provided by the Conjoint Occurrence classes, they provide much more information about the final model. Given the way in which this classes are computed and the partition of the alphabet T ; it is possible to obtain hidden relations between the elements of a COci; below we present some propositions that help to find this hidden relations. 5.1

Finding sequential substructures Proposition 3. The members of a COck form a sequential Petri Net sub

structure (transition-place-transition) according to their Order, in which each couple of consecutive activities are causally related.

Proof. (by contraposition). Suppose that the elements in COck do not have a PN sequential substructure according to their Order. Let a, b be any activities in COck such that a < b (a before b); given that there is not a PN sequential substructure, at least one of the places pi that connect a to b is missing causing that b could appear before a in some execution; therefore by proof of Proposition 2, these sets of elements cannot be in the same COck.

Definition 10. Each sequential Petri Net sub-structure can be disassembled in three parts: the start and the end activities according to the order and rest of the sequential substructure, which is called the body.

Definition 11. Two activities a, b belonging to a COci have a strong causality

denoted (a ⇒ b), iff they are consecutive in the order of the COci

Proposition 4. For all places pi in a sequential Petri Net sub-structure induced

by a COc, •pi = 1 and pi• = 1; additionally their input and output activities must belong to the COc.

Proof. Following the same reasoning as in the proof of Proposition 3, given that place pi exists but there is an activity k ∈ pi• such that k ∈/ COci, it could provoke an execution in which the activity a appears without the future appearance of b; therefore a, b are not in COc. In case that k ∈ •pi, this allows an execution in which the activity a appears without the previous appearance of b. 5.2

Parallel and Causal Behavior Deducted from COc

The incompleteness of the log is mostly caused by the high level of parallel behavior of activities in the real process, this is a common issue in several discovery techniques based on Log ordering relations. The COc classes provide a way to deal with this problem. Since the members of a COc are causally related, given a property of one of its elements, this may be transmitted to the other members in the class, i.e. it is possible to deduce parallel behavior between a couple of activities, even if they are not directly consecutive in both directions.

Most of the workflow systems offers standard building blocks such as ORsplit, OR-joint, AND-split, AND-joint [ 3 ], OR-splits/OR-joins correspond to places with multiple outgoing/ingoing arcs in a Petri Net, while AND-Split/ANDjoin corresponds to a transition with two or more output/input places, the latter blocks capture the parallel behavior of the process. Next proposition states how the COc and the Oi() can be used to infer these particular transitions.

Proposition 5. Let COci, COcj be two different conjoint occurrence classes,

such that at least an activity a ∈ COci is parallel to an activity b ∈ COcj i.e. (a||b) then.

1) If there exist two elements x, y corresponding to start and end activities of

either COci or COcj , such that COci ∪ COcj ⊆ Oi(x) ∩ Oi(y), then x, y are

And-split, And-join activities respectively. 2) The elements in COci have a parallel relation with the elements of COcj

(COci||COcj ), with the exception of activities corresponding to And-split or And-join.

Proof. (1) Given that Oi(a) contains the activities that occur whenever the a occurs and the And-split structure puts a token in two or more places enabling different threads, the only way in which an activity a has an And-split structure is that his Oi(a) contains all the activities involved in all the parallel threads. The reasoning is similar for the AND-joint case.

(2) Given that both COci and COcj define sequential substructures in which each place has one input and output, then the only way to see the presence of an activity a ∈ COcj during the sequential execution of COci, is that the activities in both sequential substructures are enabled by a previous And-split, consequently the activities in COci have a parallel behavior with the elements of COcj

The COc gives a partitioned view of the process. Next function helps to join this partitions in order to complete the model.

Definition 12. Let X ⊆ T be a set of activities and σi ∈ λ; the function φX (σi) filters the trace by applying the natural projection of σi over T \X.

The previous function makes a filtering over the log traces, by removing all the occurrences of any activity a ∈ X. The traces preserve the order of reminder activities.

Once the parallel behavior between all the COc have been detected, it is possible to use the φX function as a strategy to find missing parallel behavior or links (causal relations) between sequential substructures, by assigning to X the parallel activities of a COci w.r.t a COcj and vice versa.

In order to illustrate how the function φX operates, it is applied on the COc and previous log. Since e||a, then COc1||COc3. Giving X the parallel activities in COc1 the result is: φ{a,b}(σ1) = x, d, e, c, y , φ{a,b}(σ2) = x, d, e, y and φ{a,b}(σ3) = x, d, c, e, y , then by computing the Relations between activities for the filtered log, a new parallel behavior is discovered: c||e, therefore COc2||COc3.

Now being X the parallel activities in COc3, the result is, φ{d,e}(σ1) = x, a, b, c, a, b, y , φ{d,e}(σ2) = x, a, b, y and φ{d,e}(σ3) = x, a, b, c, a, b, y , then the new consecutive relation b →λ c is discovered.

Notice that x, y are not part of the parallel behavior because they are the start and end in COc3; furthermore, COc1 ∪ COc3 ⊆ Oi(x) ∩ Oi(y), thus x, y corresponds to an AND-split and AND-joint respectively. The application of this procedure to the new parallel relation (COc2||COc3) does not create new →λ relations. This log filtering is performed iteratively over all the classes that have a parallel relation in addition to the new ones which result of a previous filter. The computation ends when no new consecutive relations are found. 5.3

Building the Petri Net Model

Once a hidden behavior in the log is found, it is used to connect all the sequential substructures into the final model by creating new places. Table 2 shows the Relations between activities of the log λ in Example 1, after adding the found relations by filtering the log.

The rest of (a →λ b) relations after removing all the parallel behavior (||λ) determine the existence of a place between activities a and b; some of these are already considered by the places which conform a sequential Petri Net substructure, such is the case of the strong causal relations (a ⇒ b) in COc1 and (x ⇒ d), (d ⇒ e), (e ⇒ y) in COc3 for Example 1. For the rest of relations, the addition of places is quite intuitive, it is defined as follows. Definition 13. (Merging activities in →λ relation). When the left activity in two →λ relations are the same (a →λ b, a →λ c) two possible substructures can be created.

a) The places of →λ relations are merged into a single one iff (b λ c ∨ c λ b) This is denoted as [a, b + c].

b) The places of →λ relations are not merged iff (b →λ c ∨ c →λ b) This is denoted as [a, b||c].

Similarly, for →λ relations when the right activity is common, (b →λ a, c →λ a), the substructure created will be either [b||c , a] or [b + c , a] accordingly. In general, a set of →λ relations in the form (a →λ b, a →λ c, . . . , a →λ d) may produce either [a, b + c + d] or [a, b||c||d]. Consequently, the merging can be applied to composed relations, i.e. [b + c , a] and [b + c , d] leads [b + c , a + d].

For the activities in f irst(σ) or last(σ) we add two more places corresponding to the i and o places in the final workflow model i.e. a ∈ i • |∃σi, a ∈ f irst(σi) and a ∈ •o|∃σi, a ∈ last(σi)

A similar way to infer the merging of places is presented in [ 2 ]. In contrast to this result our proposal reduces considerably the space search since high percentage of the places are already found previously by the sequential sub-structures

The Fig. 5 shows the Petri Net model after connecting the sequential substructures with the new relations.

All the definitions previously described can be summarized in the following algorithm that we called the Conjoint Occurrence Miner (CoM ). 6

Implementation and tests

We implement the CoM algorithm as a plunging in the ProM Framework [ 9 ] to test our results and compare our approach with other important techniques in the literature.

The aim of these experiments is to show first, how the CoM algorithm deals with small logs, and then, to present how others process discovery algorithms work with the same incomplete logs.

Algorithm 1 Conjoint Occurrence Miner (CoM ) Experiment 1

We used the Workflow net N presented in Figure 2 to generate a complete workflow log λ according to the → relation, then we construct 4 smaller sublogs of λ by removing traces from λ , which have 0.90, 0.80, 0.75 and 0.60 of the average ratio of →-pairs compared to the model N , named λ0.90, λ0.85, λ0.75 and λ0.60 respectively. In this case, λ0.60 corresponds to λ presented in Example 1.

For this experiment, we consider the following discovery algorithms: Integer Linear Programing miner (ILP )[ 20 ], α-algorithm [ 2 ], Inductive Miner (IM ) [ 13 ], and Inductive Miner Incompleteness (IMin) [ 13 ], we use the ProM Framework [ 9 ] for perform the experiments; ILP (λ), for instance, means the application of ILP algorithm to the log λ. We want to show (1) for which of the mentioned logs the algorithm return a language equivalent model and (2) if the algorithm is able to rediscover the original model.

Results.

(1) L(N ) = L(ILP (λ0.85)), L(N ) = L(α(λ0.90)), L(N ) = L(IM (λ0.90)), L(N ) = L(IM in(λ0.60)) and L(N ) = L(CoM (λ0.60)). (2) ILP , α and IM require the full → relations for rediscover the original model N , whereas IM in and CoM rediscover N with λ0.90, λ0.60 respectively.

The one that needs less information for rediscover the model is the CoM algorithm. Figure 6 shows fragments of the resulting models applying the others algorithms to λ.60.

Experiment 2

In this experiment, thirty logs are used to tests the proposed method and other discovery techniques. Such logs have been obtained in the same way than in Experiment 1, from six diverse sound WFN involving form six to ten activities; the nets are shown in Figure 7.

Results. Table 3 shows the results of Experiment 2. Column 2: gives the percentage of logs for which the discovery technique returns a language equivalent model. Column 3: gives the percentage of logs for which the technique is able to rediscover the original model. Although the method shows good results for dealing with incompleteness, there are particular behaviors that cannot be discovered using this technique; such is the case of short loops structures as shown in N1 and N2 of Figure 8. This is because it is not easy to infer the conjoint occurrence classes COc from traces in which an activity can be observed directly after itself.

Conclusion

In this work a novel discovery technique is presented. It is based on a new relation between activities called Conjoint Occurrence Relation, which leads to a partition of the activities alphabet into classes. Such classes are used to infer causal and parallel relations missing in the incomplete log, and consequently, Petri net substructures. Furthermore, the experimental results show that the performance algorithm is acceptable with respect to other relevant discovery techniques. Although the technique is presented as a discovery one, which can deal with logs including reduced number of traces, the proposed notions and algorithms seem to be a promising approach for dealing with the problem of rediscoverability. Current research addresses study the class of models and the minimum behavior exhibited in the log for ensure this property.

1. Aalst , W. , Buijs , J. , Dongen , B. : Data-Driven Process Discovery and Analysis: First International Symposium , SIMPDA 2011, Campione d'Italia, Italy, June 29 - July 1, 2011 , Revised Selected Papers, chap. Towards Improving the Representational Bias of Process Mining , pp. 39 - 54 . Springer Berlin Heidelberg, Berlin, Heidelberg ( 2012 ), http://dx.doi.org/10.1007/978-3- 642 -34044- 4 _ 3

2. Van der Aalst , W., Weijters , T. , Maruster , L. : Workflow mining: Discovering process models from event logs. Knowledge and Data Engineering , IEEE Transactions on 16 ( 9 ), 1128 - 1142 ( 2004 )

3. van der Aalst , W.M.P. : Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer Publishing Company, Incorporated, 1st edn. ( 2011 )

4. Agrawal , R. , Gunopulos , D. , Leymann , F. : Mining Process Models from Workflow Logs . In: Schek, H.J. , Saltor , F. , Ramos , I. , Alonso , G. , Schek , H.J. , Saltor , F. , Ramos , I. , Alonso , G . (eds.) EDBT. Lecture Notes in Computer Science , vol. 1377 , pp. 469 - 483 . Springer ( 1998 ), http://dblp.uni-trier.de/rec/bibtex/ conf/edbt/AgrawalGL98

5. Angluin , D. : Queries and Concept Learning . Mach. Learn . 2 ( 4 ), 319 - 342 ( Apr 1988 ), http://dx.doi.org/10.1023/a:1022821128753

6. Buijs , J.C.A.M. , van Dongen , B.F. , van der Aalst , W.M.P. : Quality dimensions in process discovery: The importance of fitness, precision, generalization and simplicity . International Journal of Cooperative Information Systems 23 ( 01 ), 1440001 ( 2014 ), http://www.worldscientific.com/doi/abs/10.1142/ S0218843014400012

7. Cabasino , M.P. , Darondeau , P. , Fanti , M.P. , Seatzu , C. : Model identification and synthesis of discrete-event systems . Contemporary Issues in Systems Science and Engineering ( 2013 )

8. Cook , J.E. , Du , Z. , Liu , C. , Wolf , A.L. : Discovering models of behavior for concurrent workflows . Computers in industry 53(3) , 297 - 319 ( 2004 )

9. Dongen , B.F. , Medeiros , A.K.A. , Verbeek , H.M.W. , Weijters , A.J.M.M. , Aalst , W.M.P. : Applications and Theory of Petri Nets 2005 : 26th International Conference, ICATPN 2005, Miami , USA, June 20-25, 2005 . Proceedings, chap. The ProM Framework: A New Era in Process Mining Tool Support , pp. 444 - 454 . Springer Berlin Heidelberg, Berlin, Heidelberg ( 2005 ), http://dx.doi.org/10. 1007/11494744_ 25

10. Estrada-Vargas , A.P. , L´ opez-Mellado , E. , Lesage , J.J.: A comparative analysis of recent identification approaches for discrete-event systems . Mathematical Problems in Engineering 2010 ( 2010 )

11. Estrada-Vargas , A.P. , L´ opez-Mellado , E. , Lesage , J.J. : Input-output identification of controlled discrete manufacturing systems . International Journal of Systems Science 45 ( 3 ), 456 - 471 ( 2014 )

12. Gold , M.E. : Language identification in the limit . Information and Control 10 ( 5 ), 447 - 474 ( 1967 ), http://www.isrl.uiuc.edu/\~{}amag/langev/paper/ gold67limit.html

13. Leemans , S.J.J. , Fahland , D. , Aalst , W.M.P. : Application and Theory of Petri Nets and Concurrency: 34th International Conference , PETRI NETS 2013 , Milan, Italy, June 24-28, 2013 . Proceedings, chap. Discovering Block-Structured Process Models from Event Logs - A Constructive Approach , pp. 311 - 329 . Springer Berlin Heidelberg, Berlin, Heidelberg ( 2013 ), http://dx.doi.org/10. 1007/978-3- 642 -38697-8_ 17

14. Leemans , S.J.J. , Fahland , D. , Aalst , W.M.P. : Application and Theory of Petri Nets and Concurrency: 35th International Conference , PETRI NETS 2014 , Tunis, Tunisia, June 23-27, 2014 . Proceedings, chap. Discovering Block-Structured Process Models from Incomplete Event Logs , pp. 91 - 110 . Springer International Publishing, Cham ( 2014 ), http://dx.doi.org/10.1007/978-3- 319 -07734- 5 _ 6

15. Meda-Campana , M. , Ramirez-Treviro , A. , L´ opez-Mellado , E. : Asymptotic identification of discrete event systems . In: Decision and Control , 2000. Proceedings of the 39th IEEE Conference on. vol. 3 , pp. 2266 - 2271 . IEEE ( 2000 )

16. Mokhov , A. , Carmona , J.: Event log visualisation with conditional partial order graphs: from control flow to data . In: Proceedings of the International Workshop on Algorithms & Theories for the Analysis of Event Data, ATAED 2015 , Brussels, Belgium, June 22-23, 2015 . pp. 16 - 30 ( 2015 ), http://ceur-ws. org/ Vol- 1371 / paper02.pdf

17. Saives , J. , Faraut , G. , Lesage , J.J. : Identification of discrete event systems unobservable behaviour by petri nets using language projections . In: Control Conference (ECC) , 2015 European. pp. 464 - 471 ( July 2015 )

18. Tapia-Flores , T. , L´ opez-Mellado , E. , Estrada-Vargas , A.P. , Lesage , J.J.: Petri net discovery of discrete event processes by computing t-invariants . In: Emerging Technology and Factory Automation (ETFA) , 2014 IEEE. pp. 1 - 8 ( Sept 2014 )

19. Wen , L. , Aalst , W.M.P. , Wang , J. , Sun , J.: Mining process models with non-freechoice constructs . Data Mining and Knowledge Discovery 15 ( 2 ), 145 - 180 ( 2007 ), http://dx.doi.org/10.1007/s10618-007-0065-y

20. Werf , J.M.E.M. , Dongen , B.F. , Hurkens , C.A.J. , Serebrenik , A. : Applications and Theory of Petri Nets: 29th International Conference , PETRI NETS 2008 , Xi'an, China , June 23-27, 2008 . Proceedings, chap. Process Discovery Using Integer Linear Programming , pp. 368 - 387 . Springer Berlin Heidelberg, Berlin, Heidelberg ( 2008 ), http://dx.doi.org/10.1007/978-3- 540 -68746-7_ 24