=Paper= {{Paper |id=Vol-1592/paper03 |storemode=property |title=Discovering Process Models from Incomplete Event Logs using Conjoint Occurrence Classes |pdfUrl=https://ceur-ws.org/Vol-1592/paper03.pdf |volume=Vol-1592 |authors=Tonatiuh Tapia-Flores,Edelma Rodríguez-Pérez,Ernesto López-Mellado |dblpUrl=https://dblp.org/rec/conf/apn/Tapia-FloresRL16 }} ==Discovering Process Models from Incomplete Event Logs using Conjoint Occurrence Classes== https://ceur-ws.org/Vol-1592/paper03.pdf
    Discovering Process Models from Incomplete
    Event Logs using Conjoint Occurrence Classes

 Tonatiuh Tapia-Flores, Edelma Rodrı́guez-Pérez and Ernesto López-Mellado

                        CINVESTAV Unidad Guadalajara
            Av. del Bosque 1145, Col. El Bajio. 45015 Zapopan Mexico
                 {ttapia,erodrigue,elopez}@gdl.cinvestav.mx



      Abstract. This paper addresses the problem of discovering a sound Work-
      ﬂow net (WFN) from sample event traces representing the behavior of a
      process. A current challenging problem deals with discovering a suitable
      model from incomplete logs since it is not possible to ensure that a log
      contains all executions of a process. A method that allows building WFN
      from logs including a reduced number of traces is presented. It is based
      on the concept of invariant occurrence between activities, which yields
      sets of activities named conjoint occurrence classes, which are used to
      infer the causal and concurrent behavior not exhibited in the log. Pro-
      cedures derived from the method have been implemented in the ProM
      framework. Experimental results show that the algorithm can discover
      models form logs including less traces compared to other standing discov-
      ery algorithms.

      Keywords: Process discovery, Petri nets, Incomplete logs


1    Introduction

The automated synthesis of formal models from the observation of a process
execution in the form of event sequences is called process discovery [3]. Event
data is provided commonly by resource planning and workﬂow management
systems in organizations as logs of event traces, which represent the behavior of
the process to be discovered. The aim is to obtain suitable models that describe
clearly the causal and parallel relationships between the events in traces.
    One signiﬁcant problem in process discovery techniques is the fact that the
log could not contain enough information to discover a process model, i.e. the
log might be incomplete. This might cause that the obtained model excludes
actual behavior or represents some behavior that is not in the real process.
Many things can cause the incompleteness of the log; one of the most common is
the presence of parallelism, due to the high number of possible parallel activities
whose interleaving cannot be sampled in the traces.
    The obtained model through a discovery technique must be easy to under-
stand by representing clearly, the causal and parallel relationships between the
activities. Furthermore, such a model must represent all the observed behavior




                                      31
(provided as input data) and the lowest amount as possible of non-observed be-
havior. Several metrics called quality dimensions [6] assess the features above
mentioned for the discovered model with respect to the event log.
    A procedure used to test whether or not a technique is able to discover models
that have the same language as the real-life process by which a log was produced
is called rediscoverability, which compares the obtained model directly with the
real-life process. This procedure is summarized in Figure 1.




                         Fig. 1. Model quality evaluation


    The method proposed in this paper pursues to obtain a suitable Petri net
model that reveals some behavior hidden in the log, namely some causal and/or
concurrent relationships. A method that allows building sound WFN from logs
including a reduced number of traces is presented. The discovery technique is
based on the concept of invariant occurrence between activities, which leads to
obtain sets of events named conjoint occurrence classes; such classes help to infer
the causal and concurrent behavior not exhibited in the log.
    The paper is organized as follows. In Section 2 the related work is over-
viewed, In Section 3, the basic notions of PN and WFN are recalled. Section
4 states the ﬁrst notions of invariance of occurrence and deﬁnes the occurrence
classes. In Section 5 the synthesis of PN model from the conjoint occurrence
classes is presented. Section 6 demonstrates the performance of the proposal
through experiments and the results are compared with other approaches. Sec-
tion 7 concludes the paper.


2   Related Work
Pioneer works on the matter, named language learning techniques, have been
proposed in the ﬁeld of computer sciences. The aim was to build ﬁne representa-
tions (ﬁnite automata, grammars) of languages from samples of accepted words
[12, 5]. In the ﬁeld of workﬂow management systems, several process discovery
techniques have been proposed. In an early proposal [4], a ﬁnite automaton,
called conformal graph is obtained from sample traces execution. Later in [2]
the alpha algorithm uses the log based ordering between activities to infer the
a model. Numerous techniques present extensions To this algorithm, which are
able to discover more complex behavior such as implicit dependencies [19]. In
[8] a probabilistic approach to ﬁnd the concurrent and direct relations between




                                     32
activities is presented. In [16] a formalism to represent the partial order between
activities, which improve the visualizations of event logs with additional data is
introduced. The discovery problem is addressed in an Integer linear programing
approach in [20]. The inductive miner (IM) is presented in [13], which applies a
divide-and-conquer approach that iteratively partitions the activities in process
constructs as joint and splits. An adaptation of this algorithm capable to deal
with incomplete logs called (IMin) [14] uses probabilistic behavioral relations
to deal with incompleteness. Recently, a new formal model to represent process
called process tree has been introduced in [1]. Many methods have adopted this
representation; such is the case of evolutionary tree miner (ETM) [6], that uses
a genetic algorithm to discover a process tree; the innovation in this technique
is that it allows the user to select the preferences with respect to the quality
dimensions of the ﬁnal model.
    In the ﬁeld of Discrete Event Systems (DES), where the problem is named
identiﬁcation, several approaches have been proposed for building models repre-
senting the observed behavior of automated processes. The incremental approach
proposed in [15] , allows building safe interpreted PN models from a continuous
stream of system’s outputs. Later, a technique based on project the activities
sequence on sub-alphabets to discover speciﬁc patterns is present in [17]. In [11]
input-output identiﬁcation of automated manufacturing process is addressed;
an interpreted PN is obtained from a set of sequences of input-output vectors
collected from the controller during the system cyclic operation. Recently an
approach based on the computing t-invariants that allow ﬁnding implicit depen-
dencies in the PN model is presented in [18]. The literature on this research line
is vast; wide reviews on DES identiﬁcation can be found in [10, 7].
    In both ﬁelds, the aim in general is to discover processes models from observed
behavior. However, the nature of processes leads to problem statements somehow
diﬀerent in terms of input data.


3    Background

This section presents the basic concepts and notations used in this paper.

Deﬁnition 1. An ordinary Petri Net structure G is a bipartite digraph repre-
sented by the 4-tuple G = (P, T, I, O) where: P = {p1 , p2 , ..., p|P | } and T =
{t1 , t2 , ..., t|T | } are ﬁnite sets of vertices named places and transitions respec-
tively; I : P × T → {0, 1} (O : T × P → {0, 1}) is a function representing the
arcs going from places to transitions (from transitions to places).
     A marking function M : P → N represents the number of tokens residing
inside each place; it is usually expressed as an |P |-entry vector.
     A Petri Net system or Petri Net (P N ) is the pair N = (G, M0 ), where G is
a P N structure and M0 is an initial marking. In a PN system, a transition tj
is enabled at marking Mk if ∀pi ∈ P, Mk (pi ) ≥ I(pi , tj ); an enabled transition tj
can be ﬁred reaching a new marking Mk+1 . The reachability set of a PN is the
set of all possible reachable markings from M0 ﬁring only enabled transitions;




                                       33
this set is denoted by R(G, M0 ). For any tj ∈ T , •tj = {pi |I(pi , tj ) = 1}, and
tj • = {pi |O(tj , pi ) = 1}, similarly for any pi ∈ P , •pi = {tj |O(tj , pi ) = 1}, and
pi • = {tj |I(pi , tj ) = 1}.
   A PN system is 1-bounded or safe iﬀ, for any Mi ∈ R(G, M0 ) and any
p ∈ P, Mi (p) ≤ 1. A PN system is live iﬀ, for every reachable marking M i ∈
R(G, M0 ) and t ∈ T there is a reachable marking Mk ∈ R(G, M0 ) such that t is
enabled in Mk .
Deﬁnition 2. A WorkFlow net (WFN) N is a subclass of PN owning the fol-
lowing properties [3]: (1) it has two special places: i and o. Place i is a source
place: •i = ∅, and place o is a sink place: o• = ∅. (2) If a transition te is added
to PN connecting place o to i, then the resulting PN (called extended WFN) is
strongly connected.
    A consecutive activities sequence: x, d, e, a, b, y of a process executions that
brings the system from initial state into the ﬁnal state, corresponds to a trace;
a workﬂow log is a multiset of traces. The set of traces that can be produced by
a model N , is the language of N , is denoted by L(N ). An example workﬂow net
is shown in Fig.2




                           Fig. 2. A sound Workﬂow net N .



Deﬁnition 3. A WFN N is said to be sound iﬀ (N, M0 ) is safe and for any
marking Mi ∈ R(N, M0 ), o ∈ Mi → Mi = [o], N not contains dead transitions.
Deﬁnition 4. (Relations between activities). Let λ be a log over T . and let
a, b ∈ T :
    - a →λ b, iﬀ there is a trace σ = t1 , t2 , ...tn >∈ λ and 1 ≤ i < n − 1, such
that ti = a, ti+1 = b,
    - a||λ b iif a →λ b and b →λ a.
    - f irst(σ) = t1 , last(σ) = tn , if n ≥ 1
   The ordering relations has been used in [2, 4] as an abstraction of the behavior
described by a log or model. The previous relations for a workﬂow model N , are
deﬁned similarly i.e. if . . . , a, b, . . . ∈ L(N ), then a →N b.




                                        34
Deﬁnition 5. Completeness [2], Let N be a sound workﬂow net and λ a work-
ﬂow log of N . λ is a complete workﬂow log of N iﬀ (1) for any workﬂow log λ
of N : →λ ⊆→λ , (2) each activity of N is present in λ.


4     Invariance of occurrence

This section introduces the notion of invariance of occurrence and presents a
technique for deriving a partial order relationship between activities.


4.1    Trace handling

A trace is a sequence of ordered activities, such that the immediately preceding
activity of a referent activity in the sequence, is associated with a position that is
less by exactly one from the position of the reference activity. i.e. < a, b, c > is a
trace where the activity a is associated with position 1 the activity b is associated
with the positions 2 and the activity c is associated with the position 3, we denote
xi to refer the ith activity in the trace. First, useful operators for expressing the
relative positions of activities in a trace and activity precedence/subsequence
relationships are introduced.

Deﬁnition 6. (Trace handling operators). Let λ be a workﬂow log over T ,
let a ∈ T ; if there is a trace σk = x1 , x2 , ..., xn such that σk ∈ λ and a ∈ σk ,

• τ (xi , σk ) provides the activity name of element xi in σk
• P os(a, σk ) = {p, q, r . . . , s} such that τ (xp , σk ) = a, ..., τ (xs , σk ) = a
• P red(xi , σk ) = τ (xi−1 , σk ) , i ∈ {2, 3, . . . , n}
• Succ(xi , σk ) = τ (xi+1 , σk ) , i ∈ {1, 2, . . . , n − 1}
                       ⎧
                       ⎪
                       ⎪ ∅           , if 1 ∈ P os(a, σk )
                       ⎪
                       ⎨p P red(x , σ ) ∩ · · · ∩ r
                                            i  k              i=q+2 P red(xi , σk )
• P redset(a, σk ) =        i=2
                                  s
                       ⎪
                       ⎪       ∩ i=r+2 P red(xi , σk ),
                       ⎪
                       ⎩
                                    if P os(a, σk ) = {p, . . . , q, r, s}, p <, . . . , < r < s
                       ⎧
                       ⎪
                       ⎪ ∅          , if n ∈ P os(a, σk )
                       ⎪
                       ⎨q−2 Succ(x , σ ) ∩ r−2 Succ(x , σ ) ∩ . . .
                                           i  k                       i   k
• Succset(a, σk ) =         i=p
                                  n−1                   i=q
                       ⎪
                       ⎪      ∩          Succ(x    , σ   ),
                       ⎪
                       ⎩            i=s          i     k
                                   if P os(a, σk ) = {p, q, r, . . . , s}, p < q <, . . . , < s

   The function P os(a, σk ) provides a set of indices in which the symbol a
appears in the trace σk , while the functions P red and Succ provide, respectively,
the predecessor and successor activity of xi in the trace. The P redset is build
from the activities sets appearing between the occurrence of the activity symbol,
together with the beginning of the trace and the ﬁrst occurrence of the activity
symbol; the operator yields common symbols among these activities sets. The
Succset is deﬁned similarly.




                                              35
Example 1. Consider a log λ with three traces σ1 = x, a, b, d, e, c, a, b, y , σ2 =
x, d, e, a, b, y , σ3 = x, a, b, d, c, a, e, b, y , which has been produced using the
workﬂow net N in Fig. 2. Note that is not a complete with respect to N , since
b →N c, e →N y hold in N but not in λ. Furthermore, the only knowledge about
the parallel behavior generated between the cycle (a, b, c) and the sequence (d, e)
is a||λ e; all the other parallel relations are not in λ. The relations found in λ
are the following: x →λ (a, d), a →λ (b, e), b →λ (d, y), c →λ a, d →λ (c, e),
e →λ (b, a).
    If we apply the P redset and Succset functions over the activity symbol a and
the traces in λ we get: P redset(a, σ1 ) = {x} ∩ {b, d, e, c} = ∅, P redset(a, σ2 ) =
{x, d, e}, P redset(a, σ3 ) = {x} ∩ {b, d, c} = ∅; Succset(a, σ1 ) = {b, d, e, c} ∩
{b, y} = {b}, P redset(a, σ2 ) = {b, y}, P redset(a, σ3 ) = {b, d, c} ∩ {e, b, y} = {b}.


4.2   Conjoint Occurrence Relation
Now, we are going to introduce several notions useful to determine the sets of
activities that occur together in all traces of λ.
Deﬁnition 7. The Invariable precedence set of the activity symbol a in λ is
                   that always precede a in every trace σk it appears, i.e.
the set of activities
InvP red(a, λ) =  σk P redset(a, σk ). Similarly the Invariable subsequent set is
InvSucc(a, λ) = σk Succset(a, σk ).
Deﬁnition 8. The Occurrence Invariable Set of the activity symbol a in λ is the
set of activities that always occur when a occurs, that is, Oi(a) = InvP red(a, λ)∪
InvSucc(a, λ) ∪ {a}.
   Table 1 shows the Invariable precedence and subsequent sets along with their
respective Occurrence Invariable sets of each activity symbol in T , corresponding
to workﬂow log in Example. 1.

               Table 1. Occurrence Invariable set of activities a and d

                Activity    InvPred         InvSucc              Oi
                   x            ∅        {a, b, d, e, y} {x, a, b, d, e, y}
                   a            ∅              {b}            {a, b}
                   b          {a}               ∅             {a, b}
                   c      {x, a ,b ,d}     {a, b, y} {x, a, b, c, d, y}
                   d           {x}       {a, b, d, e, y} {x, a, b, d, e, y}
                   e         {x, d}          {b, y}       {x, b, d, e, y}
                   y     {x, a, b, d, e}        ∅        {x, a, b, d, e, y}



Deﬁnition 9. Let R be a binary relation over T deﬁned as R = {(a, b)|b ∈
Oi(a), ∀a, b ∈ T }. Let R be the coarsest equivalence relation such that R ⊆ R.
R is called the Conjoint Occurrence Relation (COr ) and the equivalence classes
are called Conjoint Occurrence Classes (COci ).




                                         36
    The Conjoint Occurrence Classes of the log above are COc1 = {a, b}, COc2 =
{c}, COc3 = {x, d, e, y}; the Fig. 3 shows the matrix representation of the rela-
tion R in which the doted rectangles represent the COci .




Fig. 3. Matrix representation of relation R, 1 for related elements, noting otherwise.



   Conjoint Occurrence Classes have interesting properties, namely Invariance
and Order.

Proposition 1. (Invariance). Let λ be a workﬂow log, COck a Conjoint Oc-
currence Class induced by R , and a ∈ COck ; when σi ∈ λ is executed, such that
a ∈ σi all activities in COck are also executed in σi .

Proof. By deﬁnition, Oi(a) provides all the activities that invariably occurs when
a occurs. Consider ∀a, b, c ∈ COck , a ∈ Oi(a), if a ∈ Oi(b) then b ∈ Oi(a) and if
a ∈ Oi(b) ∧ b ∈ Oi(c) then a ∈ Oi(c), there is no activity unrelated among the
rest of activities; then the set is invariable in all the traces of λ.           
                                                                                 

Proposition 2. (Order ). Let λ be a workﬂow log, COck a Conjoint Occurrence
Class induced by R . All the activities in COck occur in the same order in all
the σi ∈ λ.

Proof. Assume that there are two activities a, b ∈ COck , in a parallel relation
(a||b); then when the operators Predset and Succset are applied to one activity,
the other one will not appear in its Oi(), consequently they are not in the same
COck . Similarly, suppose that there are two traces in which a and b appear
in a diﬀerent order; then the same reasoning is applied to conclude that such
activities cannot belong to the same COck .                                    
                                                                               

    If we take activities a, b ∈ T of Example 1, we can observe that a ∈ COc1
and the activity a is present in σ1 , σ2 , σ3 ; consequently, b is also in σ1 , σ2 , σ3 ,
as shown in the traces. σ1 = x, a, b, d, e, c, a, b, y , σ2 = x, d, e, a, b, y , σ3 =
x, a, b, d, c, a, e, b, y ; furthermore, they occur in the same order.




                                        37
5     From Conjoint Occurrence Classes to Petri Net
      structures

Besides the Invariance and Order information provided by the Conjoint Occur-
rence classes, they provide much more information about the ﬁnal model. Given
the way in which this classes are computed and the partition of the alphabet T ;
it is possible to obtain hidden relations between the elements of a COci ; below
we present some propositions that help to ﬁnd this hidden relations.


5.1   Finding sequential substructures

Proposition 3. The members of a COck form a sequential Petri Net sub-
structure (transition-place-transition) according to their Order, in which each
couple of consecutive activities are causally related.

Proof. (by contraposition). Suppose that the elements in COck do not have a
PN sequential substructure according to their Order. Let a, b be any activities in
COck such that a < b (a before b); given that there is not a PN sequential sub-
structure, at least one of the places pi that connect a to b is missing causing that
b could appear before a in some execution; therefore by proof of Proposition 2,
these sets of elements cannot be in the same COck .                               
                                                                                  

Deﬁnition 10. Each sequential Petri Net sub-structure can be disassembled in
three parts: the start and the end activities according to the order and rest of
the sequential substructure, which is called the body.

    Fig. 4 shows the sequential PN sub-structure of each COc in λ, the dotted
rectangle corresponds to the class body.




         Fig. 4. The sequential substructure of the COc of the previous log.



Deﬁnition 11. Two activities a, b belonging to a COci have a strong causality
denoted (a ⇒ b), iﬀ they are consecutive in the order of the COci




                                      38
Proposition 4. For all places pi in a sequential Petri Net sub-structure induced
by a COc, •pi = 1 and pi • = 1; additionally their input and output activities
must belong to the COc.

Proof. Following the same reasoning as in the proof of Proposition 3, given
that place pi exists but there is an activity k ∈ pi • such that k ∈   / COci , it
could provoke an execution in which the activity a appears without the future
appearance of b; therefore a, b are not in COc. In case that k ∈ •pi , this allows
an execution in which the activity a appears without the previous appearance
of b.                                                                            
                                                                                 


5.2   Parallel and Causal Behavior Deducted from COc

The incompleteness of the log is mostly caused by the high level of parallel be-
havior of activities in the real process, this is a common issue in several discovery
techniques based on Log ordering relations. The COc classes provide a way to
deal with this problem. Since the members of a COc are causally related, given
a property of one of its elements, this may be transmitted to the other members
in the class, i.e. it is possible to deduce parallel behavior between a couple of
activities, even if they are not directly consecutive in both directions.
    Most of the workﬂow systems oﬀers standard building blocks such as OR-
split, OR-joint, AND-split, AND-joint [3], OR-splits/OR-joins correspond to
places with multiple outgoing/ingoing arcs in a Petri Net, while AND-Split/AND-
join corresponds to a transition with two or more output/input places, the latter
blocks capture the parallel behavior of the process. Next proposition states how
the COc and the Oi() can be used to infer these particular transitions.

Proposition 5. Let COci , COcj be two diﬀerent conjoint occurrence classes,
such that at least an activity a ∈ COci is parallel to an activity b ∈ COcj i.e.
(a||b) then.

1) If there exist two elements x, y corresponding to start and end activities of
   either COci or COcj , such that COci ∪ COcj ⊆ Oi(x) ∩ Oi(y), then x, y are
   And-split, And-join activities respectively.
2) The elements in COci have a parallel relation with the elements of COcj
   (COci ||COcj ), with the exception of activities corresponding to And-split or
   And-join.

Proof. (1) Given that Oi(a) contains the activities that occur whenever the a
occurs and the And-split structure puts a token in two or more places enabling
diﬀerent threads, the only way in which an activity a has an And-split structure
is that his Oi(a) contains all the activities involved in all the parallel threads.
The reasoning is similar for the AND-joint case.                                 
                                                                                 
    (2) Given that both COci and COcj deﬁne sequential substructures in which
each place has one input and output, then the only way to see the presence
of an activity a ∈ COcj during the sequential execution of COci , is that the




                                      39
activities in both sequential substructures are enabled by a previous And-split,
consequently the activities in COci have a parallel behavior with the elements
of COcj                                                                       
                                                                              

    The COc gives a partitioned view of the process. Next function helps to join
this partitions in order to complete the model.

Deﬁnition 12. Let X ⊆ T be a set of activities and σi ∈ λ; the function φX (σi )
ﬁlters the trace by applying the natural projection of σi over T \X.

    The previous function makes a ﬁltering over the log traces, by removing all
the occurrences of any activity a ∈ X. The traces preserve the order of reminder
activities.
    Once the parallel behavior between all the COc have been detected, it is
possible to use the φX function as a strategy to ﬁnd missing parallel behavior
or links (causal relations) between sequential substructures, by assigning to X
the parallel activities of a COci w.r.t a COcj and vice versa.
    In order to illustrate how the function φX operates, it is applied on the COc
and previous log. Since e||a, then COc1 ||COc3 . Giving X the parallel activities
in COc1 the result is: φ{a,b} (σ1 ) = x, d, e, c, y , φ{a,b} (σ2 ) = x, d, e, y and
φ{a,b} (σ3 ) = x, d, c, e, y , then by computing the Relations between activities for
the ﬁltered log, a new parallel behavior is discovered: c||e, therefore COc2 ||COc3 .
    Now being X the parallel activities in COc3 , the result is, φ{d,e} (σ1 ) =
x, a, b, c, a, b, y , φ{d,e} (σ2 ) = x, a, b, y and φ{d,e} (σ3 ) = x, a, b, c, a, b, y , then
the new consecutive relation b →λ c is discovered.
    Notice that x, y are not part of the parallel behavior because they are the
start and end in COc3 ; furthermore, COc1 ∪ COc3 ⊆ Oi(x) ∩ Oi(y), thus x, y
corresponds to an AND-split and AND-joint respectively. The application of this
procedure to the new parallel relation (COc2 ||COc3 ) does not create new →λ
relations. This log ﬁltering is performed iteratively over all the classes that have
a parallel relation in addition to the new ones which result of a previous ﬁlter.
The computation ends when no new consecutive relations are found.


5.3    Building the Petri Net Model

Once a hidden behavior in the log is found, it is used to connect all the sequential
substructures into the ﬁnal model by creating new places. Table 2 shows the
Relations between activities of the log λ in Example 1, after adding the found
relations by ﬁltering the log.
    The rest of (a →λ b) relations after removing all the parallel behavior (||λ )
determine the existence of a place between activities a and b; some of these
are already considered by the places which conform a sequential Petri Net sub-
structure, such is the case of the strong causal relations (a ⇒ b) in COc1 and
(x ⇒ d), (d ⇒ e), (e ⇒ y) in COc3 for Example 1. For the rest of relations, the
addition of places is quite intuitive, it is deﬁned as follows.




                                           40
               Table 2. Relations between activities after log ﬁltering

                             T   x   a b c d e y
                             x       →        ⇒
                             a          ⇒     || ||
                             b             → || || →
                             c       →        || ||
                             d       || || ||    ⇒
                             e       || || ||       ⇒
                             y



Deﬁnition 13. (Merging activities in →λ relation). When the left activity in
two →λ relations are the same (a →λ b, a →λ c) two possible substructures can
be created.
    a) The places of →λ relations are merged into a single one iﬀ (b λ c ∨
c λ b) This is denoted as [a, b + c].
    b) The places of →λ relations are not merged iﬀ (b →λ c ∨ c →λ b) This is
denoted as [a, b||c].
    Similarly, for →λ relations when the right activity is common, (b →λ a, c →λ
a), the substructure created will be either [b||c , a] or [b + c , a] accordingly. In
general, a set of →λ relations in the form (a →λ b, a →λ c, . . . , a →λ d) may
produce either [a, b + c + d] or [a, b||c||d]. Consequently, the merging can be
applied to composed relations, i.e. [b + c , a] and [b + c , d] leads [b + c , a + d].

    For the activities in f irst(σ) or last(σ) we add two more places corresponding
to the i and o places in the ﬁnal workﬂow model i.e. a ∈ i • |∃σi , a ∈ f irst(σi )
and a ∈ •o|∃σi , a ∈ last(σi )
    A similar way to infer the merging of places is presented in [2]. In contrast to
this result our proposal reduces considerably the space search since high percent-
age of the places are already found previously by the sequential sub-structures
    The Fig. 5 shows the Petri Net model after connecting the sequential sub-
structures with the new relations.
    All the deﬁnitions previously described can be summarized in the following
algorithm that we called the Conjoint Occurrence Miner (CoM ).


6    Implementation and tests

We implement the CoM algorithm as a plunging in the ProM Framework [9] to
test our results and compare our approach with other important techniques in
the literature.
    The aim of these experiments is to show ﬁrst, how the CoM algorithm deals
with small logs, and then, to present how others process discovery algorithms
work with the same incomplete logs.




                                       41
Algorithm 1 Conjoint Occurrence Miner (CoM )
Input: Log λ.
Output: sound Workﬂow net N .
 1: Compute →λ , ||λ relations.
 2: for ∀a ∈ T do
 3:   Compute InvP red(a, λ) and InvSucc(a, λ)
 4:   Compute Oi(a)
 5: end for
 6: From Oi compute the COc classes COc
 7: Build the P N sequential substructures
 8: Find the missing → relations
 9: Build the ﬁnal P N as stated in (5.3)




                     Fig. 5. Petri Net model discovered from λ



6.1   Experiments

Experiment 1
We used the Workﬂow net N presented in Figure 2 to generate a complete
workﬂow log λ according to the → relation, then we construct 4 smaller sub-
logs of λ by removing traces from λ , which have 0.90, 0.80, 0.75 and 0.60 of the
average ratio of →-pairs compared to the model N , named λ0.90 , λ0.85 , λ0.75 and
λ0.60 respectively. In this case, λ0.60 corresponds to λ presented in Example 1.
     For this experiment, we consider the following discovery algorithms: Integer
Linear Programing miner (ILP )[20], α-algorithm [2], Inductive Miner (IM ) [13],
and Inductive Miner Incompleteness (IMin) [13], we use the ProM Framework
[9] for perform the experiments; ILP (λ), for instance, means the application of
ILP algorithm to the log λ. We want to show (1) for which of the mentioned
logs the algorithm return a language equivalent model and (2) if the algorithm
is able to rediscover the original model.

Results.
   (1) L(N ) = L(ILP (λ0.85 )), L(N ) = L(α(λ0.90 )), L(N ) = L(IM (λ0.90 )),
L(N ) = L(IM in(λ0.60 )) and L(N ) = L(CoM (λ0.60 )).




                                      42
   (2) ILP , α and IM require the full → relations for rediscover the original
model N , whereas IM in and CoM rediscover N with λ0.90 , λ0.60 respectively.
   The one that needs less information for rediscover the model is the CoM
algorithm. Figure 6 shows fragments of the resulting models applying the others
algorithms to λ.60 .




                       Fig. 6. Discovered models from λ.60


    Experiment 2
In this experiment, thirty logs are used to tests the proposed method and other
discovery techniques. Such logs have been obtained in the same way than in
Experiment 1, from six diverse sound WFN involving form six to ten activities;
the nets are shown in Figure 7.
    Results. Table 3 shows the results of Experiment 2. Column 2: gives the per-
centage of logs for which the discovery technique returns a language equivalent
model. Column 3: gives the percentage of logs for which the technique is able to
rediscover the original model.


                      Table 3. Results of the Experiment 2

                            Language
                       miner           Rediscoverability
                            equivalent
                        ILP    36%           23%
                         α     23%           16%
                        IM     56%           33%
                       IMin   100%           53%
                       CoM     90%           83%




                                     43
                     Fig. 7. Workﬂow nets of Experiment 2.



6.2   Limitations of the method



Although the method shows good results for dealing with incompleteness, there
are particular behaviors that cannot be discovered using this technique; such is
the case of short loops structures as shown in N1 and N2 of Figure 8. This is
because it is not easy to infer the conjoint occurrence classes COc from traces
in which an activity can be observed directly after itself.




                            Fig. 8. Two sound WFN




                                    44
7   Conclusion
In this work a novel discovery technique is presented. It is based on a new
relation between activities called Conjoint Occurrence Relation, which leads to
a partition of the activities alphabet into classes. Such classes are used to infer
causal and parallel relations missing in the incomplete log, and consequently,
Petri net substructures. Furthermore, the experimental results show that the
performance algorithm is acceptable with respect to other relevant discovery
techniques. Although the technique is presented as a discovery one, which can
deal with logs including reduced number of traces, the proposed notions and
algorithms seem to be a promising approach for dealing with the problem of
rediscoverability. Current research addresses study the class of models and the
minimum behavior exhibited in the log for ensure this property.

References
 1. Aalst, W., Buijs, J., Dongen, B.: Data-Driven Process Discovery and Analysis: First
    International Symposium, SIMPDA 2011, Campione d’Italia, Italy, June 29 – July
    1, 2011, Revised Selected Papers, chap. Towards Improving the Representational
    Bias of Process Mining, pp. 39–54. Springer Berlin Heidelberg, Berlin, Heidelberg
    (2012), http://dx.doi.org/10.1007/978-3-642-34044-4_3
 2. Van der Aalst, W., Weijters, T., Maruster, L.: Workﬂow mining: Discovering pro-
    cess models from event logs. Knowledge and Data Engineering, IEEE Transactions
    on 16(9), 1128–1142 (2004)
 3. van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement
    of Business Processes. Springer Publishing Company, Incorporated, 1st edn. (2011)
 4. Agrawal, R., Gunopulos, D., Leymann, F.: Mining Process Models from Work-
    ﬂow Logs. In: Schek, H.J., Saltor, F., Ramos, I., Alonso, G., Schek, H.J., Saltor,
    F., Ramos, I., Alonso, G. (eds.) EDBT. Lecture Notes in Computer Science,
    vol. 1377, pp. 469–483. Springer (1998), http://dblp.uni-trier.de/rec/bibtex/
    conf/edbt/AgrawalGL98
 5. Angluin, D.: Queries and Concept Learning. Mach. Learn. 2(4), 319–342 (Apr
    1988), http://dx.doi.org/10.1023/a:1022821128753
 6. Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: Quality dimen-
    sions in process discovery: The importance of ﬁtness, precision, generaliza-
    tion and simplicity. International Journal of Cooperative Information Systems
    23(01), 1440001 (2014), http://www.worldscientific.com/doi/abs/10.1142/
    S0218843014400012
 7. Cabasino, M.P., Darondeau, P., Fanti, M.P., Seatzu, C.: Model identiﬁcation and
    synthesis of discrete-event systems. Contemporary Issues in Systems Science and
    Engineering (2013)
 8. Cook, J.E., Du, Z., Liu, C., Wolf, A.L.: Discovering models of behavior for concur-
    rent workﬂows. Computers in industry 53(3), 297–319 (2004)
 9. Dongen, B.F., Medeiros, A.K.A., Verbeek, H.M.W., Weijters, A.J.M.M., Aalst,
    W.M.P.: Applications and Theory of Petri Nets 2005: 26th International Con-
    ference, ICATPN 2005, Miami, USA, June 20-25, 2005. Proceedings, chap. The
    ProM Framework: A New Era in Process Mining Tool Support, pp. 444–454.
    Springer Berlin Heidelberg, Berlin, Heidelberg (2005), http://dx.doi.org/10.
    1007/11494744_25




                                       45
10. Estrada-Vargas, A.P., López-Mellado, E., Lesage, J.J.: A comparative analysis of
    recent identiﬁcation approaches for discrete-event systems. Mathematical Problems
    in Engineering 2010 (2010)
11. Estrada-Vargas, A.P., López-Mellado, E., Lesage, J.J.: Input–output identiﬁcation
    of controlled discrete manufacturing systems. International Journal of Systems
    Science 45(3), 456–471 (2014)
12. Gold, M.E.: Language identiﬁcation in the limit. Information and Con-
    trol 10(5), 447–474 (1967), http://www.isrl.uiuc.edu/\~{}amag/langev/paper/
    gold67limit.html
13. Leemans, S.J.J., Fahland, D., Aalst, W.M.P.: Application and Theory of Petri
    Nets and Concurrency: 34th International Conference, PETRI NETS 2013, Mi-
    lan, Italy, June 24-28, 2013. Proceedings, chap. Discovering Block-Structured
    Process Models from Event Logs - A Constructive Approach, pp. 311–329.
    Springer Berlin Heidelberg, Berlin, Heidelberg (2013), http://dx.doi.org/10.
    1007/978-3-642-38697-8_17
14. Leemans, S.J.J., Fahland, D., Aalst, W.M.P.: Application and Theory of Petri
    Nets and Concurrency: 35th International Conference, PETRI NETS 2014, Tu-
    nis, Tunisia, June 23-27, 2014. Proceedings, chap. Discovering Block-Structured
    Process Models from Incomplete Event Logs, pp. 91–110. Springer International
    Publishing, Cham (2014), http://dx.doi.org/10.1007/978-3-319-07734-5_6
15. Meda-Campana, M., Ramirez-Treviro, A., López-Mellado, E.: Asymptotic identi-
    ﬁcation of discrete event systems. In: Decision and Control, 2000. Proceedings of
    the 39th IEEE Conference on. vol. 3, pp. 2266–2271. IEEE (2000)
16. Mokhov, A., Carmona, J.: Event log visualisation with conditional partial order
    graphs: from control ﬂow to data. In: Proceedings of the International Workshop
    on Algorithms & Theories for the Analysis of Event Data, ATAED 2015, Brussels,
    Belgium, June 22-23, 2015. pp. 16–30 (2015), http://ceur-ws.org/Vol-1371/
    paper02.pdf
17. Saives, J., Faraut, G., Lesage, J.J.: Identiﬁcation of discrete event systems unob-
    servable behaviour by petri nets using language projections. In: Control Conference
    (ECC), 2015 European. pp. 464–471 (July 2015)
18. Tapia-Flores, T., López-Mellado, E., Estrada-Vargas, A.P., Lesage, J.J.: Petri net
    discovery of discrete event processes by computing t-invariants. In: Emerging Tech-
    nology and Factory Automation (ETFA), 2014 IEEE. pp. 1–8 (Sept 2014)
19. Wen, L., Aalst, W.M.P., Wang, J., Sun, J.: Mining process models with non-free-
    choice constructs. Data Mining and Knowledge Discovery 15(2), 145–180 (2007),
    http://dx.doi.org/10.1007/s10618-007-0065-y
20. Werf, J.M.E.M., Dongen, B.F., Hurkens, C.A.J., Serebrenik, A.: Applications and
    Theory of Petri Nets: 29th International Conference, PETRI NETS 2008, Xi’an,
    China, June 23-27, 2008. Proceedings, chap. Process Discovery Using Integer Linear
    Programming, pp. 368–387. Springer Berlin Heidelberg, Berlin, Heidelberg (2008),
    http://dx.doi.org/10.1007/978-3-540-68746-7_24




                                       46