An Optimal Process Model for a Real Time
                     Process

 Likewin Thomas∗ , Manoj Kumar M V∗ , Annappa B∗ , and Vishwanath K P†
                    ∗
                      Department of Computer Science and Engineering
              †
                  Department of Mathematical and Computational Science
                  National Institute of Technology Karnataka, Surathkal,
                                    Mangalore - 575025
                                           India
                         {likewinthomas, manojmv}@nitk.ac.in
                                     annappa@ieee.org
                              shastryvishwanath@gmail.com
                               http://www.cse.nitk.ac.in


        Abstract. Recommending an optimal path of execution and a com-
        plete process model for a real time partial trace of large and complex
                                                                 α
        organization is a challenge. The proposed AlfyMiner ( y M iner) does
        this recommendation in cross organization process mining technique by
        comparing the variants of same process encountered in different organiza-
            α
        tion. y M iner proposes two novel techniques Process Model Compara-
            α                                                            α
        tor ( y Comp ) and Resource Behaviour Analyser (RBAM iner ). y Comp
        identifies Next Probable Activity of the partial trace along with the
        complete process model of the partial trace. RBAM iner identifies the
        resources preferable for performing Next Probable Activity and analyse
                                                                  α
        their behaviour based on performance, load and queue. y M iner does
        this analysis and recommend the best suitable resource for performing
        Next Probable Activity and process models for the real time partial trace.
        Experiments were conducted on process logs of CoSeLoG Project1 and
        72% of accuracy is obtained in identifying and recommending NPA and
        the performance of resources were optimized by 59 % by decreasing their
        load.


        Keywords: Cross Organization Process Mining, Resource Behavior, Best
        Resource, Polynomial Regression Model, Resource Performance, Resource
        Load, Resource Queue: Average Waiting Time.


1     Introduction
In the current world where the resources are being shared among different or-
ganization through the cloud computing paradigm, most of the organizations
have started to shift towards Shared Business Process Management Infrastruc-
ture (SBPMI). Due to this shift in modelling paradigm, organizations have to
1
    http://dx.doi.org/10.4121/uuid:26aba40d-8b2d-435b-b5af-6d4bfbd7a270


                                        117
continuously improve their process [1]. But most of the organizations are still
depending on the external service providers to monitor their business process,
hence the business links are to be established with those external agencies [2].
This issue was well addressed by the Information Technology by developing var-
ious work-flow tools [3] [4] [5] [6]. The challenge here is to extend the service
from boundary of single organization to cross organizations.

    Due to data explosion [7] getting insight and performing analysis on the
data to understand their behaviour and discover an optimized process model
is always been a challenge to any organization in the process mining environ-
         α
ment. y M iner uses SBPMI, to analyse the data behaviour of an organization.
This is achieved by comparing the model of same variant using RBAM iner in
SBPMI and recommending the best suitable process model. The context of this
paper is the CoSeLoG Project2 . The data used for the experiment and analy-
sis of proposed algorithm is obtained from the Configurable Services for Local
Government (CoSeLoG) Project. This project was executed under Dutch Orga-
nization for Scientific Research (NWO) [8].

     αy M iner is a new analytical tool for discovering the optimal path of com-
pletion of a partial trace along with recommendation of complete process model.
It proposes two novel techniques    α  y Comp and RBAM iner .   α  y Comp identifies
the optimal path of completion by matching the partial trace and discovering
the variants in all process models logged in the repository. It identify and recom-
mends the Next Probable Activity (NPA) of partial trace. RBAM iner identifies
the suitable resource for performing the discovered NPA, by analysing the be-
haviour of all resources capable of performing NPA based on their performance,
load and waiting time.

     α y M iner is analysed using the running example [2]. NPA for the partial
trace and optimal process model is identified in cross organization environment
         α
using y Comp [3] and the resource preferable for performing NPA is analysed
and recommended using RBAM iner [4]. The experiment is conducted using the
real time event log of CoSeLoG Project3 and the result of RBAM iner is presented
in section [5].


2     Running Example

               α
The proposed y M iner is illustrated using the running example of four variant
process model containing 9 activities, shown in Figure[1b]. The corresponding
sample event log describing the process execution of the process model is shown
in Table[1]. Here the traces matches model perfectly which is not the cases in
real life process model. The complete log file of the running example can be
2
    http://dx.doi.org/10.4121/uuid:26aba40d-8b2d-435b-b5af-6d4bfbd7a270
3
    http://dx.doi.org/10.4121/uuid:26aba40d-8b2d-435b-b5af-6d4bfbd7a270


                                      118
found at Process Mining @ NITK 4 . The experimental results are obtained using
the CoSeLoG Project5 .


2.1     Proposed Problem

Consider an online process shown in Figure[1a], the dotted line shows the path
of execution of the online process. Sub-scripted values at each activities are the
sequence of occurrence of the activities ( A1 → B2 → C3 ). At activity C 3 ,
decision has to be taken about which next activity to be performed, either D
or E.    α
         y M iner identify the NPA and recommends the suitable resource for
performing NPA.


                                                               29
                                                                         D 23 G 37                                    29
                                                                                                                           D 28 G 20
                                            A 85 B
                                                   85                          19               80
                           (1)                        C             20
                                                                    51
                                                                                     H   A 80 B    C 12                                     16
                                                                                                                                                     H
                                                                           E   35                    23
                                                                                                                           E
                                                                                                                                   23
                                                                                                                                        I       38


                                                   Process Model 1                                    Process Model 2
                           (2)
                                                     42   C 40  26 I 33                        28         B 47                     50   G 50
                                                                                                      29   42
                                            A 56 B 2414     14
                                                               E 15 15 H                 A   10 6
                                                                                             6   26
                                                                                                      15
                                                                                                          C    E 10
                                                                                                               6 5
                                                                                                                               6
                                                                                                                                   36
                                                                                                                                            6
                                                                                                                                                     H
                                                        D 26 30 G 23                          32           39
                                                                                                          D 60                          I 36
                           (3)                       20     19                                         15


                                                           F                                               F

         D (4)                   E                 Process Model 3                                    Process Model 4

                                         (b) Process Models: Four variants of interview pro-
F        G        H                      cess (registration (A), validity check (B), document
(Next Probable Activity)
                                         check (C), information check (D), decide (E), interview
(a) Illustration of On-                  (I), group discussion (G), result (H) and re-initiates
line Process Model                       (F))
Fig. 1: Running Example


3     Alfy Miner (               αy M iner)
αy M iner is intended to identify and predict the optimal path of execution
along with the complete process model, for a real time process. On identifying
the currently executing activity Ai , αy M iner recommends the optimal path of
completion and the best suitable process model matching the partial trace with
same variant event logs, logged in the process model repository. On identify-
ing the matched variants, the optimal process models are identified by running
                                     α
process model comparator y Comp which matches the partial trace. Recommen-
dation of Next probable Activity NPA is done by selecting NPA (Ai ) in identified
suitable process model. The Algorithm [1] gives the execution steps of y M iner.                                           α
4
  http://http://processminingnitk.blogspot.in/2015/03/best-resource-
  recommendation-for.html
5
  http://dx.doi.org/10.4121/uuid:26aba40d-8b2d-435b-b5af-6d4bfbd7a270


                                                  119
Case ID TRACE                                                                  Duration
10358444 A12350     630640    221210    23640     7560      631250
          24/01/14 B28/01/14 C29/01/14 D02/02/14 E15/02/14 H26/02/14           33
12421232 A23640     530640    230410    12350     7716      631250
          25/01/14 B26/01/14 C28/01/14 D09/02/14 G13/02/14 H24/02/14           30
12592056 A12350     4503      630450    721560    7560      631250
          02/03/14 B12/03/14 C18/03/14 G26/03/14 E27/03/14 H08/04/14           37
12610928 A23640     530640    230410    7560      23640     7716      631250
          12/05/14 B17/05/14 C29/05/14 E05/06/14 D16/06/14 G28/06/14 H05/07/14 54
12984815 A12350     630450    221210    12350     721560    7716      631250
          16/08/14 B29/08/14 C09/09/14 D16/09/14 G22/09/14 E15/10/14 H27/10/14 72


(a) Event Log of Process Model 1
Case ID TRACE                                                                            Duration
13945854 A23640     450320    630450    23640      720560   631250
          26/01/14 B28/01/14 C31/01/14 D15/02/14 G19/02/14 H26/02/14                     31
13968144 A12350     630450    221210    12350     631210    631250
          12/02/14 B19/02/14 C22/02/14 E09/03/14 I26/03/14 H28/03/14                     44
15073705 A12350     530640    630450    12350      771620   720560    631250
          12/04/14 B29/04/14 C02/05/14 D15/05/14 G19/05/14 E26/05/14 H08/06/14           57
16609162 A23640     530640    230410    720560     23640    771620    631250
          15/04/14 B19/04/14 C02/05/14 E15/05/14 D16/05/14 G18/05/14 H20/05/14           35
16789201 A12350     630450    221210    23640      721560   771620    641210    631250
          19/06/14 B23/06/14 C29/06/14 D15/07/14 G27/07/14 E09/08/14 I16/08/14 H23/08/14 65


(b) Event Log of Process Model 2
Case ID TRACE                                                                                                          Duration
16796450 A12350     450320    630450    720560    651210    631250
          02/05/14 B23/05/14 C15/06/14 E19/06/14 I09/07/14 H27/07/14                                                   86
          23640     450320    221210    720560    720560    630450    221210    720560    651210    631250
17031584 A26/07/14 B15/08/14 C29/08/14 E12/09/14 F28/09−14 B13/10/14 C18/10/14 E22/10/14 I29/10/14 H30/10/14           96
17939005 A12350     630450    630450    720560    720560    450320    23640     720560     720560   631250
          05/10/14 B13/10/14 C22/10/14 E29/10/14 F13/11/14 B19/11/14 D02/12/14 E06/12/14 G10/12/14 H24/12/14           80
19472044 A23640     530640    630450    720560    720560    630450    230410    720560     721560   631210    631250
          15/12/14 B19/12/14 C28/12/14 E03/01/15 F05/01/15 B16/01/15 C18/01/15 E22/01/15 G23/01/15 I28/01/15 H29/01/15 45
25845687 A23640     530640    630450    720560     721560   631250
          12/11/14 B14/12/14 C19/12/14 E22/12/14 G27/12/14 H30/12/14                                                   48

(c) Event Log of Process Model 3
Case ID TRACE                                                                                            Duration
19830478 A12350
          02/05/14 B 12350
                     02/05/14 E12350
                               02/05/14  G 12350
                                           02/05/14 H 12350
                                                      02/05/14                                           53
19834032 A12350      12350     12350      12350       12350     12350       12350    12350
          02/05/14 B02/05/14 E02/05/14 F02/05/14 C02/05/14 E02/05/14 G02/05/14 H02/05/14                 52
19836934 A12350      12350      12350     12350      12350       12350     12350      12350     12350
          02/05/14 B02/05/14 C02/05/14 E02/05/14 F02/05/14 B02/05/14 E02/05/14 G02/05/14 H02/05/14 59
19838656 A12350      12350      12350     12350      12350       12350     12350      12350     12350
          02/05/14 D02/05/14 B02/05/14 E02/05/14 F02/05/14 B02/05/14 E02/05/14 G02/05/14 H02/05/14 37
19844185 A12350
          02/05/14 D 12350
                     02/05/14 C 12350
                                02/05/14 E12350
                                          02/05/14  F12350
                                                     02/05/14  B 12350
                                                                 02/05/14 E12350
                                                                           02/05/14 G 12350
                                                                                      02/05/14 H12350
                                                                                                02/05/14 29


(d) Event Log of Process Model 4
Table 1: Event logs of four different process models of interview pro-
cess shown in figure[1b]. Each log table shows Case ID1 , Trace2 and the total
duration3 . Each cell in trace, shows the activity of the trace, Resource (Super-
scripted) and the time of occurrence of that activity (sub-scripted).

  Algorithm 1:             αy M iner
      Input: Partial Real Time Trace
      Output: NPA & Process Model
 1    Develop Process model repository;
 2    repeat
 3        MatchV ar ← Call Match Variant(Ai );
 4        α                 α
             y Comp ← y Comp(MatchV ar ) ;
 5        Set(NPA) ← InOutBinding (C-Net )
 6    until for each currently executing activity Ai


3.1      Process Model: Casual Net
αy M iner uses Casual Net: C-Net notation to represent the process model. C-
Net is a six-tuple: {A,D,ai ,ao ,I,O} representation of process model with A:{set
of activities}, D:{Set of Dependencies}, ai :{Set of Start activities}, ao : {Set of
Output activities} , I : {Set of Input Binding} , O: {Set of Output Binding}.


                                                         120
C-Net for all the four process model of the running example is shown in Figure 2.
The repository of process model is maintained for analysing process behaviour.

                         Process Model 1                                                                 Process Model 2
  A = { A, B, C, D, E, G, H}                                                  A = { A, B, C, D, E, G, I, H}
  D = {(A,B), (B,C), (C,D), (C,E), (D,G), (G,H), (E,H)}                       D = {(A,B), (B,C), (C,D), (C,E), (D,G), (E,I), (G,H), (I,H)}
  ai = {A}                                                                    ai = {A}
  ao = {H}                                                                    ao = {H}
   I = {I(A):{Null}, I(B):A, I(C):B, I(D):C, I(E):C, I(G):D, I(H):{G,E}}       I = {I(A):{Null}, I(B):A, I(C):B, I(D):C, I(E):C, I(G):D, I(H):{G,E}}
  O = {O(A): B, O(B):C, O(C):{D,E}, O(D):G, O(G):H, O(E):H, O(H):{Null}       O = {O(A): B, O(B):C, O(C):{D,E}, O(D):G, O(G):H, O(I):H, O(H):{Null}
                                                Process Model 3
  A = { A, B, C, D, E, F, G, I, H}
  D = {(A,B), (B,C), (B,D), (C,E), (D,E), (E,F), (E,I), (E,G), (F,B), (I,H), (G,H)}
  ai = {A}
  ao = {H}
   I = {I(A):{Null}, I(B):{A,F} I(C):B, I(D):B, I(E):{C,D}, I(F):E, I(I):E, I(G):E, I(H):{I,G}}
  O = {O(A): B, O(B):{C,D}, O(C):E, O(D):E, O(E):{I,G,F}, O(F):B, O(I):H, O(G):H, O(H):{Null}
                                              Process Model 4
  A = { A, B, C, D, E, F, G, I, H}
  D = {(A,B), (A,C), (A,D), (B,E), (C,E), (D,E), (E,F), (F,B), (F,C), (F,D), (E,G), (E,I), (G,H), (I,H)}
  ai = {A}
  ao = {H}
   I = {I(A):{Null}, I(B):{A,F}, I(C):{A,F}, I(D):{A,F}, I(E):{B,C,D}, I(F):E, I(G):E, I(I):E, I(H):{G,I}}
  O = {O(A): {B,C,D}, O(B):E, O(C):E, O(D):E, O(E):{F,G,I}, O(F):{B,C,D} O(I):H, O(G):H, O(H):{Null}
Fig. 2: C-Net Representation of process Model in Figure 1b


3.2      Matching variants with Path Detector
When an online process is getting executed, identifying to which variant the cur-
rently executing trace belongs is a challenge for y M iner. Algorithm Variant                   α
M atch [2] identify the path of execution along with the set of possible NPA.
VariantM atch uses the concept of linked list with 2 nodes: CellSN ode and Vari-
ant N ode which are represented as class. Cell N ode = {f rom1 ← {•a}, to2 ← a,
value3 ← {|•a → a|σ}, count4 = |•a → a| ∈ ζ. Variant N ode {*matrix (address of
Cell N ode ), *prev2 *next3 (address of next and previous Cell N ode )}. The Cell N ode
Figure[3a] stores the information of trace A→B→C→E→F→B→D→E→G→H
of process model 2. The value3 field remains 1 till the sequence in trace appears
first time. On identifying the loop, value in value3 filed is updated to 2 as shown
at Cell N ode with memory 500 in Figure[3a]. Value3 field is an array and stores
the value 1,2 to indicate the sequence B→C is appearing second time in the
trace.Count3 is a counter of the sequence appearance in the trace. Variant N ode
Figure[3b] stores the information of all the variants. This is used while compar-
ing the online sequence with the variants. If a variant matches the sequence,
then that variant is retained else it is deleted from the linked list.
3.3      Process Model Comparator (                                          αy Comp )
αy Comp compares the C-Net of all the variants in cross organization environ-
ment based on following comparison metrics.

 1. Process Model Metric: Compare total number of activities, resources, traces
    and variants
 2. Relation Metric: Compare total number of parallel, serial activities and
    loops.


                                                                      121
                                                                                          10000             10100             10200
           050   100   200   300       400   500   600   700   800   900
                                                                              *Matrix       50               1050              1100
From        A     B    C     E         F     B     C     E     G      I                     Null
                                                                              *Prev                         10000             10100
To         B      C     E    F         B     C     E     G      I     H
Value      1      1     1    1         1     1 2   1 2   2     2      2
                                                                              *Next        10100            10200              Null
                                                                                        First Cell Node   Second Cell Node Third Cell Node
Count      1      1    1     1         1     2     2     2     2      2
*Next      100   200   300   400       500   600   700   800   900   Null
                                                                              (b) Structure of Vari-
(a) Structure of Cell N ode for sequence                                      ant N ode for the set
A→B→C→E→F→B→D→E→G→H of process                                                of Cell N ode of process
model 2                                                                       model 2
Fig. 3: Structure of CellN ode and VariantN ode


        Algorithm 2: Matching the Variants: V ariantM atch ()
           Input: Online process
           Output: Matching matrix
       1   Match Variant() struct variantN ode ?gvn, ?tempvn; (gvn : address of linked list say
           globle Variant Node), Let ?gvn gives address of the double linked list, Initialize all counter
           in cellN ode → 0;
       2   repeat
       3        ?tempvn ← &gvn Get the address of the double linked list;
       4        repeat
       5             ?tempcn ← &matrix Get the address of the matrix ;
       6             tempcn→from = sequence[i] ∧ tempcn→to = sequence[i+1];
       7             if not found then Delete current variantN ode from double linked list and go to 5
       8             else Increment the member variable count;
       9             if count == val[count] (Current and previous check are passed) then Go to
                     next→variantN ode in the double linked list and go to step 5
 10                  else Delete the current→variantN ode from the double linked list and go to 5
 11             until ?next in double linked list is null
 12        until for each activity in online process
 13        Remaining variantnode present in tempvn are all matched variant table for the given
           sequence.


     3. Complexity Metric: Compare total number of split and join.
     4. Service Time Metric: Compare the queue time for each activity.
     5. Fitness Metric: Running fitness test along with the time of completion and
        valid no of sequence in each event log.


Process Model Metric The process model comparison is done based on No
of {Activities, Resources, Traces & Varinats } and is shown in Table 2a.


Relation Metric                    α
                       y Comp analysed that if a model has more parallel relation
it performs well when compared to serial relation, at the same time if the loop
is increased the consumption of execution time also increases. Parallel relation
is identified by Equation 4 in Definition 1. Loops are identified by Equation 5.

Definition 1. Log based ordering relation
Let A = [a, b, c, d, e] be the set of activities and let L be the simple event log


                                                         122
i.e., L ∈ A ∗ and Let A be ath                   th
                            i activity and B be ai+1 then,

                ˙
      DirectlyF ollow(a>L b) ← {iff ∃ trace σ = ht1 , t2 , ..., tn i ∧

                             i ∈ [1, 2, ....., n − 1] | σ ∈ L,
                              ∧ ti = a, ∧ ti + 1 = b}                          (1)
                 ˙ (a−→ b) ← {iff a >L b ∧ b ≯L a}
          Casuality                                                            (2)
                       L

                   ˙ (a# b) ← {iff a ≯L b ∧ b ≯L a}
           U nrelated                                                          (3)
                        L

                     ˙ (ak b) ← {iff a >L b ∧ b >L a}
              P arallel                                                        (4)
                          L


             Loop(a>˙ L b>L a) ← {iff (ai == ai+2 ) → ai >L ai+1 >L ai+2 }     (5)

   The Table 2b gives the relation metric of all the four models in running
example.


Complexity Metric Complexity metric identifies the joins and splits in the
process model. Joins and split are identified using the result of output and in-
put binding. Consider the Figure[1b] where for process model 1: O(A)={B}=85
times, similarly the split {CDE} = 20, its means 20 times activity C is 20 times
followed by both D and E, join {GEH} is joined 16 times. Using this information
complexity metric shown in Table[2c] is developed.


Service Time Metric This metric gives the total service time comparison
for an activity in each model. This comparison helps in identifying the model
serving an activity with less service time. The service time is calculated by
Peach cases
   i=1      duration(Ai ), where Ai ⊆ A (set of activities). The sample output
in seconds is shown in Table 2d.


Fitness Metric This gives the numbers of traces that can be successfully run
on the model. This is helpful in deciding how efficient the model is, in running
             α
the trace. y Comp identifies the model which runs maximum number of traces
with minimum time. Consider the Table 2e.


3.4     Binding Relation

On identifying variants following the partial trace, the NPA of currently execut-
ing activity Ai is identified using binding relation which bind the incoming and
outgoing activity of Ai . Algorithm 3 eplain the concept of binding relation, where
for each trace in a case, if an activity A is followed by B, then A.outbond ← B
∧ B.inbound ← A, i.e., A has out-bounding relationship with B and similarly B
as in-bounding relationship with A


                                          123
     No of Activities No of Resources No of Traces No of Variants
PM 1 8                16              90           10
PM 2 8                14              80           13
PM 3 9                14              56           19
PM 4 9                14              86           51

(a) Process Model Metric

    No of Dependency No of Parallel No of Loops No of Serial        Joins Splits
PM1 7                2              0           5               PM1 19    20
PM2 8                4              0           4               PM2 16    12
PM3 11               2              2           7               PM3 29    29
PM4 14               3              3
                                                                PM4 33    28

(b) Relation Metric                                             (c) Complexity Metric

    A       B         C       D
                                                                           PM1 T(PM1) PM2 T(PM2) PM3 T(PM3) PM4 T(PM4)
PM1 3678956 45896374 56987845 1236589                           Event Log1 1    56897845 0.9 78456975 0.75 45789647 0.65 56587874
PM2 2598964 56978746 78594785 4589647                           Event Log2 0.8 45878123 1    45678412 0.9 78956478 0.95 78945698
PM3 4577896 36987567 23698124 5698347                           Event Log3 0.6 45236984 0.75 56898774 1    69875457 1    65327841
                                                                Event Log4 0.45 32789564 0.6 68974564 0.75 39845641 1
PM4 1236978 23678945 22456378 4548768
                                                                (e) Fitness Metric
(d) Service Time Metric
Table 2: Process Model Comparator (                             αy Comp)
    Algorithm 3: To calculate Input & Output Binding
  1   InOutBinding () Input: Ai , RTrace
      Output: Ai .InputBinding , Ai .OutputBinding
  2   repeat
  3       if (|a >L b|) then
  4            a.Outbound ← b ∧ b.Inbound ← a
                      P
  5       |a >L b| =      L(σ) × |{1 ≤ i < |σ| | σ(i) = a ∧ σ(i + 1) = b}| [see [7]]
                         σ∈L
  6   until for each sequence in trace σ in event log L


4      Resource Behaviour Analyser (RBAM iner )

αy M iner on discovering suitable process model with NPA identifies the re-
sources preferable for performing NPA. Set of resource preferable for perform-
ing NPA is identified using Activity/Resourcerep [3]. RBAM iner analyse the be-
haviour and recommend the suitable resource for performing NPA. Behaviour of
the resources is analysed based on 3 parameter: Performance, Load and Queue
using polynomial regression model for load and performance [4.2] and Average
Servicing Time at resource using queue model [4.3]. Algorithm 4 explains the
concept of resource behaviour analysis.


4.1      Activity/Resourcerep

αy M iner identifies the list of resources performing an activity in entire process
log along with the time consumed by them for performing that activity. The
Table 3 gives representational view of list of resources performing an activity in
process model 1 along with the time consumed.


                                                          124
  Algorithm 4: RBAM iner
 1    RBA(N P A)()
      Input: N P A&BestResActivity
      Output: Recommendationof Res(N P A)
 2    repeat
 3        Load(Res(N P A) ) ←− P oly.Load(Load(Res(N P A) )); [see algo5]
 4        P erf (Res(N P A) ) ←− P oly.P erf (Res(N P A) ); [see algo5]
 5        AvgW aiting T ime(Res(N P A) ) ←− Queue(Res(N P A) ); [see algo 6]
 6    until (for each resource of N P A in BestResActivity Table)
 7    Recommend the optimal load, performance and waiting time resource


Activity Res 12350 Res 23640 Res 630450 Res 530640 Res 450320 Res 221210 Res 230410 Res 501 Res 771620 Res 502 Res 771620 Res 721560
A        36.657 45.380 DNP              DNP        DNP        DNP        DNP        DNP DNP            DNP DNP            DNP
B        DNP       DNP       18.473     22.667     9.25       DNP        DNP        DNP DNP            DNP DNP            DNP
C        DNP       7         24.684     DNP        DNP        5.4667     22.294     DNP DNP            DNP DNP            DNP
D        DNP       25.53     DNP        DNP        DNP        DNP        DNP        72      11.5       DNP DNP            DNP
E        DNP       25.531 DNP           DNP        DNP        DNP        DNP        62.5    DNP        91      11.5       DNP
G        DNP       DNP       DNP        DNP        DNP        DNP        DNP        DNP 7              DNP DNP            13.944
Table 3: Activity/Resourcerep of process model 1 of running example
[DNP: Did Not Play ]

4.2      Resource load & performance analyser


The Yerkes-Dodson Law of Arousal, also known as Arousal Theory, states that
by increasing arousal, the workers performance can be improved. However, if the
level of arousal increases too much, performance decreases Figure[4a] [9]. The
RBAM iner identifies the level of arousal : Optimal Load i.e., the maximum load
the resource can handle efficiently, along with its performance using polynomial
regression model. Performance is a ratio of Total time taken by Load. The per-
formance was analysed by increasing the load and observing the time taken.
It was observed that, as the load was increased, the consumption of the time
was decreasing. But at some point there was a drift and the time consumption
started increasing. That drifted point is known as Arousal (optimal load and
performance of the resources). The Algorithm[5] identifies the load ` and perfor-
mance [T otal time ÷ `] for /resource/unit time.

    The RBAM iner first filters the unperformed load 1 (an activity with 0 ms)
and residual load 2 (an activities with exceptional duration). Then the actual
load (`) and average time of Service (α) of each worker each month is identi-
fied. Polynomial regression model[5] is applied on this cleaned data. Since the
RBAM iner is intended in identifying the second degree regression model, the
regression model initialize a 3×3 matrix (A) and 3×1 matrix (B) as shown in
figure [4b& 4c]. Then the transpose of matrix A is multiplied with matrix B.
The result obtained is the coefficient of polynomial equation. On applying the
load on an equation the polynomial curve (power curve) is obtained as shown in
figure. On analysing the polynomial curve and applying the Yerkes-Dodson Law
the optimal load and optimal performance of a resource is identified for each
month.


                                                            125
                             Optimal Performance


                                                                   n       n
                                                                                                 n
                                                                                                            
                                                                             `2
                                                                   P       P
                                                               n       `
                                                                                                    P
                                                                                                        α
Perfromance


                                                        n n1 n1                           n1      
                                                   A =  1 ` 1 `2 1 `3 
                                                       P P P 
                                                                                       B =  1 (α×`) 
                                                                                           P        
                                                       n    n    n
                                                                                          n        
                                                           P 2 P 3 P 4
                                                            `   `   `                            (α×`2 )
                                                                                                P
                              Optimal Load
                                                           1       1       1                    1

                  Load

(a) Yerkes Dodson Law (b) Matrix Table A                                              (c) Matrix Table B
Fig. 4: Structure of Power Curve for identifying the Optimal Load and Per-
formance and the Structure Initial load & performance matrix for running Poly-
nomial Regression Model

              Algorithm 5: Resource Second Order Polynomial Regression Model
                  Input: ` (Total Load) on each resource each month and α (Log(Average Service Time)) for
                          running the load ` per month
                  Output: Optimal Load L & Optimal Performance P
              1   Let A[3,3] & B[1,3] be 2 initial Matrix as shown in figure[4b & 4c]; k=0;
              2   repeat
              3       AInverse ←− T ranspose(A, 3); Transpose: Function transposing the matrix ;
                      Result ←− multiplyM atrices(AInverse , B); multiplyMatrices: Function for
                      multiplying matrix ;
              4       repeat
              5            β[i] ←− Result[i][j]; where β= Coefficient of Polynomial Equation
              6       until ((i=0 to 3) ∧ j=0)
              7       Polynomial Equation : β0 + β1 ` + β2 `2
              8   until (for each resource each unit time)


4.3                      Activity Servicing Time Model

Along with identification of load and performance of the resource preferable
for performing NPA, RBAM iner also finds the Activity Servicing Time (i.e.,
the average waiting time for an activity to be served by a resource), before
that resource is recommended. Since the interest is in finding the queue at
each resource, RBAM iner uses Single-Server Models (M/M/1):(GD/∞/∞) and
(M/M/1):(GD/N/∞). Here the model (M/M/1):(GD/∞/∞) describe (Arrival1 /
Departure2 / Server3 ):(Queue discipline4 / Max number in Queue5 / Source of
Calling6 ).

    Arrival1 (λ) is the rate at which the activities are arrived at each resources
and Departure2 (µ) is the rate at which the arrived activities are served. Since
RBAM iner is intended in identifying the average waiting time at each resource,
the single server model is applied. When data was analyzed for First Come First
Serve FCFS, Last Come First Serve LCFS and Service in Random Order SIRO,
it was understood that arrival of the activity was following General Discipline
GD as its Queue Discipline4 . As the number in queue and source of calling is
not defined RBAM iner marks them as infinity. The average waiting time in the


                                                        126
system Ws is identified using Equations [6- 9]. The Algorithm Activity Servicing
Time [6] starts with identifying the arrival rate λ and the servicing rate µ at
each resources.

    The λn & µn in generalized model is shown in Equation[6]. The traffic ρ:
number of activities arriving and getting served per unit time is shown in Equa-
tion[7]. Hence the Average waiting time in system Ls is given in Equation[9].
           )
    λn = λ                                                                λ
                    Where n = 0,1,2.... (6)                          ρ=                      (7)
    µn = µ                                                                µ

                           Ls                                              ρ
                    Ws =                     (8)                   Ls =                       (9)
                           λ                                              1−ρ


    Algorithm 6: To Discover the Activity Servicing Time
      Input: Set of resources:<, Trace:=, Duration of service:∂
      Output: Arrival λ, Service µ, Traffic ρ, Ls , Ws
 1    Let Arrival λ ∈ Load ` discovered on /Resource/month; Service µ ∈ service rate of λ; Π be
      No of Days in month
 2    if (if ((Π − =.Date) × 24hrs × 60Sec) ≥ =.∂ then Event is executed in same month;
      µ(<F iltered Year Month) ←− µ(<F iltered Year Month) + 1;
                   ((=.∂) − ((Π − =.Dt) × 24hrs × 60Sec))
 3    else ⊥ = d                                          e
                                Π × 24 × 60
      µ(<F iltered Year Month + ⊥) ←− µ(<F iltered Year Month + ⊥) + 1;
 4    Average Servicing Time in system ← Equation [6 to 9]


5      Experimental Analysis and Result

The αy M iner algorithm is evaluated by running it on CoSeLoG Project6 . The
experiments ExpN P A , ExpAST and ExpL&P was performed on the CoSeLoG
Municipality 2, which contains 645 cases and 376 activities. Experiments were
conducted and analysed on set of every 100 cases. αy M iner makes 4 assumption:
Any activity whose duration is recorded as 0 millisecond is considered as never
been executed, since the nanosecond time is not recorded, vocabulary of an
activity is not taken into account [1], don’t deal with Live or Dead locks and
assume that all process have same starting activity.


5.1      Design of Experiment

The αy M iner experimental set up is shown in Figure [5]. Where the log is first
cleaned and initialized using initializer from which the NPA is identified. Optimal
resource for performing NPA is identified and their behaviour is analysed. Finally
αy M iner recommends the best process and resource model.
6
    http://dx.doi.org/10.4121/uuid:26aba40d-8b2d-435b-b5af-6d4bfbd7a270


                                             127
Fig. 5: Illustration of Online Process Model
5.2 Recommendation of Next Probable Activity (NPA): ExpN P A

Experiment was simulated in the form of supervised learning, where the test
ExpN P A was conducted for every 100 cases and starting from 2nd activity of the
sequence. ExpN P A was analysed by comparing it with the actual path of execu-
tion. The result of this comparison is shown un Figure [6] and on analysis it is
studied that the percentage of error rate (marked by green line) in recommenda-
tion is lesser in later positions of execution when compared to earlier positions.
The ExpN P A achieved 72.8568% of efficiency. On analysing the graph, it is
understood that the behaviour of recommended path is always below the actual
path of execution. Inclination shows the huge difference of behaviour between
the actual and recommended path. For the cases 400 to 500, it is observed that
the graph don’t have red line, as the path of execution is critical and was ob-
served to take optimal time for completion. Hence this proves that αy M iner,
don’t recommend if the path of execution is observed to be optimal.


Fig. 6: Result of ExpN P A


                                     128
5.3   Recommendation of Resource capable for performing NPA:
      ExpAST

The ExpAST for each resource performing NPA. Waiting time of recommended
resource was compared with the actual resource and it was studied that their per-
formance was improved by 59.7303%. The Figure [7] show the result of ExpAST .
The ExpAST , discovered the better path of execution based on resource aver-
age service time and it is also understood αy M iner, don’t recommend if the
resources to whom the task is assigned is efficient in performing.


Fig. 7: Result of ExpAST


        100       200      300      400      500      600      Overall
560530 0.0178571 0.005128 0.005525 0.006329 0.005495 0.009009 0.000787
560598 0.1666667 0.083333 0.166667 0.333333 0.111111 0.090909 0.016949
560521 0.0714286 0.090909 0.083333 0.012987 0.052632 0.008621 0.004587
560532 0.0076336 0.005102 0.009009 0.007194 0.003279 0.005051 0.000517
4634935 0.1428571 0.083333 0.142857 0.043478 0.009709 0.016129 0.006329
560458 0.0069444 0.007519 -0.00115 0.00304 0.00625 0.006369 0.00036
560429 0          1        1        1        1        1        1
560528 0          1        0.5      1        0.5      1        0.166667
560519 0.0153846 0.009009 0.013699 0.01      0.007246 0.001605 0.001754
Table 4: Result of Average Waiting time for CoSeLoG project


5.4   Polynomial regression model: ExpL&P

The result of ExpL&P is shown in Table [5] and the Figure [8] shows the polyno-
mial curve. Using the law of Arousal, the optimal load and performance at each
resource can be identified. This result is used in making appropriate decision


                                    129
about resource behaviour and load assignments. Using the outcome of experi-
ment proper recommendations can be made, whether to assign the task to that
resource ot not.

                                14000                                                                                                                                                                                1200                                                                                                                                                                                                           600

                                                                                                                                                                                                                                                                                                                                                                y = -0.0043x2 + 1.2426x + 37.412
                                12000                                                                                                                                                                                                                                                                                                                                      R² = 0.0465
                                                                                                                                                          y = 0.0232x2 - 19.935x + 3468.5                            1000                                                                                                                                                                                                           500                                                                                       y = -0.0071x2 + 0.1691x + 106
                                                                                                                                                                    R² = 0.3719                                                                                                                                                                                                                                                                                                                                                        R² = 0.0471
                                10000
                                                                                                                                                                                                                     800
        Total Time/Total Load


                                                                                                                                                                                                                                                                                                                                                                                                     Total Time/ Total Load
                                                                                                                                                                                                                                                                                                                                                                                                                                    400


                                                                                                                                                                                            Total Time/ Total Load
                                 8000
                                                                                                                                                                                                                     600
                                                                                                                                                                                                                                                                                                                                                                                                                                    300
                                 6000                                                                                                                                    560458                                                                                                                                                                                                   560519                                                                                                                                                                      560521
                                                                                                                                                                         Poly. (560458)                              400                                                                                                                                                          Poly. (560519)                                                                                                                                                              Poly. (560521)
                                 4000                                                                                                                                                                                                                                                                                                                                                                                               200
                                                                                                 Optimal Performance                                                                                                                                                                                            Optimal Performance                                                                                                                                          Optimal Performance
                                                                                                                                                                                                                     200                                                  7.132075364                                                   60.08506375                                                                                                      24.30951968                                     19.23434689
                                 2000
                                                                                    89.27518988                                                                                                                                                                                                                                                                                                                                     100
                                                                                                                 2.783782314
                                       0                                                                                                                                                                                                  0
                                           0   50   100   150     200     250   300    350       400    450     500    550     600     650   700    750     800                                                                                  0        25    50        75     100    125    150      175      200     225      250     275    300   325   350      375
                                                                                                                                                                                                                                                                                                                                                                                                                                           0
                                 -2000                                                           Optimal Load                                Load                                                                                                                                                    Optimal Load                                                                                                                              0   5   10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135
                                                                                                                                                                                                                     -200
                                                                                                                                                                                                                                                                                                                                                             Load                                                                                                        Optimal Load                           load


                                800                                                                                                                                                                                                      3000
                                                                                                                                                                                                                                                                                                                                                                                                                                           600


                                                                                                                                                                                                                                                                                                                                                                                                     Performance= Total Time/ Total Load
                                700                                                                                                                                                                                                                                                                                                                                y = 0.0059x2 - 4.9853x + 1028.9
                                                                                                                                                          y = 0.0007x2 - 0.935x + 317.12                                                 2500                                                                                                                                R² = 0.2758                                                   500                                                                               y = -0.0658x2 + 7.8119x + 7.2516
                                                                                                                                                                   R² = 0.1996                                                                                                                                                                                                                                                                                                                                                         R² = 0.2151
                                600
                                                                                                                                                                                                                                         2000                                                                                                                                                                                              400
Total Time/ Total Load


                                500                                                                                                                                                                                   Total time/ total load
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Optimal Performance
                                                                                                                                                                                                                                                                               180.5425777                                                                                                                                                                                  361.111102
                                400                                                                                                                                                                                                      1500                                                                                                                                                                                              300
                                                                                                                                                                         560530                                                                                                                                     85.64472927                                                   560532
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         210.9521607                                   4634935
                                                                                                                                                                                                                                                                                         Optimal Performance
                                300                             54.11534808                                                                                              Poly. (560530)                                                                                                                                                                                           Poly. (560532)
                                                                                                                42.39623095                                                                                                              1000                                                                                                                                                                                              200                                                                                                         Poly. (4634935)
                                                                                Optimal Performance
                                200
                                                                                                                                                                                                                                               500                                                                                                                                                                                         100
                                100

                                                                                                                                                                                                                                                 0                                                                                                                                                                                             0
                                  0                                                                                                                                                                                                                   0    25   50   75    100 125 150 175 200 225 250 275 300 325 350 375 400 425 450 475 500                                                                                                     0     10    20      30   40       50      60     70    80    90     100 110 120 130
                                       0       50   100   150       200     250       300    350       400       450    500      550      600      650      700                                                                                                                          Optimal Load

                                -100
                                                                                  Optimal Load
                                                                                                                                         Load                                                                                                  -500
                                                                                                                                                                                                                                                                                                                                                                                                                                           -100                                       Optimal Load
                                                                                                                                                                                                                                                                                                               Load                                                                                                                                                                                                              Load


Fig. 8: Result of ExperimentLoad&P erf ormance


              Resources No of load total time R2     Load Range Performance Range
              4634935 744          124351.8   0.2151 50-70      210.9521607 - 361.111102
              560458    7838       1161852    0.3719 300-600    2.783782314 - 89.27519
              560519    4809       390477.9   0.0465 125-250    7.132075 - 60.08506
              560521    1475       92446.43   0.0471 30-70      19.23435 - 24.30952
              560530    11140      905091.6   0.1996 250-400    42.39623 - 54.11535
              560532    7817       1221671    0.2758 150-250    180.5426 - 85.64473
Table 5: Result of Polynomial Regression for CoSeLoG project


6                                                   Conclusion

αy M iner provided a solution for recommending an optimal path of execution:
NPA along with the complete process model and resource preferable for perform-
ing NPA. αy M iner is a analytical tool which gave solution for real time busi-
ness process execution, by analysing the process and resource behaviour. The
Experimental result shows 72% of optimization in process execution and 59%
improvement in the behaviour of resource based on their Average waiting time,
load and performance. αy M iner was successful in recommending appropriate
process and resource model for the real time process.


References
1. Joos CAM Buijs, Boudewijn F van Dongen, and Wil MP van der Aalst. Towards
   cross-organizational process mining in collections of process models and their exe-
   cutions. In Business Process Management Workshops, pages 2–13. Springer, 2012.


                                                                                                                                                                                                                                                                                                     130
2. Justus Klingemann, Jurgen Wasch, and Karl Aberer. Deriving service models in
   cross-organizational workflows. In Research Issues on Data Engineering: Informa-
   tion Technology for Virtual Enterprises, 1999. RIDE-VE’99. Proceedings., Ninth
   International Workshop on, pages 100–107. IEEE, 1999.
3. Diimitrios Georgakopoulos, Mark Hornick, and Amit Sheth. An overview of work-
   flow management: From process modeling to workflow automation infrastructure.
   Distributed and parallel Databases, 3(2):119–153, 1995.
4. Gustavo Alonso, Divyakant Agrawal, Amr El Abbadi, and Carl Mohan. Func-
   tionality and limitations of current workflow management systems. IEEE Expert,
   12(5):105–111, 1997.
5. Asuman Dogac. Workflow management systems and interoperability. Number 164.
   Springer Science & Business Media, 1998.
6. Andrzej Cichocki. Workflow and process automation: concepts and technology.
   Springer Science & Business Media, 1998.
7. Wil Van Der Aalst. Process mining: discovery, conformance and enhancement of
   business processes. Springer Science & Business Media, 2011.
8. J.C.A.M.; Buijs. Environmental permit application process (wabo), coselog project,
   2014.
9. Joyce Nakatumba and Wil MP van der Aalst. Analyzing resource behavior using
   process mining. In Business Process Management Workshops, pages 69–80. Springer,
   2010.


                                      131