

10.4121/uuid:26aba40d-8b2d-435b-b5af-6d4bfbd7a270

An Optimal Process Model for a Real Time Process

Likewin Thomas

Manoj Kumar M V

manojmvg@nitk.ac.in

Annappa B

annappa@ieee.org

Vishwanath K P

shastryvishwanath@gmail.com

Department of Computer Science and Engineering


Recommending an optimal path of execution and a complete process model for a real-time partial trace of a large and complex organization is a challenge. The proposed AlfyMiner (yMiner) makes this recommendation with a cross-organization process mining technique that compares the variants of the same process encountered in different organizations. yMiner proposes two novel techniques: the Process Model Comparator (yComp) and the Resource Behaviour Analyser (RBAMiner). yComp identifies the Next Probable Activity (NPA) of the partial trace along with the complete process model of the partial trace. RBAMiner identifies the resources preferable for performing the NPA and analyses their behaviour based on performance, load and queue. From this analysis, yMiner recommends the best suitable resource for performing the NPA and the process models for the real-time partial trace. Experiments were conducted on process logs of the CoSeLoG Project¹: an accuracy of 72% was obtained in identifying and recommending the NPA, and the performance of resources was improved by 59% by decreasing their load.

Keywords: Cross-Organization Process Mining, Resource Behavior, Best Resource, Polynomial Regression Model, Resource Performance, Resource Load, Resource Queue, Average Waiting Time

In the current world, where resources are shared among different organizations through the cloud computing paradigm, most organizations have started to shift towards Shared Business Process Management Infrastructure (SBPMI). Due to this shift in the modelling paradigm, organizations have to continuously improve their processes [1]. However, most organizations still depend on external service providers to monitor their business processes, so business links have to be established with those external agencies [2]. Information Technology has addressed this issue by developing various workflow tools [3] [4] [5] [6]. The challenge here is to extend the service beyond the boundary of a single organization to cross-organization settings.

Due to the data explosion [7], gaining insight from data and analysing it to understand its behaviour and discover an optimized process model has always been a challenge for any organization in the process mining environment. yMiner uses SBPMI to analyse the data behaviour of an organization. This is achieved by comparing the models of the same variant using RBAMiner in SBPMI and recommending the best suitable process model. The context of this paper is the CoSeLoG Project². The data used for the experiments and the analysis of the proposed algorithm is obtained from the Configurable Services for Local Government (CoSeLoG) Project, which was executed under the Dutch Organization for Scientific Research (NWO) [8].

yMiner is a new analytical tool for discovering the optimal path of completion of a partial trace along with a recommendation of the complete process model. It proposes two novel techniques, yComp and RBAMiner. yComp identifies the optimal path of completion by matching the partial trace against the variants discovered in all process models logged in the repository, and it identifies and recommends the Next Probable Activity (NPA) of the partial trace. RBAMiner identifies the suitable resource for performing the discovered NPA by analysing the behaviour of all resources capable of performing the NPA, based on their performance, load and waiting time.

yMiner is illustrated using the running example in Section 2. The NPA for the partial trace and the optimal process model are identified in a cross-organization environment using yComp (Section 3), and the resource preferable for performing the NPA is analysed and recommended using RBAMiner (Section 4). The experiment is conducted using the real-time event log of the CoSeLoG Project³, and the results of RBAMiner are presented in Section 5.

2 Running Example

The proposed yMiner is illustrated using a running example of a process model with four variants and 9 activities, shown in Figure 1b. The corresponding sample event log describing the execution of the process model is shown in Table 1. Here the traces match the model perfectly, which is not the case for real-life process models. The complete log file of the running example can be found at Process Mining @ NITK⁴. The experimental results are obtained using the CoSeLoG Project⁵.

² http://dx.doi.org/10.4121/uuid:26aba40d-8b2d-435b-b5af-6d4bfbd7a270
³ http://dx.doi.org/10.4121/uuid:26aba40d-8b2d-435b-b5af-6d4bfbd7a270

2.1 Proposed Problem

Consider the online process shown in Figure 1a, where the dotted line shows the path of execution. The subscripted value at each activity is its position in the sequence of occurrence (A1 → B2 → C3). At activity C3, a decision has to be taken about which activity to perform next, either D or E. yMiner identifies the NPA and recommends the suitable resource for performing it.

Fig. 1: Running Example. (a) Illustration of Online Process Model, marking the Next Probable Activity after C3. (b) Process Models: four variants of the interview process (registration (A), validity check (B), document check (C), information check (D), decide (E), interview (I), group discussion (G), result (H) and re-initiate (F)).

3 AlfyMiner (yMiner)

yMiner is intended to identify and predict the optimal path of execution, along with the complete process model, for a real-time process. On identifying the currently executing activity Ai, yMiner recommends the optimal path of completion and the best suitable process model by matching the partial trace with the event logs of the same variant stored in the process model repository. Once the matching variants are identified, the optimal process models are found by running the process model comparator yComp on the partial trace. The Next Probable Activity is recommended by selecting NPA(Ai) in the identified process model. Algorithm 1 gives the execution steps of yMiner.

⁴ http://processminingnitk.blogspot.in/2015/03/best-resourcerecommendation-for.html
⁵ http://dx.doi.org/10.4121/uuid:26aba40d-8b2d-435b-b5af-6d4bfbd7a270

Table 1: (a) Event Log of Process Model 1

(b) Event Log of Process Model 2. Each cell in a trace shows the activity, the resource that performed it (superscripted) and the time of occurrence of that activity (subscripted); the Duration column gives the duration of each trace.

Algorithm 1: yMiner
Input: Partial real-time trace
Output: NPA & process model
1 Develop the process model repository;
2 repeat
3   MatchVar ← Call MatchVariant(Ai);
4   Set(NPA) ← yComp(MatchVar);
5   InOutBinding(C-Net);
6 until for each currently executing activity Ai

3.1 Process Model: Causal Net

yMiner uses the Causal Net (C-Net) notation to represent the process model. A C-Net is a six-tuple {A, D, ai, ao, I, O} representation of a process model, where A is the set of activities, D the set of dependencies, ai the set of start activities, ao the set of output (end) activities, I the set of input bindings of the activities and O the set of output bindings.

The set of C-Nets for all four process models of the running example is shown in Figure 2. The repository of process models is maintained for analysing process behaviour.

Process Model 1
A = {A, B, C, D, E, G, H}
D = {(A,B), (B,C), (C,D), (C,E), (D,G), (G,H), (E,H)}
ai = {A}
ao = {H}
I = {I(A):{Null}, I(B):A, I(C):B, I(D):C, I(E):C, I(G):D, I(H):{G,E}}
O = {O(A):B, O(B):C, O(C):{D,E}, O(D):G, O(G):H, O(E):H, O(H):{Null}}

Process Model 2
A = {A, B, C, D, E, G, I, H}
D = {(A,B), (B,C), (C,D), (C,E), (D,G), (E,I), (G,H), (I,H)}
ai = {A}
ao = {H}
I = {I(A):{Null}, I(B):A, I(C):B, I(D):C, I(E):C, I(G):D, I(I):E, I(H):{G,I}}
O = {O(A):B, O(B):C, O(C):{D,E}, O(D):G, O(E):I, O(G):H, O(I):H, O(H):{Null}}

Process Model 3
A = {A, B, C, D, E, F, G, I, H}
D = {(A,B), (B,C), (B,D), (C,E), (D,E), (E,F), (E,I), (E,G), (F,B), (I,H), (G,H)}
ai = {A}
ao = {H}
I = {I(A):{Null}, I(B):{A,F}, I(C):B, I(D):B, I(E):{C,D}, I(F):E, I(I):E, I(G):E, I(H):{I,G}}
O = {O(A):B, O(B):{C,D}, O(C):E, O(D):E, O(E):{I,G,F}, O(F):B, O(I):H, O(G):H, O(H):{Null}}

Process Model 4
A = {A, B, C, D, E, F, G, I, H}
D = {(A,B), (A,C), (A,D), (B,E), (C,E), (D,E), (E,F), (F,B), (F,C), (F,D), (E,G), (E,I), (G,H), (I,H)}
ai = {A}
ao = {H}
I = {I(A):{Null}, I(B):{A,F}, I(C):{A,F}, I(D):{A,F}, I(E):{B,C,D}, I(F):E, I(G):E, I(I):E, I(H):{G,I}}
O = {O(A):{B,C,D}, O(B):E, O(C):E, O(D):E, O(E):{F,G,I}, O(F):{B,C,D}, O(I):H, O(G):H, O(H):{Null}}

Fig. 2: C-Net Representation of the process models in Figure 1b
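To make the six-tuple concrete, the following Python sketch encodes Process Model 3 from Figure 2 and checks that D, I and O describe the same arcs. It is an illustration under stated assumptions, not the authors' implementation: the class name `CNet` and the method `is_consistent` are hypothetical, each binding is stored as one flat set of activities exactly as written in Figure 2 (a full C-Net stores sets of binding sets), and {Null} is written as an empty set.

```python
from dataclasses import dataclass, field


@dataclass
class CNet:
    """Six-tuple C-Net {A, D, ai, ao, I, O}.

    Assumption: each input/output binding is a flat set of activities,
    as written in Fig. 2, not a set of binding sets.
    """
    A: set
    D: set                                   # dependencies as (source, target) pairs
    ai: set                                  # start activities
    ao: set                                  # output (end) activities
    I: dict = field(default_factory=dict)    # activity -> input activities
    O: dict = field(default_factory=dict)    # activity -> output activities

    def is_consistent(self) -> bool:
        """True when D, I and O all describe exactly the same arcs."""
        from_o = {(a, b) for a, outs in self.O.items() for b in outs}
        from_i = {(a, b) for b, ins in self.I.items() for a in ins}
        return from_o == self.D == from_i


# Process Model 3 from Fig. 2 ({Null} bindings written as empty sets).
pm3 = CNet(
    A=set("ABCDEFGIH"),
    D={("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"),
       ("E", "F"), ("E", "I"), ("E", "G"), ("F", "B"), ("I", "H"), ("G", "H")},
    ai={"A"},
    ao={"H"},
    I={"A": set(), "B": {"A", "F"}, "C": {"B"}, "D": {"B"}, "E": {"C", "D"},
       "F": {"E"}, "I": {"E"}, "G": {"E"}, "H": {"I", "G"}},
    O={"A": {"B"}, "B": {"C", "D"}, "C": {"E"}, "D": {"E"},
       "E": {"I", "G", "F"}, "F": {"B"}, "I": {"H"}, "G": {"H"}, "H": set()},
)
print(pm3.is_consistent())  # True
```

A consistency check of this kind is a cheap guard when C-Nets are stored in a repository and later compared, because a missing arc in one of the three components would otherwise distort every metric computed from them.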

3.2 Matching Variants with Path Detector

When an online process is being executed, identifying the variant to which the currently executing trace belongs is a challenge for yMiner. Algorithm 2 (VariantMatch) identifies the path of execution along with the set of possible NPAs. VariantMatch uses a linked list with two kinds of node, Cell Node and Variant Node, each represented as a class. A Cell Node holds {from, to, value, count}, where from and to are activities, value records the occurrences of the sequence from → to, and count = |from → to| is the number of times that sequence appears. A Variant Node holds {*matrix, *prev, *next}: the address of its Cell Nodes and the addresses of the previous and next nodes. The Cell Nodes in Figure 3a store the information of the trace A→B→C→E→F→B→D→E→G→H of process model 2. The value field remains 1 while a sequence appears in the trace for the first time. On identifying a loop, the value field is updated to 2, as shown at the Cell Node with memory address 500 in Figure 3a. The value field is an array: storing the values 1, 2 indicates that the sequence B→C appears for the second time in the trace. The count field counts the appearances of the sequence in the trace. The Variant Node in Figure 3b stores the information of all the variants and is used when comparing the online sequence with the variants: if a variant matches the sequence, it is retained; otherwise it is deleted from the linked list.
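The filtering idea behind VariantMatch can be sketched in Python with dictionaries in place of the linked list. This is an assumption-laden illustration rather than the authors' code: `cell_nodes`, `variant_match` and `possible_npa` are hypothetical names, a `Counter` of (from, to) pairs stands in for the Cell Nodes, and a variant is kept when it contains every sequence of the partial trace at least as often as the partial trace does, which is one reading of the count check in Algorithm 2. The three variant traces are illustrative, read from the process models and the Cell Node trace described above.

```python
from collections import Counter


def cell_nodes(trace):
    """Count each (from, to) sequence in a trace (the Cell Node 'count' field)."""
    return Counter(zip(trace, trace[1:]))


def variant_match(variants, partial):
    """Keep the variants whose Cell Nodes cover every sequence of `partial`.

    A variant is dropped, like deleting a Variant Node from the linked list,
    when a sequence of the partial trace is missing from it or occurs more
    often in the partial trace than in the variant.
    """
    needed = cell_nodes(partial)
    matched = {}
    for name, trace in variants.items():
        cells = cell_nodes(trace)
        if all(cells[pair] >= n for pair, n in needed.items()):
            matched[name] = trace
    return matched


def possible_npa(matched, current):
    """Activities that directly follow `current` in any matched variant."""
    return {b for trace in matched.values()
            for a, b in zip(trace, trace[1:]) if a == current}


variants = {
    "V1": list("ABCDGH"),
    "V2": list("ABCEIH"),
    "V3": list("ABCEFBDEGH"),
}
matched = variant_match(variants, list("ABCE"))
print(sorted(matched))                     # ['V2', 'V3']
print(sorted(possible_npa(matched, "E")))  # ['F', 'G', 'I']
```

Using counts rather than plain membership is what lets the sketch reject a variant when the partial trace has already looped more often than that variant allows.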

Fig. 3: (a) Cell Nodes (first, second and third) for the trace of process model 2; for example, the first Cell Node, at address 050, holds From A, To B, Value 1, Count 1, *Next 100. (b) Structure of Variant Node for the set of Cell Nodes of process model 2.

Algorithm 2: Matching the Variants: VariantMatch()
Input: Online process
Output: Matching matrix
1 MatchVariant(): struct variantNode *gvn, *tempvn; (gvn: address of the linked list, i.e. the global Variant Node); let *gvn give the address of the doubly linked list; initialize all counters in cellNode ← 0;
2 repeat
3   *tempvn ← &gvn (get the address of the doubly linked list);
4   repeat
5     *tempcn ← &matrix (get the address of the matrix);
6     find tempcn→from = sequence[i] ∧ tempcn→to = sequence[i+1];
7     if not found then delete the current variantNode from the doubly linked list and go to 5
8     else increment the member variable count;
9       if count == val[count] (current and previous checks are passed) then go to next→variantNode in the doubly linked list and go to step 5
10      else delete the current→variantNode from the doubly linked list and go to 5
11  until *next in the doubly linked list is null
12 until for each activity in the online process
13 The remaining variantNodes in tempvn are all the matched variants for the given sequence.

3.3 Process Model Comparator (yComp)

yComp compares the C-Nets of all the variants in the cross-organization environment based on the following comparison metrics:
1. Process Model Metric: compares the total number of activities, resources, traces and variants.
2. Relation Metric: compares the total number of parallel and serial activities and loops.
3. Complexity Metric: compares the total number of splits and joins.
4. Service Time Metric: compares the queue time for each activity.
5. Fitness Metric: runs a fitness test, recording the time of completion and the number of valid sequences in each event log.

Process Model Metric. The process models are compared based on the number of activities, resources, traces and variants, as shown in Table 2a.

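Relation Metric. yComp found that a model with more parallel relations performs better than one with more serial relations, while increasing the number of loops also increases the execution time consumed. Parallel relations are identified by Equation (4) in Definition 1, and loops by Equation (5).

Definition 1 (Log-based ordering relations). Let A = {a, b, c, d, e} be the set of activities, let L be a simple event log over A, and let a be the i-th and b the (i+1)-th activity of a trace. Then:

$$a >_L b \iff \exists\, \sigma = \langle t_1, t_2, \ldots, t_n \rangle \in L,\ \exists\, i \in \{1, \ldots, n-1\} : t_i = a \wedge t_{i+1} = b \quad \text{(directly follows)} \tag{1}$$

$$a \rightarrow_L b \iff a >_L b \wedge b \not>_L a \quad \text{(causality)} \tag{2}$$

$$a \#_L b \iff a \not>_L b \wedge b \not>_L a \quad \text{(unrelated)} \tag{3}$$

$$a \parallel_L b \iff a >_L b \wedge b >_L a \quad \text{(parallel)} \tag{4}$$

$$a >_L b >_L a \quad \text{(loop)} \tag{5}$$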
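The relations of Definition 1 can be computed directly from a log, which is what the Relation Metric needs in order to count serial (causal) and parallel pairs. The Python sketch below is illustrative: the function names are hypothetical, and reading Equation (5) as "some trace contains a, b, a consecutively" is an assumption, since a loop of length two would otherwise be indistinguishable from the parallel relation of Equation (4).

```python
def directly_follows(log):
    """Eq. (1): pairs (a, b) where b immediately follows a in some trace."""
    return {(a, b) for trace in log for a, b in zip(trace, trace[1:])}


def ordering_relations(log):
    """Eqs. (2) to (4), derived from the directly-follows relation."""
    df = directly_follows(log)
    acts = {x for trace in log for x in trace}
    causal = {(a, b) for a, b in df if (b, a) not in df}        # Eq. (2)
    unrelated = {(a, b) for a in acts for b in acts
                 if (a, b) not in df and (b, a) not in df}      # Eq. (3)
    parallel = {(a, b) for a, b in df if (b, a) in df}          # Eq. (4)
    return causal, unrelated, parallel


def short_loops(log):
    """Assumed reading of Eq. (5): some trace contains ..., a, b, a, ..."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 2)
            if t[i] == t[i + 2] and t[i] != t[i + 1]}


log = [list("ABCD"), list("ACBD")]
causal, unrelated, parallel = ordering_relations(log)
print(sorted(parallel))                  # [('B', 'C'), ('C', 'B')]
print(len(causal), len(parallel) // 2)   # serial pairs vs. unordered parallel pairs
print(short_loops([list("ABCBD")]))      # {('B', 'C')}
```

In the loop trace A, B, C, B, D the pair (B, C) also satisfies Equation (4), which is why the sketch checks loops with the explicit a, b, a pattern instead of reusing the parallel test.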

Complexity Metric. The complexity metric identifies the joins and splits in the process model using the results of the output and input bindings. Consider Figure 1b: for process model 1, the binding O(A) = {B} occurs 85 times; similarly, the split {C, D, E} occurs 20 times, meaning that activity C is followed by both D and E 20 times, and the join {G, E, H} occurs 16 times. Using this information, the complexity metric shown in Table 2c is developed.
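The structural part of this metric, namely which activities are splits and which are joins, follows directly from the bindings. The Python sketch below is a simplified illustration with a hypothetical function name `splits_and_joins`: it treats an activity as a split when its output binding has more than one target and as a join when its input binding has more than one source, using the bindings of Process Model 1 from Figure 2. The frequency counts quoted above (85, 20, 16) come from the event log and are not reproduced here.

```python
def splits_and_joins(inputs, outputs):
    """Splits: activities with more than one output; joins: more than one input."""
    splits = {a: outs for a, outs in outputs.items() if len(outs) > 1}
    joins = {a: ins for a, ins in inputs.items() if len(ins) > 1}
    return splits, joins


# Input and output bindings of Process Model 1 (Fig. 2); {Null} -> empty set.
pm1_inputs = {"A": set(), "B": {"A"}, "C": {"B"}, "D": {"C"}, "E": {"C"},
              "G": {"D"}, "H": {"G", "E"}}
pm1_outputs = {"A": {"B"}, "B": {"C"}, "C": {"D", "E"}, "D": {"G"},
               "G": {"H"}, "E": {"H"}, "H": set()}

splits, joins = splits_and_joins(pm1_inputs, pm1_outputs)
print(sorted(splits), sorted(joins))   # ['C'] ['H']
```

The result matches the split at C (towards D and E) and the join at H (from G and E) used in the example above.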

Service Time Metric. This metric compares the total service time for an activity in each model, which helps identify the model that serves an activity with less service time. The service time of an activity is calculated as

$$\text{ServiceTime}(A_i) = \sum_{c \,\in\, \text{cases}} \text{duration}_c(A_i), \qquad A_i \in A,$$

where A is the set of activities. The sample output, in seconds, is shown in Table 2d.
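A small Python sketch of the summation above, applied per model, is shown next. The event tuples, model names and helper names (`service_time`, `fastest_model`) are hypothetical and only illustrate how the comparison could be computed.

```python
from collections import defaultdict


def service_time(events):
    """Total service time per (model, activity): sum of duration(A_i) over cases."""
    totals = defaultdict(float)
    for model, _case, activity, seconds in events:
        totals[(model, activity)] += seconds
    return dict(totals)


def fastest_model(totals, activity):
    """Model that serves `activity` with the least total service time."""
    candidates = {m: t for (m, a), t in totals.items() if a == activity}
    return min(candidates, key=candidates.get)


# Hypothetical events: (model, case id, activity, duration in seconds).
events = [
    ("PM1", "c1", "C", 120), ("PM1", "c2", "C", 100),
    ("PM2", "c3", "C", 90), ("PM2", "c4", "C", 80),
    ("PM1", "c1", "D", 50),
]
totals = service_time(events)
print(totals[("PM1", "C")], totals[("PM2", "C")])  # 220.0 170.0
print(fastest_model(totals, "C"))                   # PM2
```

Because a plain sum grows with the number of cases, dividing each total by its case count would give a per-case comparison; the sketch keeps the sum, as in the formula.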

Fitness Metric. This metric gives the number of traces that can be successfully run on the model, which helps decide how efficiently the model runs the traces. yComp identifies the model that runs the maximum number of traces in the minimum time; see Table 2e.
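The ranking rule (most fitting traces first, then least time) can be sketched in Python. The replay here is deliberately simplified and is an assumption, not the paper's fitness test: a trace counts as fitting when it starts in ai, ends in ao and every step is a dependency in D, so binding semantics are ignored. The helper names and the log with its durations are hypothetical; the dependency sets are those of Process Models 1 and 3 in Figure 2.

```python
def fits(trace, deps, start, end):
    """Simplified replay: start in ai, end in ao, every step is a dependency."""
    return (bool(trace) and trace[0] in start and trace[-1] in end
            and all((a, b) in deps for a, b in zip(trace, trace[1:])))


def fitness(log, deps, start, end):
    """(number of fitting traces, total duration of the fitting traces)."""
    fitting = [d for t, d in log if fits(t, deps, start, end)]
    return len(fitting), sum(fitting)


def best_model(models, log):
    """Most fitting traces first; ties broken by the least total time."""
    def rank(name):
        n, total = fitness(log, *models[name])
        return (-n, total)
    return min(models, key=rank)


# Dependency sets D, ai and ao of Process Models 1 and 3 from Fig. 2.
models = {
    "PM1": ({("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"), ("D", "G"),
             ("G", "H"), ("E", "H")}, {"A"}, {"H"}),
    "PM3": ({("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"),
             ("E", "F"), ("E", "I"), ("E", "G"), ("F", "B"), ("I", "H"),
             ("G", "H")}, {"A"}, {"H"}),
}
# Hypothetical log: (trace, duration).
log = [(list("ABCDGH"), 10), (list("ABCEH"), 20), (list("ABCEGH"), 30)]
print(fitness(log, *models["PM1"]))  # (2, 30)
print(best_model(models, log))       # PM1
```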

3.4 Binding Relation

On identifying the variants that follow the partial trace, the NPA of the currently executing activity Ai is identified using the binding relation, which binds the incoming and outgoing activities of Ai. Algorithm 3 explains the binding relation: for each trace in a case, if an activity a is followed by b, then a.Outbound b ∧ b.Inbound a, i.e., a has an outbound relationship with b and b has an inbound relationship with a.

Table 2: (a) Process Model Metric

Algorithm 3: InOutBinding()
Input: Ai, RTrace
Output: Ai.InputBinding; Ai.OutputBinding
1 InOutBinding()
2 repeat
3   if |a >_L b| > 0 then
4     a.Outbound b ∧ b.Inbound a
5   where $|a >_L b| = \sum_{\sigma \in L} L(\sigma) \cdot \left|\{\, 1 \le i < |\sigma| : \sigma(i) = a \wedge \sigma(i+1) = b \,\}\right|$ [see [7]]
6 until for each sequence in each trace in event log L
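Algorithm 3 can be sketched in Python over a multiset event log, where each trace maps to its frequency L(σ). The names are hypothetical, and choosing the most frequent outbound activity as the NPA is an assumed selection rule used only for illustration; the paper combines the bindings with yComp's model comparison.

```python
from collections import Counter, defaultdict


def df_counts(log):
    """|a >_L b|: sum of L(sigma) times the occurrences of (a, b) in sigma."""
    counts = Counter()
    for trace, freq in log.items():
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += freq
    return counts


def in_out_binding(log):
    """Algorithm 3: if |a >_L b| > 0 then a.Outbound b and b.Inbound a."""
    inbound, outbound = defaultdict(set), defaultdict(set)
    for (a, b), n in df_counts(log).items():
        if n > 0:
            outbound[a].add(b)
            inbound[b].add(a)
    return inbound, outbound


def next_probable_activity(log, current):
    """Hypothetical rule: the most frequent outbound activity of `current`."""
    counts = df_counts(log)
    successors = {b: n for (a, b), n in counts.items() if a == current}
    return max(successors, key=successors.get) if successors else None


# Hypothetical multiset event log: trace -> frequency L(sigma).
log = Counter({("A", "B", "C", "D", "G", "H"): 20, ("A", "B", "C", "E", "H"): 5})
inbound, outbound = in_out_binding(log)
print(sorted(outbound["C"]), sorted(inbound["H"]))  # ['D', 'E'] ['E', 'G']
print(next_probable_activity(log, "C"))             # D
```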

4 Resource Behaviour Analyser (RBAMiner)

Having discovered the suitable process model and its NPA, yMiner identifies the resources preferable for performing the NPA. The set of such resources is identified using Activity/Resourcerep [3]. RBAMiner analyses their behaviour and recommends the suitable resource for performing the NPA. The behaviour of the resources is analysed on three parameters, performance, load and queue: a polynomial regression model is used for load and performance (Section 4.2), and the Average Servicing Time at a resource is found using a queueing model (Section 4.3). Algorithm 4 explains the resource behaviour analysis. yMiner identifies the list of resources performing an activity in the entire process log, along with the time they consumed in performing that activity. Table 3 gives a representational view of the list of resources performing an activity in process model 1, along with the time consumed.
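The resource list described above (in the spirit of Table 3) can be built with a nested mapping from activity to resource to observed durations. The Python sketch below is hypothetical in its names and data; skipping 0 ms events follows the assumption, stated in Section 5, that such activities were never executed.

```python
from collections import defaultdict


def resource_table(events):
    """activity -> resource -> list of durations (ms), skipping 0 ms events."""
    table = defaultdict(lambda: defaultdict(list))
    for activity, resource, duration_ms in events:
        if duration_ms > 0:   # 0 ms is treated as "never executed"
            table[activity][resource].append(duration_ms)
    return table


def candidate_resources(table, npa):
    """Resources that have performed the NPA at least once."""
    return sorted(table.get(npa, {}))


# Hypothetical events: (activity, resource, duration in ms).
events = [("D", "r1", 1200), ("D", "r2", 0), ("D", "r3", 900), ("E", "r2", 400)]
table = resource_table(events)
print(candidate_resources(table, "D"))  # ['r1', 'r3']
```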

Algorithm 4: RBAMiner
1 RBA(NPA)()
Input: NPA & BestResActivity
Output: RecommendationOfRes(NPA)
2 repeat
3   Load(Res(NPA)) ← Poly.Load(Load(Res(NPA))); [see Algorithm 5]
4   Perf(Res(NPA)) ← Poly.Perf(Res(NPA)); [see Algorithm 5]
5   AvgWaitingTime(Res(NPA)) ← Queue(Res(NPA)); [see Algorithm 6]
6 until (for each resource of NPA in the BestResActivity table)
7 Recommend the resource with the optimal load, performance and waiting time

Resource load & performance analyser

The Yerkes-Dodson Law of Arousal, also known as Arousal Theory, states that increasing arousal can improve a worker's performance; however, if the level of arousal increases too much, performance decreases (Figure 4a) [9]. RBAMiner identifies the level of arousal, i.e., the optimal load (the maximum load the resource can handle efficiently), along with its performance, using a polynomial regression model. Performance is the ratio of the total time taken to the load. The performance was analysed by increasing the load and observing the time taken: as the load increased, the time consumed per unit of load decreased, but at some point there was a drift and the time consumption started increasing. That drift point is known as the arousal point (the optimal load and performance of the resource). Algorithm 5 identifies the load ℓ and the performance [Total time / ℓ] per resource per unit time.
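The regression step can be sketched in pure Python by building the 3 × 3 normal-equation matrix from power sums of the load and solving A c = B. This is an illustration under assumptions: the helper names are hypothetical, the data is made up, and the arousal point is taken as the vertex of the fitted curve, which is the minimum of total time per unit load when the leading coefficient is positive.

```python
def fit_quadratic(loads, perf):
    """Least-squares fit perf ~ c0 + c1*x + c2*x**2 via the normal equations A c = B."""
    s = [sum(x ** k for x in loads) for k in range(5)]          # s[0] = n
    b = [sum(y * x ** k for x, y in zip(loads, perf)) for k in range(3)]
    # Augmented matrix [A | B], with A built from the power sums of the load.
    m = [[s[0], s[1], s[2], b[0]],
         [s[1], s[2], s[3], b[1]],
         [s[2], s[3], s[4], b[2]]]
    # Gaussian elimination with partial pivoting solves A c = B (c = A^-1 B).
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                                          # back substitution
        rest = sum(m[r][k] * coef[k] for k in range(r + 1, 3))
        coef[r] = (m[r][3] - rest) / m[r][r]
    return coef


def optimal_load(coef):
    """Vertex of the fitted curve; a minimum only when c2 > 0."""
    _, c1, c2 = coef
    return -c1 / (2 * c2) if c2 > 0 else None


# Hypothetical monthly data: load l and performance (total time / l).
loads = [1, 2, 3, 4, 5]
perf = [6, 3, 2, 3, 6]
coef = fit_quadratic(loads, perf)
print([round(c, 6) for c in coef])     # [11.0, -6.0, 1.0]
print(round(optimal_load(coef), 6))    # 3.0
```

Solving the system directly is numerically preferable to forming the inverse of A explicitly, although both give c = A⁻¹B.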

RBAMiner first filters out the unperformed load (an activity recorded with a duration of 0 ms) and the residual load (activities with exceptional durations). Then the actual load (ℓ) and the average service time of each worker in each month are identified. The polynomial regression model [5] is applied to this cleaned data. Since RBAMiner fits a second-degree regression model, it initializes a 3 × 3 matrix A and a 3 × 1 matrix B, as shown in Figures 4b and 4c. The coefficients of the polynomial equation are then obtained by multiplying the inverse of A with B, c = A⁻¹B. Evaluating the equation over the load gives the polynomial (power) curve shown in the figure. Analysing this curve and applying the Yerkes-Dodson Law identifies the optimal load and optimal performance of a resource for each month.

$$A = \begin{bmatrix} n & \sum_{1}^{n} \ell & \sum_{1}^{n} \ell^{2} \\ \sum_{1}^{n} \ell & \sum_{1}^{n} \ell^{2} & \sum_{1}^{n} \ell^{3} \\ \sum_{1}^{n} \ell^{2} & \sum_{1}^{n} \ell^{3} & \sum_{1}^{n} \ell^{4} \end{bmatrix}, \qquad B = \begin{bmatrix} \sum_{1}^{n} y \\ \sum_{1}^{n} \ell\, y \\ \sum_{1}^{n} \ell^{2} y \end{bmatrix}$$

where n is the number of observations and y is the observed performance for load ℓ.

Along with identifying the load and performance of the resources preferable for performing the NPA, RBAMiner also finds the Activity Servicing Time (i.e., the average waiting time for an activity to be served by a resource) before that resource is recommended. Since the interest is in finding the queue at each resource, RBAMiner uses the single-server models (M/M/1):(GD/∞/∞) and (M/M/1):(GD/N/∞). Here the notation (M/M/1):(GD/∞/∞) describes (Arrival / Departure / Server):(Queue discipline / Max number in queue / Source of calling).

Arrival (λ) is the rate at which activities arrive at each resource, and Departure (μ) is the rate at which the arrived activities are served. Since RBAMiner aims to identify the average waiting time at each resource, the single-server model is applied. When the data was analysed for First Come First Serve (FCFS), Last Come First Serve (LCFS) and Service in Random Order (SIRO), the arrival of activities was found to follow the General Discipline (GD) as its queue discipline. As the number in the queue and the source of calling are not defined, RBAMiner treats them as infinite. The average waiting time in the system, Ws, is identified using Equations (6) to (9). Algorithm 6 (Activity Servicing Time) starts by identifying the arrival rate and the servicing rate at each resource.

The λn and μn of the generalized model are shown in Equation (6). The traffic intensity ρ, relating the number of activities arriving to the number served per unit time, is shown in Equation (7). Hence the average number of activities in the system, Ls, is given by Equation (9), and the average waiting time in the system, Ws, by Equation (8).

$$\lambda_n = \lambda, \qquad \mu_n = \mu, \qquad n = 0, 1, 2, \ldots \tag{6}$$

$$\rho = \frac{\lambda}{\mu} \tag{7}$$

$$W_s = \frac{L_s}{\lambda} \tag{8}$$

$$L_s = \frac{\rho}{1 - \rho} \tag{9}$$

Algorithm 6: To Discover the Activity Servicing Time

Input: Set of resources R, trace T, duration of service @
Output: Arrival λ, Service μ, Traffic ρ, Ls, Ws
1 Let Arrival λ ← load ℓ discovered per resource per month; Service μ ← service rate of ℓ; No. of days in month;
2 if ((T.Date) × 24 hrs × 60 sec) ≥ T.@ then the event is executed in the same month: R[FilteredYearMonth] ← R[FilteredYearMonth] + 1;
3 else ϑ = ⌈(T.@ − ((T.Date) × 24 hrs × 60 sec)) / (24 hrs × 60 sec)⌉;
    R[FilteredYearMonth + ϑ] ← R[FilteredYearMonth + ϑ] + 1;
4 Compute the Average Servicing Time in the system using Equations (6) to (9)
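Equations (6) to (9) reduce to a few lines once the monthly arrival and service rates are known. The Python sketch below is illustrative: the helper names are hypothetical, the arrival count simply groups events by resource and calendar month (it does not reproduce the month-overflow rule of Algorithm 6), and the rates in the usage line are made up. The steady-state formulas require λ < μ.

```python
from collections import Counter
from datetime import date


def monthly_arrivals(events):
    """Arrivals per (resource, year, month); events are (resource, date) pairs."""
    return Counter((res, d.year, d.month) for res, d in events)


def mm1_measures(lam, mu):
    """(M/M/1):(GD/inf/inf) steady-state measures, Equations (6) to (9)."""
    if not 0 < lam < mu:
        raise ValueError("steady state requires 0 < lambda < mu")
    rho = lam / mu           # Eq. (7): traffic intensity
    ls = rho / (1 - rho)     # Eq. (9): expected number in the system
    ws = ls / lam            # Eq. (8): expected time in the system
    return rho, ls, ws


arrivals = monthly_arrivals([("r1", date(2014, 5, 2)), ("r1", date(2014, 5, 19)),
                             ("r1", date(2014, 6, 3))])
print(arrivals[("r1", 2014, 5)])                   # 2
rho, ls, ws = mm1_measures(lam=4.0, mu=5.0)        # e.g. activities per day
print(round(rho, 3), round(ls, 3), round(ws, 3))   # 0.8 4.0 1.0
```

The guard on λ < μ matters in practice: a resource whose arrivals outpace its service rate has no steady-state waiting time, and such a resource should not be recommended on the basis of these formulas.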

5 Experimental Analysis and Result

The yMiner algorithm is evaluated by running it on the CoSeLoG Project⁶. The experiments ExpNPA, ExpAST and ExpL&P were performed on CoSeLoG Municipality 2, which contains 645 cases and 376 activities. Experiments were conducted and analysed on sets of 100 cases each. yMiner makes four assumptions: any activity whose duration is recorded as 0 milliseconds is considered never to have been executed, since nanosecond time is not recorded; the vocabulary of an activity is not taken into account [1]; live-locks and deadlocks are not dealt with; and all processes are assumed to have the same starting activity.

5.1 Design of Experiment

The yMiner experimental setup is shown in Figure 5. The log is first cleaned and initialized using the initializer, from which the NPA is identified. The optimal resource for performing the NPA is identified and its behaviour is analysed. Finally, yMiner recommends the best process and resource model.

Fig. 5: Illustration of Online Process Model

5.2 Recommendation of Next Probable Activity (NPA): ExpNPA

The experiment was simulated in the form of supervised learning, where ExpNPA was conducted for every 100 cases, starting from the 2nd activity of the sequence. ExpNPA was analysed by comparing the recommendation with the actual path of execution. The result of this comparison is shown in Figure 6; the error rate of the recommendation (marked by the green line) is lower at later positions of execution than at earlier positions. ExpNPA achieved an efficiency of 72.8568%. The graph shows that the recommended path always stays below the actual path of execution, and the inclination shows the large difference in behaviour between the actual and recommended paths. For cases 400 to 500, the graph has no red line, because the path of execution was critical and was observed to take optimal time for completion. This shows that yMiner does not make a recommendation when the path of execution is observed to be optimal.
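The ExpNPA comparison of recommended against actual next activities, together with a per-position error rate, can be sketched in Python as below. The predictor is a stand-in (a fixed most-frequent-successor table) and all names and traces are hypothetical; the sketch only shows how accuracy and position-wise error could be tallied when prediction starts after the 2nd activity.

```python
from collections import defaultdict


def npa_accuracy(cases, predict):
    """Overall NPA accuracy and error rate per position of the current activity.

    Prediction starts once the 2nd activity has been executed, as in ExpNPA.
    """
    hits = total = 0
    errors, seen = defaultdict(int), defaultdict(int)
    for trace in cases:
        for pos in range(2, len(trace)):      # pos: 1-based position of current activity
            correct = predict(trace[:pos]) == trace[pos]
            hits += int(correct)
            total += 1
            seen[pos] += 1
            errors[pos] += int(not correct)
    rates = {p: errors[p] / seen[p] for p in sorted(seen)}
    return (hits / total if total else 0.0), rates


# Hypothetical predictor: most frequent successor of the current activity.
successor = {"A": "B", "B": "C", "C": "D", "D": "G", "E": "H", "G": "H"}
cases = [list("ABCDGH"), list("ABCEH")]
accuracy, rates = npa_accuracy(cases, lambda prefix: successor.get(prefix[-1]))
print(round(accuracy, 4))   # 0.8571
print(rates)                # {2: 0.0, 3: 0.5, 4: 0.0, 5: 0.0}
```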

5.3 Recommendation of Resource Capable of Performing NPA: ExpAST

ExpAST computes the average service time for each resource performing the NPA. The waiting time of the recommended resource was compared with that of the actual resource, and their performance was found to improve by 59.7303%. Figure 7 shows the result of ExpAST. ExpAST discovered a better path of execution based on the resources' average service time, and it also shows that yMiner does not make a recommendation if the resource to whom the task is assigned is efficient in performing it. The result of ExpL&P is shown in Table 5, and Figure 8 shows the polynomial curve. Using the law of arousal, the optimal load and performance at each resource can be identified. This outcome of the experiment is used in making appropriate decisions about resource behaviour and load, so that proper recommendations can be made on whether or not to assign the task to that resource.

yMiner provided a solution for recommending an optimal path of execution (the NPA) along with the complete process model and the resource preferable for performing the NPA. yMiner is an analytical tool that provides a solution for real-time business process execution by analysing process and resource behaviour. The experimental results show 72% optimization in process execution and a 59% improvement in the behaviour of resources based on their average waiting time, load and performance. yMiner was successful in recommending appropriate process and resource models for the real-time process.

1. Joos C. A. M. Buijs, Boudewijn F. van Dongen, and Wil M. P. van der Aalst. Towards cross-organizational process mining in collections of process models and their executions. In Business Process Management Workshops. Springer, 2012.

2. Justus Klingemann, Jurgen Wasch, and Karl Aberer. Deriving service models in cross-organizational workflows. In Research Issues on Data Engineering: Information Technology for Virtual Enterprises, 1999. RIDE-VE '99. Proceedings, Ninth International Workshop on, pages 100–107. IEEE, 1999.

3. Dimitrios Georgakopoulos, Mark Hornick, and Amit Sheth. An overview of workflow management: From process modeling to workflow automation infrastructure. Distributed and Parallel Databases, 3(2):119–153, 1995.

4. Gustavo Alonso, Divyakant Agrawal, Amr El Abbadi, and Carl Mohan. Functionality and limitations of current workflow management systems. IEEE Expert, 12(5):105–111, 1997.

5. Asuman Dogac. Workflow management systems and interoperability. Number 164. Springer Science & Business Media, 1998.

6. Andrzej Cichocki. Workflow and process automation: concepts and technology. Springer Science & Business Media, 1998.

7. Wil Van Der Aalst. Process mining: discovery, conformance and enhancement of business processes. Springer Science & Business Media, 2011.

8. J.C.A.M. Buijs. Environmental permit application process (WABO), CoSeLoG project, 2014.

9. Joyce Nakatumba and Wil M. P. van der Aalst. Analyzing resource behavior using process mining. In Business Process Management Workshops, pages 69–80. Springer, 2010.