=Paper=
{{Paper
|id=Vol-1592/paper10
|storemode=property
|title=Calculating the Number of Unique Paths in a Block-Structured Process Model
|pdfUrl=https://ceur-ws.org/Vol-1592/paper10.pdf
|volume=Vol-1592
|authors=Gert Janssenswillen,Benoît Depaire,Toon Jouck
|dblpUrl=https://dblp.org/rec/conf/apn/JanssenswillenD16
}}
==Calculating the Number of Unique Paths in a Block-Structured Process Model==
Calculating the Number of Unique Paths in a Block-Structured Process Model Gert Janssenswillen1,2 , Benoît Depaire1 , and Toon Jouck1 Hasselt University, Agoralaan Bldg D, 3590 Diepenbeek, Belgium Research Foundation Flanders (FWO), Egmontstraat 5, 1060 Brussels, Belgium gert.janssenswillen, benoit.depaire, toon.jouck@uhasselt.be Abstract. Estimating the number of execution paths in a process model is a non-trivial task as one runs quickly into an combinatorial explosion of possible paths. This paper introduces a new algorithm to calculate the number of different execution paths for finite-behavior block-structured models in a computationally efficient way. Block functions are defined for the workflow constructs sequence, parallel, exclusive choice and finite loops, such that the amount of behavior in each block-construct can be computed efficiently. Subsequently, the block-structuredness of the model is exploited to efficiently calculate the number of unique paths in the model. The algorithm has been implemented for process trees, although the translation to other modeling notations is straightforward. An empirical analysis showed that the run-time of the algorithm is very low, and only slightly impacted by the complexity of the model. Keywords: Process Modeling, Process Mining, Process Trees, Process Model Complexity 1 Introduction In the field of process mining and process modeling, there are certain situations where it is desirable to quantify the amount of behavior of a process model. For example, if one wants to quantify the variance of behavior present in a process model. Related to this, the precision measure for discovered process models, expresses to what extent the behavior in the model does not exceed the behavior in the log [3]. Similarly, the implicit realism measure [4] uses the number of unique paths in a model to calculate the probability that a certain amount of behavior from the model did not show up in the log. The amount of behavior in a model can moreover be used as a proxy for model complexity. As it can be computationally hard to compute the amount of behavior, several metrics to calculate model complexity use proxies instead. [6]. Determining the amount of behavior in a process model, which we quantify in this paper as the number of unique (execution) paths, is a challenging task. One could naively traverse the process model recursively and count the number of unique paths, but this quickly becomes computationally infeasible due to a combinatorial explosion of different (parallel) paths. 138 The main idea presented in this paper is to compute the number of unique paths in a block-structured finite-behavior process model in a more intelligent and computationally efficient way. As we will show, this is possible by exploiting the block-structuredness of the model. In this paper, we make the following contributions: – We construct a block function, which calculates the number of unique paths, for each of the following process constructs: sequence (→), exclusive choice (×), parallelism (∧) and structured finite loops (k ). – We develop a generic approach to determine the total amount of behavior in a block-structured finite-behavior process model. – We provide an implementation for process trees. Section 2 describes the general approach used by the algorithm, while in Section 3 the implementation is elaborated upon. The performance of the tech- nique in terms of run-time is discussed in Section 4 while Section 5 concludes the paper. 2 General approach In this section, the formal approach of the calculation will be described. First, some assumptions will be make regarding the types of models taken into account. In the subsequent paragraphs, different block functions for each of the specific operator type will be defined. Finally, some limitations to the formal approach will be pointed out, together with work-arounds in order to solve them. 2.1 Assumptions and used notations It is important to keep in mind that we impose two restrictions on the pro- cess models. Firstly, we assume finite-behavior models, since it would otherwise make no sense to determine the number of unique traces. Consequently, loops in our models have a maximum number of repetitions. While this appears very restrictive, this can be justified by accepting a so-called fairness assumption, which states that a task of a process cannot be postponed indefinitely. This assumption therefore rules out infinite behaviors that are considered unrealis- tic [1]. Secondly, we assume that models are block-structured, i.e. they can be decomposed in properly nested subprocesses. For the development and discussion of our approach, we will use the process tree notation, because process trees are block-structured by definition. However, the ideas in this paper are applicable to other notation languages as long as the models are block-structured and finite in behavior. We formally define a finite-behavior Process Tree, which is largely based on the definition in [2], as follows: Definition 1. Let A be the activity alphabet and A ⊆ A be a finite set of activ- ities,then P T = (N, r, m, c) is a process tree such that: 139 – N is a non-empty finite set of nodes consisting of operator (NO ) and leaf nodes (NL ) such that: NO ∩ NL = ∅ – r ∈ NO is the root node of the tree – O = {→, ×, ∧, k , ∨}, the set of operator types. – m : N → A ∪ O is a mapping function mapping each node to an operator or activity: a ∈ A ∪ {τ }, if n ∈ NL . m(n) = o ∈ O, if n ∈ NO . – c : N → N ∗ is the direct-child-relation function: c(n) = if n ∈ NL c(n) ∈ N + if n ∈ NO such that • each node except the root node has exactly one parent: ∀n ∈ N \{r} : ∃p ∈ NO : n ∈ c(p) ∧ q ∈ NO : p = q ∧ n ∈ c(q); • the root node has no parent: n ∈ N : r ∈ c(n); • each node appears only once in the list of children of its parent: ∀n ∈ N : ∀1≤i