<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Error Metrics for Business Process Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Mendling</string-name>
          <email>jan.mendling@wu-wien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gustaf Neumann</string-name>
          <email>neumann@wu-wien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vienna University of Economics and Business Administration</institution>
          ,
          <addr-line>Augasse 2-6, 1090 Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <fpage>53</fpage>
      <lpage>56</lpage>
      <abstract>
        <p>Little research has been conducted so far on the causes of errors in business process models. In this paper we investigate how mainly domain-independent factors such as the size or complexity of models influence the errors observed in a wide range of existing business process models. In particular, we provide a set of six metrics presumably related to the comprehensibility of both the process model structure and the process state space, and discuss their capability to predict errors in the SAP reference model. The results show that the three metrics size, separability, and structuredness already suffice to achieve a high Nagelkerke R<sup>2</sup> value of 0.853, demonstrating good predictive efficacy.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Up to now there has been little research on why people introduce errors in
business process models in practice. In a more general context, Simon [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] points to
the limited cognitive capabilities of humans and concludes that they act rationally only
to a limited extent. Applied to modeling errors, this argument implies that
human modelers lose track of the interrelations in large and complex models
due to their limited cognitive capabilities, and then introduce errors that they
would not make in a small model. A recent study provides first evidence for
this hypothesis [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Before we can test such a hypothesis appropriately, we have
to establish an understanding of which determinants drive error probability of
process models and how we can measure them (cf. e.g. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]).
      </p>
      <p>
        In this context, the contribution of this paper is a newly developed set of
metrics for measuring the error probability of business process models. Beyond
the theoretical foundation of these metrics, we provide a first validation based
on the EPCs of the SAP reference model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. By defining quality concepts like
error probability in a measurable way, we contribute to the understanding of bad
process design in general. Against this background, the remainder of the paper
is structured as follows. In Section 2 we identify the comprehensibility of a process
model’s structure and of its state space as the key determinants of error
probability. For each of the two we define a set of metrics and discuss their theoretical
impact on error probability. Section 3 provides a first evaluation of this set of
metrics for predicting errors in the SAP reference model. Finally, Section 4 gives
a summary and an outlook on future research. For related work, refer to [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Error Determinants for Business Process Models</title>
      <p>
        Following the principles of measurement theory (see e.g. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]), we consider the
comprehensibility of the business process model as the main determinant for
error probability. This is motivated by the assumption that the process models
are constructed by human modelers and that their design is subject to bounded
rationality [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The comprehensibility of any model by a person is determined by
her familiarity with the real-world process and by the way the model elements
are combined to represent the real-world process. In this paper we investigate only
the second aspect. More precisely, we analyze (a) the process model structure
and (b) the process model state space. For both of these determinants, we identify
a set of sub-determinants and discuss their impact on error probability. In the
following, we consider a business process model to be a special kind of graph
G = (N, A) with at least three node types N = T ∪ S ∪ J, i.e., tasks T, splits S, and
joins J, and control flow arcs A ⊆ N × N to connect them. We use the generic
term connectors C = S ∪ J for splits and joins collectively. Each connector has
a label AND, OR, or XOR that gives its routing or merging semantics.
      </p>
      <p>
Size: Several papers point to size as an important factor for the
comprehensibility of software and process models [
        <xref ref-type="bibr" rid="ref3 ref5">5, 3</xref>
        ]. While the size of software is frequently
equated with lines of code, the size of a process model is often related to its
number of nodes |N|. The metric S<sub>N</sub> measures the number
of nodes of the process model graph G. An increase in S<sub>N</sub>(G) should imply an
increase in the error probability of the overall model.
      </p>
      <p>
        <disp-formula><tex-math>S_N(G) = |N|</tex-math></disp-formula>
      </p>
      <p>
        Separability: Separability is closely related to the notion of a cut-vertex (or
articulation point), i.e., a node whose deletion separates the process model into
multiple components. We define the separability ratio Π as the ratio of the number of
cut-vertices to the number of nodes. Cut-vertices can be found using depth-first search.
An increase in Π(G) should imply a decrease in error probability of the model.
        <disp-formula><tex-math>\Pi(G) = \frac{|\{n \in N \mid n \text{ is a cut-vertex}\}|}{|N| - 2}</tex-math></disp-formula>
      </p>
      <p>
        Sequentiality: Sequentiality relates to the fact that sequences of consecutive
tasks are the simplest building blocks of a process model. The sequentiality
ratio Ξ relates the arcs of a sequence to the total number of arcs. An increase in
Ξ(G) should imply a decrease in error probability of the overall model.
        <disp-formula><tex-math>\Xi(G) = \frac{|\{a \in A \mid a \in (T \times T)\}|}{|A|}</tex-math></disp-formula>
      </p>
      <p>
Structuredness: Structuredness relates to how far a process model can be built
by nesting blocks of matching join and split connectors (see e.g. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]). The degree
of structuredness can be determined by applying reduction rules and comparing
the size of the reduced model to the original size. The structuredness ratio Φ
of the process graph is one minus the number of nodes in the reduced process
graph divided by the number of nodes in the original process graph. An increase
in Φ(G) should imply a decrease in error probability of the overall model.
      </p>
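      <p>
        <disp-formula><tex-math>\Phi_N = 1 - \frac{S_N(G')}{S_N(G)}</tex-math></disp-formula>
        Here G' denotes the reduced process graph.
      </p>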
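      <p>As an illustration, the size, separability, and sequentiality ratios defined above could be computed as in the following minimal Python sketch. The graph encoding (a dict of node kinds plus a list of arcs), all function names, and the toy model are assumptions made for this example and are not part of the metric definitions. For brevity, cut-vertices are found here by removing each node in turn and testing connectivity of the undirected graph, rather than by the depth-first search mentioned above.</p>
      <preformat>
```python
from collections import defaultdict


def undirected_adjacency(nodes, arcs):
    """Build an undirected adjacency map; arc direction is ignored for cut-vertices."""
    adj = defaultdict(set)
    for src, dst in arcs:
        adj[src].add(dst)
        adj[dst].add(src)
    return {n: adj[n] for n in nodes}


def is_connected(adj, removed):
    """Check whether all nodes except `removed` are mutually reachable."""
    remaining = [n for n in adj if n != removed]
    if not remaining:
        return True
    seen, stack = {remaining[0]}, [remaining[0]]
    while stack:
        current = stack.pop()
        for neighbour in adj[current]:
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(remaining)


def size(kinds):
    """S_N(G) = |N|."""
    return len(kinds)


def separability(kinds, arcs):
    """Pi(G) = number of cut-vertices / (|N| - 2)."""
    adj = undirected_adjacency(kinds, arcs)
    cut_vertices = [n for n in kinds if not is_connected(adj, n)]
    return len(cut_vertices) / (len(kinds) - 2)


def sequentiality(kinds, arcs):
    """Xi(G) = number of task-to-task arcs / |A|."""
    task_arcs = [(s, d) for s, d in arcs
                 if kinds[s] == "task" and kinds[d] == "task"]
    return len(task_arcs) / len(arcs)


# Hypothetical toy model: t1, t2, an XOR-split s, alternatives t3 and t4,
# an XOR-join j, and a final task t5.
kinds = {"t1": "task", "t2": "task", "s": "split", "t3": "task",
         "t4": "task", "j": "join", "t5": "task"}
arcs = [("t1", "t2"), ("t2", "s"), ("s", "t3"), ("s", "t4"),
        ("t3", "j"), ("t4", "j"), ("j", "t5")]

print(size(kinds), separability(kinds, arcs), sequentiality(kinds, arcs))
```
      </preformat>
      <p>In this toy model the cut-vertices are t2, s, and j, and only the arc from t1 to t2 connects two tasks directly.</p>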
      <p>Cyclicity: Cyclic parts of a model are more difficult to understand than
sequential parts. |N<sub>C</sub>| gives the number of nodes that lie on some cycle, and the cyclicity CYC<sub>N</sub>
relates it to the total number of nodes:
        <disp-formula><tex-math>\mathit{CYC}_N = \frac{|N_C|}{|N|}</tex-math></disp-formula>
An increase in CYC<sub>N</sub>(G) should imply an
increase in error probability of the overall model.</p>
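      <p>The cyclicity ratio can be sketched in Python as follows. The arc-list encoding, the function names, and the toy model with a rework loop are assumptions made for this example. A node is counted as lying on a cycle if it can be reached again from one of its own successors, which is a simple reachability check and not necessarily the procedure used for the evaluation.</p>
      <preformat>
```python
def successors(arcs):
    """Map each node to the set of nodes it has an outgoing arc to."""
    succ = {}
    for src, dst in arcs:
        succ.setdefault(src, set()).add(dst)
        succ.setdefault(dst, set())
    return succ


def on_cycle(node, succ):
    """A node lies on a cycle if it is reachable from one of its successors."""
    seen, stack = set(), list(succ[node])
    while stack:
        current = stack.pop()
        if current == node:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(succ[current])
    return False


def cyclicity(nodes, arcs):
    """CYC_N = |N_C| / |N|, with N_C the nodes that lie on some cycle."""
    succ = successors(arcs)
    cyclic_nodes = [n for n in nodes if n in succ and on_cycle(n, succ)]
    return len(cyclic_nodes) / len(nodes)


# Hypothetical toy model with a rework loop from split s back to join j.
nodes = ["a", "j", "b", "s", "c"]
arcs = [("a", "j"), ("j", "b"), ("b", "s"), ("s", "c"), ("s", "j")]

print(cyclicity(nodes, arcs))
```
      </preformat>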
      <p>Parallelism: Modelers have to keep track of concurrent paths that need to be
synchronized. AND- and OR-splits introduce new threads of control such that
the number of control tokens potentially increases by the output
degree minus one. The Token Split metric TS counts these newly introduced
tokens. An increase in TS(G) should imply an increase in error probability of
the overall model.</p>
      <p>
        <disp-formula><tex-math>TS(G) = \sum_{c \in C_{or} \cup C_{and}} \left( d_{out}(c) - 1 \right)</tex-math></disp-formula>
      </p>
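      <p>A minimal Python sketch of the token split count is given below. It assumes that connectors are given as a dict from connector name to its label and that every connector has at least one outgoing arc; these encodings, the function name, and the toy model are hypothetical. Joins labeled AND or OR have an output degree of one and therefore contribute nothing to the sum, while XOR connectors are skipped.</p>
      <preformat>
```python
def token_split(labels, arcs):
    """TS(G): sum of (out-degree - 1) over all AND- and OR-connectors."""
    out_degree = {}
    for src, _dst in arcs:
        out_degree[src] = out_degree.get(src, 0) + 1
    return sum(out_degree.get(connector, 0) - 1
               for connector, label in labels.items()
               if label in ("AND", "OR"))


# Hypothetical toy model: an AND-split with three branches, the matching
# AND-join, and a subsequent XOR-split that does not add tokens.
labels = {"and_split": "AND", "and_join": "AND", "xor_split": "XOR"}
arcs = [("and_split", "t1"), ("and_split", "t2"), ("and_split", "t3"),
        ("t1", "and_join"), ("t2", "and_join"), ("t3", "and_join"),
        ("and_join", "xor_split"), ("xor_split", "t4"), ("xor_split", "t5")]

print(token_split(labels, arcs))
```
      </preformat>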
    </sec>
    <sec id="sec-3">
      <title>Evaluation of the Metrics</title>
      <p>
        In this section, we utilize the SAP reference model in order to evaluate the
metrics proposed in the previous section for their capability to predict errors.
A relaxed soundness analysis (see [
        <xref ref-type="bibr" rid="ref2 ref7">7, 2</xref>
        ]) of this model collection revealed that
34 of the approximately 600 EPC business process models (see [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) have errors. Since
the dependent variable is binary (error yes/no), we use a logistic regression
(logit) model (see e.g. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]) where a positive coefficient increases and a negative one
decreases error probability.
      </p>
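      <p>To make the sign convention of the logit model concrete, the following Python sketch evaluates the logistic function for a few metric vectors. The intercept and coefficients are hypothetical values chosen only to show the direction of the effects; they are not the coefficients estimated for the SAP reference model.</p>
      <preformat>
```python
import math


def error_probability(intercept, coefficients, metrics):
    """Logit model: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    linear = intercept + sum(coefficients[name] * value
                             for name, value in metrics.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients, chosen only to illustrate the sign convention:
# a positive coefficient for size, a negative one for separability.
intercept = -2.0
coefficients = {"size": 0.05, "separability": -3.0}

small_model = {"size": 10, "separability": 0.6}
large_model = {"size": 60, "separability": 0.6}
separable_model = {"size": 60, "separability": 0.9}

for name, model in [("small", small_model), ("large", large_model),
                    ("separable", separable_model)]:
    print(name, round(error_probability(intercept, coefficients, model), 3))
```
      </preformat>
      <p>With these assumed values, the larger model receives a higher predicted error probability than the small one, and raising separability at constant size lowers it again.</p>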
      <p>Based on the metrics defined above, we calculated multivariate logit
models to predict error probability. The model with three variables was the
largest that satisfied various goodness-of-fit tests: the Hosmer &amp; Lemeshow test
yields 0.216, which is well above 0.05, and all Wald statistics indicate
that the coefficients are significantly different from zero. The Nagelkerke R<sup>2</sup>, as
a coefficient of determination, describes which fraction of the variability is
explained. With an R<sup>2</sup> of 0.853 we achieve a high rate of explanation by the metrics.
Furthermore, it is interesting to note that the three variables of the model
confirm the tendency postulated in the hypotheses, i.e., error probability
increases with size and decreases with higher separability or structuredness.</p>
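      <p>For readers unfamiliar with the Nagelkerke R<sup>2</sup>, the sketch below shows its usual textbook definition: the Cox and Snell pseudo-R<sup>2</sup> rescaled so that a perfectly fitting model reaches 1. The function name and the log-likelihood values are hypothetical and serve only as an illustration; they are not taken from the evaluation reported here.</p>
      <preformat>
```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R^2 from the log-likelihoods of the null and fitted model."""
    # Cox and Snell pseudo-R^2: 1 - (L_null / L_model) ** (2 / n)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Largest value Cox and Snell can reach, attained when L_model = 1
    upper_bound = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / upper_bound


# Hypothetical log-likelihoods for a sample of 600 models.
print(round(nagelkerke_r2(ll_null=-130.0, ll_model=-25.0, n=600), 3))
```
      </preformat>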
    </sec>
    <sec id="sec-4">
      <title>Contribution and Limitations</title>
      <p>
        In this paper we explored to what extent errors in a business process model can be
predicted with the help of suitable metrics. Based on the general hypothesis that
error probability is determined by the comprehensibility of the process model
structure and the process state space, we identified a set of six metrics for
predicting error probability. Each of these metrics is discussed in terms of its motivation,
its calculation, and its theoretical impact on error probability. We used the sample of
the SAP reference model to evaluate the metrics in a multivariate logit model.
The results are statistically significant and show that three variables
(size, separability, and structuredness) already suffice to achieve a high Nagelkerke R<sup>2</sup>
value of 0.853. Furthermore, the hypothesized direction of their impact on error
probability is confirmed. This is a considerable improvement over the
previous analysis, in which count metrics yielded a Nagelkerke R<sup>2</sup> value of less than
0.35 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Still, our evaluation has some limitations that we aim to address in
future research. First, relaxed soundness is not able to find all problematic parts
of a process model. We are currently working on a soundness notion for EPCs
and a respective verification approach in order to get more precise information
about errors in an EPC. Second, there is no research available that discusses
the representativeness of the SAP reference model as a process model collection.
Therefore, future research will have to evaluate the metrics against other samples
in order to establish them as error predictors.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Simon</surname>
            ,
            <given-names>H.A.</given-names>
          </string-name>
          :
          <source>Sciences of the Artificial</source>
          . 3rd edn. The MIT Press (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Mendling</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moser</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verbeek</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dongen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>A Quantitative Analysis of Faulty EPCs in the SAP Reference Model</article-title>
          .
          <source>BPM Center Report BPM-06-08</source>
          , BPMCenter.org (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fenton</surname>
            ,
            <given-names>N.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfleeger</surname>
            ,
            <given-names>S.L.</given-names>
          </string-name>
          :
          <source>Software Metrics. A Rigorous and Practical Approach</source>
          . PWS, Boston (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Keller</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teufel</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <source>SAP(R) R/3 Process Oriented Implementation: Iterative Process Prototyping</source>
          . Addison-Wesley (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mendling</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Error metrics for business process models</article-title>
          .
          <source>Technical Report JM-2006-12-03</source>
          , Vienna Univ. of Econ. and Business Administration (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kiepuszewski</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofstede</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bussler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>On structured workflow modelling</article-title>
          . In
          <string-name>
            <surname>Wangler</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , eds.:
          <source>Advanced Information Systems Engineering</source>
          , 12th International Conference CAiSE
          <year>2000</year>
          . Volume
          <volume>1789</volume>
          of Lecture Notes in Computer Science, Springer (
          <year>2000</year>
          )
          <fpage>431</fpage>
          -
          <lpage>445</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dehnert</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rittgen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Relaxed Soundness of Business Processes</article-title>
          . In
          <string-name>
            <surname>Dittrich</surname>
            ,
            <given-names>K.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geppert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Norrie</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          , eds.:
          <source>Proceedings of the 13th International Conference on Advanced Information Systems Engineering</source>
          . Volume
          <volume>2068</volume>
          of Lecture Notes in Computer Science, Interlaken, Springer (
          <year>2001</year>
          )
          <fpage>151</fpage>
          -
          <lpage>170</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Keller</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nüttgens</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scheer</surname>
            ,
            <given-names>A.W.</given-names>
          </string-name>
          :
          <article-title>Semantische Prozessmodellierung auf der Grundlage “Ereignisgesteuerter Prozessketten (EPK)”</article-title>
          . Heft 89, Institut für Wirtschaftsinformatik, Saarbrücken, Germany (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hair</surname>
            , jr.,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tatham</surname>
            ,
            <given-names>R.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Black</surname>
            ,
            <given-names>W.C.</given-names>
          </string-name>
          :
          <source>Multivariate Data Analysis</source>
          . 5th edn. Prentice-Hall International, Inc. (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>