Where did I go wrong?
           Explaining errors in business process models
                                      Niels Lohmann

        Universität Rostock, Institut für Informatik, 18051 Rostock, Germany
                          niels.lohmann@uni-rostock.de

       Abstract. Business process modeling is still a challenging task —
       especially since more and more aspects are added to the models, such
       as data lifecycles, security constraints, or compliance rules. At the same
       time, formal methods allow for the detection of errors in the early model-
       ing phase. Detected errors are usually explained with a path from the
       initial to the error state. These paths can grow unmanageably and make
       the understanding and fixing of errors very time consuming. This paper
       addresses this issue and proposes a more intelligible explanation of errors:
       Instead of listing the actions on the path to the error, only which decisions
       lead to it are reported and highlighted in the original model.


1    Introduction
Business process modeling is a sophisticated task and received a lot of attention
in the past decades. With the advent of domain-specific languages and a growing
scientific community, the act of creating and managing business process models
has become a discipline on its own. Despite all efforts, design flaws may still occur.
This can have different impacts, ranging from syntactically incorrect models,
which are harder to understand, up to catastrophic faults and down times in the
execution that yield to a loss of money or a legal aftermath. Consequently, a
large branch of research focuses in the detection, correction, and avoidance of
errors in business process models. Whereas plain control flow analysis is now well
understood, other aspects such as data, business rules, or security may introduce
more subtle flaws that are harder to detect.
    Using the prominent soundness [1] property, we can classify existing ap-
proaches into three classes: (1) Some approaches exploit certain structural con-
straints of the business process model, for instance by focussing on workflow
graphs that only consist of AND/XOR-gateways, for instance [10]. (2) Other
approaches rely on the definition of soundness which can be defined in terms of
standard Petri net properties such as boundedness, liveness, or the existence of
place invariants [11]. The two mentioned approaches are domain-specific in the
sense that they exploit the fact that they investigate business process models. In
contrast, (3) general purpose verification tools (usually called model checkers [3])
can check all kinds of properties as long as they can be expressed in terms of
temporal logics. As this is the case for soundness, these tools are also applicable
for the verification of business process models.
    Due to the ongoing evolution of business process modeling languages, the
growing number of aspects that need to be covered by a business process model,


N. Herzberg, M. Kunze (eds.): Proceedings of the 6th Central European Workshop on Services and
their Composition (ZEUS 2014), Potsdam, Germany, 27-03-2014, published at http://ceur-ws.org
                                                      Where did I go wrong?         9

or the trend toward executable business process models, the verification of
business process models has become a moving target. As a consequence, specific
approaches may become inapplicable for novel demands, leaving only general
purpose approaches as stable tools for the future.

Problem description. In principle, a model checker takes a formal model (e.g., a
Petri net) and a formal description of the property to check (usually described
by temporal logic formula φ) as input and tries to prove the property by an
exhaustive investigation of the model’s states. In case the property is violated
(e.g., a deadlocking state is detected), a path π to this error state is reported [3].
The path contains all actions of the model that need to be executed to reach
the error state from the initial state. Due to this operational nature of paths,
the scenario that led to the error can be simulated. It is furthermore possible to
explain the scenario in terms of the original model; that is, to map the states of
the Petri net back to events of a BPMN model.
    Unfortunately, the size of the paths correlates with the size of the model and
paths of industrial models can thus be very long and hardly understandable.
Furthermore, the path can contain a lot of irrelevant or diverting information
that makes the comprehension of the error very difficult. For instance, the path
usually contains actions that only “set up” the process (e.g., initializations and
login procedures). These inevitable actions are certainly necessary to be able
to reach the error state, but are usually not the cause of it. Another aspect
that makes paths hard to understand is the fact that business process models
may span several components where activities are executed in parallel. On the
path, these originally unordered activities are reported in a fixed — and possibly
arbitrary — order which may yield confusion due to unintuitive error descriptions.

Contribution. This paper addresses the mentioned problems by shortening paths
by focussing on the choices made rather than on each individual action. We shall
use a large case study as experimental evaluation of our proposed approach.


2    Model checking Petri nets
Business process modeling languages are usually semiformal and hence are not
directly applicable to a mathematically rigorous proof of correctness criteria.
However, the operational semantics can be captured in formalisms such as Petri
nets or process calculi. With the advent of executable languages such as WS-BPEL
2.0 or BPMN 2.0, such a formalization became much easier, because a precise
execution semantics yielded more careful language specifications. In fact, for most
of today’s languages from industry or academia, translations into Petri nets [9]
exists [7].

Example. Figure 1(a) depicts a small business process model from [4] which
contains two subtle control flow errors: a lack of synchronization and a local
deadlock. Its translation into a Petri net is shown in Fig. 1(b). As we see, the
Petri net’s structure is very similar to the original model.
10      Niels Lohmann


                                                                M2
                         M1
                                                                                  J1
                                                  F1


                               (a) business process model from [4]
                                                  t4    p5    t5

           p1      t1                   t3   p4                      p6     t6    p7

                        p2    t2   p3                   p10   t11                       t7   p8

                   t8                   t9   p9   t10                       t13   p13

                                                        p11   t12    p12

                                                                            t14   p14


                                        (b) Petri net model
     Fig. 1. A business process model (a) and its translation into a Petri net (b)


    Model checking [3] is an approach to prove that a system satisfied a given
correctness criterion; for instance soundness, the absence of a deadlocking state,
the presence of a sound process configuration, correct data life cycles, or com-
pliance to business rules. In contrast to theorem provers, which sometimes need
manual inputs, or testing, which can only prove the existence of errors, but never
their absence, model checking is an automated and complete way to investigate
systems.
    For the remainder of the paper, we use model checking tool LoLA [12] that
takes a Petri net N and a temporal logical formula φ as input. If the formula is
satisfied by the Petri net (e.g., if the Petri net is sound), this is reported as “yes”
to the modeler. In case the formula is violated (e.g., a deadlocking marking m is
found), this is reported as “no” to the modeler. In addition, a path π = t1 · · · tn is
given to the modeler which explains how m is reachable from the initial marking
                   t1        tn
m0 ; that is, m0 −→   · · · −→  m. Depending on the nature of the formula φ, the
marking reached by the reported path either is a proof that the formula is not
satisfied by the behavior of the Petri net N and is called a counterexample or
marking itself is the proof that the formula is satisfied (e.g., if φ expresses the
reachability of that marking m) and is called a witness. In this paper, we do
not distinguish the semantics of the marking m and always refer to m as goal
marking.

Example (cont.). The business process from Fig. 1(a) has a lack of synchronization.
This can be detected by checking the Petri net from Fig. 1(b). The following path
π describes how a marking m can be reached which puts two tokens on place p6 .

                π = t1 t2 t9 t10 t11 t12 t14 t8 t2 t3 t4 t5                m = {p6 7→ 2}

The path contains 12 transitions. In the remainder of this paper, we use this
path to exemplify the proposed reductions.
                                                       Where did I go wrong?     11

   It is worthwhile to mention that model checking suffers a devastating worst
case complexity due to the well-known state explosion problem which yields
reachability graphs with exponential blow-ups compared to the size of the models.
However, even industrial business process models can be model checked in few
microseconds, because heuristics that fight the state space explosion proved to
be very effective in this domain [4].


3     Representing paths by made choices

3.1    The problem: long paths = big problems

In the remainder of the paper, we focus on the following problem:


    Given a path π to a goal marking m of a Petri net model N , how can the
    reason for the error modeled by m be briefly and comprehensively explained
    to the modeler of N ?


    Apparently, π describes how the goal marking m can be reached from the
initial marking m0 of N . Consequently, reporting the transitions of π together
with the intermediated markings to the modeler should help to understand the
reasons m was reached. Unfortunately, this approach is futile in case π contains
dozens of transitions. The reasons for such long paths are:

 Detours: Model checkers usually investigate the markings of a Petri net in a
    depth first search1 . As a result, the reported paths do not need to be optimal
    and may contain some transitions that model “detours” in the reachability
    graph that do not contribute in the actual reaching of the goal marking.
 Interleaving of concurrent transitions: A marking of N may activate two tran-
    sitions t1 and t2 which are not mutually exclusive. That is, firing either
    transition first does not disable the other one. A typical reason for this is
    that t1 and t2 do not share any resources. Consequently, the order in which t1
    and t2 occur on the path π is arbitrary. If each transition belongs to different
    components of the underlying business process model, then these arbitrary
    interleaving of the transitions may be irritating to the modeler if she tries
    to understand the path π. In the example path, transition t11 and t12 are
    concurrent and the reported order in path π (t11 before t12 ) is arbitrary.
 Indisputable parts: Though the path π is an actual proof that the goal marking
    m can be reached in N , not every transition on the path is an actual cause
    of m. In the example process, any path will begin with firing t1 and hence
    does not need to be reported to the modeler as reason for an error.
1
    Breadth-first approaches are not applicable to many classes of formulae.
12      Niels Lohmann

3.2   The solution: don’t report the obvious
To tackle the problem of long paths with redundant or unhelpful information,
we shall exploit two aspects to shorten paths in the remainder of this section:
progress and conflicts.
    Progress is the assumption that the model never “gets stuck” in case a
transition is activated. That is, if a marking activates one or more transitions,
then this marking is eventually left by firing on of these transitions. Progress is a
natural assumption for business process models in which the execution of tasks
also cannot be postponed indefinitely. Though the actual occurrence of message
or timer events cannot be precisely predicted, the respective states are always
assumed to be eventually left by the modeled actions.
    A conflict is a situation in which there exist more than one possible continua-
tions (e.g., an XOR gateway). In terms of Petri nets, it is a marking in which
two transitions t1 and t2 are enabled, but after firing either of them, the other
transition is disabled. This situation is dual to concurrent transitions (see above)
that do not disable each other. A detailed discussion of these aspects can be
found in [9].
    The combination of these aspects brings us to the following intuitive ob-
servation: Only the conflicts on the path π carry information on how to reach
the goal markings. Any other marking m on the path between the initial and
the goal marking either (1) enables no transition: Then this must be the goal
marking itself, because it has no successor marking. Alternatively, (2) marking
m enables exactly one transition: Then this transition is eventually fired due
to the assumption of progress. Consequently, this transition does not need to
be reported to the modeler as its firing was already determined by the previous
transition on π thad lead to m. Finally, (3) marking m enables several concurrent
transitions. These transitions may fire independently, and if all of them are on π,
then the exact order is arbitrary.

Example (cont.). The conflicts of our running example are shaded gray in
Fig. 1(b): transitions t3 and t9 , as well as t13 and t14 are conflicting. As a result,
we can reduce the path π as follows:

                      πreduced = t9 t14 t3      m = {p6 7→ 2}

The firing of all other transitions is clear from the context from the intermediate
markings and the assumption of progress. Note that the transition names need to
be translated back into the terms of the original model. A different representation
of πreduced could be: “After (1) decision D1: No, (2) decision D2: No, and (3)
decision D1: Yes, a lack of synchronization occurs after after merge M2.” This is
depicted in Fig. 2.

3.3   Experimental results
To evaluate the path reduction algorithm, we applied it to a large collection of
industrial process models created by IBM customers using the IBM WebSphere
    translation, we therefore assumed that each input set can potentially activate each output
    set. Two di↵erent translations into workflow graphs and Petri nets were implemented,
    although the Petri nets could also be directly obtained from the workflow graphs by a
    well-known construction [2]. The Petri net models are available at http://www.service-
    technology.org/soundness in PNML format.

    2.3   Soundness
    Figure 4 shows a workflow graph without any tasks as it occurs in the middle part of the
                                                             Where
    process in Fig. 2 and to which we added a start and an end       didThis
                                                                 event.  I goprocess
                                                                               wrong?model 13
    contains a lack of synchronization error as well as a local deadlock, which are not so
    easy to spot in the first place.
                                  3
                               D1=Yes

                                                              M2


                                  1
               M1
                                                                                J1
                                               F1
                                D1=No
                                                                    2
                                                                   D2=No
           Fig. 4: Workflow graph with deadlock and lack of synchronization errors.
       Fig. 2. Mapping back the reduced path run to the original process model
        A local deadlock is a reachable state s of the process that has a token on an incom-
    ing edge e of an AND-join such that each state that is in turn reachable from s also
Business Modeler. The models were first presented in a report by Fahland et
al. [4], where the 1386 process models were checked for soundness using different
approaches. As a general-purpose model   6 checker, LoLA [12], took part in this
investigation, the process models were also translated into Petri nets.2 The models
are partitioned into five libraries (A, B1, B2, B3, C) and stem from different
business areas, ranging from financial services, automotive, telecommunications,
construction, supply chain, health care, and customer relationship management.
    As experiment, we replayed verification checks reported in [4,2] and reduced
the generated paths. The results summarized in Table 1–3 report are promising:
we report reductions between 77 % and 95 %, leaving average reduced path
lengths between 2 and 7 transitions. Though the reduced paths consist of Petri
net transitions, it can be easily translated back into the nomenclature of the
original model as demonstrated in Fig. 2.


4    Further reduction: remove spurious conflicts

In the previous section, we showed how paths to errors in business process models
can be reduced by only reporting conflict transitions. This reduction decided, for
each marking that activates a transition, whether conflicting transitions are also
activated. This check is local in the sense that it is not checked whether those
transitions that were not taken in the decisions actually could have avoided the
next conflict transition on the path.
    Intuitively, a transition ti on a reduced path π is a spurious conflict iff every
transition t in conflict to ti eventually reaches the marking mi+1 which enables
the next transition ti+1 on path π. In this case, choosing any transition from the
ith conflict will eventually enable the next conflict on the path to the goal state.
Consequently, reporting the spurious conflict ti is of little help to the modeler to
understand the error itself.
    The check for spurious transitions defined above can be straightforwardly be
implemented using a model checker.3 We integrated this check as postprocessing
2
  The original models and their Petri net translations are available for download at
  http://service-technology.org/soundness.
3
  We check whether N with initial marking m′i satisfies the CTL formula φ = AF mi+1 .
14          Niels Lohmann

                    Table 1. Paths from the checks for local deadlocks [4]

library                                   A             B1              B2            B3              C
avg. path length before / after      17.51 / 1.83   17.52 / 2.11   16.06 / 1.54   20.34 / 1.67   13.40 / 2.30
max. path length before / after         53 / 8         66 / 7         56 / 6         54 / 5         21 / 3
sum of path lengths before / after    1699 / 178     1419 / 171     1349 / 129     1688 / 139      134 / 23
reduction                              89.52 %        87.95 %        90.44 %        91.77 %        82.84 %


               Table 2. Paths from the checks for lack of synchronization [4]

library                                   A             B1              B2            B3              C
avg. path length before / after      30.83 / 3.17   10.47 / 0.66   12.16 / 0.68   11.50 / 0.59   51.00 / 7.57
max. path length before / after        89 / 13         52 / 7        100 / 8        103 / 14       120 / 17
sum of path lengths before / after    1079 / 111     1047 / 66      1459 / 82      1507 / 77       357 / 53
reduction                              89.71 %        93.70 %        94.38 %        94.89 %        85.15 %


                    Table 3. Paths from the checks for noninterference [2]

library                                   A             B1             B2             B3              C
avg. path length before / after      12.06 / 2.79   13.82 / 2.55   18.13 / 2.33   14.27 / 2.55   11.27 / 2.33
max. path length before / after         44 / 7         70 / 7         95 / 7         95 / 7         27 / 3
sum of path lengths before / after   19699 / 4557   5707 / 1054    13835 / 1777   17494 / 3130     169 / 35
reduction                              76.87 %        81.53 %        87.16 %        82.11 %        79.29 %


step after reducing the paths as described in the previous section. Note that
executing a model checker can be very time and memory consuming. However,
even if a check is not finished with a reasonable amount of resources, we just failed
to proof whether a conflict is spurious and can continue with the investigation
of the next transition. That said, the postpocessing can be aborted at any
time — any intermediate result is still correct.
    We applied the reduction of spurious conflicts to the case studies described in
the previous section. Table 4–6 summarize the results. In all three experiments,
the paths could be further reduced by 50–86%. Note that in some cases, the
check for spurious conflicts has been aborted after more than 2 GB of memory
were consumed. In these cases, the conflict was kept in the path and the check
proceeded with the next conflict.

5         Concluding remarks
Related work. The analysis and verification of business process models is a
broad field of research. Consequently, there exists a variety of domain-specific
approaches (e.g., the decomposition of workflow graphs into SESE regions to check
soundness [10]). However, we are not aware of other approaches that postprocess
error information from general purpose model checkers to explain these errors to
the modelers.
    Related to the presentation of error information is the automated correction
of flawed business process models [6,5]. These approaches use similarity metrics
                                                                  Where did I go wrong?              15

                 Table 4. Reduced paths from the checks for local deadlocks

library                                  A             B1            B2            B3            C
avg. path length before / after      1.84 / 0.91   2.11 / 0.67   1.54 / 0.57   1.67 / 0.41   2.30 / 0.90
max. path length before / after         8/2           7/1           6/1           5/1           3/1
sum of path lengths before / after    178 / 88      171 / 54      129 / 49      139 / 34       23 / 10
reduction                             50.56 %       68.42 %       62.79 %       75.54 %       60.87 %
aborted checks                           1             0             0             0             0


           Table 5. Reduced paths from the checks for lack of synchronization

library                                  A             B1            B2            B3            C
avg. path length before / after      3.17 / 0.86   0.66 / 0.17   0.68 / 0.14   0.59 / 0.09   7.57 / 1.00
max. path length before / after        13 / 2         7/2           8/2          14 / 2        17 / 2
sum of path lengths before / after    111 / 30       66 / 17       82 / 17       72 / 12       53 / 7
reduction                             72.97 %       54.55 %       79.27 %       84.42 %       86.79 %
aborted checks                           1             4             0             0             4


                 Table 6. Reduced paths from the checks for noninterference

library                                   A            B1            B2            B3            C
avg. path length before / after      2.79 / 0.99   2.55 / 0.75   2.33 / 0.55   2.55 / 0.63   2.33 / 0.40
max. path length before / after         7/2           7/2           7/2           7/2           3/1
sum of path lengths before / after   4557 / 1614   1054 / 310    1777 / 423    3130 / 772      35 / 6
reduction                             64.58 %        70.59 %      76.20 %       75.34 %       82.86 %
aborted checks                           12             4            4             7             0


to find a correct business process model which maximally resembles the flawed
model. These approaches have the benefit of avoiding lengthy manual correction
steps altogether.


Future work. In this paper, we focused on reducing paths to error states and
neglected the retranslation into the original business process model. Visualizations
such as Fig. 2, possibly enriched with animations, need to be automated and
evaluated by business process modelers. Here, understandability criteria [8] could
be of great value. However, this was out of scope of this paper which aimed at
evaluating the idea of using conflicts to reduce paths with three experimental
setups checking different correctness criteria with thousands of industrial business
process models.
    We see in this paper a first step toward a diagnosis framework which uses
general purpose verification tools to verify business process models. As motivated
in the introduction, domain-specific approaches are very closely coupled to the
structure or the property under investigation, but may become inapplicable for
future developments. In contrast, the modularization (a translation into Petri
nets as frontend, a general purpose model checking tool as middleware, and a
diagnosis framework as backend) may be more flexible when it comes to novel
business process languages and properties.
16      Niels Lohmann

References
 1. W. M. P. v. d. Aalst. The application of Petri nets to workflow management.
    Journal of Circuits, Systems and Computers, 8(1):21–66, 1998.
 2. R. Accorsi and A. Lehmann. Automatic information flow analysis of business
    process models. In BPM 2012, LNCS 7481, pages 172–187. Springer, 2012.
 3. E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press, 1999.
 4. D. Fahland, C. Favre, B. Jobstmann, J. Koehler, N. Lohmann, H. Völzer, and
    K. Wolf. Instantaneous soundness checking of industrial business process models.
    In BPM 2009, LNCS 5701, pages 278–293. Springer, 2009.
 5. M. Gambini, M. La Rosa, S. Migliorini, and A. ter Hofstede. Automated error
    correction of business process models. In BPM 2011, LNCS 6896, pages 148–165.
    Springer, 2011.
 6. N. Lohmann. Correcting deadlocking service choreographies using a simulation-
    based graph edit distance. In BPM 2008, LNCS 5240, pages 132–147. Springer,
    2008.
 7. N. Lohmann, H. Verbeek, and R. M. Dijkman. Petri net transformations for business
    processes – a survey. LNCS ToPNoC, II(5460):46–63, 2009.
 8. J. Mendling, H. A. Reijers, and J. Cardoso. What makes process models under-
    standable? In BPM 2007, LNCS 4714, pages 48–63. Springer, 2007.
 9. W. Reisig. Petri Nets. Springer, EATCS Monographs on Theoretical Computer
    Science edition, 1985.
10. J. Vanhatalo, H. Völzer, and F. Leymann. Faster and more focused control-flow
    analysis for business process models through SESE decomposition. In ICSOC 2007,
    LNCS 4749, pages 43–55. Springer, 2007.
11. H. M. W. Verbeek, T. Basten, and W. M. P. v. d. Aalst. Diagnosing workflow
    processes using Woflan. Comput. J., 44(4):246–279, 2001.
12. K. Wolf. Generating Petri net state spaces. In ICATPN 2007, LNCS 4546, pages
    29–42. Springer, 2007.