SMT-based Safety Verification of Data-Aware
Processes: Foundations and Applications
(Extended Abstract)
Alessandro Gianola1,*
1
    Free University of Bozen-Bolzano, Bolzano, Italy


                                         Abstract
                                         Integrating data and processes to understand their concrete interplay is a long-standing problem in
                                         business process management. We present a general framework for the formalization and verification
                                         of infinite-state transition systems that can interact with a persistent storage represented as relational
                                         databases. We develop sophisticated algorithmic techniques, based on automated reasoning and SMT
                                         solving, for the verification of safety properties. Building on top of these techniques, we apply our
                                         framework to business process management: we introduce general models for the safety verification of
                                         complex business processes enriched with real data. For those models, we devise formal and operational
                                         methods that are based on standard languages and/or can capture advanced modeling capabilities, con-
                                         sidering in particular the BPMN standard and variants of Petri nets for the process control-flow, and
                                         SQL queries and updates for the interaction with data.


1. Overview of the Problem
The thesis of our work is that principles, methods and techniques from automated reasoning
and Satisfiability Modulo Theories (SMT) can be effectively employed to lay solid foundations
and to develop concrete tools for the formalization and verification of infinite-state systems
arising from the interplay between the control flow of a business process and the data it queries
and manipulates.
Finite-State Model Checking. Formal verification has the goal of providing automated meth-
ods for verifying complex systems against some property of interest. Since the eighties, model
checking [1] has gained increasing attention, becoming one of the most celebrated approaches. It
consists of exhaustively and automatically checking whether the (abstracted) models of interest
meet some given temporal specification. Traditional model checking relies on the assumption
that the systems can be abstractly described using finitely many states, so as to exploit explicit
search procedures in the finite space of all configurations. In many software contexts, where the
systems are only considered from the point of view of the control-flow, sophisticated techniques
for a smart exploration of the state space guarantee that these approaches are successful [2].

BPM 2022: Best BPM Dissertation Award 2022, September 11–16, 2022, Münster, Germany
*
 Corresponding author.
" gianola@inf.unibz.it (A. Gianola)
~ https://gianola.people.unibz.it/ (A. Gianola)
 0000-0003-4216-5199 (A. Gianola)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)


                                                                                                          20
Data-Aware Processes. In recent years, a growing number of application domains asks for
process modeling languages paired with automated verification techniques that do not just
consider the control flow dimension, but also take into consideration the data dimension. From
a theoretical perspective, the development of formal, abstract frameworks for attacking the
problem of verifying so-called Data-Aware Processes (DAPs) has consequently flourished [3, 4],
leading to a plethora of formal models that differentiate depending on how the data and process
components, as well as their interplay, is actually represented. What all these frameworks have
in common is that they strive to focus on very general DAP models that formalize abstract
dynamic systems (i.e., the process component) interacting with data persistent storage (i.e., the
data component). In these models, the concept of “process” should be interpreted in an abstract
sense, as a (possibly undetermined) mechanism that guides the evolution of a system.
   One may wonder whether formal frameworks for verifying DAPs can employ the techniques
used in finite-state model checking. Verification of DAPs should reflect the possibility of
expressing properties that simultaneously account for the data and the process perspectives,
and most importantly for their interaction. However, due to the presence of data, DAP models are
intrinsically infinite-state: the content of a relational database is finite but its size is unbounded
and unknown a priory (since a new tuple can always be added to some relation using data
elements taken from infinite domains); in addition, many real-world scenarios often require
the presence of possibly fresh data values (e.g., string, new identifiers, integer or real numbers)
injected into the process by external users. In such sophisticated settings, explicit model
checking is not possible anymore: due to the infinity of the state space, exhaustive search
cannot be applied. Moreover, traditional approaches in data management do not help here,
since they typically investigate static, structural aspects of a domain of interest, disregarding
dynamics [5, 6]. The results obtained for the verification of DAPs are quite fragmented, since
they consider a variety of different assumptions on the model and on the analysis tasks [3, 4],
assumptions that are usually too restrictive when compared with both concrete business process
and data management models.
Business Processes enriched with Data. While theoretical DAP frameworks were studied,
a huge body of research has been dedicated to the challenging problem of reconciling data and
business process management within contemporary organizations [7, 8, 9]. More specifically,
in the BPM context it has become more and more important to study multi-perspective models
that do not just conceptually account for the control-flow dimension of business processes, but
also consider the interplay with data [3]: in contrast with abstract DAPs, these models are more
focused on concrete processes as they are interpreted by stakeholders and BPM practitioners.
Traditionally, BPM models are formalized using pure control-flow models, ignoring the effects
caused by data to the evolution of the process, i.e., neglecting how data are manipulated during
the tasks execution, and how the executability of tasks is affected by data-related conditions.
   To overcome this fundamental limitation, many concrete languages and corresponding
formalisms have been proposed to represent business processes where the data and the control-
flow dimension are both considered as first-class citizens [10]. One important challenge that
naturally arises is how to formally verify such business processes enriched with data: in this
respect, it seems natural to employ the abstract frameworks developed for DAPs, and to try
to adapt these frameworks to the desired application domains (see for some attempts in this


                                                 21
direction, e.g., [11, 12]). However, this adaptation is quite challenging because of the lack of
a comprehensive framework for the verification of DAPs that is able to capture most of the
essential modeling features of business process data-aware extensions. The solutions provided
in the literature consist either of abstracting from data to non-determinism or of bounding
data to only finitely many configurations, fixed apriori. Nonetheless, these solutions are partial
since only support very limited types of unboundedness, whereas real data, in general, account
for two more complex types: the one due to fresh values injected into the process from the
environment, and the one due to how relations (the most basic data structures) can yield new
configurations. These, in turn, can easily lead to Turing-complete mechanisms.
   Moreover, currently there are few theoretical frameworks that also come with a corresponding
implemented tool for concrete verification, which is one of the desiderata for every model
checking application domain. Finally, an orthogonal challenging dimension regards the abstract
flavor of these frameworks and their modeling constructs, which often depart from those
offered by process and data modeling standard languages used in practice, such as the Business
Process Model and Notation (BPMN) for processes, and the Structured Query Language (SQL)
for persistent data. In fact, it is essential to rely on standard languages when dealing with
real applications, considering that model checking should have a big impact in guaranteeing
trustworthiness of concretely deployed systems.
Infinite-state Model Checking. In parallel, in the nineties, a completely separated line of
research started to investigate how to attack the problem of verifying infinite-state systems
in general, disregarding the data perspective. A particularly interesting class of infinite-state
systems is given by parameterized systems [13]. Such systems can be seen as transition systems
that are parametric in the number 𝑛 of (possibly independent) components, called parameters.
Parameterized verification approaches provide automatic methods to prove trustworthiness of
these systems regardless of the number of their components.
   Another successful way for tackling infinite-state model checking consists of providing a
symbolic setting [14] for representing states. Thanks to this representation, the exploration of
the state space does not need to be explicit, but it can be performed symbolically. Satisfiability
Modulo Theories (SMT) [15] is one of the most fruitful emerging areas in computational logic
that provides techniques for symbolic model checking: the SMT problem consists of deciding
the satisfiability of logical formulae with respect to combinations of background first-order
logic theories. SMT leverages highly efficient technologies, such as those stemming from
SMT solvers, and has been proved to be particularly successful for verifying infinite-state
systems. There exists a wide range of methods using SMT solvers as their underlying engine.
A successful framework based on SMT solving (and applied, e.g., to distributed systems) is
the one of array-based systems [16], a declarative formalism where arrays (as known from
programming languages) are the central notion.


2. Contributions
As we have seen, DAP verification has recently gained increasing attention, with strong motiva-
tion from BPM applications. Several approaches have been proposed for attacking this problem,
but they present two significant limitations. First, the employed verification techniques are


                                               22
developed ad hoc, and do not rely on well-established methods and implemented and state-of-the-
art technologies. Second, these frameworks are usually studied at a quite abstract level, which
makes it challenging to connect them to the standard front-end languages used in practice. To
summarize, there is the lack of a universal and comprehensive approach to verify DAPs via a
sufficiently expressive formalism that natively supports effective techniques and can encode
“front-end” languages and models.
   At the same time, many approaches to symbolic reasoning for infinite-state systems have
been successfully applied to different models, such as parameterized systems. Most of these
approaches provide not only a solid theoretical framework, but also the support of computation-
ally efficient and highly engineered tools. This is the case of SMT solvers and model checkers
based on them. Nevertheless, importing SMT-based model checking in the context of DAP
verification is extremely challenging and cannot be done as is: it requires carrying out genuinely
novel research for handling the data perspective, and to develop algorithmic techniques. In
this work, we bridge the gap between DAP verification and infinite-state model checking: (i) we
propose a theoretical SMT-based framework for DAP verification and, to make it effective
in practice, we devise efficient algorithmic techniques based on automated reasoning; (ii) we
apply the previous machinery to provide practical approaches for modeling and verifying
business processes enriched with real data.
DAP Verification via SMT. We define a general model-theoretic/algebraic framework, called
(Universal) RASs, for automated verification of DAPs based on SMT solving, focusing in par-
ticular on parameterized safety: this amounts to establishing whether a system can reach an
undesired, unsafe configuration, irrespective of the specific content of its background data
storage. Our approach relies on array-based systems: we incorporate the needed capabilities to
formalize and reason about relational databases. We exhibit how arrays can represent read-write
relations that can be queried and updated during the evolution of the system.
   For this purpose, we develop sophisticated algorithmic techniques building on top of well-
known methods in automated reasoning. We also demonstrate the feasibility of our approach
by showing these techniques in action thanks to the implementation in the state-of-the-art
mcmt model checker, testing it against a concrete DAP benchmark: in this respect we provide a
first, preliminary experimental evaluation of our verification machinery.
Modeling and Verification of Data-Aware Business Processes. As a second, key contribu-
tion, we exploit the developed machinery to provide BPM-oriented practical approaches, strictly
linked to other major research lines on modeling and verifying processes with data.
   First, within the general framework of RASs, we propose Data-Aware BPMN models, called
DABs, based on standard languages used in the BPM literature. These models natively combine
process modeling and querying languages used in practice (BPMN+SQL) in terms of a significant
subset of this combination, exhibiting how this subset can be both automatically translated
into the formal verification framework of RASs. These models are implemented and executed
in an operational framework called delta-BPMN, where a standard process execution engine
(provided by Camunda) is enriched with data management features, i.e., with the support of
relational databases and data manipulation capabilities.
   Then, we introduce a theoretical formalism, called COA-nets, that captures expressive model-
ing capabilities in the recently born spectrum of so-called object-centric processes. COA-nets are


                                               23
in line with process modeling principles studied in process research, specifically Petri nets and
their extensions, and represent an extension of Colored Petri nets that is able both to model data
flow in tokens and to access a relational database. Once again, the translation to the verification
framework of Universal RASs is provided. We explicitly discuss ways of modeling process
and data interactions along a comprehensive setting, where expressive modeling patterns are
introduced and shown how to employed in practice without hampering verification guarantees.


References
 [1] E. M. Clarke, T. A. Henzinger, H. Veith, R. Bloem (Eds.), Handbook of Model Checking,
     Springer, 2018.
 [2] J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, L. J. Hwang, Symbolic model checking:
     10ˆ20 states and beyond, Inf. Comput. 98 (1992) 142–170.
 [3] D. Calvanese, G. De Giacomo, M. Montali, Foundations of data-aware process analysis: A
     database theory perspective, in: Proc. of PODS 2013, 2013, pp. 1–12.
 [4] A. Deutsch, R. Hull, Y. Li, V. Vianu, Automatic verification of database-centric systems,
     ACM SIGLOG News 5 (2018) 37–56.
 [5] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison-Wesley, 1995.
 [6] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, P. F. Patel-Schneider (Eds.), The
     Description Logic Handbook: Theory, Implementation, and Applications, Cambridge
     University Press, 2003.
 [7] C. Richardson, Warning: Don’t assume your business processes use master data, in:
     Proceedings of BPM 2010, volume 6336 of LNCS, Springer, 2010, pp. 11–12.
 [8] M. Dumas, On the convergence of data and process engineering, in: Proc. of ADBIS 2011,
     2011, pp. 19–26.
 [9] M. Reichert, Process and data: Two sides of the same coin?, in: Proceedings of OTM 2012,
     volume 7565 of LNCS, Springer, 2012, pp. 2–19.
[10] A. Meyer, L. Pufahl, D. Fahland, M. Weske, Modeling and enacting complex data depen-
     dencies in business processes, in: Proc. of BPM 2013, 2013, pp. 171–186.
[11] E. Damaggio, A. Deutsch, R. Hull, V. Vianu, Automatic verification of data-centric business
     processes, in: Proc. of BPM 2011, 2011, pp. 3–16.
[12] D. Solomakhin, M. Montali, S. Tessaris, R. De Masellis, Verification of artifact-centric
     systems: Decidability and modeling issues, in: Proc. of ICSOC 2013, 2013, pp. 252–266.
[13] P. A. Abdulla, G. Delzanno, Parameterized verification, Int. J. Softw. Tools Technol. Transf.
     18 (2016) 469–473.
[14] K. L. McMillan, Symbolic Model Checking, Kluwer Academic Publishers, Norwell, MA,
     USA, 1993.
[15] C. W. Barrett, C. Tinelli, Satisfiability modulo theories, in: Handbook of Model Checking,
     2018, pp. 305–343.
[16] S. Ghilardi, S. Ranise, Backward reachability of array-based systems by SMT solving:
     Termination and invariant synthesis, Log. Methods Comput. Sci. 6 (2010).


                                               24