SMT-Based Safety Verification of
         Data-Aware Processes under Ontologies
                (Preliminary Results)

                      Diego Calvanese1,2 , Alessandro Gianola1 ,
                      Andrea Mazzullo1 , and Marco Montali1
                  1
                  KRDB Research Centre for Knowledge and Data
                      Free University of Bozen-Bolzano, Italy
                             surname @inf.unibz.it
             2
               Computing Science Department, Umeå University, Sweden


        Abstract. In the context of verification of data-aware processes, a for-
        mal approach based on satisfiability modulo theories (SMT) has been
        considered to verify parameterised safety properties of so-called artifact-
        centric systems. This approach requires a combination of model-theoretic
        notions and algorithmic techniques based on backward reachability. We
        introduce here a variant of one of the most investigated models in this
        spectrum, namely simple artifact systems (SASs), where, instead of
        managing a database, we operate over a description logic (DL) ontol-
        ogy expressed in (a slight extension of) RDFS. This DL, enjoying suit-
        able model-theoretic properties, allows us to define DL-based SASs to
        which backward reachability can still be applied, leading to decidability
        in PSpace of the corresponding safety problems.


1     Introduction
Verifying and reasoning about dynamic systems that integrate processes and
data is a long-standing challenge that attracted considerable attention, and that
led to a flourishing series of results, within business process management [28,9,20]
and data management [30,8,4,17,18]. Among the several conceptual models stud-
ied in this area, data-centric systems and in particular artifact-centric systems
have been brought forward as a principled approach where relevant (business)
objects are elicited, then defining how actions evolve them throughout their life-
cycle [24]. Different formal models have been proposed to capture artifact sys-
tems and study their verification [8]. One of the most studied settings considers
artifact systems as being composed of: a read-only database storing background
information about artifacts that does not change during the evolution of the sys-
tem; a working memory, used to store data that can be modified in the course
of the evolution; and transitions (also called actions or services) that query the
    Copyright c 2021 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).
2       D. Calvanese et al.

read-only database and the working memory and use the retrieved answers to
update the working memory. Verification of such systems is particularly challeng-
ing, not only because the working memory in general evolves through infinitely
many different configurations, but also because the desired verification proper-
ties should hold regardless of the specific content of the read-only database, thus
calling for a particular form of parameterised verification [15,18,10,11].

    In this paper, we study for the first time semantic artifact systems where
the read-only database is substituted by a Description Logic ontology, which
stores background, incomplete information about the artifacts. In this setting,
two possible notions of parameterisation may be studied: one where the evolution
of the system is verified against all possible choices for the ABox, another where
verification is against all possible models of a fixed ABox. In this work, we adopt
the latter hypothesis, and thus verify whether the artifact system enjoys desired
properties irrespectively of how the information explicitly provided by the ABox
is completed through the TBox assertions.

    More in detail, we consider an extensively studied, basic model of such
artifact-centric systems, called simple artifact system (SAS ) in [11], where the
artifact working memory consists of a fixed set of artifact variables [16,15,11].
On top of this basis, we study the verification of safety properties in the case
where the ontology is specified in (a slight extension of) RDFS [6], a schema
language for the Semantic Web formalized by the W3C, and we make use of the
ontology signature to express the transitions that update the working memory.
For this setting, we show that we can decide safety properties in PSpace by
relying on an SMT-based backward reachability procedure.

    In spirit, our approach is reminiscent of previous works studying the veri-
fication of dynamic systems (in particular, Golog programs) operating over a
DL ontology, such as [14,31]. In fact, both in their settings and ours, the dy-
namic system evolves each model of the ontology, and verification properties
are assessed over all the resulting evolutions. This is radically different from
approaches where the ABox itself is evolved by the process, with an execution
semantics following Levesque’s functional approach, in which query entailment
over the current state is used to compute the successor states [3,5]. However, we
differ from [14,31] in that our goal is not only to derive foundational results, but
also to transfer such results into practical algorithms and thus obtain a model
that is readily implementable by relying on a state-of-the-art model checker such
as MCMT [22]. As customary in the formal literature on artifact-centric systems,
our approach is based on actions that manipulate the artifact variables, coupled
with condition-action rules that declaratively define which actions are currently
executable, and with which parameters. Alternative choices could be seamlessly
taken, by adapting approaches that rely on an explicit description of the control-
flow, e.g., based on state machines [26] or Petri nets interpreted with interleaving
semantics [20,27].

    Full details are provided in the extended version of this paper [12].
    SMT-Based Safety Verification of Data-Aware Processes under Ontologies               3

2     Preliminaries


In this section, we first recall the syntax and semantics of first-order logic (FO).
We then define the syntax of the DL RDFS+ considered in this paper, which is a
slight extension of RDFS [6]. Its semantics is provided by means of the standard
translation, mapping RDFS+ ontologies into equivalent sets of FO formulas.


2.1   First-Order Logic Preliminaries


The alphabet of first-order logic (FO) consists of: countably infinite and pairwise
disjoint sets NP of predicate symbols (with ar(P ) ∈ N being the arity of P ∈ NP ),
NF of function symbols (with ar(f ) ∈ N being the arity of f ∈ NF ), NI of individual
symbols (or individual names), and Var of variables; the equality symbol ‘=’; the
Boolean operators ‘¬’ and ‘∧’; and the existential quantifier ‘∃’. An (FO) formula
is an expression ϕ ::= P (t) | s = t | ¬ϕ | (ϕ ∧ ϕ) | ∃xϕ, where x ∈ Var, P ∈ NP ,
s, t are terms, and t = (t1 , . . . , tar(P ) ) is a (possibly empty) tuple of terms, where
terms are defined inductively as follows: t ::= x | a | f (t), where x ∈ Var, a ∈ NI ,
f ∈ NF , and t = (t1 , . . . , tar(f ) ). A formula of the form P (t) is called an atom,
and a literal has the form P (t) or ¬P (t). We adopt the usual abbreviations
and conventions: in particular, ϕ ∨ ψ = ¬(¬ϕ ∧ ¬ψ) and ∀xϕ = ¬∃x¬ϕ, where
∀ is the universal quantifier. We write ϕ(x) to indicate that the free variables
(defined as usual) of ϕ are included in x, and we write ϕ(a) for the formula
obtained from ϕ(x) by substituting a to x. Similar notions and notation are
adopted for terms. A sentence is defined as a formula without free variables,
while we call quantifier-free a formula without any occurrence of existential or
universal quantifiers. A formula is existential if it has the form ∃xϕ(x), where ϕ
is a quantifier-free formula, and it is universal if it has the form ∀xϕ(x), where
ϕ is quantifier-free. A (FO) theory T is a set of FO sentences, and T is said to be
universal if every ϕ ∈ T is universal. A signature Σ is a subset of NP ∪ NF ∪ NI .
For a set Γ of formulas, the signature of Γ , denoted ΣΓ , is the set of predicate,
function, and individual symbols occurring in Γ . Given a signature Σ, we say
that Γ is a set of Σ-formulas if ΣΓ = Σ (we will use Σ-formula, Σ-theory, etc.,
in an analogous way).
    An (FO) interpretation is a pair I = (∆I , ·I ), where ∆I is a non-empty
set, called domain of I, and ·I is an interpretation function such that: P I ⊆
(∆I )ar(P ) , for every P ∈ NP; f I : (∆I )ar(f ) −→ ∆I , for every f ∈ NF ; and
aI ∈ ∆I , for every a ∈ NI . An assignment in I is a function a : Var −→ ∆I .
We define the value of a term t in I under a as follows: a(t) = a(x), if t = x;
a(t) = aI , if t = a ∈ NI ; and a(t) = f I (a(t)), if t = f (t), where f ∈ NF and,
for an m-tuple t = (t1 , . . . , tm ) of terms, we set a(t) = (a(t1 ), . . . , a(tm )). The
notion of a formula ϕ being satisfied in an interpretation I under an assignment
4       D. Calvanese et al.

a, or of I being a model of ϕ under a, written I |=a ϕ, is inductively defined as:

        I |=a P (t) iff a(t) ∈ P I ,
        I |=a s = t iff a(s) = a(t),
        I |=a ¬ψ    iff not I |=a ψ,
        I |= ψ ∧ χ iff I |=a ψ and I |=a χ,
            a
                             0
        I |=a ∃xψ iff I |=a ψ for some a0 that can differ from a on x.

For a formula ϕ(x), we write I |= ϕ[d] in place of I |=a ϕ(x), with a(x) = d.
We say that a set Γ of formulas is satisfied in an interpretation I under an
assignment a, or that I is a model of Γ under a, written I |=a Γ , if I |=a ϕ, for
every ϕ ∈ Γ (we refer to a singleton Γ = {ϕ} simply as ϕ). For a sentence ϕ,
the satisfaction of ϕ in I under a does not depend on a, thus we write I |= ϕ in
place of I |=a ϕ, and we say that ϕ is satisfied in I. For a theory T , we say that
T is satisfied in an interpretation I (or that I is a model of T ), written I |= T ,
if every sentence of T is satisfied in I. A formula ϕ is satisfiable w.r.t. T (or
T -satisfiable) if there exist an interpretation I and an assignment a in I such
that I |= T and I |=a ϕ. Moreover, we say that T logically implies a formula ϕ,
or that ϕ is a logical consequence of T , written T |= ϕ, if, for every interpretation
I and every assignment a in I, I |= T implies that I |=a ϕ. Finally, formulas ϕ,
ψ are equivalent w.r.t. T (or T -equivalent) if T |= ϕ ↔ ψ.

2.2   Description Logics Preliminaries
Let NC , NR , and NI be countably infinite and pairwise disjoint sets of concept,
role, and individual names, respectively (with NC ∪ NR ⊆ NP , i.e., concept and
role names are predicate symbols, with arity 1 and 2, respectively).
     The DL we consider here is an extension of RDFS [6] with disjointness be-
tween concepts and roles, conjunction and (one-level) qualified existential quan-
tification on the left-hand side of inclusions, and inclusion of direct and inverse
roles. We denote such DL RDFS+ , and we define it below.
     In RDFS+ , concepts C and roles R are defined according to the grammar

                     R ::= P | P − ,
                     C ::= A1 u · · · u An | ∃R.> | ∃R.A,

where P ∈ NR , n ≥ 1, and A, A1 , . . . , An ∈ NC .
    A concept inclusion (CI) has the form C v A or C v ¬A, and a role inclusion
(RI) has the form R v R0 or R v ¬R0 , where C is an RDFS+ concept, A ∈ NC ,
and R, R0 are roles. An RDFS+ TBox T is a finite set of CIs and RIs. An
assertion has the form A(a), ¬A(a), P (a, b), ¬P (a, b), (a = b), or ¬(a = b),
where A ∈ NC , P ∈ NR , and a, b ∈ NI . An ABox A is a finite set of assertions.
(We point out that in an ABox we allow for negated assertions, which is a feature
that is not always supported in DLs.) An RDFS+ ontology O is a pair (T , A),
where T is a TBox and A is an ABox.
    We observe that RDFS+ is incomparable in expressive power with the DLs
of the popular DL-Lite family [7,2]. Indeed, while DL-Lite allows for the use of
   SMT-Based Safety Verification of Data-Aware Processes under Ontologies         5

existential quantification on the right-hand side of CIs, these are ruled out in
RDFS+ . On the other hand, in RDFS+ one can locally type the second com-
ponent of a role through the use of qualified existential quantification on the
left-hand side of CIs, while this is not possible in DL-Lite. As we will see later,
differently from what happens for DL-Lite, the FO translation of an RDFS+
ontology is a universal theory.
Example 1. To represent part of the domain knowledge on job hiring processes
for university personnel, we define the RDFS+ ontology O = (T , A), where T
consists of the following concept inclusions:

AcademicPosition v JobPosition               AcademicPosition v ¬AdminPosition
  AdminPosition v JobPosition                              User v ¬JobPosition
   ∃appliesFor.> v User                        ∃appliesFor− .> v JobPosition
  ∃suitableFor.> v User                       ∃suitableFor− .> v JobPosition
  ∃suitableFor.> v PositivelyEvaluated             EligibleUser v User
 User u Graduate v EligibleUser                    EligibleUser v Graduate

while A, which stores data on available job positions, contains the assertions

        AcademicPosition(professor123 ),      AdminPosition(secretary123 ),
        AcademicPosition(researcher123 ),     AdminPosition(secretary456 ).

Moreover, we assume that A contains all the assertions of the form ¬A(u),
¬P (u, a) and ¬P (a, u), for a distinguished individual name u ∈ NI and every
A, P, a ∈ ΣO , so that u can be used to represent an undefined value. The CIs of
T formalise the following facts: there are both academic and administrative job
positions and these are disjoint; users and job positions are disjoint; appliesFor
relates users to job positions; to be suitable for something one has to be a user
that is positively evaluated; the range of suitableFor is included in the extension
of JobPositions; an eligible user is defined as a graduate user.                  /
   We define now the standard translation from RDFS+ expressions to FO for-
mulas, which maps concepts to FO formulas with one free variable, and roles
to FO formulas with two free variables. Specifically, the translation T generates
formulas that contain just two variables x, y ∈ Var, and is defined as follows:

    T(A1 u · · · u An ) = A1 (x) ∧ · · · ∧ An (x),
                  T(P ) = P (x, y),                  T(P − ) = P (y, x),
            T(∃R.>) = ∃yT(R),                      T(∃R.A) = ∃y(T(R) ∧ A(y)),
                T(¬A) = ¬T(A),                       T(¬R) = ¬T(R),

where A, A1 , . . . , An are unary predicates and P is a binary predicate. Moreover,
we map CIs and RIs into universal FO sentences in the following way:

  T(C v D) = ∀x(T(C) → T(D)),                 T(R v S) = ∀x∀y(T(R) → T(S)),

where D stands Sfor either A or ¬A, and S stands for either R0 or ¬R0 . We
also set T(T ) = β∈T {T(β)}. Assertions α are (identically) mapped into FO
6       D. Calvanese et al.

literals
S        without free variables (i.e., ground ), as T(α) = α, and we set T(A) =
   α∈A {T(α)}. Finally, T(O) = T(T ) ∪ T(A). It is easy to see that the set of FO
sentences obtained as the translation T(O) of an RDFS+ ontology O, can be
equivalently rewritten into a universal Horn theory [25,23]. Such a theory, which
we identify with T (O), can be obtained from T (O) by simply putting formulas
into prenex normal form.
    The semantics for RDFS+ expressions can be given in terms of their FO
translation [25]. For an interpretation I = (∆I , ·I ) and a concept C, we define
the extension of C in I as C I = {d ∈ ∆I | I |= T(C)[d]}. Similarly, for a role
R, we define its extension in I as RI = {(d, e) ∈ ∆I × ∆I | I |= T(R)[d, e]}. We
say that C and R are satisfied in I if C I 6= ∅ and RI 6= ∅, respectively.
    Moreover, given a CI, RI, assertion, TBox, ABox, or ontology Γ , we say that
Γ is satisfied in I (or that I is a model of Γ ), written I |= Γ , iff I |= T(Γ ).
Given an ontology O and (a concept, role, CI, RI, or assertion mapped, via
its FO translation, into) an FO formula ϕ, we say that ϕ is satisfiable w.r.t.
O (or O-satisfiable) if there exists a model I of O that satisfies ϕ under some
assignment in I. Finally, we say that O logically implies an FO formula ϕ, or
that ϕ is a logical consequence of O, written O |= ϕ, if, for every model I of O
and every assignment a in I, we have that I satisfies ϕ under a.


3   Basic Model-Theoretic Properties

In this section, we prove the model-theoretic properties that will be used later
on to develop our verification machinery. Specifically, we show here that the
standard translation of the RDFS+ ontologies introduced in the previous section
admits model completion, and has the constraint satisfiability problem decidable.
These properties will allow us, in the subsequent sections, to verify suitably
defined DL-based data-aware processes by employing a variant of the SMT-based
backward reachability procedure introduced in [10]. To present our results, we
require the following preliminary notions.
     A formula that is a conjunction of Σ-literals is called a Σ-constraint. Given a
Σ-theory T , we define the constraint satisfiability problem for T as follows: given
a formula ∃yϕ(x, y), where ϕ(x, y) is a Σ-constraint, decide whether ∃yϕ(x, y)
is satisfiable w.r.t. T . A theory T has quantifier elimination iff, for every ΣT -
formula ϕ(x), there exists a quantifier-free formula ψ(x) such that T |= ϕ(x) ↔
ψ(x). Finally, we will use the following definition of model completion, which
is restricted to cover the case of universal theories (the ones considered in this
work) and that is nonetheless known to be equivalent (for universal theories) to
the usual one from model theory [13,19,10]. Let T be a universal Σ-theory and
let T ∗ ⊇ T be a further Σ-theory. We say that T ∗ is a model completion of T
iff: (i) every Σ-constraint satisfied in some model of T is also satisfied in some
model of T ∗ ; (ii) T ∗ has quantifier elimination.
     We now state the main technical result of the section.
    SMT-Based Safety Verification of Data-Aware Processes under Ontologies               7

Theorem 2. Given an RDFS+ ontology O, T(O) is a finite universal FO theory
that (i) has a decidable constraint satisfiability problem, and (ii) admits a model
completion.

Proof (Sketch). To prove Point (i), we reduce to RDFS+ (seen as a fragment of
ALCHI, [29]) ontology satisfiability. For Point (ii), since there is no function
symbol in ΣT (O) , it is sufficient to show that T (O) enjoys the amalgamation
property: this is proved by explicitly constructing a T (O)-amalgam for every
pair of models I1 and I2 of T (O) sharing a submodel I0 (see [12] for details).

Remark 3. For every RDFS+ ontology O, the model completion T (O)∗ of T (O)
admits quantifier elimination. The algorithm for quantifier elimination in T (O)∗
follows from the proof of Theorem 2: to eliminate ∃x from a ΣT (O) -formula
∃xϕ(x, y), it is sufficient to take the conjunction of the clauses χ(y) implied
by ϕ(x, y), which are finitely many for T (O), up to T (O)-equivalence. This
procedure is used in Algorithm 1 below and is crucial to get the decidability
results of Theorem 6.                                                           /

    Properties (i) and (ii) from Theorem 2 are in line with the foundational
framework of [10,11], where a third property is additionally assumed: the finite
model property for constraint satisfiability (see the references for the definition).
However, differently from [10,11], this property is not needed anymore for the
results of our paper. This is an important difference from [10,11], since the
artifact systems studied there require to interact with finite structures (i.e.,
databases), whereas in the context of the present work we admit that the models
of the knowledge base of our artifact systems can be infinite.


4     Ontology-Based Data-Aware Processes
In this section, we present our main contributions. We first define our model,
called RDFS+ -based simple artifact systems, or RDFS+ -SASs for short, to for-
malise data-aware processes under RDFS+ ontologies. These systems are a vari-
ant of the artifact-centric systems studied in [10]. RDFS+ -SASs read data from
a given RDFS+ ontology, used to store background information of the system,
and manipulate individual variables, called artifact variables, which represent
the current state of the process. We then study the parameterised safety prob-
lems of such models by adopting a symbolic version [21,11] of the well-known
backward reachability procedure [1].

4.1   DL-Based Simple Artifact Systems
We first require the following preliminary notions. For an RDFS+ ontology O,
an O-partition
          Wn     is a finite V set P = {κ1 (x), . . . , κn (x)} of ΣO -literals such that
O |= ∀x i=1 κi (x) ∧ ∀x i6=j ¬(κi (x) ∧ κj (x)). Given an ontology O, an O-
partition P = {κ1 (x), . . . , κn (x)}, and ΣO -terms t(x) = (t1 (x), . . . , tn (x)), (the
value of) a case-defined function F based on P and t, for a fresh function symbol
8         D. Calvanese et al.

F ∈ NF , is defined as follows: for every model I of O, every assignment a in I,
and every tuple x of variables, a(F (x)) = a(ti (x)), if I |=a κi (x).
   In order to introduce verification problems in a symbolic setting, one first
has to specify which formulas are used to represent (i) the sets of states, (ii) the
system initialisations, and (iii) the system evolution. To capture these aspects,
we provide the following definitions.
   An RDFS+ -based simple artifact system (RDFS+ -SAS) is a tuple
                                         Sm
                         S = (O, x, ι(x), j=1 {τj (x, x0 )}),

where m ∈ N, and
    – O = (T , A) is an RDFS+ ontology;
    – x = (x1 , . . . , xn ) is a tuple of variables, called artifact variables, and x0 is a
      tuple ofVvariables that are renamed copies of variables in x;
                  n
    – ι(x) = i=1 xi = ai , with ai ∈ NI , is an initial state formula;
                                     Vn
    – τj (x, x0 ) = ∃y(γ j (x, y) ∧ i=1 x0i = Fij (x, y)), for 1 ≤ j ≤ m, is a transition
      formula, where γ j (x, y) is a conjunction of ΣO -literals called guard of τj ,
      and x0i = Fij (x, y), where each Fij is a case-defined function based on some
      O-partition and list of ΣO -terms, is an update of τj .
    Given an RDFS+ ontology O, we call state (ΣO -)formula a quantifier-free
ΣO -formula ϕ(x). A state formula constrains the content of the artifact variables
characterising the current states of the systems. Notice that a state formula can
represent a (possibly infinite) set of states, because of the presence of (possibly
infinitely many) different elements in a model of the ontology O. A safety formula
for S is a state ΣO -formula ν(x), used to describe the undesired states of the
system. We say that S is safe w.r.t. ν(x) if there does not exist k ≥ 0 and a
formula of the form

                   ι(x0 ) ∧ τj0 (x0 , x1 ) ∧ · · · ∧ τjk−1 (xk−1 , xk ) ∧ ν(xk ),       (?)

that is satisfiable w.r.t. O, where 1 ≤ j0 , . . . , jk−1 ≤ m and each xh , with
0 ≤ h ≤ k, is a tuple of variables that are renamed copies of variables in x. The
safety problem for S is the following decision problem: given a safety formula
ν(x) for S, decide whether S is safe w.r.t. ν(x). This verification problem is
parametric on the models of a fixed RDFS+ ontology, since safety is assessed
irrespectively of the choice of such a model. This implies that, when the system
is safe, it is so for every execution of the process under every possible model
(which in principle are infinitely many) of the given ontology.
Example 4. We develop a simplified job hiring process for university personnel
based on the domain knowledge formalised in Example 1. Each application is
created using a dedicated website portal, where users that are potentially inter-
ested in applying need to register in advance. When a registered user decides
to apply, the data created by this single application do not have to be stored
persistently and thus can be maintained just by using artifact variables (de-
scribed below) that can interact with the knowledge base. All these variables
   SMT-Based Safety Verification of Data-Aware Processes under Ontologies               9

are initialised with an undefined value u. In the first transition of the system,
an application is created by a registered user, which falls into the extension of
the concept User: the information about this user is then stored in the artifact
variable xapplicant . At this point, the application website asks the user whether
they hold a university degree: in case of an affermative answer, the website ac-
cepts the user as eligible, the information about the user is stored using xapplicant
and the process can progress. Then, the user picks up a job position (assigned to
xjob ) and applies for it. The following steps of the process involve the evaluation
of the application: for both academic and administrative positions, if the eligible
candidate is suitable for the chosen position, they are declared winner (assigned
to xwinner ), otherwise they are declared loser (assigned to xloser ). To formalise this
                                                        S7
process, we define the RDFS+ -SAS S = (O, x, ι(x), j=1 {τj (x, x0 )}) so that:
 – the ontology O is the RDFS+ ontology given in Example 1;
 – the artifact variables are x = (xapplicant , xjob , xeligible , xwinner , xloser );
 – the initial state formula is
    ι = (xapplicant = u) ∧ (xjob = u) ∧ (xeligible = u) ∧ (xwinner = u) ∧ (xloser = u);
 – the transition formulas are
   τ1 = ∃y1 (User(y1 ) ∧ x0applicant = y1 ),
   τ2 = EligibleUser(xapplicant ) ∧ x0eligible = xapplicant ,
   τ3 = ∃z1 (JobPosition(z1 ) ∧ appliesFor(xeligible , z1 ) ∧ x0job = z1 ),
   τ4 = AcademicPosition(xjob ) ∧ suitableFor(xeligible , xjob ) ∧ x0winner = xeligible ,
   τ5 = AdminPosition(xjob ) ∧ suitableFor(xeligible , xjob ) ∧ x0winner = xeligible ,
   τ6 = AcademicPosition(xjob ) ∧ ¬suitableFor(xeligible , xjob ) ∧ x0loser = xeligible ,
   τ7 = AdminPosition(xjob ) ∧ ¬suitableFor(xeligible , xjob ) ∧ x0loser = xeligible .
An undesired situation of the system is the one where an applicant registered
user is declared winner even if they were not eligible. This situation is formally
described by the following safety formula for S:

                     ν = User(xwinner ) ∧ ¬EligibleUser(xwinner ).                      /

4.2   Backward Search for RDFS+ -SASs
Algorithm 1 shows the SMT-based backward reachability procedure (or backward
search) for handling the safety problem for an RDFS+ -SAS S. An integral part
of the algorithm is to compute symbolic preimages (Line 5). The intuition behind
the algorithm is to execute a loop where, starting from the undesired states of
the system (described by the safety formula ν(x)), the state space of the system
is explored backward : in every iteration of the while loop (Line 2), the current set
of states is regressed through transitions thanks to the preimage computation.
                                 0                    0
             Wm for any τ (z, z ) and φ(z) (where z are
For that purpose,                                          renamed copies of z), we
define τ := h=1 τh and Pre(τ, φ) as the formula ∃z 0 (τ (z, z 0 ) ∧ φ(z 0 )). Let φ(x)
be a state formula, describing the state of the artifact variables x. The preimage
of the set of states described by theWformula φ(x) is the setWof states described by
Pre(τ, φ) (notice that, when τ = τ̂ , then Pre(τ, φ) = Pre(τ̂ , φ)). We recall
that a state formula is a quantifier-free ΣO -formula. Unfortunately, because of
10      D. Calvanese et al.


 Algorithm 1: SMT-based backward reachability procedure
   Function BReach(ν)
 1    φ ←− ν; B ←− ⊥;
 2    while φ ∧ ¬B is T (O)-satisfiable do
 3       if ι ∧ φ is T (O)-satisfiable then
              return (unsafe, unsafe trace of form (?));
 4        B ←− φ ∨ B;
 5        φ ←− Pre(τ, φ);
 6        φ ←− QE(T (O)∗ , φ);
       return safe;


the presence of the existentially quantified variables y in τ , Pre(τ, φ) is not a
state formula, in general. As stated in [10,11], if the quantified variables were
not eliminated, we would break the regressability of the procedure: indeed, the
states reached by computing preimages, intuitively described by Pre(τ, φ), need
to be represented by a state formula φ0 in the new iteration of the while loop.
In addition, the increase of the number of variables due to the iteration of the
preimage computation would affect the performance of the satisfiability tests
described below, in case the loop is executed many times. In order to solve these
issues, it is essential to introduce the subprocedure QE(T (O)∗ , φ) in Line 6.
    QE(T (O)∗ , φ) in Line 6 is a subprocedure that implements the quantifier
elimination algorithm of T (O)∗ and that converts the preimage Pre(τ, φ) of a
state formula φ into a state formula (equivalent to it modulo the axioms of
T (O)∗ ), so as to guarantee the regressability of the procedure: this conversion
is possible since T (O)∗ eliminates from τh the existentially quantified variables
y. Backward search computes iterated preimages of the safety formula ν, until
a fixpoint is reached (in that case, S is safe w.r.t. ν) or until a set intersecting
the initial states (i.e., satisfying ι) is found (in that case, S is unsafe w.r.t. ν).
Inclusion (Line 2) and disjointness (Line 3) tests can be discharged via proof
obligations to be handled by SMT solvers. The fixpoint is reached when the test
in Line 2 returns unsat: the preimage of the set of the current states is included
in the set of states reached by the backward search so far (represented as the
iterated application of preimages to the safety formula ν). The test at Line 3
is satisfiable when the states visited so far by the backward search includes a
possible initial state (i.e., a state satisfying ι). If this is the case, then S is
unsafe w.r.t. ν. Together with the unsafe outcome, the algorithm also returns an
unsafe trace of the form (?), explicitly witnessing the sequence of transitions τh
that, starting from the initial configurations, lead the system to a set of states
satisfying the undesired conditions described by ν(x).
Theorem 5. Backward search (Algorithm 1) is correct for detecting whether an
RDFS+ -SAS S is safe w.r.t. ν(x).
Proof (Sketch). First, we require the following claim, which follows immediately
from the definitions.
   SMT-Based Safety Verification of Data-Aware Processes under Ontologies        11

Claim 1. For every safety formula ν(x) for S and every k ≥ 0, a formula ϑ of
the form (?) is satisfiable w.r.t. O iff ϑ is satisfiable w.r.t. T (O).

    Then, we need to show that, instead of considering satisfiability of formulas of
the form (?) in models of T (O), we can concentrate on satisfiability w.r.t. T (O)∗
(T (O)∗ exists thanks to Property (ii) of Theorem 2). Then, by exploiting the
algorithm for quantifier elimination in T (O)∗ described in Remark 3, formulas
of the form (?) can be represented via backward search by using quantifier-free
formulas. We finally conclude by noticing that safety/unsafety of S w.r.t ν(x)
can be now detected invoking the satisfiability tests (which are effective thanks
to Property (i) of Theorem 2) over those quantifier-free formulae.


    Backward search for generic artifact systems is not guaranteed to termi-
nate [11]. However, in case S is unsafe w.r.t. ν(x), an unsafe trace—which is
finite—is found after finitely many iterations of the while loop: hence, in the
unsafe case, backward search must terminate. Together with the theorem above,
this means that the backward reachability procedure is at least a semi-decision
procedure for detecting unsafety of RDFS+ -SASs. Nevertheless, we show in the
following theorem that, in case of RDFS+ -SASs, backward search always termi-
nates: thus, it is a full decision procedure, for which we also provide a PSpace
upper bound.

Theorem S   6. For an RDFS+ ontology O and an RDFS+ -SAS S =
              m
(O, x, ι(x), j=1 {τj (x, x0 )}), the safety problem for S is decidable in PSpace
                                       Sm
in the combined size of x, ι(x) and j=1 {τj (x, x0 )}.

Proof (Sketch). For every RDFS+ ontology O, there are only finitely many
quantifier-free ΣT (O) -formulas, up to T (O)-equivalence, that can be built out
of a finite set of variables x. Thanks to the availability of the quantifier elim-
ination procedure QE(T (O)∗ , ϕ), the overall number of variables in ϕ is never
increased. This implies that globally there are only finitely many quantifier-free
ΣT (O) -formulas that Algorithm 1 needs to analyse. Hence, Algorithm 1 termi-
nates. Concerning complexity, we first note that the translation T (O) requires
polynomial time. Then, we need to eliminate the occurrences of case-defined
functions (creating an equivalent SAS whose size is polynomial in the size of the
original one), and to modify Algorithm 1 by making it nondeterministic with an
NPSpace complexity. The claim follows by applying Savitch’s Theorem.


    We observe that Algorithm 1 is not yet implemented in the state-of-the-art
model checker MCMT (Model Checker Modulo Theories [21]), which is based
on SMT solving. Such an implementation, however, can be obtained by extend-
ing MCMT with the quantifier elimination algorithm for T (O)∗ (described in
Remark 3), required in Line 6, together with a procedure for RDFS+ ontology
satisfiability (seen as a fragment of ALCHI, [29]), required in Lines 2 and 3.
12      D. Calvanese et al.

5    Conclusions

We have studied the problem of verification of data-aware processes under
RDFS+ ontologies, where the process component can interact with a knowledge
base specified by means of the DL RDFS+ , underpinning the RDFS constructs.
We addressed this problem by introducing a suitable model of DL-based artifact-
centric systems, called RDFS+ -based SASs, and by leveraging the SMT-based
version of the backward reachability procedure, which is a well-known technique
to employ for verifying systems of this kind. Specifically, we showed that this
procedure is a full decision procedure for detecting safety of RDFS+ -based SASs,
and we also provided a PSpace complexity upper bound.
    This work opens several directions for future work. First, we notice that the
choice of RDFS+ ontologies is not intrinsic to our approach. Indeed, motivated
by conceptual modelling and data integration issues in OBDA applications, we
are currently working on the DL-Lite family of DLs, to define suitable DL-
Lite-based SASs with analogous decidability and complexity results. The main
difference we have to account for is that, for a DL-Lite ontology O, we have an
equisatisfiable (but not equivalent) translation into a universal one-variable FO
sentence T (O), and Claim 1 in the proof of Theorem 5 has to be modified to show
that a trace ϑ is satisfiable w.r.t. O iff a suitably translated trace ϑ̂ is satisfiable
w.r.t. T (O). In general, nonetheless, we point out that any DL satisfying the
two conditions stated in Theorem 2 can be chosen for our purposes, and that
the same theoretical guarantees can be obtained over the SMT-based backward
reachability procedure. As future work, we thus intend to introduce a more
general framework for DL-based SASs that is able to account for different DLs.
We also intend to extend the results obtained here to more sophisticated artifact-
centric models, such as the relational artifact systems (RASs) studied in [10,11].
Moreover, it could be worth investigating in this setting also properties that go
beyond safety, such as liveness and fairness.


Acknowledgements
This research has been partially supported by the Wallenberg AI, Autonomous
Systems and Software Program (WASP) funded by the Knut and Alice Wal-
lenberg Foundation, by the Italian Basic Research (PRIN) project HOPE, by
the EU H2020 project INODE (grant agreement 863410) by the CHIST-ERA
project PACMEL, by the project IDEE (FESR1133) funded by the Eur. Reg.
Development Fund (ERDF) Investment for Growth and Jobs Programme 2014-
2020, and by the Free University of Bozen-Bolzano through the projects KGID,
GeoVKG, STyLoLa, VERBA and MENS.
   SMT-Based Safety Verification of Data-Aware Processes under Ontologies              13

References

 1. Abdulla, P.A., Cerans, K., Jonsson, B., Tsay, Y.: General decidability the-
    orems for infinite-state systems. In: Proc. of the 11th Int. Conf. on Logic
    in Computer Science (LICS). pp. 313–321. IEEE Computer Society (1996).
    https://doi.org/10.1109/LICS.1996.561359
 2. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite
    family and relations. J. of Artificial Intelligence Research 36, 1–69 (2009).
    https://doi.org/10.1613/jair.2820
 3. Bagheri Hariri, B., Calvanese, D., De Giacomo, G., De Masellis, R., Montali, M.,
    Felli, P.: Verification of description logic Knowledge and Action Bases. In: Proc. of
    the 20th Eur. Conf. on Artificial Intelligence (ECAI). pp. 103–108 (2012)
 4. Bagheri Hariri, B., Calvanese, D., De Giacomo, G., Deutsch, A., Montali, M.:
    Verification of relational data-centric dynamic systems with external services. In:
    Proc. of the 32nd ACM Symp. on Principles of Database Systems (PODS). pp.
    163–174 (2013)
 5. Bagheri Hariri, B., Calvanese, D., Montali, M., De Giacomo, G., De Masellis, R.,
    Felli, P.: Description logic Knowledge and Action Bases. J. of Artificial Intelligence
    Research 46, 651–686 (2013). https://doi.org/10.1613/jair.3826
 6. Brickley, D., Guha, R.: RDF Schema 1.1. W3C Recommendation, World Wide
    Web Consortium (2014), available at https://www.w3.org/TR/rdf-schema/
 7. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable
    reasoning and efficient query answering in description logics: The DL-Lite family. J.
    of Automated Reasoning 39(3), 385–429 (2007). https://doi.org/10.1007/s10817-
    007-9078-x
 8. Calvanese, D., De Giacomo, G., Montali, M.: Foundations of data aware
    process analysis: A database theory perspective. In: Proc. of the 32nd
    ACM Symp. on Principles of Database Systems (PODS). pp. 1–12 (2013).
    https://doi.org/10.1145/2463664.2467796
 9. Calvanese, D., Ghilardi, S., Gianola, A., Montali, M., Rivkin, A.: Formal
    modeling and SMT-based parameterized verification of data-aware BPMN. In:
    Proc. of the 17th Int. Conf. on Business Process Management (BPM). Lec-
    ture Notes in Computer Science, vol. 11675, pp. 157–175. Springer (2019).
    https://doi.org/10.1007/978-3-030-26619-6 12
10. Calvanese, D., Ghilardi, S., Gianola, A., Montali, M., Rivkin, A.: From model
    completeness to verification of data aware processes. In: Description Logic, Theory
    Combination, and All That – Essays Dedicated to Franz Baader on the Occasion
    of his 60th Birthday. Lecture Notes in Computer Science, vol. 11560, pp. 212–239.
    Springer (2019). https://doi.org/10.1007/978-3-030-22102-7 10
11. Calvanese, D., Ghilardi, S., Gianola, A., Montali, M., Rivkin, A.: SMT-
    based verification of data-aware processes: A model-theoretic approach.
    Mathematical Structures in Computer Science 30(3), 271–313 (2020).
    https://doi.org/10.1017/S0960129520000067
12. Calvanese, D., Gianola, A., Mazzullo, A., Montali, M.: SMT-Based Safety
    Verification of Data-Aware Processes under Ontologies (Extended Version).
    Tech. Rep. arXiv:2108.12330, arXiv.org e-Print archive (2021), available at
    https://arxiv.org/abs/2108.12330
13. Chang, C.C., Keisler, H.J.: Model Theory. North-Holland Publ. Co., 3rd edn.
    (1990)
14      D. Calvanese et al.

14. Claßen, J., Liebenberg, M., Lakemeyer, G., Zarrieß, B.: Exploring the boundaries
    of decidable verification of non-terminating golog programs. In: Proc. of the 28th
    AAAI Conf. on Artificial Intelligence (AAAI). pp. 1012–1019. AAAI Press (2014)
15. Damaggio, E., Deutsch, A., Vianu, V.: Artifact systems with data dependen-
    cies and arithmetic. ACM Trans. on Database Systems 37(3), 22 (2012).
    https://doi.org/10.1145/2338626.2338628
16. Deutsch, A., Hull, R., Patrizi, F., Vianu, V.: Automatic verification of data-centric
    business processes. In: Proc. of the 12th Int. Conf. on Database Theory (ICDT).
    pp. 252–267 (2009)
17. Deutsch,     A.,     Hull,   R.,    Vianu,     V.:   Automatic      verification    of
    database-centric      systems.    SIGMOD        Record      43(3),   5–17      (2014).
    https://doi.org/10.1145/2694428.2694430
18. Deutsch, A., Li, Y., Vianu, V.: Verification of hierarchical artifact sys-
    tems. ACM Trans. on Database Systems 44(3), 12:1–12:68 (2019).
    https://doi.org/10.1145/3321487
19. Ghilardi, S.: Model-theoretic methods in combined constraint satisfiability. J. of
    Automated Reasoning 33(3–4), 221–249 (2004)
20. Ghilardi, S., Gianola, A., Montali, M., Rivkin, A.: Petri nets with parameterised
    data - modelling and verification. In: Proc. of the 18th Int. Conf. on Business
    Process Management (BPM). Lecture Notes in Computer Science, vol. 12168, pp.
    55–74. Springer (2020). https://doi.org/10.1007/978-3-030-58666-9 4
21. Ghilardi, S., Ranise, S.: Backward reachability of array-based systems by SMT
    solving: Termination and invariant synthesis. Log. Methods Comput. Sci. 6(4)
    (2010). https://doi.org/10.2168/LMCS-6(4:10)2010
22. Ghilardi, S., Ranise, S.: MCMT: A model checker modulo theories. In: Giesl, J.,
    Hähnle, R. (eds.) Proc. of the 5th Int. Joint Conf. on Automated Reasoning (IJ-
    CAR). Lecture Notes in Computer Science, vol. 6173, pp. 22–29. Springer (2010).
    https://doi.org/10.1007/978-3-642-14203-1 3
23. Hodges, W.: Model Theory, Encyclopedia of Mathematics and its applications,
    vol. 42. Cambridge University Press (1993)
24. Hull, R.: Artifact-centric business process models: Brief survey of research results
    and challenges. In: Proc. of the 7th Int. Conf. on Ontologies, DataBases, and Ap-
    plications of Semantics (ODBASE). Lecture Notes in Computer Science, vol. 5332,
    pp. 1152–1163. Springer (2008). https://doi.org/10.1007/978-3-540-88873-4 17
25. Kontchakov, R., Zakharyaschev, M.: An introduction to description logics and
    query rewriting. In: Reasoning Web: Reasoning on the Web in the Big Data Era
    – 10th Int. Summer School Tutorial Lectures (RW). Lecture Notes in Computer
    Science, vol. 8714, pp. 195–244. Springer (2014). https://doi.org/10.1007/978-3-
    319-10587-1 5
26. de Leoni, M., Felli, P., Montali, M.: Strategy synthesis for data-aware dynamic sys-
    tems with multiple actors. In: Calvanese, D., Erdem, E., Thielscher, M. (eds.) Proc.
    of the 17th Int. Conf. on Principles of Knowledge Representation and Reasoning
    (KR). pp. 315–325 (2020). https://doi.org/10.24963/kr.2020/32
27. Montali, M., Rivkin, A., Ritter, D.: Formalizing integration patterns
    with multimedia data. In: Proc. of the 24th Int. Conf. on Enter-
    prise Distributed Object Computing (EDOC). pp. 67–76. IEEE (2020).
    https://doi.org/10.1109/EDOC49727.2020.00018
28. Reichert, M.: Process and data: Two sides of the same coin? In: Proc. of the On
    the Move Confederated Int. Conf. (OTM). Lecture Notes in Computer Science,
    vol. 7565, pp. 2–19. Springer (2012). https://doi.org/10.1007/978-3-642-33606-5 2
   SMT-Based Safety Verification of Data-Aware Processes under Ontologies           15

29. Rudolph, S.: Foundations of description logics. In: Reasoning Web: Semantic
    Technologies for the Web of Data – 7th Int. Summer School Tutorial Lectures
    (RW). Lecture Notes in Computer Science, vol. 6848, pp. 76–136. Springer (2011).
    https://doi.org/10.1007/978-3-642-23032-5 2
30. Vianu, V.: Automatic verification of database-driven systems: a new frontier.
    In: Proc. of the 12th Int. Conf. on Database Theory (ICDT). pp. 1–13 (2009).
    https://doi.org/10.1145/1514894.1514896
31. Zarrieß, B., Claßen, J.: Decidable verification of golog programs over non-local
    effect actions. In: Proc. of the 30th AAAI Conf. on Artificial Intelligence (AAAI).
    pp. 1109–1115. AAAI Press (2016)