SMT-Based Safety Verification of Data-Aware Processes under Ontologies (Preliminary Results) Diego Calvanese1,2 , Alessandro Gianola1 , Andrea Mazzullo1 , and Marco Montali1 1 KRDB Research Centre for Knowledge and Data Free University of Bozen-Bolzano, Italy surname @inf.unibz.it 2 Computing Science Department, Umeå University, Sweden Abstract. In the context of verification of data-aware processes, a for- mal approach based on satisfiability modulo theories (SMT) has been considered to verify parameterised safety properties of so-called artifact- centric systems. This approach requires a combination of model-theoretic notions and algorithmic techniques based on backward reachability. We introduce here a variant of one of the most investigated models in this spectrum, namely simple artifact systems (SASs), where, instead of managing a database, we operate over a description logic (DL) ontol- ogy expressed in (a slight extension of) RDFS. This DL, enjoying suit- able model-theoretic properties, allows us to define DL-based SASs to which backward reachability can still be applied, leading to decidability in PSpace of the corresponding safety problems. 1 Introduction Verifying and reasoning about dynamic systems that integrate processes and data is a long-standing challenge that attracted considerable attention, and that led to a flourishing series of results, within business process management [28,9,20] and data management [30,8,4,17,18]. Among the several conceptual models stud- ied in this area, data-centric systems and in particular artifact-centric systems have been brought forward as a principled approach where relevant (business) objects are elicited, then defining how actions evolve them throughout their life- cycle [24]. Different formal models have been proposed to capture artifact sys- tems and study their verification [8]. One of the most studied settings considers artifact systems as being composed of: a read-only database storing background information about artifacts that does not change during the evolution of the sys- tem; a working memory, used to store data that can be modified in the course of the evolution; and transitions (also called actions or services) that query the Copyright c 2021 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0). 2 D. Calvanese et al. read-only database and the working memory and use the retrieved answers to update the working memory. Verification of such systems is particularly challeng- ing, not only because the working memory in general evolves through infinitely many different configurations, but also because the desired verification proper- ties should hold regardless of the specific content of the read-only database, thus calling for a particular form of parameterised verification [15,18,10,11]. In this paper, we study for the first time semantic artifact systems where the read-only database is substituted by a Description Logic ontology, which stores background, incomplete information about the artifacts. In this setting, two possible notions of parameterisation may be studied: one where the evolution of the system is verified against all possible choices for the ABox, another where verification is against all possible models of a fixed ABox. In this work, we adopt the latter hypothesis, and thus verify whether the artifact system enjoys desired properties irrespectively of how the information explicitly provided by the ABox is completed through the TBox assertions. More in detail, we consider an extensively studied, basic model of such artifact-centric systems, called simple artifact system (SAS ) in [11], where the artifact working memory consists of a fixed set of artifact variables [16,15,11]. On top of this basis, we study the verification of safety properties in the case where the ontology is specified in (a slight extension of) RDFS [6], a schema language for the Semantic Web formalized by the W3C, and we make use of the ontology signature to express the transitions that update the working memory. For this setting, we show that we can decide safety properties in PSpace by relying on an SMT-based backward reachability procedure. In spirit, our approach is reminiscent of previous works studying the veri- fication of dynamic systems (in particular, Golog programs) operating over a DL ontology, such as [14,31]. In fact, both in their settings and ours, the dy- namic system evolves each model of the ontology, and verification properties are assessed over all the resulting evolutions. This is radically different from approaches where the ABox itself is evolved by the process, with an execution semantics following Levesque’s functional approach, in which query entailment over the current state is used to compute the successor states [3,5]. However, we differ from [14,31] in that our goal is not only to derive foundational results, but also to transfer such results into practical algorithms and thus obtain a model that is readily implementable by relying on a state-of-the-art model checker such as MCMT [22]. As customary in the formal literature on artifact-centric systems, our approach is based on actions that manipulate the artifact variables, coupled with condition-action rules that declaratively define which actions are currently executable, and with which parameters. Alternative choices could be seamlessly taken, by adapting approaches that rely on an explicit description of the control- flow, e.g., based on state machines [26] or Petri nets interpreted with interleaving semantics [20,27]. Full details are provided in the extended version of this paper [12]. SMT-Based Safety Verification of Data-Aware Processes under Ontologies 3 2 Preliminaries In this section, we first recall the syntax and semantics of first-order logic (FO). We then define the syntax of the DL RDFS+ considered in this paper, which is a slight extension of RDFS [6]. Its semantics is provided by means of the standard translation, mapping RDFS+ ontologies into equivalent sets of FO formulas. 2.1 First-Order Logic Preliminaries The alphabet of first-order logic (FO) consists of: countably infinite and pairwise disjoint sets NP of predicate symbols (with ar(P ) ∈ N being the arity of P ∈ NP ), NF of function symbols (with ar(f ) ∈ N being the arity of f ∈ NF ), NI of individual symbols (or individual names), and Var of variables; the equality symbol ‘=’; the Boolean operators ‘¬’ and ‘∧’; and the existential quantifier ‘∃’. An (FO) formula is an expression ϕ ::= P (t) | s = t | ¬ϕ | (ϕ ∧ ϕ) | ∃xϕ, where x ∈ Var, P ∈ NP , s, t are terms, and t = (t1 , . . . , tar(P ) ) is a (possibly empty) tuple of terms, where terms are defined inductively as follows: t ::= x | a | f (t), where x ∈ Var, a ∈ NI , f ∈ NF , and t = (t1 , . . . , tar(f ) ). A formula of the form P (t) is called an atom, and a literal has the form P (t) or ¬P (t). We adopt the usual abbreviations and conventions: in particular, ϕ ∨ ψ = ¬(¬ϕ ∧ ¬ψ) and ∀xϕ = ¬∃x¬ϕ, where ∀ is the universal quantifier. We write ϕ(x) to indicate that the free variables (defined as usual) of ϕ are included in x, and we write ϕ(a) for the formula obtained from ϕ(x) by substituting a to x. Similar notions and notation are adopted for terms. A sentence is defined as a formula without free variables, while we call quantifier-free a formula without any occurrence of existential or universal quantifiers. A formula is existential if it has the form ∃xϕ(x), where ϕ is a quantifier-free formula, and it is universal if it has the form ∀xϕ(x), where ϕ is quantifier-free. A (FO) theory T is a set of FO sentences, and T is said to be universal if every ϕ ∈ T is universal. A signature Σ is a subset of NP ∪ NF ∪ NI . For a set Γ of formulas, the signature of Γ , denoted ΣΓ , is the set of predicate, function, and individual symbols occurring in Γ . Given a signature Σ, we say that Γ is a set of Σ-formulas if ΣΓ = Σ (we will use Σ-formula, Σ-theory, etc., in an analogous way). An (FO) interpretation is a pair I = (∆I , ·I ), where ∆I is a non-empty set, called domain of I, and ·I is an interpretation function such that: P I ⊆ (∆I )ar(P ) , for every P ∈ NP; f I : (∆I )ar(f ) −→ ∆I , for every f ∈ NF ; and aI ∈ ∆I , for every a ∈ NI . An assignment in I is a function a : Var −→ ∆I . We define the value of a term t in I under a as follows: a(t) = a(x), if t = x; a(t) = aI , if t = a ∈ NI ; and a(t) = f I (a(t)), if t = f (t), where f ∈ NF and, for an m-tuple t = (t1 , . . . , tm ) of terms, we set a(t) = (a(t1 ), . . . , a(tm )). The notion of a formula ϕ being satisfied in an interpretation I under an assignment 4 D. Calvanese et al. a, or of I being a model of ϕ under a, written I |=a ϕ, is inductively defined as: I |=a P (t) iff a(t) ∈ P I , I |=a s = t iff a(s) = a(t), I |=a ¬ψ iff not I |=a ψ, I |= ψ ∧ χ iff I |=a ψ and I |=a χ, a 0 I |=a ∃xψ iff I |=a ψ for some a0 that can differ from a on x. For a formula ϕ(x), we write I |= ϕ[d] in place of I |=a ϕ(x), with a(x) = d. We say that a set Γ of formulas is satisfied in an interpretation I under an assignment a, or that I is a model of Γ under a, written I |=a Γ , if I |=a ϕ, for every ϕ ∈ Γ (we refer to a singleton Γ = {ϕ} simply as ϕ). For a sentence ϕ, the satisfaction of ϕ in I under a does not depend on a, thus we write I |= ϕ in place of I |=a ϕ, and we say that ϕ is satisfied in I. For a theory T , we say that T is satisfied in an interpretation I (or that I is a model of T ), written I |= T , if every sentence of T is satisfied in I. A formula ϕ is satisfiable w.r.t. T (or T -satisfiable) if there exist an interpretation I and an assignment a in I such that I |= T and I |=a ϕ. Moreover, we say that T logically implies a formula ϕ, or that ϕ is a logical consequence of T , written T |= ϕ, if, for every interpretation I and every assignment a in I, I |= T implies that I |=a ϕ. Finally, formulas ϕ, ψ are equivalent w.r.t. T (or T -equivalent) if T |= ϕ ↔ ψ. 2.2 Description Logics Preliminaries Let NC , NR , and NI be countably infinite and pairwise disjoint sets of concept, role, and individual names, respectively (with NC ∪ NR ⊆ NP , i.e., concept and role names are predicate symbols, with arity 1 and 2, respectively). The DL we consider here is an extension of RDFS [6] with disjointness be- tween concepts and roles, conjunction and (one-level) qualified existential quan- tification on the left-hand side of inclusions, and inclusion of direct and inverse roles. We denote such DL RDFS+ , and we define it below. In RDFS+ , concepts C and roles R are defined according to the grammar R ::= P | P − , C ::= A1 u · · · u An | ∃R.> | ∃R.A, where P ∈ NR , n ≥ 1, and A, A1 , . . . , An ∈ NC . A concept inclusion (CI) has the form C v A or C v ¬A, and a role inclusion (RI) has the form R v R0 or R v ¬R0 , where C is an RDFS+ concept, A ∈ NC , and R, R0 are roles. An RDFS+ TBox T is a finite set of CIs and RIs. An assertion has the form A(a), ¬A(a), P (a, b), ¬P (a, b), (a = b), or ¬(a = b), where A ∈ NC , P ∈ NR , and a, b ∈ NI . An ABox A is a finite set of assertions. (We point out that in an ABox we allow for negated assertions, which is a feature that is not always supported in DLs.) An RDFS+ ontology O is a pair (T , A), where T is a TBox and A is an ABox. We observe that RDFS+ is incomparable in expressive power with the DLs of the popular DL-Lite family [7,2]. Indeed, while DL-Lite allows for the use of SMT-Based Safety Verification of Data-Aware Processes under Ontologies 5 existential quantification on the right-hand side of CIs, these are ruled out in RDFS+ . On the other hand, in RDFS+ one can locally type the second com- ponent of a role through the use of qualified existential quantification on the left-hand side of CIs, while this is not possible in DL-Lite. As we will see later, differently from what happens for DL-Lite, the FO translation of an RDFS+ ontology is a universal theory. Example 1. To represent part of the domain knowledge on job hiring processes for university personnel, we define the RDFS+ ontology O = (T , A), where T consists of the following concept inclusions: AcademicPosition v JobPosition AcademicPosition v ¬AdminPosition AdminPosition v JobPosition User v ¬JobPosition ∃appliesFor.> v User ∃appliesFor− .> v JobPosition ∃suitableFor.> v User ∃suitableFor− .> v JobPosition ∃suitableFor.> v PositivelyEvaluated EligibleUser v User User u Graduate v EligibleUser EligibleUser v Graduate while A, which stores data on available job positions, contains the assertions AcademicPosition(professor123 ), AdminPosition(secretary123 ), AcademicPosition(researcher123 ), AdminPosition(secretary456 ). Moreover, we assume that A contains all the assertions of the form ¬A(u), ¬P (u, a) and ¬P (a, u), for a distinguished individual name u ∈ NI and every A, P, a ∈ ΣO , so that u can be used to represent an undefined value. The CIs of T formalise the following facts: there are both academic and administrative job positions and these are disjoint; users and job positions are disjoint; appliesFor relates users to job positions; to be suitable for something one has to be a user that is positively evaluated; the range of suitableFor is included in the extension of JobPositions; an eligible user is defined as a graduate user. / We define now the standard translation from RDFS+ expressions to FO for- mulas, which maps concepts to FO formulas with one free variable, and roles to FO formulas with two free variables. Specifically, the translation T generates formulas that contain just two variables x, y ∈ Var, and is defined as follows: T(A1 u · · · u An ) = A1 (x) ∧ · · · ∧ An (x), T(P ) = P (x, y), T(P − ) = P (y, x), T(∃R.>) = ∃yT(R), T(∃R.A) = ∃y(T(R) ∧ A(y)), T(¬A) = ¬T(A), T(¬R) = ¬T(R), where A, A1 , . . . , An are unary predicates and P is a binary predicate. Moreover, we map CIs and RIs into universal FO sentences in the following way: T(C v D) = ∀x(T(C) → T(D)), T(R v S) = ∀x∀y(T(R) → T(S)), where D stands Sfor either A or ¬A, and S stands for either R0 or ¬R0 . We also set T(T ) = β∈T {T(β)}. Assertions α are (identically) mapped into FO 6 D. Calvanese et al. literals S without free variables (i.e., ground ), as T(α) = α, and we set T(A) = α∈A {T(α)}. Finally, T(O) = T(T ) ∪ T(A). It is easy to see that the set of FO sentences obtained as the translation T(O) of an RDFS+ ontology O, can be equivalently rewritten into a universal Horn theory [25,23]. Such a theory, which we identify with T (O), can be obtained from T (O) by simply putting formulas into prenex normal form. The semantics for RDFS+ expressions can be given in terms of their FO translation [25]. For an interpretation I = (∆I , ·I ) and a concept C, we define the extension of C in I as C I = {d ∈ ∆I | I |= T(C)[d]}. Similarly, for a role R, we define its extension in I as RI = {(d, e) ∈ ∆I × ∆I | I |= T(R)[d, e]}. We say that C and R are satisfied in I if C I 6= ∅ and RI 6= ∅, respectively. Moreover, given a CI, RI, assertion, TBox, ABox, or ontology Γ , we say that Γ is satisfied in I (or that I is a model of Γ ), written I |= Γ , iff I |= T(Γ ). Given an ontology O and (a concept, role, CI, RI, or assertion mapped, via its FO translation, into) an FO formula ϕ, we say that ϕ is satisfiable w.r.t. O (or O-satisfiable) if there exists a model I of O that satisfies ϕ under some assignment in I. Finally, we say that O logically implies an FO formula ϕ, or that ϕ is a logical consequence of O, written O |= ϕ, if, for every model I of O and every assignment a in I, we have that I satisfies ϕ under a. 3 Basic Model-Theoretic Properties In this section, we prove the model-theoretic properties that will be used later on to develop our verification machinery. Specifically, we show here that the standard translation of the RDFS+ ontologies introduced in the previous section admits model completion, and has the constraint satisfiability problem decidable. These properties will allow us, in the subsequent sections, to verify suitably defined DL-based data-aware processes by employing a variant of the SMT-based backward reachability procedure introduced in [10]. To present our results, we require the following preliminary notions. A formula that is a conjunction of Σ-literals is called a Σ-constraint. Given a Σ-theory T , we define the constraint satisfiability problem for T as follows: given a formula ∃yϕ(x, y), where ϕ(x, y) is a Σ-constraint, decide whether ∃yϕ(x, y) is satisfiable w.r.t. T . A theory T has quantifier elimination iff, for every ΣT - formula ϕ(x), there exists a quantifier-free formula ψ(x) such that T |= ϕ(x) ↔ ψ(x). Finally, we will use the following definition of model completion, which is restricted to cover the case of universal theories (the ones considered in this work) and that is nonetheless known to be equivalent (for universal theories) to the usual one from model theory [13,19,10]. Let T be a universal Σ-theory and let T ∗ ⊇ T be a further Σ-theory. We say that T ∗ is a model completion of T iff: (i) every Σ-constraint satisfied in some model of T is also satisfied in some model of T ∗ ; (ii) T ∗ has quantifier elimination. We now state the main technical result of the section. SMT-Based Safety Verification of Data-Aware Processes under Ontologies 7 Theorem 2. Given an RDFS+ ontology O, T(O) is a finite universal FO theory that (i) has a decidable constraint satisfiability problem, and (ii) admits a model completion. Proof (Sketch). To prove Point (i), we reduce to RDFS+ (seen as a fragment of ALCHI, [29]) ontology satisfiability. For Point (ii), since there is no function symbol in ΣT (O) , it is sufficient to show that T (O) enjoys the amalgamation property: this is proved by explicitly constructing a T (O)-amalgam for every pair of models I1 and I2 of T (O) sharing a submodel I0 (see [12] for details). Remark 3. For every RDFS+ ontology O, the model completion T (O)∗ of T (O) admits quantifier elimination. The algorithm for quantifier elimination in T (O)∗ follows from the proof of Theorem 2: to eliminate ∃x from a ΣT (O) -formula ∃xϕ(x, y), it is sufficient to take the conjunction of the clauses χ(y) implied by ϕ(x, y), which are finitely many for T (O), up to T (O)-equivalence. This procedure is used in Algorithm 1 below and is crucial to get the decidability results of Theorem 6. / Properties (i) and (ii) from Theorem 2 are in line with the foundational framework of [10,11], where a third property is additionally assumed: the finite model property for constraint satisfiability (see the references for the definition). However, differently from [10,11], this property is not needed anymore for the results of our paper. This is an important difference from [10,11], since the artifact systems studied there require to interact with finite structures (i.e., databases), whereas in the context of the present work we admit that the models of the knowledge base of our artifact systems can be infinite. 4 Ontology-Based Data-Aware Processes In this section, we present our main contributions. We first define our model, called RDFS+ -based simple artifact systems, or RDFS+ -SASs for short, to for- malise data-aware processes under RDFS+ ontologies. These systems are a vari- ant of the artifact-centric systems studied in [10]. RDFS+ -SASs read data from a given RDFS+ ontology, used to store background information of the system, and manipulate individual variables, called artifact variables, which represent the current state of the process. We then study the parameterised safety prob- lems of such models by adopting a symbolic version [21,11] of the well-known backward reachability procedure [1]. 4.1 DL-Based Simple Artifact Systems We first require the following preliminary notions. For an RDFS+ ontology O, an O-partition Wn is a finite V set P = {κ1 (x), . . . , κn (x)} of ΣO -literals such that O |= ∀x i=1 κi (x) ∧ ∀x i6=j ¬(κi (x) ∧ κj (x)). Given an ontology O, an O- partition P = {κ1 (x), . . . , κn (x)}, and ΣO -terms t(x) = (t1 (x), . . . , tn (x)), (the value of) a case-defined function F based on P and t, for a fresh function symbol 8 D. Calvanese et al. F ∈ NF , is defined as follows: for every model I of O, every assignment a in I, and every tuple x of variables, a(F (x)) = a(ti (x)), if I |=a κi (x). In order to introduce verification problems in a symbolic setting, one first has to specify which formulas are used to represent (i) the sets of states, (ii) the system initialisations, and (iii) the system evolution. To capture these aspects, we provide the following definitions. An RDFS+ -based simple artifact system (RDFS+ -SAS) is a tuple Sm S = (O, x, ι(x), j=1 {τj (x, x0 )}), where m ∈ N, and – O = (T , A) is an RDFS+ ontology; – x = (x1 , . . . , xn ) is a tuple of variables, called artifact variables, and x0 is a tuple ofVvariables that are renamed copies of variables in x; n – ι(x) = i=1 xi = ai , with ai ∈ NI , is an initial state formula; Vn – τj (x, x0 ) = ∃y(γ j (x, y) ∧ i=1 x0i = Fij (x, y)), for 1 ≤ j ≤ m, is a transition formula, where γ j (x, y) is a conjunction of ΣO -literals called guard of τj , and x0i = Fij (x, y), where each Fij is a case-defined function based on some O-partition and list of ΣO -terms, is an update of τj . Given an RDFS+ ontology O, we call state (ΣO -)formula a quantifier-free ΣO -formula ϕ(x). A state formula constrains the content of the artifact variables characterising the current states of the systems. Notice that a state formula can represent a (possibly infinite) set of states, because of the presence of (possibly infinitely many) different elements in a model of the ontology O. A safety formula for S is a state ΣO -formula ν(x), used to describe the undesired states of the system. We say that S is safe w.r.t. ν(x) if there does not exist k ≥ 0 and a formula of the form ι(x0 ) ∧ τj0 (x0 , x1 ) ∧ · · · ∧ τjk−1 (xk−1 , xk ) ∧ ν(xk ), (?) that is satisfiable w.r.t. O, where 1 ≤ j0 , . . . , jk−1 ≤ m and each xh , with 0 ≤ h ≤ k, is a tuple of variables that are renamed copies of variables in x. The safety problem for S is the following decision problem: given a safety formula ν(x) for S, decide whether S is safe w.r.t. ν(x). This verification problem is parametric on the models of a fixed RDFS+ ontology, since safety is assessed irrespectively of the choice of such a model. This implies that, when the system is safe, it is so for every execution of the process under every possible model (which in principle are infinitely many) of the given ontology. Example 4. We develop a simplified job hiring process for university personnel based on the domain knowledge formalised in Example 1. Each application is created using a dedicated website portal, where users that are potentially inter- ested in applying need to register in advance. When a registered user decides to apply, the data created by this single application do not have to be stored persistently and thus can be maintained just by using artifact variables (de- scribed below) that can interact with the knowledge base. All these variables SMT-Based Safety Verification of Data-Aware Processes under Ontologies 9 are initialised with an undefined value u. In the first transition of the system, an application is created by a registered user, which falls into the extension of the concept User: the information about this user is then stored in the artifact variable xapplicant . At this point, the application website asks the user whether they hold a university degree: in case of an affermative answer, the website ac- cepts the user as eligible, the information about the user is stored using xapplicant and the process can progress. Then, the user picks up a job position (assigned to xjob ) and applies for it. The following steps of the process involve the evaluation of the application: for both academic and administrative positions, if the eligible candidate is suitable for the chosen position, they are declared winner (assigned to xwinner ), otherwise they are declared loser (assigned to xloser ). To formalise this S7 process, we define the RDFS+ -SAS S = (O, x, ι(x), j=1 {τj (x, x0 )}) so that: – the ontology O is the RDFS+ ontology given in Example 1; – the artifact variables are x = (xapplicant , xjob , xeligible , xwinner , xloser ); – the initial state formula is ι = (xapplicant = u) ∧ (xjob = u) ∧ (xeligible = u) ∧ (xwinner = u) ∧ (xloser = u); – the transition formulas are τ1 = ∃y1 (User(y1 ) ∧ x0applicant = y1 ), τ2 = EligibleUser(xapplicant ) ∧ x0eligible = xapplicant , τ3 = ∃z1 (JobPosition(z1 ) ∧ appliesFor(xeligible , z1 ) ∧ x0job = z1 ), τ4 = AcademicPosition(xjob ) ∧ suitableFor(xeligible , xjob ) ∧ x0winner = xeligible , τ5 = AdminPosition(xjob ) ∧ suitableFor(xeligible , xjob ) ∧ x0winner = xeligible , τ6 = AcademicPosition(xjob ) ∧ ¬suitableFor(xeligible , xjob ) ∧ x0loser = xeligible , τ7 = AdminPosition(xjob ) ∧ ¬suitableFor(xeligible , xjob ) ∧ x0loser = xeligible . An undesired situation of the system is the one where an applicant registered user is declared winner even if they were not eligible. This situation is formally described by the following safety formula for S: ν = User(xwinner ) ∧ ¬EligibleUser(xwinner ). / 4.2 Backward Search for RDFS+ -SASs Algorithm 1 shows the SMT-based backward reachability procedure (or backward search) for handling the safety problem for an RDFS+ -SAS S. An integral part of the algorithm is to compute symbolic preimages (Line 5). The intuition behind the algorithm is to execute a loop where, starting from the undesired states of the system (described by the safety formula ν(x)), the state space of the system is explored backward : in every iteration of the while loop (Line 2), the current set of states is regressed through transitions thanks to the preimage computation. 0 0 Wm for any τ (z, z ) and φ(z) (where z are For that purpose, renamed copies of z), we define τ := h=1 τh and Pre(τ, φ) as the formula ∃z 0 (τ (z, z 0 ) ∧ φ(z 0 )). Let φ(x) be a state formula, describing the state of the artifact variables x. The preimage of the set of states described by theWformula φ(x) is the setWof states described by Pre(τ, φ) (notice that, when τ = τ̂ , then Pre(τ, φ) = Pre(τ̂ , φ)). We recall that a state formula is a quantifier-free ΣO -formula. Unfortunately, because of 10 D. Calvanese et al. Algorithm 1: SMT-based backward reachability procedure Function BReach(ν) 1 φ ←− ν; B ←− ⊥; 2 while φ ∧ ¬B is T (O)-satisfiable do 3 if ι ∧ φ is T (O)-satisfiable then return (unsafe, unsafe trace of form (?)); 4 B ←− φ ∨ B; 5 φ ←− Pre(τ, φ); 6 φ ←− QE(T (O)∗ , φ); return safe; the presence of the existentially quantified variables y in τ , Pre(τ, φ) is not a state formula, in general. As stated in [10,11], if the quantified variables were not eliminated, we would break the regressability of the procedure: indeed, the states reached by computing preimages, intuitively described by Pre(τ, φ), need to be represented by a state formula φ0 in the new iteration of the while loop. In addition, the increase of the number of variables due to the iteration of the preimage computation would affect the performance of the satisfiability tests described below, in case the loop is executed many times. In order to solve these issues, it is essential to introduce the subprocedure QE(T (O)∗ , φ) in Line 6. QE(T (O)∗ , φ) in Line 6 is a subprocedure that implements the quantifier elimination algorithm of T (O)∗ and that converts the preimage Pre(τ, φ) of a state formula φ into a state formula (equivalent to it modulo the axioms of T (O)∗ ), so as to guarantee the regressability of the procedure: this conversion is possible since T (O)∗ eliminates from τh the existentially quantified variables y. Backward search computes iterated preimages of the safety formula ν, until a fixpoint is reached (in that case, S is safe w.r.t. ν) or until a set intersecting the initial states (i.e., satisfying ι) is found (in that case, S is unsafe w.r.t. ν). Inclusion (Line 2) and disjointness (Line 3) tests can be discharged via proof obligations to be handled by SMT solvers. The fixpoint is reached when the test in Line 2 returns unsat: the preimage of the set of the current states is included in the set of states reached by the backward search so far (represented as the iterated application of preimages to the safety formula ν). The test at Line 3 is satisfiable when the states visited so far by the backward search includes a possible initial state (i.e., a state satisfying ι). If this is the case, then S is unsafe w.r.t. ν. Together with the unsafe outcome, the algorithm also returns an unsafe trace of the form (?), explicitly witnessing the sequence of transitions τh that, starting from the initial configurations, lead the system to a set of states satisfying the undesired conditions described by ν(x). Theorem 5. Backward search (Algorithm 1) is correct for detecting whether an RDFS+ -SAS S is safe w.r.t. ν(x). Proof (Sketch). First, we require the following claim, which follows immediately from the definitions. SMT-Based Safety Verification of Data-Aware Processes under Ontologies 11 Claim 1. For every safety formula ν(x) for S and every k ≥ 0, a formula ϑ of the form (?) is satisfiable w.r.t. O iff ϑ is satisfiable w.r.t. T (O). Then, we need to show that, instead of considering satisfiability of formulas of the form (?) in models of T (O), we can concentrate on satisfiability w.r.t. T (O)∗ (T (O)∗ exists thanks to Property (ii) of Theorem 2). Then, by exploiting the algorithm for quantifier elimination in T (O)∗ described in Remark 3, formulas of the form (?) can be represented via backward search by using quantifier-free formulas. We finally conclude by noticing that safety/unsafety of S w.r.t ν(x) can be now detected invoking the satisfiability tests (which are effective thanks to Property (i) of Theorem 2) over those quantifier-free formulae. Backward search for generic artifact systems is not guaranteed to termi- nate [11]. However, in case S is unsafe w.r.t. ν(x), an unsafe trace—which is finite—is found after finitely many iterations of the while loop: hence, in the unsafe case, backward search must terminate. Together with the theorem above, this means that the backward reachability procedure is at least a semi-decision procedure for detecting unsafety of RDFS+ -SASs. Nevertheless, we show in the following theorem that, in case of RDFS+ -SASs, backward search always termi- nates: thus, it is a full decision procedure, for which we also provide a PSpace upper bound. Theorem S 6. For an RDFS+ ontology O and an RDFS+ -SAS S = m (O, x, ι(x), j=1 {τj (x, x0 )}), the safety problem for S is decidable in PSpace Sm in the combined size of x, ι(x) and j=1 {τj (x, x0 )}. Proof (Sketch). For every RDFS+ ontology O, there are only finitely many quantifier-free ΣT (O) -formulas, up to T (O)-equivalence, that can be built out of a finite set of variables x. Thanks to the availability of the quantifier elim- ination procedure QE(T (O)∗ , ϕ), the overall number of variables in ϕ is never increased. This implies that globally there are only finitely many quantifier-free ΣT (O) -formulas that Algorithm 1 needs to analyse. Hence, Algorithm 1 termi- nates. Concerning complexity, we first note that the translation T (O) requires polynomial time. Then, we need to eliminate the occurrences of case-defined functions (creating an equivalent SAS whose size is polynomial in the size of the original one), and to modify Algorithm 1 by making it nondeterministic with an NPSpace complexity. The claim follows by applying Savitch’s Theorem. We observe that Algorithm 1 is not yet implemented in the state-of-the-art model checker MCMT (Model Checker Modulo Theories [21]), which is based on SMT solving. Such an implementation, however, can be obtained by extend- ing MCMT with the quantifier elimination algorithm for T (O)∗ (described in Remark 3), required in Line 6, together with a procedure for RDFS+ ontology satisfiability (seen as a fragment of ALCHI, [29]), required in Lines 2 and 3. 12 D. Calvanese et al. 5 Conclusions We have studied the problem of verification of data-aware processes under RDFS+ ontologies, where the process component can interact with a knowledge base specified by means of the DL RDFS+ , underpinning the RDFS constructs. We addressed this problem by introducing a suitable model of DL-based artifact- centric systems, called RDFS+ -based SASs, and by leveraging the SMT-based version of the backward reachability procedure, which is a well-known technique to employ for verifying systems of this kind. Specifically, we showed that this procedure is a full decision procedure for detecting safety of RDFS+ -based SASs, and we also provided a PSpace complexity upper bound. This work opens several directions for future work. First, we notice that the choice of RDFS+ ontologies is not intrinsic to our approach. Indeed, motivated by conceptual modelling and data integration issues in OBDA applications, we are currently working on the DL-Lite family of DLs, to define suitable DL- Lite-based SASs with analogous decidability and complexity results. The main difference we have to account for is that, for a DL-Lite ontology O, we have an equisatisfiable (but not equivalent) translation into a universal one-variable FO sentence T (O), and Claim 1 in the proof of Theorem 5 has to be modified to show that a trace ϑ is satisfiable w.r.t. O iff a suitably translated trace ϑ̂ is satisfiable w.r.t. T (O). In general, nonetheless, we point out that any DL satisfying the two conditions stated in Theorem 2 can be chosen for our purposes, and that the same theoretical guarantees can be obtained over the SMT-based backward reachability procedure. As future work, we thus intend to introduce a more general framework for DL-based SASs that is able to account for different DLs. We also intend to extend the results obtained here to more sophisticated artifact- centric models, such as the relational artifact systems (RASs) studied in [10,11]. Moreover, it could be worth investigating in this setting also properties that go beyond safety, such as liveness and fairness. Acknowledgements This research has been partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wal- lenberg Foundation, by the Italian Basic Research (PRIN) project HOPE, by the EU H2020 project INODE (grant agreement 863410) by the CHIST-ERA project PACMEL, by the project IDEE (FESR1133) funded by the Eur. Reg. Development Fund (ERDF) Investment for Growth and Jobs Programme 2014- 2020, and by the Free University of Bozen-Bolzano through the projects KGID, GeoVKG, STyLoLa, VERBA and MENS. SMT-Based Safety Verification of Data-Aware Processes under Ontologies 13 References 1. Abdulla, P.A., Cerans, K., Jonsson, B., Tsay, Y.: General decidability the- orems for infinite-state systems. In: Proc. of the 11th Int. Conf. on Logic in Computer Science (LICS). pp. 313–321. IEEE Computer Society (1996). https://doi.org/10.1109/LICS.1996.561359 2. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family and relations. J. of Artificial Intelligence Research 36, 1–69 (2009). https://doi.org/10.1613/jair.2820 3. Bagheri Hariri, B., Calvanese, D., De Giacomo, G., De Masellis, R., Montali, M., Felli, P.: Verification of description logic Knowledge and Action Bases. In: Proc. of the 20th Eur. Conf. on Artificial Intelligence (ECAI). pp. 103–108 (2012) 4. Bagheri Hariri, B., Calvanese, D., De Giacomo, G., Deutsch, A., Montali, M.: Verification of relational data-centric dynamic systems with external services. In: Proc. of the 32nd ACM Symp. on Principles of Database Systems (PODS). pp. 163–174 (2013) 5. Bagheri Hariri, B., Calvanese, D., Montali, M., De Giacomo, G., De Masellis, R., Felli, P.: Description logic Knowledge and Action Bases. J. of Artificial Intelligence Research 46, 651–686 (2013). https://doi.org/10.1613/jair.3826 6. Brickley, D., Guha, R.: RDF Schema 1.1. W3C Recommendation, World Wide Web Consortium (2014), available at https://www.w3.org/TR/rdf-schema/ 7. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning 39(3), 385–429 (2007). https://doi.org/10.1007/s10817- 007-9078-x 8. Calvanese, D., De Giacomo, G., Montali, M.: Foundations of data aware process analysis: A database theory perspective. In: Proc. of the 32nd ACM Symp. on Principles of Database Systems (PODS). pp. 1–12 (2013). https://doi.org/10.1145/2463664.2467796 9. Calvanese, D., Ghilardi, S., Gianola, A., Montali, M., Rivkin, A.: Formal modeling and SMT-based parameterized verification of data-aware BPMN. In: Proc. of the 17th Int. Conf. on Business Process Management (BPM). Lec- ture Notes in Computer Science, vol. 11675, pp. 157–175. Springer (2019). https://doi.org/10.1007/978-3-030-26619-6 12 10. Calvanese, D., Ghilardi, S., Gianola, A., Montali, M., Rivkin, A.: From model completeness to verification of data aware processes. In: Description Logic, Theory Combination, and All That – Essays Dedicated to Franz Baader on the Occasion of his 60th Birthday. Lecture Notes in Computer Science, vol. 11560, pp. 212–239. Springer (2019). https://doi.org/10.1007/978-3-030-22102-7 10 11. Calvanese, D., Ghilardi, S., Gianola, A., Montali, M., Rivkin, A.: SMT- based verification of data-aware processes: A model-theoretic approach. Mathematical Structures in Computer Science 30(3), 271–313 (2020). https://doi.org/10.1017/S0960129520000067 12. Calvanese, D., Gianola, A., Mazzullo, A., Montali, M.: SMT-Based Safety Verification of Data-Aware Processes under Ontologies (Extended Version). Tech. Rep. arXiv:2108.12330, arXiv.org e-Print archive (2021), available at https://arxiv.org/abs/2108.12330 13. Chang, C.C., Keisler, H.J.: Model Theory. North-Holland Publ. Co., 3rd edn. (1990) 14 D. Calvanese et al. 14. Claßen, J., Liebenberg, M., Lakemeyer, G., Zarrieß, B.: Exploring the boundaries of decidable verification of non-terminating golog programs. In: Proc. of the 28th AAAI Conf. on Artificial Intelligence (AAAI). pp. 1012–1019. AAAI Press (2014) 15. Damaggio, E., Deutsch, A., Vianu, V.: Artifact systems with data dependen- cies and arithmetic. ACM Trans. on Database Systems 37(3), 22 (2012). https://doi.org/10.1145/2338626.2338628 16. Deutsch, A., Hull, R., Patrizi, F., Vianu, V.: Automatic verification of data-centric business processes. In: Proc. of the 12th Int. Conf. on Database Theory (ICDT). pp. 252–267 (2009) 17. Deutsch, A., Hull, R., Vianu, V.: Automatic verification of database-centric systems. SIGMOD Record 43(3), 5–17 (2014). https://doi.org/10.1145/2694428.2694430 18. Deutsch, A., Li, Y., Vianu, V.: Verification of hierarchical artifact sys- tems. ACM Trans. on Database Systems 44(3), 12:1–12:68 (2019). https://doi.org/10.1145/3321487 19. Ghilardi, S.: Model-theoretic methods in combined constraint satisfiability. J. of Automated Reasoning 33(3–4), 221–249 (2004) 20. Ghilardi, S., Gianola, A., Montali, M., Rivkin, A.: Petri nets with parameterised data - modelling and verification. In: Proc. of the 18th Int. Conf. on Business Process Management (BPM). Lecture Notes in Computer Science, vol. 12168, pp. 55–74. Springer (2020). https://doi.org/10.1007/978-3-030-58666-9 4 21. Ghilardi, S., Ranise, S.: Backward reachability of array-based systems by SMT solving: Termination and invariant synthesis. Log. Methods Comput. Sci. 6(4) (2010). https://doi.org/10.2168/LMCS-6(4:10)2010 22. Ghilardi, S., Ranise, S.: MCMT: A model checker modulo theories. In: Giesl, J., Hähnle, R. (eds.) Proc. of the 5th Int. Joint Conf. on Automated Reasoning (IJ- CAR). Lecture Notes in Computer Science, vol. 6173, pp. 22–29. Springer (2010). https://doi.org/10.1007/978-3-642-14203-1 3 23. Hodges, W.: Model Theory, Encyclopedia of Mathematics and its applications, vol. 42. Cambridge University Press (1993) 24. Hull, R.: Artifact-centric business process models: Brief survey of research results and challenges. In: Proc. of the 7th Int. Conf. on Ontologies, DataBases, and Ap- plications of Semantics (ODBASE). Lecture Notes in Computer Science, vol. 5332, pp. 1152–1163. Springer (2008). https://doi.org/10.1007/978-3-540-88873-4 17 25. Kontchakov, R., Zakharyaschev, M.: An introduction to description logics and query rewriting. In: Reasoning Web: Reasoning on the Web in the Big Data Era – 10th Int. Summer School Tutorial Lectures (RW). Lecture Notes in Computer Science, vol. 8714, pp. 195–244. Springer (2014). https://doi.org/10.1007/978-3- 319-10587-1 5 26. de Leoni, M., Felli, P., Montali, M.: Strategy synthesis for data-aware dynamic sys- tems with multiple actors. In: Calvanese, D., Erdem, E., Thielscher, M. (eds.) Proc. of the 17th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR). pp. 315–325 (2020). https://doi.org/10.24963/kr.2020/32 27. Montali, M., Rivkin, A., Ritter, D.: Formalizing integration patterns with multimedia data. In: Proc. of the 24th Int. Conf. on Enter- prise Distributed Object Computing (EDOC). pp. 67–76. IEEE (2020). https://doi.org/10.1109/EDOC49727.2020.00018 28. Reichert, M.: Process and data: Two sides of the same coin? In: Proc. of the On the Move Confederated Int. Conf. (OTM). Lecture Notes in Computer Science, vol. 7565, pp. 2–19. Springer (2012). https://doi.org/10.1007/978-3-642-33606-5 2 SMT-Based Safety Verification of Data-Aware Processes under Ontologies 15 29. Rudolph, S.: Foundations of description logics. In: Reasoning Web: Semantic Technologies for the Web of Data – 7th Int. Summer School Tutorial Lectures (RW). Lecture Notes in Computer Science, vol. 6848, pp. 76–136. Springer (2011). https://doi.org/10.1007/978-3-642-23032-5 2 30. Vianu, V.: Automatic verification of database-driven systems: a new frontier. In: Proc. of the 12th Int. Conf. on Database Theory (ICDT). pp. 1–13 (2009). https://doi.org/10.1145/1514894.1514896 31. Zarrieß, B., Claßen, J.: Decidable verification of golog programs over non-local effect actions. In: Proc. of the 30th AAAI Conf. on Artificial Intelligence (AAAI). pp. 1109–1115. AAAI Press (2016)