=Paper=
{{Paper
|id=None
|storemode=property
|title=Semantically-Governed Data-Aware Processes
|pdfUrl=https://ceur-ws.org/Vol-861/KiBP2012_paper_1.pdf
|volume=Vol-861
|dblpUrl=https://dblp.org/rec/conf/kr/CalvaneseGLMS12
}}
==Semantically-Governed Data-Aware Processes==
Semantically-Governed Data-Aware Processes Diego Calvanese1 , Giuseppe De Giacomo2 , Domenico Lembo2 , Marco Montali1 , and Ario Santoso1 1 Free University of Bozen-Bolzano, lastname@inf.unibz.it 2 Sapienza Università di Roma, lastname@dis.uniroma1.it Abstract. In this paper we consider processes that run over data stored in a relational database. Our setting is that of ontology-based data access (OBDA), where the information in the database is conceptually represented as an ontology and is declaratively mapped to it through queries. We are interested in verifying temporal logic formulas on the evolution of the information at the conceptual level, taking into account the knowledge present in the ontology, which allows for deducing information that is only implicitly available. Specifically, we show how, building on first-order rewritability of queries over the system state that is typical of ontology languages for OBDA, we are able to reformulate the temporal properties into temporal properties expressed over the underlying database. This allows us adopt notable decidability results on verification of evolving databases that have been established recently. 1 Introduction Recent work in business processes, services and databases brought the necessity of considering both data and processes simultaneously while designing the system. This holistic view of considering data and processes together has given rise to a line of research under the name of artifact-centric business processes [16, 14, 19, 1] that aims at avoiding the notorious discrepancy of traditional approaches where these aspects are considered separately [7]. Recently, interesting decidability results for verification of temporal properties over such systems have been obtained in the context of so-called Data-centric Dynamic Systems (DCDSs) based on relational technology [12, 6, 4, 5]. In a DCDS, processes operate over the data of the system and evolve it by executing actions that may issue calls to external services. The data returned by such external services is injected into the system, effectively making it infinite state. There has been also some work on a form of DCDS based on ontologies, where the data layer is represented in a rich ontology formalism, and actions perform a form of instance level update of the ontology [3]. The use of an ontology allows for a high-level conceptual view of the data layer that is better suited for a business level treatment of the manipulated information. Here we introduce Semantically-Governed Data-Aware Processes (SGDAP), in which we merge these two approaches by enhancing a relational layer constituted by a DCDS based system, with an ontology, constituting a semantic layer. The ontology captures the domain in which the SGDAP is executed, and allows for seeing the data and their manipulation at a conceptual level through an ontology-based data access (OBDA) system [8, 18]. Hence it provides us with a way of semantically governing 22 D. Calvanese et al. the underlying DCDS. Specifically, an SGDAP is constituted by two main components: (i) an OBDA system [8] which includes (the intensional level of) an ontology, a relational database schema, and a mapping between the ontology and the database; (ii) a process component, which characterizes the evolution of the system in terms of a process specifying preconditions and effects of action execution over the relational layer. The ontology is represented through a Description Logic (DL) TBox [2], expressed in a lightweight ontology language of the DL-Lite family [10], a family of DLs specifically designed for efficiently accessing to large amounts of data. The mapping is defined in terms of a set of assertions, each relating an arbitrary (SQL) query over the relational layer to a set of atoms whose predicates are the concepts and roles of the ontology, and whose arguments are terms built using specific function symbols applied to the answer variables of the SQL query. Such mappings specify how to populate the elements of the ontology from the data in the database, and function symbols are used to construct (abstract) objects (object terms) from the concrete values retrieved from the database. When an SGDAP evolves, each snapshot of the system is characterized by a database instance at the relational layer, and by a corresponding virtual ABox, which together with the TBox provides a conceptual view of the relational instance at the semantic layer. When the system is progressed by the process component, we assume that at every time the current instance can be arbitrarily queried, and can be updated through action executions, possibly involving external service calls to get new values from the environment. Hence the process component relies on three main notions: actions, which are the atomic progression steps for the data layer; external services, which can be called during the execution of actions; and a process, which is essentially a non-deterministic program that uses actions as atomic instructions. During the execution, the snapshots of the relational layer can be virtually mapped as ABoxes in the semantic layer. This enables to: (i) understand the evolution of the system at the conceptual level, and (ii) govern it at the semantic level, rejecting those actions that, executed at the relational layer, would lead to a new semantic snapshot that is inconsistent with the semantic layer’s TBox. In this work, we are interested in verifying dynamic properties specified in a variant of µ-calculus [15], one of the most powerful temporal logics, expressed over the semantic layer of an SGDAP. We consider properties expressed as µ-calculus formulae whose atoms are queries built over the semantic layer. By relying on techniques for query an- swering in DL-Lite OBDA systems, which exploit FOL rewritability of query answering and of ontology satisfiability, we reformulate the temporal properties expressed over the semantic layer into analogous properties over the relational layer. Given that our systems are in general infinite-state, verification of temporal properties is undecidable. However, we show how we can adapt to our setting recent results on the decidability of verification of DCDSs based on suitable finite-state abstractions [5]. 2 Preliminaries In this section we introduce the description logic (DL) DL-LiteA,id and describe the ontology-based data access (OBDA) framework. DL-LiteA,id [11, 8] allows for specifying concepts, representing sets of objects, roles, representing binary relations between objects, and attributes, representing binary rela- tions between objects and values. The syntax of concept, role and attribute expressions Semantically-Governed Data-Aware Processes 23 in DL-LiteA,id is as follows: B −→ N | ∃R | δ(U ) R −→ P | P − Here, N , P , and U respectively denote a concept name, a role name, and an attribute name, P − denotes the inverse of a role, and B and R respectively denote basic concepts and basic roles. The concept ∃R, also called unqualified existential restriction, denotes the domain of a role R, i.e., the set of objects that R relates to some object. Similarly, the concept δ(U ) denotes the domain of an attribute U , i.e., the set of objects that U relates to some value. Note that we consider here a simplified version of DL-LiteA,id where we distinguish between between objects and values, but do not further deal with different datatypes; similarly, we consider only a simplified version of identification assertions. A DL-LiteA,id ontology is a pair (T , A), where T is a TBox, i.e., a finite set of TBox assertions, and A is an Abox, i.e., a finite set of ABox assertions. DL-LiteA,id TBox assertions have the following form: B1 v B2 R1 v R2 U1 v U2 B1 v ¬B2 R1 v ¬R2 U1 v ¬U2 (id B Z1 , . . . , Zn ) (funct R) (funct U ) From left to right, assertions of the first row denote inclusions between basic concepts, basic roles, and attributes; assertions of the second row denote disjointness between basic concepts, basic roles, and attributes; assertions of the last row denote identification (assertions) (IdA), and global functionality on roles and attributes. In the IdA, each Zi denotes either an attribute or a basic role. Intuitively, an IdA of the above form asserts that for any two different instances o, o0 of B, there is at least one Zi such that o and o0 differ in the set of their Zi -fillers, that is the set of objects (if Zi is a role) or values (if Zi is an attribute) that are related to o by Zi . As usual, in DL-LiteA,id TBoxes we impose that roles and attributes occurring in functionality assertions or IdAs cannot be specialized (i.e., they cannot occur in the right-hand side of inclusions). DL-LiteA,id ABox assertions have the form N (t1 ), P (t1 , t2 ), or U (t1 , v1 ), where t1 and t2 denote individual objects and v1 denotes a value. The semantics of DL-LiteA,id is given in [11]. We only recall here that we interpret objects and values over distinct domains, and that for both we adopt the Unique Name Assumption, i.e., different constants denote different objects (or values). The notions of entailment, satisfaction, and model are as usual [11]. We also say that A is consistent wrt T if (T , A) is satisfiable, i.e., admits at least one model. Next we introduce queries. As usual (cf. OWL 2), answers to queries are formed by terms denoting individuals appearing in the ABox. The domain of an ABox A, denoted by ADOM(A), is the (finite) set of terms appearing in A. A union of conjunctive queries (UCQ) q over a TBox T is a FOL formula of the form ∃~y1 .conj 1 (~x, ~y1 ) ∨ · · · ∨ ∃~yn .conj n (~x, ~yn ), with free variables ~x and existentially quantified variables ~y1 , . . . , ~yn . Each conj i (~x, y~i ) in q is a conjunction of atoms of the form N (z), P (z, z 0 ), U (z, z 0 ) where N , P and U respectively denote a concept, role and attribute name of T , and z, z 0 are constants in a set C or variables in ~x or y~i , for some i ∈ {1, . . . , n}. The (certain) answers to q over an ontology (T , A) is the set ans (q, T , A) of substitutions3 3 As customary, we can view each substitution simply as a tuple of constants, assuming some ordering of the free variables of q. 24 D. Calvanese et al. σ of the free variables of q with constants in ADOM(A) such that qσ evaluates to true in every model of (T , A). If q has no free variables, then it is called boolean, and its certain answers are true or false. Computing ans (q, T , A) of a UCQ q over a DL-LiteA,id ontology (T , A) is in AC 0 in the size of A [11]. This is actually a consequence of the fact that DL-LiteA,id enjoys the FOL rewritability property, which in our setting says that for every UCQ q, ans (q, T , A) can be computed by evaluating the UCQ REW(q, T ) over A considered as a database. REW(q, T ) is the so-called perfect reformulation of q w.r.t. T [11]. We also recall that, in DL-LiteA,id , ontology satisfiability is FOL rewritable. In other words, we can construct a boolean FOL query qunsat (T ) that evaluates to true over an ABox A iff the ontology (T , A) is unsatisfiable. In our framework, we consider an extension of UCQs, called ECQs, which are queries of the query language EQL-Lite(UCQ) [9]. Formally, an ECQ over a TBox T is a possibly open domain independent formula of the form: Q −→ [q] | ¬Q | Q1 ∧ Q2 | ∃x.Q | x = y where q is a UCQ over T and [q] denotes that q is evaluated under the (minimal) knowledge operator (cf. [9]). To compute the certain answers ANS (Q, T , A) to an ECQ Q over an ontology (T , A), we can compute the certain answers over (T , A) of each UCQ embedded in Q, and evaluate the first-order part of Q over the relations obtained as the certain answers of the embedded UCQs. Hence, also computing ANS (Q, T , A) of an ECQ Q over a DL-LiteA,id ontology (T , A) is in AC 0 in the size of A [9]. Ontology-Based Data Access (OBDA). In an OBDA system, a relational database is connected to an ontology that represents the domain of interest by a mapping, which relates database values with values and (abstract) objects in the ontology (c.f. [8]). In particular, we make use of a countably infinite set V of values and a set Λ of function symbols, each with an associated arity. We also define the set C of constants as the union of V and the set {f (d1 , . . . , dn ) | f ∈ Λ and d1 , . . . , dn ∈ V} of object terms. Formally, an OBDA system is a structure O = hR, T , Mi, where: (i) R = {R1 , . . . , Rn } is a database schema, constituted by a finite set of relation schemas; (ii) T is a DL-LiteA,id TBox; (iii) M is a set of mapping assertions, each of the form: Φ(~x) ; Ψ (~y , ~t), where: (a) ~x is a non-empty set of variables, (b) ~y ⊆ ~x, (c) ~t is a set of object terms of the form f (~z), with f ∈ Λ and ~z ⊆ ~x, (d) Φ(~x) is an arbitrary SQL query over D, with ~x as output variables, and (e) Ψ (~y , ~t) is a conjunctive query over T of arity n > 0 without non-distinguished variables, whose atoms are over the variables ~y and the object terms ~t. Example 1. As a running example, we consider a simple university information system that stores and manipulates data concerning students and their degree. In particular, we define an OBDA system O = hR, T , Mi to capture the conceptual schema of such a domain, how data are concretely maintained in a relational database, and how the two information levels are linked through mappings. The conceptual schema is depicted in Figure 1, and formalized as the following DL-LiteA,id TBox T : Bachelor v Student δ(MNum) v Student (funct MNum) Master v Student Student v δ(MNum) (id Student MNum) Graduated v Student The conceptual schema states that Bachelor and Master are subclasses of Student, that some Students could be already Graduated, and that MNum (representing the matriculation number) is Semantically-Governed Data-Aware Processes 25 Bachelor Student Graduated mNum: String Master Fig. 1. UML conceptual schema for our running example. an attribute relating individuals of type Student (domain of the attribute) to corresponding Codes (range of the attribute). The conceptual schema also expresses that each Student has exactly one matriculation number, and we assume that matriculation numbers can be used to identify Students (i.e., each MNum is associated to at most one Student). Data related to students are maintained in a concrete underlying data source that obeys the database schema R, constituted by the following relation schemas: (i) ENROLLED(id, name, surname, type, endDate) stores information about students that are currently (endDate=NULL) or were enrolled in a bachelor (type="Bachelor") or master (type="Master") course. (ii) GRAD(id, mark, type) stores data of former students who have been graduated. (iii) TRANSF M(name, surname) is a temporary relation used to maintain information about master students that have been recently transferred from another university, and must still complete the enrollment process. The interconnection between the database schema R and the conceptual schema T is specified through the following set M of mappings: m1 : SELECT name, surname, type FROM ENROLLED WHERE type ="Bachelor" ; Bachelor(stu1 (name, surname, type)) m2 : SELECT name, surname, type FROM ENROLLED WHERE type ="Master" ; Master(stu1 (name, surname, type)) m3 : SELECT name, surname, type, id FROM ENROLLED ; MNum(stu1 (name, surname, type), val(id)) m4 : SELECT name, surname FROM TRANSF M ; Master(stu1 (name, surname, "Master")) m5 : SELECT e.name, e.surname, e.type FROM ENROLLED e, GRAD g WHERE e.id = g.id ; Graduated(stu1 (name, surname, type)) Intuitively, m1 (m2 resp.) maps every id in ENROLLED with type "Bachelor" ("Master") to a bachelor (master) student. Such a student is constructed by “objectifying” the name, surname and course type using variable term stu1 /3. In m3 , the MNum attribute is instead created using directly the value of id to fill in the target of the attribute. Notice the use of the val function symbol for mapping id to the range of MNum. Mapping m4 leads to create further master students by starting from the temporary TRANSF M table. Since such students are not explicitly associated to course type, but it is intended that they are "Master", objectification is applied to students’ name and surname, adding "Master" as a constant in the variable term. Notice that, according to the TBox T , such students have a matriculation number, but its value is not known (and, in fact, no mapping exists to generate their MNum attribute). Finally, m5 generates graduated students by selecting only those students in the ENROLLED table whose matriculation number is also contained in the GRAD table. u t Given a database instance D made up of values in V and conforming to schema R, and given a mapping M, the virtual ABox generated from D by a mapping assertion m = Φ(x) ; Ψ (y, t) in M is m(D) = v∈eval(Φ,D) Ψ [x/v], where eval (Φ, D) S denotes the evaluation of the SQL query Φ over D, and where we consider Ψ [x/v] to be a set of atoms (as opposed to Sa conjunction). Then, the ABox generated from D by the mapping M is M(D) = m∈M m(D). Notice that ADOM(M(D)) ⊆ C. As for ABoxes, the active domain ADOM(D) of a database instance D is the set of values occurring in D. Notice that ADOM(D) ⊆ V. Given an OBDA system O = hR, T , Mi and a database instance D for R, a model for O wrt D is a model of the ontology (T , M(D)). We say that O wrt D is satisfiable if it admits a model wrt D. 26 D. Calvanese et al. Example 2. Consider a database instance D = {ENROLLED(123, john, doe, Bachelor, NULL)}. The corresponding virtual ABox obtained from the application of the mapping M is M(D) = {Bachelor(stu1 (john, doe, Bachelor)), MNum(stu1 (john, doe, Bachelor), val(123))}. u t An UCQ q over an OBDA system O = hR, T , Mi is simply an UCQ over T . To compute the certain answers of q over O wrt a database instance D for R, we follow a three-step approach: (i) q is rewritten to compile away T , obtaining qr = REW(q, T ); (ii) the mapping M is used to unfold qr into a query over R, denoted by UNFOLD(qr , M), which turns out to be an SQL query [17]; (iii) such a query is executed over D, obtaining the certain answers. For an ECQ, we can proceed in a similar way, applying the rewriting and unfolding steps to the embedded UCQs. It follows that computing certain answers to UCQs/ECQs in an OBDA system is FOL rewritable. Applying the unfolding step to qunsat (T ), we obtain also that satisfiability in O is FOL rewritable. 3 Semantically-Governed Data-Aware Processes A Semantically-Governed Data-Aware Process (SGDAP) S = hO, P, D0 i is formed by an OBDA System O = hR, T , Mi by a process component P, and by an initial database instance D0 that conforms to the relational schema R in O. Intuitively, the OBDA system keeps all the data of interest, while the process component modifies and evolves such data, starting from the initial database D0 . The process component P constitutes the progression mechanism for the SGDAP. Formally, P = hF, A, πi, where: (i) F is a finite set of functions representing calls to external services, which return values; (ii) A is a finite set of actions, whose execution progresses the data layer, and may involve external service calls; (iii) π is a finite set of condition-action rules that form the specification of the overall process, which tells at any moment which actions can be executed. An action α ∈ A has the form α(p1 , . . . , pn ) : {e1 , . . . , em }, where: (i) α(p1 , . . . , pn ) is the signature of the action, constituted by a name α and a sequence p1 , . . . , pn of input parameters that need to be substituted with values for the execution of the action, and (ii) {e1 , . . . , em } is a set of effect specifications, whose specified effects are assumed to take place simultaneously. Each ei has the form qi+ ∧ Q− i Ei , where: (a) qi+ ∧Q− i is a query over R whose terms are variables ~ x , action parameters, and constants from ADOM(D0 ). The query qi+ is a UCQ, and the query Q− i is an arbitrary FO formula whose free variables are included in those of qi+ . Intuitively, qi+ selects the tuples to instantiate the effect, and Q− i filters away some of them. (b) Ei is the effect, i.e., a set of facts for R, which includes as terms: terms in ADOM(D0 ), input parameters, free variables of qi+ , and in addition Skolem terms formed by applying a function f ∈ F to one of the previous kinds of terms. Such Skolem terms involving functions represent external service calls and are interpreted so as to return a value chosen by an external user/environment when executing the action. The process π is a finite set of condition-action rules Q 7→ α, where α is an action in A and Q is a FO query over R whose free variables are exactly the parameters of α, and whose other terms can be quantified variables or values in ADOM(D0 ). Example 3. Consider the OBDA system O defined in Example 1. We now define a process component P = hF, A, πi over the relational schema R of O, so as to obtain a full SGDAP. Semantically-Governed Data-Aware Processes 27 In particular, π is constituted by the following condition-action rules (’ ’ denotes existentially quantified variables that are not used elsewhere): – ENROLLED(id, , , , NULL) GRADUATE(id) – TRANSF M(name, surname) COMPL - ENR (name, surname) The first rule extracts a matriculation number id of a currently enrolled student (endDate=NULL) from the ENROLLED relation and graduates the student, whereas the sec- ond rule selects a pair name surname in TRANSF M and use them to complete the enrollment of that student. In order to be effectively executed, the involved actions rely on the following set F of service calls: (i) today() returns the current date; (ii) getMark(id, type) returns the final mark received by student id; (iii) getID(name, surname, type) returns the matriculation number for the name-surname pair of a student. The two actions GRADUATE and COMPL - ENR are then defined as follows: GRADUATE(id) : { GRAD(id2 , m, t) GRAD(id2 , m, t), TRANSF M(n, s) TRANSF M(n, s), ENROLLED(id2 , n, s, t, d) ∧ id2 6= id ENROLLED(id2 , n, s, t, d), ENROLLED(id, n, s, t, NULL) ENROLLED(id, n, s, t, today()), ENROLLED(id, , , t, NULL) GRAD(id, getMark(id, t), t) }; COMPL - ENR (n, s) : { GRAD(id, m, t) GRAD(id, m, t), ENROLLED(id, n2 , s2 , t, d) ENROLLED(id, n2 , s2 , t, d), TRANSF M(n2 , s2 ) ∧ (n2 6= n ∨ s2 6= s) TRANSF M(n2 , s2 ), TRANSF M(n, s) ENROLLED(getID(n, s, "Master"), n, s, "Master", NULL)} Given a matriculation number id, action GRADUATE inserts a new tuple for id in GRAD, updating at the same time the enrollment’s end date for id in ENROLLED to the current date, while keeping all other entries in TRANSF M, GRAD and ENROLLED. Given a name and surname, action COMPL - ENR has the effect of moving the corresponding tuple in TRANSF M to a new tuple in ENROLLED, for which the matriculation number is obtained by interacting with the getID service call; all other entries TRANSF M, GRAD and ENROLLED are preserved. u t 4 Semantics of SGDAP This work focuses on the semantics of SGDAP assuming that external services behave nondeterministically, i.e., two calls of a service with the same arguments may return different results during the same run. This captures both services that model a truly nondeterministic process (e.g., human operators), and services that model stateful servers. Let S = hO, P, D0 i be a SGDAP where O = hR, T , Mi and P = hF, A, πi. The semantics of S is defined in terms of a possibly infinite transition system (TS), which represents all possible computations that the process component can do over the data starting from D0 . We start by defining the semantics of action execution. Let α be an action in A of the form α(~ p) : {e1 , . . . , en } with effects ei = qi+ ∧ Q− i Ei , and let σ be a substitution of p~ with values in V. The evaluation of the effects of α on a database instance D using a substitution σ is captured by the following function: S S DO (D, α, σ) = qi+ ∧Q− Ei in α θ∈ANS ((qi+ ∧Q− Ei σθ i i )σ,D) which returns a database instance made up of values in V and Skolem terms represent- ing service calls. We denote with CALLS(DO (D, α, σ)) such service calls, and with EVALS (D, α, σ) the set of substitutions that replace these service calls with values in V: EVALS (D, α, σ) = {θ | θ : CALLS(DO (D, α, σ)) → V is a total function}. 28 D. Calvanese et al. We then say that the database instance D0 over V and conforming to R is produced from D by the application of action α using substitution σ if D0 = DO (D, α, σ)θ, where θ ∈ EVALS (D, α, σ). Relational Layer Transition System (RTS). Let S = hO, P, D0 i be a SGDAP with O = hR, T , Mi. The RTS ΥSR of S is formally defined as hR, Σ, s0 , db, ⇒i, where Σ is a (possibly infinite) set of states, s0 is the initial state, db is a total function from states in Σ to database instances made up of values in V and conforming to R, and ⇒⊆ Σ × Σ is a transition relation. Σ, ⇒ and db are defined by simultaneous induction as the smallest sets such that s0 ∈ Σ, with db(s0 ) = D0 , and satisfying the following property: Given s ∈ Σ, for each condition-action rule Q(~ p) 7→ α(~ p) ∈ π, for each substitution σ of p~ such that σ ∈ ANS (Q, D), consider every database instance D0 produced from D by the application of α using σ. Then: (i) if there exists s0 ∈ Σ such that db(s0 ) = D0 , then s ⇒ s0 ; (ii) otherwise, if O is satisfiable wrt D0 , then s0 ∈ Σ, s ⇒ s0 and db(s0 ) = D0 , where s0 is a fresh state. We observe that the satisfiability check done in the last step of the RTS construction accounts for semantic governance. Semantic Layer Transition System (STS). Given a SGDAP S with O = hR, T , Mi and with RTS ΥSR = hR, Σ, s0 , db, ⇒i, the STS ΥSS of S is a “virtualization” of the RTS in the semantic layer. In particular, ΥSS maintains the structure of ΥSR unaltered, reflecting that the process component is executed over the relational layer, but it associates each state to a virtual ABox obtained from the application of the mapping M to the database instance associated by ΥSR to the same state. Formally, ΥSS = hT , Σ, s0 , abox, ⇒i, where abox is a total function from Σ to ABoxes made up of individual objects in C and conforming to T , such that for each s ∈ Σ with db(s) = D, abox(s) = M(D). 5 Dynamic Constraints Formalism Let S = hO, P, D0 i be an SGDAP where O = hR, T , Mi and P = hF, A, πi. We are interested in the verification of conceptual temporal properties over S, i.e., properties that constrain the dynamics of S understood at the semantic layer. Technically, this means that properties are verified over the SGDAP’s STS ΥSS , combining temporal operators with queries posed over the ontologies obtained by combining the TBox T with the ABoxes associated to the states of ΥSS . More specifically, we adopt ECQs [9] to query the ontologies of ΥSS , and µ-calculus [15] to predicate over the dynamics of ΥSS . We use a variant of µ-calculus [15], one of the most powerful temporal logics subsuming LTL, PSL, and CTL* [13], called µLEQL C , whose formulae have the form: Φ ::= Q | Z | ¬Φ | Φ1 ∧ Φ2 | ∃x ∈ C0 .Φ | h−iΦ | µZ.Φ where Q is an ECQ over T , C0 = ADOM(M(D0 )) is the set of object terms appearing in the initial virtual ABox (obtained by applying the mapping M over the database instance D0 ), and Z is a predicate variable. As usual, syntactic monotonicity is enforced to ensure existence of unique fixpoints. Beside the usual FOL abbreviations, we also make use of the following ones: [−]Φ = ¬h−i(¬Φ) and νZ.Φ = ¬µZ.¬Φ[Z/¬Z]. The subscript C in µLEQLC stands for “closed”, and attests that ECQs are closed queries. In fact, µLEQL C formulae only support the limited form of quantification ∃x ∈ C0 .Φ, which Semantically-Governed Data-Aware Processes 29 W is a convenient, compact notation for c∈ADOM(M(D0 )) Φ[x/c]. We make this assumption for simplicity, but actually, with some care, our result can be extended to a more general form of quantification over time [5]. In order to define the semantics of µLEQL C we resort to transition systems. Let Υ = hT , Σ, s0 , abox, ⇒i be an STS. Let V be a predicate and individual variable valuation on Υ , i.e., a mapping from the predicate variables Z to subsets of the states Σ, and from individual variables to constants in ADOM(M(D0 )). Then, we assign meaning to µLEQL C formulas by associating to Υ and V an extension function (·)A V , which maps EQL µLC formulas to subsets of Σ. The extension function (·)A V is defined inductively as: (Q)A V = {s ∈ Σ | ANS (QV, T , abox(s)) = true} (Z)A V = V (Z) ⊆ Σ (¬Φ)AV = Σ − (Φ)AV A (Φ1 ∧ Φ2 )V = (Φ1 )AV ∩ (Φ2 )A V (∃x ∈ C0 .Φ)A {(Φ)A S V = V [x/c] | c ∈ ADOM (M(D0 ))} (h−iΦ)AV = {s ∈ Σ | ∃s0 . s ⇒ s0 and s0 ∈ (Φ)A V} A = {E ⊆ Σ | (Φ)A T (µZ.Φ)V v[Z/E],V ⊆ E} When Φ is a closed formula, (Φ)A V does not depend on V , and we denote it by (Φ) . We A EQL are interested in the model checking problem, i.e., verify whether a µLC closed formula S Φ holds for the SGDAP S. This problem is defined as checking whether s0 ∈ (Φ)ΥS , that is, whether Φ is true in the initial state s0 of ΥSS . If it is the case, we write ΥSS |= Φ. Example 4. An example of dynamic property in our running example is Φ = µZ.((∀s.[Student(s)] → [Graduated(s)])∨[−]Z), which says that every evolution of the system leads to a state in which all students present in that state are graduated. u t 6 Verification of Dynamic Properties over SGDAPs We now describe how µLEQL C properties can be effectively verified over SGDAPs. Let S = hO, P, D0 i be an SGDAP where O = hR, T , Mi and P = hF, A, πi. Let Φ be a µLEQL C dynamic property specified over the T , and let ΥSS and ΥSR respectively be the STS and RTS of S. The main issue to be tackled is that ΥSS and ΥSR are in general infinite-state, and their verification undecidable. In [5], some decidability boundaries for the verification of Data-Centric Dynamic Systems (DCDSs) have been extensively studied. DCDSs are tightly related to SGDAPs, with some key differences in the data component: (i) the process component is identical in the two frameworks; (ii) DCDSs are only equipped with a relational layer, i.e., no ontology nor mapping are specified; (iii) while SGDAPs define constraints over the data at the semantic layer, DCDSs are equipped with denial constraints posed directly over the database schema. Given a µLEQL C property Φ, we therefore attack the verification problem ΥSS |= Φ in the following way: (1) We transform Φ into a corresponding µLC property Φ0 , i.e., a µL property whose atoms are closed FO queries over R, thus reducing ΥSS |= Φ to ΥSR |= Φ0 . (2) We show, again exploiting FOL rewritability in DL-LiteA , that the consistency check used to generate ΥSR can be rewritten as denial constraints over R. This means that ΥSR can be generated by a purely relational DCDS. (3) We argue that Φ0 belongs to the dynamic 30 D. Calvanese et al. A1 S1 S3 A2 ΥSS A0 S0 A3 |= Φ S2 Semantic Transition System Φ� = unfold(rew(Φ, T ), M) M M M M D1 S1 S3 D1 S1 S3 ΥR S D0 S0 S2 D2 D3 |= Φ� D0 S0 S2 D2 D3 |= Φ� Relational Transition System Abstract Transition System Fig. 2. Verification of dynamic µLEQL C properties over SGDAP property language investigated in [5] for DCDSs under the nondeterministic semantics. (4) We can therefore reuse the decidability results of [5] to check whether ΥSR |= Φ0 can be decided and, in the positive case, we apply the abstraction technique defined in [5] for reducing the verification problem to conventional finite-state model checking. Details are provided below. The idea of the approach is depicted in Figure 2. Property Transformation. In order to transform the property, we separate the treatment of the dynamic part and of the embedded ECQs. Since the dynamics of an SGDAP is completely determined at the relational layer, the dynamic part is maintained unaltered. ECQs are instead manipulated as defined in Section 2. In particular, the rewriting of Φ wrt the TBox T , denoted by Φr = REW(Φ, T ), is done by replacing each embedded ECQ with its corresponding rewriting wrt T . Example 5. Consider the µLEQL C property Φ described in Example 4, together with the TBox T introduced in Example 1. The rewriting of Φ wrt T produces Φr = REW(Φ, T ), which is: µZ.(∀s.[Student(s) ∨ Bachelor(s) ∨ Master(s) ∨ MNum(s, )] → [Graduated(s)]) ∨ [−]Z t u Before unfolding the rewritten dynamic propertyW Φ r we translate each subformula of the form ∃x ∈ C0 .Ψ into the equivalent form c∈ADOM(M(D0 )) Ψ [x/c]. This means that when such a form of quantification is used, the initial ABox must be materialized in order to compute the initial active domain of the semantic layer. We then extend the UNFOLD() function defined in Section 2 to unfold a µLEQL C dynamic property over the semantic layer into a corresponding property over the relational layer. As for the rewriting, the temporal structure is maintained unaltered, reflecting that the dynamics of SGDAPs is determined at the relational layer. For what concerns the ECQs embedded in the property, the interesting case to be discussed is the one of (existential) quantification: UNFOLD (∃x.ϕ, M) = ∃x. UNFOLD (ϕ, M) ∨ W (f /n)∈FS(M) ∃x1 , . . . , xn . UNFOLD (ϕ[x/f (x1 , . . . , xn )], M) where FS(M) is the set of function symbols contained in M. This unfolding reflects that quantification over individuals at the semantic layer must be properly rephrased as a corresponding quantification over those values in the relational layer that could lead to produce such individuals through the application of M. This is done by unfolding ∃x.ϕ into a disjunction of formulae, where: (i) the first formula corresponds to ∃x.ϕ itself, and is used to tackle the case in which x appears in the range of an attribute, which is in fact a value; (ii) Each of the other formulae is obtained from ϕ by replacing x with one of the possible variable terms produced by M, and quantifying over the existence of values used to construct the corresponding object term. Semantically-Governed Data-Aware Processes 31 Example 6. Let us consider the µLEQL C property Φr of Example 5, together with the mapping M defined in Example 1. We get that UNFOLD(Φr , M) corresponds to: µZ. ∀x1 , x2 , x3 .AUXm3 (x1 , x2 , x3 , ) → AUXm5 (x1 , x2 , x3 ) ∨ [−]Z where AUXm3 (name, surname, type, id) and AUXm5 (name, surname, type) represent the aux- iliary view predicates of mapping assertions m3 and m5 respectively, whose defining queries are the SQL queries in the left-hand side of the mapping assertion themselves. When unfolding the UCQ Student(stu1 (x1 , x2 , x3 )) ∨ Bachelor(stu1 (x1 , x2 , x3 )) ∨ Master(stu1 (x1 , x2 , x3 )) ∨ MNum(stu1 (x1 , x2 , x3 ), ), we notice that the involved mapping assertions are m1 , m2 , and m3 . However, we only consider m3 , because the query on its left-hand side contains the ones on the left-hand side of m1 and m2 . Reduction to Data-Centric Dynamic Systems. The connection between SGDAPs and DCDSs is straightforward (see [5] for the definition of DCDS). Given a SGDAP S = hO, P, D0 i with O = hR, T , Mi, we can construct a corresponding DCDS with nondeterministic services SREL = hD, Pi, where D = hV, R, {qunsat (T ) → false}, D0 i. Thanks to this encoding, we obtain ΥSR ≡ ΥSDCDS REL , where ΥSDCDS REL is the RTS constructed for the DCDS SREL following the definition in [5]. Verification. Leveraging on the parallel between SGDAPs and DCDSs, verification of a µLEQL C property over a SGDAP can be reduced to the verification of a µLC property over the corresponding DCDS. In fact, µLC (µ-calculus over closed FOL queries) is contained in the fragments of FO µ-calculus studied for DCDSs in [5], namely µLA and µLP . Both µLA and µLP support FOL queries over the DCDS, allowing for controlled forms of FO quantification across states, and therefore they clearly support FO sentences. Let S = hO, P, D0 i be a SGDAP with O = hR, T , Mi, STS ΥSS and ΥSR = hR, Σ, s0 , db, ⇒i. We say that ΥSR is state-bounded if there exists a bound b such that for each s ∈ Σ, |ADOM(db(s))| < b. Let Φ be a µLEQL C property, and let Φ0 = UNFOLD ( REW (Φ, T ), M). Since (i) ΥS |= Φ can be reduced to ΥSR |= Φ0 , (ii) Φ0 S belongs to µLC (which is contained in µLP ), (iii) ΥSR can be generated by a DCDS with nondeterministic services, we can reuse the decidability results presented in [5]. In particular, we obtain that ΥSS |= Φ is decidable if ΥSR is state bounded. Verification can in this case be reduced to conventional finite-state model checking. Example 7. Consider the SGDAP S = hO, P, D0 i, where O is the OBDA system defined in Example 1, P the process component defined in Example 3. It is easy to see that the resulting RTS ΥSR is state-bounded. Intuitively, this follows from the facts that the actions of S either move tuples from the TRANSF M table to the ENROLLED one, or copy tuples from the ENROLLED table to the GRAD one. Hence, the size of each database instance appearing in ΥSR is at most twice the size of D0 , thus verification of µLEQL C properties over the STS ΥSS is decidable. u t Acknowledgements. This research has been partially supported by the ICT Collabora- tive Project ACSI (Artifact-Centric Service Interoperation), funded by the EU under FP7 ICT Call 5, 2009.1.2, grant agreement No. FP7-257593. 32 D. Calvanese et al. References 1. S. Abiteboul, P. Bourhis, A. Galland, and B. Marinoiu. The AXML artifact model. In Proc. of TIME 2009, pages 11–17, 2009. 2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The De- scription Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2003. 3. B. Bagheri Hariri, D. Calvanese, G. De Giacomo, and R. De Masellis. Verification of conjunctive-query based semantic artifacts. In Proc. of DL 2011, volume 745 of CEUR, ceur-ws.org, 2011. 4. B. Bagheri Hariri, D. Calvanese, G. De Giacomo, R. De Masellis, and P. Felli. Foundations of relational artifacts verification. In Proc. of BPM 2011, volume 6896 of LNCS, pages 379–395. Springer, 2011. 5. B. Bagheri Hariri, D. Calvanese, G. De Giacomo, A. Deutsch, and M. Montali. Verification of relational data-centric dynamic systems with external services. CoRR Technical Report arXiv:1203.0024, arXiv.org e-Print archive, 2012. Available at http://arxiv.org/ abs/1203.0024. 6. F. Belardinelli, A. Lomuscio, and F. Patrizi. Verification of deployed artifact systems via data abstraction. In Proc. of ICSOC 2011, 2011. 7. K. Bhattacharya, C. Gerede, R. Hull, R. Liu, and J. Su. Towards formal analysis of artifact- centric business process models. In Proc. of BPM 2007, volume 4714 of LNCS, pages 288–234. Springer, 2007. 8. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodrı́guez-Muro, and R. Rosati. Ontologies and databases: The DL-Lite approach. In S. Tessaris and E. Franconi, editors, Semantic Technologies for Informations Systems – 5th Int. Reasoning Web Summer School (RW 2009), volume 5689 of LNCS, pages 255–356. Springer, 2009. 9. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. EQL-Lite: Effective first-order query processing in description logics. In Proc. of IJCAI 2007, 2007. 10. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007. 11. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Path-based identifica- tion constraints in description logics. In Proc. of KR 2008, pages 231–241, 2008. 12. P. Cangialosi, G. De Giacomo, R. De Masellis, and R. Rosati. Conjunctive artifact-centric services. In Proc. of ICSOC 2010, volume 6470 of LNCS, pages 318–333. Springer, 2010. 13. E. M. Clarke, O. Grumberg, and D. A. Peled. Model checking. The MIT Press, Cambridge, MA, USA, 1999. 14. D. Cohn and R. Hull. Business artifacts: A data-centric approach to modeling business operations and processes. IEEE Bull. on Data Engineering, 32(3):3–9, 2009. 15. E. A. Emerson. Automated temporal reasoning about reactive systems. In F. Moller and G. Birtwistle, editors, Logics for Concurrency: Structure versus Automata, volume 1043 of LNCS, pages 41–101. Springer, 1996. 16. A. Nigam and N. S. Caswell. Business artifacts: An approach to operational specification. IBM Systems Journal, 42(3):428–445, 2003. 17. A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Linking data to ontologies. J. on Data Semantics, X:133–173, 2008. 18. M. Rodrı́guez-Muro and D. Calvanese. Dependencies: Making ontology based data access work in practice. In Proc. of AMW 2011, volume 749 of CEUR, ceur-ws.org, 2011. 19. W. M. P. van der Aalst, P. Barthelmess, C. A. Ellis, and J. Wainer. Proclets: A framework for lightweight interacting workflow processes. Int. J. of Cooperative Information Systems, 10(4):443–481, 2001.