=Paper= {{Paper |id=None |storemode=property |title=Semantically-Governed Data-Aware Processes |pdfUrl=https://ceur-ws.org/Vol-861/KiBP2012_paper_1.pdf |volume=Vol-861 |dblpUrl=https://dblp.org/rec/conf/kr/CalvaneseGLMS12 }} ==Semantically-Governed Data-Aware Processes== https://ceur-ws.org/Vol-861/KiBP2012_paper_1.pdf
        Semantically-Governed Data-Aware Processes

           Diego Calvanese1 , Giuseppe De Giacomo2 , Domenico Lembo2 ,
                       Marco Montali1 , and Ario Santoso1
            1
                 Free University of Bozen-Bolzano, lastname@inf.unibz.it
             2
                 Sapienza Università di Roma, lastname@dis.uniroma1.it



       Abstract. In this paper we consider processes that run over data stored in a
       relational database. Our setting is that of ontology-based data access (OBDA),
       where the information in the database is conceptually represented as an ontology
       and is declaratively mapped to it through queries. We are interested in verifying
       temporal logic formulas on the evolution of the information at the conceptual
       level, taking into account the knowledge present in the ontology, which allows
       for deducing information that is only implicitly available. Specifically, we show
       how, building on first-order rewritability of queries over the system state that is
       typical of ontology languages for OBDA, we are able to reformulate the temporal
       properties into temporal properties expressed over the underlying database. This
       allows us adopt notable decidability results on verification of evolving databases
       that have been established recently.


1   Introduction

Recent work in business processes, services and databases brought the necessity of
considering both data and processes simultaneously while designing the system. This
holistic view of considering data and processes together has given rise to a line of
research under the name of artifact-centric business processes [16, 14, 19, 1] that aims
at avoiding the notorious discrepancy of traditional approaches where these aspects are
considered separately [7]. Recently, interesting decidability results for verification of
temporal properties over such systems have been obtained in the context of so-called
Data-centric Dynamic Systems (DCDSs) based on relational technology [12, 6, 4, 5]. In
a DCDS, processes operate over the data of the system and evolve it by executing actions
that may issue calls to external services. The data returned by such external services is
injected into the system, effectively making it infinite state. There has been also some
work on a form of DCDS based on ontologies, where the data layer is represented in
a rich ontology formalism, and actions perform a form of instance level update of the
ontology [3]. The use of an ontology allows for a high-level conceptual view of the data
layer that is better suited for a business level treatment of the manipulated information.
    Here we introduce Semantically-Governed Data-Aware Processes (SGDAP), in
which we merge these two approaches by enhancing a relational layer constituted by
a DCDS based system, with an ontology, constituting a semantic layer. The ontology
captures the domain in which the SGDAP is executed, and allows for seeing the data
and their manipulation at a conceptual level through an ontology-based data access
(OBDA) system [8, 18]. Hence it provides us with a way of semantically governing
22       D. Calvanese et al.

the underlying DCDS. Specifically, an SGDAP is constituted by two main components:
(i) an OBDA system [8] which includes (the intensional level of) an ontology, a relational
database schema, and a mapping between the ontology and the database; (ii) a process
component, which characterizes the evolution of the system in terms of a process
specifying preconditions and effects of action execution over the relational layer.
     The ontology is represented through a Description Logic (DL) TBox [2], expressed in
a lightweight ontology language of the DL-Lite family [10], a family of DLs specifically
designed for efficiently accessing to large amounts of data. The mapping is defined in
terms of a set of assertions, each relating an arbitrary (SQL) query over the relational
layer to a set of atoms whose predicates are the concepts and roles of the ontology, and
whose arguments are terms built using specific function symbols applied to the answer
variables of the SQL query. Such mappings specify how to populate the elements of
the ontology from the data in the database, and function symbols are used to construct
(abstract) objects (object terms) from the concrete values retrieved from the database.
     When an SGDAP evolves, each snapshot of the system is characterized by a database
instance at the relational layer, and by a corresponding virtual ABox, which together
with the TBox provides a conceptual view of the relational instance at the semantic
layer. When the system is progressed by the process component, we assume that at
every time the current instance can be arbitrarily queried, and can be updated through
action executions, possibly involving external service calls to get new values from the
environment. Hence the process component relies on three main notions: actions, which
are the atomic progression steps for the data layer; external services, which can be called
during the execution of actions; and a process, which is essentially a non-deterministic
program that uses actions as atomic instructions. During the execution, the snapshots of
the relational layer can be virtually mapped as ABoxes in the semantic layer. This enables
to: (i) understand the evolution of the system at the conceptual level, and (ii) govern it at
the semantic level, rejecting those actions that, executed at the relational layer, would
lead to a new semantic snapshot that is inconsistent with the semantic layer’s TBox.
     In this work, we are interested in verifying dynamic properties specified in a variant
of µ-calculus [15], one of the most powerful temporal logics, expressed over the semantic
layer of an SGDAP. We consider properties expressed as µ-calculus formulae whose
atoms are queries built over the semantic layer. By relying on techniques for query an-
swering in DL-Lite OBDA systems, which exploit FOL rewritability of query answering
and of ontology satisfiability, we reformulate the temporal properties expressed over the
semantic layer into analogous properties over the relational layer. Given that our systems
are in general infinite-state, verification of temporal properties is undecidable. However,
we show how we can adapt to our setting recent results on the decidability of verification
of DCDSs based on suitable finite-state abstractions [5].

2    Preliminaries
In this section we introduce the description logic (DL) DL-LiteA,id and describe the
ontology-based data access (OBDA) framework.
DL-LiteA,id [11, 8] allows for specifying concepts, representing sets of objects, roles,
representing binary relations between objects, and attributes, representing binary rela-
tions between objects and values. The syntax of concept, role and attribute expressions
                                            Semantically-Governed Data-Aware Processes                23

in DL-LiteA,id is as follows:
                       B −→ N | ∃R | δ(U )                     R −→ P | P −

 Here, N , P , and U respectively denote a concept name, a role name, and an attribute
name, P − denotes the inverse of a role, and B and R respectively denote basic concepts
and basic roles. The concept ∃R, also called unqualified existential restriction, denotes
the domain of a role R, i.e., the set of objects that R relates to some object. Similarly, the
concept δ(U ) denotes the domain of an attribute U , i.e., the set of objects that U relates
to some value. Note that we consider here a simplified version of DL-LiteA,id where we
distinguish between between objects and values, but do not further deal with different
datatypes; similarly, we consider only a simplified version of identification assertions.
    A DL-LiteA,id ontology is a pair (T , A), where T is a TBox, i.e., a finite set of TBox
assertions, and A is an Abox, i.e., a finite set of ABox assertions. DL-LiteA,id TBox
assertions have the following form:
                      B1 v B2                 R1 v R2                  U1 v U2
                      B1 v ¬B2                R1 v ¬R2                 U1 v ¬U2
                  (id B Z1 , . . . , Zn )     (funct R)                (funct U )
 From left to right, assertions of the first row denote inclusions between basic concepts,
basic roles, and attributes; assertions of the second row denote disjointness between
basic concepts, basic roles, and attributes; assertions of the last row denote identification
(assertions) (IdA), and global functionality on roles and attributes. In the IdA, each Zi
denotes either an attribute or a basic role. Intuitively, an IdA of the above form asserts
that for any two different instances o, o0 of B, there is at least one Zi such that o and
o0 differ in the set of their Zi -fillers, that is the set of objects (if Zi is a role) or values
(if Zi is an attribute) that are related to o by Zi . As usual, in DL-LiteA,id TBoxes we
impose that roles and attributes occurring in functionality assertions or IdAs cannot be
specialized (i.e., they cannot occur in the right-hand side of inclusions).
    DL-LiteA,id ABox assertions have the form N (t1 ), P (t1 , t2 ), or U (t1 , v1 ), where t1
and t2 denote individual objects and v1 denotes a value.
    The semantics of DL-LiteA,id is given in [11]. We only recall here that we interpret
objects and values over distinct domains, and that for both we adopt the Unique Name
Assumption, i.e., different constants denote different objects (or values). The notions of
entailment, satisfaction, and model are as usual [11]. We also say that A is consistent
wrt T if (T , A) is satisfiable, i.e., admits at least one model.
    Next we introduce queries. As usual (cf. OWL 2), answers to queries are formed
by terms denoting individuals appearing in the ABox. The domain of an ABox A,
denoted by ADOM(A), is the (finite) set of terms appearing in A. A union of conjunctive
queries (UCQ) q over a TBox T is a FOL formula of the form ∃~y1 .conj 1 (~x, ~y1 ) ∨ · · · ∨
∃~yn .conj n (~x, ~yn ), with free variables ~x and existentially quantified variables ~y1 , . . . , ~yn .
Each conj i (~x, y~i ) in q is a conjunction of atoms of the form N (z), P (z, z 0 ), U (z, z 0 )
where N , P and U respectively denote a concept, role and attribute name of T , and
z, z 0 are constants in a set C or variables in ~x or y~i , for some i ∈ {1, . . . , n}. The
(certain) answers to q over an ontology (T , A) is the set ans (q, T , A) of substitutions3
 3
     As customary, we can view each substitution simply as a tuple of constants, assuming some
     ordering of the free variables of q.
24       D. Calvanese et al.

σ of the free variables of q with constants in ADOM(A) such that qσ evaluates to true
in every model of (T , A). If q has no free variables, then it is called boolean, and its
certain answers are true or false. Computing ans (q, T , A) of a UCQ q over a DL-LiteA,id
ontology (T , A) is in AC 0 in the size of A [11]. This is actually a consequence of the
fact that DL-LiteA,id enjoys the FOL rewritability property, which in our setting says
that for every UCQ q, ans (q, T , A) can be computed by evaluating the UCQ REW(q, T )
over A considered as a database. REW(q, T ) is the so-called perfect reformulation of q
w.r.t. T [11]. We also recall that, in DL-LiteA,id , ontology satisfiability is FOL rewritable.
In other words, we can construct a boolean FOL query qunsat (T ) that evaluates to true
over an ABox A iff the ontology (T , A) is unsatisfiable.
    In our framework, we consider an extension of UCQs, called ECQs, which are
queries of the query language EQL-Lite(UCQ) [9]. Formally, an ECQ over a TBox T is
a possibly open domain independent formula of the form:
                      Q −→ [q] | ¬Q | Q1 ∧ Q2 | ∃x.Q | x = y
 where q is a UCQ over T and [q] denotes that q is evaluated under the (minimal)
knowledge operator (cf. [9]). To compute the certain answers ANS (Q, T , A) to an ECQ
Q over an ontology (T , A), we can compute the certain answers over (T , A) of each
UCQ embedded in Q, and evaluate the first-order part of Q over the relations obtained
as the certain answers of the embedded UCQs. Hence, also computing ANS (Q, T , A) of
an ECQ Q over a DL-LiteA,id ontology (T , A) is in AC 0 in the size of A [9].
Ontology-Based Data Access (OBDA). In an OBDA system, a relational database is
connected to an ontology that represents the domain of interest by a mapping, which
relates database values with values and (abstract) objects in the ontology (c.f. [8]). In
particular, we make use of a countably infinite set V of values and a set Λ of function
symbols, each with an associated arity. We also define the set C of constants as the union
of V and the set {f (d1 , . . . , dn ) | f ∈ Λ and d1 , . . . , dn ∈ V} of object terms.
     Formally, an OBDA system is a structure O = hR, T , Mi, where: (i) R =
{R1 , . . . , Rn } is a database schema, constituted by a finite set of relation schemas;
(ii) T is a DL-LiteA,id TBox; (iii) M is a set of mapping assertions, each of the form:
Φ(~x) ; Ψ (~y , ~t), where: (a) ~x is a non-empty set of variables, (b) ~y ⊆ ~x, (c) ~t is a set
of object terms of the form f (~z), with f ∈ Λ and ~z ⊆ ~x, (d) Φ(~x) is an arbitrary SQL
query over D, with ~x as output variables, and (e) Ψ (~y , ~t) is a conjunctive query over T
of arity n > 0 without non-distinguished variables, whose atoms are over the variables ~y
and the object terms ~t.
Example 1. As a running example, we consider a simple university information system that
stores and manipulates data concerning students and their degree. In particular, we define an
OBDA system O = hR, T , Mi to capture the conceptual schema of such a domain, how data
are concretely maintained in a relational database, and how the two information levels are linked
through mappings. The conceptual schema is depicted in Figure 1, and formalized as the following
DL-LiteA,id TBox T :
           Bachelor v Student        δ(MNum) v Student           (funct MNum)
             Master v Student          Student v δ(MNum)         (id Student MNum)
          Graduated v Student
The conceptual schema states that Bachelor and Master are subclasses of Student, that some
Students could be already Graduated, and that MNum (representing the matriculation number) is
                                            Semantically-Governed Data-Aware Processes                25

                                 Bachelor
                                                  Student      Graduated
                                                mNum: String
                                 Master


                     Fig. 1. UML conceptual schema for our running example.

an attribute relating individuals of type Student (domain of the attribute) to corresponding Codes
(range of the attribute). The conceptual schema also expresses that each Student has exactly one
matriculation number, and we assume that matriculation numbers can be used to identify Students
(i.e., each MNum is associated to at most one Student). Data related to students are maintained in
a concrete underlying data source that obeys the database schema R, constituted by the following
relation schemas: (i) ENROLLED(id, name, surname, type, endDate) stores information about
students that are currently (endDate=NULL) or were enrolled in a bachelor (type="Bachelor") or
master (type="Master") course. (ii) GRAD(id, mark, type) stores data of former students who
have been graduated. (iii) TRANSF M(name, surname) is a temporary relation used to maintain
information about master students that have been recently transferred from another university, and
must still complete the enrollment process. The interconnection between the database schema R
and the conceptual schema T is specified through the following set M of mappings:
  m1 : SELECT name, surname, type FROM ENROLLED WHERE type ="Bachelor"
         ; Bachelor(stu1 (name, surname, type))
  m2 : SELECT name, surname, type FROM ENROLLED WHERE type ="Master"
         ; Master(stu1 (name, surname, type))
  m3 : SELECT name, surname, type, id FROM ENROLLED ; MNum(stu1 (name, surname, type), val(id))
  m4 : SELECT name, surname FROM TRANSF M ; Master(stu1 (name, surname, "Master"))
  m5 : SELECT e.name, e.surname, e.type FROM ENROLLED e, GRAD g WHERE e.id = g.id
         ; Graduated(stu1 (name, surname, type))

     Intuitively, m1 (m2 resp.) maps every id in ENROLLED with type "Bachelor" ("Master")
to a bachelor (master) student. Such a student is constructed by “objectifying” the name, surname
and course type using variable term stu1 /3. In m3 , the MNum attribute is instead created using
directly the value of id to fill in the target of the attribute. Notice the use of the val function symbol
for mapping id to the range of MNum. Mapping m4 leads to create further master students by
starting from the temporary TRANSF M table. Since such students are not explicitly associated
to course type, but it is intended that they are "Master", objectification is applied to students’
name and surname, adding "Master" as a constant in the variable term. Notice that, according to
the TBox T , such students have a matriculation number, but its value is not known (and, in fact,
no mapping exists to generate their MNum attribute). Finally, m5 generates graduated students
by selecting only those students in the ENROLLED table whose matriculation number is also
contained in the GRAD table.                                                                            u
                                                                                                        t
     Given a database instance D made up of values in V and conforming to schema R,
and given a mapping M, the virtual ABox generated      from D by a mapping assertion
m = Φ(x) ; Ψ (y, t) in M is m(D) = v∈eval(Φ,D) Ψ [x/v], where eval (Φ, D)
                                             S

denotes the evaluation of the SQL query Φ over D, and where we consider Ψ [x/v] to
be a set of atoms (as opposed to Sa conjunction). Then, the ABox generated from D
by the mapping M is M(D) = m∈M m(D). Notice that ADOM(M(D)) ⊆ C. As
for ABoxes, the active domain ADOM(D) of a database instance D is the set of values
occurring in D. Notice that ADOM(D) ⊆ V. Given an OBDA system O = hR, T , Mi
and a database instance D for R, a model for O wrt D is a model of the ontology
(T , M(D)). We say that O wrt D is satisfiable if it admits a model wrt D.
26       D. Calvanese et al.

Example 2. Consider a database instance D = {ENROLLED(123, john, doe, Bachelor, NULL)}.
The corresponding virtual ABox obtained from the application of the mapping M is M(D) =
{Bachelor(stu1 (john, doe, Bachelor)), MNum(stu1 (john, doe, Bachelor), val(123))}.   u
                                                                                      t
An UCQ q over an OBDA system O = hR, T , Mi is simply an UCQ over T . To
compute the certain answers of q over O wrt a database instance D for R, we follow a
three-step approach: (i) q is rewritten to compile away T , obtaining qr = REW(q, T );
(ii) the mapping M is used to unfold qr into a query over R, denoted by UNFOLD(qr , M),
which turns out to be an SQL query [17]; (iii) such a query is executed over D, obtaining
the certain answers. For an ECQ, we can proceed in a similar way, applying the rewriting
and unfolding steps to the embedded UCQs. It follows that computing certain answers
to UCQs/ECQs in an OBDA system is FOL rewritable. Applying the unfolding step to
qunsat (T ), we obtain also that satisfiability in O is FOL rewritable.

3    Semantically-Governed Data-Aware Processes
A Semantically-Governed Data-Aware Process (SGDAP) S = hO, P, D0 i is formed
by an OBDA System O = hR, T , Mi by a process component P, and by an initial
database instance D0 that conforms to the relational schema R in O. Intuitively, the
OBDA system keeps all the data of interest, while the process component modifies and
evolves such data, starting from the initial database D0 .
     The process component P constitutes the progression mechanism for the SGDAP.
Formally, P = hF, A, πi, where: (i) F is a finite set of functions representing calls to
external services, which return values; (ii) A is a finite set of actions, whose execution
progresses the data layer, and may involve external service calls; (iii) π is a finite set of
condition-action rules that form the specification of the overall process, which tells at
any moment which actions can be executed.
     An action α ∈ A has the form α(p1 , . . . , pn ) : {e1 , . . . , em }, where:
(i) α(p1 , . . . , pn ) is the signature of the action, constituted by a name α and a sequence
p1 , . . . , pn of input parameters that need to be substituted with values for the execution
of the action, and (ii) {e1 , . . . , em } is a set of effect specifications, whose specified
effects are assumed to take place simultaneously. Each ei has the form qi+ ∧ Q−            i      Ei ,
where: (a) qi+ ∧Q−      i is a query over R  whose   terms are variables ~
                                                                         x , action parameters,  and
constants from ADOM(D0 ). The query qi+ is a UCQ, and the query Q−                 i is an arbitrary
FO formula whose free variables are included in those of qi+ . Intuitively, qi+ selects the
tuples to instantiate the effect, and Q−      i filters away some of them. (b) Ei is the effect,
i.e., a set of facts for R, which includes as terms: terms in ADOM(D0 ), input parameters,
free variables of qi+ , and in addition Skolem terms formed by applying a function f ∈ F
to one of the previous kinds of terms. Such Skolem terms involving functions represent
external service calls and are interpreted so as to return a value chosen by an external
user/environment when executing the action.
     The process π is a finite set of condition-action rules Q 7→ α, where α is an action
in A and Q is a FO query over R whose free variables are exactly the parameters of α,
and whose other terms can be quantified variables or values in ADOM(D0 ).
Example 3. Consider the OBDA system O defined in Example 1. We now define a process
component P = hF, A, πi over the relational schema R of O, so as to obtain a full SGDAP.
                                             Semantically-Governed Data-Aware Processes                  27

In particular, π is constituted by the following condition-action rules (’ ’ denotes existentially
quantified variables that are not used elsewhere):
  – ENROLLED(id, , , , NULL)              GRADUATE(id)
  – TRANSF M(name, surname)               COMPL - ENR (name, surname)
     The first rule extracts a matriculation number id of a currently enrolled student
(endDate=NULL) from the ENROLLED relation and graduates the student, whereas the sec-
ond rule selects a pair name surname in TRANSF M and use them to complete the enrollment of
that student. In order to be effectively executed, the involved actions rely on the following set F
of service calls: (i) today() returns the current date; (ii) getMark(id, type) returns the final mark
received by student id; (iii) getID(name, surname, type) returns the matriculation number for the
name-surname pair of a student. The two actions GRADUATE and COMPL - ENR are then defined as
follows:

    GRADUATE(id) : { GRAD(id2 , m, t)      GRAD(id2 , m, t),
                        TRANSF M(n, s)      TRANSF M(n, s),
                        ENROLLED(id2 , n, s, t, d) ∧ id2 6= id   ENROLLED(id2 , n, s, t, d),
                        ENROLLED(id, n, s, t, NULL)       ENROLLED(id, n, s, t, today()),
                        ENROLLED(id, , , t, NULL)         GRAD(id, getMark(id, t), t) };
    COMPL - ENR (n, s) : { GRAD(id, m, t)    GRAD(id, m, t),
                           ENROLLED(id, n2 , s2 , t, d)    ENROLLED(id, n2 , s2 , t, d),
                           TRANSF M(n2 , s2 ) ∧ (n2 6= n ∨ s2 6= s)   TRANSF M(n2 , s2 ),
                           TRANSF M(n, s)       ENROLLED(getID(n, s, "Master"), n, s, "Master", NULL)}

    Given a matriculation number id, action GRADUATE inserts a new tuple for id in GRAD,
updating at the same time the enrollment’s end date for id in ENROLLED to the current date,
while keeping all other entries in TRANSF M, GRAD and ENROLLED. Given a name and
surname, action COMPL - ENR has the effect of moving the corresponding tuple in TRANSF M to
a new tuple in ENROLLED, for which the matriculation number is obtained by interacting with
the getID service call; all other entries TRANSF M, GRAD and ENROLLED are preserved. u   t


4    Semantics of SGDAP

This work focuses on the semantics of SGDAP assuming that external services behave
nondeterministically, i.e., two calls of a service with the same arguments may return
different results during the same run. This captures both services that model a truly
nondeterministic process (e.g., human operators), and services that model stateful servers.
    Let S = hO, P, D0 i be a SGDAP where O = hR, T , Mi and P = hF, A, πi. The
semantics of S is defined in terms of a possibly infinite transition system (TS), which
represents all possible computations that the process component can do over the data
starting from D0 . We start by defining the semantics of action execution. Let α be an
action in A of the form α(~ p) : {e1 , . . . , en } with effects ei = qi+ ∧ Q−
                                                                             i Ei , and let σ
be a substitution of p~ with values in V. The evaluation of the effects of α on a database
instance D using a substitution σ is captured by the following function:
                                    S                       S
                  DO (D, α, σ) =        qi+ ∧Q−   Ei in α   θ∈ANS ((qi+ ∧Q−
                                                                                    Ei σθ
                                              i                           i )σ,D)

which returns a database instance made up of values in V and Skolem terms represent-
ing service calls. We denote with CALLS(DO (D, α, σ)) such service calls, and with
EVALS (D, α, σ) the set of substitutions that replace these service calls with values in V:

          EVALS (D, α, σ) = {θ      | θ : CALLS(DO (D, α, σ)) → V is a total function}.
28       D. Calvanese et al.

We then say that the database instance D0 over V and conforming to R is produced from
D by the application of action α using substitution σ if D0 = DO (D, α, σ)θ, where
θ ∈ EVALS (D, α, σ).
Relational Layer Transition System (RTS). Let S = hO, P, D0 i be a SGDAP with
O = hR, T , Mi. The RTS ΥSR of S is formally defined as hR, Σ, s0 , db, ⇒i, where Σ is
a (possibly infinite) set of states, s0 is the initial state, db is a total function from states in
Σ to database instances made up of values in V and conforming to R, and ⇒⊆ Σ × Σ is
a transition relation. Σ, ⇒ and db are defined by simultaneous induction as the smallest
sets such that s0 ∈ Σ, with db(s0 ) = D0 , and satisfying the following property: Given
s ∈ Σ, for each condition-action rule Q(~       p) 7→ α(~  p) ∈ π, for each substitution σ of p~
such that σ ∈ ANS (Q, D), consider every database instance D0 produced from D by the
application of α using σ. Then: (i) if there exists s0 ∈ Σ such that db(s0 ) = D0 , then
s ⇒ s0 ; (ii) otherwise, if O is satisfiable wrt D0 , then s0 ∈ Σ, s ⇒ s0 and db(s0 ) = D0 ,
where s0 is a fresh state. We observe that the satisfiability check done in the last step of
the RTS construction accounts for semantic governance.
Semantic Layer Transition System (STS). Given a SGDAP S with O = hR, T , Mi
and with RTS ΥSR = hR, Σ, s0 , db, ⇒i, the STS ΥSS of S is a “virtualization” of the RTS
in the semantic layer. In particular, ΥSS maintains the structure of ΥSR unaltered, reflecting
that the process component is executed over the relational layer, but it associates each
state to a virtual ABox obtained from the application of the mapping M to the database
instance associated by ΥSR to the same state. Formally, ΥSS = hT , Σ, s0 , abox, ⇒i, where
abox is a total function from Σ to ABoxes made up of individual objects in C and
conforming to T , such that for each s ∈ Σ with db(s) = D, abox(s) = M(D).


5    Dynamic Constraints Formalism

Let S = hO, P, D0 i be an SGDAP where O = hR, T , Mi and P = hF, A, πi. We are
interested in the verification of conceptual temporal properties over S, i.e., properties
that constrain the dynamics of S understood at the semantic layer. Technically, this means
that properties are verified over the SGDAP’s STS ΥSS , combining temporal operators
with queries posed over the ontologies obtained by combining the TBox T with the
ABoxes associated to the states of ΥSS . More specifically, we adopt ECQs [9] to query
the ontologies of ΥSS , and µ-calculus [15] to predicate over the dynamics of ΥSS .
    We use a variant of µ-calculus [15], one of the most powerful temporal logics
subsuming LTL, PSL, and CTL* [13], called µLEQL    C , whose formulae have the form:

                   Φ ::= Q | Z | ¬Φ | Φ1 ∧ Φ2 | ∃x ∈ C0 .Φ | h−iΦ | µZ.Φ

 where Q is an ECQ over T , C0 = ADOM(M(D0 )) is the set of object terms appearing
in the initial virtual ABox (obtained by applying the mapping M over the database
instance D0 ), and Z is a predicate variable. As usual, syntactic monotonicity is enforced
to ensure existence of unique fixpoints. Beside the usual FOL abbreviations, we also
make use of the following ones: [−]Φ = ¬h−i(¬Φ) and νZ.Φ = ¬µZ.¬Φ[Z/¬Z]. The
subscript C in µLEQLC   stands for “closed”, and attests that ECQs are closed queries. In
fact, µLEQL
         C    formulae  only support the limited form of quantification ∃x ∈ C0 .Φ, which
                                        Semantically-Governed Data-Aware Processes            29

                                      W
is a convenient, compact notation for c∈ADOM(M(D0 )) Φ[x/c]. We make this assumption
for simplicity, but actually, with some care, our result can be extended to a more general
form of quantification over time [5].
     In order to define the semantics of µLEQL C    we resort to transition systems. Let
Υ = hT , Σ, s0 , abox, ⇒i be an STS. Let V be a predicate and individual variable
valuation on Υ , i.e., a mapping from the predicate variables Z to subsets of the states Σ,
and from individual variables to constants in ADOM(M(D0 )). Then, we assign meaning
to µLEQL
       C   formulas by associating to Υ and V an extension function (·)A    V , which maps
    EQL
µLC formulas to subsets of Σ. The extension function (·)A      V  is defined inductively as:

                  (Q)A
                     V          = {s ∈ Σ | ANS (QV, T , abox(s)) = true}
                  (Z)A
                     V          = V (Z) ⊆ Σ
                  (¬Φ)AV        = Σ − (Φ)AV
                            A
                  (Φ1 ∧ Φ2 )V = (Φ1 )AV ∩ (Φ2 )A V
                  (∃x ∈ C0 .Φ)A     {(Φ)A
                                  S
                              V =       V [x/c] | c ∈ ADOM (M(D0 ))}
                  (h−iΦ)AV      = {s ∈ Σ | ∃s0 . s ⇒ s0 and s0 ∈ (Φ)A
                                                                    V}
                         A
                                = {E ⊆ Σ | (Φ)A
                                  T
                  (µZ.Φ)V                          v[Z/E],V ⊆ E}

 When Φ is a closed formula, (Φ)A  V does not depend on V , and we denote it by (Φ) . We
                                                                                           A
                                                                              EQL
are interested in the model checking problem, i.e., verify whether a µLC closed formula
                                                                                               S
Φ holds for the SGDAP S. This problem is defined as checking whether s0 ∈ (Φ)ΥS ,
that is, whether Φ is true in the initial state s0 of ΥSS . If it is the case, we write ΥSS |= Φ.

Example 4. An example of dynamic property in our running example is Φ =
µZ.((∀s.[Student(s)] → [Graduated(s)])∨[−]Z), which says that every evolution of the system
leads to a state in which all students present in that state are graduated.               u
                                                                                          t


6    Verification of Dynamic Properties over SGDAPs

We now describe how µLEQL   C    properties can be effectively verified over SGDAPs. Let
S = hO, P, D0 i be an SGDAP where O = hR, T , Mi and P = hF, A, πi. Let Φ be
a µLEQL
      C    dynamic property specified over the T , and let ΥSS and ΥSR respectively be
the STS and RTS of S. The main issue to be tackled is that ΥSS and ΥSR are in general
infinite-state, and their verification undecidable. In [5], some decidability boundaries
for the verification of Data-Centric Dynamic Systems (DCDSs) have been extensively
studied. DCDSs are tightly related to SGDAPs, with some key differences in the data
component: (i) the process component is identical in the two frameworks; (ii) DCDSs
are only equipped with a relational layer, i.e., no ontology nor mapping are specified;
(iii) while SGDAPs define constraints over the data at the semantic layer, DCDSs are
equipped with denial constraints posed directly over the database schema. Given a
µLEQL
    C   property Φ, we therefore attack the verification problem ΥSS |= Φ in the following
way: (1) We transform Φ into a corresponding µLC property Φ0 , i.e., a µL property
whose atoms are closed FO queries over R, thus reducing ΥSS |= Φ to ΥSR |= Φ0 . (2) We
show, again exploiting FOL rewritability in DL-LiteA , that the consistency check used
to generate ΥSR can be rewritten as denial constraints over R. This means that ΥSR can
be generated by a purely relational DCDS. (3) We argue that Φ0 belongs to the dynamic
30           D. Calvanese et al.


              A1            S1                S3
                                  A2
  ΥSS   A0          S0                  A3             |= Φ
                            S2
                   Semantic Transition System
                                                        Φ� = unfold(rew(Φ, T ), M)
        M      M                 M       M
              D1            S1                S3                                 D1         S1                S3

  ΥR
   S
        D0          S0
                            S2
                                   D2
                                        D3             |= Φ�                D0        S0
                                                                                            S2
                                                                                                  D2
                                                                                                       D3          |= Φ�
                   Relational Transition System                                  Abstract Transition System


                         Fig. 2. Verification of dynamic µLEQL
                                                           C   properties over SGDAP

property language investigated in [5] for DCDSs under the nondeterministic semantics.
(4) We can therefore reuse the decidability results of [5] to check whether ΥSR |= Φ0 can
be decided and, in the positive case, we apply the abstraction technique defined in [5] for
reducing the verification problem to conventional finite-state model checking. Details
are provided below. The idea of the approach is depicted in Figure 2.
Property Transformation. In order to transform the property, we separate the treatment
of the dynamic part and of the embedded ECQs. Since the dynamics of an SGDAP is
completely determined at the relational layer, the dynamic part is maintained unaltered.
ECQs are instead manipulated as defined in Section 2. In particular, the rewriting of Φ
wrt the TBox T , denoted by Φr = REW(Φ, T ), is done by replacing each embedded
ECQ with its corresponding rewriting wrt T .
Example 5. Consider the µLEQL
                          C   property Φ described in Example 4, together with the TBox T
introduced in Example 1. The rewriting of Φ wrt T produces Φr = REW(Φ, T ), which is:
 µZ.(∀s.[Student(s) ∨ Bachelor(s) ∨ Master(s) ∨ MNum(s, )] → [Graduated(s)]) ∨ [−]Z
                                                                                        t
                                                                                        u
Before unfolding the rewritten dynamic propertyW      Φ r we translate each  subformula of
the form ∃x ∈ C0 .Ψ into the equivalent form c∈ADOM(M(D0 )) Ψ [x/c]. This means that
when such a form of quantification is used, the initial ABox must be materialized in order
to compute the initial active domain of the semantic layer. We then extend the UNFOLD()
function defined in Section 2 to unfold a µLEQL C    dynamic property over the semantic
layer into a corresponding property over the relational layer. As for the rewriting, the
temporal structure is maintained unaltered, reflecting that the dynamics of SGDAPs is
determined at the relational layer. For what concerns the ECQs embedded in the property,
the interesting case to be discussed is the one of (existential) quantification:
     UNFOLD (∃x.ϕ, M) = ∃x. UNFOLD (ϕ, M) ∨
                                         W
                                             (f /n)∈FS(M) ∃x1 , . . . , xn . UNFOLD (ϕ[x/f (x1 , . . . , xn )], M)

where FS(M) is the set of function symbols contained in M. This unfolding reflects
that quantification over individuals at the semantic layer must be properly rephrased as a
corresponding quantification over those values in the relational layer that could lead to
produce such individuals through the application of M. This is done by unfolding ∃x.ϕ
into a disjunction of formulae, where: (i) the first formula corresponds to ∃x.ϕ itself,
and is used to tackle the case in which x appears in the range of an attribute, which is
in fact a value; (ii) Each of the other formulae is obtained from ϕ by replacing x with
one of the possible variable terms produced by M, and quantifying over the existence of
values used to construct the corresponding object term.
                                           Semantically-Governed Data-Aware Processes              31

Example 6. Let us consider the µLEQL
                                 C   property Φr of Example 5, together with the mapping
M defined in Example 1. We get that UNFOLD(Φr , M) corresponds to:
                                                                             
            µZ. ∀x1 , x2 , x3 .AUXm3 (x1 , x2 , x3 , ) → AUXm5 (x1 , x2 , x3 ) ∨ [−]Z

where AUXm3 (name, surname, type, id) and AUXm5 (name, surname, type) represent the aux-
iliary view predicates of mapping assertions m3 and m5 respectively, whose defining queries
are the SQL queries in the left-hand side of the mapping assertion themselves. When unfolding
the UCQ Student(stu1 (x1 , x2 , x3 )) ∨ Bachelor(stu1 (x1 , x2 , x3 )) ∨ Master(stu1 (x1 , x2 , x3 )) ∨
MNum(stu1 (x1 , x2 , x3 ), ), we notice that the involved mapping assertions are m1 , m2 , and m3 .
However, we only consider m3 , because the query on its left-hand side contains the ones on the
left-hand side of m1 and m2 .


Reduction to Data-Centric Dynamic Systems. The connection between SGDAPs
and DCDSs is straightforward (see [5] for the definition of DCDS). Given a SGDAP
S = hO, P, D0 i with O = hR, T , Mi, we can construct a corresponding DCDS with
nondeterministic services SREL = hD, Pi, where D = hV, R, {qunsat (T ) → false}, D0 i.
Thanks to this encoding, we obtain ΥSR ≡ ΥSDCDS
                                            REL
                                                , where ΥSDCDS
                                                           REL
                                                               is the RTS constructed
for the DCDS SREL following the definition in [5].
Verification. Leveraging on the parallel between SGDAPs and DCDSs, verification of
a µLEQL
      C   property over a SGDAP can be reduced to the verification of a µLC property
over the corresponding DCDS. In fact, µLC (µ-calculus over closed FOL queries) is
contained in the fragments of FO µ-calculus studied for DCDSs in [5], namely µLA and
µLP . Both µLA and µLP support FOL queries over the DCDS, allowing for controlled
forms of FO quantification across states, and therefore they clearly support FO sentences.
    Let S = hO, P, D0 i be a SGDAP with O = hR, T , Mi, STS ΥSS and ΥSR =
hR, Σ, s0 , db, ⇒i. We say that ΥSR is state-bounded if there exists a bound b such
that for each s ∈ Σ, |ADOM(db(s))| < b. Let Φ be a µLEQL         C   property, and let Φ0 =
UNFOLD ( REW (Φ, T ), M). Since (i) ΥS |= Φ can be reduced to ΥSR |= Φ0 , (ii) Φ0
                                           S

belongs to µLC (which is contained in µLP ), (iii) ΥSR can be generated by a DCDS
with nondeterministic services, we can reuse the decidability results presented in [5]. In
particular, we obtain that ΥSS |= Φ is decidable if ΥSR is state bounded. Verification can in
this case be reduced to conventional finite-state model checking.

Example 7. Consider the SGDAP S = hO, P, D0 i, where O is the OBDA system defined in
Example 1, P the process component defined in Example 3. It is easy to see that the resulting
RTS ΥSR is state-bounded. Intuitively, this follows from the facts that the actions of S either move
tuples from the TRANSF M table to the ENROLLED one, or copy tuples from the ENROLLED
table to the GRAD one. Hence, the size of each database instance appearing in ΥSR is at most twice
the size of D0 , thus verification of µLEQL
                                        C    properties over the STS ΥSS is decidable.            u
                                                                                                  t



Acknowledgements. This research has been partially supported by the ICT Collabora-
tive Project ACSI (Artifact-Centric Service Interoperation), funded by the EU under FP7
ICT Call 5, 2009.1.2, grant agreement No. FP7-257593.
32       D. Calvanese et al.

References
 1. S. Abiteboul, P. Bourhis, A. Galland, and B. Marinoiu. The AXML artifact model. In Proc.
    of TIME 2009, pages 11–17, 2009.
 2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The De-
    scription Logic Handbook: Theory, Implementation and Applications. Cambridge University
    Press, 2003.
 3. B. Bagheri Hariri, D. Calvanese, G. De Giacomo, and R. De Masellis. Verification of
    conjunctive-query based semantic artifacts. In Proc. of DL 2011, volume 745 of CEUR,
    ceur-ws.org, 2011.
 4. B. Bagheri Hariri, D. Calvanese, G. De Giacomo, R. De Masellis, and P. Felli. Foundations of
    relational artifacts verification. In Proc. of BPM 2011, volume 6896 of LNCS, pages 379–395.
    Springer, 2011.
 5. B. Bagheri Hariri, D. Calvanese, G. De Giacomo, A. Deutsch, and M. Montali. Verification
    of relational data-centric dynamic systems with external services. CoRR Technical Report
    arXiv:1203.0024, arXiv.org e-Print archive, 2012. Available at http://arxiv.org/
    abs/1203.0024.
 6. F. Belardinelli, A. Lomuscio, and F. Patrizi. Verification of deployed artifact systems via data
    abstraction. In Proc. of ICSOC 2011, 2011.
 7. K. Bhattacharya, C. Gerede, R. Hull, R. Liu, and J. Su. Towards formal analysis of artifact-
    centric business process models. In Proc. of BPM 2007, volume 4714 of LNCS, pages
    288–234. Springer, 2007.
 8. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodrı́guez-Muro, and
    R. Rosati. Ontologies and databases: The DL-Lite approach. In S. Tessaris and E. Franconi,
    editors, Semantic Technologies for Informations Systems – 5th Int. Reasoning Web Summer
    School (RW 2009), volume 5689 of LNCS, pages 255–356. Springer, 2009.
 9. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. EQL-Lite: Effective
    first-order query processing in description logics. In Proc. of IJCAI 2007, 2007.
10. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning
    and efficient query answering in description logics: The DL-Lite family. J. of Automated
    Reasoning, 39(3):385–429, 2007.
11. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Path-based identifica-
    tion constraints in description logics. In Proc. of KR 2008, pages 231–241, 2008.
12. P. Cangialosi, G. De Giacomo, R. De Masellis, and R. Rosati. Conjunctive artifact-centric
    services. In Proc. of ICSOC 2010, volume 6470 of LNCS, pages 318–333. Springer, 2010.
13. E. M. Clarke, O. Grumberg, and D. A. Peled. Model checking. The MIT Press, Cambridge,
    MA, USA, 1999.
14. D. Cohn and R. Hull. Business artifacts: A data-centric approach to modeling business
    operations and processes. IEEE Bull. on Data Engineering, 32(3):3–9, 2009.
15. E. A. Emerson. Automated temporal reasoning about reactive systems. In F. Moller and
    G. Birtwistle, editors, Logics for Concurrency: Structure versus Automata, volume 1043 of
    LNCS, pages 41–101. Springer, 1996.
16. A. Nigam and N. S. Caswell. Business artifacts: An approach to operational specification.
    IBM Systems Journal, 42(3):428–445, 2003.
17. A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Linking
    data to ontologies. J. on Data Semantics, X:133–173, 2008.
18. M. Rodrı́guez-Muro and D. Calvanese. Dependencies: Making ontology based data access
    work in practice. In Proc. of AMW 2011, volume 749 of CEUR, ceur-ws.org, 2011.
19. W. M. P. van der Aalst, P. Barthelmess, C. A. Ellis, and J. Wainer. Proclets: A framework
    for lightweight interacting workflow processes. Int. J. of Cooperative Information Systems,
    10(4):443–481, 2001.