=Paper= {{Paper |id=Vol-477/paper-32 |storemode=property |title=Tractable Query Answering over Ontologies with Datalog+/- |pdfUrl=https://ceur-ws.org/Vol-477/paper_46.pdf |volume=Vol-477 |dblpUrl=https://dblp.org/rec/conf/dlog/CaliGL09 }} ==Tractable Query Answering over Ontologies with Datalog+/-== https://ceur-ws.org/Vol-477/paper_46.pdf
                      Tractable Query Answering over
                         Ontologies with Datalog±

               Andrea Calı̀2,1 , Georg Gottlob1,2 , and Thomas Lukasiewicz1,‡
                        1
                        Computing Laboratory, University of Oxford, UK
                       firstname.lastname@comlab.ox.ac.uk
             2
               Oxford-Man Institute of Quantitative Finance, University of Oxford, UK



          Abstract. We present a family of expressive extensions of Datalog, called Data-
          log± , as a new paradigm for query answering over ontologies. The Datalog±
          family admits existentially quantified variables in rule heads, and has suitable
          restrictions to ensure highly efficient ontology querying. In particular, we show
          that query answering under so-called guarded Datalog± is PTIME-complete in
          data complexity, and that query answering under so-called linear Datalog± is in
          AC 0 in data complexity. We also show how negative constraints and a general
          class of key constraints can be added to Datalog± while keeping ontology query-
          ing tractable. We then show that linear Datalog± , enriched with a special class
          of key constraints, generalizes the well-known DL-Lite family of tractable de-
          scription logics. Furthermore, the Datalog± family is of interest in its own right
          and can, moreover, be used in various contexts such as data integration and data
          exchange. This work is a short version of [8].


1      Introduction
Ontologies play a key role in the Semantic Web [4], data modeling, and information
integration [19]. Recent trends in ontological reasoning have shifted from decidability
issues to tractability ones, as e.g. reflected by the work on the DL-Lite family of tractable
description logics (DLs) [12, 22]. An important result of these works is that the main
variants of DL-Lite are not only decidable, but that answering (unions of) conjunctive
queries for these logics is in LOGSPACE, and actually in AC0 , in data complexity (i.e.,
the complexity where both the query and the constraints are fixed), and query answering
in DL-Lite is FO-rewritable [12]. As observed in [21], the lack of value creation makes
plain Datalog not very well suited for ontological reasoning with inclusion axioms ei-
ther. It is thus natural to extend Datalog in order to nicely accommodate ontological
axioms and constraints such as those expressible in the DL-Lite family.
    This paper proposes and studies variants of Datalog that are suited for efficient
ontological reasoning, and, in particular, for tractable ontology-based query answering.
We introduce the Datalog± family of Datalog variants, which extend plain Datalog
by the possibility of existential quantification in rule heads, and by a number of other
features, and, at the same time, restrict the rule syntax in order to achieve tractability.
In the Datalog± family, rules are a restricted form of tuple-generating dependencies
(TGDs) and equality-generating dependencies (EGDs). More specifically, in guarded
 ‡
     Alternative address: Institut für Informationssysteme, Technische Universität Wien, Austria.
2       A. Calı̀, G. Gottlob, and T. Lukasiewicz

Datalog± , rules are guarded TGDs (GTGDs), i.e., TGDs with a single atom in the body
that contains all universally quantified variables; in linear Datalog± , rules are linear
TGDs (LTGDs), i.e., TGDs that have a single atom in the body. We characterize the
data complexity of query answering for both the above sublanguages of Datalog± . We
show that query answering is PTIME-complete and in AC0 (which is the complexity of
evaluating fixed first-order formulas over a database or finite structure), when the query
and the set of GTGDs and LTGDs are fixed, respectively. We then enrich Datalog± by
adding negative constraints, which are Horn clauses with a (not necessarily guarded)
conjunction of atoms in their body and the truth constant false, denoted ⊥, in the head.
We show that the introduction of such constraints does not increase the complexity
of query answering in Datalog± . As a further extension, we add non-conflicting keys,
which are special EGDs that do not interact with TGDs, and thus also do not increase
the complexity of query answering in Datalog± . We deal only with keys, since this
suffices to capture the most common tractable ontology languages in the literature. The
class of non-conflicting keys is a generalization of the one in [10].
Further results. We have several other results that, for space reasons, we do not include
here. We refer the reader to [8] for more details. A central result is that the Datalog±
family is able to express the most common tractable ontology languages, in particular,
the DL-Lite family [12] and F-Logic Lite [9]. In particular, it is interesting to see that
linear Datalog± enriched with non-conflicting keys is expressive enough to capture the
expressive power of the description logics DL-LiteF , DL-LiteR , and DL-LiteA , which
are highly suitable for ontological modeling and reasoning. Finally, we are able to deal
with an extension of Datalog± with stratified negation. We provide a canonical model
and a perfect model semantics, and we show that they coincide. We thus provide a
natural stratified negation for query answering over ontologies, which has been an open
problem to date, since it is in general based on several strata of infinite models.
Related work. Note that the results of the present paper are related to but very dif-
ferent from the results in [7], where complexity issues of guarded TGDs as well as of
another class, called weakly guarded TGDs, were first studied. There, it was shown that
the combined complexity of query answering and fact inference with guarded TGDs is
2-EXPTIME complete, and it was noted that the data complexity is polynomial, which
is now much refined by the present paper. Extensions of Datalog that allow the in-
troduction of values not appearing in the active domain of the input database have
been proposed in the literature [1, 5, 11, 10]; the introduction of such values is usually
called value invention. Other works integrate Datalog with description logic knowledge
bases [16, 6, 23], which is a different perspective.


2   Preliminaries
We briefly recall some basics on databases, queries, TGDs, and the chase.
Databases and Queries. We assume (i) an infinite universe of data constants ∆ (which
constitute the “normal” domain of a database), (ii) an infinite set of (labeled) nulls ∆N
(used as “fresh” Skolem terms, which are placeholders for unknown values, and can
thus be seen as variables), and (iii) an infinite set of variables X (used in dependencies
and queries). Different constants represent different values (unique name assumption),
while different nulls may represent the same value. We assume a lexicographic order
                         Tractable Query Answering over Ontologies with Datalog±        3

on ∆ ∪ ∆N , with every symbol in ∆N following all symbols in ∆. We denote by X
sequences of variables X1 , . . . , Xk with k > 0. We assume a relational schema R,
which is a finite set of relation names (or predicates). A term t is a constant, null, or
variable. An atomic formula (or atom) a has the form P (t1 , ..., tn ), where P is n-ary
predicate, and t1 , ..., tn are terms. We denote by dom(a) the set of all its arguments.
These notations naturally extend to sets and conjunctions of atoms. Conjunctions of
atoms are often identified with the sets of their atoms. A database (instance) D for R is
a (possibly infinite) set of atoms with predicates from R and arguments from ∆ ∪ ∆N .
Such D is ground iff it contains only atoms with arguments from ∆. A conjunctive
query (CQ) over R has the form Q(X) = ∃YΦ(X, Y), where Φ(X, Y) is a conjunc-
tion of atoms with the variables X and Y. A Boolean CQ (BCQ) over R is a CQ of
the form Q(). Answers to CQs and BCQs are defined via homomorphisms, which are
mappings µ : ∆ ∪ ∆N ∪ X → ∆ ∪ ∆N ∪ X such that (i) c ∈ ∆ implies µ(c) = c,
(ii) c ∈ ∆N implies µ(c) ∈ ∆ ∪ ∆N , and (iii) µ is naturally extended to atoms, sets of
atoms, and conjunctions of atoms. The set of all answers to a CQ Q(X) = ∃YΦ(X, Y)
over a database D, denoted Q(D), is the set of all tuples t over ∆ for which there exists
a homomorphism µ : X ∪ Y → ∆ ∪ ∆N such that µ(Φ(X, Y)) ⊆ D and µ(X) = t. The
answer to a BCQ Q() over D is Yes, denoted D |= Q, iff Q(D) 6= ∅.
TGDs. Given a relational schema R, a tuple-generating dependency (TGD) σ is a first-
order formula of the form ∀X∀Y Φ(X, Y) → ∃Z Ψ (X, Z), where Φ(X, Y) and Ψ (X,
Z) are conjunctions of atoms over R, called the body and the head of σ, respectively.
We usually omit the universal quantifiers in TGDs. Such σ is satisfied in a database D
for R iff, whenever there is a homomorphism h that maps the atoms of Φ(X, Y) to
atoms of D, there exists an extension h0 of h that maps the atoms of Ψ (X, Z) to atoms
of D. All sets of TGDs are finite here. The notion of query answering under TGDs is
defined as follows. For a set of TGDs Σ on R, and a database D for R, the set of models
of D given Σ, denoted mods(Σ, D), is the set of all databases B such that B |= D ∪ Σ.
The set of answers to a CQ Q on D given Σ, denoted ans(Q, Σ, D), is the set of all
tuples a such that a ∈ Q(B) for all B ∈ mods(Σ, D). The answer to a BCQ Q over D
given Σ is Yes, denoted D ∪ Σ |= Q, iff ans(Q, Σ, D) 6= ∅. Note that query answering
under general TGDs is undecidable [3], even when the schema and TGDs are fixed [7].
Query answering under a certain class C of TGDs is said to be FO-rewritable iff, for
every given CQ Q and for every given set Σ of TGDs in C, there exists a first order query
QF O such that, for every instance D, it holds QF O (D) = ans(Q, Σ, D). We recall that
the two problems of CQ and BCQ evaluation under TGDs are LOGSPACE-equivalent
[13, 18, 17, 15]. Moreover, it is easy to see that the query output tuple (QOT) problem
(as a decision version of CQ evaluation) and BCQ evaluation are AC0 -reducible to each
other. Henceforth, we thus focus only on the BCQ evaluation problem. All complexity
results carry over to the other problems. We also recall that query answering under
TGDs is equivalent to query answering under TGDs with only singleton atoms in their
heads [7]. In the sequel, w.l.o.g., every TGD has a singleton atom in its head.
The TGD Chase. The chase was introduced for checking the implication of depen-
dencies [20], and later also for checking query containment [18]. It repairs a database
relative to a set of dependencies, so that the result of the chase satisfies the dependen-
cies. By “chase”, we refer both to the chase procedure and to its output. The TGD chase
works on a database through so-called TGD chase rules (for an extended chase with
also equality-generating dependencies (EGDs), see [8]). The TGD chase rule comes in
4       A. Calı̀, G. Gottlob, and T. Lukasiewicz

two flavors: oblivious and restricted, where the restricted one repairs TGDs only when
not satisfied. We focus on the oblivious one, since it makes proofs technically simpler.
The (oblivious) TGD chase rule defined below is the building block of the chase.
     TGD C HASE RULE . Consider a database D for a relational schema R, and a TGD σ
on R of the form Φ(X, Y) → ∃Z Ψ (X, Z). Then, σ is applicable to D if there exists a
homomorphism h that maps the atoms of Φ(X, Y) to atoms of D. Let σ be applicable,
and h1 be a homomorphism that extends h as follows: for each Xi ∈ X, h1 (Xi ) =
h(Xi ); for each Zj ∈ Z, h1 (Zj ) = zj , where zj is a “fresh” null, i.e., zj ∈ ∆N , zj
does not occur in D, and zj lexicographically follows all other nulls already introduced.
The application of σ adds to D the atom h1 (Ψ (X, Z)) if not already in D.
     The important notion of the (derivation) level of an atom in a TGD chase is defined
as follows. Let D be the initial database from which the chase is constructed. Then:
(1) The atoms in D have level 0. (2) Let a TGD Φ(X, Y) → ∃Z Ψ (X, Z) be applied at
some point in the construction of the chase, and let h and h1 be as in the TGD chase rule.
If the atom with highest level among those in h1 (Φ(X, Y)) has level k, then the added
atom h1 (Ψ (X, Z)) has level k + 1. The chase algorithm for Σ and D consists of an ex-
haustive application of the TGD chase rule in a breadth-first (level-saturating) fashion,
which leads as result to a (possibly infinite) chase for Σ and D, denoted chase(Σ, D).
The chase of level up to k > 0 for Σ and D, denoted chase k (Σ, D), is the set of all
atoms in chase(Σ, D) of level at most k. The (possibly infinite) chase relative to TGDs
is a universal model, i.e., for every B ∈ mods(Σ, D), there exists a homomorphism that
maps chase(Σ, D) onto B [15, 7], and thus Q(chase(Σ, D)) = ans(Q, Σ, D).

3   Guarded Datalog±
We now introduce guarded Datalog± as a class of special TGDs that show computa-
tional data tractability, while being at the same time expressive enough to model on-
tologies. BCQs relative to such TGDs can be evaluated on a finite part of the chase of
constant size in the data complexity.
    A TGD σ is guarded iff it contains an atom in its body that contains all universally
quantified variables of σ. The leftmost such atom is the guard atom (or guard) of σ.
The non-guard atoms in the body of σ are the side atoms of σ.
Example 3.1. The TGD r(X, Y ), s(Y, X, Z) → ∃W s(Z, X, W ) is guarded (via the
guard s(Y, X, Z)), while the TGD r(X, Y ), r(Y, Z) → r(X, Z) is not guarded.
     Sets of guarded TGDs (with single-atom heads) are theories in the guarded fragment
of first-order logic [2]. In the sequel, let R be a relational schema, let D be a database
for R, and let Σ be a set of guarded TGDs on R. We first give some preliminary
definitions as follows. The chase graph for Σ and D is the directed graph consisting of
chase(Σ, D) as the set of nodes and having an arc from a to b iff b is obtained from a
and possibly other atoms by a one-step application of a TGD σ ∈ Σ. Here, we mark a
as guard iff a is the guard of σ. The guarded chase forest for Σ and D is the restriction
of the chase graph for Σ and D to all atoms marked as guards and their children. The
subtree of an atom a in this forest, denoted subtree(a), is the restriction of the forest to
all successors of a. The type of an atom a, denoted type(a), is the set of all atoms b in
chase(Σ, D) that have only constants and nulls from a as arguments. Informally, the
type of a is the set of all atoms that determine the subtree of a in the forest.
                          Tractable Query Answering over Ontologies with Datalog±          5

Example 3.2. Consider the TGDs σ1 : r1 (X, Y ), r2 (Y ) → ∃Zr1 (Z, X) and σ2 :
r1 (X, Y ) → r2 (X), applied on an instance D = {r1 (a, b), r2 (b)}. The first part of
the (infinite) guarded chase forest for {σ1 , σ2 } and D is shown in Fig. 3, where every
arc is labeled with the applied TGD.

     Given a finite set S ⊆ ∆ ∪ ∆N , two sets of atoms A1 and r1 (a, b)              r2 (b)
A2 are S-isomorphic (or isomorphic if S = ∅) iff there exists σ1
                                                                                  σ2
a bijection β : A1 ∪ dom(A1 ) → A2 ∪ dom(A2 ) such that (i)
         −1                                                −1     r   (z   , a)      r2 (a)
β and β are homomorphisms, and (ii) β(c) = c = β (c)                1   1

for all c ∈ S. Two atoms a1 and a2 are S-isomorphic (or iso- σ1
                                                                                  σ2
morphic if S = ∅) iff {a1 } and {a2 } are S-isomorphic. The
notion of S-isomorphism (or isomorphism if S = ∅) is nat-         r1 (z 2 , z 1 )    r2 (z1 )
urally extended to more complex structures, such as pairs of σ
                                                                   1              σ2
two subtrees (V1 , E1 ) and (V2 , E2 ) of the guarded chase for-
est, and two pairs (b1 , S1 ) and (b2 , S2 ), where b1 and b2 are     ...            r2 (z2 )
atoms, and S1 and S2 are sets of atoms. The following lemma Fig. 1. Guarded chase
shows that if two atoms have S-isomorphic types, then their forest in Example 3.2.
subtrees are also S-isomorphic.

Lemma 3.1. Let S be a set of constants and nulls, and a1 and a2 be atoms from
chase(Σ, D) with S-isomorphic types. Then, the subtree of a1 is S-isomorphic to the
subtree of a2 .

    The next lemma provides an upper bound for the number of all non-T -isomorphic
pairs consisting of an atom and a type.
                                                                                       w·|R|
Lemma 3.2. Let w be the maximal arity of a predicate in R, δ = (2w)w · 2(2w)         ,
and a ∈ chase(Σ, D). Let P be a set of pairs (b, S), each consisting of an atom b and
a type S of atoms c with arguments from a and new nulls. If |P | > δ, then P contains
at least two dom(a)-isomorphic pairs.

    Using the above two lemmas, we are now ready to prove that the atoms in the type
of an atom a in the guarded chase forest cannot be generated at a depth of the guarded
chase forest that exceeds the depth of the atom a by a value that depends only on the
TGDs. Here, the guarded depth of an atom a in the guarded chase forest for Σ and D,
denoted depth(a), is the length of the path from D to a in the forest. Note that this is
generally different from the derivation level.

Lemma 3.3. Let a be a guard in the chase graph for Σ and D, and b ∈ type(a). Then,
depth(b) 6 depth(a) + k, where k depends only on Σ.
                                w·|R|
Proof. Let k = (2w)w · 2(2w)        , where w is the maximal arity of a predicate in R.
Suppose depth(b) > depth(a) + k for one atom b in the type of a. That is, the path
P in the guarded chase forest leading to b has a length greater than k from the depth
of a. Suppose first b contains no nulls. By Lemma 3.2, there are two isomorphic atoms
h and h0 (in this order) on P with isomorphic types (see Fig. 3). By Lemma 3.1, the
subtree of h is isomorphic to the subtree of h0 . But then b is also in the subtree of h,
on a path Q that is at least one edge shorter than P , which contradicts the assumption
6       A. Calı̀, G. Gottlob, and T. Lukasiewicz

that P is the path leading to b. Suppose next the set of nulls N in b is nonempty, and
consider the common predecessor c of a and b of largest depth. By Lemma 3.2, there
are two N -isomorphic atoms h and h0 (in this order) on P with N -isomorphic types
(see Fig. 3). Since b is in the type of a, it cannot contain new nulls compared to a.
                  level 0
                                                  Since c is the common predecessor
                 h           c
                                              of a and b of largest depth, b also can-
                                              not contain new nulls compared to c, and
             h0                      h
                           a
                                              thus compared   to h and h0 . So, b is in the
                                  h0
 a                                            types of both h and h0 . By Lemma 3.1,
                                     b        the subtree of h is N -isomorphic to the
              b
                                              subtree of h0 . But then b is also in the
                                              subtree of h, on a path Q that is at least
             (a)                  (b)
                                              one edge shorter than P , which contra-
                                              dicts the assumption that P is the path
Fig. 2. Construction in Lemma 3.3’s proof.
                                              leading to b. In summary, all atoms in the
type of a can be obtained on paths of length at most k.                                  2
    We next show that BCQs can be evaluated using only a finite, initial portion of the
guarded chase forest, whose size is determined by the TGDs only. Here, the guarded
chase of level up to k > 0 for Σ and D, denoted g-chase k (Σ, D), is the set of all atoms
in the forest of depth at most k.
Lemma 3.4. Let Q be a BCQ over R. If there exists a homomorphism µ that maps Q
into chase(Σ, D), then there exists a homomorphism λ that maps Q into g-chase k (Σ,
D), where k depends only on Q and R.
                                         level 0
                                                Proof. Let k = n · δ, where n = |Q|, δ =
                                                               w·|R|
                              a
                                                (2w)w · 2(2w)        , and w is the maximal
                                h               arity of a predicate    in R. Suppose there
                                                exists a homomorphism       that maps Q into
                              h0
                                                chase(Σ, D). Let µ be a homomorphism  P
                                                of this kind such that depth(µ) = q∈Q
                                b
                                                depth(µ(q)) is minimal. We now show
                                                that µ(Q) is contained in g-chase k (Σ,
                                                D). Towards a contradiction, suppose the
                                                contrary. Consider the tree consisting of
Fig. 3. Construction in Lemma 3.4’s proof.
                                                all atoms in µ(Q) and their ancestors in
                                                the guarded chase forest for Σ and D. As
µ(Q) is not contained in g-chase k (Σ, D), this tree must contain a path P of length
greater than δ of which all inner nodes (i.e., without start and end node) do not belong
to µ(Q) and have no branches (i.e., have exactly one outgoing edge). Let a be the start
node of P . By Lemma 3.2, there are two dom(a)-isomorphic atoms h and h0 on P
with dom(a)-isomorphic types. By Lemma 3.1, subtree(h) is dom(a)-isomorphic to
subtree(h0 ). Thus, we can remove h and the path to h0 , obtaining a path that is at least
one edge shorter. Let ι be the homomorphism mapping subtree(h0 ) to subtree(h), and
let µ0 = µ ◦ ι. Then, µ0 is a homomorphism that maps Q into chase(Σ, D) such that
depth(µ0 ) < depth(µ), which contradicts µ being a homomorphism of this kind such
that depth(µ) is minimal. This shows that µ(Q) is contained in g-chase k (Σ, D). 2
                         Tractable Query Answering over Ontologies with Datalog±        7

    The following definition captures a general property, where also the whole deriva-
tion of the query atoms are contained in the finite, initial portion of the guarded chase
forest, whose size is determined by the TGDs only.
Definition 3.1. We say that Σ has the bounded guard-depth property (BGDP) iff, for
every database D for R and for every BCQ Q, whenever there exists a homomorphism
µ that maps Q into chase(Σ, D), then there exists a homomorphism λ of this kind
such that all ancestors of λ(Q) in the chase graph for Σ and D are contained in g-
chase γg (Σ, D), where γg depends only on Q and R.
    It is not difficult to prove, based on Lemmas 3.3 and 3.4, that guarded TGDs enjoy
the BGDP. Hence, BCQ answering under guarded TGDs is in PTIME in data complexity,
since by the existence of the homomorphism λ in Lemma 3.4 and in Definition 3.1,
we obtain Q(g-chaseγg (Σ, D)) = Q(chase(Σ, D)) = ans(Q, Σ, D). The following
theorem summarizes these and other results from [8].
Theorem 3.1. Let D be a database for a relational schema R, Σ a finite set of guarded
TGDs on R, and Q a BCQ Q over R. Then, deciding D ∪ Σ |= Q is PTIME-complete
in data complexity. It can be done in linear time in data complexity for atomic Q.

4   Linear Datalog±
We now introduce linear Datalog± as a variant of guarded Datalog± , where we prove
that query answering is even FO-rewritable (and thus in AC0 ) in the data complexity.
Nonetheless, linear Datalog± is still expressive enough for representing ontologies [8]
in many cases. A TGD is linear iff it contains only a singleton body atom. Notice that
linear Datalog± generalizes the well-known class of inclusion dependencies.
    We first define the bounded derivation-depth property, which is strictly stronger than
the bounded guard-depth property.
Definition 4.1. A set of TGDs Σ has the bounded derivation-depth property (BDDP)
iff, for every database D for R and for every BCQ Q over R, whenever D ∪ Σ |= Q,
then chase γd (Σ, D) |= Q, where γd depends only on Q and R.
    Clearly, in the case of linear TGDs, for every a ∈ chase(Σ, D), the subtree of a
is determined only by a, instead of type(a) (cf. Lemma 3.1). Therefore, for a single
atom, its depth coincides with the number of applications of the TGD chase rule that
are necessary to generate it. By this observation, as an immediate consequence of the
fact that guarded TGDs enjoy the BGDP, we obtain that linear TGDs have the bounded
derivation-depth property. The next result shows that queries relative to TGDs with the
bounded derivation-depth property are FO-rewritable.
Theorem 4.1. Let Σ be a set of TGDs over a relational schema R, D be a database
for R, and Q be a BCQ over R. If Σ enjoys the BDDP, then Q is FO-rewritable.
   As an immediate consequence of the fact that linear TGDs enjoy the BDDP and of
Theorem 4.1, we have that BCQs are FO-rewritable in the linear case.
Corollary 4.1. Let R be a relational schema, Σ be a set of linear TGDs over R, D be
a database for R, and Q be a BCQ over R. Then, Q is FO-rewritable.
8       A. Calı̀, G. Gottlob, and T. Lukasiewicz

5   Extensions

In this section, we extend Datalog± with negative constraints and EGDs, which are both
important when representing ontologies.
Negative Constraints. A negative constraint (or constraint) is a first-order formula
of the form ∀XΦ(X) → ⊥, where Φ(X) is a (not necessarily guarded) conjunction of
atoms. It is often also written as ∀XΦ0 (X) → ¬p(X), where Φ0 (X) is obtained from
Φ(X) by removing the atom p(X). We usually omit the universal quantifiers.

Example 5.1. If the unary predicates c and c0 represent two classes, we may use the
constraint c(X), c0 (X) → ⊥ to assert that the two classes have no common instances.
Similarly, if additionally the binary predicate r represents a relationship, we may use
c(X), r(X, Y ) → ⊥ to enforce that no member of the class c participates to r.

     Query answering on a database D under TGDs ΣT and constraints ΣC can be done
effortlessly by additionally checking that every constraint σ = Φ(X) → ⊥ ∈ ΣC is sat-
isfied in D given ΣT , each of which can be done by checking that the BCQ Qσ = Φ(X)
evaluates to false in D given ΣT . We write D ∪ ΣT |= ΣC iff every σ ∈ ΣC is false
in D given ΣT . We thus obtain immediately the following result. Here, a BCQ Q is
true in D given ΣT and ΣC , denoted D ∪ ΣT ∪ ΣC |= Q, iff (i) D ∪ ΣT |= Q or
(ii) D ∪ ΣT 6|= ΣC (as usual in DLs).

Theorem 5.1. Let R be a relational schema, ΣT and ΣC be sets of TGDs and con-
straints on R, respectively, D be a database for R, and Q be a BCQ on R. Then,
D ∪ ΣT ∪ ΣC |= Q iff (i) D ∪ ΣT |= Q or (ii) D ∪ ΣT |= Qσ for some σ ∈ ΣC .

    The next theorem shows that constraints do not increase the data and combined
complexity of answering BCQs in the guarded and linear case. It follows from The-
orem 5.1, by which the additional effort of deciding D ∪ ΣT |= ΣC can be done by
answering |ΣC | BCQs Qσ without constraints, where each query Qσ has a size of
kσk 6 kΣC k, and in the data complexity, |ΣC | and kσk are constants. Here, |ΣC | de-
notes the cardinality of ΣC , while kΣC k denotes the input size of ΣC . This additional
effort does not increase the complexity of answering BCQs without constraints.

Theorem 5.2. Let R be a relational schema, ΣT and ΣC be sets of TGDs and con-
straints on R, respectively, D be a database for R, and Q be a BCQ. Then, deciding
D ∪ ΣT ∪ ΣC |= Q in the guarded (resp., linear) case has the same data and the same
combined complexity as deciding D ∪ ΣT |= Q in the guarded (resp., linear) case.

EGDs and Non-Conflicting Keys. An equality-generating dependency (or EGD) σ is
a first-order formula of the form ∀X Φ(X) → Xi = Xj , where Φ(X), called the body
of σ, is a conjunction of atoms, and Xi and Xj are variables from X. We call Xi = Xj
the head of σ. Such σ is satisfied in a database D for R iff, whenever there exists a
homomorphism h such that h(Φ(X, Y)) ⊆ D, it holds that h(Xi ) = h(Xj ).

Example 5.2. The EGD r(X, Y ), r(X, Y 0 ) → Y = Y 0 expresses that the second argu-
ment of the binary predicate r functionally depends on the first argument of r.
                         Tractable Query Answering over Ontologies with Datalog±        9

    The chase is naturally extended to databases under EGDs in addition to TGDs: The
chase of a database D in the presence of two sets ΣT and ΣE of TGDs and EGDs,
respectively, denoted chase(ΣT ∪ ΣE , D), is computed by iteratively applying (1) a
single TGD once, according to the order specified above, and (2) the EGDs, as long as
applicable (i.e., until a fixpoint is reached), where EGDs are applied as follows:
    EGD C HASE RULE . Consider a database D for a relational schema R, and an
EGD σ on R of the form Φ(X) → Xi = Xj . Such an EGD σ is applicable to D iff
there exists a homomorphism η : Φ(X) → D such that η(Xi ) and η(Xj ) are different
and not both constants. If η(Xi ) and η(Xj ) are different constants, then there is a hard
violation of σ, and the chase fails. Otherwise, the result of the application of σ to D is
the database h(D) obtained from D by replacing every occurrence of a non-constant
element e ∈ {η(Xi ), η(Xj )} in D by the other element e0 (if e and e0 are both nulls,
then e precedes e0 in the lexicographic order).
    While adding negative constraints is computationally effortless, adding EGDs is
more problematic. The interaction of TGDs and EGDs leads to undecidability of query
answering even in simple cases, such that of functional and inclusion dependencies [14],
or keys and inclusion dependencies (see, e.g., [10], where the proof of undecidability
is done in the style of Vardi as in [18]). It can even be seen that a fixed set of EGDs
and guarded TGDs can simulate a universal Turing machine, and thus query answering
and even propositional ground atom inference is undecidable with such fixed sets of
dependencies. For this reason, we consider a restricted class of EGDs, namely, non-
conflicting key dependencies (or NC keys), which show a controlled interaction with
TGDs (and negative constraints), such that they do not increase the complexity of an-
swering BCQs. Nonetheless, this class is sufficient for ontology modeling.
    We now first concentrate on the semantic notion of separability for EGDs, which
formulates a controlled interaction between EGDs and TGDs (and negative constraints),
such that the EGDs do not increase the complexity of answering BCQs. We then provide
a sufficient syntactic condition for the separability of EGDs, where we transfer a result
by [10] about non-key-conflicting inclusion dependencies to the more general setting of
Datalog± . In the context of description logics, general EGDs cannot be formulated, but
only keys. Therefore, we mainly focus on keys here.

Definition 5.1. Let R be a relational schema, and ΣT and ΣE be sets of TGDs and
EGDs on R, respectively. Then, ΣE is separable from ΣT iff for every database D
for R, the following conditions (i) and (ii) are both satisfied:
 (i) If there is a hard violation of an EGD in chase(ΣT ∪ ΣE , D), then there is also a
     hard violation of some EGD of ΣE , when this EGD is directly applied to D.
(ii) If there is no chase failure, then for every BCQ Q, chase(ΣT ∪ ΣE , D) |= Q iff
     chase(ΣT , D) |= Q.

    The following result shows that adding separable EGDs to TGDs and constraints
does not increase the data and combined complexity of answering BCQs in the guarded
and linear case. It follows immediately from the fact that the separability implies that
chase failure can be directly evaluated on D. For fixed (resp., variable) ΣE , this can
be done by evaluating a first-order formula on D (resp., in polynomial time in the size
of D and ΣE ), which clearly does not increase the data (resp., combined) complexity
of answering BCQs in the three cases.
10       A. Calı̀, G. Gottlob, and T. Lukasiewicz

Theorem 5.3. Let R be a relational schema, ΣT and ΣC be sets of TGDs and con-
straints on R, respectively, and D be a database for R. Let ΣE be a set of EGDs that is
separable from ΣT , and Q be a BCQ. Then, deciding D ∪ ΣT ∪ ΣC ∪ ΣE |= Q in the
guarded (resp., linear) case has the same data complexity and also the same combined
complexity as deciding D ∪ ΣT |= Q in the guarded (resp., linear) case.

     We next provide a sufficient syntactic condition for the separability of EGDs. We
focus on key dependencies (KDs, or simply keys): a key κ on a predicate r specifies a set
K of key attribute positions of r; it is satisfied in a database D if no two atoms in D of
the form r(t) have the same values in all positions in K. Clearly, KDs are special types
of EGDs. The EGD chase rule in case of a KD applies to pairs of tuples that violate
the KD. For example, a KD κ asserting that the key attribute of a ternary predicate r is
the first one is applicable to the instance D = {r(a, b, z3 ), r(a, z2 , z1 )} (where the zi ’s
are nulls), and its application yields the new instance {r(a, b, z1 )}. The following def-
inition generalizes the notion of “non-key-conflicting” dependency relative to a set of
keys, introduced in [10], to the context of arbitrary TGDs.

Definition 5.2. Let κ be a key, and σ be a TGD of the form Φ(X, Y) → ∃Z r(X, Z).
Then, κ is non-conflicting (NC) with σ iff either (i) the key does not apply to the pred-
icate r in the head of σ or (ii) the positions of κ in r are not a proper subset of the
X-positions in r in the head of σ, and every existentially quantified variable in σ ap-
pears only once in the head of σ. We say κ is non-conflicting (NC) with a set of TGDs
ΣT iff κ is NC with every σ ∈ ΣT . We say a set of keys ΣK is non-conflicting (NC)
with ΣT iff every κ ∈ ΣK is NC with ΣT .

Example 5.3. Consider the four keys κ1 , κ2 , κ3 , and κ4 defined by the key attribute sets
K1 = {r[1], r[2]}, K2 = {r[1], r[3]}, K3 = {r[3]}, and K4 = {r[1]}, respectively,
and the TGD σ = p(X, Y ) → ∃Z r(X, Y, Z). Then, the head predicate of σ is r,
and the set of positions in r with universally quantified variables is H = {r[1], r[2]}.
Observe that all keys but κ4 are NC with σ, since only K4 ⊂ H. Roughly, every atom
added in a chase by applying σ would have a fresh null in some position in K1 , K2 ,
and K3 , thus never firing κ1 , κ2 , and κ3 , respectively.

    The following theorem shows that the NC property between keys and TGDs implies
their separability. This generalizes a useful result of [10] on inclusion dependencies to
the much larger class of all TGDs. The main idea behind the proof is roughly as follows.
The NC condition between a key κ and a TGD σ assures that either (a) the application
of σ in the chase generates an atom with a fresh null in a position of κ, and so the fact
does not violate κ (see also Example 5.3), or (b) the X-positions in the predicate r in
the head of σ coincide with the key positions of κ in r, and thus any newly generated
atom must have fresh distinct nulls in all but the key position, and may eventually be
eliminated without violation. It then follows that the full chase does not fail. Since the
new nulls are all distinct, it also contains a homomorphic image of the TGD chase.
Therefore, the full chase is in fact homomorphically equivalent to the TGD chase.

Theorem 5.4. Let R be a relational schema, ΣT and ΣK be sets of TGDs and keys,
respectively, such that ΣK is NC with ΣT . Then, ΣK is separable from ΣT .
                         Tractable Query Answering over Ontologies with Datalog±       11

    We conclude this section by stating that in the NC case, keys do not increase the
data and the combined complexity of answering BCQs under guarded (resp., linear)
TGDs and constraints. This result is immediate by Theorems 5.4 and 5.3.
Corollary 5.1. Let R be a relational schema, ΣT and ΣC be sets of TGDs and con-
straints on R, respectively, and D be a database for R. Let ΣE be a set of EGDs that
is NC with ΣT , and Q be a BCQ. Then, deciding D ∪ ΣT ∪ ΣC ∪ ΣE |= Q in the
guarded (resp., linear) case has the same data complexity and also the same combined
complexity as deciding D ∪ ΣT |= Q in the guarded (resp., linear) case.

6   Ontology Querying
In [8], we have provided a translation of the DLs DL-LiteF , DL-LiteR , and DL-LiteA
of the DL-Lite family [12] into linear Datalog± with negative constraints and non-
conflicting keys, called Datalog±0 , and shown that the former are strictly less expres-
sive than the latter. The other DLs of the DL-Lite family (including DL-LiteF ,u and
DL-LiteR,u ) can be similarly translated to Datalog±
                                                   0 . We now illustrate the translation.
Example 6.1. Consider the following sets of atomic concepts, abstract roles, and indi-
viduals, which represent (i) the classes of scientists, articles, conference papers, and
journal papers, (ii) the binary relations “has author”, “has first author”, and “is author
of”, and (iii) two individuals, respectively:
               A = {Scientist, Article, ConPaper, JouPaper},
               RA = {hasAuthor, hasFirstAuthor, isAuthorOf}, I = {i1 , i2 }.
The following are concept inclusion axioms, which informally express that (i) con-
ference and journal papers are articles, (ii) conference papers are not journal papers,
(iii) every scientist has a publication, (iv) isAuthorOf relates scientists and articles:
         ConPaper v Article, JouPaper v Article, ConPaper v ¬JouPaper,
         Scientist v ∃isAuthorOf, ∃isAuthorOf v Scientist, ∃isAuthorOf − v Article.
The following role inclusion and functionality axioms express that (v) isAuthorOf is
the inverse of hasAuthor, and (vi) hasFirstAuthor is a functional binary relationship:
       isAuthorOf − v hasAuthor, hasAuthor− v isAuthorOf, (funct hasFirstAuthor).
Finally, the concept and role membership axioms Scientist(i1 ), isAuthorOf(i1 , i2 ), and
Article(i2 ) express that the individual i1 is a scientist who authors the article i2 .
    Then, the concept inclusion axioms are translated to the following TGDs and con-
straints (where we identify atomic concepts and roles with their predicates):
          ConPaper(X) → Article(X), JouPaper(X) → Article(X),
          ConPaper(X) → ¬JouPaper(X), Scientist(X) → ∃Z isAuthorOf(X, Z),
          isAuthorOf(X, Y ) → Scientist(X), isAuthorOf(Y, X) → Article(X).
The role inclusion and functionality axioms are translated to these TGDs and EGDs:
       isAuthorOf(Y, X) → hasAuthor(X, Y ), hasAuthor(Y, X) → isAuthorOf(X, Y ),
       hasFirstAuthor(X, Y ), hasFirstAuthor(X, Y 0 ) → Y = Y 0 .
Finally, the three concept and role membership axioms are translated to the three data-
base atoms Scientist(i1 ), isAuthorOf(i1 , i2 ), and Article(i2 ), respectively (where we
also identify individuals with their constants).
12       A. Calı̀, G. Gottlob, and T. Lukasiewicz

Acknowledgments. The work of A. Calı̀ and G. Gottlob was supported by the EPSRC
grant Number EP/E010865/1 “Schema Mappings and Automated Services for Data In-
tegration”. G. Gottlob, whose work was partially carried out at the Oxford-Man Institute
of Quantitative Finance, gratefully acknowledges support from the Royal Society as the
holder of a Royal Society-Wolfson Research Merit Award. T. Lukasiewicz’s work was
supported by the German Research Foundation under the Heisenberg Programme.

References
 1. S. Abiteboul and V. Vianu. Datalog extensions for database queries and updates. J. Comput.
    Syst. Sci., 43(1):62–124, 1991.
 2. H. Andréka, I. Németi, and J. van Benthem. Modal languages and bounded fragments of
    predicate logic. J. Philos. Logic, 27(3):217–274, 1998.
 3. C. Beeri and M. Y. Vardi. The implication problem for data dependencies. In Proc. ICALP-
    1981, LNCS 115, pp. 73–85. Springer, 1981.
 4. T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Sci. Am., 284:34–43, 2001.
 5. L. Cabibbo. The expressive power of stratified logic programs with value invention. Inf.
    Comput., 147(1):22–56, 1998.
 6. M. Cadoli, L. Palopoli, and M. Lenzerini. Datalog and description logics: Expressive power.
    In Proc. DBPL-1997, LNCS 1369, pp. 281–298. Springer, 1998.
 7. A. Calı̀, G. Gottlob, and M. Kifer. Taming the infinite chase: Query answering under ex-
    pressive relational constraints. In Proc. KR-2008, pp. 70–80. AAAI Press, 2008. Revised
    version: http://benner.dbai.tuwien.ac.at/staff/gottlob/CGK.pdf.
 8. A. Calı̀, G. Gottlob, and T. Lukasiewicz. A general Datalog-based framework for tractable
    query answering over ontologies. In Proc. PODS-2009. ACM Press, 2009.
 9. A. Calı̀ and M. Kifer. Containment of conjunctive object meta-queries. In Proc. VLDB-2006.
10. A. Calı̀, D. Lembo, R. Rosati. On the decidability and complexity of query answering over in-
    consistent and incomplete databases. In Proc. PODS-2003, pp. 260–271. ACM Press, 2003.
11. F. Calimeri, S. Cozza, and G. Ianni. External sources of knowledge and value invention in
    logic programming. Ann. Math. Artif. Intell., 50(3/4):333–361, 2007.
12. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning
    and efficient query answering in description logics: The DL-Lite family. J. Autom. Reason-
    ing, 39(3):385–429, 2007.
13. A. Chandra and P. Merlin. Optimal implementation of conjunctive queries in relational data
    bases. In Proc. STOC-1977, pp. 77–90. ACM Press, 1977.
14. A. K. Chandra and M. Y. Vardi. The implication problem for functional and inclusion de-
    pendencies is undecidable. SIAM J. Comput., 14(3):671–677, 1985.
15. A. Deutsch, A. Nash, J. B. Remmel. The chase revisited. In Proc. PODS-2008, pp. 149–158.
16. F. M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf. AL-log: Integrating Datalog and
    description logics. J. Intell. Inf. Syst., 10(3):227–252, 1998.
17. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data exchange: Semantics and query
    answering. Theor. Comput. Sci., 336(1):89– 124, 2005.
18. D. Johnson and A. Klug. Testing containment of conjunctive queries under functional and
    inclusion dependencies. J. Comput. Syst. Sci., 28(1):167–189, 1984.
19. M. Lenzerini. Data integration: A theoretical perspective. In Proc. PODS-2002, pp. 233–246.
20. D. Maier, A. O. Mendelzon, and Y. Sagiv. Testing implications of data dependencies. ACM
    Trans. Database Syst., 4(4):455–469, 1979.
21. P. F. Patel-Schneider and I. Horrocks. Position paper: A comparison of two modelling
    paradigms in the Semantic Web. In Proc. WWW-2006, pp. 3–12. ACM Press, 2006.
22. A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Linking
    data to ontologies. J. Data Semantics, 10:133–173, 2008.
23. R. Rosati. On combining description logic ontologies and nonrecursive Datalog rules. In
    Proc. RR-2008, LNCS 5341, pp. 13–27. Springer, 2008.