On (In)Tractability of OBDA with OWL 2 QL

                  S. Kikot, R. Kontchakov, and M. Zakharyaschev

              Department of Computer Science and Information Systems,
                          Birkbeck College, London. U.K.
                      {kikot,roman,michael}@dcs.bbk.ac.uk


        Abstract. We show that, although conjunctive queries over OWL 2 QL
        ontologies are reducible to database queries, no algorithm can construct
        such a reduction in polynomial time without changing the data. On the
        other hand, we give a polynomial reduction for OWL 2 QL ontologies
        without role inclusions.


1     Introduction
Ontology-based data access (OBDA) [9, 13, 21] has recently emerged as a promis-
ing application area for description logic (DL) with a potential impact on the
new generation of information systems. One of the profiles of the Web Ontology
Language OWL 2, called OWL 2 QL, was tailored specifically aiming at OBDA.
In DL terms, OBDA involves the following reasoning problem:
CQA(A, T , q): given an ABox (data) A,1 a TBox (ontology describing the back-
  ground knowledge) T , a conjunctive query (CQ) q(x), and a tuple a of ABox
  elements, decide whether a is a certain answer to q(x) over (T , A).
In other words, the task is to check whether I |= q(a) for every model I of (T , A).
It is to be noted that reasoning problems of this kind are well known in logic
and computer science (cf. Prolog or Datalog). A distinctive feature of OWL 2 QL
is that ‘in OWL 2 QL, conjunctive query answering can be implemented using
conventional relational database systems. Using a suitable reasoning technique,
sound and complete conjunctive query answering can be performed in LogSpace
with respect to the size of the data’ (www.w3.org/TR/owl2-profiles).
    There exists a number of reductions of OBDA with OWL 2 QL to answer-
ing queries in relational database management systems, which transform (or
rewrite) the problem CQA(A, T , q) to the database query evaluation problem
QE(A, q 0 ), where the first-order (FO) query q 0 does not depend on A. They have
been implemented in the systems QuOnto [1], REQUIEM [20], Presto [23] and
Nyaya [11]. In all of these approaches, the size of the query q 0 , posed to the
database system, can be O((|T | · |q|)|q| ) in the worst case.
    The aim of this paper is to try and understand whether the exponential blow-
up in the size of the rewritten query is inevitable and whether polynomial rewrit-
ings are possible, at least for fragments of OWL 2 QL. In Section 3, we show that
1
    Here we ignore the problem of data representation in database systems; see Section 5.
the problem CQA({A(a)}, T , q) for singleton ABoxes and OWL 2 QL TBoxes is
NP-complete for combined complexity. As the problem QE({A(a)}, q 0 ) is solved
in linear time (LogSpace) in |q 0 |, it follows that no algorithm can construct FO
rewritings q 0 (over {A(a)}) in polynomial time, unless P = NP. For OWL 2 QL
without role inclusions and the logic ELH, the problem CQA({A(a)}, T , q) is
polynomial for combined complexity, while for ELI it is ExpTime-complete. We
observe that the parameterised complexity of the problem CQA({A(a)}, T , q),
where |q| is regarded as a parameter, is fixed-parameter tractable. In Section 4,
we present a polynomial FO rewriting of conjunctive queries over OWL 2 QL on-
tologies without role inclusions. This result improves on the polynomial rewriting
from [14], which reduces CQA(A, T , q) to QE(A + Aux , q 0 ), where Aux is a set
of fresh constants encoding the canonical model of (T , A). Note also the recent
polynomial reduction [12] of CQA(A, T , q) to QE(A + {0, 1}, q 00 ), which uses
two fresh constants 0, 1 and works for the extension Datalog± of OWL 2 QL
(see Remark 1). We discuss the implications of the obtained results for OBDA
in Section 5.


2    OWL 2 QL and DL-LiteH
                         core

The description logic underlying OWL 2 QL was introduced under the name
DL-LiteR [6, 7] and called DL-LiteH core in the more general classification of [2]
(for simplicity, we disregard some constructs such as reflexivity constraints).
The language of DL-LiteH  core contains individual names ai , concept names Ai ,
and role names Pi , i ≥ 1. Roles R and concepts B are defined by:

    R    ::=    Pi   |   Pi− ,         B    ::=    ⊥    |   >    |   Ai    |   ∃R.

A DL-LiteH    core TBox, T , is a finite set of concept and role inclusions of the form
B1 v B2 , B1 u B2 v ⊥ and R1 v R2 , R1 u R2 v ⊥, respectively. An ABox, A,
is a finite set of assertions of the form B(ai ) and R(ai , aj ). T and A together
constitute the knowledge base (KB) K = (T , A). The semantics of DL-LiteH           core
is defined as usual in DL [4]. The presented results do not depend on the UNA.
DL-Litecore is DL-LiteH   core without role inclusions of the form R1 v R2 . Note also
that OWL 2 QL contains concept inclusions of the form B 0 v ∃R.B, which here
will be regarded as abbreviations for DL-LiteH                      0            −
                                                   core inclusions B v ∃RB , ∃RB v B
and RB v R, where RB is a fresh role name.
    A conjunctive query (CQ) q(x) is a first-order formula ∃y ϕ(x, y), where ϕ
is constructed, using only ∧, from atoms of the form A(t1 ) and P (t1 , t2 ), with
A being a concept name, P a role name and ti a term (an individual name or
variable from x or y). Given an ABox A, we use Ind(A) to denote the set of
individual names in A. A tuple a ⊆ Ind(A) is a certain answer to q(x) over
K = (T , A) if I |= q[a] for all models I of K; in this case we write K |= q[a].
To simplify notation, we will often identify q with the set of its atoms and use
P − (t, t0 ) ∈ q as a synonym of P (t0 , t) ∈ q; term(q) is the set of terms in q.
    Query answering over OWL 2 QL KBs is based on the fact that, for any
consistent KB K = (T , A), there is an interpretation UK such that, for all CQs
q(x) and a ⊆ Ind(A), we have K |= q[a] iff UK |= q[a]. The interpretation UK ,
called the canonical interpretation of K, is constructed as follows. Let v∗T be
the reflexive and transitive closure of the role inclusion relation given by T ,
[R] = {S | R v∗T S and S v∗T R}, and let [R] ≤T [S] iff R v∗T S. For each [R],
we introduce a fresh symbol c[R] , the witness for [R], and define a generating
relation ;K on the set of these witnesses together with Ind(A) by taking:
 – a ;K c[R] if a ∈ Ind(A) and [R] is ≤T -minimal such that K |= ∃R(a) and
   K 6|= R(a, b) for every b ∈ Ind(A);
 – c[S] ;K c[R] if [R] is ≤T -minimal with T |= ∃S − v ∃R and [S − ] 6= [R].
A generating path for K is a finite sequence ac[R1 ] · · · c[Rn ] , n ≥ 0, such that
a ∈ Ind(A), a ;K c[R1 ] and c[Ri ] ;K c[Ri+1 ] , for i < n. Denote by path(K) the
set of all generating paths for K and by tail(σ) the last element in σ ∈ path(K).
Now, UK is defined by taking:
         ∆UK = path(K),         aUK = a, for all a ∈ Ind(A),
         AUK = {a ∈ Ind(A) | K |= A(a)} ∪ {σ · c[R] | T |= ∃R− v A},
         P UK = {(a, b) ∈ Ind(A) × Ind(A) | K |= P (a, b)} ∪
                   {(σ, σ · c[R] ) | tail(σ) ;K c[R] , [R] ≤T [P ]} ∪
                   {(σ · c[R] , σ) | tail(σ) ;K c[R] , [R] ≤T [P − ]}.
We shall also need a compact representation of (in general infinite) UK in the
form of the generating interpretation GK = (∆GK , ·GK ) defined as follows. Its
domain ∆GK consists of Ind(A) and all c[Rn ] for which there are generating paths
ac[R1 ] · · · c[Rn ] ∈ path(K); and we set aGK = aUK , AGK = {tail(σ) | σ ∈ AUK }
and P GK = {(tail(σ), tail(σ 0 )) | (σ, σ 0 ) ∈ P UK }. It is readily seen that GK can be
constructed in polynomial time in K.
     The problem CQA(A, T , q), for DL-LiteH         core TBoxes T , is reducible to the
database query evaluation problem QE(A, q 0 ), with q 0 being independent of
A [7, 2]. However, in all known reductions, the size of q 0 is exponential in the
size of q: for instance, |q 0 | = O((|T | · |q|)|q| ) for both QuOnto [1] and RE-
QUIEM [20]; Presto [23] uses sophisticated optimisation techniques and pro-
duces a non-recursive Datalog program q 0 , which is still exponential in the worst
case. The size of q 0 is irrelevant for data complexity, but heavily influences the
performance of database systems; see Section 5 for a discussion.
     In the next section, we show that no algorithm can produce FO rewritings
q 0 of CQs q and DL-LiteH    core TBoxes T in polynomial time (unless P = NP).


3    Intractability of Query Rewriting for OWL 2 QL
To see that query rewriting for DL-LiteH  core is not tractable, we separate the
contributions of A and T to the complexity of the problem CQA(A, T , q). Indeed,
NP-completeness of CQA(A, T , q) for combined complexity does not give any
information on the size of rewritings because the lower bound follows from NP-
hardness of database query evaluation. To remove the influence of A, one can
analyse the combined complexity of CQ answering over singleton ABoxes of the
form A = {A(a)}, i.e., the problem CQA({A(a)}, T , q). We call this measure
the primitive combined complexity (PCC). The reason behind this notion is that
any FO query q over such an ABox alone can be answered in linear time in |q|.
Thus, if CQ answering is NP-hard for PCC, then no algorithm can construct
FO rewritings QE(A, q 0 ) of CQA(A, T , q) in polynomial time, unless P = NP.
Theorem 1. CQ answering in DL-LiteH
                                  core is NP-complete for PCC.

Proof. The lower
           Vm bound is proved by reduction of Boolean satisfiability. Given
a CNF ϕ = j=1 Dj over variables p1 , . . . , pn , where Dj is a clause, we consider
the TBox T containing the following axioms, for 1 ≤ i ≤ n, 1 ≤ j ≤ m, k = 0, 1:

Ai−1 v ∃P − .Xik ,                 Xik v Ai ,
  Xi0 v ∃P.Cj           if ¬pi ∈ Dj ,      Xi1 v ∃P.Cj     if pi ∈ Dj ,      Cj   v ∃P.Cj .

The canonical interpretation UK of K = (T , {A0 (a)}) is obtained by ‘unravelling’
the generating interpretation GK shown below. Consider the CQ q(y0 ):
                                     Vn
  q(y0 ) = ∃yz 1 . . . z m A0 (y0 ) ∧ i=1 P (yi , yi−1 ) ∧ An (yn ) ∧
                                   Vm             j     Vn       j      j          j
                                                                                      
                                     j=1 P (yn , z0 ) ∧  i=1 P (zi−1 , zi ) ∧ Cj (zn ) .

(Note n atoms P connecting yn to y0 and n + 1 atoms P connecting yn to znj ,
which means that any match of q in UK must map znj onto a point in the infinite
chain containing Cj .) One can show that ϕ is satisfiable iff K |= q(a).

                                            C1           C2

                                        X11 , A1   X21 , A2       X31 , A3   X41 , A4
                   GK
                             A0
                             a
                                        X10 , A1   X20 , A2       X30 , A3   X40 , A4
                  q(y0 )     y0           y1         y2             y3         y4
                           A0 1                                                    A4
            z41              z3           z21        z11            z01
       C1
       C2
            z42              z32          z22        z12            z02

Theorem 2. Unless P = NP, no polynomial-time algorithm can reduce the
problem CQA(A, T , q), for DL-LiteH     core TBoxes T and CQs q, to the problem
QE(A, q 0 ), where q 0 is a first-order query independent of A.

   Note that it is still open whether, for any A, T and q, there exists a polyno-
mial FO query q 0 giving the same answers over A as q over (T , A).

Remark 1. If we extend the ABox with fresh constants      V0n and 1 then q(yV0m) in the
proof above can be rewritten as A0 (y0 ) ∧ ∃p1 . . . ∃pn ( i=1 (pi 6= y0 ) ∧ j=1 Dj0 ),
where Dj0 is obtained from Dj by replacing every literal pi with pi = 1 and every
¬pi with pi = 0. Moreover, using ∀pi , one can polynomially encode the PSpace-
complete validity problem for QBFs. A polynomial reduction of CQA(A, T , q)
to QE(A + {0, 1}, q 0 ) is given in [12] for the extension Datalog± of OWL 2 QL,
where |T | + |q| steps of the chase are simulated using 0 and 1.
    Theorem 1 means that there are two sources of non-determinism in OBDA
with OWL 2 QL: finding a match in the ABox and finding a match in the re-
maining tree part of the canonical interpretation. It turns out that, from the
complexity-theoretic point of view, these two sources have different status. Re-
call from [19] that query evaluation QE(A, q) is not fixed-parameter tractable if
|q| is regarded as a parameter.
    Our next result shows, on the contrary, that the problem CQA({A(a)}, T , q)
is fixed-parameter tractable for DL-LiteH  core TBoxes T . This means that there
exist a deterministic algorithm A, a computable function f and a polynomial p
such that, for any TBox T and CQ q, A can check whether (T , {A(a)}) |= q in
time bounded by f (|q|) · p(|T |). In a nutshell, the idea of the proof is as follows.
First, given a CQ q, we construct all tree-shaped homomorphic images of q, the
number of which is bounded by a function exponential in |q| and independent
of T . Then we show that (T , {A(a)}) |= q iff at least one of those tree-shaped
homomorphic images can be ‘embedded’ in UK , and that the existence of such
an embedding can be established by a dynamic programming (elimination) al-
gorithm in time polynomial in |T | and |q|.
Theorem 3. The problem CQA({A(a)}, T , q) with |q| a parameter is fixed-
parameter tractable for DL-LiteH
                               core TBoxes T .

Proof. A CQ q is tree-shaped if its primal graph (term(q), {(t, t0 ) | R(t, t0 ) ∈ q})
is a tree. By a tree reduct of q we mean a pair (q 0 , r), where q 0 is a set of atoms
and r ∈ term(q 0 ) is such that the following conditions are satisfied (cf. [10]):
(tree) the query q 0 is tree-shaped and all of its predicate names occur in q;
(root) if a ∈ term(q 0 ) then r = a;
(hom) there exists a surjection h : term(q) → term(q 0 ) such that h(a) = a for
    a ∈ term(q), A(h(t)) ∈ q 0 for A(t) ∈ q, and P (h(t), h(t0 )) ∈ q 0 for P (t, t0 ) ∈ q.
By (hom), for every I and every tree reduct (q 0 , r) of q, if I |= q 0 then I |= q.
    Let (q 0 , r) be a tree reduct of q and let K = (T , {A(a)}). An embedding of
(q , r) in UK is an injective map a : term(q 0 ) → ∆UK such that UK |=a q 0 and
  0

(e-root) a(t) = a(r) · σ, for all t ∈ term(q 0 ), i.e., a(r) is located in UK nearer to
    its root than any other a(t).
Let UK |= q. Then there is a homomorphism a of q in UK . As UK is a tree
with root aUK , we can construct a tree reduct (q 0 , r) of q by taking q 0 to be the
quotient of q under equivalence {(t, t0 ) | a(t) = a(t0 )} and r the equivalence class
of t such that a(t) is nearest to the root aUK . It follows that (q 0 , r) is embeddable
in UK . Checking whether a tree reduct (q 0 , r) of q is embeddable in UK can be
done in time polynomial in |T | and |q| using the interpretation GK (constructed
in polynomial time in |T |) and a standard dynamic programming algorithm [8].
    Theorem 1 reflects the interaction between role inclusions and inverse roles.
The observations below supplement this theorem by giving a somewhat broader
picture (we remind the reader that DL-Litehorn extends DL-Litecore with con-
cept inclusions of the form B1 u · · · u Bn v B, EL allows qualified existential
restrictions and conjunctions in both sides of concept inclusions, H allows role
inclusions and I inverse roles; for details see [3]):
Theorem 4. With respect to primitive combined complexity, CQ answering is
(i) P-complete for DL-Litehorn and ELH, and (ii) ExpTime-complete for ELI.
Proof. The polynomial-time upper bound for DL-Litehorn and ELH can be ob-
tained using the fact that, for each CQ q and each r ∈ term(q), one can construct
a unique tree reduct of q with root r (by eliminating ‘forks’) [16] and then check
whether it is embeddable in the generating interpretation as in the proof of The-
orem 3 (see also Section 4). ExpTime-completeness for ELI follows from [3].
    Although ALC and ALCH have no canonical interpretations (they are not
Horn), a unique tree reduct for a CQ with a root exists [10], and CQ answering
is ExpTime-complete for PCC; in ALCI, we again have to consider multiple
tree reducts, which makes CQ answering 2ExpTime-complete for PCC [16].
    That CQ answering is in P for PCC does not mean yet that there is a
polynomial rewriting q 0 for any CQ q and ontology T . For instance, as CQ
answering for ELH is P-complete for data complexity, we cannot have any first-
order rewriting at all. The reason is that if we put an ABox element a to a
concept A, then a TBox axiom of the form ∃R.A v B requires adding every
ABox element b with R(b, a) to B, and so on. In this case, a pre-processing of
the ABox, constructing the generating interpretation, is required; see [18].


4    Polynomial Rewriting for DL-Litecore
The combined approach to CQ answering [18, 14] first constructs the generating
interpretation GK for K = (T , A), and then rewrites the given CQ q (indepen-
dently of A) to an FO query q 0 to be answered over GK . An important achieve-
ment of this approach is that (i ) |q 0 | = O(|q|2 + |q| · |T |), even for DL-Litehorn ,
and (ii ) q 0 is obtained by expanding q by simple conjuncts with = and without
any extra variables and quantifiers. The two-step construction of GK and q 0 can
be encoded in a polynomial non-recursive Datalog program for DL-Litehorn , and
a polynomial FO query for DL-Litecore , which require auxiliary constants in the
database domain. Here we give a polynomial FO rewriting for DL-Litecore , which
is based on the ideas of [14] but does not involve any constants.
    Let T be a DL-Litecore TBox. As we do not have role inclusions, instead of
c[R] we write cR . Let RT = {cR | R a role in T } and R∗T be the set of all finite
words over RT (including the empty word ε). We use tail(σ) to denote the last
element of σ ∈ R∗T \ {ε}; by definition, tail(ε) = ε.
    Consider a CQ q(x). Without loss of generality we assume that (the primal
graph of) q is connected. Let R be a role and t a term in q. A partial function
f from term(q) to (RT )∗ is called a tree witness for (R, t) in q if
 – the domain of f is minimal with respect to set-theoretic inclusion,
 – f (t) = ε,
 – for all atoms S(s, s0 ) ∈ q with f (s) defined, we have
                       
                       cR ,
                                      if f (s) = ε and S = R,
              f (s0 ) = σ,             if f (s) = σ · cS − ,
                       
                          f (s) · cS , if f (s) 6= ε and tail(f (s)) 6= cS − .
                       

By definition, if a tree witness for (R, t) exists then it is unique; in this case
we denote it by fR,t and use dom fR,t for the domain of fR,t . Note that even
if q contains no atom of the form R(t, t0 ), the tree witness for (R, t) exists and
fR,t (t) = ε. Denote by q|R,t the set of atoms of q whose terms are in dom fR,t .
When we consider q|R,t as a query, we assume that all of its variables are free.
    Informally, a tree witness fR,t has root t and direction R, and describes the
situation where t is mapped to an ABox element a of some canonical interpreta-
tion without R-successors in the ABox. In this case, the only choice for mapping
any t0 in R(t, t0 ) ∈ q is acR = a · fR,t (t0 ). Further, any t00 in S(t0 , t00 ) ∈ q has
to be mapped to acR cS = a · fR,t (t00 ), if S 6= R− ; however, if S = R− then
t00 can only be mapped onto a, which reflects the fact that acR has a single
R− -successor a in the canonical interpretation. To illustrate, consider the CQ
q = {T (y0 , y1 ), S(y1 , y0 ), R(y1 , y2 ), S(y2 , y3 ), S(y4 , y3 )}. The tree witnesses
for (R, y1 ) and (S, y4 ) in q exist and are as depicted below:
     undef. S     ε         cR                  undef. S undef.            ε
                                  S c c                                         S   cS
       y0        y1    R     y2        R   S       y0          y1     R   y2
             T              cR                            T
                                        y3                                 ε        y3
         fR,y1               y4
                                   S                           fS,y4       y4
                                                                                 S


For (S, y1 ), (T − , y1 ) and (R− , y2 ), tree witnesses do not exist.

Proposition 1. Suppose a tree witness for (R, t) exists and s ∈ dom fR,t . If
fR,t (s) 6= ε then a tree witness exists for every (S, s) with tail(fR,t (s)) 6= cS − .
If fR,t (s) = ε then a tree witness exists for (S, s) with S = R. In either case,
dom fS,s ⊆ dom fR,t and fR,t (s0 ) = fR,t (s) · fS,s (s0 ), for all s0 ∈ dom fS,s .

    Even if a tree witness for (R, t) exists, q|R,t is not necessarily a tree-shaped
query. Define a relation ≡R as the set of all pairs (t, s) such that a tree witness
for (R, t) exists and fR,t (s) = ε. By Proposition 1, ≡R is an equivalence relation
(on its domain). By taking the quotient of q|R,t under ≡R , we obtain a tree
reduct of q|R,t (cf. [17]). We call q a quasi-tree S with root t ∈ term(q) if a tree
witness for (R, t) exists for all directions R and R dom fR,t = term(q).

Proposition 2. Suppose q is not a quasi-tree and tree witnesses exist for (R1 , t1 )
and (R2 , t2 ). If fR1 ,t1 (t2 ) is defined, fR1 ,t1 (t2 ) 6= ε then dom fR2 ,t2 ( dom fR1 ,t1
and fR1 ,t1 (s) = fR1 ,t1 (t2 ) · fR2 ,t2 (s), for all s ∈ dom fR2 ,t2 .

   We are now in a position to introduce the ingredients of our polynomial
rewriting. Let K = (T , A) and q(x) = ∃y ϕ(x, y). Consider an atom B(t) for a
concept B. Then
                                         _                                          _
        extB (x)   =                               B 0 (x)        ∨                     ∃w R(x, w)
                                     0             0                    role R s.t. T |=∃RvB
                           concept B s.t. T |=B vB

gives the answers to B(t) over ABox A: for every a ∈ Ind(A), we have UK |= B(a)
iff A |= extB (a). Note that, for all other elements σ in the domain ∆UK of UK ,
we have UK |= B(σ) iff T |= ∃T − v B, where tail(σ) = cT .

                                                  R1                               S2
q|R,t                                        a2                   Rn      s               quasi-tree q 0
                       R   t                                                       S1
               a1 cR                                   a2 cR1   a2 cR1 · · · cRn
                           a1                A

Consider now an atom R(t, t0 ) ∈ q and the ways its terms can be mapped in UK .
1. If both t and t0 are mapped to ABox elements a, a0 then UK |= R(a, a0 ) iff
R(a, a0 ) ∈ A because UK inherits the binary relations from A.
2. If t is mapped to an ABox element a and t0 to an ‘anonymous’ element in
∆UK \ Ind(A), then R(t, t0 ) can only be true if (i ) a ;K cR , (ii ) a tree witness for
(R, t) exists, and (iii ) q|R,t can be embedded into the sub-tree of UK beginning
with the edge (a, acR ); see the left-hand side of the picture above. Condition (i )
can be defined by the formula

                           wtR (x)       =        ext∃R (x) ∧ ¬∃w R(x, w).

For all R and a ∈ Ind(A), we have A |= wtR (a) iff a ;K cR (i.e., acR ∈ ∆UK ).
For condition (iii ), consider the conjunction treeAqR,t (x) of the formulas:
(t0 ) extA (x), for all A(s) ∈ q|R,t with fR,t (s) = ε;
(t1 ) > if T |= ∃T − v A and ⊥ otherwise, for all A(s) ∈ q|R,t , tail(fR,t (s)) = cT ;
(t2 ) > if T |= ∃T − v ∃S and ⊥ otherwise, for S(s, s0 ) ∈ q|R,t , tail(fR,t (s)) = cT .
One can show that A |= wtR (a) ∧ treeAqR,t (a) iff UK |=a q|R,t for an assignment
a such that a(s) = a · fR,t (s), for all s ∈ dom fR,t .
3. If both t, t0 are mapped to anonymous elements in ∆UK \ Ind(A), then two
more cases need consideration.
3.1. Suppose first that there is a tree witness for some (S, s) such that s is
mapped to an ABox element a with a ;K cS , (iv ) both t and t0 are in dom fS,s ,
and (v ) all the terms s0 ∈ dom fS,s with fS,s (s0 ) 6= ε are existentially quantified
variables in q (only existential variables can be mapped to anonymous elements).
In this case, as we observed above, R(t, t0 ) is true in UK if the formula

                       wtS (s) ∧ treeAqS,s (s) ∧ s≡S s0 (s = s0 )
                                                V

is true in A under an assignment a such that a(s0 ) = a·fS,s (s0 ), for s0 ∈ dom fS,s .
The disjunction of all such formulas for (S, s) satisfying (iv )–(v ) depends only
on the choice of terms t, t0 and will be denoted by attached-treet,t0 (x, y). (This
case is a generalisation of Case 2.)
3.2. Thus, it remains to consider the case (shown in the right-hand side of the
picture) where the whole query is mapped to the anonymous part of UK . Then
q a quasi-tree and all terms in q are existentially quantified variables that are
mapped to the sub-tree of UK generated by some ABox element a. More precisely,
a ∈ Ind(A) generates a S sequence of the form a ;K cR1 ;K · · · ;K cRn , q has a
root s (i.e., term(q) = S dom fS,s ), s is mapped to σ = acR1 · · · cRn , while all
other terms s0 are mapped to σ ·fS,s (s0 ). The latter condition can be captured by
a formula similar to the one in the previous case. The difference is that now we
begin with σ, tail(σ) = cRn (rather than a). To cope with this, consider the union
                                                                                 0
q 0 of q and {Rn (v, s)}, for a fresh variable v, and let treeTqcRn ,s be treeAqRn ,v ,
where the tree witnesses are computed in query q 0 . Note that treeTqcRn ,s is a
sentence because q 0 has no atoms for item (t0 ). We denote by detached-tree the
disjunction of sentences of the form

                               ∃w wtR1 (w) ∧ treeTqcRn ,s

for all roots s of q and all pairs of roles R1 , Rn such that there are R2 , . . . , Rn−1
with T |= ∃Ri− v ∃Ri+1 and Ri+1 6= Ri− , for 1 ≤ i < n; if q is not a quasi-tree
containing only existentially quantified variables, we set detached-tree = ⊥.
    Denote by q ∗ the result of replacing each A(t) and P (t, t0 ) in q with

               A∗ (t) = extA (t) ∨ attached-treet,t (x, y) ∨ detached-tree,
            P (t, t0 ) = P (t, t0 ) ∨ attached-treet,t0 (x, y) ∨ detached-tree,
              ∗


respectively. Note that these formulas depend not only on the predicate name
but also on the terms in the atom. The length of q ∗ is O(|q|2 · |T |3 ) and can be
made O(|q|2 · |T |) if the sentence detached-tree is computed separately (in fact,
for the majority of queries, e.g., queries with answer variables, it is simply ⊥).
Theorem 5. UK |=a q(x) iff A |=a q ∗ (x), whenever a(x) ∈ Ind(A) for all x ∈ x.
    The rewriting above can also be adapted to DL-Litehorn and even DL-LiteN    horn
under the UNA. In this case, however, we need non-recursive Datalog programs
to define the predicates extB (x); for details, see [14]. The non-recursive Datalog
queries can be transformed to unions of CQs, but at the expense of exponential
blowup. The problem whether a polynomial-size FO rewriting (without addi-
tional constants as in [12]) exists for DL-Litehorn is still open (and equivalent to
the complexity problem ‘LogSpace = P?’).


5    Discussion
FO reducibility (or AC0 data complexity) does not seem to provide enough in-
formation to judge whether a DL is suitable for OBDA. When measuring the
complexity of query evaluation in database systems, it is usually assumed that
queries are negligibly small compared to data. Thus, it makes sense to consider
data complexity [24], which takes account of the data but ignores the query. A
more subtle analysis [19] shows, however, that the obvious time |q|·|A||q| required
to check A |= q cannot be reduced to f (|q|) · p(|A|), for any computable function
f and polynomial p: QE(A, q) is W [1]-complete for parameterised complexity,
|q| being a parameter. The success of database systems—despite fixed-parameter
intractability—seems to imply that optimisation techniques are indispensable,
that the ‘real-world’ queries are small and of ‘special’ form. In OBDA, the latter
does not hold as the rewritten queries can be large and complex. However, data
complexity does not differentiate among, e.g., DL-Litecore , OWL 2 QL and the
language of sticky sets of TGDs [5], all of which are in AC0 for data complex-
ity, while the primitive combined complexity, reflecting the size of the rewrit-
ing, ranges from P to NP and further to ExpTime. Another explanation of
the database efficiency is that we only use queries with a bounded number of
variables, in which case query evaluation is P-complete for combined complex-
ity [25]. However, query rewritings may substantially increase the number of
variables (for example, a CQ q is rewritten in [12] into a query with O(N · log N )
auxiliary binary variables, where N = |T | + |q|).
    The W3C recommendation (www.w3.org/TR/owl2-profiles) for OBDA is
to reduce it to query evaluation in database systems. Two drawbacks of this
recommendation are that it (i ) disregards the complexity of possible reductions,
and (ii ) excludes some useful DLs from consideration. As we saw above, rewrit-
ings of CQs in OWL 2 QL cannot be done in polynomial time without adding
extra constants, variables and quantifiers as in [12]. One might argue that, in
the real-world ontologies, role inclusions do not interact with inverse roles in as
sophisticated way as in Theorem 1, but then more research is needed to support
                                                                        (HF )
this argument. A number of ‘lightweight’ DLs such as ELH or DL-Litehorn [2] are
deemed not suitable for OBDA because they are P-complete for data complexity.
Recall that both of these logics are P-complete for primitive combined complex-
ity (vs. NP in the case of OWL 2 QL). The combined approach to OBDA [18,
14, 15] resolves this issue by expanding the data at a pre-processing step and
then rewriting and answering CQs. The expansion is linear in |A| and can be
done by the database system itself; the size of the rewritten query for EL and
DL-LiteF horn is only quadratic (for OWL 2 QL, it is still exponential).
    In this paper, we do not touch on the problem of representing ABoxes in
database systems, where usually GLAV mappings are used to connect data
sources to ontologies. Such mappings introduce some problems as tuples in the
same relation can come from different data sources. Also, they provide certain
information on the completeness of concepts and roles, which can (and should)
be exploited in order to minimise the rewritings [22]. Finally, with so many lan-
guages and rewritings for OBDA suggested, it looks like the time is ripe for
comprehensive experiments that could clarify the future of OBDA with DLs.
Acknowledgments: supported by the U.K. EPSRC grant EP/H05099X/1.


References
 1. Acciarri, A., Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Palmieri,
    M., Rosati, R.: QuOnto: Querying ontologies. In: Proc. AAAI, 1670–1671 (2005)
 2. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family
    and relations. J. Artificial Intelligence Research 36, 1–69 (2009)
 3. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope further. In: Clark, K.,
    Patel-Schneider, P.F. (eds.) In: Proc. OWLED DC (2008)
 4. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.F. (eds.):
    The Description Logic Handbook. Cambridge University Press (2003)
 5. Calı̀, A., Gottlob, G., Pieris, A.: Advanced processing for ontological queries. In:
    Proc. VLDB 3(1), 554–565 (2010)
 6. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Data com-
    plexity of query answering in description logics. In: Proc. KR. pp. 260–270 (2006)
 7. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable
    reasoning and efficient query answering in description logics: The DL-Lite family.
    J. Automated Reasoning 39(3), 385–429 (2007)
 8. Dasgupta, S., Papadimitriou, C., Vazirani, U.V.: Algorithms. McGraw-Hill (2008)
 9. Dolby, J., Fokoue, A., Kalyanpur, A., Ma, L., Schonberg, E., Srinivas, K., Sun, X.:
    Scalable grounded conjunctive query evaluation over large and expressive knowl-
    edge bases. In: Proc. ISWC. LNCS, vol. 5318, pp. 403–418. Springer (2008)
10. Eiter, T., Lutz, C., Ortiz, M., Šimkus, M.: Query answering in description logics:
    The knots approach. In: Proc. WoLLIC. LNCS, vol. 5514. Springer (2009)
11. Gottlob, G., Orsi, G., Pieris, A.: Ontological queries: Rewriting and optimization.
    In: Proc. ICDE. (2011)
12. Gottlob, G., Schwentick, T.: Rewriting ontological queries into small nonrecursive
    Datalog programs. In: Proc. DL. (2011)
13. Heymans, S., Ma, L., et al.: Ontology reasoning with large data repositories. In:
    Ontology Management, pp. 89–128. Springer (2008)
14. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The com-
    bined approach to query answering in DL-Lite. In: Proc. KR (2010)
15. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The com-
    bined approach to ontology-based data access. In: Proc. IJCAI (2011)
16. Lutz, C.: Inverse roles make conjunctive queries hard. In: Proc. DL. CEUR Work-
    shop Proceedings, vol. 250. (2007)
17. Lutz, C.: The complexity of conjunctive query answering in expressive description
    logics. In: Proc. IJCAR. pp. 179–193. LNAI, vol. 5195. Springer (2008)
18. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description
    logic EL using a relational database system. In: Proc. IJCAI. pp. 2070–2075 (2009)
19. Papadimitriou, C.H., Yannakakis, M.: On the complexity of database queries. J.
    Comput. Syst. Sci. 58(3), 407–427 (1999)
20. Pérez-Urbina, H., Motik, B., Horrocks, I.: A comparison of query rewriting tech-
    niques for DL-Lite. In: Proc. DL. CEUR Workshop Proceedings, vol. 477. (2009)
21. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.:
    Linking data to ontologies. J. on Data Semantics X, 133–173 (2008)
22. Rodrı́guez-Muro M., Calvanese, D.: Dependencies to optimize ontology-based data
    access. In: Proc. DL. (2011)
23. Rosati, R., Almatelli, A.: Improving query answering over DL-Lite ontologies. In:
    Proc. KR (2010)
24. Vardi, M.: The complexity of relational query languages (extended abstract). In:
    Proc. STOC. pp. 137–146 (1982)
25. Vardi, M.: On the complexity of bounded-variable queries (extended abstract). In:
    Proc. PODS. pp. 266–276 (1995)