On (In)Tractability of OBDA with OWL 2 QL S. Kikot, R. Kontchakov, and M. Zakharyaschev Department of Computer Science and Information Systems, Birkbeck College, London. U.K. {kikot,roman,michael}@dcs.bbk.ac.uk Abstract. We show that, although conjunctive queries over OWL 2 QL ontologies are reducible to database queries, no algorithm can construct such a reduction in polynomial time without changing the data. On the other hand, we give a polynomial reduction for OWL 2 QL ontologies without role inclusions. 1 Introduction Ontology-based data access (OBDA) [9, 13, 21] has recently emerged as a promis- ing application area for description logic (DL) with a potential impact on the new generation of information systems. One of the profiles of the Web Ontology Language OWL 2, called OWL 2 QL, was tailored specifically aiming at OBDA. In DL terms, OBDA involves the following reasoning problem: CQA(A, T , q): given an ABox (data) A,1 a TBox (ontology describing the back- ground knowledge) T , a conjunctive query (CQ) q(x), and a tuple a of ABox elements, decide whether a is a certain answer to q(x) over (T , A). In other words, the task is to check whether I |= q(a) for every model I of (T , A). It is to be noted that reasoning problems of this kind are well known in logic and computer science (cf. Prolog or Datalog). A distinctive feature of OWL 2 QL is that ‘in OWL 2 QL, conjunctive query answering can be implemented using conventional relational database systems. Using a suitable reasoning technique, sound and complete conjunctive query answering can be performed in LogSpace with respect to the size of the data’ (www.w3.org/TR/owl2-profiles). There exists a number of reductions of OBDA with OWL 2 QL to answer- ing queries in relational database management systems, which transform (or rewrite) the problem CQA(A, T , q) to the database query evaluation problem QE(A, q 0 ), where the first-order (FO) query q 0 does not depend on A. They have been implemented in the systems QuOnto [1], REQUIEM [20], Presto [23] and Nyaya [11]. In all of these approaches, the size of the query q 0 , posed to the database system, can be O((|T | · |q|)|q| ) in the worst case. The aim of this paper is to try and understand whether the exponential blow- up in the size of the rewritten query is inevitable and whether polynomial rewrit- ings are possible, at least for fragments of OWL 2 QL. In Section 3, we show that 1 Here we ignore the problem of data representation in database systems; see Section 5. the problem CQA({A(a)}, T , q) for singleton ABoxes and OWL 2 QL TBoxes is NP-complete for combined complexity. As the problem QE({A(a)}, q 0 ) is solved in linear time (LogSpace) in |q 0 |, it follows that no algorithm can construct FO rewritings q 0 (over {A(a)}) in polynomial time, unless P = NP. For OWL 2 QL without role inclusions and the logic ELH, the problem CQA({A(a)}, T , q) is polynomial for combined complexity, while for ELI it is ExpTime-complete. We observe that the parameterised complexity of the problem CQA({A(a)}, T , q), where |q| is regarded as a parameter, is fixed-parameter tractable. In Section 4, we present a polynomial FO rewriting of conjunctive queries over OWL 2 QL on- tologies without role inclusions. This result improves on the polynomial rewriting from [14], which reduces CQA(A, T , q) to QE(A + Aux , q 0 ), where Aux is a set of fresh constants encoding the canonical model of (T , A). Note also the recent polynomial reduction [12] of CQA(A, T , q) to QE(A + {0, 1}, q 00 ), which uses two fresh constants 0, 1 and works for the extension Datalog± of OWL 2 QL (see Remark 1). We discuss the implications of the obtained results for OBDA in Section 5. 2 OWL 2 QL and DL-LiteH core The description logic underlying OWL 2 QL was introduced under the name DL-LiteR [6, 7] and called DL-LiteH core in the more general classification of [2] (for simplicity, we disregard some constructs such as reflexivity constraints). The language of DL-LiteH core contains individual names ai , concept names Ai , and role names Pi , i ≥ 1. Roles R and concepts B are defined by: R ::= Pi | Pi− , B ::= ⊥ | > | Ai | ∃R. A DL-LiteH core TBox, T , is a finite set of concept and role inclusions of the form B1 v B2 , B1 u B2 v ⊥ and R1 v R2 , R1 u R2 v ⊥, respectively. An ABox, A, is a finite set of assertions of the form B(ai ) and R(ai , aj ). T and A together constitute the knowledge base (KB) K = (T , A). The semantics of DL-LiteH core is defined as usual in DL [4]. The presented results do not depend on the UNA. DL-Litecore is DL-LiteH core without role inclusions of the form R1 v R2 . Note also that OWL 2 QL contains concept inclusions of the form B 0 v ∃R.B, which here will be regarded as abbreviations for DL-LiteH 0 − core inclusions B v ∃RB , ∃RB v B and RB v R, where RB is a fresh role name. A conjunctive query (CQ) q(x) is a first-order formula ∃y ϕ(x, y), where ϕ is constructed, using only ∧, from atoms of the form A(t1 ) and P (t1 , t2 ), with A being a concept name, P a role name and ti a term (an individual name or variable from x or y). Given an ABox A, we use Ind(A) to denote the set of individual names in A. A tuple a ⊆ Ind(A) is a certain answer to q(x) over K = (T , A) if I |= q[a] for all models I of K; in this case we write K |= q[a]. To simplify notation, we will often identify q with the set of its atoms and use P − (t, t0 ) ∈ q as a synonym of P (t0 , t) ∈ q; term(q) is the set of terms in q. Query answering over OWL 2 QL KBs is based on the fact that, for any consistent KB K = (T , A), there is an interpretation UK such that, for all CQs q(x) and a ⊆ Ind(A), we have K |= q[a] iff UK |= q[a]. The interpretation UK , called the canonical interpretation of K, is constructed as follows. Let v∗T be the reflexive and transitive closure of the role inclusion relation given by T , [R] = {S | R v∗T S and S v∗T R}, and let [R] ≤T [S] iff R v∗T S. For each [R], we introduce a fresh symbol c[R] , the witness for [R], and define a generating relation ;K on the set of these witnesses together with Ind(A) by taking: – a ;K c[R] if a ∈ Ind(A) and [R] is ≤T -minimal such that K |= ∃R(a) and K 6|= R(a, b) for every b ∈ Ind(A); – c[S] ;K c[R] if [R] is ≤T -minimal with T |= ∃S − v ∃R and [S − ] 6= [R]. A generating path for K is a finite sequence ac[R1 ] · · · c[Rn ] , n ≥ 0, such that a ∈ Ind(A), a ;K c[R1 ] and c[Ri ] ;K c[Ri+1 ] , for i < n. Denote by path(K) the set of all generating paths for K and by tail(σ) the last element in σ ∈ path(K). Now, UK is defined by taking: ∆UK = path(K), aUK = a, for all a ∈ Ind(A), AUK = {a ∈ Ind(A) | K |= A(a)} ∪ {σ · c[R] | T |= ∃R− v A}, P UK = {(a, b) ∈ Ind(A) × Ind(A) | K |= P (a, b)} ∪ {(σ, σ · c[R] ) | tail(σ) ;K c[R] , [R] ≤T [P ]} ∪ {(σ · c[R] , σ) | tail(σ) ;K c[R] , [R] ≤T [P − ]}. We shall also need a compact representation of (in general infinite) UK in the form of the generating interpretation GK = (∆GK , ·GK ) defined as follows. Its domain ∆GK consists of Ind(A) and all c[Rn ] for which there are generating paths ac[R1 ] · · · c[Rn ] ∈ path(K); and we set aGK = aUK , AGK = {tail(σ) | σ ∈ AUK } and P GK = {(tail(σ), tail(σ 0 )) | (σ, σ 0 ) ∈ P UK }. It is readily seen that GK can be constructed in polynomial time in K. The problem CQA(A, T , q), for DL-LiteH core TBoxes T , is reducible to the database query evaluation problem QE(A, q 0 ), with q 0 being independent of A [7, 2]. However, in all known reductions, the size of q 0 is exponential in the size of q: for instance, |q 0 | = O((|T | · |q|)|q| ) for both QuOnto [1] and RE- QUIEM [20]; Presto [23] uses sophisticated optimisation techniques and pro- duces a non-recursive Datalog program q 0 , which is still exponential in the worst case. The size of q 0 is irrelevant for data complexity, but heavily influences the performance of database systems; see Section 5 for a discussion. In the next section, we show that no algorithm can produce FO rewritings q 0 of CQs q and DL-LiteH core TBoxes T in polynomial time (unless P = NP). 3 Intractability of Query Rewriting for OWL 2 QL To see that query rewriting for DL-LiteH core is not tractable, we separate the contributions of A and T to the complexity of the problem CQA(A, T , q). Indeed, NP-completeness of CQA(A, T , q) for combined complexity does not give any information on the size of rewritings because the lower bound follows from NP- hardness of database query evaluation. To remove the influence of A, one can analyse the combined complexity of CQ answering over singleton ABoxes of the form A = {A(a)}, i.e., the problem CQA({A(a)}, T , q). We call this measure the primitive combined complexity (PCC). The reason behind this notion is that any FO query q over such an ABox alone can be answered in linear time in |q|. Thus, if CQ answering is NP-hard for PCC, then no algorithm can construct FO rewritings QE(A, q 0 ) of CQA(A, T , q) in polynomial time, unless P = NP. Theorem 1. CQ answering in DL-LiteH core is NP-complete for PCC. Proof. The lower Vm bound is proved by reduction of Boolean satisfiability. Given a CNF ϕ = j=1 Dj over variables p1 , . . . , pn , where Dj is a clause, we consider the TBox T containing the following axioms, for 1 ≤ i ≤ n, 1 ≤ j ≤ m, k = 0, 1: Ai−1 v ∃P − .Xik , Xik v Ai , Xi0 v ∃P.Cj if ¬pi ∈ Dj , Xi1 v ∃P.Cj if pi ∈ Dj , Cj v ∃P.Cj . The canonical interpretation UK of K = (T , {A0 (a)}) is obtained by ‘unravelling’ the generating interpretation GK shown below. Consider the CQ q(y0 ):  Vn q(y0 ) = ∃yz 1 . . . z m A0 (y0 ) ∧ i=1 P (yi , yi−1 ) ∧ An (yn ) ∧ Vm j Vn j j j  j=1 P (yn , z0 ) ∧ i=1 P (zi−1 , zi ) ∧ Cj (zn ) . (Note n atoms P connecting yn to y0 and n + 1 atoms P connecting yn to znj , which means that any match of q in UK must map znj onto a point in the infinite chain containing Cj .) One can show that ϕ is satisfiable iff K |= q(a). C1 C2 X11 , A1 X21 , A2 X31 , A3 X41 , A4 GK A0 a X10 , A1 X20 , A2 X30 , A3 X40 , A4 q(y0 ) y0 y1 y2 y3 y4 A0 1 A4 z41 z3 z21 z11 z01 C1 C2 z42 z32 z22 z12 z02 Theorem 2. Unless P = NP, no polynomial-time algorithm can reduce the problem CQA(A, T , q), for DL-LiteH core TBoxes T and CQs q, to the problem QE(A, q 0 ), where q 0 is a first-order query independent of A. Note that it is still open whether, for any A, T and q, there exists a polyno- mial FO query q 0 giving the same answers over A as q over (T , A). Remark 1. If we extend the ABox with fresh constants V0n and 1 then q(yV0m) in the proof above can be rewritten as A0 (y0 ) ∧ ∃p1 . . . ∃pn ( i=1 (pi 6= y0 ) ∧ j=1 Dj0 ), where Dj0 is obtained from Dj by replacing every literal pi with pi = 1 and every ¬pi with pi = 0. Moreover, using ∀pi , one can polynomially encode the PSpace- complete validity problem for QBFs. A polynomial reduction of CQA(A, T , q) to QE(A + {0, 1}, q 0 ) is given in [12] for the extension Datalog± of OWL 2 QL, where |T | + |q| steps of the chase are simulated using 0 and 1. Theorem 1 means that there are two sources of non-determinism in OBDA with OWL 2 QL: finding a match in the ABox and finding a match in the re- maining tree part of the canonical interpretation. It turns out that, from the complexity-theoretic point of view, these two sources have different status. Re- call from [19] that query evaluation QE(A, q) is not fixed-parameter tractable if |q| is regarded as a parameter. Our next result shows, on the contrary, that the problem CQA({A(a)}, T , q) is fixed-parameter tractable for DL-LiteH core TBoxes T . This means that there exist a deterministic algorithm A, a computable function f and a polynomial p such that, for any TBox T and CQ q, A can check whether (T , {A(a)}) |= q in time bounded by f (|q|) · p(|T |). In a nutshell, the idea of the proof is as follows. First, given a CQ q, we construct all tree-shaped homomorphic images of q, the number of which is bounded by a function exponential in |q| and independent of T . Then we show that (T , {A(a)}) |= q iff at least one of those tree-shaped homomorphic images can be ‘embedded’ in UK , and that the existence of such an embedding can be established by a dynamic programming (elimination) al- gorithm in time polynomial in |T | and |q|. Theorem 3. The problem CQA({A(a)}, T , q) with |q| a parameter is fixed- parameter tractable for DL-LiteH core TBoxes T . Proof. A CQ q is tree-shaped if its primal graph (term(q), {(t, t0 ) | R(t, t0 ) ∈ q}) is a tree. By a tree reduct of q we mean a pair (q 0 , r), where q 0 is a set of atoms and r ∈ term(q 0 ) is such that the following conditions are satisfied (cf. [10]): (tree) the query q 0 is tree-shaped and all of its predicate names occur in q; (root) if a ∈ term(q 0 ) then r = a; (hom) there exists a surjection h : term(q) → term(q 0 ) such that h(a) = a for a ∈ term(q), A(h(t)) ∈ q 0 for A(t) ∈ q, and P (h(t), h(t0 )) ∈ q 0 for P (t, t0 ) ∈ q. By (hom), for every I and every tree reduct (q 0 , r) of q, if I |= q 0 then I |= q. Let (q 0 , r) be a tree reduct of q and let K = (T , {A(a)}). An embedding of (q , r) in UK is an injective map a : term(q 0 ) → ∆UK such that UK |=a q 0 and 0 (e-root) a(t) = a(r) · σ, for all t ∈ term(q 0 ), i.e., a(r) is located in UK nearer to its root than any other a(t). Let UK |= q. Then there is a homomorphism a of q in UK . As UK is a tree with root aUK , we can construct a tree reduct (q 0 , r) of q by taking q 0 to be the quotient of q under equivalence {(t, t0 ) | a(t) = a(t0 )} and r the equivalence class of t such that a(t) is nearest to the root aUK . It follows that (q 0 , r) is embeddable in UK . Checking whether a tree reduct (q 0 , r) of q is embeddable in UK can be done in time polynomial in |T | and |q| using the interpretation GK (constructed in polynomial time in |T |) and a standard dynamic programming algorithm [8]. Theorem 1 reflects the interaction between role inclusions and inverse roles. The observations below supplement this theorem by giving a somewhat broader picture (we remind the reader that DL-Litehorn extends DL-Litecore with con- cept inclusions of the form B1 u · · · u Bn v B, EL allows qualified existential restrictions and conjunctions in both sides of concept inclusions, H allows role inclusions and I inverse roles; for details see [3]): Theorem 4. With respect to primitive combined complexity, CQ answering is (i) P-complete for DL-Litehorn and ELH, and (ii) ExpTime-complete for ELI. Proof. The polynomial-time upper bound for DL-Litehorn and ELH can be ob- tained using the fact that, for each CQ q and each r ∈ term(q), one can construct a unique tree reduct of q with root r (by eliminating ‘forks’) [16] and then check whether it is embeddable in the generating interpretation as in the proof of The- orem 3 (see also Section 4). ExpTime-completeness for ELI follows from [3]. Although ALC and ALCH have no canonical interpretations (they are not Horn), a unique tree reduct for a CQ with a root exists [10], and CQ answering is ExpTime-complete for PCC; in ALCI, we again have to consider multiple tree reducts, which makes CQ answering 2ExpTime-complete for PCC [16]. That CQ answering is in P for PCC does not mean yet that there is a polynomial rewriting q 0 for any CQ q and ontology T . For instance, as CQ answering for ELH is P-complete for data complexity, we cannot have any first- order rewriting at all. The reason is that if we put an ABox element a to a concept A, then a TBox axiom of the form ∃R.A v B requires adding every ABox element b with R(b, a) to B, and so on. In this case, a pre-processing of the ABox, constructing the generating interpretation, is required; see [18]. 4 Polynomial Rewriting for DL-Litecore The combined approach to CQ answering [18, 14] first constructs the generating interpretation GK for K = (T , A), and then rewrites the given CQ q (indepen- dently of A) to an FO query q 0 to be answered over GK . An important achieve- ment of this approach is that (i ) |q 0 | = O(|q|2 + |q| · |T |), even for DL-Litehorn , and (ii ) q 0 is obtained by expanding q by simple conjuncts with = and without any extra variables and quantifiers. The two-step construction of GK and q 0 can be encoded in a polynomial non-recursive Datalog program for DL-Litehorn , and a polynomial FO query for DL-Litecore , which require auxiliary constants in the database domain. Here we give a polynomial FO rewriting for DL-Litecore , which is based on the ideas of [14] but does not involve any constants. Let T be a DL-Litecore TBox. As we do not have role inclusions, instead of c[R] we write cR . Let RT = {cR | R a role in T } and R∗T be the set of all finite words over RT (including the empty word ε). We use tail(σ) to denote the last element of σ ∈ R∗T \ {ε}; by definition, tail(ε) = ε. Consider a CQ q(x). Without loss of generality we assume that (the primal graph of) q is connected. Let R be a role and t a term in q. A partial function f from term(q) to (RT )∗ is called a tree witness for (R, t) in q if – the domain of f is minimal with respect to set-theoretic inclusion, – f (t) = ε, – for all atoms S(s, s0 ) ∈ q with f (s) defined, we have  cR ,  if f (s) = ε and S = R, f (s0 ) = σ, if f (s) = σ · cS − ,  f (s) · cS , if f (s) 6= ε and tail(f (s)) 6= cS − .  By definition, if a tree witness for (R, t) exists then it is unique; in this case we denote it by fR,t and use dom fR,t for the domain of fR,t . Note that even if q contains no atom of the form R(t, t0 ), the tree witness for (R, t) exists and fR,t (t) = ε. Denote by q|R,t the set of atoms of q whose terms are in dom fR,t . When we consider q|R,t as a query, we assume that all of its variables are free. Informally, a tree witness fR,t has root t and direction R, and describes the situation where t is mapped to an ABox element a of some canonical interpreta- tion without R-successors in the ABox. In this case, the only choice for mapping any t0 in R(t, t0 ) ∈ q is acR = a · fR,t (t0 ). Further, any t00 in S(t0 , t00 ) ∈ q has to be mapped to acR cS = a · fR,t (t00 ), if S 6= R− ; however, if S = R− then t00 can only be mapped onto a, which reflects the fact that acR has a single R− -successor a in the canonical interpretation. To illustrate, consider the CQ q = {T (y0 , y1 ), S(y1 , y0 ), R(y1 , y2 ), S(y2 , y3 ), S(y4 , y3 )}. The tree witnesses for (R, y1 ) and (S, y4 ) in q exist and are as depicted below: undef. S ε cR undef. S undef. ε S c c S cS y0 y1 R y2 R S y0 y1 R y2 T cR T y3 ε y3 fR,y1 y4 S fS,y4 y4 S For (S, y1 ), (T − , y1 ) and (R− , y2 ), tree witnesses do not exist. Proposition 1. Suppose a tree witness for (R, t) exists and s ∈ dom fR,t . If fR,t (s) 6= ε then a tree witness exists for every (S, s) with tail(fR,t (s)) 6= cS − . If fR,t (s) = ε then a tree witness exists for (S, s) with S = R. In either case, dom fS,s ⊆ dom fR,t and fR,t (s0 ) = fR,t (s) · fS,s (s0 ), for all s0 ∈ dom fS,s . Even if a tree witness for (R, t) exists, q|R,t is not necessarily a tree-shaped query. Define a relation ≡R as the set of all pairs (t, s) such that a tree witness for (R, t) exists and fR,t (s) = ε. By Proposition 1, ≡R is an equivalence relation (on its domain). By taking the quotient of q|R,t under ≡R , we obtain a tree reduct of q|R,t (cf. [17]). We call q a quasi-tree S with root t ∈ term(q) if a tree witness for (R, t) exists for all directions R and R dom fR,t = term(q). Proposition 2. Suppose q is not a quasi-tree and tree witnesses exist for (R1 , t1 ) and (R2 , t2 ). If fR1 ,t1 (t2 ) is defined, fR1 ,t1 (t2 ) 6= ε then dom fR2 ,t2 ( dom fR1 ,t1 and fR1 ,t1 (s) = fR1 ,t1 (t2 ) · fR2 ,t2 (s), for all s ∈ dom fR2 ,t2 . We are now in a position to introduce the ingredients of our polynomial rewriting. Let K = (T , A) and q(x) = ∃y ϕ(x, y). Consider an atom B(t) for a concept B. Then _ _ extB (x) = B 0 (x) ∨ ∃w R(x, w) 0 0 role R s.t. T |=∃RvB concept B s.t. T |=B vB gives the answers to B(t) over ABox A: for every a ∈ Ind(A), we have UK |= B(a) iff A |= extB (a). Note that, for all other elements σ in the domain ∆UK of UK , we have UK |= B(σ) iff T |= ∃T − v B, where tail(σ) = cT . R1 S2 q|R,t a2 Rn s quasi-tree q 0 R t S1 a1 cR a2 cR1 a2 cR1 · · · cRn a1 A Consider now an atom R(t, t0 ) ∈ q and the ways its terms can be mapped in UK . 1. If both t and t0 are mapped to ABox elements a, a0 then UK |= R(a, a0 ) iff R(a, a0 ) ∈ A because UK inherits the binary relations from A. 2. If t is mapped to an ABox element a and t0 to an ‘anonymous’ element in ∆UK \ Ind(A), then R(t, t0 ) can only be true if (i ) a ;K cR , (ii ) a tree witness for (R, t) exists, and (iii ) q|R,t can be embedded into the sub-tree of UK beginning with the edge (a, acR ); see the left-hand side of the picture above. Condition (i ) can be defined by the formula wtR (x) = ext∃R (x) ∧ ¬∃w R(x, w). For all R and a ∈ Ind(A), we have A |= wtR (a) iff a ;K cR (i.e., acR ∈ ∆UK ). For condition (iii ), consider the conjunction treeAqR,t (x) of the formulas: (t0 ) extA (x), for all A(s) ∈ q|R,t with fR,t (s) = ε; (t1 ) > if T |= ∃T − v A and ⊥ otherwise, for all A(s) ∈ q|R,t , tail(fR,t (s)) = cT ; (t2 ) > if T |= ∃T − v ∃S and ⊥ otherwise, for S(s, s0 ) ∈ q|R,t , tail(fR,t (s)) = cT . One can show that A |= wtR (a) ∧ treeAqR,t (a) iff UK |=a q|R,t for an assignment a such that a(s) = a · fR,t (s), for all s ∈ dom fR,t . 3. If both t, t0 are mapped to anonymous elements in ∆UK \ Ind(A), then two more cases need consideration. 3.1. Suppose first that there is a tree witness for some (S, s) such that s is mapped to an ABox element a with a ;K cS , (iv ) both t and t0 are in dom fS,s , and (v ) all the terms s0 ∈ dom fS,s with fS,s (s0 ) 6= ε are existentially quantified variables in q (only existential variables can be mapped to anonymous elements). In this case, as we observed above, R(t, t0 ) is true in UK if the formula wtS (s) ∧ treeAqS,s (s) ∧ s≡S s0 (s = s0 ) V is true in A under an assignment a such that a(s0 ) = a·fS,s (s0 ), for s0 ∈ dom fS,s . The disjunction of all such formulas for (S, s) satisfying (iv )–(v ) depends only on the choice of terms t, t0 and will be denoted by attached-treet,t0 (x, y). (This case is a generalisation of Case 2.) 3.2. Thus, it remains to consider the case (shown in the right-hand side of the picture) where the whole query is mapped to the anonymous part of UK . Then q a quasi-tree and all terms in q are existentially quantified variables that are mapped to the sub-tree of UK generated by some ABox element a. More precisely, a ∈ Ind(A) generates a S sequence of the form a ;K cR1 ;K · · · ;K cRn , q has a root s (i.e., term(q) = S dom fS,s ), s is mapped to σ = acR1 · · · cRn , while all other terms s0 are mapped to σ ·fS,s (s0 ). The latter condition can be captured by a formula similar to the one in the previous case. The difference is that now we begin with σ, tail(σ) = cRn (rather than a). To cope with this, consider the union 0 q 0 of q and {Rn (v, s)}, for a fresh variable v, and let treeTqcRn ,s be treeAqRn ,v , where the tree witnesses are computed in query q 0 . Note that treeTqcRn ,s is a sentence because q 0 has no atoms for item (t0 ). We denote by detached-tree the disjunction of sentences of the form ∃w wtR1 (w) ∧ treeTqcRn ,s for all roots s of q and all pairs of roles R1 , Rn such that there are R2 , . . . , Rn−1 with T |= ∃Ri− v ∃Ri+1 and Ri+1 6= Ri− , for 1 ≤ i < n; if q is not a quasi-tree containing only existentially quantified variables, we set detached-tree = ⊥. Denote by q ∗ the result of replacing each A(t) and P (t, t0 ) in q with A∗ (t) = extA (t) ∨ attached-treet,t (x, y) ∨ detached-tree, P (t, t0 ) = P (t, t0 ) ∨ attached-treet,t0 (x, y) ∨ detached-tree, ∗ respectively. Note that these formulas depend not only on the predicate name but also on the terms in the atom. The length of q ∗ is O(|q|2 · |T |3 ) and can be made O(|q|2 · |T |) if the sentence detached-tree is computed separately (in fact, for the majority of queries, e.g., queries with answer variables, it is simply ⊥). Theorem 5. UK |=a q(x) iff A |=a q ∗ (x), whenever a(x) ∈ Ind(A) for all x ∈ x. The rewriting above can also be adapted to DL-Litehorn and even DL-LiteN horn under the UNA. In this case, however, we need non-recursive Datalog programs to define the predicates extB (x); for details, see [14]. The non-recursive Datalog queries can be transformed to unions of CQs, but at the expense of exponential blowup. The problem whether a polynomial-size FO rewriting (without addi- tional constants as in [12]) exists for DL-Litehorn is still open (and equivalent to the complexity problem ‘LogSpace = P?’). 5 Discussion FO reducibility (or AC0 data complexity) does not seem to provide enough in- formation to judge whether a DL is suitable for OBDA. When measuring the complexity of query evaluation in database systems, it is usually assumed that queries are negligibly small compared to data. Thus, it makes sense to consider data complexity [24], which takes account of the data but ignores the query. A more subtle analysis [19] shows, however, that the obvious time |q|·|A||q| required to check A |= q cannot be reduced to f (|q|) · p(|A|), for any computable function f and polynomial p: QE(A, q) is W [1]-complete for parameterised complexity, |q| being a parameter. The success of database systems—despite fixed-parameter intractability—seems to imply that optimisation techniques are indispensable, that the ‘real-world’ queries are small and of ‘special’ form. In OBDA, the latter does not hold as the rewritten queries can be large and complex. However, data complexity does not differentiate among, e.g., DL-Litecore , OWL 2 QL and the language of sticky sets of TGDs [5], all of which are in AC0 for data complex- ity, while the primitive combined complexity, reflecting the size of the rewrit- ing, ranges from P to NP and further to ExpTime. Another explanation of the database efficiency is that we only use queries with a bounded number of variables, in which case query evaluation is P-complete for combined complex- ity [25]. However, query rewritings may substantially increase the number of variables (for example, a CQ q is rewritten in [12] into a query with O(N · log N ) auxiliary binary variables, where N = |T | + |q|). The W3C recommendation (www.w3.org/TR/owl2-profiles) for OBDA is to reduce it to query evaluation in database systems. Two drawbacks of this recommendation are that it (i ) disregards the complexity of possible reductions, and (ii ) excludes some useful DLs from consideration. As we saw above, rewrit- ings of CQs in OWL 2 QL cannot be done in polynomial time without adding extra constants, variables and quantifiers as in [12]. One might argue that, in the real-world ontologies, role inclusions do not interact with inverse roles in as sophisticated way as in Theorem 1, but then more research is needed to support (HF ) this argument. A number of ‘lightweight’ DLs such as ELH or DL-Litehorn [2] are deemed not suitable for OBDA because they are P-complete for data complexity. Recall that both of these logics are P-complete for primitive combined complex- ity (vs. NP in the case of OWL 2 QL). The combined approach to OBDA [18, 14, 15] resolves this issue by expanding the data at a pre-processing step and then rewriting and answering CQs. The expansion is linear in |A| and can be done by the database system itself; the size of the rewritten query for EL and DL-LiteF horn is only quadratic (for OWL 2 QL, it is still exponential). In this paper, we do not touch on the problem of representing ABoxes in database systems, where usually GLAV mappings are used to connect data sources to ontologies. Such mappings introduce some problems as tuples in the same relation can come from different data sources. Also, they provide certain information on the completeness of concepts and roles, which can (and should) be exploited in order to minimise the rewritings [22]. Finally, with so many lan- guages and rewritings for OBDA suggested, it looks like the time is ripe for comprehensive experiments that could clarify the future of OBDA with DLs. Acknowledgments: supported by the U.K. EPSRC grant EP/H05099X/1. References 1. Acciarri, A., Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Palmieri, M., Rosati, R.: QuOnto: Querying ontologies. In: Proc. AAAI, 1670–1671 (2005) 2. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family and relations. J. Artificial Intelligence Research 36, 1–69 (2009) 3. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope further. In: Clark, K., Patel-Schneider, P.F. (eds.) In: Proc. OWLED DC (2008) 4. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.F. (eds.): The Description Logic Handbook. Cambridge University Press (2003) 5. Calı̀, A., Gottlob, G., Pieris, A.: Advanced processing for ontological queries. In: Proc. VLDB 3(1), 554–565 (2010) 6. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Data com- plexity of query answering in description logics. In: Proc. KR. pp. 260–270 (2006) 7. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Automated Reasoning 39(3), 385–429 (2007) 8. Dasgupta, S., Papadimitriou, C., Vazirani, U.V.: Algorithms. McGraw-Hill (2008) 9. Dolby, J., Fokoue, A., Kalyanpur, A., Ma, L., Schonberg, E., Srinivas, K., Sun, X.: Scalable grounded conjunctive query evaluation over large and expressive knowl- edge bases. In: Proc. ISWC. LNCS, vol. 5318, pp. 403–418. Springer (2008) 10. Eiter, T., Lutz, C., Ortiz, M., Šimkus, M.: Query answering in description logics: The knots approach. In: Proc. WoLLIC. LNCS, vol. 5514. Springer (2009) 11. Gottlob, G., Orsi, G., Pieris, A.: Ontological queries: Rewriting and optimization. In: Proc. ICDE. (2011) 12. Gottlob, G., Schwentick, T.: Rewriting ontological queries into small nonrecursive Datalog programs. In: Proc. DL. (2011) 13. Heymans, S., Ma, L., et al.: Ontology reasoning with large data repositories. In: Ontology Management, pp. 89–128. Springer (2008) 14. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The com- bined approach to query answering in DL-Lite. In: Proc. KR (2010) 15. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The com- bined approach to ontology-based data access. In: Proc. IJCAI (2011) 16. Lutz, C.: Inverse roles make conjunctive queries hard. In: Proc. DL. CEUR Work- shop Proceedings, vol. 250. (2007) 17. Lutz, C.: The complexity of conjunctive query answering in expressive description logics. In: Proc. IJCAR. pp. 179–193. LNAI, vol. 5195. Springer (2008) 18. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description logic EL using a relational database system. In: Proc. IJCAI. pp. 2070–2075 (2009) 19. Papadimitriou, C.H., Yannakakis, M.: On the complexity of database queries. J. Comput. Syst. Sci. 58(3), 407–427 (1999) 20. Pérez-Urbina, H., Motik, B., Horrocks, I.: A comparison of query rewriting tech- niques for DL-Lite. In: Proc. DL. CEUR Workshop Proceedings, vol. 477. (2009) 21. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data to ontologies. J. on Data Semantics X, 133–173 (2008) 22. Rodrı́guez-Muro M., Calvanese, D.: Dependencies to optimize ontology-based data access. In: Proc. DL. (2011) 23. Rosati, R., Almatelli, A.: Improving query answering over DL-Lite ontologies. In: Proc. KR (2010) 24. Vardi, M.: The complexity of relational query languages (extended abstract). In: Proc. STOC. pp. 137–146 (1982) 25. Vardi, M.: On the complexity of bounded-variable queries (extended abstract). In: Proc. PODS. pp. 266–276 (1995)