=Paper=
{{Paper
|id=None
|storemode=property
|title=Long Rewritings, Short Rewritings
|pdfUrl=https://ceur-ws.org/Vol-846/paper_41.pdf
|volume=Vol-846
|dblpUrl=https://dblp.org/rec/conf/dlog/KikotKPZ12
}}
==Long Rewritings, Short Rewritings==
Long Rewritings, Short Rewritings S. Kikot1 , R. Kontchakov1 , V. Podolskii2 , and M. Zakharyaschev1 1 Department of Computer Science and Information Systems Birkbeck, University of London, U.K. {kikot,roman,michael}@dcs.bbk.ac.uk 2 Steklov Mathematical Institute, Moscow, Russia podolskii@mi.ras.ru 1 Introduction An ontology language L is said to enjoy FO-rewritability if any conjunctive query (CQ) q over any ontology T , given in L, can be transformed into an FO- formula q 0 such that, for any data A, all answers to q over the knowledge base (T , A) can be found by querying q 0 over A using a standard relational database management system (RDBMS). Ontology languages with this property include the OWL 2 QL profile of OWL 2, which is based on description logics of the DL-Lite family [7, 16, 2], and fragments of Datalog± such as linear or sticky TGDs [5, 6]. Various rewriting techniques have been implemented in the systems QuOnto [1], REQUIEM [15], Presto [22], Nyaya [8], IQAROS3 and Quest.4 The idea of using languages with FO-rewritability for ontology-based data ac- cess (OBDA) relies on the empirical fact that RDBMSs are usually very efficient in practice. However, the first rewritings of CQs over OWL 2 QL ontologies [7, 15] turned out to be too lengthy even for modern RDBMSs. The attempts to employ various optimisation techniques still produced rewritings of exponential size in the worst case: O((|T | · |q|)|q| ) [22, 8, 20, 21]. The alternative two-step combined approach [14, 13]—first expand the data by applying the ontology axioms to the data and introducing (some of) the missing individuals, and only then rewrite the query over the expanded data—resulted in a simple polynomial rewriting only for the fragment of OWL 2 QL without role inclusions; for the full language, the rewriting remained exponential. Two seemingly contradictory results, presented at DL 2011, added more spice to the quest for short rewritings: [9] showed that one can construct, in polynomial time, a nonrecursive Datalog (NDL) rewriting for some fragments of Datalog± containing OWL 2 QL, while [11] argued that no FO-rewriting for OWL 2 QL can be constructed in polynomial time. The aim of this paper is twofold. First, we investigate the worst-case size of FO- and NDL-rewritings for CQs over OWL 2 QL ontologies. We distinguish between ‘pure’ rewritings, which can use the signature of the original query and ontology as well as =, 6= (cf. [7]), and ‘impure’ rewritings, where other means such as new constants are allowed. Here is a summary of the obtained results: 3 http://code.google.com/p/iqaros/ 4 http://obda.inf.unibz.it/protege-plugin/quest/quest.html (1) An exponential blow-up is unavoidable for pure positive existential (PE) rewritings and pure NDL-rewritings; pure FO-rewritings can blow-up super- polynomially unless NP ⊆ P/poly. (2) Pure NDL-rewritings are in general exponentially more succinct than pure PE-rewritings. (3) Pure FO-rewritings can be superpolynomially more succinct than pure PE- rewritings. (4) Impure PE- and NDL-rewritings can always be made polynomial, and so they are exponentially more succinct than pure PE- and NDL-rewritings, respectively. (1)–(3) are proved by establishing connections between pure rewritings for CQs over OWL 2 QL ontologies and circuits for monotone Boolean functions. In a nutshell, we show that CQs and OWL 2 QL ontologies can encode such problems as the existence of a k-clique in a graph with n vertices whose edges are given by a single-element ABox. The polynomial PE-rewriting in (4) is similar to the NDL-rewriting of [9]: using two extra constants, = and polynomially many new existentially quantified variables, one can guess a relevant part of the canonical model of T in the rewritten query. The difference between the resulting impure PE-rewritings and exponential-size pure PE-rewritings is of the same kind as the difference between deterministic and nondeterministic Boolean circuits. Our second aim is to analyse the causes behind long rewritings and whether they occur in real-world queries and ontologies. As a result, we suggest some short rewritings that cover most practical cases. Omitted proofs can be found in [10] and the full version of [12]. 2 Queries over OWL 2 QL Ontologies The language of OWL 2 QL is defined by the following grammar:5 R ::= Pi | Pi− , B ::= ⊥ | Ai | ∃R, C ::= B | ∃R.B, where the Ai are concept names and the Pi are role names. An OWL 2 QL TBox, T , is a finite set of inclusions of the form B v C, R1 v R2 , B1 u B2 v ⊥ and R1 u R2 v ⊥. Note that concepts of the form ∃R.B can only occur in the right- hand side of concept inclusions. An ABox, A, is a finite set of assertions of the form Ak (ai ) and Pk (ai , aj ), where ai , aj are individual names. T and A together form the knowledge base (KB) K = (T , A). The semantics is defined in the usual way, based on interpretations I = (∆I , ·I ) with domain ∆I and interpretation function ·I . The set of individuals in A is denoted by ind(A); IA is the interpretation with domain ind(A) such that, for any concept or role E and any a ⊆ ind(A), we have IA |= E(a) iff E(a) ∈ A. We write E1 vT E2 if T |= E1 v E2 ; and we set [E] = {E 0 | E vT E 0 and E 0 vT E}. 5 We do not consider data properties, attributes and role (ir)reflexivity constraints. A conjunctive query (CQ) q(x) is a formula ∃y ϕ(x, y), where ϕ is a con- junction of atoms of the form Ak (t1 ) and Pk (t1 , t2 ), and each ti is a term (an individual or a variable from x, y). A tuple a ⊆ ind(A) is a certain answer to q(x) over K = (T , A) if I |= q(a) for all I |= K; then we write K |= q(a). Query answering over OWL 2 QL KBs is based on the fact that, for any consistent KB K = (T , A), there is an interpretation CK such that, for all CQs q(x) and a ⊆ ind(A), we have K |= q(a) iff CK |= q(a). The interpretation CK , called the canonical model of K, can be constructed as follows. For each pair [R], [B] with ∃R.B in T (we assume ∃R is just a shorthand for ∃R.>), we introduce a fresh symbol w[RB] and call it the witness for ∃R.B. We write K |= C(w[RB] ) if ∃R− vT C or B vT C. Define a generating relation, ;, on the set of these witnesses together with ind(A) by taking: – a ; w[RB] if a ∈ ind(A), [R] and [B] are vT -minimal such that K |= ∃R.B(a) and there is no b ∈ ind(A) with K |= R(a, b) ∧ B(b); – w[R0 B 0 ] ; w[RB] if u ; w[R0 B 0 ] , for some u, [R] and [B] are vT -minimal with K |= ∃R.B(w[R0 B 0 ] ) and it is not the case that R0 vT R− and K |= B(u). If a ; w[R1 B1 ] ; · · · ; w[Rn Bn ] , n ≥ 0, then we say that a generates the path aw[R1 B1 ] · · · w[Rn Bn ] . Denote by pathK (a) the set of paths generated by a, and by tail(π) the last element in π ∈ pathK (a). CK is defined by taking: [ ∆CK = pathK (a), aCK = a, for a ∈ ind(A), a∈ind(A) CK A = {π ∈ ∆CK | K |= A(tail(π))}, P CK = {(a, b) ∈ ind(A) × ind(A) | K |= P (a, b)} ∪ {(π, π · w[RB] ) | tail(π) ; w[RB] , R vT P } ∪ {(π · w[RB] , π) | tail(π) ; w[RB] , R vT P − }. Theorem 1 ([7, 13]). For every OWL 2 QL KB K = (T , A), every CQ q(x) and every a ⊆ ind(A), K |= q(a) iff CK |= q(a). Given a CQ q(x) and a TBox T , a first-order formula q 0 (x), possibly with = and 6=, is called an FO-rewriting for q(x) and T if, for any ABox A and any a ⊆ ind(A), we have (T , A) |= q(a) iff IA |= q 0 (a). If q 0 is an FO-rewriting of the form ∃y ϕ(x, y), where ϕ is built from atoms using only ∧ and ∨, then we call q 0 (x) a positive existential rewriting for q(x) and T (or a PE-rewriting, for short). We say that q 0 is pure if it does not contain constants that do not occur in q (such constants are interpreted by fresh individuals added to IA ). The size |q 0 | of q 0 is the number of symbols in q 0 . We also consider rewritings in the form of nonrecursive Datalog queries. We remind the reader that a Datalog program, Π, is a finite set of Horn clauses ∀x (A1 ∧ · · · ∧ Am → A0 ), where each Ai is an atom of the form P (t1 , . . . , tl ) and each tj is either a variable from x or a constant. A0 is called the head of the clause, and A1 , . . . , Am its body. All variables occurring in the head must also occur in the body. A predicate P depends on a predicate Q in Π if Π contains a clause whose head is P and whose body contains Q. Π is called nonrecursive if this dependence relation for Π is acyclic. A nonrecursive Datalog query consists of a nonrecursive Datalog program Π and a goal G, which is just a predicate. A tuple a ⊆ ind(A) is a certain answer to (Π, G) over A if Π, A |= G(a). The size |Π| of Π is the number of symbols in it. We distinguish between pure and impure Datalog queries [3]. In a pure query (Π, G), the clauses in Π do not contain constant symbols in their heads. One reason for considering only pure queries in OBDA is that impure ones can add new facts to the database that do not follow from the background ontology. Impure queries are known to be more succinct than pure ones. Given a CQ q(x) and an OWL 2 QL TBox T , a pure nonrecursive Datalog query (Π, G) is called a nonrecursive Datalog rewriting for q(x) and T (or an NDL-rewriting, for short) if, for any ABox A and any a ⊆ ind(A), we have (T , A) |= q(a) iff Π, A |= G(a). If Π does not contain constants that do not occur in q then we say that the NDL-rewriting (Π, G) is pure. 3 Query Rewritings and Boolean Circuits To establish results (1)–(3) on the size of rewritings mentioned in the introduc- tion, we show how the problem of constructing circuits that compute monotone Boolean functions can be reduced to the problem of finding pure FO- and NDL- rewritings for CQs over OWL 2 QL ontologies. Our reduction proceeds in three steps. First, we take any family f 1 , f 2 , . . . of monotone Boolean functions in NP, where f n : {0, 1}n → {0, 1}, a polynomial p and a family C1 , C2 , . . . of polynomial-size circuits such that f n (α) = 1 iff Cn (α, β) = 1, for some β ∈ {0, 1}p(n) . Using the Tseitin transformation [23], we construct a polynomial-size CNF θf n that computes f n in the following sense: Lemma 1. If f n is monotone then ϕα V fn = αj =0 ¬xj ∧ θf is satisfiable iff n n n f (α) = 1, for all α = (α1 , . . . , αn ) ∈ {0, 1} . Let ϕf n be ϕα α f n for α = (0, . . . , 0). It should be clear that ϕf n is obtained from ϕf n by removing the clauses ¬xj for which the jth component of α is 1. The second step is to encode the ϕf n by means of TBoxes Tf n and CQs q f n . Let p1 , . . . , pN be the propositional variables and C1 , . . . , Cd the clauses in ϕf n . Then Tf n contains the following concept inclusions, for 1 ≤ i ≤ N , 1 ≤ j ≤ d: Ai−1 v ∃P − .Xi` , Xi` v Ai , for ` = 0, 1, Xi0 v Zi,j if ¬pi ∈ Cj , Zi,j v ∃P.Zi−1,j , Xi1 v Zi,j if pi ∈ Cj , A0 u Ai v ⊥, A0 u ∃P v ⊥, A0 u Zi,j v ⊥, if (i, j) ∈ / {(0, 1), . . . , (0, n)}, and the CQ is defined as follows: h N ^ q f n = ∃y ∃z A0 (y0 ) ∧ P (yi , yi−1 ) ∧ i=1 d ^ −1 N^ i P (yN , zN −1,j ) ∧ P (zi,j , zi−1,j ) ∧ Z0,j (z0,j ) , j=1 i=1 where y = (y0 , . . . , yN ) and z = (z0,1 , . . . , zN −1,1 , . . . , z0,d , . . . , zN −1,d ). The size of Tf n and q f n is O(|Cn |2 ). Note that Tf n is acyclic and q f n is tree-shaped and has no answer variables. The canonical model C(Tf n ,{A0 (a)}) of (Tf n , {A0 (a)}) and the query q f n are illustrated below. Z0,1 Z1,1 X21 , A2 , Z2,1 X31 , A3 X30 , A3 X20 , A2 X31 , A3 X11 , A1 C(Tf n ,{A0 (a)}) a X30 , A3 X21 , A2 A0 X31 , A3 X10 , A1 X30 , A3 X20 , A2 X31 , A3 X30 , A3 , Z3,2 Z0,2 Z1,2 Z2,2 y0 y1 y2 y3 A0 z0,1 z1,1 z2,1 qf n Z0,1 Z0,2 z0,2 z1,2 z2,2 For each α = (α1 , . . . , αn ) ∈ {0, 1}n , we define the following ABox: Aα = A0 (a) ∪ Z0,j (a) | 1 ≤ j ≤ n and αj = 1 . Lemma 2. (Tf n , Aα ) |= q f n iff ϕα n f n is satisfiable, for α ∈ {0, 1} . To complete our reduction, we show that rewritings for q f n and Tf n can be turned into Boolean circuits computing f n . Lemma 3. (i) Suppose q 0f n is a pure FO- (PE-) rewriting for Tf n and q f n . Then there is a (monotone) Boolean formula ψf n computing f n with |ψf n | ≤ |q 0f n |. (ii) Suppose (Πf n , G) is a pure NDL-rewriting for Tf n and q f n . Then there is a monotone Boolean circuit Cf n computing f n with |Cf n | ≤ |Πf n |. The proof proceeds by eliminating quantifiers in the rewriting and replacing its predicates with propositional variables using the fact that, in ABoxes Aα , these predicates can only be true on the individual a. Lemmas 1 and 2 ensure that the resulting Boolean formula or circuit computes f n . The next lemma shows that circuits computing f n can be turned into pure rewritings for q f n and Tf n . Lemma 4. (i) Suppose q n is an FO-sentence such that (Tf n , Aα ) |= q f n iff IAα |= q n , for all α ∈ {0, 1}n . Then there exists a pure FO-rewriting q 0n for q fn and Tfn with |q 0n | ≤ |q n | + p(n), for a polynomial p. (ii) Suppose (Πn , G) is a pure NDL-query with a propositional goal G such that (Tf n , Aα ) |= q f n iff Πn , Aα |= G, for α ∈ {0, 1}n . Then there is a pure NDL-rewriting (Πn0 , G0 ) for q f n and Tf n with |Πn0 | ≤ |Πn |+p(n), p a polynomial. Now, results (1)–(3) formulated in the introduction can be obtained by ap- plying Lemmas 1–4 to three concrete Boolean functions. For (1), we use the function Cliquen,k of n(n − 1)/2 variables eij , 1 ≤ i < j ≤ n, which returns 1 iff the graph with vertices {1, . . . , n} and edges {{i, j} | eij = 1} contains a k-clique. A series of papers, started by Razborov’s [19], gave an exponential √ lower bound for the size of monotone circuits computing Cliquen,k : 2Ω( k) for k ≤ 41 (n/ log n)2/3 . For monotone formulas, an even better lower bound is known: 2Ω(k) for k = 2n/3 [18]. The question whether Cliquen,k can be computed by a polynomial-size circuit is equivalent to whether NP ⊆ P/poly. To show (2), we use the function Genn3 of n3 variables xijk , 1 ≤ i, j, k ≤ n, defined as follows. We say that 1 generates k ≤ n if either k = 1 or, for some i and j such that xijk = 1, 1 generates both i and j. Genn3 (x111 , . . . , xnnn ) returns 1 iff 1 generates n. It is clearly a monotone function computable by polynomial-size monotone circuits. On the other hand, any monotone formula ε computing Genn3 is of size at least 2n , for some ε > 0 [17]. For (3), we use the function Matching2n of n2 variables eij , 1 ≤ i, j ≤ n, which returns 1 iff there is a perfect matching in a bipartite graph G with vertices {v11 , . . . , vn1 , v12 , . . . , vn2 } and edges {{vi1 , vj2 } | eij = 1}, i.e., a subset E of edges in G such that every node in G occurs exactly once in E. An exponential lower bound 2Ω(n) for the size of monotone formulas computing this function was obtained in [18]. However, Matching2n is computable by non-monotone formulas of size nO(log n) [4]. The exponential bounds for the size of rewritings can be reduced to polyno- mial by using two extra constants, say 0 and 1, which are included in the domain of every IA (see [9] and [10] for polynomial-size NDL- and PE-rewritings, re- spectively). As these constants may not occur in the original query and intended ABoxes, we call such rewritings impure (in [9] and [10], 0, 1, = and polynomially- many fresh existentially quantified variables are used to guess some part of the canonical model). The difference between pure and impure rewritings is simi- lar to the difference between deterministic circuits and nondeterministic circuits with additional existentially quantified input variables. For example, Cliquen,k is computed by a polynomial-size monotone nondeterministic circuit, but not by a monotone deterministic circuit of polynomial size [19]. 4 Why are Pure Rewritings so Long? Let q(x) = ∃y ϕ(x, y) be a connected CQ and T a TBox. Let term q be the set of terms in q. Denote by CT the disjoint union of the canonical models for consistent (T , {R(aRB , bRB ), B(bRB )}), in which the generating relation is extended with aRB ; bRB . The RB-subtree of CT has root aRB and consists of the full subtree of CT with root bRB extended with the edge (aRB , bRB ). We also need the formulas _ _ _ extC (x) = A(x) ∨ ∃y R(x, y), extP (x, y) = R(x, y), AvT C ∃RvT C RvT P a for concepts C and role names P . Suppose CK |= ϕ, K = (T , A), where a is an assignment of elements of ∆CK to the variables in ϕ under which a(x) ∈ ind(A) for all x ∈ x. Consider an atom P (t, t0 ) ∈ q with bound variables t, t0 and assume that a(t0 ) ∈ ind(A), for some t0 ∈ term q. The assignment a can send t and t0 to four different locations in CK : (A) a(t), a(t0 ) ∈ ind(A); (B) a(t) ∈ ind(A), a(t0 ) ∈ / ind(A); (B− ) a(t) ∈ / ind(A), a(t0 ) ∈ ind(A); (O) a(t), a(t0 ) ∈ / ind(A). Let us see how these alternatives can be reflected in a rewriting. In case (A) we have CK |=a P (t, t0 ) iff A |=a extP (t, t0 ). Case (B) is possible only if, for some concept ∃R.B, we have a(t) ; w[RB] , R vT P and the atoms of q ‘linked to’ t0 can be mapped into the RB-subtree of CT . To illustrate, consider an example. Example 1. Let T and q be as in the picture below. The answer variable x must be mapped by a to an ABox element. However, y can be mapped either to an ABox element or to the point a(x) · w[S − B] provided that a(x) is an instance of ∃S − .B. In the latter case, we have two ways of mapping z: either to a(x) · w[S − B] w[SA0 ] , in which case we must set a(v) = a(x) · w[S − B] w[SA0 ] w[T >] , or to a(x), provided that a(x) is an instance of ∃T . Thus we have two (partial) maps f and g from q into the S − B-subtree of CT shown below. T = { A v ∃S − .B, B v ∃S.A0 , v v f A0 v ∃T } T T A 0 T z z q(x) = {S(y, x), S(y, z), T (z, v)} f S S B g S y y f g q S S− S q x x f aS − B g These observations motivate our key definition. Given a pair (t, t0 ) of adjacent terms in q, a tree witness 6 for (t, t0 ) is a homomorphism f from the query q f = E(s) ∈ q | s ⊆ dom f, s 6⊆ [t]f to the RB-subtree of CT , for some ∃R.B, such that f (t) = aRB , dom f is the smallest set containing t, t0 for which s0 ∈ dom f whenever S(s, s0 ) ∈ q with s ∈ dom f \ [t]f , and all s ∈ dom f \ [t]f are bound variables in q. Here ∼f denotes the equivalence relation on dom f defined by taking s ∼f s0 iff f (s) = f (s0 ) and [t]f the equivalence class of t. In Example 1, f and g are two tree witnesses for (x, y). Returning back to case (B), we can say now that there must exist a tree witness f for (t, t0 ) such that a(t) satisfies the following tree-witness formula twf for f with f (t) = aRB : ^ ^ twf = ext∃R.B (t) ∧ (t = s) ∧ extE (s). s∈[t]f E(s)∈q, s⊆[t]f Case (B− ) is symmetric, and in case (O) there must exist S(s, s0 ) ∈ q for which (B) holds, with P (t, t0 ) being ‘covered’ by the tree witness for (s, s0 ). This analysis suggests the following idea for a rewriting. We guess pairs of adjacent terms (t, t0 ) in q that will be mapped to edges of the tree part of CK 6 A different notion of tree witness was used for DL-LiteN horn [13], where the structure of the canonical models ensured uniqueness of every tree witness. and the tree witnesses that will cover them, as in cases (B), (B− ) and (O). The part of q that is not covered by these tree witnesses will be mapped to ind(A), as in case (A). The query representing the guesses is then evaluated over IA . The following example shows, however, that this idea needs a refinement. Example 2. Let q(x1 , x4 ) = {R(x1 , y2 ), R(y3 , y2 ), R(y3 , x4 )}, shown below on the left, and T = {A v ∃R, A v ∃R− }. C(T ,{A(a)}) y2 x4 aw[R>] aw[R− >] f R R R A g R R− x1 y3 a Clearly, there is a tree witness f for (x1 , y2 ) with dom f = {x1 , y2 , y3 } and [x1 ]f = {x1 , y3 }, and a tree witness g for (x4 , y3 ) with dom g = {x4 , y3 , y2 } and [x4 ]g = {x4 , y2 }. Although these tree witnesses cover the whole query q, they are only ‘realised,’ say, in the canonical model C(T ,{A(a)}) (shown above on the right) under conflicting maps: f sends x1 , y3 to a and y2 to aw[R>] , while g sends x4 , y2 to a and y3 to aw[R− >] ; in fact, (T , {A(a)}) 6|= q(a, a). Tree witnesses f and g, for (t, t0 ) and (s, s0 ), respectively, are compatible if dom f ∩dom g ⊆ [t]f ∩[s]g . If f and g are incompatible and neither dom f ⊆ dom g nor dom g ⊆ dom f , then we call f and g conflicting (e.g., f and g in Example 2). A set Ξ of tree witnesses is called consistent if all pairs of tree witnesses in Ξ are compatible. Let _ ^ ^ q e (x) = detachedq ∨ ∃y twf ∧ extE (s) , Ξ consistent f ∈Ξ E(s)∈q s6⊆dom f, for all f ∈Ξ where detachedq = ⊥ if x 6= ∅; otherwise it is a disjunction of the sentences ∃x ext∃R.B (x) such that there is a homomorphism from q to the RB-subtree of CT . The next theorem shows that q e is a pure PE-rewriting for q and T : Theorem 2. For all A and a ⊆ ind(A), we have (T , A) |= q(a) iff IA |= q e (a). Example 3. Let q(x) = {Ri (x, yi ) | i ≤ n} and T = {Ai v ∃Ri | i ≤ n}. Each pair (x, V to one treeVwitness fi with twfi = Ai (x) ∨ ∃y Ri (x, y), and W yi ) gives rise q e = N ⊆[0,n] ∃y ( i∈N twfi ∧ j ∈N / Rj (x, yj )). The size of q e is O((nT ,q + 1)|q| · |T | · |q|2 ), where nT ,q is the number of distinct tree witnesses. Now, we observe that if (conf ) there are no conflicting tree witnesses for q and T then q e can be transformed to the query ^ h ^ _ i q c (x) = detachedq ∨ ∃y extE (s) ∨ twf {t,t0 } E(s)∈q, s⊆{t,t0 } f is a tree witness t,t0 adjacent t,t0 ∈dom f (if q has no binary predicates then q c = q e ). Theorem 3. If T and q satisfy (conf ) then, for any ABox A and a ⊆ ind(A), we have (T , A) |= q(a) iff IA |= q c (a). The size of q c is O(nT ,q ·|T |·|q|2 ). So, if (conf ) holds and nT ,q is polynomial then q c is a polynomial pure PE-rewriting of q and T . For V instance, the expo- nential q e of Example 3 reduces to polynomial q c = ∃y i≤n (Ri (x, yi ) ∨ twfi ). Note, however, that the CQs and TBoxes used in Section 3 generate exponentially many distinct tree witnesses. We show now that, for a large class of CQs q and TBoxes T , all tree witnesses f for each (t, t0 ) in q (even if there are exponentially many of them) can be represented by a polynomial formula. Observe that each twf is determined by a concept ∃R.B (such that the RB-subtree of CT contains the range of f ) and the equivalence relation ∼f on dom f . As the number of concepts ∃R.B is linear in |T |, the rewriting q c may be regarded polynomial if we show that all tree witnesses for each (t, t0 ) have the same equivalence relation. For example, let q(x) = {S(x, y), R(y, zi ) | 1 ≤ i ≤ n} and T = {A v ∃S, ∃S − v ∃R.B1 , ∃S − v ∃R.B2 }. There are 2n tree witnesses for (x, y), as each zi can be mapped either to a B1 - or a B2 -point in CT , and yet they all define the same equivalence relation and the same tree-witness formula ext∃S (x). We formalise this intuition in the following definition. For a pair (t, t0 ) of adjacent terms in q, we call f = (dom f , ∼f ) a universal tree witness for (t, t0 ) if dom f is a subset of term q with t, t0 ∈ dom f and ∼f is an equivalence relation on dom f such that q f = E(s) ∈ q | s ⊆ dom f , s 6⊆ [t]f /∼f is a tree-shaped query with root [t]f and, for every tree witness g for (t, t0 ), – dom g = dom f and there exists a homomorphism h : q f → CT that preserves the distance from the root and g(s) = h([s]f ), for every s ∈ dom f . A universal tree witness f for (t, t0 ) is not a tree witness in the sense of our orig- inal definition, but rather a convenient structure representing all tree witnesses for (t, t0 ): we can merge the tree-witness formulas for (t, t0 ) into one formula h _ i ^ ^ twf = ext∃R.B (t) ∧ (t = s) ∧ extE (s), ∃R.B∈Φf s∈[t]f E(s)∈q, s⊆[t]f where Φf is the set of concepts ∃R.B such there is a homomorphism h from q f to the RB-subtree of CT with h([t]f ) = aRB . We now identify a class of CQs and TBoxes for which a universal tree witness is unique (if exists) and can be constructed in time polynomial in |q| and |T |. Intuitively, for each tree witness f , we disallow situations in which CT and the quotient q f /∼f of q f simultaneously contain fragments of the form T S q f /∼f CT T T S S We say that a role S is forward if u ; v for all (u, v) ∈ S CT . If neither S nor its inverse S − is forward then S is said to be a twisty role. A tree witness f for (t, t0 ) is called perfect if, for all T (s1 , s2 ), S(s2 , s3 ) ∈ q f /∼f such that s2 6= [t]f and S is twisty, we have CT 6|= inv (T, S) ∧ suc(T, S), where suc(T, S) = ∃x, y, z (T (x, y) ∧ S(y, z) ∧ (x 6= z)), inv (T, S) = ∃x, y (T (x, y) ∧ S(y, x)). Lemma 5. There is a polynomial-time algorithm which, given q, T and a pair (t, t0 ), checks whether all tree witnesses for (t, t0 ) are perfect, and if this is the case, returns a unique universal tree witness f t,t0 for (t, t0 ). Note that even though there may be exponentially many tree witnesses for (t, t0 ), the algorithm checks whether they all are perfect in polynomial time. We are now in a position to define our polynomial PE-rewriting q p for q and T : ^ h ^ _ i q p (x) = detachedq ∨ ∃y extE (s) ∨ twf s,s0 . {t,t0 } E(s)∈q, s⊆{t,t0 } s,s0 adjacent t,t0 adjacent t,t0 ∈dom f s,s0 Theorem 4. If all tree witnesses for q and T are perfect and condition (conf ) is satisfied then, for any ABox A and any a ⊆ ind(A), we have (T , A) |= q(a) iff IA |= q p (a). Moreover, q p is constructed in time polynomial in |q| and |T |. If T does not contain any twisty roles, then all tree witnesses in any CQ q over T are perfect. On the other hand, all examples of conflicting tree witnesses above involve twisty roles. The following theorem shows that this is no accident: Theorem 5. Let T be an OWL 2 QL ontology without twisty roles. Then, for any CQ q, there are no conflicting tree witnesses for q and T . Thus, q p is a pure PE-rewriting for q and T and it can be constructed in polynomial time. Note that OWL 2 EL ontologies satisfy this condition, and so a polynomial rewriting similar to q p can also be used for CQ answering over such ontologies provided that the ABoxes are complete with respect to the ontologies [12]. 5 Conclusions As we saw in Section 3, pure PE- and NDL-rewritings of CQs over OWL 2 QL ontologies are of exponential size in the worst case. The analysis in Section 4 showed that the length of a rewriting is related to the number of tree witnesses in the query, which reflect how various parts of the query can be homomorphically mapped to the intensional tree part of the canonical model. Thus, a rewriting can be lengthy if the original query is sufficiently long and the intensional part of the canonical model for the ontology is sufficiently complex. We proved that by restricting the interaction between inverse roles and role inclusions in ontologies and queries, we can guarantee transparent polynomial rewritings. Moreover, as shown by a series of experiments [12], real-world ontologies and CQs contain very few tree witnesses, which are never in conflict, satisfy the above mentioned restrictions, and so enjoy pure, polynomial PE-rewritings. References 1. Acciarri, A., Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Palmieri, M., Rosati, R.: QuOnto: Querying ontologies. In: Proc. of the 20th Nat. Conf. on AI, AAAI. pp. 1670–1671 (2005) 2. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family and relations. Journal of Artificial Intelligence Research (JAIR) 36, 1–69 (2009) 3. Benedikt, M., Gottlob, G.: The impact of virtual views on containment. PVLDB 3(1), 297–308 (2010) 4. Borodin, A., von zur Gathen, J., Hopcroft, J.E.: Fast parallel matrix and gcd computations. In: Proc. of FOCS. pp. 65–71 (1982) 5. Calı̀, A., Gottlob, G., Lukasiewicz, T.: A general Datalog-based framework for tractable query answering over ontologies. In: Proc. of PODS. pp. 77–86 (2009) 6. Calı̀, A., Gottlob, G., Pieris, A.: Advanced processing for ontological queries. PVLDB 3(1), 554–565 (2010) 7. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning 39(3), 385–429 (2007) 8. Gottlob, G., Orsi, G., Pieris, A.: Ontological queries: Rewriting and optimization. In: Proc. of the IEEE Int. Conf. on Data Engineering, ICDE (2011) 9. Gottlob, G., Schwentick, T.: Rewriting ontological queries into small nonrecursive Datalog programs. In: Proc. of DL. vol. 745. CEUR-WS.org (2011) 10. Kikot, S., Kontchakov, R., Podolskii, V., Zakharyaschev, M.: Exponential lower bounds and separation for query rewriting. CoRR, arXiv:1202.4193, 2012. 11. Kikot, S., Kontchakov, R., Zakharyaschev, M.: On (in)tractability of OBDA with OWL 2 QL. In: Proc. of DL. vol. 745. CEUR-WS.org (2011) 12. Kikot, S., Kontchakov, R., Zakharyaschev, M.: Conjunctive query answering with OWL 2 QL. In: Proc. of KR. AAAI Press (2012) (see www.dcs.bbk.ac.uk/~kikot) 13. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The com- bined approach to query answering in DL-Lite. In: Proc. of KR. AAAI Press (2010) 14. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description logic EL using a relational database system. In: Proceedings of the 21st Int. Joint Conf. on Artificial Intelligence, IJCAI 2009. pp. 2070–2075 (2009) 15. Pérez-Urbina, H., Motik, B., Horrocks, I.: A comparison of query rewriting tech- niques for DL-Lite. In: Proc. of DL. vol. 477. CEUR-WS.org (2009) 16. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data to ontologies. J. on Data Semantics X, 133–173 (2008) 17. Raz, R., McKenzie, P.: Separation of the monotone NC hierarchy. In: Proc. of FOCS. pp. 234–243 (1997) 18. Raz, R., Wigderson, A.: Monotone circuits for matching require linear depth. J. ACM 39(3), 736–744 (1992) 19. Razborov, A.: Lower bounds for the monotone complexity of some Boolean func- tions. Dokl. Akad. Nauk SSSR 281(4), 798–801 (1985) 20. Rodrı́guez-Muro, M., Calvanese, D.: Dependencies to optimize ontology based data access. In: Proc. of DL. vol. 745. CEUR-WS.org (2011) 21. Rodrı́guez-Muro, M., Calvanese, D.: Semantic index: Scalable query answering without forward chaining or exponential rewritings. In: Proc. of ISWC (2011) 22. Rosati, R., Almatelli, A.: Improving query answering over DL-Lite ontologies. In: Proc. of the 12th Int. Conf. KR. AAAI Press (2010) 23. Tseitin., G.: On the complexity of derivation in propositional calculus. In: Automa- tion of Reasoning 2: Classical Papers on Computational Logic 1967–1970 (1983)