=Paper=
{{Paper
|id=Vol-1193/paper_43
|storemode=property
|title=Succinctness of Query Rewriting in OWL 2 QL: The Case of Tree-like Queries
|pdfUrl=https://ceur-ws.org/Vol-1193/paper_43.pdf
|volume=Vol-1193
|dblpUrl=https://dblp.org/rec/conf/dlog/BienvenuKP14
}}
==Succinctness of Query Rewriting in OWL 2 QL: The Case of Tree-like Queries==
Succinctness of Query Rewriting in OWL 2 QL: The Case of Tree-like Queries Meghyn Bienvenu1 , Stanislav Kikot2 , and Vladimir Podolskii3 1 LRI – CNRS & Université Paris Sud, Orsay, France 2 Institute for Information Transmission Problems & MIPT, Moscow, Russia 3 Steklov Mathematical Institute, Moscow, Russia Abstract. This paper further investigates the succinctness landscape of query rewriting in OWL 2 QL. We clarify the worst-case size of positive existential (PE), non-recursive Datalog (NDL), and first-order (FO) rewritings for various classes of tree-like conjunctive queries, ranging from linear queries up to bounded treewidth queries. More specifically, we establish a superpolynomial lower bound on the size of PE-rewritings that holds already for linear queries and TBoxes of depth 2. For NDL-rewritings, we show that polynomial-size rewritings always exist for tree-shaped queries with a bounded number of leaves (and arbitrary TBoxes), and for bounded treewidth queries and bounded depth TBoxes. Finally, we show that the succinctness problems concerning FO-rewritings are equiva- lent to well-known problems in Boolean circuit complexity. Along with known results, this yields a complete picture of the succinctness landscape for the con- sidered classes of queries and TBoxes. 1 Introduction For several years now, conjunctive query (CQ) answering has been a major focus of description logic (DL) research (cf. survey [19]), due to the growing interest in using description logic ontologies to query data. Formally, the problem is to compute the certain answers to a CQ q(x) over a knowledge base (T , A), that is, the tuples of in- dividuals a that satisfy T , A |= q(a). Much of the work on CQ answering focuses on lightweight DLs of the DL-Lite family [5], and the corresponding OWL 2 QL profile [18]. The popularity of these languages is due to fact that they enjoy first-order (FO) rewritability, which means that for every CQ q(x) and every TBox T , there exists a computable FO-query q0 (x) (called a rewriting) such that the certain answers to q(x) over (T , A) coincide with the answers of the FO-query q0 (x) over the ABox A (viewed as a database). First-order rewritability provides a means of reducing CQ answering to the evaluation of FO (∼ SQL) queries in relational databases. A great many differ- ent query rewriting algorithms have been proposed for OWL 2 QL and its extensions, cf. [5, 20, 25, 6, 10, 24, 21, 7, 17, 23]. Most of these algorithms produce rewritings ex- pressed as unions of conjunctive queries (UCQs), and the size of such rewritings can be huge, making it difficult, or even impossible, to evaluate them using standard relational database management systems. It is not difficult to see that exponential-size rewritings are unavoidable if rewritings are given as UCQs (consider for instance the CQ q(x) = B1 (x)∧. . .∧Bn (x) and TBox {Ai v Bi | 1 ≤ i ≤ n}). A natural (and non-trivial) question is whether an exponential blowup can be avoided by moving to other standard query languages, like positive exis- tential (PE) queries, non-recursive datalog (NDL) queries, or first-order (FO-) queries4 . More generally, under what conditions can we ensure polynomial-size rewritings? A first (negative) answer was given in [14], which proved exponential lower bounds for the worst-case size of PE- and NDL-rewritings, as well as a superpolynomial lower bound for FO-rewritings (under the widely-held assumption that NP 6⊆ P/poly). Interestingly, all three results hold already for tree-shaped CQs, which are a well-studied class of CQs that often enjoy better computational properties, cf. [28, 4]. While the queries used in the proofs had a simple structure, the TBoxes induced full binary trees of depth n. This raised the question of whether better results could be obtained by considering restricted classes of TBoxes. A recent study [15] explored this question for TBoxes of depth 1 and 2, that is, TBoxes that generate canonical models whose elements are at most 1 or 2 ‘steps away’ from the ABox (see Section 2 for a formal definition). It was shown that for depth 1 TBoxes, polysize PE-rewritings do not exist, polysize NDL-rewritings do exist, and polysize FO-rewritings exist iff NL/poly ⊆ NC1 . For depth 2 TBoxes, neither polysize PE- nor NDL-rewritings exist, and polysize FO-rewritings do not exist unless NP 6⊆ P/poly. These results used simpler TBoxes, but the considered CQs were no longer tree-shaped. For depth 1 TBoxes, this distinction is crucial, as it was further shown in [15] that polysize PE-rewritings do exist for tree-shaped CQs. While existing results go a fair ways towards understanding the succinctness land- scape of query rewriting in OWL 2 QL, a number of questions remain open: – What happens if we consider tree-shaped queries and bounded depth TBoxes? – What happens if we consider generalizations or restrictions of tree-shaped CQs? In this paper, we address these questions by providing a complete picture of the suc- cinctness of rewritings for tree-shaped queries, their restriction to linear and bounded branching queries (i.e. tree-shaped CQs with a bounded number of leaves), and their generalization to bounded treewidth queries. More specifically, we establish a super- polynomial lower bound on the size of PE-rewritings that holds already for linear queries and TBoxes of depth 2. For NDL-rewritings, we show that polynomial-size rewritings always exist for bounded branching queries (and arbitrary TBoxes), and for bounded treewidth queries and bounded depth TBoxes. Finally, we show that the suc- cinctness problems concerning FO-rewritings are equivalent to well-known problems in Boolean circuit complexity: NL/poly ⊆ NC1 in the case of linear and bounded branching queries, and SAC1 ⊆ NC1 in the case of tree-shaped and bounded treewidth queries and bounded depth TBoxes. Along with known results, this yields a complete picture of the succinctness landscape for the considered classes of queries and TBoxes. To prove our results, we establish tight connections between Boolean functions induced by queries and TBoxes and the non-uniform complexity classes NL/poly and SAC1 , reusing and further extending the machinery developed in [14, 15]. Many proofs have been omitted for lack of space. We invite the interested reader to consult the long version [3] for full proofs and additional material. 4 We focus on so-called pure FO-rewritings, cf. for [8, 11] discussion and related results. 2 Preliminaries OWL 2 QL In this paper, we use the simplified DL syntax of the OWL 2 QL profile [18]. As usual, we assume countably infinite, mutually disjoint sets NC , NR , and NI of concept names, role names, and individual names. Roles R and basic concepts B are defined by the grammar: R ::= r | r− B ::= A | ∃R where A ∈ NC and r ∈ NR . We use N± R to refer to the set of all roles. A TBox (typically denoted T ) is a finite set of inclusions of the forms B1 v B2 B1 v ¬B2 R1 v R2 R1 v ¬R2 The signature of a TBox T , written sig(T ), is the set of concept and role names that appear in T . An ABox (typically denoted A) is a finite set of assertions the form A(a) or r(a, b), where A ∈ NC , r ∈ NR , and a, b ∈ NI . The set of individual names in A is denoted Inds(A). A TBox T and ABox A together form a knowledge base (KB) K = (T , A). The semantics of KBs is defined in the usual way based on interpretations I = (∆I , ·I ) [2]. We use vT to denote the subsumption relation induced by T and write P1 vT P2 if T |= P1 v P2 , where P1 , P2 are both concepts or roles. Query answering and rewriting A conjunctive query (CQ) q(x) is an FO-formula ∃y ϕ(x, y), where ϕ is a conjunction of atoms of the form A(z1 ) or r(z1 , z2 ) with zi ∈ x ∪ y. The free variables x are called answer variables. Note that we assume w.l.o.g. that CQs do not contain individual names, and where convenient, we regard a CQ as the set of its atoms. We use vars(q) (resp. avars(q)) to denote the set of variables (resp. answer variables) of q. The signature of q, denoted sig(q), is the set of concept and role names in q. To every CQ q, we associate the undirected graph Gq whose vertices are the variables of q, and which contains an edge {u, v} whenever q contains an atom r(u, v) or r(v, u). We call a CQ q tree-shaped if the graph Gq is a tree5 . A tuple a ⊆ Inds(A) is a certain answer to q(x) over K = (T , A) if I |= q(a) for all I |= K; in this case we write K |= q(a). By first-order semantics, I |= q(a) iff there is a mapping h : vars(q) → ∆I such that (i) h(z) ∈ AI whenever A(z) ∈ q, (ii) (h(z), h(z 0 )) ∈ rI whenever r(z, z 0 ) ∈ q, and (iii) h maps avars(q) to aI . If the first two conditions are satisified, then h is a homomorphism from q to I, and we write h : q → I. If (iii) also holds, then we write h : q(a) → I. To every ABox A, we associate the interpretation IA whose domain is Inds(A) and whose interpretation function makes true precisely the assertions from A. We say that an FO-formula q0 (x) with free variables x and without constants is an FO-rewriting of CQ q(x) and TBox T if, for any ABox A and any a ⊆ Inds(A), we have T , A |= q(a) iff IA |= q0 (a). If q0 is a positive existential formula (i.e. it only uses ∃, ∧, ∨), then it is called a PE-rewriting of q and T . We also consider rewritings in the form of nonrecursive Datalog queries. We remind the reader that a Datalog program (typically denoted Π) is a finite set of rules ∀x (γ1 ∧ · · · ∧ γm → γ0 ), where each γi is an atom 5 Tree-shaped conjunctive queries also go by the name of acyclic queries, cf. [28, 4] of the form P (x1 , . . . , xl ) with xi ∈ x. The atom γ0 is called the head of the rule, and γ1 , . . . , γm its body. All variables in the head must also occur in the body. A predicate P depends on a predicate Q in program Π if Π contains a rule whose head predicate is P and whose body contains Q. The program Π is called nonrecursive if there are no cycles in the dependence relation for Π. For a nonrecursive Datalog program Π and an atom goal(x), we say that (Π, goal) is an NDL-rewriting of q(x) and T in case T , A |= q(a) iff Π, A |= goal(a), for any ABox A and any a ⊆ Inds(A). For R ∈ {PE, NDL, FO}, we say that queries from Q and TBoxes from T have polysize R-rewritings if there exists a polynomial p such that every q ∈ Q and T ∈ T has a R-rewriting q0 with |q0 | ≤ p(|q| + |T |). Canonical model We recall that every consistent OWL 2 QL KB (T , A) possesses a canonical model CT ,A with the property that T , A |= q(a) iff CT ,A |= q(a), for every CQ q and tuple a ⊆ Inds(A). The domain of CT ,A consists of all individual names from A and all sequences aR1 R2 . . . Rn (n ≥ 1) such that – T , A |= ∃R1 (a); – for every 1 ≤ i < n: T |= ∃Ri− v ∃Ri+1 and T 6|= Ri− v Ri+1 . Concept and role names are interpreted as follows: ACT ,A = {a ∈ Inds(A) | T , A |= A(a)} ∪ {wR ∈ ∆CT ,A | T |= ∃R− v A} rCT ,A = {(a, b) | r(a, b) ∈ A} ∪ {(w, wS) ∈ ∆CT ,A × ∆CT ,A | T |= S v r} ∪ {(wS, w) ∈ ∆CT ,A × ∆CT ,A | T |= S v r− } Every individual name a ∈ Inds(A) is interpreted as itself: aCT ,A = a. We say that a TBox T is of depth ω if there is an ABox A such that CT ,A has an infinite domain; T is of depth d, 0 ≤ d < ω, if d is the greatest number such that some CT ,A contains an element of the form aR1 . . . Rd . The depth of T can be computed in polynomial time, and if T is of finite depth, then its depth cannot exceed 2|T |. 3 Boolean Functions as a Tool for Studying Rewritings In this section, we introduce different representations of Boolean functions that will play an important role in our results. We assume that the reader is familiar with Boolean circuits [1, 12], built using AND, OR, and NOT gates. The size of a circuit C, denoted |C|, is defined as its number of gates. We will be particularly interested in monotone circuits (that is, circuits with no NOT gates). (Monotone) formulas are (monotone) circuits whose underlying graph is a tree. Non-deterministic branching programs (NBP) are another well-known model for the representation of Boolean functions [22, 12]. An NBP is defined as a tuple P = (V, E, s, t, l), where (V, E) is an directed graph, s, t ∈ V , and l is a function that labels every edge e ∈ E with a conjunction of propositional literals. The NBP P induces the function fP defined as follows: for every valuation α of the propositional variables in P , fP (α) = 1 if and only if there is a path from s to t in the graph (V, E) such that all labels along the path evaluate to 1 under α. 3.1 Hypergraph functions and hypergraph programs We recall hypergraph functions and programs from [15]. Let H = (V, E) be a hyper- graph with vertices v ∈ V and hyperedges e ∈ E, E ⊆ 2V . A subset E 0 ⊆ E is independent if e ∩ e0 = ∅, for any distinct e, e0 ∈ E 0 . With each vertex v ∈ V and each hyperedge e ∈ E, we associate propositional variables pv and pe , respectively. The hypergraph function fH for H is given by the Boolean formula _ ^ ^ fH = pv ∧ pe . (1) ind. E 0 ⊆E v∈V \∪E 0 e∈E 0 A hypergraph program HGP P consists of a hypergraph HP = (V, E) and a function lP that labels every vertex with 0, 1, pi or ¬pi (here the pi are propositional variables, distinct from the pv , pe above). An input for P is a valuation of the propositional vari- ables in P ’s labels. We say that the hypergraph program P computes a Boolean function f in case, for any input α, we have f (α) = 1 if and only if there is an independent subset of E that covers all zeros—that is, contains all the vertices in V labelled with 0 under α. A hypergraph program is monotone if there are no negated variables among its vertex labels. The size, |H|, of a hypergraph program H is the number of its vertices and hyperedges. Observe that each hypergraph program that is based upon the hypergraph H computes the Boolean function that is obtained from the hypergraph function fH by substituting vertex labels for vertex variables and 1 for edge variables. Conversely, it is not hard to construct for a given hypergraph H, a hypergraph program that computes fH . Sometimes it is convenient to consider hypergraph programs whose labels are con- junctions of variables and their negations, rather than single literals. It is not hard to see that this does not change the power of such programs. 3.2 Upper bounds via tree witness hypergraph functions The upper bounds in [15] rely on associating a hypergraph function with every query and TBox. As the hypergraph is defined in terms of tree witnesses, we first recall the definition of tree witnesses. Consider a CQ q and a TBox T . For every role R, we let TR = T ∪ {AR v ∃R} and AR = {AR (a)} (for some fresh concept name AR ). Suppose that q0 ⊆ q (recall that we view queries as sets of atoms) and there is a homomorphism h : q0 → CTR ,AR such that h(x) = a for every x ∈ avars(q). Let tr = {x ∈ vars(q0 ) | h(x) = a}, and let ti be the remaining set of (quantified) variables in q0 . We call the pair t = (tr , ti ) a tree witness for q and T generated by R if ti 6= ∅ and q0 is a minimal subset of q such that, for any y ∈ ti , every atom in q containing y belongs to q0 . In this case, we denote q0 by qt . Note that the same tree witness can be generated by different roles R. We let ΘTq be the set of all tree witnesses of q and T and use ΘTq [R] to denote those generated by R. We use x ∈ t as a shorthand for x ∈ tr ∪ ti . To every CQ q and TBox T , we can naturally associate the hypergraph whose ver- tices are the atoms of q and whose hyperedges are the sets qt , for tree witnesses t for q and T . We denote this hypergraph by HTq and call the corresponding function fHTq the tree witness hypergraph function of q and T . It is known that the circuit complexity of fHTq provides an upper bound on the size of rewritings of q and T . Theorem 1 (from [15]). If fHTq is computed by a (monotone) Boolean formula χ then there is a (PE-) FO-rewriting of q and T of size O(|χ| · |q| · |T |). If fHTq is computed by a monotone Boolean circuit C then there is an NDL-rewriting of q and T of size O(|C| · |q| · |T |). Observe that fHTq contains a variable pt for every tree witness t. For this reason, it can only be used to show polynomial upper bounds in cases where |ΘTq | is bounded polynomially in |q| and |T |. This motivates us to consider a variant of fHTq : _ ^ ^ ^ _ ^ 0 pR fH q = p% ∧ pz=z0 ∧ z (2) T q Θ⊆ΘT %∈q\qΘ t∈Θ z,z 0 ∈t R∈N± z∈t R , independent q t∈ΘT [R] where qΘ = t qt . Intuitively, pz=z0 enforces that variables z and z 0 are mapped to S elements of CT ,A that begin by the same ABox individual; the variable pR z states that z is mapped to an element that begins by an individual a satisfying T , A |= ∃R(a). 0 We observe that the number of variables in fH q is polynomially bounded in |q| and T 0 |T |, but fH q retains the same properties as fH q regarding upper bounds. T T 0 Theorem 2. Theorem 1 continues to hold if fHTq is replaced by fH q. T 3.3 Lower bounds via primitive evaluation functions In order to obtain lower bounds on the size of rewritings, it will prove convenient to P associate to each pair (q, T ) a third function fq,T that describes the result of evaluating q on single-individual ABoxes. Given Boolean vectors α : NC ∩ (sig(T ) ∪ sig(q)) → {0, 1} and β : NR ∩ (sig(T ) ∪ sig(q)) → {0, 1}, we let A(α, β) = {A(a) | α(A) = 1} ∪ {r(a, a) | β(r) = 1} P and set fq,T (α, β) = 1 iff T , A(α, β) |= q(a), where a is a tuple of a’s of the required P length. We call fq,T the primitive evaluation function for q and T . Theorem 3 (implicit in [15]). If q0 is a (PE-) FO-rewriting of q and T , then there is a (monotone) Boolean formula χ of size O(|q0 |) which computes fq,TP . P If (Π, G) is an NDL-rewriting of q and T , then fq,T is computed by a monotone Boolean circuit C of size O(|Π|). 4 Bounded Branching Queries It is known from [14] that tree-shaped CQs do not have polysize PE- or NDL-rewritings (nor polysize FO-rewritings, unless NP ⊆ P/poly). In this section, we investigate the robustness of these results by considering a restricted form of tree-shaped queries. We will say that a tree-shaped CQ q has k leaves if the associated graph Gq (defined in Section 2) contains exactly k vertices of degree 1. We will be interested in bounded branching queries (that is, tree-shaped CQs with a bounded number of leaves) and linear queries (having exactly 2 leaves). 4.1 Bounded branching queries, interval hypergraphs and NBPs It is not hard to see that every linear query induces a tree witness hypergraph that is isomorphic to an interval hypergraph, i.e. a hypergraph H = (V, E) where V = {[i, i+ 1] | 1 ≤ i < n} for some finite n and E is a set of intervals of the form [i, j] = {[k, k + 1] | 1 ≤ k < j} where 1 ≤ i < j ≤ n. Since the hypergraph functions of interval hypergraphs can be computed by polynomial-size NBPs, the same holds for the tree witness hypergraph functions of linear queries. In fact, the following result shows that we can construct polysize NBPs not only for linear queries, but also for bounded branching queries: Theorem 4. Fix a constant ` > 1. Then there exists a polynomial p such that for every tree-shaped CQ q with at most ` leaves and every OWL 2 QL TBox T , there is an NBP of size at most p(|q| + |T |) that computes fHTq . We next give a polynomial translation from NBPs into interval hypergraph programs, thereby establishing the polynomial equivalence of these formalisms: Theorem 5. Every function f that is computable by an NBP P is also computable by an interval hypergraph program of size polynomial in |P |. To complete the chain, we show how to compute hypergraph functions for interval hypergraphs using primitive evaluation functions of linear CQs and TBoxes of depth 2. To every interval hypergraph H, we associate the linear CQ qH pictured below: ^ qH = ∃y (ri (yi , yi0 ) ∧ ri0 (yi0 , yi+1 )). [i,i+1]∈V y1 r1 y10 r10 y2 r2 y20 r20 y3 r3 y30 r30 y4 r4 y40 r40 y5 The TBox TH contains the following axioms (on the left) for each edge [i, j] ∈ E: Bij v ∃sij , ∃s− 0 ij v ∃sij , s013 r10 r2 r20 r3 sij v ri , s− 0 ij v rj , s0ij v rk0 , for every i ≤ k < j s13 r1 r30 s0ij v rk− , for every i < k ≤ j B13 To the right, we illustrate the canonical model generated by B13 . Observe how the axioms ensure that the subquery of qH lying between y1 and y4 can be mapped onto it. Theorem 6. For every interval hypergraph H = (V, E) and for all α : V → {0, 1} and β : E → {0, 1} we have fH (α, β) = 1 iff fqPH ,TH (γ) with γ defined as follows: γ(Bij ) = β([i, j]), γ(ri ) = γ(ri0 ) = α([i, i + 1]), and γ(sij ) = γ(s0ij ) = 0. 4.2 Size of Rewritings of Bounded Branching Queries We now apply the results from Section 4.1 to derive bounds on rewriting size. It is known that there is a sequence fn of monotone Boolean functions that are computable by polynomial-size monotone NBPs, but all monotone Boolean formulas computing fn are of size nΩ(log n) [13]. Using this fact, together with Theorems 1,3, 5, and 6, we obtain a strong negative result for PE-rewritings. Theorem 7. There is a sequence of linear CQs qn and TBoxes Tn of depth 2, both of polysize in n, such that any PE-rewriting of qn and Tn is of size nΩ(log n) . We obtain a positive result for NDL-rewritings using Theorems 1 and 4 and the fact that NBPs are representable as polynomial-size monotone circuits [22]. Theorem 8. Fix a constant ` > 1. Then all tree-shaped CQs with at most ` leaves and arbitrary TBoxes have polynomial-size NDL-rewritings. Finally, we use Theorems 1,3, 5, and 6 to show that the existence of polysize FO- rewritings is equivalent to the open problem of whether NL/poly ⊆ NC1 . Theorem 9. The following are equivalent: 1. There exist polysize FO-rewritings for all linear CQs and depth 2 TBoxes; 2. There exist polysize FO-rewritings for all tree-shaped CQs with at most ` leaves and arbitrary TBoxes (for any fixed `); 3. There exists a polynomial function p such that every NBP of size at most s is com- putable by a formula of size p(s). Equivalently, NL/poly ⊆ NC1 . 5 Bounded Treewidth Queries In Section 4, we gave bounds on the size of rewritings for restricted classes of tree- shaped CQs. In the present section, we consider arbitrary tree-shaped queries and their natural generalization to bounded treewidth queries [9]. As the rewriting size of tree- shaped queries and arbitrary TBoxes has already been studied [14], we will focus on a class of “well-behaved” TBoxes, that includes TBoxes of bounded depth as a special case. We begin by formally introducing the classes of queries and TBoxes we consider. Bounded treewidth queries We recall that a tree decomposition of an undirected graph G = (V, E) is a pair (T, λ) such that T is an (undirected) tree and λ assigns a label λ(N ) ⊆ V to every node N of T such that the following conditions are satisfied: 1. For every v ∈ V , there exists a node N with v ∈ λ(N ). 2. For every edge e ∈ E, there exists a node N such that e ⊆ λ(N ). 3. For every v ∈ V , the nodes {N | v ∈ λ(N )} induce a connected subtree of T . The width of a tree decomposition (T, λ) is equal to maxN |λ(N )|−1, and the treewidth of a graph G is the minimum width over all tree decompositions of G. The treewidth of a CQ q is defined as the treewidth of the graph Gq . Polynomial image property Let T be an OWL 2 QL TBox, and let q be a CQ. Then the set Wq,T of relevant words for q and T consists of all words w of length at most |T |+|q| such that there exists an ABox A that is consistent with T and a homomorphism h : q → CT ,A whose image contains an element of the form aw. The length bound is motivated by the following well-known fact: Lemma 1. If A is consistent with T and T , A |= q(a), then there is some h : q(a) → CT ,A whose image is contained in {aw | a ∈ Inds(A), w ∈ Wq,T }. We say that a class T of TBoxes has the polynomial image property if there is a poly- nomial p such that for every TBox T ∈ T and every CQ q, |Wq,T | ≤ p(|T | + |q|). Observe that if d ≥ 0 is fixed, then the class of TBoxes of depth at most d has the polynomial image property. Another relevant class of TBoxes with this property is the class of TBoxes that do not contain role inclusions. 5.1 Bounded treewidth queries and tree hypergraph programs As in Section 4, our first step will be to relate Boolean functions induced by the query and TBox with hypergraph programs. The main difference is that in lieu of interval hypergraph programs, we will use tree hypergraph programs. To formally define tree hypergraph programs, we must first introduce some defini- tions related to trees. Given a tree T with vertices u and v, the interval hu, vi is the set of edges that appear on the simple path connecting u and v. If v1 , . . . , vk are vertices of T , then the generalized interval hv1 , . . . , vk i is defined as the union of intervals hvi , vj i over all pairs (i, j). A hypergraph H = (VH , EH ) is a tree hypergraph if there is a tree T = (VT , ET ) such that VH = ET and every hyperedge in EH is a generalized interval of T . A hypergraph program is a tree hypergraph program (TreeHGP) if it is based on a tree hypergraph. 0 0 From fH q to TreeHGP. We show how to construct a TreeHGP that computes f q , HT T given a TBox T , a CQ q, and a tree decomposition (T, λ) of Gq of width t. We may suppose w.l.o.g. that T contains at most (2|q| − 1)2 nodes, cf. [16]. In order to more easily refer to the variables in λ(N ), we construct functions λ1 , . . . , λt such that λi (N ) ∈ λ(N ) and λ(N ) = ∪i λi (N ). The basic idea underlying the construction is as follows: for each node N in the tree decomposition of q, we select an abstract description of the way the variables in λ(N ) are homomorphically mapped into the canonical model, and we check that the selected descriptions respect the subqueries of each node and are consistent with each other. Formally, these abstract descriptions are given by the set Γt (q, T ) consisting of all t-tuples w = (w1 , . . . , wt ) of words from Wq,T . Intuitively, the words in w specify, for each variable x in λ(N ), the path of roles that lead from the ABox to the image of x in the canonical model. We say that w ∈ Γt (q, T ) is consistent with a node N in T if: – if A(λi (N )) ∈ q, then either w[i] = ε or w[i] = w0 R and ∃R− vT A – if r(λi (N ), λj (N )) ∈ q, then one of the following holds: • w[i] = w[j] = ε • w[j] = w[i] · R with R vT r • w[i] = w[j] · R with R vT r− We call a pair of tuples (w1 , w2 ) compatible with the pair of nodes (N1 , N2 ) if: – λi (N1 ) = λj (N2 ) implies that w1 [i] = w2 [j] We assume that the elements of Γt (q, T ) are numbered from 1 to M , and use ξi to refer to the i-th element. We define a tree T 0 that replaces each edge {Ni , Nj } in T by the following sequence of edges: {Ni , u1ij }, {u1ij , vij 1 1 }, {vij , u2ij }, {u2ij , vij 2 }, . . . {uM M M M ij , vij }{vij , vji } 2 2 1 2 1 1 1 {uM M ji , vji } . . . {uji , vji }, {vji , uji }, {uji , vji }, {Nj , uji } The desired TreeHGP (Hq,T , lq,T ) is based upon T 0 and contains the hyperedges: – Eik = hukij1 , . . . , ukijn i, for every ξk ∈ Γt (q, T ) that is consistent with Ni , where Nj1 , . . . , Njn are the neighbours of Ni km k m – Eij = hvij , vji i, for every pair of tuples (ξk , ξm ) that is compatible with (Ni , Nj ) Vertices of the hypergraph (i.e. the edges in T 0 ) are labeled by lq,T as follows: – every edge of the form {Ni , u1ij }, {vij ` , u`+1 M M ij }, or {vij , vji } is labelled 0 ` ` – every edge {uij , vij } with ξ` = w is labelled by the conjunction of: • p% , if vars(%) ⊆ λ(Ni ) and λg (Ni ) ∈ vars(%) implies w[g] = ε 0 • pR z , if vars(%) = {z} ⊆ λ(Ni ), z = λg (Ni ), and w[g] = Rw • pz , pz0 , and pz=z , if vars(%) = {z, z } ⊆ λ(Ni ), z = λg (Ni ), z 0 = λg0 (Ni ), R R 0 0 and either w[g] = Rw0 or w[g 0 ] = Rw0 0 Theorem 10. For every TBox T and CQ q, the TreeHGP (Hq,T , lq,T ) computes fH q. T If q has treewidth t, then |Hq,T | ≤ 8 · |q|2 · |Wq,T |2t . P From TreeHGP to fq,T . Consider a tree hypergraph H = (V, E) that is based upon the tree T whose vertices are v1 , . . . , vn . Let T ↓ be the directed tree obtained from T by fixing v1 as the root and orienting edges away from v1 . We wish to construct a tree-shaped query qH and TBox TH of depth 2 whose prim- itive evaluation function fqPH ,TH can be used to compute fH . The construction general- izes the one from the preceding section for linear queries. The query qH is obtained by doubling the edges in T ↓ : ^ 0 qH = ∃y (rij (yi , yij ) ∧ rij (yij , yj )). (vi ,vj )∈T ↓ The TBox TH is defined as the union of Te over all hyperedges e ∈ E. Consider some hyperedge e = hvi1 , . . . , vim i ∈ E, and suppose w.l.o.g. that vi1 is the vertex in e that is highest in T ↓ . Then Te contains Be v ∃se , ∃s− 0 e v ∃se , and the axioms: se v ri1 ,k if {vi1 , vk } ∈ e s− 0 e v rj` ,i` if 1 < ` ≤ n and (vj` , vi` ) ∈ T ↓ (so, {vj` , vi` } ∈ e) − s0e v rj,k if {vj , vk } ∈ e, (vj , vk ) ∈ T ↓ , and vj 6= vi1 s0e v rj,k 0 if {vj , vk } ∈ e, (vj , vk ) ∈ T ↓ , and vk 6= vi` for any 1 < ` ≤ m Observe that both qH and TH are of polynomial size in |H|. Theorem 11. For every tree hypergraph H = (V, E) and for all α : V → {0, 1} and β : E → {0, 1}, fH (α, β) = 1 iff fqPH ,TH (γ) = 1 where γ is defined as follows: 0 γ(Be ) = β(e), γ(rij ) = γ(rij ) = α({vi , vj }), and γ(se ) = γ(s0e ) = 0. 5.2 Tree hypergraph programs and SAC1 To characterize the power of tree hypergraph programs, we consider semi-unbounded fan-in circuits in which NOT gates are applied only to the inputs, AND gates have fan-in 2, and OR gates have unbounded fan-in. The complexity class SAC1 [27] is defined by considering circuits of this type having polynomial size and logarithmic depth; SAC1 is the non-uniform analog of the class LOGCFL of all languages logspace- reducible to context-free languages [26]. We consider semi-unbounded fan-in circuits of size σ and depth log σ, where σ is a parameter, and show that they are polynomially equivalent to TreeHGP by providing reductions in both directions (details can be found in [3]). Theorem 12. There exist polynomial functions p and p0 such that: – Every semi-unbounded fan-in circuit of size at most s and depth at most log σ is computable by a TreeHGP of size p(σ). – Every TreeHGP of size σ is computable by semi-unbounded fan-in circuit of size at most p0 (σ) and depth at most log p0 (σ). 5.3 Size of rewritings of bounded treewidth queries Theorems 10 and 12 together show us how to construct a polysize monotone SAC1 0 circuit that computes fH q . Thus, by applying Theorem 2, we obtain: T Theorem 13. Fix a constant t > 0, and let T be a class of OWL 2 QL TBoxes with the polynomial image property. Then all CQs of treewidth at most t and TBoxes in T have polynomial-size NDL-rewritings. In the case of FO-rewritings, we can show that the existence of polysize rewritings corresponds to the open question of whether SAC1 ⊆ NC1 . Theorem 14. The following are equivalent: 1. There exist polysize FO-rewritings for all tree-shaped CQs and depth 2 TBoxes; 2. There exist polysize FO-rewritings for all CQs of treewidth at most t and TBoxes from a class T with the polynomial image property (for any fixed t); 3. There exists a polynomial function p such that every semi-unbounded fan-in circuit of size at most σ and depth at most log σ is computable by a formula of size p(σ). Equivalently, SAC1 ⊆ NC1 . 6 Conclusion and Future Work In this paper, we filled some gaps in the succinctness landscape for query rewriting in OWL 2 QL by providing new bounds on the worst-case rewriting size for various forms of tree-like queries. In doing so, we closed one of the open questions from [15]. In future work, we plan to consider additional dimensions of the succinctness land- scape. For example, all existing lower bounds rely on sequences (qn , Tn ) in which the number of roles in qn and Tn grows with n. Moreover, Tn often contains a large number of role inclusions. Thus, an interesting and practically relevant problem is to explore the impact of restricting the number of roles and/or the use of role inclusions. Acknowledgments. This work was partially supported by ANR grant 12-JS02-007-01, Russian Foundation for Basic Research and the programme “Leading Scientific Schools”. References 1. S. Arora and B. Barak. Computational Complexity: A Modern Approach. Cambridge Uni- versity Press, New York, NY, USA, 1st edition, 2009. 2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, editors. The De- scription Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2003. 3. M. Bienvenu, S. Kikot, and V. Podolskii. Succinctness of query rewriting in OWL 2 QL: The case of tree-like queries. CoRR, abs/1406.3047, 2014. 4. M. Bienvenu, M. Ortiz, M. Simkus, and G. Xiao. Tractable queries for lightweight descrip- tion logics. In Proc. of the 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013). AAAI Press, 2013. 5. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Auto- mated Reasoning, 39(3):385–429, 2007. 6. A. Chortaras, D. Trivela, and G. Stamou. Optimized query rewriting for OWL 2 QL. In Proc. of the 23rd Int. Conf. on Automated Deduction (CADE-23), volume 6803 of Lecture Notes in Computer Science, pages 192–206. Springer, 2011. 7. T. Eiter, M. Ortiz, M. Šimkus, T.-K. Tran, and G. Xiao. Query rewriting for Horn-SHIQ plus rules. In Proc. of the 26th AAAI Conf. on Artificial Intelligence (AAAI 2012). AAAI Press, 2012. 8. G. Gottlob, S. Kikot, R. Kontchakov, V. Podolskii, T. Schwentick, and M. Zakharyaschev. The price of query rewriting in ontology-based data access. Artificial Intelligence, to appear, 2014. 9. G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions and tractable queries. In V. Vianu and C. H. Papadimitriou, editors, Proc. of the 18th ACM SIGACT-SIGMOD- SIGART Symposium on Principles of Database Systems (PODS 1999), pages 21–32. ACM Press, 1999. 10. G. Gottlob, G. Orsi, and A. Pieris. Ontological queries: Rewriting and optimization. In Proc. of the 27th Int. Conf. on Data Engineering (ICDE 2011), pages 2–13. IEEE Computer Society, 2011. 11. G. Gottlob and T. Schwentick. Rewriting ontological queries into small nonrecursive datalog programs. In Proc. of the 13th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR 2012), pages 254–263. AAAI Press, 2012. 12. S. Jukna. Boolean Function Complexity: Advances and Frontiers. Springer, 2012. 13. M. Karchmer and A. Wigderson. Monotone circuits for connectivity require super- logarithmic depth. In Proc./ of the 20th Annual ACM Symposium on Theory of Computing (STOC 1988), pages 539–550. ACM Press, 1988. 14. S. Kikot, R. Kontchakov, V. V. Podolskii, and M. Zakharyaschev. Exponential lower bounds and separation for query rewriting. In Proc. of the 39th Int. Colloquium on Automata, Lan- guages, and Programming (ICALP 2012), Part II, volume 7392 of Lecture Notes in Com- puter Science, pages 263–274. Springer, 2012. 15. S. Kikot, R. Kontchakov, V. V. Podolskii, and M. Zakharyaschev. On the succinctness of query rewriting over OWL 2 QL ontologies with shallow chases. In Proc. of the 29th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2014). ACM Press, 2014. 16. T. Kloks. Treewidth: Computations and Approximations, volume 842 of Lecture Notes in Computer Science. Springer, 1994. 17. M. König, M. Leclère, M.-L. Mugnier, and M. Thomazo. A sound and complete backward chaining algorithm for existential rules. In Proc. of the 6th Int. Conf. on Web Reasoning and Rule Systems (RR 2012), volume 7497 of Lecture Notes in Computer Science, pages 122–138. Springer, 2012. 18. B. Motik, B. Cuenca Grau, I. Horrocks, Z. Wu, A. Fokoue, and C. Lutz. OWL 2 Web Ontology Language profiles. W3C Recommendation, 11 December 2012. Available at http://www.w3.org/TR/owl2-profiles/. 19. M. Ortiz and M. Simkus. Reasoning and query answering in description logics. In Reasoning Web, volume 7487 of Lecture Notes in Computer Science, pages 1–53. Springer, 2012. 20. H. Pérez-Urbina, B. Motik, and I. Horrocks. A comparison of query rewriting techniques for DL-Lite. In Proc. of the 22nd Int. Workshop on Description Logics (DL 2009), volume 477. CEUR-WS, 2009. 21. H. Pérez-Urbina, E. Rodrı́guez-Dı́az, M. Grove, G. Konstantinidis, and E. Sirin. Evaluation of query rewriting approaches for OWL 2. In Proc. of SSWS+HPCSW 2012, volume 943. CEUR-WS, 2012. 22. A. Razborov. Lower bounds for deterministic and nondeterministic branching programs. In Proc. of the 8th Int. Symposium on Fundamentals of Computation Theory (FCT’91), volume 529 of Lecture Notes in Computer Science, pages 47–60. Springer, 1991. 23. M. Rodrı́guez-Muro, R. Kontchakov, and M. Zakharyaschev. Ontology-based data access: Ontop of databases. In Proc. of the 12th Int. Semantic Web Conf. (ISWC 2013), volume 8218 of Lecture Notes in Computer Science, pages 558–573. Springer, 2013. 24. R. Rosati. Prexto: Query rewriting under extensional constraints in DL-Lite. In Proc. of the 9th Extended Semantic Web Conf. (EWSC 2012), volume 7295 of Lecture Notes in Computer Science, pages 360–374. Springer, 2012. 25. R. Rosati and A. Almatelli. Improving query answering over DL-Lite ontologies. In Proc. of the 10th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR 2010), pages 290–300. AAAI Press, 2010. 26. H. Venkateswaran. Properties that characterize LOGCFL. J. Computer and System Sciences, 43(2):380–404, 1991. 27. H. Vollmer. Introduction to circuit complexity - a uniform approach. Texts in theoretical computer science. Springer, 1999. 28. M. Yannakakis. Algorithms for acyclic database schemes. In Proc. of the 7th Int. Conf. on Very Large Data Bases (VLDB’81), pages 82–94. IEEE Computer Society, 1981.