=Paper= {{Paper |id=Vol-1193/paper_43 |storemode=property |title=Succinctness of Query Rewriting in OWL 2 QL: The Case of Tree-like Queries |pdfUrl=https://ceur-ws.org/Vol-1193/paper_43.pdf |volume=Vol-1193 |dblpUrl=https://dblp.org/rec/conf/dlog/BienvenuKP14 }} ==Succinctness of Query Rewriting in OWL 2 QL: The Case of Tree-like Queries== https://ceur-ws.org/Vol-1193/paper_43.pdf
      Succinctness of Query Rewriting in OWL 2 QL:
               The Case of Tree-like Queries

              Meghyn Bienvenu1 , Stanislav Kikot2 , and Vladimir Podolskii3
                     1
                          LRI – CNRS & Université Paris Sud, Orsay, France
         2
             Institute for Information Transmission Problems & MIPT, Moscow, Russia
                         3
                            Steklov Mathematical Institute, Moscow, Russia


       Abstract. This paper further investigates the succinctness landscape of query
       rewriting in OWL 2 QL. We clarify the worst-case size of positive existential
       (PE), non-recursive Datalog (NDL), and first-order (FO) rewritings for various
       classes of tree-like conjunctive queries, ranging from linear queries up to bounded
       treewidth queries. More specifically, we establish a superpolynomial lower bound
       on the size of PE-rewritings that holds already for linear queries and TBoxes of
       depth 2. For NDL-rewritings, we show that polynomial-size rewritings always
       exist for tree-shaped queries with a bounded number of leaves (and arbitrary
       TBoxes), and for bounded treewidth queries and bounded depth TBoxes. Finally,
       we show that the succinctness problems concerning FO-rewritings are equiva-
       lent to well-known problems in Boolean circuit complexity. Along with known
       results, this yields a complete picture of the succinctness landscape for the con-
       sidered classes of queries and TBoxes.


1   Introduction
For several years now, conjunctive query (CQ) answering has been a major focus of
description logic (DL) research (cf. survey [19]), due to the growing interest in using
description logic ontologies to query data. Formally, the problem is to compute the
certain answers to a CQ q(x) over a knowledge base (T , A), that is, the tuples of in-
dividuals a that satisfy T , A |= q(a). Much of the work on CQ answering focuses on
lightweight DLs of the DL-Lite family [5], and the corresponding OWL 2 QL profile
[18]. The popularity of these languages is due to fact that they enjoy first-order (FO)
rewritability, which means that for every CQ q(x) and every TBox T , there exists a
computable FO-query q0 (x) (called a rewriting) such that the certain answers to q(x)
over (T , A) coincide with the answers of the FO-query q0 (x) over the ABox A (viewed
as a database). First-order rewritability provides a means of reducing CQ answering to
the evaluation of FO (∼ SQL) queries in relational databases. A great many differ-
ent query rewriting algorithms have been proposed for OWL 2 QL and its extensions,
cf. [5, 20, 25, 6, 10, 24, 21, 7, 17, 23]. Most of these algorithms produce rewritings ex-
pressed as unions of conjunctive queries (UCQs), and the size of such rewritings can be
huge, making it difficult, or even impossible, to evaluate them using standard relational
database management systems.
    It is not difficult to see that exponential-size rewritings are unavoidable if rewritings
are given as UCQs (consider for instance the CQ q(x) = B1 (x)∧. . .∧Bn (x) and TBox
{Ai v Bi | 1 ≤ i ≤ n}). A natural (and non-trivial) question is whether an exponential
blowup can be avoided by moving to other standard query languages, like positive exis-
tential (PE) queries, non-recursive datalog (NDL) queries, or first-order (FO-) queries4 .
More generally, under what conditions can we ensure polynomial-size rewritings? A
first (negative) answer was given in [14], which proved exponential lower bounds for the
worst-case size of PE- and NDL-rewritings, as well as a superpolynomial lower bound
for FO-rewritings (under the widely-held assumption that NP 6⊆ P/poly). Interestingly,
all three results hold already for tree-shaped CQs, which are a well-studied class of CQs
that often enjoy better computational properties, cf. [28, 4]. While the queries used in
the proofs had a simple structure, the TBoxes induced full binary trees of depth n. This
raised the question of whether better results could be obtained by considering restricted
classes of TBoxes. A recent study [15] explored this question for TBoxes of depth 1
and 2, that is, TBoxes that generate canonical models whose elements are at most 1
or 2 ‘steps away’ from the ABox (see Section 2 for a formal definition). It was shown
that for depth 1 TBoxes, polysize PE-rewritings do not exist, polysize NDL-rewritings
do exist, and polysize FO-rewritings exist iff NL/poly ⊆ NC1 . For depth 2 TBoxes,
neither polysize PE- nor NDL-rewritings exist, and polysize FO-rewritings do not exist
unless NP 6⊆ P/poly. These results used simpler TBoxes, but the considered CQs were
no longer tree-shaped. For depth 1 TBoxes, this distinction is crucial, as it was further
shown in [15] that polysize PE-rewritings do exist for tree-shaped CQs.
     While existing results go a fair ways towards understanding the succinctness land-
scape of query rewriting in OWL 2 QL, a number of questions remain open:

 – What happens if we consider tree-shaped queries and bounded depth TBoxes?
 – What happens if we consider generalizations or restrictions of tree-shaped CQs?

In this paper, we address these questions by providing a complete picture of the suc-
cinctness of rewritings for tree-shaped queries, their restriction to linear and bounded
branching queries (i.e. tree-shaped CQs with a bounded number of leaves), and their
generalization to bounded treewidth queries. More specifically, we establish a super-
polynomial lower bound on the size of PE-rewritings that holds already for linear
queries and TBoxes of depth 2. For NDL-rewritings, we show that polynomial-size
rewritings always exist for bounded branching queries (and arbitrary TBoxes), and for
bounded treewidth queries and bounded depth TBoxes. Finally, we show that the suc-
cinctness problems concerning FO-rewritings are equivalent to well-known problems
in Boolean circuit complexity: NL/poly ⊆ NC1 in the case of linear and bounded
branching queries, and SAC1 ⊆ NC1 in the case of tree-shaped and bounded treewidth
queries and bounded depth TBoxes. Along with known results, this yields a complete
picture of the succinctness landscape for the considered classes of queries and TBoxes.
To prove our results, we establish tight connections between Boolean functions induced
by queries and TBoxes and the non-uniform complexity classes NL/poly and SAC1 ,
reusing and further extending the machinery developed in [14, 15].
    Many proofs have been omitted for lack of space. We invite the interested reader to
consult the long version [3] for full proofs and additional material.

 4
     We focus on so-called pure FO-rewritings, cf. for [8, 11] discussion and related results.
2      Preliminaries
OWL 2 QL In this paper, we use the simplified DL syntax of the OWL 2 QL profile
[18]. As usual, we assume countably infinite, mutually disjoint sets NC , NR , and NI of
concept names, role names, and individual names. Roles R and basic concepts B are
defined by the grammar:

                   R    ::=     r   |    r−            B     ::=    A     |    ∃R

where A ∈ NC and r ∈ NR . We use N±  R to refer to the set of all roles.
   A TBox (typically denoted T ) is a finite set of inclusions of the forms

                   B1 v B2          B1 v ¬B2         R1 v R2         R1 v ¬R2

The signature of a TBox T , written sig(T ), is the set of concept and role names that
appear in T . An ABox (typically denoted A) is a finite set of assertions the form A(a)
or r(a, b), where A ∈ NC , r ∈ NR , and a, b ∈ NI . The set of individual names in A is
denoted Inds(A).
    A TBox T and ABox A together form a knowledge base (KB) K = (T , A). The
semantics of KBs is defined in the usual way based on interpretations I = (∆I , ·I ) [2].
We use vT to denote the subsumption relation induced by T and write P1 vT P2 if
T |= P1 v P2 , where P1 , P2 are both concepts or roles.
Query answering and rewriting A conjunctive query (CQ) q(x) is an FO-formula
∃y ϕ(x, y), where ϕ is a conjunction of atoms of the form A(z1 ) or r(z1 , z2 ) with
zi ∈ x ∪ y. The free variables x are called answer variables. Note that we assume
w.l.o.g. that CQs do not contain individual names, and where convenient, we regard a
CQ as the set of its atoms. We use vars(q) (resp. avars(q)) to denote the set of variables
(resp. answer variables) of q. The signature of q, denoted sig(q), is the set of concept
and role names in q. To every CQ q, we associate the undirected graph Gq whose
vertices are the variables of q, and which contains an edge {u, v} whenever q contains
an atom r(u, v) or r(v, u). We call a CQ q tree-shaped if the graph Gq is a tree5 .
     A tuple a ⊆ Inds(A) is a certain answer to q(x) over K = (T , A) if I |= q(a)
for all I |= K; in this case we write K |= q(a). By first-order semantics, I |= q(a)
iff there is a mapping h : vars(q) → ∆I such that (i) h(z) ∈ AI whenever A(z) ∈ q,
(ii) (h(z), h(z 0 )) ∈ rI whenever r(z, z 0 ) ∈ q, and (iii) h maps avars(q) to aI . If the
first two conditions are satisified, then h is a homomorphism from q to I, and we write
h : q → I. If (iii) also holds, then we write h : q(a) → I.
     To every ABox A, we associate the interpretation IA whose domain is Inds(A) and
whose interpretation function makes true precisely the assertions from A. We say that
an FO-formula q0 (x) with free variables x and without constants is an FO-rewriting of
CQ q(x) and TBox T if, for any ABox A and any a ⊆ Inds(A), we have T , A |= q(a)
iff IA |= q0 (a). If q0 is a positive existential formula (i.e. it only uses ∃, ∧, ∨), then
it is called a PE-rewriting of q and T . We also consider rewritings in the form of
nonrecursive Datalog queries. We remind the reader that a Datalog program (typically
denoted Π) is a finite set of rules ∀x (γ1 ∧ · · · ∧ γm → γ0 ), where each γi is an atom
 5
     Tree-shaped conjunctive queries also go by the name of acyclic queries, cf. [28, 4]
of the form P (x1 , . . . , xl ) with xi ∈ x. The atom γ0 is called the head of the rule, and
γ1 , . . . , γm its body. All variables in the head must also occur in the body. A predicate
P depends on a predicate Q in program Π if Π contains a rule whose head predicate
is P and whose body contains Q. The program Π is called nonrecursive if there are no
cycles in the dependence relation for Π. For a nonrecursive Datalog program Π and
an atom goal(x), we say that (Π, goal) is an NDL-rewriting of q(x) and T in case
T , A |= q(a) iff Π, A |= goal(a), for any ABox A and any a ⊆ Inds(A).
     For R ∈ {PE, NDL, FO}, we say that queries from Q and TBoxes from T have
polysize R-rewritings if there exists a polynomial p such that every q ∈ Q and T ∈ T
has a R-rewriting q0 with |q0 | ≤ p(|q| + |T |).
Canonical model We recall that every consistent OWL 2 QL KB (T , A) possesses a
canonical model CT ,A with the property that T , A |= q(a) iff CT ,A |= q(a), for every
CQ q and tuple a ⊆ Inds(A). The domain of CT ,A consists of all individual names
from A and all sequences aR1 R2 . . . Rn (n ≥ 1) such that

    – T , A |= ∃R1 (a);
    – for every 1 ≤ i < n: T |= ∃Ri− v ∃Ri+1 and T 6|= Ri− v Ri+1 .
Concept and role names are interpreted as follows:

    ACT ,A = {a ∈ Inds(A) | T , A |= A(a)} ∪ {wR ∈ ∆CT ,A | T |= ∃R− v A}
     rCT ,A = {(a, b) | r(a, b) ∈ A} ∪ {(w, wS) ∈ ∆CT ,A × ∆CT ,A | T |= S v r} ∪
             {(wS, w) ∈ ∆CT ,A × ∆CT ,A | T |= S v r− }

Every individual name a ∈ Inds(A) is interpreted as itself: aCT ,A = a.
    We say that a TBox T is of depth ω if there is an ABox A such that CT ,A has an
infinite domain; T is of depth d, 0 ≤ d < ω, if d is the greatest number such that some
CT ,A contains an element of the form aR1 . . . Rd . The depth of T can be computed in
polynomial time, and if T is of finite depth, then its depth cannot exceed 2|T |.

3     Boolean Functions as a Tool for Studying Rewritings
In this section, we introduce different representations of Boolean functions that will
play an important role in our results. We assume that the reader is familiar with Boolean
circuits [1, 12], built using AND, OR, and NOT gates. The size of a circuit C, denoted
|C|, is defined as its number of gates. We will be particularly interested in monotone
circuits (that is, circuits with no NOT gates). (Monotone) formulas are (monotone)
circuits whose underlying graph is a tree.
    Non-deterministic branching programs (NBP) are another well-known model for
the representation of Boolean functions [22, 12]. An NBP is defined as a tuple P =
(V, E, s, t, l), where (V, E) is an directed graph, s, t ∈ V , and l is a function that labels
every edge e ∈ E with a conjunction of propositional literals. The NBP P induces the
function fP defined as follows: for every valuation α of the propositional variables in
P , fP (α) = 1 if and only if there is a path from s to t in the graph (V, E) such that all
labels along the path evaluate to 1 under α.
3.1   Hypergraph functions and hypergraph programs
We recall hypergraph functions and programs from [15]. Let H = (V, E) be a hyper-
graph with vertices v ∈ V and hyperedges e ∈ E, E ⊆ 2V . A subset E 0 ⊆ E is
independent if e ∩ e0 = ∅, for any distinct e, e0 ∈ E 0 . With each vertex v ∈ V and
each hyperedge e ∈ E, we associate propositional variables pv and pe , respectively.
The hypergraph function fH for H is given by the Boolean formula
                                _  ^                    ^ 
                     fH =                        pv ∧        pe .                 (1)
                                ind. E 0 ⊆E   v∈V \∪E 0    e∈E 0

A hypergraph program HGP P consists of a hypergraph HP = (V, E) and a function
lP that labels every vertex with 0, 1, pi or ¬pi (here the pi are propositional variables,
distinct from the pv , pe above). An input for P is a valuation of the propositional vari-
ables in P ’s labels. We say that the hypergraph program P computes a Boolean function
f in case, for any input α, we have f (α) = 1 if and only if there is an independent
subset of E that covers all zeros—that is, contains all the vertices in V labelled with 0
under α. A hypergraph program is monotone if there are no negated variables among its
vertex labels. The size, |H|, of a hypergraph program H is the number of its vertices and
hyperedges. Observe that each hypergraph program that is based upon the hypergraph
H computes the Boolean function that is obtained from the hypergraph function fH by
substituting vertex labels for vertex variables and 1 for edge variables. Conversely, it is
not hard to construct for a given hypergraph H, a hypergraph program that computes
fH . Sometimes it is convenient to consider hypergraph programs whose labels are con-
junctions of variables and their negations, rather than single literals. It is not hard to see
that this does not change the power of such programs.
3.2   Upper bounds via tree witness hypergraph functions
The upper bounds in [15] rely on associating a hypergraph function with every query
and TBox. As the hypergraph is defined in terms of tree witnesses, we first recall the
definition of tree witnesses. Consider a CQ q and a TBox T . For every role R, we let
TR = T ∪ {AR v ∃R} and AR = {AR (a)} (for some fresh concept name AR ).
Suppose that q0 ⊆ q (recall that we view queries as sets of atoms) and there is a
homomorphism h : q0 → CTR ,AR such that h(x) = a for every x ∈ avars(q). Let
tr = {x ∈ vars(q0 ) | h(x) = a}, and let ti be the remaining set of (quantified) variables
in q0 . We call the pair t = (tr , ti ) a tree witness for q and T generated by R if ti 6= ∅
and q0 is a minimal subset of q such that, for any y ∈ ti , every atom in q containing y
belongs to q0 . In this case, we denote q0 by qt . Note that the same tree witness can be
generated by different roles R. We let ΘTq be the set of all tree witnesses of q and T and
use ΘTq [R] to denote those generated by R. We use x ∈ t as a shorthand for x ∈ tr ∪ ti .
    To every CQ q and TBox T , we can naturally associate the hypergraph whose ver-
tices are the atoms of q and whose hyperedges are the sets qt , for tree witnesses t for q
and T . We denote this hypergraph by HTq and call the corresponding function fHTq the
tree witness hypergraph function of q and T . It is known that the circuit complexity of
fHTq provides an upper bound on the size of rewritings of q and T .
Theorem 1 (from [15]). If fHTq is computed by a (monotone) Boolean formula χ then
there is a (PE-) FO-rewriting of q and T of size O(|χ| · |q| · |T |).
    If fHTq is computed by a monotone Boolean circuit C then there is an NDL-rewriting
of q and T of size O(|C| · |q| · |T |).

    Observe that fHTq contains a variable pt for every tree witness t. For this reason,
it can only be used to show polynomial upper bounds in cases where |ΘTq | is bounded
polynomially in |q| and |T |. This motivates us to consider a variant of fHTq :

                     _              ^              ^      ^                     _        ^              
        0
                                                                                                pR
                                                                                                     
       fH q =                                p% ∧                    pz=z0 ∧                     z           (2)
          T
                         q
                    Θ⊆ΘT            %∈q\qΘ          t∈Θ   z,z 0 ∈t              R∈N±      z∈t
                                                                                    R ,
                  independent                                                     q
                                                                               t∈ΘT [R]


where qΘ = t qt . Intuitively, pz=z0 enforces that variables z and z 0 are mapped to
                S
elements of CT ,A that begin by the same ABox individual; the variable pR  z states that z
is mapped to an element that begins by an individual a satisfying T , A |= ∃R(a).
                                                   0
     We observe that the number of variables in fH   q is polynomially bounded in |q| and
                                                     T
           0
|T |, but fH q retains the same properties as fH q regarding upper bounds.
                                                T
              T


                                                               0
Theorem 2. Theorem 1 continues to hold if fHTq is replaced by fH q.
                                                                                     T


3.3   Lower bounds via primitive evaluation functions
In order to obtain lower bounds on the size of rewritings, it will prove convenient to
                                                 P
associate to each pair (q, T ) a third function fq,T that describes the result of evaluating
q on single-individual ABoxes. Given Boolean vectors α : NC ∩ (sig(T ) ∪ sig(q)) →
{0, 1} and β : NR ∩ (sig(T ) ∪ sig(q)) → {0, 1}, we let

                  A(α, β) = {A(a) | α(A) = 1} ∪ {r(a, a) | β(r) = 1}
         P
and set fq,T (α, β) = 1 iff T , A(α, β) |= q(a), where a is a tuple of a’s of the required
                  P
length. We call fq,T the primitive evaluation function for q and T .

Theorem 3 (implicit in [15]). If q0 is a (PE-) FO-rewriting of q and T , then there is a
(monotone) Boolean formula χ of size O(|q0 |) which computes fq,TP
                                                                    .
                                                      P
   If (Π, G) is an NDL-rewriting of q and T , then fq,T is computed by a monotone
Boolean circuit C of size O(|Π|).

4     Bounded Branching Queries
It is known from [14] that tree-shaped CQs do not have polysize PE- or NDL-rewritings
(nor polysize FO-rewritings, unless NP ⊆ P/poly). In this section, we investigate the
robustness of these results by considering a restricted form of tree-shaped queries. We
will say that a tree-shaped CQ q has k leaves if the associated graph Gq (defined in
Section 2) contains exactly k vertices of degree 1. We will be interested in bounded
branching queries (that is, tree-shaped CQs with a bounded number of leaves) and
linear queries (having exactly 2 leaves).
4.1        Bounded branching queries, interval hypergraphs and NBPs
It is not hard to see that every linear query induces a tree witness hypergraph that is
isomorphic to an interval hypergraph, i.e. a hypergraph H = (V, E) where V = {[i, i+
1] | 1 ≤ i < n} for some finite n and E is a set of intervals of the form [i, j] =
{[k, k + 1] | 1 ≤ k < j} where 1 ≤ i < j ≤ n. Since the hypergraph functions of
interval hypergraphs can be computed by polynomial-size NBPs, the same holds for the
tree witness hypergraph functions of linear queries. In fact, the following result shows
that we can construct polysize NBPs not only for linear queries, but also for bounded
branching queries:
Theorem 4. Fix a constant ` > 1. Then there exists a polynomial p such that for every
tree-shaped CQ q with at most ` leaves and every OWL 2 QL TBox T , there is an NBP
of size at most p(|q| + |T |) that computes fHTq .
We next give a polynomial translation from NBPs into interval hypergraph programs,
thereby establishing the polynomial equivalence of these formalisms:
Theorem 5. Every function f that is computable by an NBP P is also computable by
an interval hypergraph program of size polynomial in |P |.
To complete the chain, we show how to compute hypergraph functions for interval
hypergraphs using primitive evaluation functions of linear CQs and TBoxes of depth 2.
To every interval hypergraph H, we associate the linear CQ qH pictured below:
                                  ^
                    qH = ∃y             (ri (yi , yi0 ) ∧ ri0 (yi0 , yi+1 )).
                                           [i,i+1]∈V


      y1     r1      y10   r10   y2   r2    y20   r20   y3   r3   y30    r30    y4    r4    y40     r40   y5


The TBox TH contains the following axioms (on the left) for each edge [i, j] ∈ E:

                  Bij v ∃sij , ∃s−      0
                                 ij v ∃sij ,                            s013    r10   r2   r20    r3
                  sij v ri , s−      0
                               ij v rj ,
                  s0ij v rk0 , for every i ≤ k < j
                                                                        s13     r1                r30
                  s0ij v rk− ,    for every i < k ≤ j

                                                                          B13

To the right, we illustrate the canonical model generated by B13 . Observe how the
axioms ensure that the subquery of qH lying between y1 and y4 can be mapped onto it.

Theorem 6. For every interval hypergraph H = (V, E) and for all α : V → {0, 1}
and β : E → {0, 1} we have fH (α, β) = 1 iff fqPH ,TH (γ) with γ defined as follows:
γ(Bij ) = β([i, j]), γ(ri ) = γ(ri0 ) = α([i, i + 1]), and γ(sij ) = γ(s0ij ) = 0.

4.2        Size of Rewritings of Bounded Branching Queries
We now apply the results from Section 4.1 to derive bounds on rewriting size. It is
known that there is a sequence fn of monotone Boolean functions that are computable
by polynomial-size monotone NBPs, but all monotone Boolean formulas computing fn
are of size nΩ(log n) [13]. Using this fact, together with Theorems 1,3, 5, and 6, we
obtain a strong negative result for PE-rewritings.

Theorem 7. There is a sequence of linear CQs qn and TBoxes Tn of depth 2, both of
polysize in n, such that any PE-rewriting of qn and Tn is of size nΩ(log n) .

    We obtain a positive result for NDL-rewritings using Theorems 1 and 4 and the fact
that NBPs are representable as polynomial-size monotone circuits [22].

Theorem 8. Fix a constant ` > 1. Then all tree-shaped CQs with at most ` leaves and
arbitrary TBoxes have polynomial-size NDL-rewritings.

   Finally, we use Theorems 1,3, 5, and 6 to show that the existence of polysize FO-
rewritings is equivalent to the open problem of whether NL/poly ⊆ NC1 .

Theorem 9. The following are equivalent:
 1. There exist polysize FO-rewritings for all linear CQs and depth 2 TBoxes;
 2. There exist polysize FO-rewritings for all tree-shaped CQs with at most ` leaves
    and arbitrary TBoxes (for any fixed `);
 3. There exists a polynomial function p such that every NBP of size at most s is com-
    putable by a formula of size p(s). Equivalently, NL/poly ⊆ NC1 .

5   Bounded Treewidth Queries
In Section 4, we gave bounds on the size of rewritings for restricted classes of tree-
shaped CQs. In the present section, we consider arbitrary tree-shaped queries and their
natural generalization to bounded treewidth queries [9]. As the rewriting size of tree-
shaped queries and arbitrary TBoxes has already been studied [14], we will focus on a
class of “well-behaved” TBoxes, that includes TBoxes of bounded depth as a special
case. We begin by formally introducing the classes of queries and TBoxes we consider.

Bounded treewidth queries We recall that a tree decomposition of an undirected graph
G = (V, E) is a pair (T, λ) such that T is an (undirected) tree and λ assigns a label
λ(N ) ⊆ V to every node N of T such that the following conditions are satisfied:
 1. For every v ∈ V , there exists a node N with v ∈ λ(N ).
 2. For every edge e ∈ E, there exists a node N such that e ⊆ λ(N ).
 3. For every v ∈ V , the nodes {N | v ∈ λ(N )} induce a connected subtree of T .
The width of a tree decomposition (T, λ) is equal to maxN |λ(N )|−1, and the treewidth
of a graph G is the minimum width over all tree decompositions of G. The treewidth of
a CQ q is defined as the treewidth of the graph Gq .
Polynomial image property Let T be an OWL 2 QL TBox, and let q be a CQ. Then
the set Wq,T of relevant words for q and T consists of all words w of length at most
|T |+|q| such that there exists an ABox A that is consistent with T and a homomorphism
h : q → CT ,A whose image contains an element of the form aw. The length bound is
motivated by the following well-known fact:
Lemma 1. If A is consistent with T and T , A |= q(a), then there is some h : q(a) →
CT ,A whose image is contained in {aw | a ∈ Inds(A), w ∈ Wq,T }.

We say that a class T of TBoxes has the polynomial image property if there is a poly-
nomial p such that for every TBox T ∈ T and every CQ q, |Wq,T | ≤ p(|T | + |q|).
Observe that if d ≥ 0 is fixed, then the class of TBoxes of depth at most d has the
polynomial image property. Another relevant class of TBoxes with this property is the
class of TBoxes that do not contain role inclusions.
5.1   Bounded treewidth queries and tree hypergraph programs
As in Section 4, our first step will be to relate Boolean functions induced by the query
and TBox with hypergraph programs. The main difference is that in lieu of interval
hypergraph programs, we will use tree hypergraph programs.
     To formally define tree hypergraph programs, we must first introduce some defini-
tions related to trees. Given a tree T with vertices u and v, the interval hu, vi is the set
of edges that appear on the simple path connecting u and v. If v1 , . . . , vk are vertices of
T , then the generalized interval hv1 , . . . , vk i is defined as the union of intervals hvi , vj i
over all pairs (i, j). A hypergraph H = (VH , EH ) is a tree hypergraph if there is a tree
T = (VT , ET ) such that VH = ET and every hyperedge in EH is a generalized interval
of T . A hypergraph program is a tree hypergraph program (TreeHGP) if it is based on
a tree hypergraph.
         0                                                                               0
From fH    q to TreeHGP. We show how to construct a TreeHGP that computes f q ,
                                                                                         HT
           T
given a TBox T , a CQ q, and a tree decomposition (T, λ) of Gq of width t. We
may suppose w.l.o.g. that T contains at most (2|q| − 1)2 nodes, cf. [16]. In order to
more easily refer to the variables in λ(N ), we construct functions λ1 , . . . , λt such that
λi (N ) ∈ λ(N ) and λ(N ) = ∪i λi (N ).
     The basic idea underlying the construction is as follows: for each node N in the
tree decomposition of q, we select an abstract description of the way the variables in
λ(N ) are homomorphically mapped into the canonical model, and we check that the
selected descriptions respect the subqueries of each node and are consistent with each
other. Formally, these abstract descriptions are given by the set Γt (q, T ) consisting of
all t-tuples w = (w1 , . . . , wt ) of words from Wq,T . Intuitively, the words in w specify,
for each variable x in λ(N ), the path of roles that lead from the ABox to the image of x
in the canonical model. We say that w ∈ Γt (q, T ) is consistent with a node N in T if:
  – if A(λi (N )) ∈ q, then either w[i] = ε or w[i] = w0 R and ∃R− vT A
  – if r(λi (N ), λj (N )) ∈ q, then one of the following holds:
      • w[i] = w[j] = ε
      • w[j] = w[i] · R with R vT r
      • w[i] = w[j] · R with R vT r−
We call a pair of tuples (w1 , w2 ) compatible with the pair of nodes (N1 , N2 ) if:
  – λi (N1 ) = λj (N2 ) implies that w1 [i] = w2 [j]
We assume that the elements of Γt (q, T ) are numbered from 1 to M , and use ξi to refer
to the i-th element. We define a tree T 0 that replaces each edge {Ni , Nj } in T by the
following sequence of edges:

          {Ni , u1ij }, {u1ij , vij
                                 1       1
                                    }, {vij , u2ij }, {u2ij , vij
                                                               2
                                                                  }, . . . {uM     M     M     M
                                                                             ij , vij }{vij , vji }
                               2     2       1     2       1     1            1
          {uM     M
            ji , vji } . . . {uji , vji }, {vji , uji }, {uji , vji }, {Nj , uji }

The desired TreeHGP (Hq,T , lq,T ) is based upon T 0 and contains the hyperedges:
 – Eik = hukij1 , . . . , ukijn i, for every ξk ∈ Γt (q, T ) that is consistent with Ni , where
   Nj1 , . . . , Njn are the neighbours of Ni
     km           k    m
 – Eij    = hvij    , vji i, for every pair of tuples (ξk , ξm ) that is compatible with (Ni , Nj )
Vertices of the hypergraph (i.e. the edges in T 0 ) are labeled by lq,T as follows:
 – every edge of the form {Ni , u1ij }, {vij
                                          `
                                             , u`+1        M     M
                                                ij }, or {vij , vji } is labelled 0
                   `   `
 – every edge {uij , vij } with ξ` = w is labelled by the conjunction of:
     • p% , if vars(%) ⊆ λ(Ni ) and λg (Ni ) ∈ vars(%) implies w[g] = ε
                                                                           0
     • pR
        z , if vars(%) = {z} ⊆ λ(Ni ), z = λg (Ni ), and w[g] = Rw
     • pz , pz0 , and pz=z , if vars(%) = {z, z } ⊆ λ(Ni ), z = λg (Ni ), z 0 = λg0 (Ni ),
        R     R            0
                                                 0

       and either w[g] = Rw0 or w[g 0 ] = Rw0
                                                                            0
Theorem 10. For every TBox T and CQ q, the TreeHGP (Hq,T , lq,T ) computes fH q.
                                                                                                      T
If q has treewidth t, then |Hq,T | ≤ 8 · |q|2 · |Wq,T |2t .

                      P
From TreeHGP to fq,T      . Consider a tree hypergraph H = (V, E) that is based upon
the tree T whose vertices are v1 , . . . , vn . Let T ↓ be the directed tree obtained from T
by fixing v1 as the root and orienting edges away from v1 .
    We wish to construct a tree-shaped query qH and TBox TH of depth 2 whose prim-
itive evaluation function fqPH ,TH can be used to compute fH . The construction general-
izes the one from the preceding section for linear queries. The query qH is obtained by
doubling the edges in T ↓ :
                                    ^
                                                              0
                    qH = ∃y               (rij (yi , yij ) ∧ rij (yij , yj )).
                                     (vi ,vj )∈T ↓

The TBox TH is defined as the union of Te over all hyperedges e ∈ E. Consider some
hyperedge e = hvi1 , . . . , vim i ∈ E, and suppose w.l.o.g. that vi1 is the vertex in e that
is highest in T ↓ . Then Te contains Be v ∃se , ∃s−        0
                                                   e v ∃se , and the axioms:

     se v ri1 ,k      if {vi1 , vk } ∈ e
    s−      0
     e v rj` ,i`      if 1 < ` ≤ n and (vj` , vi` ) ∈ T ↓ (so, {vj` , vi` } ∈ e)
              −
      s0e v rj,k      if {vj , vk } ∈ e, (vj , vk ) ∈ T ↓ , and vj 6= vi1
      s0e v rj,k
              0
                      if {vj , vk } ∈ e, (vj , vk ) ∈ T ↓ , and vk 6= vi` for any 1 < ` ≤ m

Observe that both qH and TH are of polynomial size in |H|.
Theorem 11. For every tree hypergraph H = (V, E) and for all α : V → {0, 1} and
β : E → {0, 1}, fH (α, β) = 1 iff fqPH ,TH (γ) = 1 where γ is defined as follows:
                            0
γ(Be ) = β(e), γ(rij ) = γ(rij ) = α({vi , vj }), and γ(se ) = γ(s0e ) = 0.
5.2   Tree hypergraph programs and SAC1
To characterize the power of tree hypergraph programs, we consider semi-unbounded
fan-in circuits in which NOT gates are applied only to the inputs, AND gates have
fan-in 2, and OR gates have unbounded fan-in. The complexity class SAC1 [27] is
defined by considering circuits of this type having polynomial size and logarithmic
depth; SAC1 is the non-uniform analog of the class LOGCFL of all languages logspace-
reducible to context-free languages [26].
    We consider semi-unbounded fan-in circuits of size σ and depth log σ, where σ is
a parameter, and show that they are polynomially equivalent to TreeHGP by providing
reductions in both directions (details can be found in [3]).
Theorem 12. There exist polynomial functions p and p0 such that:
 – Every semi-unbounded fan-in circuit of size at most s and depth at most log σ is
   computable by a TreeHGP of size p(σ).
 – Every TreeHGP of size σ is computable by semi-unbounded fan-in circuit of size at
   most p0 (σ) and depth at most log p0 (σ).
5.3   Size of rewritings of bounded treewidth queries
Theorems 10 and 12 together show us how to construct a polysize monotone SAC1
                       0
circuit that computes fH q . Thus, by applying Theorem 2, we obtain:
                         T

Theorem 13. Fix a constant t > 0, and let T be a class of OWL 2 QL TBoxes with the
polynomial image property. Then all CQs of treewidth at most t and TBoxes in T have
polynomial-size NDL-rewritings.
    In the case of FO-rewritings, we can show that the existence of polysize rewritings
corresponds to the open question of whether SAC1 ⊆ NC1 .
Theorem 14. The following are equivalent:
 1. There exist polysize FO-rewritings for all tree-shaped CQs and depth 2 TBoxes;
 2. There exist polysize FO-rewritings for all CQs of treewidth at most t and TBoxes
    from a class T with the polynomial image property (for any fixed t);
 3. There exists a polynomial function p such that every semi-unbounded fan-in circuit
    of size at most σ and depth at most log σ is computable by a formula of size p(σ).
    Equivalently, SAC1 ⊆ NC1 .
6     Conclusion and Future Work
In this paper, we filled some gaps in the succinctness landscape for query rewriting in
OWL 2 QL by providing new bounds on the worst-case rewriting size for various forms
of tree-like queries. In doing so, we closed one of the open questions from [15].
    In future work, we plan to consider additional dimensions of the succinctness land-
scape. For example, all existing lower bounds rely on sequences (qn , Tn ) in which the
number of roles in qn and Tn grows with n. Moreover, Tn often contains a large number
of role inclusions. Thus, an interesting and practically relevant problem is to explore the
impact of restricting the number of roles and/or the use of role inclusions.
Acknowledgments. This work was partially supported by ANR grant 12-JS02-007-01,
Russian Foundation for Basic Research and the programme “Leading Scientific Schools”.
References
 1. S. Arora and B. Barak. Computational Complexity: A Modern Approach. Cambridge Uni-
    versity Press, New York, NY, USA, 1st edition, 2009.
 2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, editors. The De-
    scription Logic Handbook: Theory, Implementation and Applications. Cambridge University
    Press, 2003.
 3. M. Bienvenu, S. Kikot, and V. Podolskii. Succinctness of query rewriting in OWL 2 QL: The
    case of tree-like queries. CoRR, abs/1406.3047, 2014.
 4. M. Bienvenu, M. Ortiz, M. Simkus, and G. Xiao. Tractable queries for lightweight descrip-
    tion logics. In Proc. of the 23rd International Joint Conference on Artificial Intelligence
    (IJCAI 2013). AAAI Press, 2013.
 5. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning
    and efficient query answering in description logics: The DL-Lite family. Journal of Auto-
    mated Reasoning, 39(3):385–429, 2007.
 6. A. Chortaras, D. Trivela, and G. Stamou. Optimized query rewriting for OWL 2 QL. In
    Proc. of the 23rd Int. Conf. on Automated Deduction (CADE-23), volume 6803 of Lecture
    Notes in Computer Science, pages 192–206. Springer, 2011.
 7. T. Eiter, M. Ortiz, M. Šimkus, T.-K. Tran, and G. Xiao. Query rewriting for Horn-SHIQ plus
    rules. In Proc. of the 26th AAAI Conf. on Artificial Intelligence (AAAI 2012). AAAI Press,
    2012.
 8. G. Gottlob, S. Kikot, R. Kontchakov, V. Podolskii, T. Schwentick, and M. Zakharyaschev.
    The price of query rewriting in ontology-based data access. Artificial Intelligence, to appear,
    2014.
 9. G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions and tractable queries.
    In V. Vianu and C. H. Papadimitriou, editors, Proc. of the 18th ACM SIGACT-SIGMOD-
    SIGART Symposium on Principles of Database Systems (PODS 1999), pages 21–32. ACM
    Press, 1999.
10. G. Gottlob, G. Orsi, and A. Pieris. Ontological queries: Rewriting and optimization. In
    Proc. of the 27th Int. Conf. on Data Engineering (ICDE 2011), pages 2–13. IEEE Computer
    Society, 2011.
11. G. Gottlob and T. Schwentick. Rewriting ontological queries into small nonrecursive datalog
    programs. In Proc. of the 13th Int. Conf. on the Principles of Knowledge Representation and
    Reasoning (KR 2012), pages 254–263. AAAI Press, 2012.
12. S. Jukna. Boolean Function Complexity: Advances and Frontiers. Springer, 2012.
13. M. Karchmer and A. Wigderson. Monotone circuits for connectivity require super-
    logarithmic depth. In Proc./ of the 20th Annual ACM Symposium on Theory of Computing
    (STOC 1988), pages 539–550. ACM Press, 1988.
14. S. Kikot, R. Kontchakov, V. V. Podolskii, and M. Zakharyaschev. Exponential lower bounds
    and separation for query rewriting. In Proc. of the 39th Int. Colloquium on Automata, Lan-
    guages, and Programming (ICALP 2012), Part II, volume 7392 of Lecture Notes in Com-
    puter Science, pages 263–274. Springer, 2012.
15. S. Kikot, R. Kontchakov, V. V. Podolskii, and M. Zakharyaschev. On the succinctness of
    query rewriting over OWL 2 QL ontologies with shallow chases. In Proc. of the 29th Annual
    ACM/IEEE Symposium on Logic in Computer Science (LICS 2014). ACM Press, 2014.
16. T. Kloks. Treewidth: Computations and Approximations, volume 842 of Lecture Notes in
    Computer Science. Springer, 1994.
17. M. König, M. Leclère, M.-L. Mugnier, and M. Thomazo. A sound and complete backward
    chaining algorithm for existential rules. In Proc. of the 6th Int. Conf. on Web Reasoning
    and Rule Systems (RR 2012), volume 7497 of Lecture Notes in Computer Science, pages
    122–138. Springer, 2012.
18. B. Motik, B. Cuenca Grau, I. Horrocks, Z. Wu, A. Fokoue, and C. Lutz. OWL 2 Web
    Ontology Language profiles. W3C Recommendation, 11 December 2012. Available at
    http://www.w3.org/TR/owl2-profiles/.
19. M. Ortiz and M. Simkus. Reasoning and query answering in description logics. In Reasoning
    Web, volume 7487 of Lecture Notes in Computer Science, pages 1–53. Springer, 2012.
20. H. Pérez-Urbina, B. Motik, and I. Horrocks. A comparison of query rewriting techniques for
    DL-Lite. In Proc. of the 22nd Int. Workshop on Description Logics (DL 2009), volume 477.
    CEUR-WS, 2009.
21. H. Pérez-Urbina, E. Rodrı́guez-Dı́az, M. Grove, G. Konstantinidis, and E. Sirin. Evaluation
    of query rewriting approaches for OWL 2. In Proc. of SSWS+HPCSW 2012, volume 943.
    CEUR-WS, 2012.
22. A. Razborov. Lower bounds for deterministic and nondeterministic branching programs. In
    Proc. of the 8th Int. Symposium on Fundamentals of Computation Theory (FCT’91), volume
    529 of Lecture Notes in Computer Science, pages 47–60. Springer, 1991.
23. M. Rodrı́guez-Muro, R. Kontchakov, and M. Zakharyaschev. Ontology-based data access:
    Ontop of databases. In Proc. of the 12th Int. Semantic Web Conf. (ISWC 2013), volume 8218
    of Lecture Notes in Computer Science, pages 558–573. Springer, 2013.
24. R. Rosati. Prexto: Query rewriting under extensional constraints in DL-Lite. In Proc. of the
    9th Extended Semantic Web Conf. (EWSC 2012), volume 7295 of Lecture Notes in Computer
    Science, pages 360–374. Springer, 2012.
25. R. Rosati and A. Almatelli. Improving query answering over DL-Lite ontologies. In Proc. of
    the 10th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR 2010),
    pages 290–300. AAAI Press, 2010.
26. H. Venkateswaran. Properties that characterize LOGCFL. J. Computer and System Sciences,
    43(2):380–404, 1991.
27. H. Vollmer. Introduction to circuit complexity - a uniform approach. Texts in theoretical
    computer science. Springer, 1999.
28. M. Yannakakis. Algorithms for acyclic database schemes. In Proc. of the 7th Int. Conf. on
    Very Large Data Bases (VLDB’81), pages 82–94. IEEE Computer Society, 1981.