=Paper= {{Paper |id=Vol-1577/paper_28 |storemode=property |title=Theoretically Optimal Datalog Rewritings for OWL 2 QL Ontology-Mediated Queries |pdfUrl=https://ceur-ws.org/Vol-1577/paper_28.pdf |volume=Vol-1577 |authors=Meghyn Bienvenu,Stanislav Kikot,Roman Kontchakov,Vladimir Podolskii,Michael Zakharyaschev |dblpUrl=https://dblp.org/rec/conf/dlog/BienvenuKKPZ16 }} ==Theoretically Optimal Datalog Rewritings for OWL 2 QL Ontology-Mediated Queries== https://ceur-ws.org/Vol-1577/paper_28.pdf
         Theoretically Optimal Datalog Rewritings for
           OWL 2 QL Ontology-Mediated Queries

    M. Bienvenu1, S. Kikot2, R. Kontchakov2, V. Podolskii3, and M. Zakharyaschev2
             1
                CNRS & University of Montpellier, France (meghyn@lirmm.fr)
2
    Birkbeck, University of London, U.K. ({kikot,roman,michael}@dcs.bbk.ac.uk)
         3
           Steklov Mathematical Institute, Moscow, Russia (podolskii@mi.ras.ru)


        Abstract. We show that, for OWL 2 QL ontology-mediated queries with (i) on-
        tologies of bounded depth and conjunctive queries of bounded treewidth, (ii) on-
        tologies of bounded depth and bounded-leaf tree-shaped conjunctive queries, and
        (iii) arbitrary ontologies and bounded-leaf tree-shaped conjunctive queries, one can
        construct and evaluate nonrecursive datalog rewritings by, respectively, LOGCFL,
        NL and LOGCFL algorithms, which matches the optimal combined complexity.


1    Introduction
Ontology-based data access (OBDA) via query rewriting [18] reduces the problem of
finding answers to conjunctive queries (CQs) mediated by OWL 2 QL ontologies to
standard database query answering. The question we are concerned with here is whether
this reduction is optimal with respect to the combined complexity of query evaluation.
Figure 1 (a) summarises what is known about the size of positive existential (PE),
nonrecursive datalog (NDL) and first-order (FO) rewritings of OWL 2 QL ontology-
mediated queries (OMQs) depending on the existential depth of their ontologies and
the shape of their CQs [13, 9, 12, 3]. Figure 1 (b) shows the combined complexity of
OMQ evaluation for the corresponding classes of OMQs [5, 14, 12, 3]. Thus, we see, for
example, that PE-rewritings for OMQs with ontologies of bounded depth and CQs of
bounded treewidth can be of super-polynomial size, and so not evaluable in polynomial
time, while the evaluation problem for these OMQs is decidable in LOGCFL ⊆ P. On
the other hand, the OMQs in this class enjoy polynomial-size NDL-rewritings. However,
these rewritings were defined using an argument from circuit complexity [3], and it
has been unclear whether they can be constructed and evaluated in LOGCFL. The same
concerns the class of OMQs with ontologies of bounded depth and bounded-leaf tree-
shaped queries, which can be evaluated in NL, and the class of OMQs with arbitrary
ontologies and bounded-leaf tree-shaped queries, which can be evaluated in LOGCFL.
    In this paper, we consider OMQs in these three classes and construct NDL-rewritings
that are theoretically optimal in the sense that the rewriting and evaluation can be
carried out by algorithms of optimal combined complexity, that is, from the complexity
classes LOGCFL, NL and LOGCFL, respectively. Such algorithms are known to be space
efficient and highly parallelisable. We compared our optimal NDL rewritings with those
produced by query rewriting engines Clipper [8] and Rapid [6], using a sequence of
OMQs with linear CQs and a fixed ontology of depth 1.
    The full version is available at http://tinyurl.com/LogNDL-DL.
       (a)                          poly NDL, but no poly PE                                     (b)
                                    poly FO iff NL/poly ⊆ NC1
                 1        poly Π4 -PE           poly PE                                           1




                                                                             iff NP/poly⊆NC1
                 2     poly NDL                   poly NDL                                        2
ontology depth



                 3                                no poly PE                                      3            NL              LOGCFL
                       no poly PE
           ...                                     poly FO                                       ...
                        poly FO                   iff
                 d          iff             LOGCFL/poly⊆NC1                                       d
                      NL/poly ⊆ NC1




                                                                        poly
                                                                         FO
         arb.                               no poly NDL & PE                                     arb.     LOGCFL                               NP
                      2    ...      `     trees     2       . . . bound. arb.                             2  ...     ` trees    2   . . . bound. arb.
                      number of leaves                       treewidth                                  number of leaves            treewidth


                      Fig. 1. (a) Size of OMQ rewritings; (b) combined complexity of OMQ evaluation.


2                    Preliminaries
We give OWL 2 QL in the DL syntax with individual names ai , concept names Ai , and
role names Pi (i ≥ 1). Roles R and basic concepts B are defined by

                              R     ::=       Pi        |     Pi− ,                              B       ::=        Ai   |     ∃R.

A TBox, T , is a finite set of inclusions of the form

                           B1 v B2 ,              B1 u B2 v ⊥,                                 R1 v R2 ,            R1 u R2 v ⊥.

An ABox, A, is a finite set of atoms of the form Ak (ai ) or Pk (ai , aj ). We denote by
ind(A) the set of individual names in A, and by RT the set of role names occurring in T
and their inverses. We use A ≡ B for A v B and B v A. The semantics for OWL 2 QL
is defined in the usual way based on interpretations I = (∆I , ·I ) [2].
    For every role R ∈ RT , we take a fresh concept name AR and add AR ≡ ∃R to
T . The resulting TBox is said to be in normal form, and we assume, without loss of
generality, that all our TBoxes are in normal form. The subsumption relation induced by
T is denoted by vT : we write S1 vT S2 if T |= S1 v S2 , where S1 , S2 are both either
concepts or roles. We write R(a, b) ∈ A if P (a, b) ∈ A and R = P , or P (b, a) ∈ A
and R = P − ; we also write (∃R)(a) ∈ A if R(a, b) ∈ A for some b. An ABox A is
called H-complete with respect to T in case

P (a, b) ∈ A if R(a, b) ∈ A, for roles P and R with R vT P,
  A(a) ∈ A if B(a) ∈ A, for a concept name A and basic concept B with B vT A.

    A conjunctive query (CQ) q(x) is a formula ∃y ϕ(x, y), where ϕ is a conjunction
of atoms Ak (z1 ) or Pk (z1 , z2 ) with zi ∈ x ∪ y (without loss of generality, we assume
that CQs do not contain constants). We denote by var(q) the variables x ∪ y of q
and by avar(q) the answer variables x. An ontology-mediated query (OMQ) is a pair
Q(x) = (T , q(x)), where T is a TBox and q(x) a CQ. A tuple a in ind(A) is a certain
answer to Q(x) over an ABox A if I |= q(a) for all models I of T and A; in this
case we write T , A |= q(a). If x = ∅, then a certain answer to Q over A is ‘yes’ if
T , A |= q and ‘no’ otherwise. We often regard a CQ q as the set of its atoms.
     Every consistent OWL 2 QL knowledge base (KB) (T , A) has a canonical model
CT ,A with the property that T , A |= q(a) iff CT ,A |= q(a), for any CQ q and any a
in ind(A). Thus, CQ answering in OWL 2 QL amounts to finding a homomorphism
from the given CQ into the canonical model. Informally, CT ,A is obtained from A by
repeatedly applying the axioms in T , introducing fresh elements as needed to serve as
witnesses for the existential quantifiers. According to the standard construction (cf. [16]),
the domain ∆CT ,A of CT ,A consists of words of the form aR1 . . . Rn (n ≥ 0) with
a ∈ ind(A) and R1 . . . Rn ∈ R∗T such that (i) T , A |= ∃R1 (a) and (ii) ∃Ri− vT ∃Ri+1
and Ri− 6vT Ri+1 , for 1 ≤ i < n. We let WT consist of all words R1 . . . Rn ∈ R∗T
satisfying condition (ii). A TBox T is of depth ω if WT is infinite, and of depth d < ω,
if d is the maximum length of the words in WT .
     A datalog program, Π, is a finite set of Horn clauses ∀z (γ0 ← γ1 ∧ · · · ∧ γm ),
where each γi is an atom S(y) with y ⊆ z or an equality (z = z 0 ) with z, z 0 ∈ z. (As
usual, when writing clauses, we omit ∀z.) The atom γ0 is the head of the clause, and
γ1 , . . . , γm its body. All variables in the head must also occur in the body, and = can
only occur in the body. The predicates in the heads of clauses in Π are IDB predicates,
the rest (including =) EDB predicates. A predicate S depends on S 0 in Π if Π has
a clause with S in the head and S 0 in the body; Π is a nonrecursive datalog (NDL)
program if the (directed) dependence graph of the dependence relation is acyclic.
     An NDL query is a pair (Π, G(x)), where Π is an NDL program and G(x) a
predicate. A tuple a in ind(A) is an answer to (Π, G(x)) over an ABox A if G(a) holds
in the first-order model with domain ind(A) obtained by closing A under the clauses
in Π; in this case we write Π, A |= G(a). The problem of checking whether a is an
answer to (Π, G(x)) over A is called the query evaluation problem. The arity of Π
is the maximal arity, r(Π), of predicates in Π. The depth of (Π, G(x)) is the length,
d(Π, G), of the longest directed path in the dependence graph for Π starting from G.
NDL queries are equivalent if they have exactly the same answers over any ABox.
     An NDL query (Π, G(x)) is an NDL-rewriting of an OMQ Q(x) = (T , q(x)) over
H-complete ABoxes in case T , A |= q(a) iff Π, A |= G(a), for any H-complete ABox
A and any tuple a in ind(A). Rewritings over arbitrary ABoxes are defined by dropping
the condition that the ABoxes are H-complete. Let (Π, G(x)) be an NDL-rewriting of
Q(x) over H-complete ABoxes. Denote by Π ∗ the result of replacing each predicate S
in Π with a fresh predicate S ∗ and adding the clauses A∗ (x) ← B 0 (x), for B vT A,
and P ∗ (x, y) ← R0 (x, y), for R vT P , where B 0 (x) and R0 (x, y) are the obvious
first-order translations of B and R (for example, B 0 (x) = ∃y R(x, y) if B = ∃R). It is
easy to see that (Π ∗ , G∗ (x)) is an NDL-rewriting of Q(x) over arbitrary ABoxes.
     It is well-known [4] that, without loss of generality, we can only consider NDL-
rewritings of OMQs (T , q(x)) over ABoxes A that are consistent with T .
     We call an NDL query (Π, G(x1 , . . . , xn )) ordered if each of its IDB predicates S
comes with fixed variables xi1 , . . . , xik (1 ≤ i1 < · · · < ik ≤ n), called the parameters
of S, such that (i) every occurrence of S in Π is of the form S(y1 , . . . , ym , xi1 , . . . , xik ),
(ii) the xi are the parameters of G, and (iii) if x0 are all the parameters in the body of
a clause, then the head has x0 among its parameters. The width w(Π, G) of a ordered
(Π, G) is the maximal number of non-parameter variables in a clause of Π. All our
NDL-rewritings in Secs. 4–6 are ordered, so we now only consider ordered NDL queries.
3   NL and LOGCFL Fragments of Nonrecursive Datalog

In this section, we identify two classes of (ordered) NDL queries with the evaluation
problem in the complexity classes NL and LOGCFL for combined complexity. Recall [1]
that an NDL program is called linear if the body of its every clause contains at most one
IDB predicate (remember that equality is an EDB predicate).
Theorem 1. Fix some w > 0. The combined complexity of evaluating linear NDL
queries of width at most w is NL-complete.

Proof. Let (Π, G(x)) be a linear NDL query. Deciding whether Π, A |= G(a) is re-
ducible to finding a path to G(a) from a certain set X in the grounding graph G(Π, A, a)
constructed as follows. The vertices of the graph are the ground atoms obtained by
taking an IDB atom from Π, replacing each of its parameters by the corresponding
constant from a, and replacing each non-parameter variable by some constant from A.
The graph has an edge from S(c) to S 0 (c0 ) iff the grounding of Π contains a clause
S 0 (c0 ) ← S(c) ∧ E1 (e1 ) ∧ · · · ∧ Ek (ek ) with Ej (ej ) ∈ A, for 1 ≤ j ≤ k (we
assume that (c = c) ∈ A). The set X consists of all vertices S(c) with IDB predi-
cates S being of in-degree 0 in the dependency graph of Π for which there is a clause
S(c) ← E1 (e1 ) ∧ · · · ∧ Ek (ek ) in the grounding of Π with Ej (ej ) ∈ A (1 ≤ j ≤ k).
Bounding the width of (Π, G) ensures that G(Π, A, a) is of polynomial size and can be
constructed by a deterministic Turing machine with separate input, write-once output
and logarithmic-size working tapes.                                                   q

    The transformation of NDL-rewritings over H-complete ABoxes into rewritings for
arbitrary ABoxes in Section 2 does not preserve linearity. However, we can still show
that it suffices to consider the H-complete case:
Lemma 2. For any fixed w > 0, there is an LNL -transducer that, given a linear NDL-
rewriting of an OMQ Q(x) over H-complete ABoxes that is of width at most w, computes
a linear NDL-rewriting of Q(x) over arbitrary ABoxes whose width is at most w + 1.
    The complexity class LOGCFL can be defined in terms of nondeterministic auxiliary
pushdown automata (NAuxPDAs) [7], which are nondeterministic Turing machines with
an additional work tape constrained to operate as a pushdown store. Sudborough [19]
proved that LOGCFL coincides with the class of problems that are solved by NAuxPDAs
running in logarithmic space and polynomial time (the space on the pushdown tape is
not subject to the logarithmic bound).
    We call an NDL query (Π, G) skinny if the body of any clause in Π has ≤ 2 atoms.
Lemma 3. For any skinny NDL query (Π, G(x)) and ABox A, query evaluation can be
done by an NAuxPDA in space log |Π| + w(Π, G) · log |A| and time 2O(d(Π,G)) .
              a
Proof. Let ΠA    be the set of ground clauses obtained by first replacing each parameter
in Π by the corresponding constant from a, and then performing the standard grounding
of Π using the constants from A. Consider the monotone Boolean circuit C(Π, A, a)
constructed as follows. The output of C(Π, A, a) is G(a). For every atom γ occurring
                              a
in the head of a clause in ΠA   , we take an OR-gate whose output is γ and inputs are the
bodies of the clauses with head γ; for every such body, we take an AND-gate whose inputs
are the atoms in the body. We set an input gate γ to 1 iff γ ∈ A. Clearly, C(Π, A, a) is
a semi-unbounded fan-in circuit (where OR-gates have arbitrarily many inputs, and AND-
gates two inputs) with O(|Π| · |A|w(Π,G) ) gates and depth O(d(Π, G)). It is known that
the nonuniform analog of LOGCFL can be defined using families of semi-unbounded fan-
in circuits of polynomial size and logarithmic depth. Moreover, there is an algorithm that,
given such a circuit C, computes the output using an NAuxPDA in logarithmic space
in the size of C and exponential time in the depth of C [20, pp. 392–397]. Observing
that C(Π, A, a) can be computed by a deterministic logspace Turing machine, we
conclude that the query evaluation problem can be solved by an NAuxPDA in space
log |Π| + w(Π, G) · log |A| and time 2O(d(Π,G)) .                                       q

    A function ν from the predicate names in Π to N is a weight function for an NDL-
query (Π, G(x)) if ν(P ) > 0, for any IDB P in Π, and ν(P ) ≥ ν(Q1 ) + · · · + ν(Qn ),
for any P (z) ← Q1 (z 1 ) ∧ · · · ∧ Qn (z n ) in Π.

Lemma 4. If (Π, G(x)) has a weight function ν, then it is equivalent to a skinny NDL
query (Π 0 , G(x)) such that |Π 0 | is polynomial in |Π|, d(Π 0 , G) ≤ d(Π, G) + log ν(G)
and w(Π 0 , G) ≤ w(Π, G).

Proof. The proof is by induction on d(Π, G). If d(Π, G) = 0, we take Π 0 = Π.
Suppose Π contains a clause ψ of the form G(z) ← P1 (z 1 ) ∧ · · · ∧ Pk (z k ) and, for
each 1 ≤ j ≤ k, we have an NDL query (ΠP0 j , Pj ) which is equivalent to (Π, Pj ) and
such that

        d(ΠP0 j , Pj ) ≤ d(ΠPj , Pj ) + log ν(Pj ) ≤ d(Π, G) − 1 + log ν(Pj ).            (1)

We construct the Huffman tree [11] for the alphabet {1, . . . , k}, where the frequency
of j is ν(Pj )/ν(G) (by definition, ν(G) > 0). The Huffman tree is binary and has k
leaves, denoted 1, . . . , k, and k − 1 internal nodes (including the root, g), and the length
of the path from g to any leaf j at most dlog(ν(G)/ν(Pj ))e. For each internal node v
of the tree (but the root), we take a predicate Pv (z v ), where z v is the union of z u for
all descendants u of v; for the root g, we take Pg (z g ) = G(z). Let Πψ0 be the extension
of the union of ΠP0 j , for 1 ≤ j ≤ k, with clauses Pv (z v ) ← Pu1 (z u1 ) ∧ Pu2 (z u2 ), for
each v with immediate successors u1 and u2 . The number of the new clauses is k − 1.
Consider the NDL query (Πψ0 , G(z)). By (1), we have:

  d(Πψ0 , G) ≤ maxj {dlog(ν(G)/ν(Pj ))e + d(ΠP0 j , Pj )} ≤
          maxj {log(ν(G)/ν(Pj )) + d(Π, G) + log ν(Pj )} = log ν(G) + d(Π, G).

Let Π 0 be the result of applying this transformation to each clause in Π with head G(z).
It is readily seen that (Π 0 , G) is as required; in particular, |Π 0 | = O(|Π|2 ).   q

Theorem 5. Fix c ≥ 1, w ≥ 1 and a polynomial p. Query evaluation for NDL queries
(Π, G(x)) with a weight function ν such that ν(G) ≤ p(|Π|), w(Π, G) ≤ w and
d(Π, G) ≤ c log ν(G) is in LOGCFL for combined complexity.
Proof. By Lemma 4, (Π, G) is equivalent to a skinny NDL query (Π 0 , G0 ) with |Π 0 |
polynomial in |Π|, w(Π 0 , G) ≤ w, and d(Π 0 , G0 ) ≤ (c + 1) log ν(G). By Lemma 3,
query evaluation for (Π 0 , G0 ) over A is solved by an NAuxPDA in space log |Π 0 | +
                                                              0  0
w(Π 0 , G) · log |A| = O(log |Π| + log |A|) and time 2O(d(Π ,G )) ≤ 2O(log ν(G)) =
(ν(G))O(1) ≤ p0 (|Π|), for some polynomial p0 .                                    q

Corollary 6. Suppose there is an algorithm that, given any OMQ Q(x) from some
class C, constructs its NDL-rewriting (Π, G(x)) over H-complete ABoxes having
a weight function ν with ν(G) ≤ |Q| and d(Π, G) ≤ c log ν(G), and such that
w(Π, G) ≤ w and |Q| ≤ |Π| ≤ p(|Q|), for some fixed constants c, w and polyno-
mial p. Then the evaluation problem for the NDL-rewritings (Π ∗ , G∗ (x)) of the OMQs
in C over arbitrary ABoxes (defined in Section 2) is in LOGCFL for combined complexity.


4     Bounded Treewidth CQs and Bounded-Depth TBoxes

With every CQ q, we associate its Gaifman graph G whose vertices are the variables of q
and edges are the pairs {u, v} such that P (u, v) ∈ q, for some P . We call q tree-shaped
if G is a tree; q is connected if the graph G is connected. A tree decomposition of an
undirected graph G = (V, E) is a pair (T, λ), where T is an (undirected) tree and λ a
function from the set of nodes of T to 2V such that the following conditions hold:

    – for every v ∈ V , there exists a node t with v ∈ λ(t);
    – for every e ∈ E, there exists a node t with e ⊆ λ(t);
    – for every v ∈ V , the nodes {t | v ∈ λ(t)} induce a connected subtree of T .

We call the set λ(t) ⊆ V a bag for t. The width of (T, λ) is maxt∈T |λ(t)| − 1. The
treewidth of a graph G is the minimum width over all tree decompositions of G. The
treewidth of a CQ is the treewidth of its Gaifman graph.

Example 7. Consider CQ q(x0 , x7 ) depicted below (black nodes are answer variables):
            R             S        R         R             S        R         R
       x0        x1           x2        x3        x4           x5        x6          x7
Its natural tree decomposition of treewidth 1 is based on the the chain T of 7 vertices,
which are represented as bags as follows:
            x1         x2          x3        x4        x5           x6        x7

            R         S            R         R         S            R         R

            x0         x1          x2        x3        x4           x5        x6


    Fix a connected CQ q(x) and a tree decomposition (T, λ) of its Gaifman graph
G = (V, E). Let D be a subtree of T . The size of D is the number of nodes in it. We call
a node t of D boundary if T has an edge {t, t0 } with t0 ∈
                                                         / D, and let the degree deg(D)
of D be the number of its boundary nodes. Note that T itself is the only subtree of T of
degree 0. We say that a node t splits D into subtrees D1 , . . . , Dk if the Di partition D
without t: each node of D except t belongs to exactly one Di .
Lemma 8 ([3]). Let D be a subtree of T of size m > 1.
If deg(D) = 2, then there is a node t splitting D into subtrees of size ≤ m/2 and
degree ≤ 2 and, possibly, one subtree of size < m − 1 and degree 1.
If deg(D) ≤ 1, then there is t splitting D into subtrees of size ≤ m/2 and degree ≤ 2.
   In Example 7, t splits T into T1 and T2 depicted below:
         x1         x2    T 1   x3        x4 t      x5   T   2   x6         x7

         R          S           R         R          S           R         R

          x0         x1         x2         x3         x4         x5         x6


    We define recursively a set sub(T ) of subtrees of T , a binary relation ≺ on sub(T )
and a function σ on sub(T ) indicating the splitting node. We begin by adding T to
sub(T ). Take D ∈ sub(T ) that has not been split yet. If D is of size 1 then let σ(D)
be the only node of D. Otherwise, by Lemma 8, we find a node t in D that splits it
into D1 , . . . , Dk . We set σ(D) = t and, for each 1 ≤ i ≤ k, add Di to sub(T ) and set
Di ≺ D; then, we apply the procedure recursively to each of D1 , . . . , Dk . In Example 7
with t splitting T , we have σ(T ) = t, T1 ≺ T and T2 ≺ T .
    For each D ∈ sub(T ), we recursively define a set of atoms q D by taking
                                                           [
                   q D = S(v) ∈ q | v ⊆ λ(σ(D)) ∪                0
                                                                     q D0 .
                                                              D ≺D

By the definition of tree decomposition, q T = q. Denote by xD the subset of avar(q)
that occur in q D . In our running example, xT = {x0 , x7 }, xT1 = {x0 } and xT2 = {x7 }.
Denote by ∂D the union of all λ(t) ∩ λ(t0 ) for a boundary node t of D and its unique
neighbour t0 in T outside D. In our example, ∂T = ∅, ∂T1 = {x3 } and ∂T2 = {x4 }.
    Let T be a TBox of finite depth k. A type is a partial map w from V to WT ; its
domain is denoted by dom(w). By ε we denote the unique partial type with dom(ε) = ∅.
We use types to represent how variables are mapped into CT ,A , with w(u) = w indi-
cating that u is mapped to an element of the form aw (for some a ∈ ind(A)), and with
w(u) = ε that u is mapped to an ABox individual. We say that a type w is compatible
with a bag t if, for all u, v ∈ λ(t) ∩ dom(w), we have
 – if v ∈ avar(q), then w(v) = ε;
 – if A(v) ∈ q, then either w(v) = ε or w(v) = wR with ∃R− vT A;
 – if R(v, u) ∈ q, then w(v) = w(u) = ε, or w(u) = w(v)R0 with R0 vT R, or
   w(v) = w(u)R0 with R0 vT R− .
    In the sequel, we abuse notation and use sets of variables in place of sequences
assuming that they are ordered in some (fixed) way. For example, we use xD for a tuple
of variables in the set xD (ordered in some way). Also, given a tuple a in ind(A) of
length |xD | and x ∈ xD , we write a(x) to refer to the element of a that corresponds to
x (that is, to the component of the tuple with the same index).
    Let ΠQ be an NDL program that, for any D ∈ sub(T ), types w and s such that
dom(w) = ∂D, dom(s) = λ(σ(D)), s is compatible with σ(D) and agrees with w on
their common domain, contains the clause
                                                     (s∪w)∂D 0
                                          ^
               GwD (∂D, xD ) ← At ∧
                                     s
                                              0
                                                   GD0          (∂D0 , xD0 ),        (2)
                                              D ≺D
                                                           0
where xD are the parameters of predicate Gw                                  1
                                          D , (s ∪ w)  ∂D is the restriction of the
                             0        s
union s ∪ w of s and w to ∂D , and At is defined as follows:
            ^                ^                    ^                 ^
 Ats =          A(u) ∧           R(u, v) ∧            (u = v) ∧          AS (u). (3)
              A(u)∈q              R(u,v)∈q                 R(u,v)∈q                 s(u)=Sw0
              s(u)=ε            s(u)=s(v)=ε             s(u)6=ε or s(v)6=ε          for some w0


The first two conjunctions in Ats ensure that atoms all of whose variables are assigned
ε are present in the ABox. The third conjunction ensures that if one of the variables in
a role atom is not mapped to ε, then the images of the variables share the same initial
individual. Finally, atoms in the final conjunction ensure that if a variable is to be mapped
to aSw0 , then the individual a satisfies ∃S (so aSw0 is part of the domain of CT ,A ).

Example 9. Now we fix an ontology T with the following axioms:

      A ≡ ∃P,        P v S,        P v R− ,              B ≡ ∃Q,             Q v R,       Q v S−.

Since λ(t) = {x3 , x4 }, there are only three types compatible with t:

      s1 : x3 7→ ε, x4 7→ ε,        s2 : x3 7→ P, x4 7→ ε        and         s3 : x3 7→ ε, x4 7→ Q.

So, Ats1 = R(x3 , x4 ), Ats2 = A(x3 ) ∧ (x3 = x4 ), Ats3 = B(x4 ) ∧ (x3 = x4 ). Thus,
predicate GεT is defined by the following clauses, for s1 , s2 and s3 , respectively:

         GεT (x0 , x7 ) ← GxT31 7→ε (x3 , x0 ) ∧ R(x3 , x4 ) ∧ GxT42 7→ε (x4 , x7 ),
         GεT (x0 , x7 ) ← GxT31 7→P (x3 , x0 ) ∧ A(x3 ) ∧ (x3 = x4 ) ∧ GTx42 7→ε (x4 , x7 ),
         GεT (x0 , x7 ) ← GxT31 7→ε (x3 , x0 ) ∧ B(x4 ) ∧ (x3 = x4 ) ∧ GTx42 7→Q (x4 , x7 ).

      By induction on ≺ on sub(T ), we show that (ΠQ , GεT ) is a rewriting of Q(x).

Lemma 10. For any ABox A, any D ∈ sub(T ), any type w with dom(w) = ∂D,
any b ∈ ind(A)|∂D| and a ∈ ind(A)|xD | , we have ΠQ , A |= Gw
                                                            D (b, a) iff there is a
homomorphism h : q D → CT ,A such that

           h(x) = a(x), for x ∈ xD ,           and     h(v) = b(v)w(v), for v ∈ ∂D.

    Fix now k and t, and consider the class of OMQs Q(x) = (T , q(x)) with T of
depth ≤ k and q of treewidth ≤ t. Let T be a tree decomposition of q of treewidth ≤ t.
We take the following weight function: ν(Gw                              ε
                                                D ) = |D|. Clearly, ν(GT ) ≤ |Q|. By
Lemma 8, d(ΠQ , GT ) ≤ 2 log |T | = 2 log ν(GT ) ≤ 2 log |Q|. Since |sub(T )| ≤ |T |2
                    ε                             ε

and there are at most |T |2tk options for w, there are polynomially many predicates GwD
and so, ΠQ is of polynomial size. Thus, by Corollary 6, the obtained NDL-rewriting over
arbitrary ABoxes can be evaluated in LOGCFL. Finally, we note that a tree decomposition
of treewidth ≤ t can be computed using an LLOGCFL -transducer [10], and so the NDL-
rewriting can also be constructed by an LLOGCFL -transducer.
 1
     By construction, dom(s ∪ w) covers ∂D0 , and so the domain of (s ∪ w)  ∂D0 is ∂D0 .
5     Bounded-Leaf CQs and Bounded-Depth TBoxes
We next consider OMQs with tree-shaped CQs in which both the depth of the ontology
and the number of leaves in the CQ are bounded. Let T be a TBox of finite depth k, and
let q(x) be a tree-shaped CQ with at most ` leaves. Fix one of the variables of q as root,
and let M be the maximal distance to a leaf from the root. For n ≤ M , let z n denote
the set of all variables of q at distance n from the root; clearly, |z n | ≤ `. We call the
z n slices of q and observe that they satisfy the following: for every R(u, v) ∈ q with
u 6= v, there exists 0 ≤ n < M such that either u ∈ z n and v ∈ z n+1 or u ∈ z n+1 and
v ∈ z n . For 0 ≤ n ≤ M ,Swe denote by q n (z n∃ , xn ) the query consisting of all atoms
S(u) of q such that u ⊆ n≤m≤M z m , where xn = var(q n ) ∩ x and z n∃ = z n \ x.
    By type of a slice z n , we mean a total map w from z n to WT . Analogously to
Section 4, we define what it means for a type (or pair of types) to be compatible with a
slice (pair of adjacent slices). We call w locally compatible with z n if for every z ∈ z n :
    – if z ∈ avar(q), then w(z) = ε;
    – if A(z) ∈ q, then either w(z) = ε or w(z) = wR with ∃R− vT A;
    – if R(z, z) ∈ q, then w(z) = ε.
If w, s are types for z n and z n+1 respectively, then we call (w, s) compatible with
(z n , z n+1 ) if w is locally compatible with z n , s is locally compatible with z n+1 , and
for every atom R(z n , z n+1 ) ∈ q, one of the following holds: (i) w(z n ) = s(z n+1 ) = ε,
(ii) s(z n+1 ) = w(z n )R0 with R0 vT R, or (iii) w(z n ) = s(z n+1 )R0 with R0 vT R− .
                                     0
     Consider the NDL program ΠQ       defined as follows. For every 0 ≤ n < M and every
pair of types (w, s) that is compatible with (z n , z n+1 ), we include the clause:
               Pnw (z n∃ , xn ) ← Atw∪s (z n , z n+1 ) ∧ Pn+1
                                                          s
                                                              (z n+1
                                                                 ∃   , xn+1 ),
where xn are the parameters of Pnw and Atw∪s (z n , z n+1 ) is the conjunction of atoms (3),
as defined in Section 4, for the union w ∪ s of types w and s. For every type w locally
compatible with z M , we include the clause:
                                w
                               PM (z M
                                     ∃ ,x
                                          M
                                            ) ← Atw (z M ).
(Recall that z M is a disjoint union of z M∃  and xM .) We use G, with parameters x, as
the goal predicate and include G(x) ← P0w (z 0 , x) for every predicate P0w (z 0 , x0 )
occurring in the head of one of the preceding clauses.
     The following lemma (which is proved by induction) is the key step in showing that
   0
(ΠQ   , G(x)) is a rewriting of (T , q) over H-complete ABoxes:
Lemma 11. For any H-complete ABox A, any 0 ≤ n ≤ M , any predicate Pnw , any
            n                        n
b ∈ ind(A)|z∃ | and any a ∈ ind(A)|x | , we have ΠQ
                                                  0
                                                    , A |= Pnw (b, a) iff there is a
homomorphism h : q n → CT ,A such that
          h(x) = a(x), for x ∈ xn ,       and    h(z) = b(z)w(z), for z ∈ z n∃ .         (4)
                              0
     It should be clear that ΠQ is a linear NDL program of width at most 2`. Moreover,
when ` and k are bounded by fixed constants, it takes only logarithmic space to store a
                                        0
type w, which allows us to show that ΠQ    can be computed by an LNL -transducer. We
can apply Lemma 2 to obtain an NDL rewriting for arbitrary ABoxes, and then use
Theorem 1 to conclude that the resulting program can be evaluated in NL.
6      Bounded-Leaf CQs and Arbitrary TBoxes
For OMQs with bounded-leaf CQs and ontologies of unbounded depth, our rewriting
utilises the notion of tree witness [15]. Let Q(x) = (T , q(x)) with q(x) = ∃y ϕ(x, y).
For a pair t = (tr , ti ) of disjoint sets of variables in q, with ti ⊆ y and ti 6= ∅, set
                                 
                         q t = S(z) ∈ q | z ⊆ tr ∪ ti and z 6⊆ tr .

If q t is a minimal subset of q for which there is a homomorphism h : q t → CTAR (a) such
that tr = h−1 (a) and q t contains every atom of q with at least one variable from ti , then
we call t = (tr , ti ) a tree witness for Q generated by R. Note that the same tree witness
t = (tr , ti ) can be generated by different roles R. By q \ t we denote the query obtained
from q by removing the atoms in q t and having x ∪ tr as answer variables, and for every
v ∈ var(q), we let TWQ (v) denote the set of all tree witnesses t for Q such that v ∈ ti .
     The logarithmic depth NDL-rewriting for bounded-leaf queries and ontologies of
unbounded depth is based upon the following observation.
Lemma 12. Every tree T of size m has a node splitting it into subtrees of size ≤ m/2.
    We will use repeated applications of this lemma to decompose the input CQ into
smaller and smaller subqueries. Formally, for every tree-shaped CQ q, we use vq to
denote a vertex in the Gaifman graph G of q that satisfies the condition of Lemma 12.
Then, starting from an OMQ Q0 = (T , q 0 (x0 )), we define SQQ0 as the smallest set of
queries that contains q 0 (x0 ) and is such that for every q(z) ∈ SQQ0 with |var(q)| > 1,
the following queries also belong to SQQ0 :
    – for every ui that is adjacent to vq in G: the query q i (z i ) consisting of all role atoms
      linking vq and ui , as well as all atoms whose variables cannot reach vq in G without
      passing by ui , and with z i equal to (z ∩ var(q i )) ∪ {vq };
    – for every t ∈ TWQ0 (vq ) with tr 6= ∅: the queries q t1 (z t1 ), . . . , q tmt (z tmt ) correspond-
      ing to the connected components of q \ t, with z ti equal to var(q ti ) ∩ avar(q \ t).
                    00
The NDL program ΠQ     0
                         uses IDB predicates Pq , for q ∈ SQQ0 , with arity |avar(q)| and
parameters var(q) ∩ x. For each q(z) ∈ SQQ0 with |var(q)| > 1, we include the clause
                         ^                 ^                    ^
         Pq (z) ←           A(vq ) ∧           R(vq , vq ) ∧         Pqi (z i ),
                            A(vq )∈q           R(vq ,vq )∈q               1≤i≤n

where q 1 (z 1 ), . . . , q n (z n ) are the subqueries induced by the neighbours of vq in G. We
also include, for every t ∈ TWQ (vq ) with tr 6= ∅ and role R generating t, the clause
                                  ^                    ^                ^
             Pq (z) ←                   (u = u0 ) ∧        AR (u) ∧          Pqti (z ti )
                              u,u0 ∈tr                u∈tr              1≤i≤mt

where q t1 , . . . , q tmt are the connected components of q \ t. For every q ∈ SQQ0 with
|var(q)| = 1, we include the clause Pq (z) ← q. Finally, if q 0 is Boolean, we include
clauses Pq0 ← A(x) for all concept names A such that T , {A(a)} |= q 0 .
                           00
    The program ΠQ            0
                                is inspired by a similar construction from [12]. By adapting results
                                                      00
from the latter paper, we can show that (ΠQ              0
                                                           , Pq0 (x)) is indeed a rewriting:
Lemma 13. For any tree-shaped OMQ Q0 (x) = (T , q 0 (x)), any q(z) ∈ SQQ0 , any
                                               00
H-complete ABox A, and any tuple a in ind(A), ΠQ  0
                                                    , A |= Pq (a) iff there exists a
homomorphism h : q → CT ,A such that h(z) = a.
    Now fix ` > 1, and consider the class of OMQs Q(x) = (T , q(x)) with tree-shaped
                                                   00
q(x) having at most ` leaves. The size of ΠQ          is polynomially bounded in |Q|, since
bounded-leaf CQs have polynomially many tree witnesses and also polynomially many
tree-shaped subCQs. It is readily seen that the function ν defined by setting ν(Pq0 ) = |q 0 |
                               00
is a weight function for (ΠQ      , Pq ) such that ν(Pq ) ≤ |Q|. Moreover, by Lemma 12,
d(Π, G) ≤ log ν(Pq ) + 1. We can thus apply Corollary 6 to conclude that the obtained
NDL-rewritings can be evaluated in LOGCFL. Finally, we note that since the number
of leaves is bounded, it is in NL to decide whether a vertex satisfies the conditions of
Lemma 12, and it is in LOGCFL to decide whether T , {A(a)} |= q 0 [3] or whether a
(logspace) representation of a possible tree witness is indeed a tree witness. This allows
                   00
us to show that (ΠQ   , Pq ) can be generated by an LLOGCFL -transducer.

7     Conclusions
As shown above, for three important classes of OMQs, NDL-rewritings can be con-
structed and evaluated by theoretically optimal NL and LOGCFL algorithms. To see
whether these rewritings are viable in practice, we generated three sequences of OMQs
with the ontology from Example 9 and linear CQs of up to 15 atoms as in Example 7. We
compared our NL and LOGCFL rewritings from Sections 5 and 4 (called L IN and L OG)
with those produced by Clipper [8] and Rapid [6]. The barcharts below show the number
of100clauses in the rewritings over H-complete ABoxes. While L IN and L OG grow linearly
(in accord with theory), Clipper and Rapid failed to produce rewritings for longer CQs.

 50
                              100
 25
 10
      1     2     3   4   5   650 7     8           100 13 14 15
                                            9 10 11 12

                              25
                              10
                                                        50
                  Clipper           1   2   3   4   5   6 7   8   9 10 11 12 13 14 15
                Rapid                                   25
           L OG                                         10
    L IN
                                                          1   2   3   4   5   6   7   8   9 10 11 12 13 14 15
We evaluated the rewritings over a few randomly generated ABoxes using off-the-shelf
datalog engine RDFox [17]. The experiments (see the full version) show that our rewrit-
ings are usually executed faster than Clipper’s and Rapid’s when the number of answers
is relatively small (. 104 ); for queries with & 106 answers, the execution times are
comparable. The version of RDFox we used did not seem to take advantage of the
structure of the NL/LOGCFL rewritings, and it would be interesting to see whether their
nonrecursiveness and parallelisability can be utilised to produce efficient execution plans.
Acknowledgements: The first author was supported by contract ANR-12-JS02-007-01,
the fourth by the Russian Foundation for Basic Research and the grant MK-7312.2016.1.
References

 1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995)
 2. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (eds.): The Descrip-
    tion Logic Handbook: Theory, Implementation and Applications. Cambridge University Press
    (2003)
 3. Bienvenu, M., Kikot, S., Podolskii, V.V.: Tree-like queries in OWL 2 QL: succinctness
    and complexity results. In: Proc. of the 30th Annual ACM/IEEE Symposium on Logic in
    Computer Science (LICS 2015). pp. 317–328. ACM (2015)
 4. Calı̀, A., Gottlob, G., Lukasiewicz, T.: A general datalog-based framework for tractable query
    answering over ontologies. Journal of Web Semantics 14, 57–83 (2012)
 5. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning
    and efficient query answering in description logics: the DL-Lite family. Journal of Automated
    Reasoning 39(3), 385–429 (2007)
 6. Chortaras, A., Trivela, D., Stamou, G.: Optimized query rewriting for OWL 2 QL. In: Proc.
    of CADE-23. LNCS, vol. 6803, pp. 192–206. Springer (2011)
 7. Cook, S.A.: Characterizations of pushdown machines in terms of time-bounded computers.
    Journal of the ACM 18(1), 4–18 (1971)
 8. Eiter, T., Ortiz, M., Šimkus, M., Tran, T.K., Xiao, G.: Query rewriting for Horn-SHIQ plus
    rules. In: Proc. of the 26th AAAI Conf. on Artificial Intelligence (AAAI 2012). pp. 726–733.
    AAAI (2012)
 9. Gottlob, G., Kikot, S., Kontchakov, R., Podolskii, V.V., Schwentick, T., Zakharyaschev, M.:
    The price of query rewriting in ontology-based data access. Artificial Intelligence 213, 42–59
    (2014)
10. Gottlob, G., Leone, N., Scarcello, F.: Computing LOGCFL certificates. In: Proc. of the 26th
    Int. Colloquium on Automata, Languages and Programming (ICALP-99). LNCS, vol. 1644,
    pp. 361–371. Springer (1999)
11. Huffman, D.A.: A method for the construction of minimum-redundancy codes. Proc. of the
    Institute of Radio Engineers 40(9), 1098–1101 (1952)
12. Kikot, S., Kontchakov, R., Podolskii, V., Zakharyaschev, M.: On the succinctness of query
    rewriting over shallow ontologies. In: Proc. of the 29th Annual ACM/IEEE Symposium on
    Logic in Computer Science (LICS 2014). ACM (2014)
13. Kikot, S., Kontchakov, R., Podolskii, V.V., Zakharyaschev, M.: Exponential lower bounds and
    separation for query rewriting. In: Proc. of the 39th Int. Colloquium on Automata, Languages
    and Programming (ICALP 2012). LNCS, vol. 7392, pp. 263–274. Springer (2012)
14. Kikot, S., Kontchakov, R., Zakharyaschev, M.: On (in)tractability of OBDA with OWL 2 QL.
    In: Proc. of the 24th Int. Workshop on Description Logics (DL 2011). vol. 745, pp. 224–234.
    CEUR-WS (2011)
15. Kikot, S., Kontchakov, R., Zakharyaschev, M.: Conjunctive query answering with OWL 2 QL.
    In: Proc. of the 13th Int. Conf. on Principles of Knowledge Representation and Reasoning
    (KR 2012). pp. 275–285. AAAI (2012)
16. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The combined approach
    to query answering in DL-Lite. In: Proc. of the 12th Int. Conf. on Principles of Knowledge
    Representation and Reasoning (KR 2010). AAAI Press (2010)
17. Nenov, Y., Piro, R., Motik, B., Horrocks, I., Wu, Z., Banerjee, J.: RDFox: A highly-scalable
    RDF store. In: Proc. of the 14th Int. Semantic Web Conf. (ISWC 2015), Part II. LNCS, vol.
    9367, pp. 3–20. Springer (2015)
18. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking
    data to ontologies. Journal on Data Semantics X, 133–173 (2008)
19. Sudborough, I.H.: On the tape complexity of deterministic context-free languages. Journal of
    the ACM 25(3), 405–414 (1978)
20. Venkateswaran, H.: Properties that characterize LOGCFL. Journal of Computer and System
    Sciences 43(2), 380–404 (1991)