Ontology-Mediated Querying with EL:
    Trichotomy and Linear Datalog Rewritability

                          Carsten Lutz and Leif Sabellek

         Department of Computer Science, University of Bremen, Germany


       Abstract. We consider ontology-mediated queries (OMQs) based on
       an EL ontology and an atomic query (AQ), provide an ultimately fine-
       grained analysis of data complexity and study rewritability into linear
       Datalog – aiming to capture linear recursion in SQL. Our main results
       are that every such OMQ is in AC0 , NL-complete or PTime-complete,
       and that containment in NL coincides with rewritability into linear
       Datalog (whereas containment in AC0 coincides with rewritability into
       first-order logic). We establish natural characterizations of the three
       cases, show that deciding linear Datalog rewritability (as well as the
       mentioned complexities) is ExpTime-complete, give a way to construct
       linear Datalog rewritings when they exist, and prove that there is no
       constant bound on the arity of IDB relations in linear Datalog rewritings.


1    Introduction

An important application of ontologies is to enrich data with a semantics and
with domain knowledge while also providing additional vocabulary for query
formulation [10, 20, 6, 9]. The combination of a traditional database query and an
ontology can be viewed as a compound query, commonly referred to as an ontology-
mediated query (OMQ). Substantial research efforts have been invested into
studying OMQs based on description logic (DL) ontologies, with two dominating
topics being the data complexity of OMQs [17, 21, 24, 11] and their rewritability
into more standard database query languages such as SQL (which in this context is
often equated with first-order logic) and Datalog [23, 14, 8, 6, 19, 25, 15]. While the
former topic aims to understand the feasibility of OMQs from a theoretical angle,
the latter is inspired by rather practical concerns: since most database systems are
unaware of ontologies, rewriting OMQs into standard query languages provides
an important avenue for implementing OMQ execution in practical applications;
however, a major challenge emerges from the fact that the desired rewritings are
typically not guaranteed to always exist, though they often do exist in practically
relevant cases. Both topics are thoroughly intertwined since rewritability into
first-order logic (FO) is closely related to AC0 data complexity while rewritability
into Datalog is closely related to PTime data complexity.
    Modern DLs can roughly be divided into two families: ‘expressive DLs’ such
as ALC and SHIQ which typically have coNP data complexity and where
rewritability is guaranteed neither into FO nor into Datalog [6, 25, 15], and ‘Horn
DLs’ such as EL and Horn-SHIQ which typically have PTime data complexity
and where rewritability into Datalog is guaranteed, but FO-rewritability is not
[8, 16, 7] (with the notable exception of DL-Lite [10]). In this paper, we consider
the OMQ language (EL, AQ) where the ontology is formulated in the Horn
description logic EL and where the actual queries are atomic queries (AQs) of
the form A(x), studying data complexity, rewritability, and their relations. Our
actual contribution is two-fold.
     First, we carry out an ultimately fine-grained analysis of data complexity. In
fact, we establish a trichotomy, showing that every OMQ from (EL, AQ) is in
AC0 , NL-complete, or PTime-complete, a remarkable sparseness of complexities.
We also establish elegant characterizations that separate the three classes of
OMQs. In particular, we show that an OMQ Q is in NL if there is a bound k
such that any minimal tree-shaped ABox A whose root is an answer to the OMQ
Q does not contain a full binary tree of depth k as a minor, and PTime-hard
otherwise. We additionally use a second, more operational characterization to
determine the precise complexity of deciding whether a given OMQ is in AC0 ,
NL-complete, or PTime-complete, which turns out to be ExpTime-complete.
     And second, we put rewritability into linear Datalog onto the agenda of
OMQ research. In fact, the equation “SQL = FO” often adopted in this area
ignores the fact that SQL contains linear recursion from its version 3 published
in 1999 on, which exceeds the expressive power of FO. We believe that, in the
context of OMQs, linear Datalog is a natural abstraction of SQL that includes
linear recursion, despite the fact that it does not contain full FO. Indeed, all
OMQs from (EL, AQ) that are FO-rewritable are also rewritable into a union of
conjunctive queries (UCQ) and thus into linear Datalog (and the same is true
for much more expressive OMQ languages) [6]. This shows that the expressive
power of FO that lies outside of linear Datalog is not useful when using SQL
as a target language for OMQ rewriting. We prove that rewritability into linear
Datalog coincides with containment in NL. By what was said above, it is thus
ExpTime-complete to decide whether a given OMQ is rewritable. Moreover,
we show how to construct linear Datalog rewritings when they exist and prove
that there is no constant bound on the arity of IDB relations in linear Datalog
rewritings.
     This paper is a workshop version of [22]. For proof details, we refer the
reader to the appendix of that paper, available at http://www.informatik.uni-
bremen.de/tdki/research/papers.html.


2   Preliminaries

Let NC , NR , and NI be countably infinite sets of concept names, role names,
and individual names. An EL-concept is built according to the syntax rule
C, D ::= > | A | C u D | ∃r.C where A ranges over concept names and r over
role names. An EL-TBox is a finite set of concept inclusions (CIs) of the form
C v D, C and D EL-concepts. The size of T , denoted |T |, is the number of
symbols needed to write T , concept and role names counting as one symbol.
    An ABox is a finite set of concept assertions A(a) and role assertions r(a, b)
where A is a concept name, r a role name, and a, b individual names. We use
Ind(A) to denote the set of individuals of the ABox A. A signature is a set of
concept and role names. An ABox that only uses concept and role names from a
signature Σ is a Σ-ABox. The semantics of DLs is defined in the usual way.

     An atomic query (AQ) takes the form A(x), A a concept name. An ontology-
mediated query (OMQ) is a triple Q = (T , Σ, A(x)) with T a TBox, Σ an ABox
signature, and A(x) an AQ. We assume w.l.o.g. that A occurs in T . Let A
be a Σ-ABox. We write A |= Q(a) and say that a is an answer to Q on A if
A, T |= A(a). The evaluation problem for Q is to decide, given a Σ-ABox A and
an a ∈ A, whether A |= Q(a). With the complexity of an OMQ Q, we generally
mean its evaluation problem. We use (EL, AQ) to denote the set of all OMQs
(T , Σ, A(x)) where T is an EL-TBox. All OMQs in (EL, AQ) are in PTime [24,
21].

    We use standard notation for Datalog programs, see for example [1]. A Datalog
program is linear if each rule body contains at most one IDB relation. The width
of a Datalog program is the maximum arity of non-goal IDB relations used in
it and its diameter is the maximum number of variables that occur in a rule
in Π. A Datalog program Π over EDB signature Σ is a rewriting of an OMQ
Q = (T , Σ, A(x)) if for all Σ-ABoxes A and all a ∈ Ind(A), we have A |= Q(a)
iff A |= Π(a). We say that Q is (linear) Datalog-rewritable if there is a (linear)
Datalog program that is a rewriting of Q.

   Throughout the paper, we assume w.l.o.g. and without further notice that
TBoxes are in normal form, that is, they contain only concept inclusions of the
form ∃r.A1 v A2 , > v A1 , A1 u A2 v A3 , A1 v ∃r.A2 where all Ai are concept
names [3].

     We shall often deal with ABoxes that are tree-shaped. By a tree, we gen-
erally mean a directed (unlabelled) tree T = (V, E), defined in the usual way.
Every ABox gives rise to a directed graph GA = (Ind(A), {(a, b) | r(a, b) ∈
A for some r}). We say that A is tree-shaped if GA is a tree and r(a, b), s(a, b) ∈ A
implies r = s. We introduce some further standard graph theoretic notions for
ABoxes. A homomorphism from an ABox A1 to an ABox A2 is a total func-
tion h : Ind(A1 ) → Ind(A2 ) such that A(a) ∈ A1 implies A(h(a)) ∈ A2 and
r(a, b) ∈ A1 implies r(h(a), h(b)) ∈ A2 . We write A1 → A2 if there is a homomor-
phism from A1 to A2 . A directed graph G = (V, E) is a minor of an ABox A if
G is a minor of GA , that is, if G can be obtained from GA by deleting edges and
vertices and by contracting edges. A path decomposition of a directed graph (V, E)
is a sequence S1 , . . . , Sn of subsets of V such that for every (a, b) ∈ E there is a
set Si with a, b ∈ Si and Si ∩ Sk ⊆ Sj , for all i ≤ j ≤ k. A path decomposition is
an (`, k)-path decomposition if k = maxni=1 |Si | and ` = maxn−1  i=1 |Si ∩ Si+1 |. The
pathwidth of a directed graph (V, E) is the smallest k such that (V, E) has an
                                               N
(`, k + 1)-path decomposition for some ` ∈ . We identify the pathwidth of an
ABox A with the pathwidth of GA .
3    NL, PTime, Linear Datalog Rewritability
We start with establishing a dichotomy between PTime and NL for evaluating
queries from (EL, AQ), also showing that containment in NL coincides with
rewritability into linear Datalog (unless NL = PTime). The dichotomy is based
on a characterization of containment in NL via a ‘bounded amount of branching’
in ABoxes whose root is an answer to the query. The linear Datalog programs
constructed in the proofs are of unbounded width. We prove a hierarchy theorem
which shows that this is unavoidable.
     Let Q = (T , Σ, A0 (x)) be an OMQ. We say that Q is unboundedly branching
if for every k ≥ 0, there is a tree-shaped Σ-ABox A such that
 1. A, T |= A0 (a), a the root of A, and A is minimal with this property (w.r.t.
    set inclusion)
 2. A has the full binary tree of depth k as a minor.
Otherwise, Q is boundedly branching. In the latter case, the branching limit of
Q is the maximum integer k such that there is a tree-shaped Σ-ABox A that
satisfies Conditions 1 and 2 above. We define the branching limit to be 0 if there
is no tree-shaped Σ-ABox A that satisfies Condition 1.
Example 1. (1) The OMQ Q1 = (T1 , {A, r, s}, A(x)) with T1 = {∃r.A v B1 ,
∃s.A v B2 , B1 u B2 v A} is unboundedly branching as witnessed by the ABoxes
A1 , A2 , . . . where Ai is a full binary tree of depth i, each left successor connected
via the role name r, each right successor via the role name s, and with the concept
name A asserted for each leaf.
(2) The OMQ Q2 = (T2 , {A, r, s}, B12 (x)) with T2 = {∃r.A v B1 , ∃s.A v
B2 , ∃s.B2 v B2 , B1 u B2 v B12 , ∃r.B12 v B1 } is boundedly branching with
branching limit one. In fact, every minimal tree-shaped Σ-ABox whose root is an
answer to Q2 consists of a single r-path with an s-path starting at each non-leaf
node and with A asserted for each leaf. Note that the number of individuals at
which a branching occurs is unbounded in such ABoxes.
The following theorem sums up the results obtained in this section, except for
the width hierarchy, which is Theorem 15 below.
Theorem 2. For every OMQ Q ∈ (EL, AQ), one of the following applies:
 1. Q is PTime-hard and not expressible in linear Datalog;
 2. Q is rewritable into linear Datalog and thus in NL.
Bounded branching of Q implies linear Datalog rewritability and delineates the
two cases.
Note that Theorem 2 implies that any OMQ from (EL, AQ) is linear Datalog
rewritable if and only if it is in NL (unless NL = PTime). It is also interesting
to compare Theorem 2 with the result by [2] that there are Datalog-queries that
are not expressible as a linear Datalog program, but belong to N C 2 and are thus
unlikely to be PTime-hard.
    The proof of Theorem 2 procceds as follows: in subsection 3.1, we prove that
unbounded branching implies PTime-hardness and in subsection 3.2 we show
that bounded branching is equivalent to linear Datalog rewritability.
3.1    Characterizations and PTime-Hardness

Theorem 2 provides a characterization of PTime-hardness in terms of unbounded
branching that is elegant, but does not lend itself to hardness proofs very well. For
this reason, we establish a second characterization designed to enable a reduction
from the PTime-complete path systems accessibility (PSA) problem and show
that both characterizations are equivalent. The new characterization will also be
handy later on to decide the rewritability of OMQs into linear Datalog.
    An instance of PSA takes the form G = (V, E, S, t) where V is a finite set of
nodes, E is a ternary relation on V , S ⊆ V is a set of source nodes, and t ∈ V
is a target node. G is a yes instance if t is accessible, where a node v ∈ V is
accessible if v ∈ S or there are accessible nodes u, w with (u, w, v) ∈ E.
    Before we can state the new characterization, we need some preliminaries.
Let T be a TBox. A T -type is a set t of concept names from T that is closed
under T -consequence, that is, if T |= u t v A, then A ∈ t. For any ABox A
and a ∈ Ind(A), we use tpA,T (a) to denote the set of concept names A from T
such that A, T |= A(a), which is a T -type. If M is a set of concept names, then
by M (a) we denote the ABox {A(a) | A ∈ M }. We also write A, T |= M (a),
meaning that A, T |= A(a) for all A ∈ M . For every tree-shaped ABox A and
a ∈ Ind(A), we use Aa to denote the sub-tree ABox of A that has a as the root.
Moreover, we use Aa to denote A \ Aa , that is, the ABox obtained from A by
removing all assertions that involve descendants of a (making a a leaf) and all
assertions of the form A(a). We also combine these notations, writing for example
Aabc for ((Aa )b )c .

Definition 3. An OMQ (T , Σ, A0 (x)) ∈ (EL, AQ) has the ability to simulate
PSA if there are T -types t0 , t1 and a tree-shaped Σ-ABox A with root a and
distinguished non-root individuals b, c, d where c and d are distinct incomparable
descendants of b such that

 1. A, T |= A0 (a);
 2. t1 = tpA,T (b) = tpA,T (c) = tpA,T (d);
 3. Ab ∪ t0 (b), T 6|= A0 (a);
 4. tpAc ∪t0 (c),T (b) = tpAd ∪t0 (d),T (b) = t0 .

We define Afinish := Ab , A∧ := Abcd and Astart := Ab .


Example 4. The OMQ Q1 from Example 1 has the ability to simulate PSA.
Figure 1 shows a witnessing ABox A according to Definition 3 where t1 =
{A, B1 , B2 } and t0 = {B2 }.

PSA is PTime-hard under FO-reductions [18]. Using a reduction from this probem,
we show that having the ability to simulate PSA is sufficient for PTime-hardness
under FO-reductions. In particular, we use the ABox A∧ from Definition 3 to
implement an “and” gate where t0 and t1 represent the truth values zero and
one to capture the behaviour of the ternary relation E in PSA.
                                                                   a
                                                           r           s
                                           b
                                   r                   s                   A

                           r           s                       A
                       c                   d
                   r           s   r           s
               A           A           A           A

                               Fig. 1. Witness ABox for Example 4

Lemma 5. If Q ∈ (EL, AQ) has the ability to simulate PSA, then Q is PTime-
hard under FO-reductions.
To link Lemma 5 to Theorem 2, we next show that the ability to simulate PSA
is equivalent to unbounded branching.

Proposition 6. Let Q ∈ (EL, AQ). Then Q has the ability to simulate PSA iff
Q is unboundedly branching.
The “⇒” direction is proved by taking an ABox A that witnesses the ability
to simulate PSA and then glueing together disjoint copies of A∧ to obtain tree-
shaped ABoxes whose root is an answer to Q, which are minimal with this
property, and that contain deeper and deeper full binary trees as a minor. The
“⇐” direction is based on a combinatorial argument: if we take a minimal tree-
shaped ABox that makes Q true at the root and contains a deep full binary
tree as a minor, then it must contain an ABox that witnesses the ability to
simulate PSA.


3.2   NL and Linear Datalog-Rewritability

We show that bounded branching characterizes containment in NL as well as
linear Datalog rewritability, which therefore coincide (unless NL = PTime). We
also give a way to construct linear Datalog rewritings when they exist.

Proposition 7. Let Q ∈ (EL, AQ). Then Q is boundedly branching iff Q is
rewritable into a linear Datalog program. Moreover, if the branching limit of Q is
k, then there is a linear Datalog rewriting of width k + 1.

Direction “⇒”. Let Q = (T , Σ, A0 (x)) be an OMQ from (EL, AQ). For each
k > 0, we construct a linear Datalog program ΠQ,k that is sound as a rewriting
of Q and complete on ABoxes that do not have the full binary tree of depth k
as a minor. The program ΠQ,k uses IDB relations of the form Pt1 ,...,tm where
t1 , . . . , tm , are T -types; the arity of this relation is m ≤ k. For any finite set S
of concepts, we use clT (S) to denote the smallest (w.r.t. set inclusion) T -type t
with T |= u S v t. Let N be the set of all concept names from T . The program
ΠQ,k consists of five types of rules:
Start
V       rules: PclT (S) (x) ← S(x) for all S ⊆ N and where S(x) abbreviates
  A∈S  A(x);
Extension rules: Pt1 ,...,tm ,clT (S) (x1 , . . . , xm , y) ← Pt1 ,...,tm (x1 , . . . , xm ) ∧ S(y) for
all S ⊆ N and T -types t1 , . . . , tm ;
Step rules: Pt1 ,...,tm−1 ,t (x1 , . . . , xm−1 , y) ← Pt1 ,...,tm (x1 , . . . , xm ) ∧ r(y, xm ) ∧ S(y)
for all S ⊆ N and T -types t1 , . . . , tm where t = clT (S ∪ {∃r.A | A ∈ tm });
Consolidation rules: Pt1 ,...,tm−2 ,t (x1 , . . . , xm−1 ) ← Pt1 ,...,tm (x1 , . . . , xm−1 , xm−1 )
for all S ⊆ N and T -types t1 , . . . , tm , t where t = clT (tm−1 ∪ tm );
Goal rules: goal(x) ← Pt (x) for all T -types t with A0 ∈ t.
Example 8. We give a fragment of the program ΠQ2 ,2 for the OMQ Q2 from
Example 1 that is equivalent to the full ΠQ2 ,2 and showcases the purpose of
the different rules. For readability, we use representative concept names in the
subscript of IDB relations instead of types:
           PA (x) ← A(x)                            PB1 (x) ← r(x, y) ∧ PA (y)
     PB1 ,A (x, y) ← PB1 (x) ∧ A(y)           PB1 ,B2 (x, y) ← s(y, z) ∧ PB1 ,A (x, z)
    PB1 ,B2 (x, y) ← s(y, z) ∧ PB1 ,B2 (x, z)      PB12 (x) ← PB1 ,B2 (x, x)
          PB1 (x) ← r(x, y) ∧ PB12 (y)             goal(x) ← PB12 (x)

It can be verified that the program ΠQ,k is sound, that is, A |= ΠQ,k (a) implies
A |= Q(a) for any Σ-ABox A. The following is a form of completeness.
Lemma 9. If A is a tree-shaped ABox with root a0 that does not have the full
binary tree of depth k as a minor and A, T |= A0 (a0 ), then A |= ΠQ,k (a0 ).
Lemma 9 is proved by exhibiting a suitable strategy for applying the rules in ΠQ,k .
Returning to the “⇒” direction of Proposition 7, we next show the following.
Lemma 10. If k − 1 is the branching limit of Q, then ΠQ,k is a rewriting of Q.
    The programs ΠQ,k allow us to construct a linear Datalog rewriting of an
OMQ Q provided that we know an upper bound on its branching limit. The
following lemma establishes such an upper bound (in case that Q is rewritable
into linear Datalog at all).
Lemma 11. If Q = (T , Σ, A0 (x)) ∈ (EL, AQ) is boundedly branching, then its
                             |T |+1
branching limit is at most 24       .
It can be verified that Lemma 11 is a consequence of the proof of Proposition 6.
Lemma 11 almost yields decidability of linear Datalog rewritability: guess a
tree-shaped Σ-ABox A and verify that it satisfies Conditions 1 and 2 from the
definition of k-branching, where k is the bound from Lemma 11. For this to work,
we would additionally have to bound the depth and degree of the tree-shaped
ABoxes to be guessed. While this is not too difficult, we follow a different route
(in Section 5) to obtain tight complexity bounds.
Direction “⇐”. For d, k, n ≥ 0, let `kd (n) denote the maximum number of leaves
in any tree that has degree d, depth n, and does not have as a minor the full
binary tree of depth k + 1. The following lemma says that `kd (n) as a function of
n grows like a polynomial of degree k.
Lemma 12. (d − 1)k (n − k)k ≤ `kd (n) ≤ (k + 1)(d − 1)k nk for all d, k ≥ 0 and
n ≥ 2k.
   Let Π be a Datalog program over EDB signature Σ and IDB signature ΣI ,
and let A a Σ-ABox. It is standard to characterize answers to Π in terms of
derivations that take the form of a labelled tree, see [1] or the appendix. From
each derivation D, one can read off an ABox AD in a standard way such that
the properties summarized by the following lemma are satisfied.
Lemma 13. Let D be a derivation of Π(a) in A, Π of diameter d. Then
 1. AD |= Π(a);
 2. there is a homomorphism h from AD to A with h(a) = a;
 3. AD has pathwidth at most d.
      We are now ready to prove the desired result.
Lemma 14. If Q ∈ (EL, AQ) is unboundedly branching, then it is not rewritable
into a linear Datalog program.
    The proof, inspired by [2], is by contradiction. Assume that Q ∈ (EL, AQ) is
unboundedly branching, but rewritable into a linear Datalog program Π. We
choose a minimal tree-shaped Σ-ABox A that contains a full binary tree of very
large depth as a minor and such that A |= Q(a0 ), a0 the root of A. Consider
the derivation D of Π(a0 ) in A and the associated ABox AD . By a sequence
of manipulations, we identify a tree-shaped sub-ABox B ⊆ AD such that B has
a very large number of leaves (a consequence of Point 2 of Lemma 13 and the
fact that the homomorphism must be surjective due to the minimality of A and
Point 1 of that lemma). By Lemma 12, it follows that B must contain a full
binary tree of large depth as a minor and therefore must have high pathwidth, in
contrast to Point 3 of Lemma 13.
    This finishes the proof of Proposition 7 and thus of Theorem 2.

3.3     Width Hierarchy
The linear Datalog rewritings constructed in the previous section are of unbounded
width. We next show that this is unavoidable, in contrast to the fact that every
OMQ from (EL, AQ) can be rewritten into a monadic Datalog program [4]. It
strengthens a result by [12] who establish an analogous statement for constraint
satisfaction problems (CSPs). However, while every OMQ from (EL, AQ) is
equivalent to a CSP (up to complementation [6]), the converse is false and indeed
the CSPs used by Dalmau and Krokhin are not equivalent to an OMQ from
(EL, AQ).
Theorem 15. For every ` > 0, there is an OMQ from (EL, AQ) that is rewritable
into linear Datalog, but not into a linear Datalog program of width `.
                                            u                       t
                                u           t                       u   t

                      r         s           u       t       u       t
                  r       s r       s   r       s       r       s


Fig. 2. An ABox of depth 4 whose root is an answer to Q2 and which is minimal with
this property. It has 11 leaves, the largest number of leaves that a binary tree of depth
4 can have, unless it contains the full binary tree of depth 3 as a minor.

To prove Theorem 15, we use the following queries: for all k ≥ 1, let Qk =
(Tk , Σ, Ak (x)) where Σ = {r, s, t, u} and

              Tk = {> v A0 } ∪
                   {∃x.Ai v Bx,i | x ∈ {r, s, t, u}, 0 ≤ i ≤ k − 1} ∪
                   {∃x.Bx,i v Bx,i | x ∈ {r, s, t, u}, 0 ≤ i ≤ k − 1} ∪
                   {Br,i u Bs,i v Ai+1 | 0 ≤ i ≤ k − 1} ∪
                   {Bt,i u Bu,i+1 v Ai+1 | 0 ≤ i ≤ k − 1}.

In the OMQ Qk , each concept name Ai , i ≤ k, represents the existence of a
full binary tree of depth i, that is, if Ai is derived at the root of a tree-shaped
Σ-ABox A, then A contains the full binary tree of depth i as a minor. Thus,
deriving Qk at the root implies that A has the full binary tree of depth k as a
minor. Furthermore, for every n ≥ k there is minimal tree-shaped Σ-ABox A
such that Qk is derived at the root, A is of depth n, and A has the maximum
number of leaves that any tree of depth n without the full binary tree of depth
k + 1 as a minor can have. For the case k = 2 and n = 4, such an ABox is shown
in Figure 2. The concept inclusions ∃x.Bx,i v Bx,i in Tk ensure that Qk is closed
under subdivisions of ABoxes, that is, if A is a Σ-ABox and A0 is obtained from
A by subdividing an edge into a path (using the same role name as the original
edge), then A |= Qk (a) iff A0 |= Qk (a) for all a ∈ Ind(A).
Lemma 16. Every Qk is rewritable into linear Datalog.
    We prove Lemma 16 by showing that each Qk is boundedly branching with
branching limit k and using Proposition 7.
    To show that linear Datalog rewritings of the defined family of OMQs require
unbounded width, we first show that they require unbounded diameter and then
proceed by showing that the width of rewritings cannot be significantly smaller
than the required diameter. To make the latter step work, we actually show
the former on an infinite family of classes of ABoxes of restricted shape. More
precisely, for all i ≥ 0 we consider the class Ci of all forest-shaped Σ-ABoxes
in which the distance between any two branching individuals exceeds i (where
a forest is a disjoint union of trees and a branching individual is one that has
at least two successors). Since the queries Qk are closed under the subdivision
of ABoxes, each class Ci contains ABoxes whose root is an answer to the query.
In summary, we obtain the following result. We are now ready to establish the
hierarchy.
Proposition 17. Q8`+13 is not rewritable into a linear Datalog program of
width `.

4    AC0 vs. NL: Completing the Trichotomy
We say that an OMQ Q = (T , Σ, A0 (x)) has unbounded depth if for every k ≥ 0,
there is a tree-shaped ABox A with depth at least k and root a such that
A, T |= A0 (a) and A is minimal with this property (regarding set inclusion). The
following theorem summarizes the results in this section.
Theorem 18. For every OMQ Q ∈ (EL, AQ), one of the following applies:
 1. Q is FO-rewritable and thus in AC0 .
 2. Q is not FO-rewritable and NL-hard.
Unbounded depth of Q implies NL-hardness and delineates the two cases.
The following characterization of FO-rewritability was established in [8].
Theorem 19. Let Q ∈ (EL, AQ). Q is not FO-rewritable iff Q has unbounded
depth.
To prove Theorem 18, it thus remains to show that unbounded depth implies
NL-hardness. Similarly to the case of PTime-hardness, the elegant condition of
unbounded depth does not directly lend itself to hardness proofs, and we thus
establish a second and equivalent characterization. Here, the second characteriza-
tion is tailored towards NL-hardness proofs via reduction from reachability in
directed graphs (REACH).
Definition 20. An OMQ (T , Σ, A0 (x)) ∈ (EL, AQ) has the ability to simulate
REACH iff there are T -types t0 ( t1 and a tree-shaped ABox A with root a and
distinguished non-root individuals b, c where c is a descendant of b such that
 1. A, T |= A0 (a),
 2. t1 = tpA,T (b) = tpA,T (c),
 3. tpAc ∪t0 (c),T (b) = t0 , and
 4. Ab ∪ t0 (b), T 6|= A0 (a).
We define Afinish = Ab , Aedge = Abc , and Astart = Ac .
The three defined sub-ABoxes can be used in a reduction from REACH to Q. We
now prove that unbounded depth implies NL-hardness, proceeding via the ability
to simulate REACH. The following lemma is essentially implicit already in [8].
Lemma 21. Let Q ∈ (EL, AQ). If Q has unbounded depth, then Q has the ability
to simulate REACH.
    The next lemma is proved similarly to Lemma 5.
Lemma 22. Let Q ∈ (EL, AQ). If Q has the ability to simulate REACH, then
Q is NL-hard under FO-reductions.
    We have completed the proof of Theorem 18, and thus also of the trichotomy.
5    Decidability and Complexity
We first show that an existing reduction in [8] yields a variety of relevant hardness
results, under various complexity-theoretic assumptions.
Theorem 23. The following properties of OMQs from (EL, AQ) are ExpTime-
hard: linear Datalog rewritability, containment in NL (unless NL = PTime),
NL-hardness (unless L = NL), and PTime-hardness (unless L = PTime).
    Regarding upper bounds, we first recall the known result that it is in ExpTime
to decide whether an OMQ from (EL, AQ) is FO-rewritable [8] and observe that,
by Theorem 18, we also obtain an ExpTime upper bound for NL-hardness. For
linear Datalog rewritability, containment in NL, and PTime-hardness, we use an
approach based on (one-way) alternating parity automata on finite trees (APTAs).
Because of space constraints, we can only give a brief sketch. By Theorem 2
and Proposition 6, it suffices to decide whether a given OMQ has the ability
to simulate PSA, that is, whether there are T -types t0 , t1 and a tree-shaped
Σ-ABox A that satisfy the conditions from Definition 3. We iterate over all
choices for t0 , t1 , building for each choice an APTA At0 ,t1 that accepts precisely
the tree-shaped Σ-ABoxes satisfying the required conditions for the chosen t0 , t1 .
Theorem 24. It is in ExpTime to decide whether a given OMQ from (EL, AQ)
is rewritable into linear Datalog.
Interestingly, it is rather unclear how an ExpTime upper bound would be
established based on the characterization in terms of bounded branching. The
following corollary sums up the results obtained in this section.
Corollary 25. For OMQs from (EL, AQ), all of the following problems are
ExpTime-complete (under the same complexity theoretic assumptions for the
lower bounds as in Theorem 23): linear Datalog rewritability, containment in NL,
NL-hardness, and PTime-hardness.
Note that Theorem 24 and the results from Section 3.2 give an algorithm that
provides a linear Datalog rewriting of a given OMQ if it exists and reports failure
otherwise.


6    Conclusion
As future work, we plan to extend our analysis to (ELI, AQ) where ELI is the
extension of EL with inverse roles. It is clear that the overall picture changes
because there are OMQs from (ELI, AQ) that are L-complete. This also raises
the question whether L-completeness coincides with rewritability into symmetric
Datalog [13]. It should be noted, though, that even lifting to (ELI, AQ) the
results established in this paper such as the dichotomy between NL and PTime is
non-trivial. It would also be interesting to replace AQs with conjunctive queries,
or even to move directly to frontier-one tuple generating dependencies [5].
Acknowledgements. Funded by ERC consolidator grant 647289 ‘CODA’.
References
 1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley
    (1995)
 2. Afrati, F.N., Cosmadakis, S.S.: Expressiveness of restricted recursive queries (ex-
    tended abstract). In: Proc. of STOC. pp. 113–126 (1989)
 3. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Proc. of IJCAI. pp.
    364–369 (2005)
 4. Baader, F., Horrocks, I., Lutz, C., Sattler, U.: An Introduction to Description
    Logics. Cambride University Press (2017)
 5. Baget, J., Leclère, M., Mugnier, M., Salvat, E.: Extending decidable cases for rules
    with existential variables. In: Proc. of IJCAI. pp. 677–682 (2009)
 6. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A
    study through disjunctive datalog, CSP, and MMSNP. ACM Trans. Database Syst.
    39(4), 33:1–33:44 (2014)
 7. Bienvenu, M., Hansen, P., Lutz, C., Wolter, F.: First order-rewritability and con-
    tainment of conjunctive queries in horn description logics. In: Proc. of IJCAI
    (2016)
 8. Bienvenu, M., Lutz, C., Wolter, F.: First order-rewritability of atomic queries in
    horn description logics. In: Proc. of IJCAI (2013)
 9. Bienvenu, M., Ortiz, M.: Ontology-mediated query answering with data-tractable
    description logics. In: Proc. of Reasoning Web. LNCS, vol. 9203, pp. 218–307.
    Springer (2015)
10. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Poggi, A., Rodriguez-
    Muro, M., Rosati, R.: Ontologies and databases: The DL-Lite approach. In: Proc.
    of Reasoning Web 2009. pp. 255–356 (2009)
11. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Data com-
    plexity of query answering in description logics. Artif. Intell. 195, 335–360 (2013)
12. Dalmau, V., Krokhin, A.A.: Majority constraints have bounded pathwidth duality.
    Eur. J. Comb. 29(4), 821–837 (2008)
13. Egri, L., Larose, B., Tesson, P.: Symmetric datalog and constraint satisfaction prob-
    lems in LogSpace. Electronic Colloquium on Computational Complexity (ECCC)
    14(024) (2007)
14. Eiter, T., Ortiz, M., Simkus, M., Tran, T., Xiao, G.: Query rewriting for Horn-SHIQ
    plus rules. In: Proc. of AAAI. AAAI Press (2012)
15. Feier, C., Kuusisto, A., Lutz, C.: Rewritability in monadic disjunctive datalog,
    MMSNP, and expressive description logics. In: Proc. of ICDT (2017)
16. Hansen, P., Lutz, C., İnanç Seylan, Wolter, F.: Efficient query rewriting in the
    description logic EL and beyond. In: Proc. of IJCAI (2015)
17. Hustadt, U., Motik, B., Sattler, U.: Data complexity of reasoning in very expressive
    description logics. In: Proc. of IJCAI. pp. 466–471. Professional Book Center (2005)
18. Immerman, N.: Descriptive complexity. Graduate texts in computer science, Springer
    (1999)
19. Kaminski, M., Nenov, Y., Grau, B.C.: Datalog rewritability of disjunctive datalog
    programs and its applications to ontology reasoning. In: Proc. of AAAI. pp. 1077–
    1083. AAAI Press (2014)
20. Kontchakov, R., Rodriguez-Muro, M., Zakharyaschev, M.: Ontology-based data
    access with databases: A short course. In: Reasoning Web. pp. 194–229 (2013)
21. Krisnadhi, A., Lutz, C.: Data complexity in the EL family of description logics. In:
    Dershowitz, N., Voronkov, A. (eds.) Proc. of LPAR. LNAI, vol. 4790, pp. 333–347.
    Springer (2007)
22. Lutz, C., Sabellek, L.: Ontology-mediated querying with the description logic el:
    Trichotomy and linear datalog rewritability. In: Proceedings of the 26th International
    Joint Conference on Artificial Intelligence (IJCAI-17) (2017)
23. Pérez-Urbina, H., Motik, B., Horrocks, I.: Tractable query answering and rewriting
    under description logic constraints. Journal of Applied Logic 8(2), 186–209 (2010)
24. Rosati, R.: The limits of querying ontologies. In: Proc. of ICDT. LNCS, vol. 4353,
    pp. 164–178. Springer (2007)
25. Trivela, D., Stoilos, G., Chortaras, A., Stamou, G.B.: Optimising resolution-based
    rewriting algorithms for OWL ontologies. J. Web Sem. 33, 30–49 (2015)