Ontology-Mediated Querying with EL: Trichotomy and Linear Datalog Rewritability Carsten Lutz and Leif Sabellek Department of Computer Science, University of Bremen, Germany Abstract. We consider ontology-mediated queries (OMQs) based on an EL ontology and an atomic query (AQ), provide an ultimately fine- grained analysis of data complexity and study rewritability into linear Datalog – aiming to capture linear recursion in SQL. Our main results are that every such OMQ is in AC0 , NL-complete or PTime-complete, and that containment in NL coincides with rewritability into linear Datalog (whereas containment in AC0 coincides with rewritability into first-order logic). We establish natural characterizations of the three cases, show that deciding linear Datalog rewritability (as well as the mentioned complexities) is ExpTime-complete, give a way to construct linear Datalog rewritings when they exist, and prove that there is no constant bound on the arity of IDB relations in linear Datalog rewritings. 1 Introduction An important application of ontologies is to enrich data with a semantics and with domain knowledge while also providing additional vocabulary for query formulation [10, 20, 6, 9]. The combination of a traditional database query and an ontology can be viewed as a compound query, commonly referred to as an ontology- mediated query (OMQ). Substantial research efforts have been invested into studying OMQs based on description logic (DL) ontologies, with two dominating topics being the data complexity of OMQs [17, 21, 24, 11] and their rewritability into more standard database query languages such as SQL (which in this context is often equated with first-order logic) and Datalog [23, 14, 8, 6, 19, 25, 15]. While the former topic aims to understand the feasibility of OMQs from a theoretical angle, the latter is inspired by rather practical concerns: since most database systems are unaware of ontologies, rewriting OMQs into standard query languages provides an important avenue for implementing OMQ execution in practical applications; however, a major challenge emerges from the fact that the desired rewritings are typically not guaranteed to always exist, though they often do exist in practically relevant cases. Both topics are thoroughly intertwined since rewritability into first-order logic (FO) is closely related to AC0 data complexity while rewritability into Datalog is closely related to PTime data complexity. Modern DLs can roughly be divided into two families: ‘expressive DLs’ such as ALC and SHIQ which typically have coNP data complexity and where rewritability is guaranteed neither into FO nor into Datalog [6, 25, 15], and ‘Horn DLs’ such as EL and Horn-SHIQ which typically have PTime data complexity and where rewritability into Datalog is guaranteed, but FO-rewritability is not [8, 16, 7] (with the notable exception of DL-Lite [10]). In this paper, we consider the OMQ language (EL, AQ) where the ontology is formulated in the Horn description logic EL and where the actual queries are atomic queries (AQs) of the form A(x), studying data complexity, rewritability, and their relations. Our actual contribution is two-fold. First, we carry out an ultimately fine-grained analysis of data complexity. In fact, we establish a trichotomy, showing that every OMQ from (EL, AQ) is in AC0 , NL-complete, or PTime-complete, a remarkable sparseness of complexities. We also establish elegant characterizations that separate the three classes of OMQs. In particular, we show that an OMQ Q is in NL if there is a bound k such that any minimal tree-shaped ABox A whose root is an answer to the OMQ Q does not contain a full binary tree of depth k as a minor, and PTime-hard otherwise. We additionally use a second, more operational characterization to determine the precise complexity of deciding whether a given OMQ is in AC0 , NL-complete, or PTime-complete, which turns out to be ExpTime-complete. And second, we put rewritability into linear Datalog onto the agenda of OMQ research. In fact, the equation “SQL = FO” often adopted in this area ignores the fact that SQL contains linear recursion from its version 3 published in 1999 on, which exceeds the expressive power of FO. We believe that, in the context of OMQs, linear Datalog is a natural abstraction of SQL that includes linear recursion, despite the fact that it does not contain full FO. Indeed, all OMQs from (EL, AQ) that are FO-rewritable are also rewritable into a union of conjunctive queries (UCQ) and thus into linear Datalog (and the same is true for much more expressive OMQ languages) [6]. This shows that the expressive power of FO that lies outside of linear Datalog is not useful when using SQL as a target language for OMQ rewriting. We prove that rewritability into linear Datalog coincides with containment in NL. By what was said above, it is thus ExpTime-complete to decide whether a given OMQ is rewritable. Moreover, we show how to construct linear Datalog rewritings when they exist and prove that there is no constant bound on the arity of IDB relations in linear Datalog rewritings. This paper is a workshop version of [22]. For proof details, we refer the reader to the appendix of that paper, available at http://www.informatik.uni- bremen.de/tdki/research/papers.html. 2 Preliminaries Let NC , NR , and NI be countably infinite sets of concept names, role names, and individual names. An EL-concept is built according to the syntax rule C, D ::= > | A | C u D | ∃r.C where A ranges over concept names and r over role names. An EL-TBox is a finite set of concept inclusions (CIs) of the form C v D, C and D EL-concepts. The size of T , denoted |T |, is the number of symbols needed to write T , concept and role names counting as one symbol. An ABox is a finite set of concept assertions A(a) and role assertions r(a, b) where A is a concept name, r a role name, and a, b individual names. We use Ind(A) to denote the set of individuals of the ABox A. A signature is a set of concept and role names. An ABox that only uses concept and role names from a signature Σ is a Σ-ABox. The semantics of DLs is defined in the usual way. An atomic query (AQ) takes the form A(x), A a concept name. An ontology- mediated query (OMQ) is a triple Q = (T , Σ, A(x)) with T a TBox, Σ an ABox signature, and A(x) an AQ. We assume w.l.o.g. that A occurs in T . Let A be a Σ-ABox. We write A |= Q(a) and say that a is an answer to Q on A if A, T |= A(a). The evaluation problem for Q is to decide, given a Σ-ABox A and an a ∈ A, whether A |= Q(a). With the complexity of an OMQ Q, we generally mean its evaluation problem. We use (EL, AQ) to denote the set of all OMQs (T , Σ, A(x)) where T is an EL-TBox. All OMQs in (EL, AQ) are in PTime [24, 21]. We use standard notation for Datalog programs, see for example [1]. A Datalog program is linear if each rule body contains at most one IDB relation. The width of a Datalog program is the maximum arity of non-goal IDB relations used in it and its diameter is the maximum number of variables that occur in a rule in Π. A Datalog program Π over EDB signature Σ is a rewriting of an OMQ Q = (T , Σ, A(x)) if for all Σ-ABoxes A and all a ∈ Ind(A), we have A |= Q(a) iff A |= Π(a). We say that Q is (linear) Datalog-rewritable if there is a (linear) Datalog program that is a rewriting of Q. Throughout the paper, we assume w.l.o.g. and without further notice that TBoxes are in normal form, that is, they contain only concept inclusions of the form ∃r.A1 v A2 , > v A1 , A1 u A2 v A3 , A1 v ∃r.A2 where all Ai are concept names [3]. We shall often deal with ABoxes that are tree-shaped. By a tree, we gen- erally mean a directed (unlabelled) tree T = (V, E), defined in the usual way. Every ABox gives rise to a directed graph GA = (Ind(A), {(a, b) | r(a, b) ∈ A for some r}). We say that A is tree-shaped if GA is a tree and r(a, b), s(a, b) ∈ A implies r = s. We introduce some further standard graph theoretic notions for ABoxes. A homomorphism from an ABox A1 to an ABox A2 is a total func- tion h : Ind(A1 ) → Ind(A2 ) such that A(a) ∈ A1 implies A(h(a)) ∈ A2 and r(a, b) ∈ A1 implies r(h(a), h(b)) ∈ A2 . We write A1 → A2 if there is a homomor- phism from A1 to A2 . A directed graph G = (V, E) is a minor of an ABox A if G is a minor of GA , that is, if G can be obtained from GA by deleting edges and vertices and by contracting edges. A path decomposition of a directed graph (V, E) is a sequence S1 , . . . , Sn of subsets of V such that for every (a, b) ∈ E there is a set Si with a, b ∈ Si and Si ∩ Sk ⊆ Sj , for all i ≤ j ≤ k. A path decomposition is an (`, k)-path decomposition if k = maxni=1 |Si | and ` = maxn−1 i=1 |Si ∩ Si+1 |. The pathwidth of a directed graph (V, E) is the smallest k such that (V, E) has an N (`, k + 1)-path decomposition for some ` ∈ . We identify the pathwidth of an ABox A with the pathwidth of GA . 3 NL, PTime, Linear Datalog Rewritability We start with establishing a dichotomy between PTime and NL for evaluating queries from (EL, AQ), also showing that containment in NL coincides with rewritability into linear Datalog (unless NL = PTime). The dichotomy is based on a characterization of containment in NL via a ‘bounded amount of branching’ in ABoxes whose root is an answer to the query. The linear Datalog programs constructed in the proofs are of unbounded width. We prove a hierarchy theorem which shows that this is unavoidable. Let Q = (T , Σ, A0 (x)) be an OMQ. We say that Q is unboundedly branching if for every k ≥ 0, there is a tree-shaped Σ-ABox A such that 1. A, T |= A0 (a), a the root of A, and A is minimal with this property (w.r.t. set inclusion) 2. A has the full binary tree of depth k as a minor. Otherwise, Q is boundedly branching. In the latter case, the branching limit of Q is the maximum integer k such that there is a tree-shaped Σ-ABox A that satisfies Conditions 1 and 2 above. We define the branching limit to be 0 if there is no tree-shaped Σ-ABox A that satisfies Condition 1. Example 1. (1) The OMQ Q1 = (T1 , {A, r, s}, A(x)) with T1 = {∃r.A v B1 , ∃s.A v B2 , B1 u B2 v A} is unboundedly branching as witnessed by the ABoxes A1 , A2 , . . . where Ai is a full binary tree of depth i, each left successor connected via the role name r, each right successor via the role name s, and with the concept name A asserted for each leaf. (2) The OMQ Q2 = (T2 , {A, r, s}, B12 (x)) with T2 = {∃r.A v B1 , ∃s.A v B2 , ∃s.B2 v B2 , B1 u B2 v B12 , ∃r.B12 v B1 } is boundedly branching with branching limit one. In fact, every minimal tree-shaped Σ-ABox whose root is an answer to Q2 consists of a single r-path with an s-path starting at each non-leaf node and with A asserted for each leaf. Note that the number of individuals at which a branching occurs is unbounded in such ABoxes. The following theorem sums up the results obtained in this section, except for the width hierarchy, which is Theorem 15 below. Theorem 2. For every OMQ Q ∈ (EL, AQ), one of the following applies: 1. Q is PTime-hard and not expressible in linear Datalog; 2. Q is rewritable into linear Datalog and thus in NL. Bounded branching of Q implies linear Datalog rewritability and delineates the two cases. Note that Theorem 2 implies that any OMQ from (EL, AQ) is linear Datalog rewritable if and only if it is in NL (unless NL = PTime). It is also interesting to compare Theorem 2 with the result by [2] that there are Datalog-queries that are not expressible as a linear Datalog program, but belong to N C 2 and are thus unlikely to be PTime-hard. The proof of Theorem 2 procceds as follows: in subsection 3.1, we prove that unbounded branching implies PTime-hardness and in subsection 3.2 we show that bounded branching is equivalent to linear Datalog rewritability. 3.1 Characterizations and PTime-Hardness Theorem 2 provides a characterization of PTime-hardness in terms of unbounded branching that is elegant, but does not lend itself to hardness proofs very well. For this reason, we establish a second characterization designed to enable a reduction from the PTime-complete path systems accessibility (PSA) problem and show that both characterizations are equivalent. The new characterization will also be handy later on to decide the rewritability of OMQs into linear Datalog. An instance of PSA takes the form G = (V, E, S, t) where V is a finite set of nodes, E is a ternary relation on V , S ⊆ V is a set of source nodes, and t ∈ V is a target node. G is a yes instance if t is accessible, where a node v ∈ V is accessible if v ∈ S or there are accessible nodes u, w with (u, w, v) ∈ E. Before we can state the new characterization, we need some preliminaries. Let T be a TBox. A T -type is a set t of concept names from T that is closed under T -consequence, that is, if T |= u t v A, then A ∈ t. For any ABox A and a ∈ Ind(A), we use tpA,T (a) to denote the set of concept names A from T such that A, T |= A(a), which is a T -type. If M is a set of concept names, then by M (a) we denote the ABox {A(a) | A ∈ M }. We also write A, T |= M (a), meaning that A, T |= A(a) for all A ∈ M . For every tree-shaped ABox A and a ∈ Ind(A), we use Aa to denote the sub-tree ABox of A that has a as the root. Moreover, we use Aa to denote A \ Aa , that is, the ABox obtained from A by removing all assertions that involve descendants of a (making a a leaf) and all assertions of the form A(a). We also combine these notations, writing for example Aabc for ((Aa )b )c . Definition 3. An OMQ (T , Σ, A0 (x)) ∈ (EL, AQ) has the ability to simulate PSA if there are T -types t0 , t1 and a tree-shaped Σ-ABox A with root a and distinguished non-root individuals b, c, d where c and d are distinct incomparable descendants of b such that 1. A, T |= A0 (a); 2. t1 = tpA,T (b) = tpA,T (c) = tpA,T (d); 3. Ab ∪ t0 (b), T 6|= A0 (a); 4. tpAc ∪t0 (c),T (b) = tpAd ∪t0 (d),T (b) = t0 . We define Afinish := Ab , A∧ := Abcd and Astart := Ab . Example 4. The OMQ Q1 from Example 1 has the ability to simulate PSA. Figure 1 shows a witnessing ABox A according to Definition 3 where t1 = {A, B1 , B2 } and t0 = {B2 }. PSA is PTime-hard under FO-reductions [18]. Using a reduction from this probem, we show that having the ability to simulate PSA is sufficient for PTime-hardness under FO-reductions. In particular, we use the ABox A∧ from Definition 3 to implement an “and” gate where t0 and t1 represent the truth values zero and one to capture the behaviour of the ternary relation E in PSA. a r s b r s A r s A c d r s r s A A A A Fig. 1. Witness ABox for Example 4 Lemma 5. If Q ∈ (EL, AQ) has the ability to simulate PSA, then Q is PTime- hard under FO-reductions. To link Lemma 5 to Theorem 2, we next show that the ability to simulate PSA is equivalent to unbounded branching. Proposition 6. Let Q ∈ (EL, AQ). Then Q has the ability to simulate PSA iff Q is unboundedly branching. The “⇒” direction is proved by taking an ABox A that witnesses the ability to simulate PSA and then glueing together disjoint copies of A∧ to obtain tree- shaped ABoxes whose root is an answer to Q, which are minimal with this property, and that contain deeper and deeper full binary trees as a minor. The “⇐” direction is based on a combinatorial argument: if we take a minimal tree- shaped ABox that makes Q true at the root and contains a deep full binary tree as a minor, then it must contain an ABox that witnesses the ability to simulate PSA. 3.2 NL and Linear Datalog-Rewritability We show that bounded branching characterizes containment in NL as well as linear Datalog rewritability, which therefore coincide (unless NL = PTime). We also give a way to construct linear Datalog rewritings when they exist. Proposition 7. Let Q ∈ (EL, AQ). Then Q is boundedly branching iff Q is rewritable into a linear Datalog program. Moreover, if the branching limit of Q is k, then there is a linear Datalog rewriting of width k + 1. Direction “⇒”. Let Q = (T , Σ, A0 (x)) be an OMQ from (EL, AQ). For each k > 0, we construct a linear Datalog program ΠQ,k that is sound as a rewriting of Q and complete on ABoxes that do not have the full binary tree of depth k as a minor. The program ΠQ,k uses IDB relations of the form Pt1 ,...,tm where t1 , . . . , tm , are T -types; the arity of this relation is m ≤ k. For any finite set S of concepts, we use clT (S) to denote the smallest (w.r.t. set inclusion) T -type t with T |= u S v t. Let N be the set of all concept names from T . The program ΠQ,k consists of five types of rules: Start V rules: PclT (S) (x) ← S(x) for all S ⊆ N and where S(x) abbreviates A∈S A(x); Extension rules: Pt1 ,...,tm ,clT (S) (x1 , . . . , xm , y) ← Pt1 ,...,tm (x1 , . . . , xm ) ∧ S(y) for all S ⊆ N and T -types t1 , . . . , tm ; Step rules: Pt1 ,...,tm−1 ,t (x1 , . . . , xm−1 , y) ← Pt1 ,...,tm (x1 , . . . , xm ) ∧ r(y, xm ) ∧ S(y) for all S ⊆ N and T -types t1 , . . . , tm where t = clT (S ∪ {∃r.A | A ∈ tm }); Consolidation rules: Pt1 ,...,tm−2 ,t (x1 , . . . , xm−1 ) ← Pt1 ,...,tm (x1 , . . . , xm−1 , xm−1 ) for all S ⊆ N and T -types t1 , . . . , tm , t where t = clT (tm−1 ∪ tm ); Goal rules: goal(x) ← Pt (x) for all T -types t with A0 ∈ t. Example 8. We give a fragment of the program ΠQ2 ,2 for the OMQ Q2 from Example 1 that is equivalent to the full ΠQ2 ,2 and showcases the purpose of the different rules. For readability, we use representative concept names in the subscript of IDB relations instead of types: PA (x) ← A(x) PB1 (x) ← r(x, y) ∧ PA (y) PB1 ,A (x, y) ← PB1 (x) ∧ A(y) PB1 ,B2 (x, y) ← s(y, z) ∧ PB1 ,A (x, z) PB1 ,B2 (x, y) ← s(y, z) ∧ PB1 ,B2 (x, z) PB12 (x) ← PB1 ,B2 (x, x) PB1 (x) ← r(x, y) ∧ PB12 (y) goal(x) ← PB12 (x) It can be verified that the program ΠQ,k is sound, that is, A |= ΠQ,k (a) implies A |= Q(a) for any Σ-ABox A. The following is a form of completeness. Lemma 9. If A is a tree-shaped ABox with root a0 that does not have the full binary tree of depth k as a minor and A, T |= A0 (a0 ), then A |= ΠQ,k (a0 ). Lemma 9 is proved by exhibiting a suitable strategy for applying the rules in ΠQ,k . Returning to the “⇒” direction of Proposition 7, we next show the following. Lemma 10. If k − 1 is the branching limit of Q, then ΠQ,k is a rewriting of Q. The programs ΠQ,k allow us to construct a linear Datalog rewriting of an OMQ Q provided that we know an upper bound on its branching limit. The following lemma establishes such an upper bound (in case that Q is rewritable into linear Datalog at all). Lemma 11. If Q = (T , Σ, A0 (x)) ∈ (EL, AQ) is boundedly branching, then its |T |+1 branching limit is at most 24 . It can be verified that Lemma 11 is a consequence of the proof of Proposition 6. Lemma 11 almost yields decidability of linear Datalog rewritability: guess a tree-shaped Σ-ABox A and verify that it satisfies Conditions 1 and 2 from the definition of k-branching, where k is the bound from Lemma 11. For this to work, we would additionally have to bound the depth and degree of the tree-shaped ABoxes to be guessed. While this is not too difficult, we follow a different route (in Section 5) to obtain tight complexity bounds. Direction “⇐”. For d, k, n ≥ 0, let `kd (n) denote the maximum number of leaves in any tree that has degree d, depth n, and does not have as a minor the full binary tree of depth k + 1. The following lemma says that `kd (n) as a function of n grows like a polynomial of degree k. Lemma 12. (d − 1)k (n − k)k ≤ `kd (n) ≤ (k + 1)(d − 1)k nk for all d, k ≥ 0 and n ≥ 2k. Let Π be a Datalog program over EDB signature Σ and IDB signature ΣI , and let A a Σ-ABox. It is standard to characterize answers to Π in terms of derivations that take the form of a labelled tree, see [1] or the appendix. From each derivation D, one can read off an ABox AD in a standard way such that the properties summarized by the following lemma are satisfied. Lemma 13. Let D be a derivation of Π(a) in A, Π of diameter d. Then 1. AD |= Π(a); 2. there is a homomorphism h from AD to A with h(a) = a; 3. AD has pathwidth at most d. We are now ready to prove the desired result. Lemma 14. If Q ∈ (EL, AQ) is unboundedly branching, then it is not rewritable into a linear Datalog program. The proof, inspired by [2], is by contradiction. Assume that Q ∈ (EL, AQ) is unboundedly branching, but rewritable into a linear Datalog program Π. We choose a minimal tree-shaped Σ-ABox A that contains a full binary tree of very large depth as a minor and such that A |= Q(a0 ), a0 the root of A. Consider the derivation D of Π(a0 ) in A and the associated ABox AD . By a sequence of manipulations, we identify a tree-shaped sub-ABox B ⊆ AD such that B has a very large number of leaves (a consequence of Point 2 of Lemma 13 and the fact that the homomorphism must be surjective due to the minimality of A and Point 1 of that lemma). By Lemma 12, it follows that B must contain a full binary tree of large depth as a minor and therefore must have high pathwidth, in contrast to Point 3 of Lemma 13. This finishes the proof of Proposition 7 and thus of Theorem 2. 3.3 Width Hierarchy The linear Datalog rewritings constructed in the previous section are of unbounded width. We next show that this is unavoidable, in contrast to the fact that every OMQ from (EL, AQ) can be rewritten into a monadic Datalog program [4]. It strengthens a result by [12] who establish an analogous statement for constraint satisfaction problems (CSPs). However, while every OMQ from (EL, AQ) is equivalent to a CSP (up to complementation [6]), the converse is false and indeed the CSPs used by Dalmau and Krokhin are not equivalent to an OMQ from (EL, AQ). Theorem 15. For every ` > 0, there is an OMQ from (EL, AQ) that is rewritable into linear Datalog, but not into a linear Datalog program of width `. u t u t u t r s u t u t r s r s r s r s Fig. 2. An ABox of depth 4 whose root is an answer to Q2 and which is minimal with this property. It has 11 leaves, the largest number of leaves that a binary tree of depth 4 can have, unless it contains the full binary tree of depth 3 as a minor. To prove Theorem 15, we use the following queries: for all k ≥ 1, let Qk = (Tk , Σ, Ak (x)) where Σ = {r, s, t, u} and Tk = {> v A0 } ∪ {∃x.Ai v Bx,i | x ∈ {r, s, t, u}, 0 ≤ i ≤ k − 1} ∪ {∃x.Bx,i v Bx,i | x ∈ {r, s, t, u}, 0 ≤ i ≤ k − 1} ∪ {Br,i u Bs,i v Ai+1 | 0 ≤ i ≤ k − 1} ∪ {Bt,i u Bu,i+1 v Ai+1 | 0 ≤ i ≤ k − 1}. In the OMQ Qk , each concept name Ai , i ≤ k, represents the existence of a full binary tree of depth i, that is, if Ai is derived at the root of a tree-shaped Σ-ABox A, then A contains the full binary tree of depth i as a minor. Thus, deriving Qk at the root implies that A has the full binary tree of depth k as a minor. Furthermore, for every n ≥ k there is minimal tree-shaped Σ-ABox A such that Qk is derived at the root, A is of depth n, and A has the maximum number of leaves that any tree of depth n without the full binary tree of depth k + 1 as a minor can have. For the case k = 2 and n = 4, such an ABox is shown in Figure 2. The concept inclusions ∃x.Bx,i v Bx,i in Tk ensure that Qk is closed under subdivisions of ABoxes, that is, if A is a Σ-ABox and A0 is obtained from A by subdividing an edge into a path (using the same role name as the original edge), then A |= Qk (a) iff A0 |= Qk (a) for all a ∈ Ind(A). Lemma 16. Every Qk is rewritable into linear Datalog. We prove Lemma 16 by showing that each Qk is boundedly branching with branching limit k and using Proposition 7. To show that linear Datalog rewritings of the defined family of OMQs require unbounded width, we first show that they require unbounded diameter and then proceed by showing that the width of rewritings cannot be significantly smaller than the required diameter. To make the latter step work, we actually show the former on an infinite family of classes of ABoxes of restricted shape. More precisely, for all i ≥ 0 we consider the class Ci of all forest-shaped Σ-ABoxes in which the distance between any two branching individuals exceeds i (where a forest is a disjoint union of trees and a branching individual is one that has at least two successors). Since the queries Qk are closed under the subdivision of ABoxes, each class Ci contains ABoxes whose root is an answer to the query. In summary, we obtain the following result. We are now ready to establish the hierarchy. Proposition 17. Q8`+13 is not rewritable into a linear Datalog program of width `. 4 AC0 vs. NL: Completing the Trichotomy We say that an OMQ Q = (T , Σ, A0 (x)) has unbounded depth if for every k ≥ 0, there is a tree-shaped ABox A with depth at least k and root a such that A, T |= A0 (a) and A is minimal with this property (regarding set inclusion). The following theorem summarizes the results in this section. Theorem 18. For every OMQ Q ∈ (EL, AQ), one of the following applies: 1. Q is FO-rewritable and thus in AC0 . 2. Q is not FO-rewritable and NL-hard. Unbounded depth of Q implies NL-hardness and delineates the two cases. The following characterization of FO-rewritability was established in [8]. Theorem 19. Let Q ∈ (EL, AQ). Q is not FO-rewritable iff Q has unbounded depth. To prove Theorem 18, it thus remains to show that unbounded depth implies NL-hardness. Similarly to the case of PTime-hardness, the elegant condition of unbounded depth does not directly lend itself to hardness proofs, and we thus establish a second and equivalent characterization. Here, the second characteriza- tion is tailored towards NL-hardness proofs via reduction from reachability in directed graphs (REACH). Definition 20. An OMQ (T , Σ, A0 (x)) ∈ (EL, AQ) has the ability to simulate REACH iff there are T -types t0 ( t1 and a tree-shaped ABox A with root a and distinguished non-root individuals b, c where c is a descendant of b such that 1. A, T |= A0 (a), 2. t1 = tpA,T (b) = tpA,T (c), 3. tpAc ∪t0 (c),T (b) = t0 , and 4. Ab ∪ t0 (b), T 6|= A0 (a). We define Afinish = Ab , Aedge = Abc , and Astart = Ac . The three defined sub-ABoxes can be used in a reduction from REACH to Q. We now prove that unbounded depth implies NL-hardness, proceeding via the ability to simulate REACH. The following lemma is essentially implicit already in [8]. Lemma 21. Let Q ∈ (EL, AQ). If Q has unbounded depth, then Q has the ability to simulate REACH. The next lemma is proved similarly to Lemma 5. Lemma 22. Let Q ∈ (EL, AQ). If Q has the ability to simulate REACH, then Q is NL-hard under FO-reductions. We have completed the proof of Theorem 18, and thus also of the trichotomy. 5 Decidability and Complexity We first show that an existing reduction in [8] yields a variety of relevant hardness results, under various complexity-theoretic assumptions. Theorem 23. The following properties of OMQs from (EL, AQ) are ExpTime- hard: linear Datalog rewritability, containment in NL (unless NL = PTime), NL-hardness (unless L = NL), and PTime-hardness (unless L = PTime). Regarding upper bounds, we first recall the known result that it is in ExpTime to decide whether an OMQ from (EL, AQ) is FO-rewritable [8] and observe that, by Theorem 18, we also obtain an ExpTime upper bound for NL-hardness. For linear Datalog rewritability, containment in NL, and PTime-hardness, we use an approach based on (one-way) alternating parity automata on finite trees (APTAs). Because of space constraints, we can only give a brief sketch. By Theorem 2 and Proposition 6, it suffices to decide whether a given OMQ has the ability to simulate PSA, that is, whether there are T -types t0 , t1 and a tree-shaped Σ-ABox A that satisfy the conditions from Definition 3. We iterate over all choices for t0 , t1 , building for each choice an APTA At0 ,t1 that accepts precisely the tree-shaped Σ-ABoxes satisfying the required conditions for the chosen t0 , t1 . Theorem 24. It is in ExpTime to decide whether a given OMQ from (EL, AQ) is rewritable into linear Datalog. Interestingly, it is rather unclear how an ExpTime upper bound would be established based on the characterization in terms of bounded branching. The following corollary sums up the results obtained in this section. Corollary 25. For OMQs from (EL, AQ), all of the following problems are ExpTime-complete (under the same complexity theoretic assumptions for the lower bounds as in Theorem 23): linear Datalog rewritability, containment in NL, NL-hardness, and PTime-hardness. Note that Theorem 24 and the results from Section 3.2 give an algorithm that provides a linear Datalog rewriting of a given OMQ if it exists and reports failure otherwise. 6 Conclusion As future work, we plan to extend our analysis to (ELI, AQ) where ELI is the extension of EL with inverse roles. It is clear that the overall picture changes because there are OMQs from (ELI, AQ) that are L-complete. This also raises the question whether L-completeness coincides with rewritability into symmetric Datalog [13]. It should be noted, though, that even lifting to (ELI, AQ) the results established in this paper such as the dichotomy between NL and PTime is non-trivial. It would also be interesting to replace AQs with conjunctive queries, or even to move directly to frontier-one tuple generating dependencies [5]. Acknowledgements. Funded by ERC consolidator grant 647289 ‘CODA’. References 1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995) 2. Afrati, F.N., Cosmadakis, S.S.: Expressiveness of restricted recursive queries (ex- tended abstract). In: Proc. of STOC. pp. 113–126 (1989) 3. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Proc. of IJCAI. pp. 364–369 (2005) 4. Baader, F., Horrocks, I., Lutz, C., Sattler, U.: An Introduction to Description Logics. Cambride University Press (2017) 5. Baget, J., Leclère, M., Mugnier, M., Salvat, E.: Extending decidable cases for rules with existential variables. In: Proc. of IJCAI. pp. 677–682 (2009) 6. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through disjunctive datalog, CSP, and MMSNP. ACM Trans. Database Syst. 39(4), 33:1–33:44 (2014) 7. Bienvenu, M., Hansen, P., Lutz, C., Wolter, F.: First order-rewritability and con- tainment of conjunctive queries in horn description logics. In: Proc. of IJCAI (2016) 8. Bienvenu, M., Lutz, C., Wolter, F.: First order-rewritability of atomic queries in horn description logics. In: Proc. of IJCAI (2013) 9. Bienvenu, M., Ortiz, M.: Ontology-mediated query answering with data-tractable description logics. In: Proc. of Reasoning Web. LNCS, vol. 9203, pp. 218–307. Springer (2015) 10. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Poggi, A., Rodriguez- Muro, M., Rosati, R.: Ontologies and databases: The DL-Lite approach. In: Proc. of Reasoning Web 2009. pp. 255–356 (2009) 11. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Data com- plexity of query answering in description logics. Artif. Intell. 195, 335–360 (2013) 12. Dalmau, V., Krokhin, A.A.: Majority constraints have bounded pathwidth duality. Eur. J. Comb. 29(4), 821–837 (2008) 13. Egri, L., Larose, B., Tesson, P.: Symmetric datalog and constraint satisfaction prob- lems in LogSpace. Electronic Colloquium on Computational Complexity (ECCC) 14(024) (2007) 14. Eiter, T., Ortiz, M., Simkus, M., Tran, T., Xiao, G.: Query rewriting for Horn-SHIQ plus rules. In: Proc. of AAAI. AAAI Press (2012) 15. Feier, C., Kuusisto, A., Lutz, C.: Rewritability in monadic disjunctive datalog, MMSNP, and expressive description logics. In: Proc. of ICDT (2017) 16. Hansen, P., Lutz, C., İnanç Seylan, Wolter, F.: Efficient query rewriting in the description logic EL and beyond. In: Proc. of IJCAI (2015) 17. Hustadt, U., Motik, B., Sattler, U.: Data complexity of reasoning in very expressive description logics. In: Proc. of IJCAI. pp. 466–471. Professional Book Center (2005) 18. Immerman, N.: Descriptive complexity. Graduate texts in computer science, Springer (1999) 19. Kaminski, M., Nenov, Y., Grau, B.C.: Datalog rewritability of disjunctive datalog programs and its applications to ontology reasoning. In: Proc. of AAAI. pp. 1077– 1083. AAAI Press (2014) 20. Kontchakov, R., Rodriguez-Muro, M., Zakharyaschev, M.: Ontology-based data access with databases: A short course. In: Reasoning Web. pp. 194–229 (2013) 21. Krisnadhi, A., Lutz, C.: Data complexity in the EL family of description logics. In: Dershowitz, N., Voronkov, A. (eds.) Proc. of LPAR. LNAI, vol. 4790, pp. 333–347. Springer (2007) 22. Lutz, C., Sabellek, L.: Ontology-mediated querying with the description logic el: Trichotomy and linear datalog rewritability. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17) (2017) 23. Pérez-Urbina, H., Motik, B., Horrocks, I.: Tractable query answering and rewriting under description logic constraints. Journal of Applied Logic 8(2), 186–209 (2010) 24. Rosati, R.: The limits of querying ontologies. In: Proc. of ICDT. LNCS, vol. 4353, pp. 164–178. Springer (2007) 25. Trivela, D., Stoilos, G., Chortaras, A., Stamou, G.B.: Optimising resolution-based rewriting algorithms for OWL ontologies. J. Web Sem. 33, 30–49 (2015)