Deciding FO-Rewritability in EL Meghyn Bienvenu1 and Carsten Lutz2 and Frank Wolter3 1 LRI - CNRS & Université Paris Sud, France 2 Department of Computer Science, University of Bremen, Germany 3 Department of Computer Science, University of Liverpool, UK Abstract. We consider the problem of deciding, given an instance query A(x), an EL-TBox T , and possibly an ABox signature Σ, whether A(x) is FO-rewri- table relative to T and Σ-ABoxes. Our main results are PS PACE-completeness for the case where Σ comprises all symbols and E XP T IME-completeness for the general case. We also show that the problem is in PT IME for classical TBoxes and that every instance query is FO-rewritable into a polynomial-size FO query relative to every (semi)-acyclic TBox (under some mild assumptions on the data). 1 Introduction Over the last years, query answering over instance data has developed into one of the most prominent problems in description logic (DL) research. Many approaches aim at utilizing relational databases systems (RDBMSs), exploiting their mature technol- ogy, advanced optimization techniques, and the general infrastructure that those sys- tems offer. Roughly, RDBMS-based approaches can be classified into query rewriting approaches, where the original query and the DL TBox are compiled into an SQL query that is passed to the RDBMS for execution [5], and combined approaches, where the consequences of the TBox are materialized in the data in a compact form and some query rewriting is used to ensure correct answers despite the compact representation [12, 11]. This division is by no means strict, as illustrated by the approach presented in [7] which is based on query rewriting, but also has strong similarities with combined approaches. A fundamental difference between the query rewriting approach and the combined approach is that, in query rewriting, an exponential blowup of the query is often un- avoidable [8] while the combined approach typically blows up both query and data only polynomially [12, 11]. It is thus unsurprising that query execution is more efficient in the combined approach than in the query rewriting approach, see the experiments in [11]. Depending on the application, however, there can still be good reasons to use pure query rewriting. Ease of implementation: Query rewriting approaches are often easier to implement as they do not involve a data completion phase. When the TBox is sufficiently small so that the exponential blowup of the query is not prohibitive or when only a prototype implementation is aimed at, it may not be worthwhile to imple- ment a full combined approach. Access limitations: If the user does not have permission to modify the data in the database, materializing the consequences of the TBox in the data might simply be out of the question. This problem arises notably in information integration applications. In this paper, we are interested in TBoxes formulated in the description logic EL, which forms the basis of the OWL EL fragment of OWL 2 and is popular as a ba- sic language for large-scale ontologies. In general, query rewriting approaches are not applicable to EL because instance query answering in this DL is PT IME-complete re- garding data complexity while AC0 data complexity marks the boundary of DLs for which the pure query rewriting approach can be made work [5]. For example, the query A(x) cannot be answered by an SQL-based RDBMS in the presence of the very simple EL-TBox T = {∃r.A v A}, intuitively because T forces the concept name A to be propagated unboundedly along r-chains in the data and thus the rewritten query would have to express transitive closure of r. We say that A(x) is not FO-rewritable relative to T , alluding to the known equivalence of first-order (FO) formulas and SQL queries. Of course, such an isolated example does not rule out the possibility that some EL- TBoxes, including those that are used in applications, still enjoy FO-rewritability. For example, the query A(x) is FO-rewritable relative to the EL-TBox T 0 = {A v ∃r.A}: since the additional instances of A stipulated by T 0 are ‘anonymous objects’ (nulls in database parlance) rather than primary data objects, there is no unbounded propagation through the data and, in fact, we can simply drop T 0 when answering A(x). Inspired by these observations, the aim of this paper is to study FO-rewritability on the level of individual TBoxes, essentially following the non-uniform approach initiated in [15]. In particular, we are interested in deciding, for a given instance query (IQ) A(x) and EL- TBox T , whether q is FO-rewritable relative to T . Sometimes, we additionally allow as a third input an ABox-signature Σ that restricts the symbols which can occur in the data [2, 3]. Our main result is that deciding FO-rewritability of IQs relative to general EL- TBoxes (sets of concept inclusions C v D) is PS PACE-complete when the ABox signa- ture Σ is full (i.e., all symbols are allowed in the ABox) and E XP T IME-complete when Σ is given as an input. For proving these results, we establish some properties that are of independent interest, such as: (1) whenever an IQ is FO-rewritable, then it is FO- rewritable into a union of tree-shaped conjunctive queries; (2) an IQ is FO-rewritable relative to all ABoxes iff it is FO-rewritable relative to tree-shaped ABoxes (see Sec- tion 3 for a precise formulation). We also study more restricted forms of TBoxes, show- ing that FO-rewritability of IQs relative to classical TBoxes (sets of concept definitions A ≡ C and concept implications A v C with A atomic, cycles allowed) is in PT IME, even when Σ is part of the input. For semi-acyclic TBoxes T (classical TBoxes without cycles that involve only concept definitions, but potentially with cycles that involve at least one concept inclusion), we observe that every IQ is FO-rewritable relative to T (for any ABox signature Σ) and that, under the mild assumption that the admitted databases have domain size at least two, even a polynomial-sized rewriting is possible. While it is not our primary aim in this first publication to actually generate FO-rewritings, we note that all our results come with effective procedures for doing this (the rewritings are of triple-exponential size in the worst case). Although we focus on simple IQs of the form A(x), all results in this paper also apply to instance queries of the form C(x) with C an EL-concept. The treatment of conjunctive queries (CQs) is left for future work. We also discuss the connection of FO- rewritability in EL to boundedness in datalog and in the µ-calculus. Proof details are deferred to the long version at at http://www.informatik.uni-bremen.de/∼clu/papers/. 2 Preliminaries We remind the reader that EL-concepts are built up from concept names and the con- cept > using conjunction C u D and existential restriction ∃r.C. When we speak of a TBox without further qualification, we mean a general TBox, i.e., a finite set of concept inclusions (CIs) C v D. Other forms of TBoxes will be introduced later as needed. An ABox is a finite set of concept assertions A(a) and role assertions r(a, b) where A is a concept name, r a role name, and a, b individual names. We use Ind(A) to denote the set of all individual names used in A. It will sometimes be convenient to view an ABox A as an interpretation IA , defined in the obvious way (see [15]). Regarding query languages, we focus on instance queries (IQ), which have the form A(x) with A a concept name and x a variable. We write T , A |= A(a) if aI ∈ AI for all models I of T and A and call a a certain answer to A(x) given A and T . We use certT (A(x), A) to denote the set of all certain answers to A(x) given A and T . To define FO-rewritability, we require first-order queries (FOQs), which are first-order formulas constructed from atoms A(x), r(x, y), and x = y. We use ans(I, q) to denote the set of all answers to the FOQ q in the interpretation I. A signature is a set of concept and role names, which are uniformly called symbols in this context. A Σ-ABox is an ABox that uses only concept and role names from Σ. The full signature is the signature that contains all concept and role names. Definition 1 (FO-rewritability). Let T be an EL-TBox and Σ an ABox signature. An IQ q is FO-rewritable relative to T and Σ if there is a FOQ ϕ such that certT (A, q) = ans(IA , ϕ) for all Σ-ABoxes A. Then ϕ is an FO-rewriting of q relative to T and Σ. Example 1. Recall from the introduction that A(x) is not FO-rewritable relative to T = {∃r.A v A} and the full signature. If we add ∃r.> v A to T , then A(x) is FO-rewritable relative to the resulting TBox and the full signature, and ϕ(x) = A(x) ∨ ∃y r(x, y) is an FO-rewriting. If we choose Σ = {A}, then A(x) becomes FO-rewritable also relative to the original T , with the trivial FO-rewriting A(x). Conversely, if a query q is FO-rewritable relative to a TBox T 0 and a signature Σ, then q is FO-rewritable rel- ative to T 0 and any Σ 0 ⊆ Σ (take an FO-rewriting relative to T 0 and Σ and replace all atoms which involve predicates that are not in Σ 0 with false). Sometimes, instance queries have the more general form C(x) with C an EL- concept. Since C(x) is FO-rewritable relative to T and Σ whenever A(x) is FO- rewritable relative to T ∪ {A ≡ C} and Σ, A a fresh concept name, queries of this form are captured by the results in this paper. 3 General TBoxes – Upper Bounds We first characterize failure of FO-rewritability of an IQ A(x) relative to a TBox T and an ABox signature Σ in terms of the existence of certain Σ-ABoxes and then show how to decide the latter. The following result provides the starting point. An ABox is called tree-shaped if the directed graph (Ind(A), {(a, b) | r(a, b) ∈ A}) is a tree and r(a, b), s(a, b) ∈ A implies r = s. A FOQ is a tree-UCQ if it is a disjunction q1 ∨ · · · ∨ qn and each qi is a conjunctive query (CQ) that is tree-shaped (defined in analogy with tree-shaped ABoxes) and where the root is the only answer variable; see e.g. [15] for details on CQs. Theorem 1. Let T be an EL-TBox, Σ an ABox signature, and A(x) an IQ. Then 1. If A(x) is FO-rewritable relative to T and Σ, then there is a tree-UCQ that is an FO-rewriting of A(x) relative to T and Σ; 2. If ϕ(x) is an FO-rewriting of A(x) relative to T and tree-shaped Σ-ABoxes and ϕ(x) is a tree-UCQ, then ϕ(x) is an FO-rewriting relative to T and Σ; 3. A(x) is FO-rewritable relative to T and Σ iff A(x) is FO-rewritable relative to T and tree-shaped Σ-ABoxes. Of the three points in Theorem 1, Point 1 is most laborious to prove. It involves applying an Ehrenfeucht-Fraı̈ssé game and explicitly constructing a tree-UCQ as a disjunction of certain EL-concepts (c.f. the characterization of FO-rewritability in terms of datalog boundedness given in [15] and its proof). Point 2 can then be derived from Point 1, and Point 3 is an immediate consequence of Points 1 and 2. For a tree-shaped ABox A and k ≥ 0, we use A|k to denote the restriction of A to depth k. The following provides the first version of the announced characterization of FO-rewritability in terms of the existence of certain ABoxes. Theorem 2. Let T be an EL-TBox, Σ an ABox signature, and A(x) an IQ. Then A(x) is not FO-rewritable relative to T and Σ iff for every k ≥ 0, there is a tree-shaped Σ-ABox A of depth exceeding k with root a0 s.t. T , A |= A(a0 ) and T , A|k 6|= A(a0 ). The proof of Theorem 2 builds on Point 1 of Theorem 1. Note that if A(x) is FO- rewritable relative to T and Σ, then there is a k ≥ 0 such that for all tree-shaped Σ- ABoxes A of depth exceeding k with root a0 , T , A |= A(a0 ) implies T , A|k |= A(a0 ). In the proof of Theorem 2, we explicitly construct FO-rewritings which are tree-UCQs of outdegree at most |T | and depth at most k. To proceed, it is convenient to work with TBoxes in normal form, where all CIs must be of one of the forms A v B1 , A v ∃r.B, > v A, B1 u B2 v A, ∃r.B v A with A, B, B1 , B2 concept names. This can be assumed without loss of generality: Lemma 1. For any EL-TBox T , ABox signature Σ, and IQ A(x), there is a TBox T 0 in normal form such that for any FOQ ϕ, we have that ϕ(x) is an FO-rewriting of A(x) relative to T and Σ iff ϕ(x) is an FO-rewriting of A(x) relative to T 0 and Σ. To exploit Theorem 2 for building a decision procedure for FO-rewritability, we impose a bound on k. The next theorem is proved using Theorem 2 and a pumping argument. Theorem 3. Let T be an EL-TBox in normal form, Σ an ABox signature, A(x) an IQ, and n = |(sig(T ) ∪ Σ) ∩ NC |. Then A(x) is not FO-rewritable relative to T and Σ iff there exists a tree-shaped Σ-ABox A of depth exceeding 2n with root a0 such that T , A |= A(a0 ) and T , A|2n 6|= A(a0 ). Note that, with the remark after Theorem 2, we obtain a triple exponential upper bound on the size of FO-rewritings. The bound in Theorem 3 is optimal in the sense that, for every n ≥ 1, there is an EL-TBox T and an IQ A(x) such that |sig(T ) ∩ NC | = n, A(x) is FO-rewritable relative to T and the full Σ, and for all ABoxes of depth at least 2n with root a0 , we have T , A |= A(a0 ) iff T , A|2n −1 |= A(a0 ). Such a T can be constructed by simulating a binary counter, see Section 4 of [13]. Based on Theorem 3, we can establish the following result. Theorem 4. Deciding FO-rewritability of an IQ relative to an EL-TBox and an ABox signature is in E XP T IME. The proof utilizes non-deterministic bottom-up automata on finite, ranked trees: we construct exponential-size automata that accept precisely the ABoxes A from Theo- rem 3 and then decide their emptiness in PT IME. When Σ is full, the characterization given in Theorem 3 can be further improved. An ABox A is linear if it consists of role assertions r0 (a0 , a1 ), . . . , rn−1 (an−1 , an ) and concept assertions A(a) with a ∈ {a0 , . . . , an }. Somewhat unexpectedly, with full Σ we can replace the tree-shaped ABoxes from Theorem 3 with linear ones. Theorem 5. Let T be an EL-TBox in normal form, A(x) an IQ, and n = |(sig(T ) ∪ Σ) ∩ NC |. Then A(x) is not FO-rewritable relative to T (and the full Σ) iff there exists a linear ABox A of depth exceeding 2n with root a0 such that T , A |= A(a0 ) and T , A|2n 6|= A(a0 ). The surprisingly subtle proof of Theorem 5 is based on the careful extraction of a lin- ear ABox from the tree-shaped one whose existence is guaranteed by Theorem 3. The subtlety is largely due to the fact that it is not sufficient to simply select a linear chain of individuals from the tree-shaped ABox; additionally, the concept assertions on that chain have to be modified in a very careful way. The following example shows that, when Σ is not full, tree-shaped ABoxes in The- orem 3 cannot be replaced with linear ones. Example 2. Let T = {Ai v Xi , Bi u Xi v Yi , ∃r.Yi v Xi | i ∈ {1, 2}} ∪ {X1 u X2 v X, B1 u B2 v Z, ∃r.Z v X}, Σ = {A1 , A2 , B1 , B2 , r}, and take the IQ X(x). The tree-shaped ABox A = {r(a0 , ai,0 ), r(ai,0 , ai,1 ), . . . , r(ai,2n −1 , ai,2n ) | i ∈ {1, 2}} ∪ {B(ai,0 ), . . . , B(ai,2n ), Ai (ai,2n ) | i ∈ {1, 2}}, with n as in Theorems 3 and 5, is of depth exceeding 2n and it can be verified that T , A |= X(a0 ), but T , A|2n 6|= X(a0 ). However, for all linear Σ-ABoxes A, we have T , A |= X(a0 ) iff T , A|1 |= X(a0 ). Theorem 5 allows us to replace the non-deterministic tree automata in the proof of Theorem 6 with word automata, improving the upper bound to PS PACE. Theorem 6. Deciding FO-rewritability of an IQ relative to an EL-TBox and the full ABox signature is in PS PACE. 4 General TBoxes – Lower Bounds We establish lower bounds that match the upper bounds from the previous section. Theorem 7. Deciding FO-rewritability of an IQ relative to a general EL-TBox and an ABox signature Σ is (1) PS PACE-hard when Σ is full and (2) E XP T IME-hard when Σ is an input. The proof of Point 1 is by reduction of the word problem of polynomially space- bounded deterministic Turing machines (DTMs). For Point 2, we modify the proof of Point 1 to yield a reduction of the word problem of polynomially space-bounded alternating Turing machines (ATMs). We start with the former. Let M = (Q, Ω, Γ, δ, q0 , qacc , qrej ) be a DTM and p(·) its polynomial space bound. We assume w.l.o.g. that M terminates on every input, that it never attempts to move left on the left-most end of the tape, that q0 ∈ / {qacc , qrej }, and that there are no transitions defined for qacc and qrej . Let x ∈ Ω ∗ be an input to M of length n. Our aim is to construct a TBox T and select a concept name B such that B is not FO-rewritable relative to T and the full signature iff M accepts x. By Theorem 5, non-FO-rewritability of B w.r.t. T is witnessed by a sequence of linear ABoxes of increasing depth. In the reduction, these ABoxes take the form of longer and longer r-chains (with r a role name). The chains represent the computation of M on x, repeated over and over again. Specifically, the tape contents, the current state, and the head position are represented using the elements of Γ ∪ (Γ × Q) as concept names. If, for example, x = ab and the computation of M on x consists of the two configurations qab and aq 0 b,4 then this is represented by ABoxes of the form {r(b0 , b1 ), r(b1 , b2 ), r(b2 , b3 ), . . . , r(bn−1 , bn )} where additionally, the concept (q, a) is asserted for b0 , b4 , b8 , . . . , a is asserted for b1 , b5 , b9 , . . . and for b2 , b6 , b10 , . . . , and (q 0 , b) for b3 , b7 , b11 , . . . . If M accepts x, then B is propagated backwards along these chains (from b0 to b1 to b2 etc) unboundedly far, starting from a single explicit occurrence of B asserted for b0 . If M rejects x or the chain in the ABox does not properly represent the computation of M on x, then B will already be implied by any subchain of length at most p(n)2 and thus the unbounded propagation of B gets ‘disrupted’ resulting in FO-rewritability of B relative to T . The following CI in T results in backwards propagation of B provided that every ABox individual is labeled with at least one symbol from Γ ∪ Γ × Q: ∃r.(A u B) v B for all A ∈ Γ ∪ (Γ × Q). (1) Disrupt the propagation of B when M does not accept x: (a, qrej ) v B for all a ∈ Γ. We have to enforce that the ABox actually represents a (repeated) computation of M . To do this, we again use disruptions of the propagation of B: whenever an ABox A 4 uqv ∈ Γ ∗ QΓ ∗ means that M is in state q, the tape left of the head is labeled with u, and starting from the head position, the remaining tape is labeled with v. represents a configuration sequence that is not a proper computation, then B is implied by a sub-chain of bounded length. Let forbid denote the set of all tuples (A1 , A2 , A3 , A) with Ai ∈ Γ ∪(Γ ×Q) such that whenever three consecutive tape cells in a configuration c are labeled with A1 , A2 , A3 , then in the successor configuration c0 of c, the tape cell corresponding to the middle cell cannot be labeled with A. Put A u ∃rp(n)+1 .A1 u ∃rp(n) .A2 u ∃rp(n)−1 .A3 v B (2) for all (A1 , A2 , A3 , A) ∈ forbid. This ensures that the transition relation is respected and that the content of tape cells which are not under the head does not change. We also need to say that every tape cell has a unique label, that there is at not more than one head position per configuration, and at least one, again via disrupting the propagation of B: A u A0 v B for all distinct A, A0 ∈ Γ ∪ (Γ × Q) (a, q) v H for all (a, q) ∈ Γ × Q a v H for all a ∈ Γ i j ∃r .H u ∃r H v B for i < j < p(n) H u ∃r.H u · · · u ∃rp(n)−1 .H v B where H is a concept name indicating that the head is on the current cell and H indicat- ing that this is not the case. It remains to set up the initial configuration. It is tempting to introduce a concept name I that sets up the initial configuration and must be used at the end of the r-chain to start the propagation of B. However, since we assume the ABox signature Σ to be full, we can always put B itself at the end of the chain, avoid- ing I. To fix this issue, we refrain from introducing I, but utilize the final states qacc and qrej , enforcing that they must always be followed by the initial configuration. Let (0) (0) A1 , . . . , Am be the concept names that describe the initial configuration, i.e., when (0) (0) (0) the input x is x1 · · · xn , then A0 = (x1 , q0 ), Ai = xi for 2 ≤ i ≤ n, and Ai = xi is the blank symbol for n < i ≤ p(n). Now put (0) (0) ∃ri .qacc v Ai and ∃ri .qrej v Ai for 1 ≤ i ≤ p. (3) Note that all witness ABoxes for non-FO-rewritability of B must eventually contain qacc or qrej , thus the initial configuration will be properly set up at some point: ABoxes A that do not contain these states do not represent a proper computation of M because M reaches qacc or qrej after at most p(n) states, thus the propagation of B is disrupted in A. We now come to Point 2 of Theorem 7. When the ABox signature Σ is not re- quired to be full, it is much simpler to set up the initial configuration. Indeed, we can then simply introduce the mentioned concept name I, add I v B to T , and set Σ = Γ ∪ (Γ × Q) ∪ {r, I} to ensure that we must use I to start the propagation of B. (0) (0) We can further replace the CIs (3) above with ∃ri .I v Ai and ∃ri .I v Ai for 1 ≤ i ≤ p to ensure that I sets up the initial configuration as intended. With this modi- fication, we can adapt the reduction from DTMs to ATMs in a straightforward way, thus improving the PS PACE lower bound to an E XP T IME one. We only give a brief sketch. Assume w.l.o.g. that each configuration of the ATM M has at most two successor con- figurations. We introduce a second role name s ∈ Σ, reserving r for the first successor of a configuration and s for the second successor. Then the CIs (1) are replaced with ∃r.(A u B) u ∃s.(A0 u B) v B for all A, A0 ∈ Γ ∪ (Γ × Q). resulting in witness ABoxes to take the form of a tree-shaped ATM computation rather than a linear DTM computation. It remains to adapt the CIs (2) to reflect the branch- ing of computations. The set forbid now contains tuples (A1 , A2 , A3 , A01 , A02 , A03 , A), where A1 , A2 , A3 describe three neighboring cells in the left predecessor configuration and A01 , A02 , A03 the corresponding cells in the right predecessor configuration (for exis- tential states of M , we simply assume that the left and right predecessor are identical). We can then replace (2) with Au∃rp(n)+1 .A1 u∃rp(n) .A2 u∃rp(n)−1 .A3 u∃sp(n)+1 .A01 u∃sp(n) .A02 u∃sp(n)−1 .A03 v B. 5 Classical TBoxes A classical TBox T is a finite set of concept definitions A ≡ C and CIs A v C where A is a concept name. No concept name is allowed to occur more than once on the left hand side of a statement in T . Theorem 8. Deciding FO-rewritability of an IQ relative to a classical EL-TBox and an ABox signature is in PT IME. We give examples illustrating FO-rewritability in classical TBoxes and give the main idea of the proof. Example 3. (a) The IQ A(x) is not FO-rewritable relative to the classical TBox {A ≡ ∃r.A} and the full ABox signature. (b) The concept name A has a cyclic definition in the TBox T = { A ≡ B u ∃r.A, B v ∃r.A } which often indicates non-FO-rewritability, but in this case the IQ A(x) has an FO-rewriting relative to T and the full ABox signature, namely ϕ(x) = A(x) ∨ B(x). To present the idea of the proof, we use an appropriate normal form for classical TBoxes. A concept name A is defined in T if there is a definition A ≡ C ∈ T and primitive otherwise; A is non-conjunctive in T if it occurs in T , but there is no CI of the form A ≡ B1 u · · · u Bn in T with n ≥ 1 and B1 , . . . , Bn concept names. We use non-conj(T ) to denote the set of non-conjunctive concepts in T . A classical TBox T is in normal form if it is a set of statements A ≡ ∃r.B and A ≡ B1 u · · · u Bn where B, B1 , . . . , Bn are concept names and B1 , . . . , Bn are non-conjunctive. For every clas- sical TBox T , one can construct in polynomial time a classical TBox T 0 in normal form that uses additional concept names such that T 0 |= T and every model of T can be ex- panded to a model of T 0 [10]. It is not hard to verify that an IQ A(x) is FO-rewritable relative to T and Σ if and only if it is FO-rewritable relative to T 0 and Σ, provided that A is not among the new concept names introduced during the construction of T 0 . For a classical TBox T in normal form and a concept name A, define  {A} if A is non-conjunctive in T non-conjT (A) = {B1 , . . . , Bn } if A ≡ B1 u · · · u Bn ∈ T . Our polytime algorithm utilizes an ABox introduced in [10, 9] in the context of conser- vative extensions and logical difference: given a classical TBox T in normal form and an ABox signature Σ, we compute in polytime a polysize Σ-ABox AT ,Σ with individ- ual names aB , B non-conjunctive in T , such that for any Σ-ABox A, individual name a in A, and concept name A the following conditions are equivalent: – T , A 6|= A(a); – there exists B ∈ non-conjT (A) such that (A, a) is simulated by (AT ,Σ , aB ) (see appendix of long version of this paper). It follows that to check whether A(x) is FO-rewritable, instead of considering arbitrary tree-shaped Σ-ABoxes A and A|k as in Theorem 2, it suffices to consider the tree unfolding of AT ,Σ at aB and its restriction to depth k, for all B ∈ non-conjT (A). The original search problem has been reduced to the problem of analysing the tree unfolding of AT ,Σ . A polytime algorithm performing that analysis is given in the long version. 6 Semi-Acyclic TBoxes It is easy to see that every IQ is FO-rewritable relative to every acyclic EL-TBox and ev- ery ABox signature Σ. We observe that the same holds for semi-acyclic TBoxes, where some cycles are still allowed, and that it is possible to find rewritings of polynomial size when only databases of domain size at least two are admitted. A semi-acyclic TBox is defined like a classical TBox, except that definitorial cycles are disallowed, i.e., there cannot be concept definitions A0 ≡ C0 , . . . , An−1 ≡ Cn−1 such that Ai occurs in Ci+1 mod n . Note that cycles via concept inclusions, such as A v ∃r.A, are still permitted. Let T be a semi-acyclic TBox and Σ an W ABox signature. For an EL-concept C, we use preT ,Σ (C) to denote the FO-formula B∈Σ | T |=BvC B(x). For all concept names A and EL-concepts C and D and role names r, set ϕΣ>,T (x) = true ϕΣA,T (x) = preT ,Σ (A) if A is primitive ϕA,T (x) = ϕΣ Σ C,T (x) if A ≡ C ∈ T ϕCuD,T (x) = ϕC,T (x) ∧ ϕΣ Σ Σ D,T (x) ϕ∃r.C,T (x) = preT ,Σ (∃r.C) ∨ ∃y.(r(x, y) ∧ ϕΣ Σ C,T [y/x]) if r ∈ Σ ϕΣ ∃r.C,T (x) = pre T ,Σ (∃r.C) if r ∈ /Σ where ϕ[x/y] denotes the result of first renaming all bound variables in ϕ so that y does not occur, and then replacing the free variable x of ϕ with y. Lemma 2. For all IQs A(x), ϕΣ A,T (x) is an FO-rewriting of A(x) relative to T and Σ. The size of ϕΣ A,T (x) can clearly be exponential in the size of T , for example when A = An and T = {Ai ≡ ∃r.Ai−1 u ∃s.Ai−1 | 1 ≤ i ≤ n}. To reduce ϕΣ A,T to polynomial size, we can use Avigad’s observation that FO supports structure shar- ing [1]. More precisely, let ϕ be a positive FOQ (such as ϕΣA,T ) whose subformulas include ψ(x1 ), . . . , ψ(xn ). The multiple occurences of ψ can be avoided by rewriting ϕ to ∃u∀y∀z (ψ(y) ↔ z = u) → ϕ0 where ϕ0 is ϕ with each ψ(xi ) replaced with y = xi → z = u. Intuitively, we iterate over all y and memorize whether ψ(y) holds using identity of z with u. Since we need at least two different ‘values’ for z to make this trick work, the resulting FOQ is an FO-rewriting only on ABoxes with at least two individual names. 7 Related Work In [15], deciding FO-rewritability is studied in the context of the expressive DL ALCF I and several of its fragments. In general, though, the setup in that paper is different: while we are interested in deciding FO-rewritability of a single query relative to a TBox, the results in [15] concern deciding whether, for a given TBox T , all queries are FO-rewritable relative to T . It is shown that this problem is decidable for Horn- ALCF I-TBoxes of depth at most two and for Horn-ALCF-TBoxes (queries are IQs or, equivalently, CQs). As a by-product of these results, a close connection between FO-rewritability of TBoxes formulated in Horn DLs and boundedness of datalog pro- grams is observed, see e.g. [6, 17] for the latter problem. In its original formulation, the following result is established for a larger class of TBoxes, namely materializable ALCF I-TBoxes of depth one. Lemma 3 ([15]). For every (general) EL-TBox T in normal form, there is a datalog program ΠT such that for every ABox signature Σ and IQ A(x), the predicate A is bounded in ΠT relative to Σ-databases iff A(x) is FO-rewritable relative to T and Σ. In [15], the program ΠT is of exponential size. Since we are only interested in EL- TBoxes, it is easy to find a ΠT of polynomial size. More specifically, ΠT consists of A(x) ← true if > v A ∈ T B(x) ← r(x, y), A(x) if ∃r.A v B ∈ T X∃r.A (x) ← B(x) if B v ∃r.A ∈ T B(x) ← A1 (x), A2 (x) if A1 u A2 v B ∈ T (where possibly A1 = A2 ) B(x) ← X∃r.A (x) if ∃r.B0 v B ∈ T and T |= A v B0 This allows to carry over the 2E XP T IME upper bound for predicate boundedness of con- nected monadic datalog programs [6] to FO-rewritability of an IQ relative to a general EL-TBoxes and an ABox signature.5 Note that boundedness has been studied also in the context of the µ-calculus and monadic second order (MSO) logic [16, 4]. Here, an E XP T IME upper bound is known from [16] and it seems likely that this result can be utilized to find an alternative proof of Theorem 6. In particular, it is possible to find a µ-calculus rewriting ϕ of an IQ A(x) relative to an EL-TBox T and ABox signature Σ: proceeding similarly to the construction of the above datalog program ΠT , we can find a µ-calculus formula ϕT ,Σ such that for all Σ-ABoxes A and a ∈ Ind(A), we have T , A |= A(a) iff IA , a |= ϕ. When simultaneous fixpoints are admitted, ϕ even has polynomial size. 5 Note that we explicitly fix the signature Σ of the databases over which boundedness of ΠT is considered instead of assuming that only the EDB predicates can be used in the data as in [6]; this is only for simplicity and, in fact, it is easy to adapt ΠT to the latter assumption. 8 Conclusions It would be interesting to generalize the results presented in this paper to more expres- sive DLs and to more expressive query languages. Regarding the former, we note that using the techniques in [14, 15] it is possible to derive a NE XP T IME upper bound for deciding FO-rewritability of IQs relative to Horn-ALCI-TBoxes and ABox signatures. Regarding the latter, CQs are a natural choice and we believe that a mix of techniques from this paper and those in [3] might provide a good starting point. It is interesting to note that FO-rewritability of all IQ-atoms A(x) in a CQ q does not imply that q is FO-rewritable and the converse fails, too. Acknowledgements. C. Lutz was supported by the DFG SFB/TR 8 ‘Spatial Cognition’. References 1. J. Avigad. Eliminating definitions and skolem functions in first-order logic. In Proc. of LICS, pages 139–146. IEEE Computer Society, 2001. 2. F. Baader, M. Bienvenu, C. Lutz, and F. Wolter. Query and predicate emptiness in description logics. In Proc. of KR. AAAI Press, 2010. 3. M. Bienvenu, C. Lutz, and F. Wolter. Query containment in description logics reconsidered. In Proc. of KR, 2012. To appear. 4. A. Blumensath, M. Otto, and M. Weyer. Decidability results for the boundedness problem. Manuscript, 2012. 5. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007. 6. S. S. Cosmadakis, H. Gaifman, P. C. Kanellakis, and M. Y. Vardi. Decidable optimization problems for database logic programs. In Proc. of STOC, pages 477–490. ACM, 1988. 7. G. Gottlob and T. Schwentick. Rewriting ontological queries into small nonrecursive datalog programs. In Proc. of DL. CEUR-WS, 2011. 8. S. Kikot, R. Kontchakov, V. V. Podolskii, and M. Zakharyaschev. Exponential lower bounds and separation for query rewriting. CoRR, abs/1202.4193, 2012. 9. B. Konev, M. Ludwig, D. Walther, and F. Wolter. The logical diff for the lightweight descrip- tion logic EL. Technical report, U. of Liverpool, http://www.liv.ac.uk/∼ frank/publ/, 2011. 10. B. Konev, D. Walther, and F. Wolter. The logical difference problem for description logic terminologies. In Proc. of IJCAR, pages 259–274. Springer, 2008. 11. R. Kontchakov, C. Lutz, D. Toman, F. Wolter, and M. Zakharyaschev. The combined ap- proach to query answering in DL-Lite. In Proc. of KR. AAAI Press, 2010. 12. C. Lutz, D. Toman, and F. Wolter. Conjunctive query answering in the description logic EL using a relational database system. In Proc. of IJCAI, pages 2070–2075. AAAI Press, 2009. 13. C. Lutz and F. Wolter. Deciding inseparability and conservative extensions in the description logic EL. In J. of Symbolic Computation 45(2): 194-228, 2010. 14. C. Lutz and F. Wolter. Non-uniform data complexity of query answering in description logics. In Proc. of DL. CEUR-WS, 2011. 15. C. Lutz and F. Wolter. Non-uniform data complexity of query answering in description logics. In Proc. of KR, 2012. To appear. 16. M. Otto. Eliminating recursion in the µ-calculus. In Proc. of STACS, pages 531–540. Springer, 1999. 17. R. van der Meyden. Predicate boundedness of linear monadic datalog is in PSPACE. Int. J. Found. Comput. Sci., 11(4):591–612, 2000.