Deciding FO-Rewritability in EL

                Meghyn Bienvenu1 and Carsten Lutz2 and Frank Wolter3
                        1
                          LRI - CNRS & Université Paris Sud, France
            2
                Department of Computer Science, University of Bremen, Germany
                3
                  Department of Computer Science, University of Liverpool, UK


       Abstract. We consider the problem of deciding, given an instance query A(x),
       an EL-TBox T , and possibly an ABox signature Σ, whether A(x) is FO-rewri-
       table relative to T and Σ-ABoxes. Our main results are PS PACE-completeness
       for the case where Σ comprises all symbols and E XP T IME-completeness for the
       general case. We also show that the problem is in PT IME for classical TBoxes
       and that every instance query is FO-rewritable into a polynomial-size FO query
       relative to every (semi)-acyclic TBox (under some mild assumptions on the data).


1   Introduction
Over the last years, query answering over instance data has developed into one of the
most prominent problems in description logic (DL) research. Many approaches aim
at utilizing relational databases systems (RDBMSs), exploiting their mature technol-
ogy, advanced optimization techniques, and the general infrastructure that those sys-
tems offer. Roughly, RDBMS-based approaches can be classified into query rewriting
approaches, where the original query and the DL TBox are compiled into an SQL query
that is passed to the RDBMS for execution [5], and combined approaches, where the
consequences of the TBox are materialized in the data in a compact form and some
query rewriting is used to ensure correct answers despite the compact representation
[12, 11]. This division is by no means strict, as illustrated by the approach presented in
[7] which is based on query rewriting, but also has strong similarities with combined
approaches.
    A fundamental difference between the query rewriting approach and the combined
approach is that, in query rewriting, an exponential blowup of the query is often un-
avoidable [8] while the combined approach typically blows up both query and data
only polynomially [12, 11]. It is thus unsurprising that query execution is more efficient
in the combined approach than in the query rewriting approach, see the experiments
in [11]. Depending on the application, however, there can still be good reasons to use
pure query rewriting. Ease of implementation: Query rewriting approaches are often
easier to implement as they do not involve a data completion phase. When the TBox
is sufficiently small so that the exponential blowup of the query is not prohibitive or
when only a prototype implementation is aimed at, it may not be worthwhile to imple-
ment a full combined approach. Access limitations: If the user does not have permission
to modify the data in the database, materializing the consequences of the TBox in the
data might simply be out of the question. This problem arises notably in information
integration applications.
    In this paper, we are interested in TBoxes formulated in the description logic EL,
which forms the basis of the OWL EL fragment of OWL 2 and is popular as a ba-
sic language for large-scale ontologies. In general, query rewriting approaches are not
applicable to EL because instance query answering in this DL is PT IME-complete re-
garding data complexity while AC0 data complexity marks the boundary of DLs for
which the pure query rewriting approach can be made work [5]. For example, the query
A(x) cannot be answered by an SQL-based RDBMS in the presence of the very simple
EL-TBox T = {∃r.A v A}, intuitively because T forces the concept name A to be
propagated unboundedly along r-chains in the data and thus the rewritten query would
have to express transitive closure of r. We say that A(x) is not FO-rewritable relative
to T , alluding to the known equivalence of first-order (FO) formulas and SQL queries.
    Of course, such an isolated example does not rule out the possibility that some EL-
TBoxes, including those that are used in applications, still enjoy FO-rewritability. For
example, the query A(x) is FO-rewritable relative to the EL-TBox T 0 = {A v ∃r.A}:
since the additional instances of A stipulated by T 0 are ‘anonymous objects’ (nulls in
database parlance) rather than primary data objects, there is no unbounded propagation
through the data and, in fact, we can simply drop T 0 when answering A(x). Inspired
by these observations, the aim of this paper is to study FO-rewritability on the level of
individual TBoxes, essentially following the non-uniform approach initiated in [15]. In
particular, we are interested in deciding, for a given instance query (IQ) A(x) and EL-
TBox T , whether q is FO-rewritable relative to T . Sometimes, we additionally allow
as a third input an ABox-signature Σ that restricts the symbols which can occur in the
data [2, 3].
    Our main result is that deciding FO-rewritability of IQs relative to general EL-
TBoxes (sets of concept inclusions C v D) is PS PACE-complete when the ABox signa-
ture Σ is full (i.e., all symbols are allowed in the ABox) and E XP T IME-complete when
Σ is given as an input. For proving these results, we establish some properties that are
of independent interest, such as: (1) whenever an IQ is FO-rewritable, then it is FO-
rewritable into a union of tree-shaped conjunctive queries; (2) an IQ is FO-rewritable
relative to all ABoxes iff it is FO-rewritable relative to tree-shaped ABoxes (see Sec-
tion 3 for a precise formulation). We also study more restricted forms of TBoxes, show-
ing that FO-rewritability of IQs relative to classical TBoxes (sets of concept definitions
A ≡ C and concept implications A v C with A atomic, cycles allowed) is in PT IME,
even when Σ is part of the input. For semi-acyclic TBoxes T (classical TBoxes without
cycles that involve only concept definitions, but potentially with cycles that involve at
least one concept inclusion), we observe that every IQ is FO-rewritable relative to T (for
any ABox signature Σ) and that, under the mild assumption that the admitted databases
have domain size at least two, even a polynomial-sized rewriting is possible. While it is
not our primary aim in this first publication to actually generate FO-rewritings, we note
that all our results come with effective procedures for doing this (the rewritings are of
triple-exponential size in the worst case).
    Although we focus on simple IQs of the form A(x), all results in this paper also
apply to instance queries of the form C(x) with C an EL-concept. The treatment of
conjunctive queries (CQs) is left for future work. We also discuss the connection of FO-
rewritability in EL to boundedness in datalog and in the µ-calculus. Proof details are
deferred to the long version at at http://www.informatik.uni-bremen.de/∼clu/papers/.
2   Preliminaries

We remind the reader that EL-concepts are built up from concept names and the con-
cept > using conjunction C u D and existential restriction ∃r.C. When we speak of a
TBox without further qualification, we mean a general TBox, i.e., a finite set of concept
inclusions (CIs) C v D. Other forms of TBoxes will be introduced later as needed. An
ABox is a finite set of concept assertions A(a) and role assertions r(a, b) where A is a
concept name, r a role name, and a, b individual names. We use Ind(A) to denote the
set of all individual names used in A. It will sometimes be convenient to view an ABox
A as an interpretation IA , defined in the obvious way (see [15]).
    Regarding query languages, we focus on instance queries (IQ), which have the form
A(x) with A a concept name and x a variable. We write T , A |= A(a) if aI ∈ AI for
all models I of T and A and call a a certain answer to A(x) given A and T . We
use certT (A(x), A) to denote the set of all certain answers to A(x) given A and T .
To define FO-rewritability, we require first-order queries (FOQs), which are first-order
formulas constructed from atoms A(x), r(x, y), and x = y. We use ans(I, q) to denote
the set of all answers to the FOQ q in the interpretation I.
    A signature is a set of concept and role names, which are uniformly called symbols
in this context. A Σ-ABox is an ABox that uses only concept and role names from Σ.
The full signature is the signature that contains all concept and role names.

Definition 1 (FO-rewritability). Let T be an EL-TBox and Σ an ABox signature. An
IQ q is FO-rewritable relative to T and Σ if there is a FOQ ϕ such that certT (A, q) =
ans(IA , ϕ) for all Σ-ABoxes A. Then ϕ is an FO-rewriting of q relative to T and Σ.

Example 1. Recall from the introduction that A(x) is not FO-rewritable relative to
T = {∃r.A v A} and the full signature. If we add ∃r.> v A to T , then A(x) is
FO-rewritable relative to the resulting TBox and the full signature, and ϕ(x) = A(x) ∨
∃y r(x, y) is an FO-rewriting. If we choose Σ = {A}, then A(x) becomes FO-rewritable
also relative to the original T , with the trivial FO-rewriting A(x). Conversely, if a query
q is FO-rewritable relative to a TBox T 0 and a signature Σ, then q is FO-rewritable rel-
ative to T 0 and any Σ 0 ⊆ Σ (take an FO-rewriting relative to T 0 and Σ and replace all
atoms which involve predicates that are not in Σ 0 with false).

   Sometimes, instance queries have the more general form C(x) with C an EL-
concept. Since C(x) is FO-rewritable relative to T and Σ whenever A(x) is FO-
rewritable relative to T ∪ {A ≡ C} and Σ, A a fresh concept name, queries of this
form are captured by the results in this paper.


3   General TBoxes – Upper Bounds

We first characterize failure of FO-rewritability of an IQ A(x) relative to a TBox T
and an ABox signature Σ in terms of the existence of certain Σ-ABoxes and then show
how to decide the latter. The following result provides the starting point. An ABox
is called tree-shaped if the directed graph (Ind(A), {(a, b) | r(a, b) ∈ A}) is a tree
and r(a, b), s(a, b) ∈ A implies r = s. A FOQ is a tree-UCQ if it is a disjunction
q1 ∨ · · · ∨ qn and each qi is a conjunctive query (CQ) that is tree-shaped (defined in
analogy with tree-shaped ABoxes) and where the root is the only answer variable; see
e.g. [15] for details on CQs.
Theorem 1. Let T be an EL-TBox, Σ an ABox signature, and A(x) an IQ. Then
 1. If A(x) is FO-rewritable relative to T and Σ, then there is a tree-UCQ that is an
    FO-rewriting of A(x) relative to T and Σ;
 2. If ϕ(x) is an FO-rewriting of A(x) relative to T and tree-shaped Σ-ABoxes and
    ϕ(x) is a tree-UCQ, then ϕ(x) is an FO-rewriting relative to T and Σ;
 3. A(x) is FO-rewritable relative to T and Σ iff A(x) is FO-rewritable relative to T
    and tree-shaped Σ-ABoxes.
Of the three points in Theorem 1, Point 1 is most laborious to prove. It involves applying
an Ehrenfeucht-Fraı̈ssé game and explicitly constructing a tree-UCQ as a disjunction of
certain EL-concepts (c.f. the characterization of FO-rewritability in terms of datalog
boundedness given in [15] and its proof). Point 2 can then be derived from Point 1, and
Point 3 is an immediate consequence of Points 1 and 2.
    For a tree-shaped ABox A and k ≥ 0, we use A|k to denote the restriction of A to
depth k. The following provides the first version of the announced characterization of
FO-rewritability in terms of the existence of certain ABoxes.
Theorem 2. Let T be an EL-TBox, Σ an ABox signature, and A(x) an IQ. Then A(x)
is not FO-rewritable relative to T and Σ iff for every k ≥ 0, there is a tree-shaped
Σ-ABox A of depth exceeding k with root a0 s.t. T , A |= A(a0 ) and T , A|k 6|= A(a0 ).
The proof of Theorem 2 builds on Point 1 of Theorem 1. Note that if A(x) is FO-
rewritable relative to T and Σ, then there is a k ≥ 0 such that for all tree-shaped Σ-
ABoxes A of depth exceeding k with root a0 , T , A |= A(a0 ) implies T , A|k |= A(a0 ).
In the proof of Theorem 2, we explicitly construct FO-rewritings which are tree-UCQs
of outdegree at most |T | and depth at most k.
    To proceed, it is convenient to work with TBoxes in normal form, where all CIs
must be of one of the forms A v B1 , A v ∃r.B, > v A, B1 u B2 v A, ∃r.B v A
with A, B, B1 , B2 concept names. This can be assumed without loss of generality:
Lemma 1. For any EL-TBox T , ABox signature Σ, and IQ A(x), there is a TBox T 0
in normal form such that for any FOQ ϕ, we have that ϕ(x) is an FO-rewriting of A(x)
relative to T and Σ iff ϕ(x) is an FO-rewriting of A(x) relative to T 0 and Σ.
To exploit Theorem 2 for building a decision procedure for FO-rewritability, we impose
a bound on k. The next theorem is proved using Theorem 2 and a pumping argument.
Theorem 3. Let T be an EL-TBox in normal form, Σ an ABox signature, A(x) an IQ,
and n = |(sig(T ) ∪ Σ) ∩ NC |. Then A(x) is not FO-rewritable relative to T and Σ
iff there exists a tree-shaped Σ-ABox A of depth exceeding 2n with root a0 such that
T , A |= A(a0 ) and T , A|2n 6|= A(a0 ).
Note that, with the remark after Theorem 2, we obtain a triple exponential upper bound
on the size of FO-rewritings. The bound in Theorem 3 is optimal in the sense that, for
every n ≥ 1, there is an EL-TBox T and an IQ A(x) such that |sig(T ) ∩ NC | = n,
A(x) is FO-rewritable relative to T and the full Σ, and for all ABoxes of depth at least
2n with root a0 , we have T , A |= A(a0 ) iff T , A|2n −1 |= A(a0 ). Such a T can be
constructed by simulating a binary counter, see Section 4 of [13]. Based on Theorem 3,
we can establish the following result.

Theorem 4. Deciding FO-rewritability of an IQ relative to an EL-TBox and an ABox
signature is in E XP T IME.

The proof utilizes non-deterministic bottom-up automata on finite, ranked trees: we
construct exponential-size automata that accept precisely the ABoxes A from Theo-
rem 3 and then decide their emptiness in PT IME.
   When Σ is full, the characterization given in Theorem 3 can be further improved.
An ABox A is linear if it consists of role assertions r0 (a0 , a1 ), . . . , rn−1 (an−1 , an )
and concept assertions A(a) with a ∈ {a0 , . . . , an }. Somewhat unexpectedly, with full
Σ we can replace the tree-shaped ABoxes from Theorem 3 with linear ones.

Theorem 5. Let T be an EL-TBox in normal form, A(x) an IQ, and n = |(sig(T ) ∪
Σ) ∩ NC |. Then A(x) is not FO-rewritable relative to T (and the full Σ) iff there exists
a linear ABox A of depth exceeding 2n with root a0 such that T , A |= A(a0 ) and
T , A|2n 6|= A(a0 ).

The surprisingly subtle proof of Theorem 5 is based on the careful extraction of a lin-
ear ABox from the tree-shaped one whose existence is guaranteed by Theorem 3. The
subtlety is largely due to the fact that it is not sufficient to simply select a linear chain
of individuals from the tree-shaped ABox; additionally, the concept assertions on that
chain have to be modified in a very careful way.
    The following example shows that, when Σ is not full, tree-shaped ABoxes in The-
orem 3 cannot be replaced with linear ones.

Example 2. Let T = {Ai v Xi , Bi u Xi v Yi , ∃r.Yi v Xi | i ∈ {1, 2}} ∪
                      {X1 u X2 v X, B1 u B2 v Z, ∃r.Z v X},
Σ = {A1 , A2 , B1 , B2 , r}, and take the IQ X(x). The tree-shaped ABox

          A = {r(a0 , ai,0 ), r(ai,0 , ai,1 ), . . . , r(ai,2n −1 , ai,2n ) | i ∈ {1, 2}} ∪
              {B(ai,0 ), . . . , B(ai,2n ), Ai (ai,2n ) | i ∈ {1, 2}},

with n as in Theorems 3 and 5, is of depth exceeding 2n and it can be verified that
T , A |= X(a0 ), but T , A|2n 6|= X(a0 ). However, for all linear Σ-ABoxes A, we have
T , A |= X(a0 ) iff T , A|1 |= X(a0 ).

Theorem 5 allows us to replace the non-deterministic tree automata in the proof of
Theorem 6 with word automata, improving the upper bound to PS PACE.

Theorem 6. Deciding FO-rewritability of an IQ relative to an EL-TBox and the full
ABox signature is in PS PACE.
4      General TBoxes – Lower Bounds
We establish lower bounds that match the upper bounds from the previous section.
Theorem 7. Deciding FO-rewritability of an IQ relative to a general EL-TBox and an
ABox signature Σ is (1) PS PACE-hard when Σ is full and (2) E XP T IME-hard when Σ
is an input.
The proof of Point 1 is by reduction of the word problem of polynomially space-
bounded deterministic Turing machines (DTMs). For Point 2, we modify the proof
of Point 1 to yield a reduction of the word problem of polynomially space-bounded
alternating Turing machines (ATMs). We start with the former.
    Let M = (Q, Ω, Γ, δ, q0 , qacc , qrej ) be a DTM and p(·) its polynomial space bound.
We assume w.l.o.g. that M terminates on every input, that it never attempts to move left
on the left-most end of the tape, that q0 ∈   / {qacc , qrej }, and that there are no transitions
defined for qacc and qrej . Let x ∈ Ω ∗ be an input to M of length n. Our aim is to
construct a TBox T and select a concept name B such that B is not FO-rewritable
relative to T and the full signature iff M accepts x.
    By Theorem 5, non-FO-rewritability of B w.r.t. T is witnessed by a sequence of
linear ABoxes of increasing depth. In the reduction, these ABoxes take the form of
longer and longer r-chains (with r a role name). The chains represent the computation
of M on x, repeated over and over again. Specifically, the tape contents, the current
state, and the head position are represented using the elements of Γ ∪ (Γ × Q) as
concept names. If, for example, x = ab and the computation of M on x consists of the
two configurations qab and aq 0 b,4 then this is represented by ABoxes of the form

                         {r(b0 , b1 ), r(b1 , b2 ), r(b2 , b3 ), . . . , r(bn−1 , bn )}

where additionally, the concept (q, a) is asserted for b0 , b4 , b8 , . . . , a is asserted for
b1 , b5 , b9 , . . . and for b2 , b6 , b10 , . . . , and (q 0 , b) for b3 , b7 , b11 , . . . . If M accepts x, then
B is propagated backwards along these chains (from b0 to b1 to b2 etc) unboundedly
far, starting from a single explicit occurrence of B asserted for b0 . If M rejects x or the
chain in the ABox does not properly represent the computation of M on x, then B will
already be implied by any subchain of length at most p(n)2 and thus the unbounded
propagation of B gets ‘disrupted’ resulting in FO-rewritability of B relative to T .
     The following CI in T results in backwards propagation of B provided that every
ABox individual is labeled with at least one symbol from Γ ∪ Γ × Q:

                            ∃r.(A u B) v B for all A ∈ Γ ∪ (Γ × Q).                                            (1)

Disrupt the propagation of B when M does not accept x:

                                       (a, qrej ) v B for all a ∈ Γ.

We have to enforce that the ABox actually represents a (repeated) computation of M .
To do this, we again use disruptions of the propagation of B: whenever an ABox A
 4
     uqv ∈ Γ ∗ QΓ ∗ means that M is in state q, the tape left of the head is labeled with u, and
     starting from the head position, the remaining tape is labeled with v.
represents a configuration sequence that is not a proper computation, then B is implied
by a sub-chain of bounded length. Let forbid denote the set of all tuples (A1 , A2 , A3 , A)
with Ai ∈ Γ ∪(Γ ×Q) such that whenever three consecutive tape cells in a configuration
c are labeled with A1 , A2 , A3 , then in the successor configuration c0 of c, the tape cell
corresponding to the middle cell cannot be labeled with A. Put

                  A u ∃rp(n)+1 .A1 u ∃rp(n) .A2 u ∃rp(n)−1 .A3 v B                       (2)

for all (A1 , A2 , A3 , A) ∈ forbid. This ensures that the transition relation is respected
and that the content of tape cells which are not under the head does not change. We also
need to say that every tape cell has a unique label, that there is at not more than one
head position per configuration, and at least one, again via disrupting the propagation
of B:
         A u A0 v B for all distinct A, A0 ∈ Γ ∪ (Γ × Q)
          (a, q) v H for all (a, q) ∈ Γ × Q                    a v H for all a ∈ Γ
   i         j
∃r .H u ∃r H v B for i < j < p(n)                      H u ∃r.H u · · · u ∃rp(n)−1 .H v B

where H is a concept name indicating that the head is on the current cell and H indicat-
ing that this is not the case. It remains to set up the initial configuration. It is tempting
to introduce a concept name I that sets up the initial configuration and must be used
at the end of the r-chain to start the propagation of B. However, since we assume the
ABox signature Σ to be full, we can always put B itself at the end of the chain, avoid-
ing I. To fix this issue, we refrain from introducing I, but utilize the final states qacc
and qrej , enforcing that they must always be followed by the initial configuration. Let
  (0)         (0)
A1 , . . . , Am be the concept names that describe the initial configuration, i.e., when
                                   (0)              (0)                               (0)
the input x is x1 · · · xn , then A0 = (x1 , q0 ), Ai = xi for 2 ≤ i ≤ n, and Ai = xi
is the blank symbol for n < i ≤ p(n). Now put
                         (0)                        (0)
           ∃ri .qacc v Ai      and    ∃ri .qrej v Ai              for 1 ≤ i ≤ p.         (3)

Note that all witness ABoxes for non-FO-rewritability of B must eventually contain
qacc or qrej , thus the initial configuration will be properly set up at some point: ABoxes
A that do not contain these states do not represent a proper computation of M because
M reaches qacc or qrej after at most p(n) states, thus the propagation of B is disrupted
in A.
    We now come to Point 2 of Theorem 7. When the ABox signature Σ is not re-
quired to be full, it is much simpler to set up the initial configuration. Indeed, we
can then simply introduce the mentioned concept name I, add I v B to T , and set
Σ = Γ ∪ (Γ × Q) ∪ {r, I} to ensure that we must use I to start the propagation of B.
                                                                   (0)                (0)
We can further replace the CIs (3) above with ∃ri .I v Ai and ∃ri .I v Ai for
1 ≤ i ≤ p to ensure that I sets up the initial configuration as intended. With this modi-
fication, we can adapt the reduction from DTMs to ATMs in a straightforward way, thus
improving the PS PACE lower bound to an E XP T IME one. We only give a brief sketch.
Assume w.l.o.g. that each configuration of the ATM M has at most two successor con-
figurations. We introduce a second role name s ∈ Σ, reserving r for the first successor
of a configuration and s for the second successor. Then the CIs (1) are replaced with

            ∃r.(A u B) u ∃s.(A0 u B) v B for all A, A0 ∈ Γ ∪ (Γ × Q).

resulting in witness ABoxes to take the form of a tree-shaped ATM computation rather
than a linear DTM computation. It remains to adapt the CIs (2) to reflect the branch-
ing of computations. The set forbid now contains tuples (A1 , A2 , A3 , A01 , A02 , A03 , A),
where A1 , A2 , A3 describe three neighboring cells in the left predecessor configuration
and A01 , A02 , A03 the corresponding cells in the right predecessor configuration (for exis-
tential states of M , we simply assume that the left and right predecessor are identical).
We can then replace (2) with

Au∃rp(n)+1 .A1 u∃rp(n) .A2 u∃rp(n)−1 .A3 u∃sp(n)+1 .A01 u∃sp(n) .A02 u∃sp(n)−1 .A03 v B.


5   Classical TBoxes
A classical TBox T is a finite set of concept definitions A ≡ C and CIs A v C where
A is a concept name. No concept name is allowed to occur more than once on the left
hand side of a statement in T .
Theorem 8. Deciding FO-rewritability of an IQ relative to a classical EL-TBox and
an ABox signature is in PT IME.
We give examples illustrating FO-rewritability in classical TBoxes and give the main
idea of the proof.
Example 3. (a) The IQ A(x) is not FO-rewritable relative to the classical TBox {A ≡
∃r.A} and the full ABox signature.
(b) The concept name A has a cyclic definition in the TBox T = { A ≡ B u ∃r.A, B v
∃r.A } which often indicates non-FO-rewritability, but in this case the IQ A(x) has an
FO-rewriting relative to T and the full ABox signature, namely ϕ(x) = A(x) ∨ B(x).
To present the idea of the proof, we use an appropriate normal form for classical
TBoxes. A concept name A is defined in T if there is a definition A ≡ C ∈ T and
primitive otherwise; A is non-conjunctive in T if it occurs in T , but there is no CI of
the form A ≡ B1 u · · · u Bn in T with n ≥ 1 and B1 , . . . , Bn concept names. We use
non-conj(T ) to denote the set of non-conjunctive concepts in T . A classical TBox T
is in normal form if it is a set of statements A ≡ ∃r.B and A ≡ B1 u · · · u Bn where
B, B1 , . . . , Bn are concept names and B1 , . . . , Bn are non-conjunctive. For every clas-
sical TBox T , one can construct in polynomial time a classical TBox T 0 in normal form
that uses additional concept names such that T 0 |= T and every model of T can be ex-
panded to a model of T 0 [10]. It is not hard to verify that an IQ A(x) is FO-rewritable
relative to T and Σ if and only if it is FO-rewritable relative to T 0 and Σ, provided that
A is not among the new concept names introduced during the construction of T 0 . For a
classical TBox T in normal form and a concept name A, define
                                
                                  {A}                if A is non-conjunctive in T
               non-conjT (A) =
                                  {B1 , . . . , Bn } if A ≡ B1 u · · · u Bn ∈ T .
Our polytime algorithm utilizes an ABox introduced in [10, 9] in the context of conser-
vative extensions and logical difference: given a classical TBox T in normal form and
an ABox signature Σ, we compute in polytime a polysize Σ-ABox AT ,Σ with individ-
ual names aB , B non-conjunctive in T , such that for any Σ-ABox A, individual name
a in A, and concept name A the following conditions are equivalent:
    – T , A 6|= A(a);
    – there exists B ∈ non-conjT (A) such that (A, a) is simulated by (AT ,Σ , aB ) (see
      appendix of long version of this paper).
It follows that to check whether A(x) is FO-rewritable, instead of considering arbitrary
tree-shaped Σ-ABoxes A and A|k as in Theorem 2, it suffices to consider the tree
unfolding of AT ,Σ at aB and its restriction to depth k, for all B ∈ non-conjT (A). The
original search problem has been reduced to the problem of analysing the tree unfolding
of AT ,Σ . A polytime algorithm performing that analysis is given in the long version.


6     Semi-Acyclic TBoxes
It is easy to see that every IQ is FO-rewritable relative to every acyclic EL-TBox and ev-
ery ABox signature Σ. We observe that the same holds for semi-acyclic TBoxes, where
some cycles are still allowed, and that it is possible to find rewritings of polynomial size
when only databases of domain size at least two are admitted.
     A semi-acyclic TBox is defined like a classical TBox, except that definitorial cycles
are disallowed, i.e., there cannot be concept definitions A0 ≡ C0 , . . . , An−1 ≡ Cn−1
such that Ai occurs in Ci+1 mod n . Note that cycles via concept inclusions, such as A v
∃r.A, are still permitted. Let T be a semi-acyclic TBox and Σ an     W ABox signature. For
an EL-concept C, we use preT ,Σ (C) to denote the FO-formula B∈Σ | T |=BvC B(x).
For all concept names A and EL-concepts C and D and role names r, set


         ϕΣ>,T (x) = true
         ϕΣA,T (x) = preT ,Σ (A)                                     if A is primitive
         ϕA,T (x) = ϕΣ
           Σ
                       C,T (x)                                       if A ≡ C ∈ T
       ϕCuD,T (x) = ϕC,T (x) ∧ ϕΣ
        Σ              Σ
                                   D,T (x)
       ϕ∃r.C,T (x) = preT ,Σ (∃r.C) ∨ ∃y.(r(x, y) ∧ ϕΣ
        Σ
                                                     C,T [y/x])      if r ∈ Σ
       ϕΣ
        ∃r.C,T (x) = pre T ,Σ (∃r.C)                                 if r ∈
                                                                          /Σ

where ϕ[x/y] denotes the result of first renaming all bound variables in ϕ so that y does
not occur, and then replacing the free variable x of ϕ with y.
Lemma 2. For all IQs A(x), ϕΣ
                            A,T (x) is an FO-rewriting of A(x) relative to T and Σ.

The size of ϕΣ
             A,T (x) can clearly be exponential in the size of T , for example when
A = An and T = {Ai ≡ ∃r.Ai−1 u ∃s.Ai−1 | 1 ≤ i ≤ n}. To reduce ϕΣ               A,T
to polynomial size, we can use Avigad’s observation that FO supports structure shar-
ing [1]. More precisely, let ϕ be a positive FOQ (such as ϕΣA,T ) whose subformulas
include ψ(x1 ), . . . , ψ(xn ). The multiple occurences of ψ can be avoided by rewriting
ϕ to ∃u∀y∀z (ψ(y) ↔ z = u) → ϕ0 where ϕ0 is ϕ with each ψ(xi ) replaced with
y = xi → z = u. Intuitively, we iterate over all y and memorize whether ψ(y) holds
using identity of z with u. Since we need at least two different ‘values’ for z to make
this trick work, the resulting FOQ is an FO-rewriting only on ABoxes with at least two
individual names.

7      Related Work
In [15], deciding FO-rewritability is studied in the context of the expressive DL ALCF I
and several of its fragments. In general, though, the setup in that paper is different:
while we are interested in deciding FO-rewritability of a single query relative to a
TBox, the results in [15] concern deciding whether, for a given TBox T , all queries
are FO-rewritable relative to T . It is shown that this problem is decidable for Horn-
ALCF I-TBoxes of depth at most two and for Horn-ALCF-TBoxes (queries are IQs
or, equivalently, CQs). As a by-product of these results, a close connection between
FO-rewritability of TBoxes formulated in Horn DLs and boundedness of datalog pro-
grams is observed, see e.g. [6, 17] for the latter problem. In its original formulation,
the following result is established for a larger class of TBoxes, namely materializable
ALCF I-TBoxes of depth one.
Lemma 3 ([15]). For every (general) EL-TBox T in normal form, there is a datalog
program ΠT such that for every ABox signature Σ and IQ A(x), the predicate A is
bounded in ΠT relative to Σ-databases iff A(x) is FO-rewritable relative to T and Σ.
In [15], the program ΠT is of exponential size. Since we are only interested in EL-
TBoxes, it is easy to find a ΠT of polynomial size. More specifically, ΠT consists of
     A(x) ← true           if > v A ∈ T
     B(x) ← r(x, y), A(x) if ∃r.A v B ∈ T X∃r.A (x) ← B(x) if B v ∃r.A ∈ T
     B(x) ← A1 (x), A2 (x) if A1 u A2 v B ∈ T        (where possibly A1 = A2 )
     B(x) ← X∃r.A (x)      if ∃r.B0 v B ∈ T and T |= A v B0
This allows to carry over the 2E XP T IME upper bound for predicate boundedness of con-
nected monadic datalog programs [6] to FO-rewritability of an IQ relative to a general
EL-TBoxes and an ABox signature.5
   Note that boundedness has been studied also in the context of the µ-calculus and
monadic second order (MSO) logic [16, 4]. Here, an E XP T IME upper bound is known
from [16] and it seems likely that this result can be utilized to find an alternative proof
of Theorem 6. In particular, it is possible to find a µ-calculus rewriting ϕ of an IQ
A(x) relative to an EL-TBox T and ABox signature Σ: proceeding similarly to the
construction of the above datalog program ΠT , we can find a µ-calculus formula ϕT ,Σ
such that for all Σ-ABoxes A and a ∈ Ind(A), we have T , A |= A(a) iff IA , a |= ϕ.
When simultaneous fixpoints are admitted, ϕ even has polynomial size.
 5
     Note that we explicitly fix the signature Σ of the databases over which boundedness of ΠT is
     considered instead of assuming that only the EDB predicates can be used in the data as in [6];
     this is only for simplicity and, in fact, it is easy to adapt ΠT to the latter assumption.
8    Conclusions
It would be interesting to generalize the results presented in this paper to more expres-
sive DLs and to more expressive query languages. Regarding the former, we note that
using the techniques in [14, 15] it is possible to derive a NE XP T IME upper bound for
deciding FO-rewritability of IQs relative to Horn-ALCI-TBoxes and ABox signatures.
Regarding the latter, CQs are a natural choice and we believe that a mix of techniques
from this paper and those in [3] might provide a good starting point. It is interesting
to note that FO-rewritability of all IQ-atoms A(x) in a CQ q does not imply that q is
FO-rewritable and the converse fails, too.
Acknowledgements. C. Lutz was supported by the DFG SFB/TR 8 ‘Spatial Cognition’.

References
 1. J. Avigad. Eliminating definitions and skolem functions in first-order logic. In Proc. of
    LICS, pages 139–146. IEEE Computer Society, 2001.
 2. F. Baader, M. Bienvenu, C. Lutz, and F. Wolter. Query and predicate emptiness in description
    logics. In Proc. of KR. AAAI Press, 2010.
 3. M. Bienvenu, C. Lutz, and F. Wolter. Query containment in description logics reconsidered.
    In Proc. of KR, 2012. To appear.
 4. A. Blumensath, M. Otto, and M. Weyer. Decidability results for the boundedness problem.
    Manuscript, 2012.
 5. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning
    and efficient query answering in description logics: The DL-Lite family. J. of Automated
    Reasoning, 39(3):385–429, 2007.
 6. S. S. Cosmadakis, H. Gaifman, P. C. Kanellakis, and M. Y. Vardi. Decidable optimization
    problems for database logic programs. In Proc. of STOC, pages 477–490. ACM, 1988.
 7. G. Gottlob and T. Schwentick. Rewriting ontological queries into small nonrecursive datalog
    programs. In Proc. of DL. CEUR-WS, 2011.
 8. S. Kikot, R. Kontchakov, V. V. Podolskii, and M. Zakharyaschev. Exponential lower bounds
    and separation for query rewriting. CoRR, abs/1202.4193, 2012.
 9. B. Konev, M. Ludwig, D. Walther, and F. Wolter. The logical diff for the lightweight descrip-
    tion logic EL. Technical report, U. of Liverpool, http://www.liv.ac.uk/∼ frank/publ/, 2011.
10. B. Konev, D. Walther, and F. Wolter. The logical difference problem for description logic
    terminologies. In Proc. of IJCAR, pages 259–274. Springer, 2008.
11. R. Kontchakov, C. Lutz, D. Toman, F. Wolter, and M. Zakharyaschev. The combined ap-
    proach to query answering in DL-Lite. In Proc. of KR. AAAI Press, 2010.
12. C. Lutz, D. Toman, and F. Wolter. Conjunctive query answering in the description logic EL
    using a relational database system. In Proc. of IJCAI, pages 2070–2075. AAAI Press, 2009.
13. C. Lutz and F. Wolter. Deciding inseparability and conservative extensions in the description
    logic EL. In J. of Symbolic Computation 45(2): 194-228, 2010.
14. C. Lutz and F. Wolter. Non-uniform data complexity of query answering in description
    logics. In Proc. of DL. CEUR-WS, 2011.
15. C. Lutz and F. Wolter. Non-uniform data complexity of query answering in description
    logics. In Proc. of KR, 2012. To appear.
16. M. Otto. Eliminating recursion in the µ-calculus. In Proc. of STACS, pages 531–540.
    Springer, 1999.
17. R. van der Meyden. Predicate boundedness of linear monadic datalog is in PSPACE. Int. J.
    Found. Comput. Sci., 11(4):591–612, 2000.