Query Rewriting under EL-TBoxes:
                     Efficient Algorithms

           Peter Hansen1 , Carsten Lutz1 , İnanç Seylan1 , and Frank Wolter2
    1
        University of Bremen, Germany, {hansen,clu,seylan}@informatik.uni-bremen.de
                      2
                        University of Liverpool, UK, frank@csc.liv.ac.uk


          Abstract. We propose a new type of algorithm for computing first-
          order (FO) rewritings of concept queries under EL-TBoxes that is tailored
          towards efficient implementation. The algorithm outputs a non-recursive
          datalog rewriting if the input is FO-rewritable and otherwise reports
          non-FO-rewritability. We carry out experiments with ontologies from
          practical applications which show that our algorithm performs very well
          in practice, and that EL-TBoxes originating from real-world applications
          admit FO-rewritings (of reasonable size) in almost all cases, even when
          in theory such rewritings are not guaranteed to exist.


1       Introduction
Query rewriting is an important technique for implementing ontology-based data
access (OBDA) based on a relational database system (RDBMS), thus taking
advantage of those systems’ efficiency and maturity [15, 9]. The general idea is to
transform the original query q and the relevant TBox T into a first-order (FO)
query qT that is then handed over to the RDBMS for execution. One limitation
of this approach is that, for the majority of description logics that are used as
ontology languages, the query qT is not guaranteed to exist. In fact, this is the
case already for the members of the popular EL family of lightweight DLs [2,
11], which underly the OWL2 EL profile and are frequently used as ontology
languages in health care and biology. This observation, however, does not rule
out the possibility that FO-rewritings still exist in many practically relevant
cases. In fact, TBoxes that emerge from practical applications tend to have a
rather simple structure and one might thus speculate that, indeed, FO-rewritings
under EL-TBoxes tend to exist in practice.
    In this paper, we consider the computation of FO-rewritings of concept
queries under TBoxes that are formulated in the description logic EL, where
a concept query takes the form C(x) with C an EL-concept. It has recently been
shown in [6] that deciding whether a concept query C(x) is FO-rewritable under
an EL-TBox T is a PSpace-complete, and that the problem becomes ExpTime-
complete when a signature is imposed on the set of admitted database instances
(represented as an ABox). This shows that computing the desired rewritings is
not an easy task. The existing approaches to query rewriting in EL and their
relevance to the computation of FO-rewritings can be summarized as follows:
(i) approaches that target rewritings in the more expressive query language dat-
alog which are incomplete in the sense that the generated datalog-rewritings
are not guaranteed to be non-recursive even if there is an FO-rewriting [16, 14,
13]; (ii) backwards chaining approaches for existential rules (a strict generaliza-
tion of EL) which are complete in the sense that they find an FO-rewriting if
there is one, but need not terminate otherwise [8]; (iii) the complete and ter-
minating approach from [6] which aims at proving upper complexity bounds for
FO-rewritings which cannot be expected to be feasible in practice because it
relies on brute-force enumeration techniques.3
    The aim of this paper is to design algorithms for computing FO-rewritings
of concept queries under EL-TBoxes that are complete, terminating, and feasible
in practice. To this end, we start with a marriage of approaches (ii) and (iii)
to get the best of both worlds; in particular, (ii) appears to be practically more
feasible than (iii) while (iii) provides a way to achieve termination. The resulting
algorithm is conceptually simple and constitutes a significant step towards our
goal. However, it produces FO-rewritings that are unions of conjunctive queries
(UCQs), which results in two significant drawbacks: first, recent experiments [10]
have shown that executing UCQ-rewritings on RDBMSs is prohibitively expen-
sive while executing equivalent rewritings that take the form of non-recursive
datalog programs is much more feasible (even when the original query is a con-
junctive query); and second, UCQ-rewritings can be of excessive size even in
practically relevant cases [17].
    To address these shortcomings, we refine our original algorithm to what we
call a decomposed algorithm. While the original algorithm uses tree-shaped con-
junctive queries (CQs) as an internal data structure, the new algorithm only
represents single nodes of CQs together with information of how to reassemble
these nodes into full tree-shaped CQs. This can be seen as a way to imple-
ment structure sharing and it also allows us to directly produce rewritings that
are non-recursive datalog programs, avoiding UCQ-rewritings altogether. The
algorithm runs in exponential time, is capable of deciding the existence of FO-
rewritings in ExpTime, and produces monadic non-recursive datalog rewritings
that are of at most exponential size (but much smaller in practice). Technically,
the decomposed algorithm is much more subtle than the original one.
    We then evaluate the decomposed algorithm by carrying out experiments
with seven ontologies from practical applications. We ask for an FO-rewriting
for every concept query A(x), with A a concept name from the ontology under
consideration. Out of 15970 requests in total, the decomposed algorithm times
out only on 158 inputs, with a timeout of 15 seconds. We also analyze the size
of the generated non-recursive datalog rewritings, with extremely encouraging
results. Our experiments show that the decomposed algorithm performs very
well on inputs from practical applications. They also confirm our initial belief
that EL-ontologies from practical applications often admit FO-rewritings. In
particular, only 31 of 15970 queries turn out to not be FO-rewritable.
    Proof details can be found in the long version available at:
http://www.informatik.uni-bremen.de/tdki/research/papers/2014/DL2014.pdf
3
    For approaches to FO-rewritability for non-Horn DLs we refer the reader to [7, 4].
2    Preliminaries

We use NC and NR to denote countably infinite sets of concept names and role
names, respectively. An EL-concept is formed according to the syntax rule C ::=
A | > | C u C | ∃r.C, a concept inclusion (CI) takes the form C v D with C
and D EL-concepts, and a TBox is a finite set of CIs. The semantics of concepts
and TBoxes is defined as usual. We write T |= C v D when C is subsumed by
D under T .
    An ABox A is a set of assertions of the form A(a) and r(a, b) with A a concept
name, r a role name, and a, b individual names from a countably infinite set NI .
We use ind(A) to denote the set of individual names that occur in A. A concept
query is an expression C(x) with C an EL-concept and x a variable. We write
A, T |= C(a) if a is a certain answer to C given the ABox A and TBox T . For
an FO-query q(x) with one free variable, we write A |= q(a) if A (viewed as a
structure) satisfies q under the assignment that maps x to a.
    A concept query C(x) is FO-rewritable under a TBox T if there is an FO-
formula ϕ(x) such that for all ABoxes A and individuals a, we have A, T |= C(a)
iff A |= ϕ(a). In this case, we call ϕ(x) an FO-rewriting of C(x) under T . An
FO-formula ϕ(x) is a partial FO-rewriting of A under T if for all ABoxes A and
a ∈ ind(A), A |= ϕ(a) implies A, T |= A(a). Thus a partial FO-rewriting is an
FO-rewriting that is sound, but not necessarily complete. When studying FO-
rewritability of concept queries C(x), we can assume w.l.o.g. that C is a concept
name since C(x) is FO-rewritable under a TBox T iff A(x) is FO-rewritable
under T ∪ {C v A} where A is a fresh concept name [6].
    We will also consider more specific forms of FO-rewritings, in particular
UCQ-rewritings and non-recursive datalog rewritings. Although, strictly speak-
ing, non-recursive datalog rewritings are not FO-rewritings, the existence of ei-
ther kind of rewriting coincides with the existence of an FO-rewriting. In particu-
lar, non-recursive datalog rewritings can be viewed as a compact representation
of a UCQ-rewriting that implements structure sharing. By a monadic datalog
rewriting, we mean a datalog rewriting in which all intensional (IDB) predicates
are unary.
    For our technical constructions, it will be convenient to view EL-concepts as
CQs that take the form of a directed tree. We will represent such queries as sets
of atoms of the form A(x) and r(x, y) with A a concept name, r a role name
and x, y variables, not distinguishing between answer variables and quantified
variables. Tree-shapedness of a conjunctive query q then means that the directed
graph (V, {(x, y) | r(x, y) ∈ q}) is a tree, where V is the set of variables in q, and
that r(x, y), s(x, y) ∈ q implies r = s. In the following, we will not distinguish
explicitly between an EL-concept C and its query representation. We thus use
var(C) to denote the set of variables that occur in C and xε to denote the
root variable in C. For an x ∈ var(C), we use C|x to denote the EL-concept
represented by (the subtree rooted at) x. When we speak of a top-level conjunct
(tlc) of an EL-concept C, we mean a concept name A such that A(xε ) ∈ C or a
concept ∃r.D such that r(xε , y) ∈ C and D = C|y . We use tlc(C) to denote the
set of all top-level conjuncts of C. For any syntactic object (such as a concept
or a TBox), we define its size to be the number of symbols used to write it.


3   A Backwards Chaining Algorithm
Let T be an EL-TBox and A0 a concept name for which an FO-rewriting is
to be constructed. The algorithm presented in this section constructs a set of
partial rewritings of A0 under T by starting from {A0 } and then exhaustively
applying the concept inclusions in T as rules in a backwards chaining manner.
Let C and D be EL-concepts and ϕ = E v F an EL-concept inclusion. Further,
let x ∈ var(C) and let there be at least one tlc G of C|x with |= F v G. Then D
is obtained from C by applying ϕ at x if D can be obtained from C by
 – removing A(x) for all concept names A with |= F v A;
 – removing the subtree rooted at y whenever r(x, y) ∈ C and |= F v ∃r.(C|y );
 – adding A(x) for all concept name A that are a tlc of E;
 – adding the subtree ∃r.H to x for each ∃r.H that is a tlc of E.
When the exact identity of x is not important, we say that D is obtained from C
by applying ϕ. This corresponds to a backwards chaining step based on so-called
piece unifiers in [3, 8]. The following is immediate.
Lemma 1. If T |= C v A0 and D can be obtained from C by applying some CI
in T , then T |= D v A0 .
Apart from applying CIs from the TBox as rules, our algorithm will also minimize
the generated partial rewritings to attain completeness and termination. To make
this precise, we introduce some notation. For EL-concepts C and D, we write
C  D if there is x ∈ var(D) such that C = D \ D|x , that is the concept C
is obtained from D by dropping the subtree D|x from D. We use ∗ to denote
the transitive closure of  and say that C is -minimal with T |= C v A0
if T |= C v A0 and there is no C 0  C with T |= C 0 v A0 . Note that if
T |= C v A0 , then it is possible to find in polynomial time a C 0 ∗ C that is
minimal with T |= C 0 v A0 (since subsumption in EL can be decided in PTime).
    The constructions in [6] suggest that, to achieve termination, we can use a
certain form of blocking, similar to the blocking used in DL tableau algorithms.
Let sub(T ) denote the set of subconcepts of (concepts that occur in) T . For each
EL-concept C and x ∈ var(C), we set conC   T (x) := {D ∈ sub(T ) | T |= C|x v D}.
We say that C is blocked if there are x1 , x2 , x3 ∈ var(C) such that
1. x1 is an ancestor of x2 , which is an ancestor of x3 and
                                  C\C|x3             C\C|x3
2. conC            C
      T (x1 ) = conT (x2 ) and conT        (x1 ) = conT       (x2 ).
The algorithm is formulated in Figure 1. Note that, by Lemma 1, the concept D
considered in the condition of the while loop satisfies T |= D v A0 . We are thus
guaranteed to find the desired D0 inside the while loop. Also note that there are
potentially many different D0 that we could use, and each of them will work.
procedure find-rewriting(A0 (x), T )
    M := {A0 }
    while there is a C ∈ M and a concept D such that
             1. D can be obtained from C by applying some CI in T and
             2. there is no D0  D with D0 ∈ M then
          find a D0 ∗ D that is minimal with T |= D0 v A0
          if D0 is blocked then
             return ‘not FO-rewritable’
          add D0 to M
                    W
    return the UCQ M .

                      Fig. 1. The backwards chaining algorithm

Example 1. Let T = {∃r.(B1 u B2 ) v A0 , ∃s.B2 v B2 , B1 v B2 }. Starting with
M = {A0 } and applying the first CI to C = A0 , we get M = {A0 , ∃r.(B1 u B2 )}.
Applying the second CI to C = ∃r.(B1 u B2 ) then yields D = ∃r.(B1 u ∃s.B2 )
which is not minimal with T |= D v A0 as witnessed by D0 =W∃r.B1 , which is
added to M . At this point, rule application stops and the UCQ M is returned.
It is illustrative to try Example 1 without applying the minimization step. We
then find a blocked concept in M , which shows that without minimization the
algorithm is incomplete. It can also be seen that dropping minimization results
in non-termination since the out-degree of concepts in M can grow unboundedly.
    We now establish correctness and termination, showing first that, if the al-
gorithm claims to have found an FO-rewriting, then this is indeed the case.
                                                             W          W
Proposition 1 (Soundness). If the algorithm returns M , then M is an
FO-rewriting of A0 under T .
                                                                    W
Proof.(sketch) Let A be an ABox. We have to show:    W (1) if A |= M (a0 ), then
A, T |= A0 (aW0 ); (2) if A, T |= A0 (a0 ), then A |= M (a0 ). For Point 1, assume
that A |= M (a0 ). Then there is a C ∈ M with A |= C(a0 ). Consequently
A, T |= C(a0 ). By construction of M , all its elements C satisfy T |= C v A0 ,
thus A, T |= A0 (a0 ) as required.
    For Point 2, we essentially follow the proof strategy from [3], based on the
chase procedure. If A, T |= A0 (a0 ), then A0 (a0 ) ∈ chaseT (A) and consequently,
there is a sequence of ABoxes A = A0 , A1 , . . . , Ak that demonstrates A0 (a) ∈
chaseT (A), that is, each Ai+1 is obtained from Ai by a single chase step and
A0 (a) ∈ Ak . It thus suffices to prove by induction on k that if A = AW  0 , . . . , Ak
is a chase sequence that demonstrates A0 (a0 ) ∈ chaseT (A), then A |= M (a0 ).
This is slightly tedious, but straightforward.                                         o
Note that the generated UCQ-rewritings are not necessarily of minimal size. It is
possible to attain minimal-size rewritings by using a stronger form of minimality
when constructing the concept D0 , namely by redefining the relation “” so that
C  D if there is a root-preserving homomorphism from C to D (c.f. the notion
of most-general rewritings in [8]). As a consequence, the -minimal concept D0
with T |= D0 v A can then be of size exponential in the size of D. However, D0
can still be constructed in output-polynomial time.
Proposition 2 (Completeness). If the algorithm returns ‘not FO-rewritable’,
then A0 has no FO-rewriting under T .
Proof. By Theorem 2 in [5], it suffices to show that if the algorithm returns ‘not
FO-rewritable’, then
(∗) for every k > 0, there is a concept C whose depth exceeds k and such that
T |= C v A0 and T 6|= C|− k v A0 ,

where C|−k denotes the concept obtained from C be removing all variables whose
depth exceeds k. Using a pumping argument, one can show the following suffi-
cient condition for (∗).
Fact. If there is a concept C that is blocked with variables x1 , x2 , x3 ∈ var(C),
T |= C v A0 , and T 6|= C \ C|x3 v A0 , then (∗) holds.
Now assume the algorithm returns ‘not FO-rewritable’. Then there is a concept
D that is minimal with T |= D v A0 and that is blocked with variables x1 , x2 , x3 .
By minimality of D, T 6|= (D \ D|x3 ) v A0 and (∗) follows.                      o
We prove in the appendix that the algorithm always terminates by showing that
all concepts in M have outdegree at most n and depth at most 22n , n the size
of T . As remarked in [6], the size of UCQ-rewritings can be triple exponential
in the size of T , and thus the same is true for the runtime of the presented
algorithm. While this worst case is probably not encountered in practice, the
size of M can become prohibitively large for realistic inputs. For this reason,
we propose an improved algorithm in the subsequent section, which produces
non-recursive datalog rewritings instead of UCQ-rewritings and whose runtime
is at most single exponential.


4   A Decomposed Algorithm
The algorithm presented in this section consists of three phases. In the first phase,
a certain set Γ is computed that can be viewed as a decomposed representation
of the set M from Section 3 in the sense that we store only single nodes of the
tree-shaped concepts in M , rather than entire concepts. In the second phase,
we compute a certain set Ω that enriches the node representation provided by
Γ with sets of logical consequences as mentioned in Point 2 of the definition of
blocked concepts. In the third phase, we first execute a certain cycle check on
Ω, which corresponds to checking the existence of a blocked concept in M . If no
cycle is found, we can read off a rewriting from Ω.
    Assume that T is a TBox and A0 a concept name for which we want to com-
pute an FO-rewriting. To present the algorithm, it is convenient to decompose
conjunctions on the right-hand side of CIs, that is, to assume that T consists of
CIs of the form C v A, A a concept name, or C v ∃r.D. We start with describ-
ing the construction of Γ , whose elements we call node pairs. A node pair has
 the form (C, S), where C ∈ sub(T ) and S ⊆ sub(T ) is a set of concept names
 and concepts of the form ∃r.C. Intuitively, a node pair (C, S) describes a set of
 concepts D (subtrees of concepts in Γ ) such that T |= D v C and the following
 conditions are satisfied:
 (i) if A ∈ S for a concept name A, then A is a tlc of D;
(ii) if ∃r.E ∈ S, then there is a tlc ∃r.E 0 of D such that T |= E 0 v E.
 The computation of Γ starts with {(A0 , {A0 })} and proceeds by exhaustively
 applying the following two rules:
(r1) if (C, S) ∈ Γ , D v A ∈ T and A ∈ S, then add pair (C, (S \ {A}) ∪ tlc(D))
(r2) if (C, S) ∈ Γ , D v ∃r.F ∈ T , and there is an ∃r.G ∈ S with T |= F v G,
     then add the pair (C, (S \ {∃r.G | T |= F v G}) ∪ tlc(D))
 After applying either rule, we also have to add the pair (G, tlc(G)) for every
 ∃r.G ∈ sub(D) to trigger further derivation.
     The set Γ represents a (potentially infinitary) UCQ-rewriting of A0 under T
 in a sense that we make precise now. Let Γb be the set obtained as the limit of
 the sequence of sets Γb0 , Γb1 , . . . defined as follows:
   – Γb0 := {(C, u S) | (C, S) ∈ Γ and S ⊆ NC }.
   – Γbi+1 is Γbi extended with all pairs (C, D) such that there are (C, S) ∈ Γ and
     (G, Cr,G ) ∈ Γbi for each ∃r.G ∈ S and D = u A u u ∃r.Cr,G .
                                                      A∈S∩NC       ∃r.G∈S

 Proposition 3 (Soundness and Completeness of Γb). For all ABoxes A and
 a ∈ ind(A), we have A, T |= A0 (a) iff there is a (A0 , D) ∈ Γb with A |= D(a).
 Note that Γ provides us with a sufficient condition for FO-rewritability and
 suggests a way to produce a non-recursive datalog rewriting. In fact, if Γ is
 acyclic in the sense that the directed graph
             GΓ = (Γ, {((C, S), (C 0 , S 0 )) | S contains a concept ∃r.C 0 })
 contains no cycle, then Γb is finite and we obtain a non-recursive datalog program
 ΠΓ that is a rewriting of A0 under T by taking the rule
                           ^              ^
               PC (x) ←         A(x) ∧        ( r(x, yr,D ) ∧ PD (yr,D ) )
                            A∈S          ∃r.D∈S

 for each (C, S) ∈ Γ and using A0 as the goal predicate. However, if Γ is not
 acyclic, then A0 could still be FO-rewritable under T , but the above program
 will be recursive. To deal with this problem, we need the next two phases.
     We construct a set of node tuples Ω from Γ by further annotating and
 duplicating the pairs in Γ . A node tuple takes the form t = (Ct , St , cont , st , xcont )
 where Ct and St have the same form as the components of node pairs in Γ ,
 cont ⊆ sub(T ), st is an existential restriction in St or the special symbol “−”,
 and xcont is either a subset of sub(T ) or the special symbol “−”. Intuitively, a
 node tuple t ∈ Ω describes a set of concepts D (subtrees of concepts in Γ ) such
 that (Ct , St ) describes D in the way described above and the following additional
 conditions are satisfied:
(iii) for each E ∈ sub(T ), we have T |= D v E iff E ∈ cont ;
(iv) in the subtree of D rooted at st there is a leaf node such that for the concept
      D0 obtained by dropped this node and each E ∈ sub(T ), we have T |= D0 v
      E iff E ∈ xcont .

 When St contains no existential restrictions, we use “−” in the last two com-
 ponents. To understand st , it is useful to think of D as a tree and of st as a
 selected successor of the root of that tree. We start the construction of Ω with

              Ω0 := {(C, S, conT (S), −, −) | (C, S) ∈ Γ with S ⊆ NC },

 where for a set of concepts M , conT (M ) denotes the set of concepts D ∈ sub(T )
 such that T |= u M v D. We call the tuples in Ω0 leaf tuples. The final set Ω
 is constructed by exhaustively applying the following rule:

(r3) If t is a node tuple such that ∃r0 .D0 , . . . , ∃rn .Dn are the existential restric-
     tions in St , st = ∃r` .D` , and t0 , . . . , tn ∈ Ω with Cti = Di for 0 ≤ i ≤ n,
     then add t to Ω if the following conditions are satisfied:
       • there is a node pair (Ct , S) ∈ S  Γ with St ⊆ S and S ∩ NC = St ∩ NC ;
       • cont = conT (M ), where M = (St ∩NC )∪{∃ri .G | i ≤ n and G ∈ conti };
       • xcont = conT (M 0 ), where M 0 is
          [
             (St ∩ NC ) ∪ {∃r` .G | G ∈ xcont` } ∪ {∃ri .G | ` 6= i ≤ n and G ∈ conti }.

 In Points 3, the concept ∃r.− (which might occur in the set M 0 ) is identified
 with >. For t, t0 ∈ Ω, we write t Ω t0 if there are t0 , . . . , tn ∈ Ω that satisfy
 the conditions listed in (r3) and such that t0 = t` , that is, t0 is the tuple that
 was chosen for the selected successor. Note that the computation of the sets cont
 and xcont is complete, relying on the following observation [12].
 Lemma 2. For any concept C = A1 u · · · u An u ∃r1 .G1 u · · · u ∃rm .Gm ,
                                            [
      conT (C) = conT ({A1 , . . . , An } ∪   {∃ri .D | D ∈ conT (Gi )})
                                              1≤i≤m

 We now describe the third and last phase of the algorithm, which first checks
 whether an FO-rewriting exists at all and, if so, produces a rewriting that takes
 the form of a non-recursive monadic datalog program.
      We start with introducing the relevant notion of a cycle. A tuple t ∈ Ω is a
 root tuple if A0 ∈ cont and A0 ∈    / xcont . A path through Ω is a finite sequence of
 node tuples t1 , . . . , tk from Ω such that ti Ω ti+1 for 1 ≤ i < k. A tuple t ∈ Ω
 is looping if there is a path t1 , . . . , tk through Ω of length at least one such that
 t = t1 , cont = contk , and xcont = xcontk . We say that Ω contains a root cycle if
 there are tuples t, t0 ∈ Ω such that t is a root tuple, t0 is a looping tuple, and t0
 is reachable from t along Ω .
 Proposition 4. A0 is not FO-rewritable under T if and only if Ω contains a
 root cycle.
Proof.(sketch) “if”. Assume that Ω contains a root cycle. Using this cycle as a
guide, we show how to find a concept C that is blocked in the sense of Section 3
and satisfies T |= C v A0 as well as T 6|= (C \ C|x3 ) v A0 , where x3 is as in
the definition of ‘blocked’. Once we have constructed such a concept C, we can
again rely on the results of [5], as in the proof of Proposition 2.
   “only if”. Assuming that Ω contains no root cycle, we show below how to con-
stuct a non-recursive datalog-rewriting of A0 under T , thus A0 is FO-rewritable
under T .                                                                     o
As suggested by Proposition 4, our algorithm first checks whether Ω contains
a root cycle and, if so, returns ‘not FO-rewritable’. Otherwise, it constructs a
non-recursive datalog program ΠT ,A0 as follows. First, we drop from Ω all tuples
that are not reachable from a root tuple along an Ω -path. For t, t0 ∈ Ω and
∃r.D ∈ St , we write t ∃r.D t0 if there is a tuple b
                                                   t = (Ct , St , cont , ∃r.D, xconbt ) ∈
                      0                                                       0
Ω such that b   t Ω t . Note that, by definition of “ Ω ”, t            ∃r.D t implies
Ct0 = D. Now, ΠT ,A0 contains for every t ∈ Ω, the rule
                   ^               ^                      _
  PCt ,cont (x) ←        A(x) ∧        (r(x, yr,D ) ∧                PD,cont0 (yr,D ))
                 A∈St ∩NC          ∃r.D∈St               t0 ∈Ω|t   ∃r.D t
                                                                            0


Note that the disjunctions can be removed by introducing auxiliary IDB pred-
icates, without causing a significant blowup. The goal predicates of ΠT ,A0 are
all predicates of the form PA0 ,con (x) with A0 ∈ con.
Theorem 1.
1. The program ΠT ,A0 is a rewriting of A0 under T .
2. If Ω contains no root cycle, then ΠT ,A0 is non-recursive.
Even if Ω contains no root-cycles, the program ΠT ,A0 may have up to (single)
exponentially many IDB predicates. We observe that this cannot be significantly
improved without giving up monadicity.
Theorem 2. There is a family of TBoxes T1 , T2 , . . . such that for all n ≥ 1,
Tn is of size O(n2 ), the concept name A0 is FO-rewritable under Tn , and the
smallest non-recursive monadic datalog rewriting has size at least 2n .
Let us briefly analyze the complexity of the decomposed algorithm. It is easy to
verify that the number of Γ -pairs and Ω-tuples is singly exponential in the size
of T and that all required operations for building Γ and Ω and for determining
the existence of a root cycle require only polynomial time. By Proposition 4,
we have thus found an ExpTime algorithm for deciding FO-rewritability of EL-
concept queries. This is almost optimal as the problem we are dealing with is
PSpace-complete and becomes ExpTime-hard if slightly varied, see [6].

5    Experiments
We have implemented the decomposed algorithm in Java and conducted a num-
ber of experiments. The implementation is not highly optimized, but some as-
pects of handling the set Γ are worth to point out. In particular, we use numbers
 TBox      concept inclusions concept names role names timeouts not FO-rewritable
 XP                     1046            906         27        0                 1
 NBO                    1468            962          8        0                 6
 ENVO                   1848           1538          7        0                 7
 FBbi                     567           517          1        0                 0
 MOHSE                  3665           2203         71        2                 1
 not-galen              4636           2748        159      149                 0
 SO                     2740           1894         13        7                16

                     Table 1. TBoxes used in the experiments.


to represent subconcepts in T , store the S-component of each pair (C, S) ∈ Γ ,
which is a set of subconcepts of T , as an ordered set, and use so-called tries as a
data structure to store Γ . We remove pairs (C, S) ∈ Γ where S is not minimal,
that is, for which there is a (C, S 0 ) ∈ Γ with S 0 ( S. It is easy to see that this
optimization does not compromise correctness.
    The experiments were carried out on a Linux (3.2.0) machine with a 3.5Ghz
quad-core processor and 8GB of RAM. Although a large number of EL-TBoxes
is available on the web and from various repositories, most of them are acyclic
TBoxes in the traditional DL sense, that is, the left-hand sides of all CIs are
concept names, there are no two CIs with the same left-hand side, and there
are no syntactic cycles. Since concept queries are always FO-rewritable under
acyclic EL-TBoxes [5], such TBoxes are not useful for our experiments. We have
identified seven TBoxes that do not fall into this class, listed in Table 1 together
with the number of concept inclusions, concept names, and role names that they
contain. All TBoxes together with information about their origin are available
at http://tinyurl.com/q96q34z.
    For each of these TBoxes, we have applied the decomposed algorithm to every
concept name in the TBox. In some rare cases, the set Γ has reached excessive
size, resulting in non-termination. We have thus established a 15 second timeout
for the Γ -phase of the algorithm. With that timeout, our algorithm was able to
decide FO-rewritability in almost all of the cases, see Table 1. The overall runtime
for all our experiments as a batch job (15970 invocations of the algorithm) took
only 80 minutes. The generated non-recursive datalog-rewritings are typically
of very reasonable size. The number of rules in the rewriting is displayed in the
upper part of Figure 2; for example, for NBO, about 55% of all rewritings consist
of a single rule, about 18% have two or three rules, about 10% have 4–7 rules,
and so on. Note that the x-axis has logarithmic scale. The size of the rule bodies
is typically very small, between one and two atoms in the vast majority of cases,
and we have never encountered a rule with more than ten body atoms. It is also
interesting to consider the number of IDB predicates in a rewriting, as intuitively
these correspond to views that have to be generated by a database system that
executes the query. As shown in the lower part of Figure 2, the number of IDB
predicates is rather small, and considerably lower than the number of rules in
the produced programs (we again use logarithmic scale on the x-axis).
                           80
                                                                                             XP
    % of concept names
                                                                                            NBO
                           60
                                                                                           ENVO
                                                                                            FBbi
                           40
                                                                                          MOHSE
                                                                                          not-galen
                           20
                                                                                             SO

                            0
                                  1

                                       2

                                           4

                                                8
                                                    16

                                                         32

                                                                64

                                                                  8

                                                                  6

                                                                  2
                                                                24

                                                                 48

                                                                 96

                                                                 92
                                                               12

                                                               25

                                                               51
                                                              10

                                                              20

                                                              40

                                                              81
                                                          Number of rules

                          100                                                                XP
 % of concept names


                                                                                            NBO
                                                                                           ENVO
                                                                                            FBbi
                           50                                                             MOHSE
                                                                                          not-galen
                                                                                             SO

                           0
                                  1


                                           2


                                                     4


                                                               8


                                                                        16


                                                                                  32


                                                                                              64
                                                     Number of IDB predicates
                                Fig. 2. Number of rules and IDB predicates in the rewriting

   The experiments also confirm our initial belief that ontologies which are
used in practical applications have a simple structure. As shown in Table 1,
the number of concept names that are not FO-rewritable is extremely small.
Moreover, if a concept name was FO-rewritable, then we were always able to
determine this already in the Γ -phase of our algorithm, without ever entering
the Ω-phase. Note, though, that for those cases that turned out to be not FO-
rewritable, we had to go through the full Ω-construction.


6                        Outlook
We plan to optimize the implementation of the decomposed approach further to
eliminate the timeouts we encountered in the experiments. Moreover, we plan
to extend the algorithm and implementation in several ways. First, we plan to
generalize the approach from concept queries to conjunctive queries. Second, in
many applications the ABox signature (i.e., the concept and role names occurring
in the ABoxes) is a small subset of the signature of the TBox [1, 6]. Queries that
are not FO-rewritable without any restriction on the ABox signature can become
FO-rewritable under smaller ABox signatures. We therefore plan to extend the
decomposed approach to arbitrary ABox signatures. Finally, we plan to extend
our approach to more expressive Horn-DLs such as ELI with role inclusions and
thereby unify DL-Lite and EL query rewriting approaches in one framework.
References
 1. Baader, F., Bienvenu, M., Lutz, C., Wolter, F.: Query and predicate emptiness in
    description logics. In: KR. pp. 192–202 (2010)
 2. Baader, F., Brandt, S., Lutz, C.: Pushing the EL Envelope. In: IJCAI. pp. 364–369
    (2005)
 3. Baget, J.F., Leclère, M., Mugnier, M.L., Salvat, E.: On rules with existential vari-
    ables: Walking the decidability line. Artif. Intell. 175(9-10), 1620–1654 (2011)
 4. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: a
    study through Disjunctive Datalog, CSP, and MMSNP. In: PODS. pp. 213–224
    (2013)
 5. Bienvenu, M., Lutz, C., Wolter, F.: Deciding FO-rewritability in EL. In: Description
    Logics. pp. 70–80 (2012)
 6. Bienvenu, M., Lutz, C., Wolter, F.: First order-rewritability of atomic queries in
    horn description logics. In: IJCAI. pp. 754–760 (2013)
 7. Kaminski, M., Grau, B.C.: Sufficient conditions for first-order and datalog
    rewritability in ELU. In: Description Logics. pp. 271–293 (2013)
 8. König, M., Leclère, M., Mugnier, M.L., Thomazo, M.: A sound and complete back-
    ward chaining algorithm for existential rules. In: RR. pp. 122–138 (2012)
 9. Kontchakov, R., Rodriguez-Muro, M., Zakharyaschev, M.: Ontology-based data
    access with databases: A short course. In: Reasoning Web. pp. 194–229 (2013)
10. Lutz, C., İnanç Seylan, Toman, D., Wolter, F.: The combined approach to OBDA:
    Taming role hierarchies using filters. In: ISWC. pp. 314–330 (2013)
11. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description
    logic el using a relational database system. In: IJCAI. pp. 2070–2075 (2009)
12. Lutz, C., Wolter, F.: Deciding inseparability and conservative extensions in the
    description logic EL. J. Symb. Comput. pp. 194–228 (2010)
13. Mora, J., Corcho, Ó.: Engineering optimisations in query rewriting for OBDA. In:
    I-SEMANTICS. pp. 41–48 (2013)
14. Pérez-Urbina, H., Motik, B., Horrocks, I.: Tractable query answering and rewriting
    under description logic constraints. J. Applied Logic 8(2), 186–209 (2010)
15. Poggi, A., Lembo, D., Calvanese, D., Giacomo, G.D., Lenzerini, M., Rosati, R.:
    Linking data to ontologies. J. Data Semantics 10, 133–173 (2008)
16. Rosati, R.: On conjunctive query answering in EL. In: Description Logics. pp.
    451–458 (2007)
17. Rosati, R., Almatelli, A.: Improving query answering over DL-Lite ontologies. In:
    KR. pp. 290–300 (2010)