<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Small Datalog Query Rewritings for E L</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Oxford</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Giorgio Stefanoni</institution>
          ,
          <addr-line>Boris Motik, and Ian Horrocks</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Description Logics are a key technology in data management scenarios such as Ontology-Based Data Access (OBDA), a paradigm in which a DL ontology is used to provide a conceptual view of the data [1]. An OBDA system transforms a conjunctive query over the ontology into a query over the data sources [2]. This transformation is independent of the data, so the OBDA approach can thus be used in settings where the data sources provide read-only access to the data, and where the data changes frequently. Most existing OBDA systems are based on the DL-Lite family of lightweight Description Logics [1], which is also the basis for the QL pro le of the OWL 2 ontology language. Logics in this family have been designed to allow a conjunctive query posed over the ontology to be rewritten as a rst order query over the data sources|that is, queries are rst-order rewritable. The query rewriting procedure is independent of the data, and the resulting queries can be evaluated using highly scalable relational database technology. To achieve this, however, the expressive power of DL-Lite is very restrictive. This prevents the OBDA approach from being applied in the life science domain, where many ontologies use DLs from the E L family [3, 4]. This family provides the basis for the EL pro le of OWL2, and many prominent ontologies, such as SNOMED-CT, were developed using this language. The problem of answering conjunctive queries in E L has already been studied in the literature, and two orthogonal approaches have been proposed. First, Rosati proposed a pure query rewriting technique which transforms an E L TBox T and a conjunctive query q into a Datalog program PT ;q [5]. Second, Lutz et al. introduced a \combined" approach [6, 7]. This technique rst materializes certain facts entailed by the ontology in a precomputation step. Then, each user query is rewritten into a polynomial rst-order query that, when evaluated over the materialized facts, computes the answers to the user's query. Unfortunately, these two approaches exhibit several shortcomings when applied in the context of OBDA. In particular, Rosati's rewriting technique computes for each user query a fresh Datalog program whose size depends on both the query and the terminology, which could be very ine cient when dealing with large scale ontologies. The approach by Lutz et al. produces smaller rst order rewritings, but the use of materialization means that the technique is only applicable when the data sources provide read/write access to the data; furthermore, materialization can be ine cient if the data changes frequently.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>? This work was supported by EPSRC and Alcatel-Lucent.</p>
      <p>In this paper, we present a pure query rewriting technique to query answering
in E L. Our approach reinterprets the combined approach proposed by Lutz and
colleagues in terms of Datalog. Our rewriting procedure consists of two distinct
steps. The rst step rewrites a TBox T into a Datalog program PT , whose size
depends linearly on the size of T . Then, at query time, the conjunctive query q
is rewritten into a Datalog query hQP ; QC i, whose size depends polynomially
on q. The two rewriting steps are such that, given an ABox A, deciding whether
QP (a1; : : : ; ak) follows from PT [ QC [ A is equivalent to deciding whether
ha1; : : : ; aki is a certain answer to q over a knowledge base hT ; Ai.</p>
      <p>
        At last, we summarize our main contributions. First, our rewriting approach,
unlike Rosati's, separates the rewriting of the TBox and the query into two
distinct steps, thus reducing ine ciency when dealing with large ontologies. Second,
our technique does not require the materialization of entailed facts, hence our
solution is in the spirit of OBDA and it avoids the problems associated with
the materialization of large models. Finally, we set the stage for assessing the
utility and the applicability to PT [ QC [ A of optimized Datalog evaluation
techniques, such as magic sets and SLG resolution [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. Indeed, heuristic-based
evaluation strategies signi cantly reduce the number of facts needed to answer
a query, thus potentially improving the performance of our rewriting approach.
In this paper we provide only proof sketches, and refer the reader to [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for full
proofs.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <sec id="sec-2-1">
        <title>Description Logic EL</title>
        <p>Let NC , NR, NI be pairwise disjoint in nite sets of atomic concepts, atomic roles,
and individuals. Together, the sets NC , NR, and NI form the signature of an E L
language. Whenever the distinction between atomic concepts and atomic roles
is immaterial, we call an element of NC [ NR a predicate. The set of E L concept
expressions is inductively de ned starting from atomic concepts A 2 NC and
atomic roles R 2 NR according to the syntax rule: C ! A j C1 u C2 j 9R:C j &gt;.</p>
        <p>An E L TBox T is a nite set of concept inclusions of the form C v D; an
E L ABox A is a nite set of assertions of the form A(a) or R(a; b) with a and
b individuals; and an E L knowledge base (KB) is a tuple K = hT ; Ai, where
T is an E L TBox and A is an E L ABox. We denote with Ind(A) the set of all
individuals occurring in the ABox A. Furthermore, for E either a TBox or an
ABox, Pred(E ) is the set of all predicates occurring in E .</p>
        <p>
          Semantics is given as usual in terms of rst-order interpretations I = h I ; I i,
where I is a nonempty domain and I is an interpretation function; please
refer to [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] for details. In the following, we will extensively use the notion of an
unraveling of an interpretation w.r.t. an ABox. Consider an interpretation I and
an ABox A over an arbitrary E L signature. A path p in I w.r.t. A is a nonempty
nite sequence c1 R2 c2 Rn 1 cn 1 Rn cn such that c1 2 faI j a 2 Ind(A)g
and for all 1 i n 1 we have that hci; ci+1i 2 RiI+1 for Ri+1 2 NR. We say
that a path p has depth n and we write dep(p) = n; furthermore, tail(p) is the
last domain element cn in p. Let pathsA(I) denote the set of all paths w.r.t. A
occurring in I. The unraveling J of I w.r.t. A is the following interpretation.
        </p>
        <p>
          J = pathsA(I)
aJ = aI
AJ = fp 2 pathsA(I) j tail(p) 2 AI g
RJ = fhaJ ; bJ i j R(a; b) 2 Ag [ fhp; p R ci j fp; p R cg
pathsA(I)g
In this paper, we deal only with normalized E L TBoxes. Let A1, A, and B be
arbitrary concepts from NC [ f&gt;g. We say that an E L TBox T is in normal
form if each axiom in T is in one of the following forms: A v B; A u A1 v B,
A v 9R:B, or 9R:A v B. Given an arbitrary E L TBox T , we can compute a
normalized TBox Tnorm of T in linear time [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Querying EL KBs</title>
        <p>
          Let NV be an in nite set of variables disjoint from NI . Together, NV and NI form
the set NT of terms. A rst-order query q is a rst-order formula constructed
from the terms in NT and the predicates from NC [ NR [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. In general, we
write q = (~x) to express that q is the FO formula whose answer variables are
~x = fx1; : : : ; xkg. A query with k answer variables is a k-ary query. A conjunctive
query (CQ) is a FO query of the form q = 9~y: (~x; ~y), where is a conjunction
of unary atoms A(s) and binary atoms R(s; t) with s and t terms. The variables
~y are the quanti ed variables of q. In the following, avar(q) is the set of answer
variables of q, and qvar(q) is the set of quanti ed variables. Finally, NV (q) is the
set of all variables occurring in q, and NT (q) is the set of all terms occurring
in q. Let q = (~x) be a k-ary FO query with ~x = hx1; : : : ; xki and let I be an
interpretation. We say that a k-ary tuple of individuals ha1; : : : ; aki is an answer
to q in I, written I j= q[a1; : : : ; ak], if I satis es q under the mapping which
sets (xi) = ai for all 1 i k. We call a match for q in I witnessing
ha1; : : : ; aki, written I j= q. We say that ha1; : : : ; aki is a certain answer to
q over K if I j= q[a1; : : : ; ak], for all models I of K. We denote the set of all
certain answers to q over K with cert(q; K). Rosati in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] showed that deciding
whether a tuple of individuals is a certain answer to q over K is Ptime-complete
w.r.t. data-complexity (i.e., w.r.t. the size of the ABox); Ptime-complete w.r.t.
KB complexity (i.e., w.r.t. the size of K); and, NP-complete w.r.t. combined
complexity (i.e., w.r.t. the size of both K and q).
        </p>
        <p>
          Datalog
Let NB be a nonempty set of built-in predicates [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Then, a Datalog rule r
is an expression of the form
        </p>
        <p>S(~u)</p>
        <p>S1(~u1); : : : ; Sn(~un); Bn+1(~un+1) : : : ; Bm(~um);
where n; m 0, fS; S1; : : : ; Sng NC [ NR, fBn+1; : : : ; Bmg NB, and ~u and
~ui are tuples of terms of suitable length. A rule is safe if each variable occurring
in ~u [ ~un+1 [ : : : [ ~um also occurs in ~u1 [ : : : [ ~un. Atom S(~u) is the head of the
rule, and atoms S1(~u1); : : : ; Bm(~um) constitute the body of the rule. Whenever
the body of a rule r is empty, we call r a fact, and we write the rule as S(~u). A
Datalog program P is a set of safe Datalog rules. Finally, sch(P ) is the set
of predicates occurring in P .</p>
        <p>
          Next, we de ne the semantics of a Datalog program P using Herbrand
interpretations [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The Herbrand Universe of P is the set of all individuals
occurring in P . The Herbrand Base of P is the set of all facts that can be
constructed from the predicates in NC [ NR and the individuals in the universe
of P . A Herbrand interpretation I of P is a subset of the Herbrand Base of P .
Note that I does not interpret built-in predicates. As usual, we assume that these
predicates are evaluated over a xed, possibly in nite Herbrand interpretation
B [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Then I is a model of P w.r.t. B if, for all the rules r in P , we have that
        </p>
        <p>
          I [ B j= 8~x(Bm(~un) ^ : : : ^ Bn+1(~un+1) ^ Sn(~un) ^ : : : ^ S1(~u1) ! S(~u));
where ~x is a tuple consisting of all variables occurring in the rule. The semantics
of a Datalog program P is de ned as the minimal Herbrand interpretation I
satisfying P w.r.t. B, written MB(P ). Whenever the program does not contain
built-in predicates, we do not consider the interpretation B and we simply write
M(P ). The semantics of Datalog programs can be de ned also by means of a
xpoint construction. Then, TP is the immediate consequence operator that maps
instances I over sch(P ) to instances over sch(P ) as follows. For each rule r in P ,
if there exists a match for S1(~u1) ^ : : : ^ Sn(~un) ^ Bn+1(~un+1) ^ : : : ^ Bm(~um)
in I [ B, then S(a1; : : : ; ak) is contained in TP (I) with ai = (ui) for each ui 2 ~u.
One can prove that TP has a minimum xpoint TP! such that TP! = MB(P ) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>Finally, a Datalog query is a tuple hQP ; QC i where QP is a predicate symbol
and QC is a Datalog program. A tuple of individuals ha1; : : : ; aki is an answer
to hQP ; QC i over Datalog program P if P [ QC j= QP (a1; : : : ; ak).
3</p>
        <p>Datalog Rewriting for E L TBoxes
In this section, we show how to transform an E L TBox T into a Datalog
program PT whose size depends linearly on T . The transformation is such that,
for an arbitrary E L ABox A, we can use the unraveling of M(PT [A) to compute
the answers to conjunctive queries over hT ; Ai. Let T be a TBox over an arbitrary
E L signature. Intuitively, for each axiom occurring in T , the program PT
contains a set of Datalog rules which encode the constraint imposed by . To
achieve this, we have to overcome two issues.</p>
        <p>
          First, E L concept inclusions of the form A v 9R:B require the use of either
existential quanti cations or Skolem terms in rule heads; however, Datalog
does not allow neither of the two. In order to solve this issue, we use a technique
that has been introduced for representing canonical models of E L knowledge
bases [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. That is, for each atomic concept B occurring in T we introduce a fresh
A v B
A1 u A2 v B
9R:A v B
A v 9R:B
        </p>
        <p>B(X)
B(X)</p>
        <p>Set of rules ( )
R(X; oB)</p>
        <p>B(X)</p>
        <p>A(X)
B(oB)
A1(X); A2(X)
R(X; Y ); A(Y )</p>
        <p>A(X)</p>
        <p>A(X)
auxiliary individual oB, which represents the class of existentially quanti ed
individuals of type B. Then, for each axiom of the above form, the program PT
contains the following two rules:</p>
        <p>R(X; oB)</p>
        <p>A(X);</p>
        <p>B(oB)</p>
        <p>A(X):</p>
        <p>Second, E L allows for &gt; to occur in concept expressions. Hence, we need to
de ne in PT a unary predicate &gt;, whose extension|given an ABox A|coincides
with the Herbrand universe of PT [ A. To achieve this, we restrict our study
to a subset of all E L ABoxes. In particular, we consider only those ABoxes A
such that Pred(A) Pred(T ). That is, each predicate occurring in the ABox A
must occur also in the TBox T . Then, in our Datalog program, for each atomic
concept A and for each atomic role R occurring in T , we add the following rules:
&gt;(X)</p>
        <p>A(X);
&gt;(X)</p>
        <p>R(X; Y );
&gt;(Y )</p>
        <p>
          R(X; Y ):
This is only one of the several ways in which we can encode such a predicate.
In fact, another possibility would be|as suggested by Rosati in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]|to assume
that each ABox A contains an assertion &gt;(a) for each individual a 2 Ind(A). We
believe that in the context of OBDA|where the focus is to provide access to
arbitrary data-sources|it is important to make as few assumptions as possible
on the physical realization of the ABox. For this reason, we prefer the option
presented above.
        </p>
        <p>Next, we formalize the transformation of a TBox T into a Datalog program
PT . Let Aux = foA j A 2 NC g [ fo&gt;g be a set of auxiliary individuals distinct
from NI . Then, the program PT is constructed from terms in NT [ Aux and
predicates in NC [ NR [ f&gt;g as follows. The transformation uses the function
, shown in Figure 1, to transform each axiom in the (normalized) TBox T into
a set of Datalog rules. The Datalog program PT is then de ned as follows.</p>
        <p>PT = S ( )</p>
        <p>S 2T</p>
        <p>A2Pred(T )\NC &gt;(X)
SR2Pred(T )\NR &gt;(X)</p>
        <p>A(X)
R(X; Y ); &gt;(Y )</p>
        <p>R(X; Y )
The following result readily follows from the de nition of the program.
Proposition 1. For an arbitrary E L TBox T , Datalog program PT can be
computed in time linear in the size of T .</p>
        <p>Consider an arbitrary EL ABox A. Next, we prove that the unraveling U
w.r.t. A of M(PT [A) can be used to answer conjunctive queries over K = hT ; Ai.
We do so in two distinct steps. First, we introduce the notion of chase of an EL
knowledge base K. Second, we show that the chase of K is isomorphic to U .</p>
        <p>The chase of an EL knowledge base K = hT ; Ai, written chase(K), is a
possibly in nite Herbrand interpretation de ned inductively by starting from A
and then applying axioms occurring in the TBox to assertions occurring in the
ABox. In our de nition of the chase, we use function terms to denote existentially
quanti ed individuals. Hence, the de nition of ABox assertion is extended in a
natural way to accommodate for assertions over function terms. We denote with
u and w terms that can be either individuals or function terms. Next, we de ne
an operator T that chases an ABox by applying the axioms occurring in the
TBox T . In the de nition, we use assertions of the form &gt;(u) to assert that u is
a member of the EL concept expression &gt;. For S an arbitrary ABox, T (S) is
the smallest ABox containing S and closed under the following chasing rules.
(cr1) If fA(u)g S and A v B 2 T , then fB(u)g T (S).
(cr2) If fA1(u); A2(u)g S and A1 u A2 v B 2 T , then fB(u)g
(cr3) If fR(u; w); A(w)g S and 9R:A v B 2 T , then fB(u)g
(cr4) If fA(u)g S and A v 9R:B 2 T , then</p>
        <p>T (S).</p>
        <p>T (S).
fR(u; f (u; R; B)); B(f (u; R; B))g</p>
        <p>T (S):
(cr5) If u occurs in S, then f&gt;(u)g
T (S).</p>
        <p>
          We now de ne an in nite sequence of nite ABoxes Ai for i 2 N.
It is clear that our construction of the chase of K is fair. In fact, for each i 2 N we
have that Ai+1 is the result of exhaustively applying|to all possible assertions
occurring in Ai|all applicable axioms in T . At last, we want to point out that
chase(K) can be used to compute the certain answers to a CQ q over K.
Proposition 2 ([
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]). Let K be an EL knowledge base. Further, let q be a k-ary
conjunctive query. Then, for each k-ary tuple of individuals ha1; : : : ; aki, we have
ha1; : : : ; aki 2 cert(q; K) if and only if chase(K) j= q[a1; : : : ; ak]:
        </p>
        <p>So, by proving that the unraveling U of M(PT [A) is isomorphic to chase(K),
we establish that U can be used to answer conjunctive queries over K. To prove
the structural equivalence of U and chase(K), we de ne a function h mapping
Finally, the chase of K is the in nite union of all such ABoxes Ai.</p>
        <p>A0 = A
Ai+1 =</p>
        <p>T (Ai)
chase(K) = [ Ai
i2N
paths occurring in U to terms in chase(K). We de ne h by induction on the
depth of paths occurring in U as follows.</p>
        <p>Base Case. Consider an arbitrary p 2 U with dep(p) = 1. We set h(p) := p.</p>
        <p>Inductive Step. Let p = t1 R2 t2 tn 1 Rn tn be a path occurring
in U such that h(p) has not been de ned yet, but h(t1 Rn 1 tn 1) = u. We
distinguish between two cases depending on the type of the individual tn.
1. If tn occurs in the ABox, we set h(p) := tn.
2. If tn is of the form oB, we set h(p) := f (u; Rn; B).</p>
        <p>Theorem 1 shows that h is an isomorphism between the two structures.
Intuitively, for the only-if direction, we show that h is an injective homomorphism
from U to chase(K) by induction on the depth of paths occurring in U ; for the
if-direction, by induction on the construction of chase(K) we prove that h is a
surjective function and that it is a homomorphism from chase(K) to U .
Theorem 1. Function h is an isomorphism from U to chase(K).</p>
        <p>Since the unraveling of M(PT [ A) is generally in nite, this result alone does
not provide us with an algorithm for answering queries in E L. In the next section,
we show how to rewrite a user query q into a Datalog query hQP ; QC i such
that PT [ A [ QC j= QP (a1; : : : ; ak) if and only if ha1; : : : ; aki 2 cert(q; K) and
thus solve the problem.
4</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Polynomial Query Rewriting in Datalog</title>
      <p>In the previous section, we have seen that for an arbitrary E L KB K = hT ; Ai
evaluating a conjunctive query q over the unraveling of the Herbrand model
of PT [ A is equivalent to computing the certain answers to q over K. In this
section, we develop a query rewriting procedure that reduces the computation of
cert(q; K) to the problem of evaluating a suitably constructed Datalog query
over PT [ A. We achieve this in two steps. First, we present an interesting
property of a certain class of interpretations. Second, we show how this result
can be used to develop a query rewriting procedure in our Datalog setting.</p>
      <p>
        We use the notions of A-connected and split interpretations from [
        <xref ref-type="bibr" rid="ref13 ref6">6, 13</xref>
        ]. Let
I be an interpretation and let A be an ABox over an arbitrary E L signature.
We say that I is A-connected if, for each domain element c 2 I , there exists a
path p 2 pathsA(I) such that tail(p) = c. Furthermore, I is a split interpretation
if, for all domain elements c; c0 2 I , we have that c 62 faI j a 2 Ind(A)g
and hc; c0i 2 RI imply c0 62 faI j a 2 Ind(A)g. Intuitively, in an A-connected
interpretation I, for each domain element cn it is always possible to nd a path
aI R2 c2 Rn cn such that a 2 Ind(A). Furthermore, if I is split, then each
domain element that is not the image of an individual can be related by a role
only with elements that themselves do not interpret individuals.
      </p>
      <p>
        Then, let J be the unraveling w.r.t. A of a split and A-connected
interpretation I and let q be a conjunctive query. Lutz et al. in [
        <xref ref-type="bibr" rid="ref13 ref6">6, 13</xref>
        ] showed that it is
possible to reduce the problem of answering q in J to evaluating a rst-order
query rewriting q of q over I. Roughly speaking, the query rewriting q rules
out some spurious answers for q in I that cannot be reproduced in J . More
speci cally, we have to ensure that the answer variables of q, the variables of
q mapped to cyclic portions of I, and the variables of q mapped to nontree
portions of I are all matched only to the domain elements in faI j a 2 Ind(A)g.
      </p>
      <p>
        We now brie y outline how we can construct such an FO rewriting q for
q [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Let q be the smallest equivalence relation over NT (q) that is closed under
the following rule: if R(s; t) and R(s0; t0) occur in q and t q t0, then s q s0.
Then, for each equivalence class of q, we let t 2 be an arbitrary, but xed,
representative of the class. Also, for each such equivalence class and for each
atomic role R occurring in q, we let Pred( ; R) be the following set.
      </p>
      <p>Pred( ; R) = ft 2 NT (q) j R(t; t0) occurs in q with t0 2 g
Next, we de ne three sets of terms that correspond to the above mentioned cases.
{ Fork= is the set of all pairs hPred( ; R); t i such that is an equivalence class
of q and jPred( ; R)j &gt; 1.
{ Fork6= is the set of all quanti ed variables v 2 qvar(q) for which atoms R(s; v)
and S(s0; t) exist in q such that R 6= S and v q t.
{ Cyc is the set of all variables v 2 qvar(q) for which atoms</p>
      <p>R0(t0; t00); : : : ; Rm(tm; t0m); : : : ; Rn(tn; t0n)
exist in q such that m; n 0; for some i n we have that ti q v; for each
j &lt; n we have that t0j q tj+1; and t0n q tm.</p>
      <p>
        We are now ready to formally specify the FO query rewriting q . In the de nition,
we assume that Aux is a fresh predicate not occurring in q and K and that every
interpretation I interprets Aux as I n faI j a 2 Ind(A)g. Then, formulae q1 and
q2 are de ned as follows.
Finally, we set q = q^q1^q2. It turns out that q can be computed in polynomial
time w.r.t. q [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In the same paper, Lutz et al. prove the following result.
Proposition 3. Let A be an arbitrary E L ABox, let I be a split and A-connected
interpretation, and let J be the unraveling of I w.r.t. A. Then, for every k-tuple
of individuals ha1; : : : ; aki, we have that
      </p>
      <p>I j= q [a1 : : : ; ak] if and only if J j= q[a1 : : : ; ak]:</p>
      <p>This result applies to our Datalog rewriting of E L TBoxes. Indeed, for an
arbitrary E L KB K = hT ; Ai, we have that M(PT [A) is a split and A-connected
interpretation. The intuition behind the argument is as follows. We show that
M(PT [ A) is split by noticing that rules encoded in PT do not allow for the
derivation of facts of the form R(oB; a) for a 2 Ind(A) and oB 2 Aux. To see
that M(PT [ A) is A-connected, we just recall that M(PT [ A) is minimal and,
hence, all the derived facts must be \grounded" w.r.t. the facts in A.
Theorem 2. Let K = hT ; Ai be an E L knowledge base. Then, M(PT [ A) is a
split and A-connected interpretation.</p>
      <p>Hence, for an arbitrary k-ary CQ q and for each k-tuple of individuals ha1; : : : ; aki,
we have that M(PT [ A) j= q [a1 : : : ; ak] if and only if ha1; : : : ; aki 2 cert(q; K).</p>
      <p>
        Note that q is a rst-order query, and we are unaware of systems
capable of evaluating rst-order queries over Datalog programs. Therefore, we
next show how to transform q into a Datalog query hQP ; QC i such that
ha1; : : : ; aki 2 cert(q; K) if and only if PT [ A [ QC j= QP (a1 : : : ; ak). We
construct such a Datalog query hQP ; QC i by applying to q a simpli ed version
of the Lloyd-Topor transformation [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ].
      </p>
      <p>De nition 1 (Datalog Rewriting). Let q(~x) be a k-ary CQ whose
quantied variables are among ~y; let Cyc, Fork6=, and Fork= be as speci ed above; let
hPred( 1; R1); t1i, : : :, hPred( n; Rn); tni be an arbitrary enumeration of Fork=;
let p0; p1; : : : ; pn be fresh predicates; and let Named be a built-in with a
predetermined, possibly in nite Herbrand interpretation N = fNamed(a) j a 2 NI g.
Query QC then contains the following safe Datalog rules:
p0(~x; ~y)
q;</p>
      <p>^
v2avar(q)[Fork6=[Cyc</p>
      <p>Named(v)
pi(~x; ~y)</p>
      <p>pi 1(~x; ~y); Named(ti )
pi(~x; ~y)</p>
      <p>pi 1(~x; ~y);
QP (~x)
pn(~x; ~y)</p>
      <p>^
t;t02Pred( i;Ri)
t = t0
for 1
for 1
i
i
n
n
(1)
(2)
(3)
(4)
One may think that the recursive de nition of predicates pi for 1 i n could
be simpli ed by writing QP (~x) p0(~x; ~y) : : : pn(~x; ~y) and by de ning each pi as:
pi(~x; ~y)</p>
      <p>Named(ti )
pi(~x; ~y)</p>
      <p>Vt;t02Pred( i;Ri) t = t0
Unfortunately, these rules are not safe. Safe rules, on the one hand, provide
us with a clear and unambiguous semantics. On the other hand, unsafe rules
are also computationally more expensive for bottom-up computation, since each
variable in the head may be bound to an arbitrary individual in the universe of
the program. For this reason, we prefer our, slightly more involved, solution.
Proposition 4. For an arbitrary k-ary conjunctive query q, query hQP ; QC i
can be computed in polynomial time w.r.t. the size of q.</p>
      <p>
        Proof. We note that q can be computed in polynomial time w.r.t. the size of
q [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and, therefore, also the sets Cyc, Fork6=, and Fork= can be computed in
polynomial time w.r.t. q. Furthermore, the size of the body of rule p0(~x; ~y) depends
linearly on the size of q, Cyc, and Fork6=. Also, for each pair hPred( ; R); t i in
Fork=, the program QC contains exactly two rules. The size of these two rules
depends linearly on the size of hPred( ; R); t i. Thus, we conclude that hQP ; QC i
can be computed in polynomial time with respect to the size of q.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], we prove that our rewriting is correct|that is, that answering hQP ; QC i
over PT [ A is equivalent to computing cert(q; T ; A). This follows from
Proposition 3 and the fact that our Datalog query is the result of transforming the
FO rewriting q along the lines of the Lloyd-Topor transformation.
Theorem 3. Let K be an EL knowledge base and let q be a k-ary CQ over K.
Then, for every k-tuple of individuals ha1; : : : ; aki, we have that
      </p>
      <p>ha1; : : : ; aki 2 cert(q; K) if and only if PT [ A [ QC j= QP (a1; : : : ; ak):
Finally, we investigate the complexity of our rewriting procedure.
Theorem 4. Let K = hT ; Ai be an EL KB, let q be a k-ary CQ, and let
ha1; : : : ; aki be a tuple of individuals. We can decide PT [A[QC j= QP (a1; : : : ; ak)
in polynomial time w.r.t. the size of K and in non-deterministic polynomial time
with respect to the size of both K and q.</p>
      <p>
        Proof. We have already argued that the size of Datalog program PT depends
linearly on the size of the TBox T and that the Datalog rewriting hQP ; QC i
can be computed in Ptime w.r.t. q. Also, we note that the arity of predicates
and the number of variables occurring in PT [ A [ QC do not depend on K.
Finally, from an implementation point-of-view (as suggested in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]), the built-in
predicate Named can be considered as an assertion in the ABox A with a di erent
physical realization: it is not directly stored in the ABox but it is implemented
as a procedure which is evaluated during the execution of the program. Clearly,
such a procedure can be implemented to run in time polynomial in K. It follows
that we can compute the minimal Herbrand model of PT [ A [ QC in time
polynomial in the size of K [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The membership in NP follows directly from
the considerations above and from the fact that we can guess and check in
nondeterministic polynomial time a match for QP in M(PT [ A [ QC ). tu
5
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>In this paper, we introduce a query rewriting approach to query answering in EL.
In our approach, the process of computing the certain answers to a CQ q over an
EL KB K = hT ; Ai is divided into two distinct steps. A rst preprocessing step
in which the TBox T is transformed into a Datalog program PT , whose size
is linear in T . Then, at query time, the query q is independently rewritten into
a Datalog query hQP ; QC i, whose size is polynomial in q. Finally, computing
cert(q; K) amounts to evaluating the Datalog query hQP ; QC i over PT [ A.
dr. Lutz et al.</p>
      <p>
        In future, we plan to extend our approach to deal with ELH?
have already proposed a combined approach for query answering in this DL [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
However, di erently from their solution, we would like the Datalog rewriting
hQP ; QC i to be independent from the role inclusions contained in the TBox.
Additionally, we plan to extend our work to cover nominals, which raises the
question on how to e ciently handle equality in Datalog [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poggi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodr</surname>
            <given-names>guezMuro</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Rosati</surname>
          </string-name>
          , R.:
          <article-title>Ontologies and databases: the DL-Lite approach</article-title>
          . In Tessaris, S.,
          <string-name>
            <surname>Franconi</surname>
          </string-name>
          , E., eds.:
          <source>Semantic Technologies for Informations Systems { 5th Int. Reasoning Web Summer School (RW</source>
          <year>2009</year>
          ). Volume
          <volume>5689</volume>
          . (
          <year>2009</year>
          )
          <volume>255</volume>
          {
          <fpage>356</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Perez-Urbina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Tractable query answering and rewriting under description logic constraints</article-title>
          .
          <source>J. Applied Logic</source>
          <volume>8</volume>
          (
          <issue>2</issue>
          ) (
          <year>2010</year>
          )
          <volume>186</volume>
          {
          <fpage>209</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brandt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Pushing the EL envelope</article-title>
          . In Kaelbling,
          <string-name>
            <given-names>L.P.</given-names>
            ,
            <surname>Sa</surname>
          </string-name>
          <string-name>
            <surname>otti</surname>
          </string-name>
          , A., eds.: IJCAI, Professional Book Center (
          <year>2005</year>
          )
          <volume>364</volume>
          {
          <fpage>369</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brandt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Pushing the EL envelope further</article-title>
          . In Clark,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Patel-Schneider</surname>
          </string-name>
          , P.F., eds.:
          <source>In Proceedings of the OWLED 2008 DC Workshop on OWL: Experiences and Directions</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Rosati</surname>
          </string-name>
          , R.:
          <article-title>On conjunctive query answering in EL</article-title>
          . In Calvanese, D.,
          <string-name>
            <surname>Franconi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haarslev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turhan</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tessaris</surname>
          </string-name>
          , S., eds.:
          <source>Description Logics. Volume 250 of CEUR Workshop Proceedings., CEUR-WS.org</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Conjunctive query answering in EL using a database system</article-title>
          .
          <source>In: Proceedings of the OWLED 2008 Workshop on OWL: Experiences and Directions</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kontchakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The combined approach to ontology-based data access</article-title>
          .
          <source>In: IJCAI</source>
          , AAAI Press (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Abiteboul</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hull</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vianu</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Foundations of databases</article-title>
          .
          <source>Addison-Wesley</source>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warren</surname>
            ,
            <given-names>D.S.:</given-names>
          </string-name>
          <article-title>Query evaluation under the well-founded semantics</article-title>
          . In:
          <article-title>Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems</article-title>
          .
          <source>PODS '93</source>
          , New York, NY, USA, ACM (
          <year>1993</year>
          )
          <volume>168</volume>
          {
          <fpage>179</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Stefanoni</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Small datalog query rewriting for EL</article-title>
          .
          <source>Technical report</source>
          , University of Oxford (
          <year>2012</year>
          ) Available at http://www.cs.ox.ac.uk/people/giorgio.stefanoni/pubs/2012/tr/dl2012.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <article-title>Principles of database and knowledge-base systems</article-title>
          , Vol. I. Computer Science Press, Inc., New York, NY, USA (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Ceri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanca</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>What you always wanted to know about datalog (and never dared to ask)</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>1</volume>
          (
          <issue>1</issue>
          ) (
          <year>1989</year>
          )
          <volume>146</volume>
          {
          <fpage>166</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Conjunctive query answering in the description logic EL using a relational database system</article-title>
          . In Boutilier, C., ed.
          <source>: IJCAI</source>
          . (
          <year>2009</year>
          )
          <year>2070</year>
          {
          <fpage>2075</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lloyd</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Topor</surname>
          </string-name>
          , R.:
          <article-title>Making prolog more expressive</article-title>
          .
          <source>The Journal of Logic Programming</source>
          <volume>1</volume>
          (
          <issue>3</issue>
          ) (
          <year>1984</year>
          )
          <volume>225</volume>
          {
          <fpage>240</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Semantic web methods for knowledge managmement</article-title>
          .
          <source>PhD thesis</source>
          , University of Karlsruhe (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>