=Paper=
{{Paper
|id=None
|storemode=property
|title=Introducing Nominals to the Combined Query Answering Approaches for EL
|pdfUrl=https://ceur-ws.org/Vol-1014/paper_10.pdf
|volume=Vol-1014
|dblpUrl=https://dblp.org/rec/conf/dlog/StefanoniMH13
}}
==Introducing Nominals to the Combined Query Answering Approaches for EL==
<pdf width="1500px">https://ceur-ws.org/Vol-1014/paper_10.pdf</pdf>
<pre>
        Introducing Nominals to the Combined Query
               Answering Approaches for EL

                   Giorgio Stefanoni, Boris Motik, and Ian Horrocks

                   Department of Computer Science, University of Oxford


1   Introduction
Description logics (DLs) [1] are a family of knowledge representation formalisms that
underpin OWL 2 [2]—an ontology language used in advanced information systems
with many practical applications. Answering conjunctive queries (CQs) over ontology-
enriched data sets is a core reasoning service in such systems, so the computational
aspects of this problem have received a lot of interest lately. For expressive DLs, the
problem is at least doubly exponential in query size [3]. The problem, however, becomes
easier for the EL [4] and the DL-Lite [5] families of DLs, which provide the foundation
for the OWL 2 EL and the OWL 2 QL profiles of OWL 2. An important goal of this
research was to devise not only worst-case optimal, but also practical algorithms. The
known approaches can be broadly classified as follows.
     The first group consists of automata-based approaches for DLs such as OWL 2
EL [6] and Horn-SHOIQ and Horn-SROIQ [7]. While worst-case optimal, these
approaches are typically not suitable for practice since their best-case and worst-case
performance often coincide.
     The second group consists of rewriting-based approaches. Roughly speaking, these
approaches rewrite the ontology and/or the query into another formalism, typically a
union of conjunctive queries or a datalog program; the relevant answers can then be
obtained by evaluating the rewriting over the data. Rewriting-based approaches were
developed for members of the DL-Lite family [5, 8], and the DLs ELHIO⊥ [9] and
Horn-SHIQ [10], to name just a few. A common problem, however, is that rewrit-
ings can be exponential in the ontology and/or query size. Although this is often not a
problem in practice, such approaches are not worst-case optimal. An exception is the
algorithm by Rosati [11] that rewrites an ELH⊥ ontology into a datalog program of
polynomial size; however, the algorithm also uses a nondeterministic step to transform
the CQ into a tree-shaped one, and it is not clear how to implement this step in a goal-
directed manner.
     The third group consists of combined approaches, which use a three-step process:
first, they augment the data with certain consequences of the ontology; second, they
evaluate the CQ over the augmented data; and third, they filter the result of the sec-
ond phase to eliminate unsound answers. The third step is necessary because, to ensure
termination, the first step is unsound and may introduce facts that do not follow from
the ontology; however, this is done in a way that makes the third step feasible. Such
approaches have been developed for logics in the DL-Lite [12] and the EL [13] fami-
lies, and they are appealing because they are worst-case optimal and practical: only the
second step is intractable (in query size), but it can be solved using database techniques.
    None of the combined approaches proposed thus far, however, handles nominals—
concepts containing precisely one individual. Nominals are included in OWL 2 EL, and
they are often used to state that all instances of a class have a certain property value,
such as ‘the sex of all men is male’, or ‘each German city is located in Germany’. In this
                                                    r
paper we present a combined approach for ELHO⊥        —the DL that covers all features of
OWL 2 EL apart from transitive roles and complex role inclusions. To the best of our
knowledge, this is the first combined approach that handles nominals. Our extension is
nontrivial because nominals require equality reasoning, which increases the complexity
of the first and the third step of the algorithm. In particular, nominals may introduce
recursive dependencies in the filtering conditions used in the third phase; this is in
contrast to the known combined approach for EL [13] in which filtering conditions are
not recursive and thusly can be incorporated into the input query. To solve this problem,
our algorithm evaluates the original CQ and then uses a polynomial function to check
the relevant conditions for each answer.
    Following Krötzsch, Rudolph, and Hitzler [14], instead of directly materialising the
relevant consequences of the ontology and the data, we transform the ontology into
a datalog program that captures the relevant consequences. Although seemingly just
a stylistic issue, a datalog-based specification may be beneficial in practice: one can
either materialise all consequences of the program bottom-up in advance, or one can
use a top-down technique to compute only the consequences relevant for the query at
hand. The latter can be particularly useful in informations systems that have read-only
access to the data, or where data changes frequently.
    We have implemented a prototypical system using our algorithm, and we carried
out a preliminary empirical evaluation of (i) the blowup in the number of facts intro-
duced by the datalog program, and (ii) the number of unsound answers obtained in the
second phase. Our experiments show both of these numbers to be manageable in typical
cases, suggesting that our algorithm provides a practical basis for answering CQs in an
expressive fragment of OWL 2 EL.
    The proofs of our technical results are provided in the technical report [15].


2   Preliminaries

Logic Programming. We use the standard notions of variables, constants, function
symbols, terms, atoms, formulas, and sentences [16]. We often identify a conjunction
with the set of its conjuncts. A substitution σ is a partial mapping of variables to terms;
dom(σ) and rng(σ) are the domain and the range of σ, respectively; σ|S is the restric-
tion of σ to a set of variables S; and, for α a term or a formula, σ(α) is the result of
simultaneously replacing each free variable x occurring in α with σ(x). A Horn clause
C is an expression of the form B1 ∧ . . . ∧ Bm → H, where H and each Bi are atoms.
Such C is a fact if m = 0, and it is commonly written as H. Furthermore, C is safe
if each variable occurring in H also occurs in some Bi . A logic program Σ is a finite
set of safe Horn clauses; furthermore, Σ is a datalog program if each clause in Σ is
function-free.
    In this paper, we interpret a logic program Σ in a model that can be constructed
bottom-up. The Herbrand universe of Σ is the set of all terms built from the constants
and the function symbols occurring in Σ. Given an arbitrary set of facts B, let Σ(B)
be the smallest superset of B such that, for each clause ϕ → ψ ∈ Σ and each substitu-
tion σ mapping the variables occurring in the clause to the Herbrand universe of Σ, if
σ(ϕ) ⊆ B, then σ(ψ) ⊆ Σ(B). Let IS       0 be the set of all facts occurring in Σ; for each
i ∈ N, let Ii+1 = Σ(Ii ); and let I = i∈N Ii . Then I is the minimal Herbrand model
of Σ, and it is well known that I satisfies ∀~x.C for each Horn clause C ∈ Σ and ~x the
vector of all variables occurring in C.
     In this paper we allow a logic program Σ to contain the equality predicate ≈. In
first-order logic, ≈ is usually interpreted as the identity over the interpretation domain;
however, ≈ can also be explicitly axiomatised [16]. Let Σ≈ be the set containing clauses
(1)–(3), an instance of clause (4) for each n-ary predicate R occurring in Σ and each
1 ≤ i ≤ n, and an instance of clause (5) for each n-ary function symbol f occurring in
Σ and each 1 ≤ i ≤ n.
                                           →x≈x                                               (1)
                                x1 ≈ x2 → x2 ≈ x1                                             (2)
                   x1 ≈ x2 ∧ x2 ≈ x3 → x1 ≈ x3                                                (3)
                         x) ∧ xi ≈ x0i → R(x1 , . . . , x0i , . . . , xn )
                       R(~                                                                    (4)
                                xi ≈ x0i → f (. . . , xi , . . .) ≈ f (. . . , x0i , . . .)   (5)
 The minimal Herbrand model of a logic program Σ that contains ≈ is the minimal
Herbrand model of Σ ∪ Σ≈ .
     Conjunctive Queries. A conjunctive query (CQ) is a formula q = ∃~y .ψ(~x, ~y ) with
ψ a conjunction of function-free atoms over variables ~x ∪ ~y . Variables ~x are the answer
variables of q. Let NT (q) be the set of terms occurring in q.
     For τ a substitution such that rng(τ ) contains only constants, let τ (q) = ∃~z.τ (ψ),
where ~z is obtained from ~y by removing each variable y ∈ ~y for which τ (y) is defined.
Note that, according to this definition, non-free variables can also be replaced; for ex-
ample, given q = ∃y1 , y2 .R(y1 , y2 ) and τ = {y2 7→ a}, we have τ (q) = ∃y1 .R(y1 , a).
     Let Σ be a logic program, let I be the minimal Herbrand model of Σ, and let
q = ∃~y .ψ(~x, ~y ) be a CQ that uses only the predicates occurring in Σ. A substitution π
is a candidate answer for q in Σ if dom(π) = ~x and rng(π) contains only constants; fur-
thermore, such a π is a certain answer to q over Σ, written Σ |= π(q), if a substitution
τ exists such that dom(τ ) = ~x ∪ ~y , π = τ |~x , and τ (q) ⊆ I.
                                        r
     Description Logic. DL ELHO⊥           is defined w.r.t. a signature consisting of mutu-
ally disjoint and countably infinite sets NC , NR , and NI of atomic concepts (i.e., unary
predicates), roles (i.e., binary predicates), and individuals (i.e., constants), respectively.
Furthermore, for each individual a ∈ NI , expression {a} denotes a nominal—that is,
a concept containing precisely the individual a. Also, we assume that > and ⊥ are
unary predicates (without any predefined meaning) not occurring in NC . We consider
                                                                                    r
only normalised knowledge bases, as it is well known [4] that each ELHO⊥              KB can
                                                                                            r
be normalised in polynomial time without affecting the answers to CQs. An ELHO⊥
TBox is a finite set of axioms of the form shown in the left-hand side of Table 1, where
A(i) ∈ NC ∪ {>}, B ∈ NC ∪ {>, ⊥}, R, S ∈ NR , and a ∈ NI . An ABox A is a finite
set of facts constructed using the symbols from NC ∪ {>, ⊥}, NR , and NI . Finally, an
         r                                                                            r
ELHO⊥      knowledge base (KB) is a tuple K = hT , Ai, where T is an ELHO⊥              TBox
T and an A is an ABox such that each predicate occurring in A also occurs in T .
                                          r
                Table 1. Transforming ELHO⊥ Axioms into Horn Clauses
                    Type Axiom                               Clause
                       1 {a} v A                              A(a)
                        2AvB                          A(x) → B(x)
                        3 A v {a}                    A(x) → x ≈ a
                        4 A1 u A2 v A       A1 (x) ∧ A2 (x) → A(x)
                        5 ∃R.A1 v A        R(x, y) ∧ A1 (y) → A(x)
                                            A1 (x) → R(x, fR,A (x))
                        6 A1 v ∃R.A
                                              A1 (x) → A(fR,A (x))
                        7RvS                      R(x, y) → S(x, y)
                        8 range(R, A)               R(x, y) → A(y)


    We interpret K as a logic program. Table 1 shows how to translate a TBox T into a
logic program Ξ(T ). Moreover, let >(T ) be the set of the following clauses instantiated
for each atomic concept A and each role R occurring in T .
                 A(x) → >(x) R(x, y) → >(x) R(x, y) → >(y)
A KB K = hT , Ai is translated into the logic program Ξ(K) = Ξ(T ) ∪ >(T ) ∪ A.
Then, K is unsatisfiable if Ξ(K) |= ∃y.⊥(y). Furthermore, given a conjunctive query q
and a candidate answer π for q, we write K |= π(q) iff K is unsatisfiable or Ξ(K) |= π(q).
Given a candidate answer π for q, deciding whether Ξ(K) |= π(q) holds is NP-complete
in combined complexity, and P TIME-complete in data complexity [6].

                             r
3   Datalog Rewriting of ELHO⊥ TBoxes
                                                         r
For the rest of this section, we fix an arbitrary ELHO⊥    knowledge base K = hT , Ai.
We next show how to transform K into a datalog program D(K) that can be used to
check the satisfiability of K. In the following section, we then show how to use D(K)
to answer conjunctive queries.
    Due to axioms of type 6 (cf. Table 1), Ξ(K) may contain function symbols and
is generally not a datalog program; thus, the evaluation of Ξ(K) may not terminate.
To ensure termination, we eliminate function symbols from Ξ(K) using the technique
by Krötzsch, Rudolph, and Hitzler [14]: for each A ∈ NC ∪ {>} and each R ∈ NR
occurring in T , we introduce a globally fresh and unique auxiliary individual oR,A .
Intuitively, oR,A represents all terms in the Herbrand universe of Ξ(K) needed to satisfy
the existential concept ∃R.A. Krötzsch, Rudolph, and Hitzler [14] used this technique
to facilitate taxonomic reasoning, while we use it to obtain a practical CQ answering
algorithm. Please note that oR,A depends on both R and A, whereas in the known
approaches such individuals depend only on A [13] or R [12].
Definition 1. Datalog program D(T ) is obtained by translating each axiom of type
other than 6 in the TBox T of K into a clause as shown in Table 1, and by translating
each axiom A1 v ∃R.A in T into clauses A1 (x) → R(x, oR,A ) and A1 (x) → A(oR,A ).
Furthermore, the translation of K into datalog is given by D(K) = D(T ) ∪ >(T ) ∪ A.
                                                taught        oT,P advis
                                           ai                                or
                                                  ht                                advisor
                                             taug                                 oA,P
                                          kr taug                             r
                                                   h   t               v  iso
                                                                     ad

                                          J                john ≈ oT,J

                           ai
                                  taught        fT,P(ai) advisor              fA,P (fT,P(ai)) advisor


                                  taught        fT,P(kr) advisor             fA,P (fT,P(kr)) advisor
                          kr


                                ta
                                 ug
                                     ht
                                                                            fA,P (john)
                                                           advisor                         advisor
                                john ≈ fT,J(kr)                                  ≈
                                                                           fA,P (fT,J(kr))
                                  I


                         Fig. 1. Representing the Models of Ξ(K).

                                      r
Example 1. Let T be the following ELHO⊥ TBox:

    KRC v ∃taught.JProf                              JProf v {john}                                     KRC v Course
   Course v ∃taught.Prof                         ∃taught.> v Course                                     range(taught, Prof )
     Prof v ∃advisor .Prof                            {kr } v KRC

Then, D(T ) contains the following clauses:

KRC (x) → taught(x, oT,J ) Prof (x) → advisor (x, oA,P ) KRC (kr)
KRC (x) → JProf (oT,J )      Prof (x) → Prof (oA,P )     KRC (x) → Course(x)
Course(x) → taught(x, oT,P ) JProf (x) → x ≈ john        taught(x, y) → Prof (y)
Course(x) → Prof (oT,P )     taught(x, y) → Course(x)                         ♦

   The following result readily follows from the definition of Ξ(K) and D(K).

Proposition 1. Program D(K) can be computed in time linear in the size of K.

     Next, we prove that the datalog program D(K) can be used to decide the satisfia-
bility of K. To this end, we define a function δ that maps each term w in the Herbrand
universe of Ξ(K) to the Herbrand universe of D(K) as follows:
                          (
                            w       if w ∈ NI ,
                  δ(w) =
                            oR,A if w is of the form w = fR,A (w0 ).

Let I and J be the minimal Herbrand models of Ξ(K) and D(K), respectively. Mapping
δ establishes a tight relationship between I and J as illustrated in the following example.

Example 2. Let A = {Course(ai )}, let T be as in Example 1, and let K = hT , Ai.
Figure 1 shows a graphical representation of the minimal Herbrand models I and J of
Ξ(K) and D(K), respectively. The grey dotted lines show how δ relates the terms in I
to the terms in J. For the sake of clarity, Figure 1 does not show the reflexivity of ≈. ♦

   Mapping δ is a homomorphism from I to J.
Lemma 1. Let I and J be the minimal Herbrand models of Ξ(K) and D(K), respec-
tively. Mapping δ satisfies the following three properties for all terms w0 and w, each
B ∈ NC ∪ {>, ⊥}, and each R ∈ NR .
 1. B(w) ∈ I implies B(δ(w)) ∈ J.
 2. R(w0 , w) ∈ I implies R(δ(w0 ), δ(w)) ∈ J.
 3. w0 ≈ w ∈ I implies δ(w0 ) ≈ δ(w) ∈ J.
    For a similar result in the other direction, we need a couple of definitions. Let H
be an arbitrary Herbrand model. Then, dom(H) is the set containing each term w that
occurs in H in at least one fact with a predicate in NC ∪ {>, ⊥} ∪ NR ; note that, by this
definition, we have w 6∈ dom(H) whenever w occurs in H only in assertions involving
the ≈ predicate. Furthermore, auxH is the set of all terms w ∈ dom(H) such that, for
each term w0 with w ≈ w0 ∈ H, we have w0 6∈ NI . We say that the terms in auxH are
‘true’ auxiliary terms—that is, they are not equal to an individual in NI . In Figure 1,
bold terms are ‘true’ auxiliary terms in I and J.
Lemma 2. Let I and J be the minimal Herbrand models of Ξ(K) and D(K), respec-
tively. Mapping δ satisfies the following five properties for all terms w1 and w2 in
dom(I), each B ∈ NC ∪ {>, ⊥}, and each R ∈ NR .
 1. B(δ(w1 )) ∈ J implies that B(w1 ) ∈ I.
 2. R(δ(w1 ), δ(w2 )) ∈ J and δ(w2 ) 6∈ auxJ imply that R(w1 , w2 ) ∈ I.
 3. R(δ(w1 ), δ(w2 )) ∈ J and δ(w2 ) ∈ auxJ imply that δ(w2 ) is of the form oP,A , that
    R(w1 , fP,A (w1 )) ∈ I, and that a term w10 exists such that R(w10 , w2 ) ∈ I.
 4. δ(w1 ) ≈ δ(w2 ) ∈ J and δ(w2 ) 6∈ auxJ imply that w1 ≈ w2 ∈ I.
 5. For each term u occurring in J, term w ∈ dom(I) exists such that δ(w) = u.
    Lemmas 1 and 2 allow us to decide the satisfiability of K by answering a simple
query over D(K), as shown in Proposition 2. The complexity claim is due to the fact
that each clause in D(K) contains a bounded number of variables [17].
                                        r
Proposition 2. For K an arbitrary ELHO⊥   knowledge base, Ξ(K) |= ∃y.⊥(y) if and
only if D(K) |= ∃y.⊥(y). Furthermore, the satisfiability of K can be checked in time
polynomial in the size of K.

4    Answering Conjunctive Queries
                                               r
In this section, we fix a satisfiable ELHO⊥      knowledge base K = hT , Ai and a con-
junctive query q = ∃~y .ψ(~x, ~y ). Furthermore, we fix I and J to be the minimal Herbrand
models of Ξ(K) and D(K), respectively.
    While D(K) can be used to decide the satisfiability of K, the following example
shows that D(K) cannot be used directly to compute the answers to q.
Example 3. Let K be as in Example 2, and let q1 , q2 , and q3 be the following CQs:
q1 = taught(x1 , x2 )
q2 = ∃y1 , y2 , y3 . taught(x1 , y1 ) ∧ taught(x2 , y2 ) ∧ advisor (y1 , y3 ) ∧ advisor (y2 , y3 )
q3 = ∃y. advisor (y, y)
Furthermore, let τi be the following substitutions:

          τ1 = {x1 7→ kr , x2 7→ oT,P }
          τ2 = {x1 7→ kr , x2 7→ ai , y1 7→ oT,P , y2 7→ oT,P , y3 7→ oA,P }
          τ3 = {y 7→ oA,P }

Finally, let each πi be the projection of τi to the answer variables of qi . Using Figure 1,
one can readily check that D(K) |= τi (qi ), but Ξ(K) 6|= πi (qi ), for each 1 ≤ i ≤ 3. ♦

    This can be explained by observing that J is a homomorphic image of I. Now
homomorphisms preserve CQ answers (i.e., Ξ(K) |= π(q) implies D(K) |= π(q)), but
they can also introduce unsound answers (i.e., D(K) |= π(q) does not necessarily imply
Ξ(K) |= π(q)). This gives rise to the following notion of spurious answers.

Definition 2. A substitution τ with dom(τ ) = ~x ∪ ~y and D(K) |= τ (q) is a spurious
answer to q if τ |~x is not a certain answer to q over Ξ(K).

     Based on these observations, we answer q over K in two steps: first, we evaluate q
over D(K) and thus obtain an overestimation of the certain answers to q over Ξ(K);
second, for each substitution τ obtained in the first step, we eliminate spurious an-
swers using a special function isSpur. We next formally introduce this function. We
first present all relevant definitions, after which we discuss the intuitions. As we shall
see, each query in Example 3 illustrates a distinct source of spuriousness that our func-
tion needs to deal with.

Definition 3. Let τ be a substitution s.t. dom(τ ) = ~x ∪ ~y and D(K) |= τ (q). Relation
∼ ⊆ NT (q) × NT (q) for q, τ , and D(K) is the smallest reflexive, symmetric, and tran-
sitive relation closed under the fork rule, where auxD(K) is the set containing each in-
dividual u from D(K) for which no individual c ∈ NI exists such that D(K) |= u ≈ c.
                    0     0
           (fork) s ∼ t       R(s, s0 ) and P (t, t0 ) occur in q, and τ (s0 ) ∈ auxD(K)
                   s∼t
    Please note that the definition auxD(K) is actually a reformulation of the definition
of auxJ , but based on the consequences of D(K) rather than the facts in J.
    Relation ∼ is reflexive, symmetric, and transitive, so it is an equivalence relation,
which allows us to normalise each term t ∈ NT (q) to a representative of its equiva-
lence class using the mapping γ defined below. We then construct a graph Gaux that
checks whether substitution τ matches ‘true’ auxiliary individuals in a way that cannot
be converted to a match over ‘true’ auxiliary terms in I.

Definition 4. Let τ and ∼ be as specified in Definition 3. Function γ : NT (q) 7→ NT (q)
maps each term t ∈ NT (q) to an arbitrary, but fixed representative γ(t) of the equiva-
lence class of ∼ that contains t. Furthermore, the directed graph Gaux = hVaux , Eaux i
is defined as follows.
  – Set Vaux contains a vertex γ(t) for each term t ∈ NT (q) such that τ (t) ∈ auxD(K) .
  – Set Eaux contains an edge hγ(s), γ(t)i for each atom of the form R(s, t) in q such
     that {γ(s), γ(t)} ⊆ Vaux .
Query q is aux-cyclic w.r.t. τ and D(K) if Gaux contains a cycle; otherwise, q is aux-
acyclic w.r.t. τ and D(K).

   We are now ready to define our function that checks whether a substitution τ is a
spurious answer.

Definition 5. Let τ and ∼ be as specified in Definition 3. Function isSpur(q, D(K), τ )
returns t if and only if at least one of the following conditions hold.
(a) Variable x ∈ ~x exists such that τ (x) 6∈ NI .
(b) Terms s and t occurring in q exist such that s ∼ t and D(K) 6|= τ (s) ≈ τ (t).
(c) Query q is aux-cyclic w.r.t. τ and D(K).

     We next discuss the intuition behind our definitions. We ground our discussion in
minimal Herbrand models I and J, but our technique does not depend on such models:
all conditions are stated as entailments that can be checked using an arbitrary sound
                                                    r
and complete technique. Since K is an ELHO⊥           knowledge base, the shape of model
I resembles a forest: roughly speaking, the role assertions in I that involve at least
one functional term are of the form R(w1 , fR,A (w1 )) or R(w1 , a) for a ∈ NI ; thus,
I can be viewed as a family of directed trees whose roots are the individuals in NI
and whose edges point from parents to children or to the individuals in NI . This is
illustrated in Figure 1, whose lower part shows the the forest-model of the knowledge
base from Example 3. Note that assertions of the form R(w1 , a) are introduced via
equality reasoning and may result in links from functional terms to individuals.
     Now let τ be a substitution such that D(K) |= τ (q), and let π = τ |~x . If τ is not
a spurious answer, it should be possible to convert τ into a substitution π ∗ such that
π = π ∗ |~x and π ∗ (q) ⊆ I. Using the queries from Example 3, we next identify three
reasons why this may not be possible.
     First, τ may map an answer variable of q to an auxiliary individual, so by the defini-
tion π cannot be a certain answer to q; condition (a) of Definition 5 identifies such cases.
Query q1 and substitution τ1 from Example 3 illustrate such a situation: τ2 (x2 ) = oT ,P
and oT ,P is a ‘true’ auxiliary individual, so π1 is not a certain answer to q1 .
     The remaining two problems arise because model J is not forest-shaped, so τ might
map q into J in a way that cannot be converted into a substitution π ∗ that maps q into I.
     The second problem is best explained using substitution τ2 and query q2 from Ex-
ample 3. Query q2 contains a ‘fork’ advisor (y1 , y3 ) ∧ advisor (y2 , y3 ). Now, substi-
tution τ2 maps y3 to ‘true’ auxiliary individual oA,P which represents ‘true’ auxiliary
terms fA,P (fT,P (ai )), fA,P (fT,P (kr )), and so on. Since I is forest-shaped, a match
π2∗ for q in I obtained from τ2 would need to map y3 to one of these terms; let us assume
that π2∗ (y3 ) is mapped to fA,P (fT,P (ai )). Since I is forest-shaped and fA,P (fT,P (ai ))
is a ‘true’ auxiliary term, this means that both y1 and y2 must be mapped to the same
term (in both J and I). This is captured by the (fork) rule: since ∼ is reflexive, the rule
derives y1 ∼ y2 , and condition (b) of Definition 5 checks whether τ2 maps y1 and y2
in a way that satisfies this constraint. Note that, due to role hierarchies, the rule needs
to be applied to atoms R(s, s0 ) and P (t, t0 ) with R 6= P . Moreover, such constraints
must be propagated further up the query. In our example, due to y1 ∼ y2 and τ2 (y1 ) is
a ‘true’ auxiliary individual, atoms taught(x1 , y1 ) ∧ taught(x2 , y2 ) in q2 also consti-
                             Table 2. Size of the materialisations.


                         Individuals     Unary facts            Binary facts
                       (% in auxD(K) ) (% over auxD(K) )      (% over auxD(K) )
                L-5    100848           169079                 296941
                Mat.   100868 (0.01) 309350 (0.01)             632489 (49.2)
                L-10   202387           339746                 598695
                Mat.   202407 (0.01) 621158 (0.01)            1277575 (49.3)
                L-20   426144           714692                1259936
                Mat.   426164 (0.01) 1304815 (0.01)           2691766 (49.3)
                SEM      17945              17945               47248
                Mat.     17953 (0.04)       25608 (0.03)        76590 (38.3)


tute a ‘fork’, so the rule derives x1 ∼ x2 ; this allows condition (b) of Definition 5 to
correctly identify τ2 as spurious.
     The third problem is best explained using substitution τ3 and query q3 from Exam-
ple 3. Model J contains a ‘loop’ on individual oA,P , which allows τ3 to map q3 into J.
In contrast, model I is forest-shaped, and so the ‘true’ auxiliary terms that correspond
to oA,P do not form loops. Condition (c) of Definition 5 detects such situations using
the graph Gaux . The vertices of Gaux correspond to the terms of q that are matched to
‘true’ auxiliary individuals (mapping γ simply ensures that equal terms are represented
as one vertex), and edges of Gaux correspond to the role atoms in q. Hence, if Gaux is
cyclic, then the substitution π ∗ obtained from τ would need to match the query q over
a cycle of ‘true’ auxiliary terms, which is impossible since I is forest-shaped.
     Unlike the known combined approaches, our approach does not extend q with con-
ditions that detect spurious answers. Due to nominals, the relevant equality constraints
have a recursive nature: they depend on both the substitution τ and on the previously
derived constraints. Given that first-order queries do not allow for expressing recursive
properties, extending q with these constraints seems to necessarily result in an exponen-
tial blow-up in the size of the query. Consequently, filtering in our approach is realised
as postprocessing. The following theorem proves the correctness of our approach.
                                                        r
Theorem 1. Let K = hT , Ai be a satisfiable ELHO⊥          KB, let q = ∃~y .ψ(~x, ~y ) be a CQ,
and let π : ~x 7→ NI be a candidate answer for q. Then, Ξ(K) |= π(q) iff a substitution τ
exists such that dom(τ ) = ~x ∪ ~y , τ |~x = π, D(K) |= τ (q), and isSpur(q, D(K), τ ) = f.
    Furthermore, isSpur(q, D(K), τ ) can be evaluated in polynomial time, so the main
source of complexity in our approach is in deciding whether D(K) |= τ (q) holds. This
gives rise to the following result.
Theorem 2. Deciding whether K |= π(q) can be implemented in nondeterministic poly-
nomial time w.r.t. the size of K and q, and in polynomial time w.r.t. the size of A.

5   Evaluation
To gain insight into the practical applicability of our approach, we implemented our
technique in a prototypical system. The system uses HermiT, a widely used ontology
reasoner, as a datalog engine in order to materialise the consequences of D(K) and
evaluate q. The system has been implemented in Java, and we ran our experiments on a
MacBook Pro with 4GB of RAM and an Intel Core 2 Duo 2.4 Ghz processor. We used
two ontologies in our evaluation, details of which are given below. The ontologies, the
queries, and the prototype are available at http://www.cs.ox.ac.uk/isg/tools/KARMA/.
    The LSTW benchmark [18] consists of an OWL 2 QL version of the LUBM on-
tology [19], queries q1l , . . . , q11
                                    l
                                        , and a data generator. The LSTW ontology extends the
standard LUBM ontology with several axioms of type 6 (see Table 1). To obtain an
         r
ELHO⊥      ontology, we removed inverse roles and datatypes, added 11 axioms using 9
freshly introduced nominals, and added one axiom of type 4 (see Table 1). These addi-
tional axioms resemble the ones in Example 1, and they were designed to test equality
reasoning. The resulting signature consists of 132 concepts, 32 roles, and 9 nominals,
and the ontology contains 180 axioms. From the 11 LSTW queries, we did not consider
queries q4l , q6l , q7l , and q11
                               l
                                  because their result sets were empty: q4l relies on existential
quantification over inverse roles, and the other three are empty already w.r.t. the original
LSTW ontology. Query q2l is similar to query q2 from Example 3, and it was designed
to produce only spurious answers and thus stress the system. We generated data sets
with 5, 10 and 20 universities. For each data set, we denote with L-i the knowledge
                                      r
base consisting of our ELHO⊥             ontology and the ABox for i universities (see Table 2).
    SEMINTEC is an ontology about financial services developed within the SEM-
                                                                            r
INTEC project at the University of Poznan. To obtain an ELHO⊥                  ontology, we re-
moved inverse roles, role functionality axioms, and universal restrictions, added nine
axioms of type 6 (see Table 1), and added six axioms using 4 freshly introduced nomi-
nals. The resulting ontology signature consists of 60 concepts, 16 roles, and 4 nominals,
and the ontology contains 173 axioms. Queries q1s –q5s are tree-shaped queries used in
the SEMINTEC project, and we developed queries q6s –q9s ourselves. Query q6s resembles
query q2l from LSTW, and queries q8s and q9s were designed to retrieve a large number
of answers containing auxiliary individuals, thus stressing condition (a) of Definition
5. Finally, the SEMINTEC ontology comes with a data set consisting of approximately
65,000 facts concerning 18,000 individuals (see row SEM in Table 2).
    The practicality of our approach, we believe, is determined mainly by the following
two factors. First, the number of facts involving auxiliary individuals introduced during
the materialisation step should not be ‘too large’. Table 2 shows the materialisation re-
sults: the first column shows the number of individuals before and after materialisation
and the percentage of ‘true’ auxiliary individuals, the second column shows the number
of unary facts before and after materialisation and the percentage of facts involving a
‘true’ auxiliary individual, and the third column does the same for binary facts. As one
can see, for each data set, the materialisation introduces few ‘true’ auxiliary individuals,
and the number of facts at most doubles. The number of unary facts involving a ‘true’
auxiliary individual does not change with the size of the input data set, whereas the
number of such binary facts increases by a constant factor. This is because, in clauses
of type 6, atoms A(oR,A ) do not contain a variable, whereas atoms R(x, oR,A ) do.
    Second, evaluating q over D(K) should not produce too many spurious answers. Ta-
ble 3 shows the total number of answers for each query—that is, the number of answers
obtained by evaluating the query over D(K); moreover, the table shows what percentage
Table 3. Total number of answers and ratio spurious to answers. In Table LSTW, the ratio is
stable for each data set. The bottom table displays results for the SEMINTEC ontology.

    LSTW       q1l          q2l        q3l       q5l         q8l      q9l        l
                                                                                q10
             Tot (%)     Tot (%) Tot (%) Tot (%) Tot (%) Tot (%) Tot (%)
       L-5 116K       3.7M           10        28K       13K        1K      12K
     L-10 233K (4.0) 32M (100.0) 22 (0.0) 57K (0.0) 26K (26.0) 2K (0.0) 25K (74.5)
     L-20 487K       170M            43       121K       55K        4K      53K
    q1s       q2s     q3s       q4s       q5s        q6s        q7s     q8s        q9s
 Tot (%) Tot (%) Tot (%) Tot (%) Tot (%) Tot (%) Tot (%) Tot (%) Tot (%)
   7 (0.0) 53 (0.0) 16 (0.0) 12 (0.0) 31 (0.0) 838K (55.4) 5K (0.0) 5K (54.3) 13K (33.3)


of these answers are spurious. Queries q2l , q10
                                              l
                                                 , q6s , and q8s retrieve a significant percent-
age of spurious answers. However, only query q2l has proven to be challenging for our
system due to the large number of retrieved answers, with an evaluation time of about
40 minutes over the largest knowledge base (L-20). Surprisingly, q1l also performed
rather poorly despite a low number of spurious answers, with an evaluation time of
about 20 minutes for L-20. All other queries were evaluated in at most a few seconds,
thus suggesting that queries q1l and q2l are problematical mainly because HermiT does
not implement query optimisation algorithms typically used in relational databases.


6   Conclusion

We presented the first combined technique for answering conjunctive queries over DL
ontologies that include nominals. A preliminary evaluation suggests the following. First,
the number of materialised facts over ‘true’ anonymous individuals increases by a con-
stant factor with the size of the data. Second, query evaluation results have shown that,
while some cases may be challenging, in most cases the percentage of answers that
are spurious is manageable. Hence, our technique provides a practical CQ answering
algorithm for a large fragment of OWL 2 EL.
     We anticipate several directions for our future work. First, we would like to investi-
gate the use of top-down query evaluation techniques, such as magic sets [20] or SLG
resolution [21]. Second, tighter integration of the detection of spurious answers with the
query evaluation algorithms should make it possible to eagerly detect spurious answers
(i.e., before the query is fully evaluated). Lutz et al. [18] already implemented a filtering
condition as a user-defined function in a database, but it is unclear to what extent such
an implementation can be used to optimise query evaluation. Finally, we would like to
extend our approach to all of OWL 2 EL.


Acknowledgements

This work was supported by the Royal Society; Alcatel-Lucent; the EU FP7 project
OPTIQUE; and the EPSRC projects ExODA, MASI3 , and QueRe.
References
 1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.F., eds.: The De-
    scription Logic Handbook: Theory, Implementation, and Applications. Cambridge Univer-
    sity Press (2007) ISBN 9780511717383.
 2. Cuenca Grau, B., Horrocks, I., Motik, B., Parsia, B., Patel-Schneider, P., Sattler, U.: OWL
    2: The next step for OWL. Journal of Web Semantics 6(4) (2008) 309–322
 3. Glimm, B., Horrocks, I., Lutz, C., Sattler, U.: Conjunctive Query Answering for the De-
    scription Logic SHIQ. Journal of Artificial Intelligence Research 31 (2008) 151–198
 4. Baader, F., Brandt, S., Lutz, C.: Pushing the EL Envelope. In Kaelbling, L.P., Saffiotti, A.,
    eds.: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI
    2005), Edinburgh, UK, Morgan Kaufmann Publishers (July 30–August 5 2005) 364–369
 5. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable Reason-
    ing and Efficient Query Answering in Description Logics: The DL-Lite Family. Journal of
    Automated Reasoning 9(3) (2007) 385–429
 6. Krötzsch, M., Rudolph, S., Hitzler, P.: Conjunctive queries for a tractable fragment of
    OWL 1.1. In Aberer, K., Choi, K.S., Noy, N., Allemang, D., Lee, K.I., Nixon, L., Golbeck,
    J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P., eds.: Proceed-
    ings of the 6th International Semantic Web Conference (ISWC’07). Volume 4825 of LNCS.,
    Springer (2007) 310–323
 7. Ortiz, M., Rudolph, S., Simkus, M.: Query Answering in the Horn Fragments of the De-
    scription Logics SHOIQ and SROIQ. In Walsh, T., ed.: Proceedings of the 22nd Inter-
    national Joint Conference on Artificial Intelligence (IJCAI 2011), Barcelona, Spain, AAAI
    Press (July 16–22 2011) 1039–1044
 8. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite Family and
    Relations. Journal of Artificial Intelligence Research 36 (2009) 1–69
 9. Pérez-Urbina, H., Motik, B., Horrocks, I.: Tractable Query Answering and Rewriting under
    Description Logic Constraints. Journal of Applied Logic 8(2) (2010) 186–209
10. Eiter, T., Ortiz, M., Simkus, M., Tran, T.K., Xiao, G.: Query Rewriting for Horn-SHIQ
    Plus Rules. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence, (AAAI
    2012), AAAI Press (2012)
11. Rosati, R.: On Conjunctive Query Answering in EL. In Calvanese, D., Franconi, E.,
    Haarslev, V., Lembo, D., Motik, B., Turhan, A.Y., Tessaris, S., eds.: Proceedings of the 20th
    International Workshop on Description Logics (DL-2007). CEUR Workshop Proceedings,
    CEUR-WS.org (2007)
12. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The Combined Ap-
    proach to Ontology-Based Data Access. In Walsh, T., ed.: Proceedings of the 22nd Interna-
    tional Joint Conference on Artificial Intelligence (IJCAI 2011), AAAI Press (2011)
13. Lutz, C., Toman, D., Wolter, F.: Conjunctive Query Answering in the Description Logic
    EL Using a Relational Database System. In Boutilier, C., ed.: Proceedings of the 21st In-
    ternational Joint Conference on Artificial Intelligence, (IJCAI 2009), AAAI Press (2009)
    2070–2075
14. Krötzsch, M., Rudolph, S., Hitzler, P.: ELP: Tractable rules for OWL 2. In Sheth, A., Staab,
    S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K., eds.: Proceedings of the
    7th International Semantic Web Conference (ISWC’08). Volume 5318 of LNCS., Springer
    (2008) 649–664
15. Stefanoni, G., Motik, B., Horrocks, I.: Introducing Nominals to the Combined Query An-
    swering Approaches for EL. CoRR abs/1303.7430 (2013)
16. Fitting, M.: First-order logic and automated theorem proving (2nd ed.). Springer-Verlag
    New York, Inc., Secaucus, NJ, USA (1996)
17. Dantsin, E., Eiter, T., Gottlob, G., Voronkov, A.: Complexity and expressive power of logic
    programming. ACM Computing Surveys 33(3) (2001) 374–425
18. Lutz, C., Seylan, I., Toman, D., Wolter, F.: The Combined Approach to OBDA: Taming
    Role Hierarchies using Filters. In Fokoue, A., Liebig, T., Goodman, E., Weaver, J., Urbani,
    J., Mizell, D., eds.: Proceedings of the Joint Workshop on Scalable and High-Performance
    Semantic Web Systems (SSWS+HPCSW 2012). Volume 943 of CEUR Workshop Proceed-
    ings., CEUR-WS.org (2012) 16–31
19. Guo, Y., Pan, Z., Heflin, J.: LUBM: A benchmark for OWL knowledge base systems. Journal
    of Web Semantics 3(2–3) (2005) 158–182
20. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995)
21. Chen, W., Warren, D.S.: Query evaluation under the well-founded semantics. In: Proceed-
    ings of the 12th ACM SIGACT-SIGMOD-SIGART symposium on Principles of database
    systems. PODS ’93, New York, NY, USA, ACM (1993) 168–179

</pre>