Pay-as-you-go Ontology Query Answering
              Using a Datalog Reasoner?

      Yujiao Zhou, Yavor Nenov, Bernardo Cuenca Grau, and Ian Horrocks

              Department of Computer Science, University of Oxford, UK


        Abstract. We describe a hybrid approach to conjunctive query answer-
        ing over OWL 2 ontologies that combines a datalog reasoner with a
        fully-fledged OWL 2 reasoner in order to provide scalable “pay as you
        go” performance. Our approach delegates the bulk of the computation
        to the highly scalable datalog engine and resorts to expensive OWL 2
        reasoning only as necessary to fully answer the query. We have imple-
        mented a prototype system that uses RDFox as a datalog reasoner, and
        HermiT as an OWL 2 reasoner. Our evaluation over both benchmark and
        realistic ontologies and datasets suggests the feasibility of our approach.


1     Introduction
The use of RDF [14], OWL 2 [15], and SPARQL 1.1 [24] to represent and query
semi-structured data together with domain knowledge is increasingly widespread.
Query answering in this setting is, however, of high worst-case complexity [6, 5],
and although heavily optimised, existing systems for query answering w.r.t. RDF
data and an unrestricted OWL 2 ontology can process only small to medium size
datasets [12, 25, 10]. This has led to the development of query answering proce-
dures that are more scalable, but that can (fully) process only fragments of OWL
2, and several prominent fragments have now been standardised as OWL 2 pro-
files [13]. Such systems have been shown to be (potentially) highly scalable [23,
1, 22, 26], but if the ontology falls outside the relevant profile, then the answers
computed by such a system may be incomplete: if it returns an answer, then all
tuples in the answer are (usually) valid, but some valid tuples may be missing
from the answer. When used with out-of-profile ontologies, a query answer com-
puted by such a system can thus be understood as providing a lower-bound on
the correct answer; however, they cannot in general provide any upper bound or
even any indication as to how complete the computed answer is [2].
     In this paper, we describe a hybrid approach to query answering that exploits
a datalog reasoner to compute both a lower bound answer and an upper bound
answer. If lower and upper bound answers coincide, they obviously provide a
sound and complete answer. Otherwise, relevant fragments of the ontology and
data can be extracted that are guaranteed to be sufficient to test the validity
of tuples in the “gap” between the two answers. These fragments can also be
?
    This paper recapitulates some of our results in [27–29], and it is accompanied with
    a technical report available at http://www.cs.ox.ac.uk/isg/people/yujiao.zhou.
computed by relying solely on the datalog reasoner, and are typically much
smaller than the input ontology and data. The remaining gap tuples need to
be checked w.r.t. to the identified fragments using an OWL 2 reasoner such
as HermiT [16] or Pellet [18]; furthermore, since the number of gap tuples can
be significant in some cases, we exploit summarisation techniques inspired by
the SHER system [3, 4] to quickly identify spurious gap answers, thus further
reducing the requirement for fully-fledged OWL 2 reasoning.
    Our approach is pay-as-you-go in the sense that the bulk of the computation
is delegated to a scalable datalog engine. Furthermore, although our main goal
is to answer queries over OWL 2 ontologies efficiently, our technical results are
very general and our approach is not restricted to DLs. More precisely, given a
first-order KR language L that can be captured by rules allowing for existential
quantification and disjunction in the head, and over which we want to answer
conjunctive queries, our only assumption is the availability of a fully-fledged
reasoner for L and a datalog reasoner, which are both used as a “black box”.
    We have implemented our techniques in a prototypical system using the
RDFox as a datalog reasoner [23] and the HermiT as a fully-fledged OWL 2
reasoner.1 Our preliminary evaluation over both benchmark and realistic data
suggests that the system can provide scalable pay-as-you-go query answering
for a wide range of OWL 2 ontologies, RDF data and queries. In almost all
cases, the system is able to completely answer queries without resorting to fully-
fledged OWL 2 reasoning, and even when this is not the case, relevant fragment
extraction and summarisation are effective in reducing the size of the problem
to manageable proportions.


2     Preliminaries

We adopt standard first order logic notions, such as variables, constants, atoms,
formulas, clauses, substitutions, satisfiability, and entailment. We also assume
basic familarity with OWL 2 [15] and its profiles [13]. A generalised rule (or just
a rule) is a function-free sentence of the form
                                n
                                ^              m
                                               _
                         ∀x (       Bj (x) →         ∃yi ϕi (x, yi ))
                              j=0              i=0


where Bj (x) are body atoms and ϕi are conjunctions of head atoms. The universal
quantifiers are left implicit from now on. A rule is Horn if m ≤ 1, and it is datalog
if it is Horn and does not contain existential quantifiers. A fact is a ground atom
and a dataset is a finite set of facts. A knowledge base K consists of a finite
set of rules and a dataset. We treat equality (≈) as an ordinary predicate, but
assume that every knowledge base in which equality occurs contains the axioms
of equality for its signature. Each OWL 2 ontology can be normalised as one
1
    Although our techniques are proved correct for general conjunctive queries, in prac-
    tice we are limited by the current query capabilities of OWL 2 reasoners.
                                              Foreman(x) → Manag(x)                 (T1 )
                                                 Superv(x) → Manag(x)               (T2 )
                                  Superv(x) ∧ boss(x, y) → Workman(y)               (T3 )
                             TeamLead(x) ∧ boss(x, y) ∧ Manag(y) →                  (T4 )
                  Manag(x) → Superv(x) ∨ ∃y.(boss(x, y) ∧ Manag(y))                 (T5 )
                                           Manag(x) → ∃y.(boss(x, y))               (T6 )


    Manag(Sue)       (D1 )         Superv(Rob)    (D3 )       Manag(Jo)     (D5 )
    Superv(Dan)      (D2 )      boss(Dan, Ben)    (D4 )    TeamLead(Jo)     (D6 )
                                                          boss(Jane, Rob)   (D7 )

                        Fig. 1. Example knowledge base Kex .


such knowledge base using the correspondence of OWL and first order logic and
a variant of the structural transformation (e.g., see [16] for detais).
    We focus on CQ answering as the key reasoning problem. A query is a formula
q(x) = ∃y ϕ(x, y) with ϕ(x, y) a conjunction of atoms. We usually omit the free
variables x of queries and write just q. The query is atomic if ϕ(x, y) is a single
atom. A tuple of individuals a is a (certain) answer to q w.r.t. a set of sentences
F iff F |= q(a). The set of all answers to q(x) w.r.t. F is denoted by cert(q, F).
    There are two main techniques for answering queries over a datalog knowledge
base K. Forward chaining computes the set M at(K) of ground atoms entailed by
K, called the materialisation of K. A query q over K can be answered directly over
the materialisation. Backward chaining treats a query as a conjunction of atoms
(a goal ). An SLD resolvent of a goal A ∧ ψ with a datalog rule ϕ → C1 ∧ · · · ∧ Cn
is a goal ψθ ∧ ϕθ, with θ the MGU of A and Cj , for some 1 ≤ j ≤ n. An SLD
proof of a goal G0 in K is a sequence of goals (G0 , . . . , Gn ) with Gn the empty
goal (), and each Gi+1 a resolvent of Gi and a rule in K.


3   Overview
The main idea behind our approach to query answering is to delegate the bulk
of the computational workload to a highly scalable datalog reasoner, thus min-
imising the use of a fully-fledged OWL 2 reasoner. Given a knowledge base K
and a query q, we proceed according to the following algorithm:

 1. Use the datalog reasoner to compute both lower bound (sound but possibly
    incomplete) and upper bound (complete but possibly unsound) answers to
    the (Boolean) unsatisfiability query and the input q. (See Sections 4, 5).
 2. If both bounds report unsatisfiability, then we return unsatisfiable. If none
    of them reports unsatisfiability and they yield the same answers to q, we
    output the resulting answers. In any other case, proceed to the next step.
3. Use the datalog reasoner to compute fragments K⊥ and K[q,G] of K, where
   G is the set of answers to q in the gap between the bounds. (See Section 6).
4. If the upper bound reports unsatisfiability and K⊥ is unsatisfiable, then
   return unsatisfiable.
5. Use the OWL reasoner to check whether K[q,G] ∪ K⊥ |= q(a), for each a ∈ G.
   To minimise the computational workload of the OWL reasoner, this step is
   carried out as follows (see Section 7):
   (a) Summarise K[q,G] ∪K⊥ by merging all constants that instantiate the same
       unary predicates [4]. Use the OWL reasoner to discard those a ∈ G such
       that q(a) is not entailed by the summarised KB.
   (b) Compute a dependency relation between the remaining elements of G
       such that if b depends on a and a is a spurious answer, then so is b.
       Arrange the calls to the reasoners according to these dependencies.
6. Return the lower bound answers to q plus those tuples in G determined to
   be answers in Step 5.

   We will describe each of these steps and illustrate them using as running
example the knowledge base Kex in Figure 1 and the following query q ex :

                     q ex (x) = ∃y(boss(x, y) ∧ Workman(y))


4   Computing Upper Bounds

To compute upper bound query answers, we first compute a datalog knowl-
edge base U(K) that entails the nullary predicate ⊥, if K is unsatisfiable, and
that entails K, otherwise. Hence, for satisfiable knowledge bases K we get that
cert(q, U(K)) subsumes cert(q, K). The knowledge base U(K) is the result of con-
secutively applying the transformations Σ, Ξ and Ψ defined next.

Definition 1. Let K be a KB. We define U(K) := Ψ ◦ Ξ ◦ Σ(K), where

 – Σ is a mapping that transforms each rule into clausal normal form;
 – Ξ maps each clause C to a set of clauses as follows: (i) if C contains only
   negative literals, then Ξ(C) = C ∨ ⊥; (ii) if C is of the form ¬B0 ∨ · · · ∨
   ¬Bk ∨ C0 ∨ · · · ∨ Cr+1 then Ξ(C) consists of the clauses ¬B1 ∨ · · · ∨ ¬Bk ∨ Ci ,
   for 0 ≤ i ≤ r + 1; (iii) in any other case, Ξ(C) = C.
 – Ψ maps every Horn clause C to a datalog rule Ψ (C) obtained from C by
   first replacing each functional term with a globally fresh constant, and then
   transforming the resulting clause into its equivalent datalog rule.

These transformations extend to sets in the natural way.

    In our example Kex , the transformation U is the identity for all rules except
T4 –T6 . Rule T4 is transformed by U (Ξ in particular) into the datalog rule U4 .

                   TeamLead(x) ∧ boss(x, y) ∧ Manag(y) → ⊥                      (U4 )
T5 is first transformed by Σ into clauses ¬Manag(x) ∨ Superv(x) ∨ boss(x, f1 (x))
and ¬Manag(x) ∨ Superv(x) ∨ Manag(f1 (x)). These will then be transformed
by Ξ to the clauses ¬Manag(x) ∨ Superv(x), ¬Manag(x) ∨ boss(x, f1 (x)) and
¬Manag(x) ∨ Manag(f1 (x)). Finally, Ψ will produce the following datalog rules.

                               Manag(x) → Superv(x)                              (U51 )
                              Manag(x) → boss(x, c1 )                            (U52 )
                              Manag(x) → Manag(c1 )                              (U53 )

   Rule T6 will be transformed by Σ into the clause ¬Manag(x) ∨ boss(x, f2 (x)),
which in turn will be transformed by Ψ into the datalog rule U6 .

                              Manag(x) → boss(x, c2 )                             (U6 )

Hence, U(K) comprises the facts D1 –D7 and the datalog rules T1 –T3 , U4 , U51 –U53 ,
and U6 . One can easily verify that cert(q ex , U(Kex )) = {Sue, Dan, Rob, Jo}.
   The following lemma captures the properties of these transformations.
Proposition 1. Let K be a knowledge base and q be a query. Then:
1. K unsatisfiable ⇔ Σ(K) unsatisfiable ⇒ Ξ(Σ(K)) |= ⊥ ⇒ U(K) |= ⊥;
2. K satisf. ⇒ cert(q, K) = cert(q, Σ(K)) ⊆ cert(q, Ξ(Σ(K))) ⊆ cert(q, U(K)).


5    Computing Lower Bounds
A direct way to compute lower bound query answers given K and q is to select the
datalog fragment L(K) of K, check its satisfiability, and compute cert(q, L(K))
using a datalog engine. By monotonicity of first-order logic, K entails L(K), and
hence cert(q, K) is guaranteed to subsume cert(q, L(K)). In our running example,
the lower bound knowledge base L(Kex ) comprises the facts D1 –D7 and the
datalog rules T1 –T4 , and it can be easily verified that cert(q ex , L(Kex )) = {Dan}.
     To improve this bound, we adopt the combined approach introduced to han-
dle query answering in ELHOr⊥ [19, 11]. Given an ELHOr⊥ knowledge base K0
and a query q, the combined approach first exploits the upper bound data-
log program U(K0 ) to check satisfiability of K0 and to compute cert(q, U(K0 )).
A subsequent filtering step Φ, which is efficiently implementable, guarantees to
eliminate all spurious tuples; the resulting answer Φ(cert(q, U(K0 ))) is thus sound
and complete w.r.t. q and K0 .
     The combined approach is clearly compatible with ours. Given an OWL 2
knowledge base K and query q, we proceed as follows. First, we select the dat-
alog fragment K1 = L(K), and compute the materialisation M at(K1 ) using the
datalog engine. Second, we select the subset K2 of K corresponding to ELHOr⊥
axioms and Skolemise existential quantifiers to constants to obtain U(K2 ). Then,
we further compute the answers cert(q, U(K2 ) ∪ M at(K1 )). Finally, we apply the
filtering step Φ to obtain the final set of lower bound answers. The ELHOr⊥
fragment for our running example Kex consists of rules T1 –T4 and T6 , and the
resulting new lower bound answer of q ex is the set {Dan, Rob}.
                  Table 1. SLD proofs of ⊥ and q ex (Jo) in U(Kex )

                    ⊥                                b(J, y) ∧ W (y)
         T (x) ∧ b(x, y) ∧ M (y)   by U4             M (J) ∧ W (c2 )     by U6
             b(J, y) ∧ M (y)       by D6                  W (c2 )        by D5
             M (J) ∧ M (c1 )       by U52            S(x) ∧ b(x, c2 )    by T3
                  M (c1 )          by D5             M (x) ∧ b(x, c2 )   by U51
                  M (x)            by U35                b(J, c2 )       by D5
                                  by D5                  M (J)          by U6
                                                                        by D5


6    Computing Relevant Fragments
Fragment Definition and Formal Properties The relevant fragments K⊥
and K[q,G] are defined in terms of SLD proofs in U(K). In particular, K⊥ is
defined in terms of proofs for the nullary predicate ⊥, and K[q,G] is defined in
terms of proofs for each answer in G.
Definition 2. Let K be a knowledge base, q(x) be a query, and S be a set of
tuples. Then K⊥ (resp. K[q,S] ) is the set of all α ∈ K for which there exists
β ∈ U(α) involved in an SLD proof of ⊥ (resp. Q(a), for some a ∈ S) in U(K).
The properties of these fragments needed to ensure the correctness of our algo-
rithm in Section 3 are summarised in the following theorem.
Theorem 1. Let K be a knowledge base, q(x) a conjunctive query, and S a set
of tuples. Then, (i) K is satisfiable iff K⊥ is satisfiable; and (ii) if K is satisfiable,
then K |= q(a) iff K[q,S] ∪ K⊥ |= q(a) for every a ∈ S.
    As expected, K⊥ can be used to determine satisfiability of K. In case K is
found satisfiable, the union of K[q,G] and K⊥ can then be used to check the valid-
ity of each candidate answer in G (in this case, K⊥ is still needed to account for
the possible interactions between non-Horn rules and rules with empty heads).
    Table 1 specifies proofs of ⊥ and q ex (Jo) in U(Kex ), where predicates and
constants are abbreviated to their first letters. By Definition 2, K⊥ ∪ K[qex ,{Jo}]
subsumes {T3 , . . . , T6 , D5 , D6 }, and, hence, it entails q ex (Jo), as expected. Note
that K[q,{Jo}] alone is not sufficient to show q ex (Jo) since every fragment of Kex
that entails q ex (Jo) must include rule T4 . According to Definition 1, K[q,{Jo}]
will include T4 if and only if U4 is used in an SLD proof of q ex (Jo) in U(Kex );
however, no such proof will involve U4 since the goal q ex (Jo) does not involve
⊥, and there is no way of eliminating ⊥ from a goal using the rules in U(Kex )
as they do not contain ⊥ in their bodies.
    The proof of Theorem 1 is involved, and details are deferred to the appendix.
Nonetheless, we next sketch the arguments behind the proof. A first observation
is that, w.l.o.g. we can restrict ourselves to the case where q(x) is atomic.
Lemma 1. Let K be a knowledge base, q(x) = ∃y ϕ(x, y) be a CQ, S be a set
of tuples, Q be a fresh predicate, and let K0 = K[q,S] ∪ K⊥ . Then, K0 |= q(a) iff
K0 ∪ {ϕ(x, y) → Q(x)} |= Q(a).
    The crux of the proof relies on the following properties of Ξ (the step in the
definition of U which splits each non-Horn clause C into Horn clauses).
Lemma 2. Let N be a set of first-order clauses. Then:
 – if C ∈ N participates in a refutation in N , then every C 0 ∈ Ξ(C) is part of
   an SLD proof of ⊥ in Ξ(N );
 – if C ∈ N participates in a resolution proof in N of an atomic query Q(a),
   then each C 0 ∈ Ξ(C) participates in an SLD proof of ⊥ or Q(a) in Ξ(N ).
    Thus, by Lemma 2, each resolution proof in a set of clauses N can be mapped
to SLD proofs in Ξ(N ) that “preserves” the participating clauses. The following
lemma allows us to restate Lemma 2 for Ψ ◦ Ξ instead of Ξ.
Lemma 3. Let H be a set of first-order Horn clauses, Q(x) be an atomic query,
and a be a tuple of constants. If a clause C participates in an SLD proof of Q(a)
in H, then Ψ (C) participates in an SLD proof of Q(a) in Ψ (H).
   With these Lemmas, we can exploit refutational completeness of resolution
and the entailment preservation properties of Skolemisation to show Theorem 1.

Fragment Computation The computation of the relevant fragments requires
a scalable algorithm for “tracking” all rules and facts involved in SLD proofs for
datalog programs. We next present a novel technique that delegates this task to
the datalog engine itself. The main idea is to extend the datalog program with
additional rules that are responsible for the tracking; in this way, the relevant
rules and facts can be obtained from the materialisation of the modified program.
Definition 3. Let K be a datalog KB and let F be a set of facts in M at(K).
Then, ∆(K, F ) is the datalog program containing the rules and facts given next:
 – each rule and fact in K;
 – a fact P̄ (a) for each fact P (a) in F ;
 – the following rules for each r ∈ K of the form B1 (x1 ), . . . , Bm (xm ) → H(x),
   and 1 ≤ i ≤ m, with cr a fresh constant for each r, and S a fresh predicate:
                        H̄(x) ∧ B1 (x1 ) ∧ . . . , Bm (xm ) → S(cr )              (1)
                       H̄(x) ∧ B1 (x1 ), . . . ∧ Bm (xm ) → B̄i (xi )             (2)
The auxiliary predicates P̄ are used to record facts involved in proofs; in particu-
lar, if P̄ (c) is contained in M at(∆(K, F )), we can conclude that P (c) participates
in an SLD proof in K of a fact in F . Furthermore, each rule r ∈ K is represented
by a fresh constant cr , and S is a fresh predicate that is used to record rules
of K involved in proofs. In particular, if S(cr ) is contained in M at(∆(K, F )),
we can conclude that rule r participates in an SLD proof in K of a fact in F .
The additional rules (1) and (2) are responsible for the tracking and make sure
that the materialisation of ∆(K, F ) contains the required information. Indeed,
if there is an instantiation B1 (a1 ) ∧ . . . ∧ Bm (am ) → H(a) of a rule r ∈ ∆, then,
by virtue of (1), cr will be added to S, and, by virtue of (2), each B̄i (ai ), for
1 ≤ i ≤ m, will be derived. Correctness is established as follows.
Theorem 2. Let K be a datalog knowledge base and let F be a set of facts in
M at(K). Then, a fact P (a) (resp. a rule r) in K participates in an SLD proof
of some fact in F iff P̄ (a) (resp. S(cr )) is in M at(∆(K, F )).


7   Summarisation and Answer Dependencies

Once the relevant fragment has been computed, we check, using the fully-fledged
reasoner, whether each candidate answer is entailed. This can be computation-
ally expensive if the fragment is large, or there are many candidate answers to
verify. To address these issues, we exploit summarisation techniques [4] to effi-
ciently prune candidate answers. The idea behind summarisation is to “shrink”
the data by merging constants instantiating the same unary predicates. Since
summarisation is equivalent to extending the knowledge base with equality asser-
tions, the summarised knowledge base entails the original one by monotonicity.

Definition 4. Let K be a knowledge base. A type T is a set of unary predicates;
for a constant a in K, we say that T = {A | A(a) ∈ K} is the type for a. Further-
more, for each type T , let cT be a globally fresh constant uniquely associated with
T . The summary function over K is the substitution σ mapping each constant a
in K to cT , where T is the type for a. Finally, the knowledge base σ(K) obtained
by replacing each constant a in K with σ(a) is called the summary of K.
By summarising a knowledge base, we overestimate query answers [4].

Proposition 2. Let K be a knowledge base, and let σ be the summary function
over K. Then, for every query q we have σ(cert(q, K)) ⊆ cert(σ(q), σ(K)).

Summarisation can be exploited to detect spurious answers in G: if a tuple is not
in cert(σ(q), σ(K)), then it is not in cert(q, K). Since summarisation can signifi-
cantly reduce the size of a knowledge base, we can efficiently detect non-answers
even if checking them over the summary requires calling the OWL reasoner.

Corollary 1. Let K be a knowledge base, let q be a query, let S be a set of
tuples, and let K0 = K[q,S] ∪ K⊥ . Furthermore, let σ be the summary function
over K0 . Then, σ(K0 ) 6|= σ(q(a)) implies K 6|= q(a) for each a ∈ S.

   Finally, we try to further reduce the calls to the fully-fledged reasoner by
exploiting dependencies between the candidate answers. Consider tuples a and
b in G and the dataset D in the fragment K[q,G] ∪ K⊥ ; furthermore, suppose
we can find an endomorphism h of D in which h(a) = b. If we can determine
(by calling the fully-fledged reasoner) that b is a spurious answer, then so must
be a; as a result, we no longer call the reasoner to check a. We exploit this
idea to compute a dependency graph having candidate answers as nodes and an
edge (a, b) whenever an endomorphism in D exists mapping a to b. Computing
endomorphisms is computationally hard, so we have implemented a sound (but
incomplete) greedy algorithm that approximates the dependency graph.
                          Data       DL      Axioms      Facts
                        LUBM(n)     SHI            93       105 n
                             −
                       UOBM (n) SHIN              314 2 × 105 n
                          FLY       SRI 144,407            6,308
                       DBPedia+ SHOIN           1,757 12,119,662
                          NPD      SHIF           819 3,817,079
                           Table 2. Statistics for test data


        Strategy      Solved |Univ| tavg.          Strategy    Solved |Univ| tavg.
       RL Bounds        14     1000 18.4         RL Bounds        12    500 0.7
    + EL Lower Bound 22        1000 11.7       + Summarisation 14        60 14.0
       + Sum, Dep       24      100 29.6       + Dependencies     15      1 1.8
         Table 3. Result for LUBM                 Table 4. Result for UOBM−


8      Evaluation
We have implemented a prototype system, called PAGOdA, based on RDFox and
HermiT (v. 1.3.8). For testing, we used the LUBM and UOBM benchmarks, as
well as the Fly Anatomy ontology, DBPedia and NPD FactPages; their key fea-
tures are summarised in Table 2. Our system, test data, ontologies, and queries
are available online.2 We compared our system with Pellet (v. 2.3.1) and TrOWL
[20] on all datasets. While Pellet is sound and complete, TrOWL relies on ap-
proximate reasoning and does not provide correctness guarantees. Tests were
performed on a 16 core 3.30GHz Intel Xeon E5-2643 with 125GB of RAM, and
running Linux 2.6.32. For each test, we measured materialisation times for upper
and lower bound, the time to answer each query, and the number of queries that
can be fully answered using different techniques. All times are in seconds.
    Materialisation is fast on LUBM [7]: it takes 319s (341s) to materialise the
basic lower (upper) bound entailments for LUBM(1000). These bounds match
for all 14 standard LUBM queries, and we have used 10 additional queries for
which this is not the case; we tested our system on all 24 queries (see Table 3 for
a summary of the results). The refined lower bound was materialised in 366s, and
it matches the upper bound for 8 of the 10 additional queries; thus, our system
could answer 22 of the 24 queries over LUBM(1000) efficiently in 12s on average.3
For the remaining 2 queries, we could scale to LUBM(100) in reasonable time.
On LUBM(100) the gaps contain 29 and 14 tuples respectively, none of which
were eliminated by summarisation; however, exploiting dependencies between
gap tuples reduced the calls to HermiT to only 3 and 1 respectively, with the
majority of time taken in extraction (avg. 45s) and HermiT calls (avg. 281s).
On LUBM(1000), Pellet ran out of memory. For LUBM(100), Pellet took on
average 8.2s to answer the standard queries with an initialisation overhead of
388s. TrOWL timed out after 1h on LUBM(100).
2
    http://www.cs.ox.ac.uk/isg/tools/PAGOdA/
3
    Average query answering times are measured after materialisation.
    UOBM is an extension of LUBM [25]. Query answering over UOBM re-
quires equality reasoning (e.g., to deal with cardinality constraints), which is
not natively supported by RDFox,4 so we have used a slightly weakened ontol-
ogy UOBM− for which equality is not required. Materialisation is still fast on
UOBM− (500): it takes 346s (378s) to materialise the basic lower (upper) bound
entailments. We have tested the 15 standard queries (see Table 4). The basic
lower and upper bounds match for 12 queries; our system is efficient for these
queries, with an average query answering time of less than 1s over UOBM− (500).
For 2 of the remaining queries, summarisation prunes all candidate answers.
Average times for these queries were under 15s for UOBM− (60). For the one
remaining query, summarisation rules out 6245 among 6509 answers in the gap,
and the dependency analysis groups all the remaining individuals. HermiT, how-
ever, takes 20s to check the representative answer for UOBM− (1), and 4000s for
UOBM− (10). Pellet times out even on UOBM− (1). TrOWL took 237s on av-
erage to answer 14 out of the 15 queries over UOBM− (60).5 Furthermore, a
comparison with our system reveals that TrOWL answers may be neither sound
nor complete for most test queries.
    Fly Anatomy is a complex ontology, rich in existential axioms, and including
a dataset with over 6,000 facts. We tested it with five queries provided by the
developers the ontology. It took 88s (106s) to materialise lower (upper) bound
entailments. The basic lower bounds for all queries are empty, whereas the refined
lower bounds (which take 185s to materialise) match with the upper bound in
all cases; as a result, we can answer the queries in 0.2s on average. Pellet fails
to answer queries given a 1h timeout, and TrOWL returns only empty answers.
    In contrast to Fly, the DBPedia dataset is relatively large, but the ontology
is simple. To provide a more challenging test, we have used the LogMap ontol-
ogy matching system [9] to extend DBPedia with the tourism ontology which
contains both disjunctive and existential axioms. Since the tested systems report
errors on datatypes, we have removed all axioms and facts involving datatypes.
It takes 45s (47s) to materialise the basic lower (upper) bound entailments. The
upper bound was unsatisfiable and it took 2.6s to check satisfiability of the K⊥
fragment. We queried for instances of all 441 atomic concepts. Bounds matched
in 439 cases (using the refined lower bound), and these queries were answered in
0.3s on average. Summarisation filtered out all gap tuples for the remaining two
queries. The answer time for both queries was less than 3s. Pellet takes 280.9s
to initialise and answers each query in an average time of 16.2s. TrOWL times
out after 1h.
    The NPD FactPages ontology describes petroleum activities on the Nor-
wegian continental shelf. The ontology is not Horn, and it includes existential
axioms. As in the case of DBPedia, we removed axioms involving datatypes. Its
dataset has about 4 million triples; it takes 17s (22s) to materialise the lower
(upper) bound entailments. The upper bound is unsatisfiable, and it took 30s
to check satisfiability of K⊥ . We queried for the instances of the 329 atomic
4
    RDFox supports equality via its axiomatisation as a congruence relation.
5
    An exception is reported for the remaining query.
concepts, and could answer all queries in 2.5s on average. Queries with matching
bounds (294 out of 329) could be answered on 0.1s, and average query answer-
ing time was 3s. TrOWL took 1.3s to answer queries on average; answers were
complete for 320 out of the 329 queries.


9    Related Techniques

The Screech system [21] exploits the KAON2 reasoner [8] to rewrite a SHIQ
ontology into disjunctive datalog while preserving atomic queries, and then trans-
forms ∨ into ∧; the resulting over-approximation can be used to compute up-
per bound query answers. This technique is restricted to SHIQ ontologies and
atomic queries; furthermore, the set of rules obtained from KAON2 can be ex-
pensive to compute, as well as of exponential size. Both the Quill system [17]
and the work of [25] under-approximate the ontology into OWL 2 QL; however,
neither approximation is independent of both query and data, and using OWL 2
QL increases the chances that the approximated ontology will be unsatisfiable.
    The SHER system uses summarisation to efficiently compute an upper bound
answer, with exact answers then being computed via successive relaxations [3,
4]. The technique has been shown to be scalable, but it is only known to be
applicable to SHIN and atomic queries, and is less modular than our approach.
In contrast, our approach can profitably exploit the summarisation technique,
and could even improve scalability for the hardest queries by replacing HermiT
with SHER when the extracted fragment is SHIN .


10    Discussion
We have proposed a novel approach for query answering that integrates scalable
and complete reasoners to provide pay-as-you-go performance. Our evaluation
shows that 772 of the 814 test queries could be answered using highly scalable
lower and upper bound computations, 39 of the remaining 42 queries yielded to
extraction and summarisation techniques, and even for the remaining 3 queries
our fragment extraction and dependency techniques greatly improved scalability.
Our approach is complementary to other optimisation efforts, and could immedi-
ately benefit from alternative techniques for efficiently computing lower bounds
and/or a more efficient OWL reasoner. Our technical results are very general,
and hold for any language L captured by generalised rules.
    There are still many possibilities for future work. For the immediate future,
our main focus will be improving the fragment extraction and checking tech-
niques so as to improve scalability for the hardest queries.
Acknowledgements. This work was supported by the Royal Society, the EP-
SRC projects Score!, Exoda, and MaSI3 , and the FP7 project OPTIQUE.
References

 1. Bishop, B., Kiryakov, A., Ognyanoff, D., Peikov, I., Tashev, Z., Velkov, R.:
    OWLIM: A family of scalable semantic repositories. Semantic Web 2(1), 33–42
    (2011)
 2. Cuenca Grau, B., Motik, B., Stoilos, G., Horrocks, I.: Completeness guarantees for
    incomplete ontology reasoners: Theory and practice. J. Artif. Intell. Res. (JAIR)
    43, 419–476 (2012)
 3. Dolby, J., Fokoue, A., Kalyanpur, A., Kershenbaum, A., Schonberg, E., Srinivas,
    K., Ma, L.: Scalable semantic retrieval through summarization and refinement. In:
    AAAI. pp. 299–304 (2007)
 4. Dolby, J., Fokoue, A., Kalyanpur, A., Schonberg, E., Srinivas, K.: Scalable highly
    expressive reasoner (SHER). J. Web Sem. 7(4), 357–361 (2009)
 5. Eiter, T., Ortiz, M., Simkus, M.: Conjunctive query answering in the description
    logic SH using knots. J. Comput. Syst. Sci. 78(1), 47–85 (2012)
 6. Glimm, B., Lutz, C., Horrocks, I., Sattler, U.: Conjunctive query answering for the
    description logic SHIQ. J. Artif. Intell. Res. (JAIR) 31, 157–204 (2008)
 7. Guo, Y., Pan, Z., Heflin, J.: LUBM: A benchmark for OWL knowledge base sys-
    tems. J. Web Sem. 3(2-3), 158–182 (2005)
 8. Hustadt, U., Motik, B., Sattler, U.: Reasoning in description logics by a reduction
    to disjunctive datalog. J. Autom. Reasoning 39(3), 351–384 (2007)
 9. Jiménez-Ruiz, E., Cuenca Grau, B., Zhou, Y., Horrocks, I.: Large-scale interactive
    ontology matching: Algorithms and implementation. In: ECAI. pp. 444–449 (2012)
10. Kollia, I., Glimm, B.: Optimizing SPARQL query answering over OWL ontologies.
    J. Artif. Intell. Res. (JAIR) 48, 253–303 (2013)
11. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The com-
    bined approach to ontology-based data access. In: IJCAI. pp. 2656–2661 (2011)
12. Möller, R., Neuenstadt, C., Özçep, Ö.L., Wandelt, S.: Advances in accessing big
    data with expressive ontologies. In: Description Logics. pp. 842–853 (2013)
13. Motik, B., Cuenca Grau, B., Horrocks, I., Wu, Z., Fokoue, A., Lutz, C.: OWL
    2 Web Ontology Language Profiles. W3C Recommendation (27 October 2009),
    available at http://www.w3.org/TR/owl2-profiles/
14. Motik, B., Nenov, Y., Piro, R., Horrocks, I., Olteanu, D.: Parallel materialisation
    of datalog programs in main-memory rdf databases. In: AAAI (2014)
15. Motik, B., Patel-Schneider, P.F., Parsia, B.: OWL 2 Web Ontology Language
    Structural Specification and Functional-style Syntax. W3C Recommendation (27
    October 2009 2009), available at http://www.w3.org/TR/owl2-syntax/
16. Motik, B., Shearer, R., Horrocks, I.: Hypertableau reasoning for description logics.
    J. Artif. Intell. Res. (JAIR) 36, 165–228 (2009)
17. Pan, J.Z., Thomas, E., Zhao, Y.: Completeness guaranteed approximations for
    OWL-DL query answering. In: Description Logics (2009)
18. Sirin, E., Parsia, B., Cuenca Grau, B., Kalyanpur, A., Katz, Y.: Pellet: A practical
    OWL-DL reasoner. J. Web Sem. 5(2), 51–53 (2007)
19. Stefanoni, G., Motik, B., Horrocks, I.: Introducing nominals to the combined query
    answering approaches for EL. In: AAAI (2013)
20. Thomas, E., Pan, J.Z., Ren, Y.: TrOWL: Tractable OWL 2 reasoning infrastruc-
    ture. In: ESWC (2). pp. 431–435 (2010)
21. Tserendorj, T., Rudolph, S., Krötzsch, M., Hitzler, P.: Approximate OWL-
    reasoning with Screech. In: RR. pp. 165–180 (2008)
22. Urbani, J., van Harmelen, F., Schlobach, S., Bal, H.E.: QueryPIE: Backward rea-
    soning for OWL Horst over very large knowledge bases. In: International Semantic
    Web Conference (1). pp. 730–745 (2011)
23. Urbani, J., Kotoulas, S., Maassen, J., van Harmelen, F., Bal, H.E.: WebPIE: A web-
    scale parallel inference engine using MapReduce. J. Web Sem. 10, 59–75 (2012)
24. W3C SPARQL Working Group: SPARQL 1.1 Overview. W3C Recommendation
    (21 March 2013), available at http://www.w3.org/TR/sparql11-overview/
25. Wandelt, S., Möller, R., Wessel, M.: Towards scalable instance retrieval over on-
    tologies. Int. J. Software and Informatics 4(3), 201–218 (2010)
26. Wu, Z., Eadon, G., Das, S., Chong, E.I., Kolovski, V., Annamalai, M., Srinivasan,
    J.: Implementing an inference engine for RDFS/OWL constructs and user-defined
    rules in oracle. In: ICDE. pp. 1239–1248 (2008)
27. Zhou, Y., Cuenca Grau, B., Horrocks, I., Wu, Z., Banerjee, J.: Making the most
    of your triple store: query answering in OWL 2 using an RL reasoner. In: WWW.
    pp. 1569–1580 (2013)
28. Zhou, Y., Nenov, Y., Cuenca Grau, B., Horrocks, I.: Complete query answering
    over Horn ontologies using a triple store. In: ISWC (1). pp. 720–736 (2013)
29. Zhou, Y., Nenov, Y., Cuenca Grau, B., Horrocks, I.: Pay-as-you-go OWL query
    answering using a triple store. In: AAAI (2014)