Pay-as-you-go Ontology Query Answering Using a Datalog Reasoner? Yujiao Zhou, Yavor Nenov, Bernardo Cuenca Grau, and Ian Horrocks Department of Computer Science, University of Oxford, UK Abstract. We describe a hybrid approach to conjunctive query answer- ing over OWL 2 ontologies that combines a datalog reasoner with a fully-fledged OWL 2 reasoner in order to provide scalable “pay as you go” performance. Our approach delegates the bulk of the computation to the highly scalable datalog engine and resorts to expensive OWL 2 reasoning only as necessary to fully answer the query. We have imple- mented a prototype system that uses RDFox as a datalog reasoner, and HermiT as an OWL 2 reasoner. Our evaluation over both benchmark and realistic ontologies and datasets suggests the feasibility of our approach. 1 Introduction The use of RDF [14], OWL 2 [15], and SPARQL 1.1 [24] to represent and query semi-structured data together with domain knowledge is increasingly widespread. Query answering in this setting is, however, of high worst-case complexity [6, 5], and although heavily optimised, existing systems for query answering w.r.t. RDF data and an unrestricted OWL 2 ontology can process only small to medium size datasets [12, 25, 10]. This has led to the development of query answering proce- dures that are more scalable, but that can (fully) process only fragments of OWL 2, and several prominent fragments have now been standardised as OWL 2 pro- files [13]. Such systems have been shown to be (potentially) highly scalable [23, 1, 22, 26], but if the ontology falls outside the relevant profile, then the answers computed by such a system may be incomplete: if it returns an answer, then all tuples in the answer are (usually) valid, but some valid tuples may be missing from the answer. When used with out-of-profile ontologies, a query answer com- puted by such a system can thus be understood as providing a lower-bound on the correct answer; however, they cannot in general provide any upper bound or even any indication as to how complete the computed answer is [2]. In this paper, we describe a hybrid approach to query answering that exploits a datalog reasoner to compute both a lower bound answer and an upper bound answer. If lower and upper bound answers coincide, they obviously provide a sound and complete answer. Otherwise, relevant fragments of the ontology and data can be extracted that are guaranteed to be sufficient to test the validity of tuples in the “gap” between the two answers. These fragments can also be ? This paper recapitulates some of our results in [27–29], and it is accompanied with a technical report available at http://www.cs.ox.ac.uk/isg/people/yujiao.zhou. computed by relying solely on the datalog reasoner, and are typically much smaller than the input ontology and data. The remaining gap tuples need to be checked w.r.t. to the identified fragments using an OWL 2 reasoner such as HermiT [16] or Pellet [18]; furthermore, since the number of gap tuples can be significant in some cases, we exploit summarisation techniques inspired by the SHER system [3, 4] to quickly identify spurious gap answers, thus further reducing the requirement for fully-fledged OWL 2 reasoning. Our approach is pay-as-you-go in the sense that the bulk of the computation is delegated to a scalable datalog engine. Furthermore, although our main goal is to answer queries over OWL 2 ontologies efficiently, our technical results are very general and our approach is not restricted to DLs. More precisely, given a first-order KR language L that can be captured by rules allowing for existential quantification and disjunction in the head, and over which we want to answer conjunctive queries, our only assumption is the availability of a fully-fledged reasoner for L and a datalog reasoner, which are both used as a “black box”. We have implemented our techniques in a prototypical system using the RDFox as a datalog reasoner [23] and the HermiT as a fully-fledged OWL 2 reasoner.1 Our preliminary evaluation over both benchmark and realistic data suggests that the system can provide scalable pay-as-you-go query answering for a wide range of OWL 2 ontologies, RDF data and queries. In almost all cases, the system is able to completely answer queries without resorting to fully- fledged OWL 2 reasoning, and even when this is not the case, relevant fragment extraction and summarisation are effective in reducing the size of the problem to manageable proportions. 2 Preliminaries We adopt standard first order logic notions, such as variables, constants, atoms, formulas, clauses, substitutions, satisfiability, and entailment. We also assume basic familarity with OWL 2 [15] and its profiles [13]. A generalised rule (or just a rule) is a function-free sentence of the form n ^ m _ ∀x ( Bj (x) → ∃yi ϕi (x, yi )) j=0 i=0 where Bj (x) are body atoms and ϕi are conjunctions of head atoms. The universal quantifiers are left implicit from now on. A rule is Horn if m ≤ 1, and it is datalog if it is Horn and does not contain existential quantifiers. A fact is a ground atom and a dataset is a finite set of facts. A knowledge base K consists of a finite set of rules and a dataset. We treat equality (≈) as an ordinary predicate, but assume that every knowledge base in which equality occurs contains the axioms of equality for its signature. Each OWL 2 ontology can be normalised as one 1 Although our techniques are proved correct for general conjunctive queries, in prac- tice we are limited by the current query capabilities of OWL 2 reasoners. Foreman(x) → Manag(x) (T1 ) Superv(x) → Manag(x) (T2 ) Superv(x) ∧ boss(x, y) → Workman(y) (T3 ) TeamLead(x) ∧ boss(x, y) ∧ Manag(y) → (T4 ) Manag(x) → Superv(x) ∨ ∃y.(boss(x, y) ∧ Manag(y)) (T5 ) Manag(x) → ∃y.(boss(x, y)) (T6 ) Manag(Sue) (D1 ) Superv(Rob) (D3 ) Manag(Jo) (D5 ) Superv(Dan) (D2 ) boss(Dan, Ben) (D4 ) TeamLead(Jo) (D6 ) boss(Jane, Rob) (D7 ) Fig. 1. Example knowledge base Kex . such knowledge base using the correspondence of OWL and first order logic and a variant of the structural transformation (e.g., see [16] for detais). We focus on CQ answering as the key reasoning problem. A query is a formula q(x) = ∃y ϕ(x, y) with ϕ(x, y) a conjunction of atoms. We usually omit the free variables x of queries and write just q. The query is atomic if ϕ(x, y) is a single atom. A tuple of individuals a is a (certain) answer to q w.r.t. a set of sentences F iff F |= q(a). The set of all answers to q(x) w.r.t. F is denoted by cert(q, F). There are two main techniques for answering queries over a datalog knowledge base K. Forward chaining computes the set M at(K) of ground atoms entailed by K, called the materialisation of K. A query q over K can be answered directly over the materialisation. Backward chaining treats a query as a conjunction of atoms (a goal ). An SLD resolvent of a goal A ∧ ψ with a datalog rule ϕ → C1 ∧ · · · ∧ Cn is a goal ψθ ∧ ϕθ, with θ the MGU of A and Cj , for some 1 ≤ j ≤ n. An SLD proof of a goal G0 in K is a sequence of goals (G0 , . . . , Gn ) with Gn the empty goal (), and each Gi+1 a resolvent of Gi and a rule in K. 3 Overview The main idea behind our approach to query answering is to delegate the bulk of the computational workload to a highly scalable datalog reasoner, thus min- imising the use of a fully-fledged OWL 2 reasoner. Given a knowledge base K and a query q, we proceed according to the following algorithm: 1. Use the datalog reasoner to compute both lower bound (sound but possibly incomplete) and upper bound (complete but possibly unsound) answers to the (Boolean) unsatisfiability query and the input q. (See Sections 4, 5). 2. If both bounds report unsatisfiability, then we return unsatisfiable. If none of them reports unsatisfiability and they yield the same answers to q, we output the resulting answers. In any other case, proceed to the next step. 3. Use the datalog reasoner to compute fragments K⊥ and K[q,G] of K, where G is the set of answers to q in the gap between the bounds. (See Section 6). 4. If the upper bound reports unsatisfiability and K⊥ is unsatisfiable, then return unsatisfiable. 5. Use the OWL reasoner to check whether K[q,G] ∪ K⊥ |= q(a), for each a ∈ G. To minimise the computational workload of the OWL reasoner, this step is carried out as follows (see Section 7): (a) Summarise K[q,G] ∪K⊥ by merging all constants that instantiate the same unary predicates [4]. Use the OWL reasoner to discard those a ∈ G such that q(a) is not entailed by the summarised KB. (b) Compute a dependency relation between the remaining elements of G such that if b depends on a and a is a spurious answer, then so is b. Arrange the calls to the reasoners according to these dependencies. 6. Return the lower bound answers to q plus those tuples in G determined to be answers in Step 5. We will describe each of these steps and illustrate them using as running example the knowledge base Kex in Figure 1 and the following query q ex : q ex (x) = ∃y(boss(x, y) ∧ Workman(y)) 4 Computing Upper Bounds To compute upper bound query answers, we first compute a datalog knowl- edge base U(K) that entails the nullary predicate ⊥, if K is unsatisfiable, and that entails K, otherwise. Hence, for satisfiable knowledge bases K we get that cert(q, U(K)) subsumes cert(q, K). The knowledge base U(K) is the result of con- secutively applying the transformations Σ, Ξ and Ψ defined next. Definition 1. Let K be a KB. We define U(K) := Ψ ◦ Ξ ◦ Σ(K), where – Σ is a mapping that transforms each rule into clausal normal form; – Ξ maps each clause C to a set of clauses as follows: (i) if C contains only negative literals, then Ξ(C) = C ∨ ⊥; (ii) if C is of the form ¬B0 ∨ · · · ∨ ¬Bk ∨ C0 ∨ · · · ∨ Cr+1 then Ξ(C) consists of the clauses ¬B1 ∨ · · · ∨ ¬Bk ∨ Ci , for 0 ≤ i ≤ r + 1; (iii) in any other case, Ξ(C) = C. – Ψ maps every Horn clause C to a datalog rule Ψ (C) obtained from C by first replacing each functional term with a globally fresh constant, and then transforming the resulting clause into its equivalent datalog rule. These transformations extend to sets in the natural way. In our example Kex , the transformation U is the identity for all rules except T4 –T6 . Rule T4 is transformed by U (Ξ in particular) into the datalog rule U4 . TeamLead(x) ∧ boss(x, y) ∧ Manag(y) → ⊥ (U4 ) T5 is first transformed by Σ into clauses ¬Manag(x) ∨ Superv(x) ∨ boss(x, f1 (x)) and ¬Manag(x) ∨ Superv(x) ∨ Manag(f1 (x)). These will then be transformed by Ξ to the clauses ¬Manag(x) ∨ Superv(x), ¬Manag(x) ∨ boss(x, f1 (x)) and ¬Manag(x) ∨ Manag(f1 (x)). Finally, Ψ will produce the following datalog rules. Manag(x) → Superv(x) (U51 ) Manag(x) → boss(x, c1 ) (U52 ) Manag(x) → Manag(c1 ) (U53 ) Rule T6 will be transformed by Σ into the clause ¬Manag(x) ∨ boss(x, f2 (x)), which in turn will be transformed by Ψ into the datalog rule U6 . Manag(x) → boss(x, c2 ) (U6 ) Hence, U(K) comprises the facts D1 –D7 and the datalog rules T1 –T3 , U4 , U51 –U53 , and U6 . One can easily verify that cert(q ex , U(Kex )) = {Sue, Dan, Rob, Jo}. The following lemma captures the properties of these transformations. Proposition 1. Let K be a knowledge base and q be a query. Then: 1. K unsatisfiable ⇔ Σ(K) unsatisfiable ⇒ Ξ(Σ(K)) |= ⊥ ⇒ U(K) |= ⊥; 2. K satisf. ⇒ cert(q, K) = cert(q, Σ(K)) ⊆ cert(q, Ξ(Σ(K))) ⊆ cert(q, U(K)). 5 Computing Lower Bounds A direct way to compute lower bound query answers given K and q is to select the datalog fragment L(K) of K, check its satisfiability, and compute cert(q, L(K)) using a datalog engine. By monotonicity of first-order logic, K entails L(K), and hence cert(q, K) is guaranteed to subsume cert(q, L(K)). In our running example, the lower bound knowledge base L(Kex ) comprises the facts D1 –D7 and the datalog rules T1 –T4 , and it can be easily verified that cert(q ex , L(Kex )) = {Dan}. To improve this bound, we adopt the combined approach introduced to han- dle query answering in ELHOr⊥ [19, 11]. Given an ELHOr⊥ knowledge base K0 and a query q, the combined approach first exploits the upper bound data- log program U(K0 ) to check satisfiability of K0 and to compute cert(q, U(K0 )). A subsequent filtering step Φ, which is efficiently implementable, guarantees to eliminate all spurious tuples; the resulting answer Φ(cert(q, U(K0 ))) is thus sound and complete w.r.t. q and K0 . The combined approach is clearly compatible with ours. Given an OWL 2 knowledge base K and query q, we proceed as follows. First, we select the dat- alog fragment K1 = L(K), and compute the materialisation M at(K1 ) using the datalog engine. Second, we select the subset K2 of K corresponding to ELHOr⊥ axioms and Skolemise existential quantifiers to constants to obtain U(K2 ). Then, we further compute the answers cert(q, U(K2 ) ∪ M at(K1 )). Finally, we apply the filtering step Φ to obtain the final set of lower bound answers. The ELHOr⊥ fragment for our running example Kex consists of rules T1 –T4 and T6 , and the resulting new lower bound answer of q ex is the set {Dan, Rob}. Table 1. SLD proofs of ⊥ and q ex (Jo) in U(Kex ) ⊥ b(J, y) ∧ W (y) T (x) ∧ b(x, y) ∧ M (y) by U4 M (J) ∧ W (c2 ) by U6 b(J, y) ∧ M (y) by D6 W (c2 ) by D5 M (J) ∧ M (c1 ) by U52 S(x) ∧ b(x, c2 ) by T3 M (c1 ) by D5 M (x) ∧ b(x, c2 ) by U51 M (x) by U35 b(J, c2 ) by D5  by D5 M (J) by U6  by D5 6 Computing Relevant Fragments Fragment Definition and Formal Properties The relevant fragments K⊥ and K[q,G] are defined in terms of SLD proofs in U(K). In particular, K⊥ is defined in terms of proofs for the nullary predicate ⊥, and K[q,G] is defined in terms of proofs for each answer in G. Definition 2. Let K be a knowledge base, q(x) be a query, and S be a set of tuples. Then K⊥ (resp. K[q,S] ) is the set of all α ∈ K for which there exists β ∈ U(α) involved in an SLD proof of ⊥ (resp. Q(a), for some a ∈ S) in U(K). The properties of these fragments needed to ensure the correctness of our algo- rithm in Section 3 are summarised in the following theorem. Theorem 1. Let K be a knowledge base, q(x) a conjunctive query, and S a set of tuples. Then, (i) K is satisfiable iff K⊥ is satisfiable; and (ii) if K is satisfiable, then K |= q(a) iff K[q,S] ∪ K⊥ |= q(a) for every a ∈ S. As expected, K⊥ can be used to determine satisfiability of K. In case K is found satisfiable, the union of K[q,G] and K⊥ can then be used to check the valid- ity of each candidate answer in G (in this case, K⊥ is still needed to account for the possible interactions between non-Horn rules and rules with empty heads). Table 1 specifies proofs of ⊥ and q ex (Jo) in U(Kex ), where predicates and constants are abbreviated to their first letters. By Definition 2, K⊥ ∪ K[qex ,{Jo}] subsumes {T3 , . . . , T6 , D5 , D6 }, and, hence, it entails q ex (Jo), as expected. Note that K[q,{Jo}] alone is not sufficient to show q ex (Jo) since every fragment of Kex that entails q ex (Jo) must include rule T4 . According to Definition 1, K[q,{Jo}] will include T4 if and only if U4 is used in an SLD proof of q ex (Jo) in U(Kex ); however, no such proof will involve U4 since the goal q ex (Jo) does not involve ⊥, and there is no way of eliminating ⊥ from a goal using the rules in U(Kex ) as they do not contain ⊥ in their bodies. The proof of Theorem 1 is involved, and details are deferred to the appendix. Nonetheless, we next sketch the arguments behind the proof. A first observation is that, w.l.o.g. we can restrict ourselves to the case where q(x) is atomic. Lemma 1. Let K be a knowledge base, q(x) = ∃y ϕ(x, y) be a CQ, S be a set of tuples, Q be a fresh predicate, and let K0 = K[q,S] ∪ K⊥ . Then, K0 |= q(a) iff K0 ∪ {ϕ(x, y) → Q(x)} |= Q(a). The crux of the proof relies on the following properties of Ξ (the step in the definition of U which splits each non-Horn clause C into Horn clauses). Lemma 2. Let N be a set of first-order clauses. Then: – if C ∈ N participates in a refutation in N , then every C 0 ∈ Ξ(C) is part of an SLD proof of ⊥ in Ξ(N ); – if C ∈ N participates in a resolution proof in N of an atomic query Q(a), then each C 0 ∈ Ξ(C) participates in an SLD proof of ⊥ or Q(a) in Ξ(N ). Thus, by Lemma 2, each resolution proof in a set of clauses N can be mapped to SLD proofs in Ξ(N ) that “preserves” the participating clauses. The following lemma allows us to restate Lemma 2 for Ψ ◦ Ξ instead of Ξ. Lemma 3. Let H be a set of first-order Horn clauses, Q(x) be an atomic query, and a be a tuple of constants. If a clause C participates in an SLD proof of Q(a) in H, then Ψ (C) participates in an SLD proof of Q(a) in Ψ (H). With these Lemmas, we can exploit refutational completeness of resolution and the entailment preservation properties of Skolemisation to show Theorem 1. Fragment Computation The computation of the relevant fragments requires a scalable algorithm for “tracking” all rules and facts involved in SLD proofs for datalog programs. We next present a novel technique that delegates this task to the datalog engine itself. The main idea is to extend the datalog program with additional rules that are responsible for the tracking; in this way, the relevant rules and facts can be obtained from the materialisation of the modified program. Definition 3. Let K be a datalog KB and let F be a set of facts in M at(K). Then, ∆(K, F ) is the datalog program containing the rules and facts given next: – each rule and fact in K; – a fact P̄ (a) for each fact P (a) in F ; – the following rules for each r ∈ K of the form B1 (x1 ), . . . , Bm (xm ) → H(x), and 1 ≤ i ≤ m, with cr a fresh constant for each r, and S a fresh predicate: H̄(x) ∧ B1 (x1 ) ∧ . . . , Bm (xm ) → S(cr ) (1) H̄(x) ∧ B1 (x1 ), . . . ∧ Bm (xm ) → B̄i (xi ) (2) The auxiliary predicates P̄ are used to record facts involved in proofs; in particu- lar, if P̄ (c) is contained in M at(∆(K, F )), we can conclude that P (c) participates in an SLD proof in K of a fact in F . Furthermore, each rule r ∈ K is represented by a fresh constant cr , and S is a fresh predicate that is used to record rules of K involved in proofs. In particular, if S(cr ) is contained in M at(∆(K, F )), we can conclude that rule r participates in an SLD proof in K of a fact in F . The additional rules (1) and (2) are responsible for the tracking and make sure that the materialisation of ∆(K, F ) contains the required information. Indeed, if there is an instantiation B1 (a1 ) ∧ . . . ∧ Bm (am ) → H(a) of a rule r ∈ ∆, then, by virtue of (1), cr will be added to S, and, by virtue of (2), each B̄i (ai ), for 1 ≤ i ≤ m, will be derived. Correctness is established as follows. Theorem 2. Let K be a datalog knowledge base and let F be a set of facts in M at(K). Then, a fact P (a) (resp. a rule r) in K participates in an SLD proof of some fact in F iff P̄ (a) (resp. S(cr )) is in M at(∆(K, F )). 7 Summarisation and Answer Dependencies Once the relevant fragment has been computed, we check, using the fully-fledged reasoner, whether each candidate answer is entailed. This can be computation- ally expensive if the fragment is large, or there are many candidate answers to verify. To address these issues, we exploit summarisation techniques [4] to effi- ciently prune candidate answers. The idea behind summarisation is to “shrink” the data by merging constants instantiating the same unary predicates. Since summarisation is equivalent to extending the knowledge base with equality asser- tions, the summarised knowledge base entails the original one by monotonicity. Definition 4. Let K be a knowledge base. A type T is a set of unary predicates; for a constant a in K, we say that T = {A | A(a) ∈ K} is the type for a. Further- more, for each type T , let cT be a globally fresh constant uniquely associated with T . The summary function over K is the substitution σ mapping each constant a in K to cT , where T is the type for a. Finally, the knowledge base σ(K) obtained by replacing each constant a in K with σ(a) is called the summary of K. By summarising a knowledge base, we overestimate query answers [4]. Proposition 2. Let K be a knowledge base, and let σ be the summary function over K. Then, for every query q we have σ(cert(q, K)) ⊆ cert(σ(q), σ(K)). Summarisation can be exploited to detect spurious answers in G: if a tuple is not in cert(σ(q), σ(K)), then it is not in cert(q, K). Since summarisation can signifi- cantly reduce the size of a knowledge base, we can efficiently detect non-answers even if checking them over the summary requires calling the OWL reasoner. Corollary 1. Let K be a knowledge base, let q be a query, let S be a set of tuples, and let K0 = K[q,S] ∪ K⊥ . Furthermore, let σ be the summary function over K0 . Then, σ(K0 ) 6|= σ(q(a)) implies K 6|= q(a) for each a ∈ S. Finally, we try to further reduce the calls to the fully-fledged reasoner by exploiting dependencies between the candidate answers. Consider tuples a and b in G and the dataset D in the fragment K[q,G] ∪ K⊥ ; furthermore, suppose we can find an endomorphism h of D in which h(a) = b. If we can determine (by calling the fully-fledged reasoner) that b is a spurious answer, then so must be a; as a result, we no longer call the reasoner to check a. We exploit this idea to compute a dependency graph having candidate answers as nodes and an edge (a, b) whenever an endomorphism in D exists mapping a to b. Computing endomorphisms is computationally hard, so we have implemented a sound (but incomplete) greedy algorithm that approximates the dependency graph. Data DL Axioms Facts LUBM(n) SHI 93 105 n − UOBM (n) SHIN 314 2 × 105 n FLY SRI 144,407 6,308 DBPedia+ SHOIN 1,757 12,119,662 NPD SHIF 819 3,817,079 Table 2. Statistics for test data Strategy Solved |Univ| tavg. Strategy Solved |Univ| tavg. RL Bounds 14 1000 18.4 RL Bounds 12 500 0.7 + EL Lower Bound 22 1000 11.7 + Summarisation 14 60 14.0 + Sum, Dep 24 100 29.6 + Dependencies 15 1 1.8 Table 3. Result for LUBM Table 4. Result for UOBM− 8 Evaluation We have implemented a prototype system, called PAGOdA, based on RDFox and HermiT (v. 1.3.8). For testing, we used the LUBM and UOBM benchmarks, as well as the Fly Anatomy ontology, DBPedia and NPD FactPages; their key fea- tures are summarised in Table 2. Our system, test data, ontologies, and queries are available online.2 We compared our system with Pellet (v. 2.3.1) and TrOWL [20] on all datasets. While Pellet is sound and complete, TrOWL relies on ap- proximate reasoning and does not provide correctness guarantees. Tests were performed on a 16 core 3.30GHz Intel Xeon E5-2643 with 125GB of RAM, and running Linux 2.6.32. For each test, we measured materialisation times for upper and lower bound, the time to answer each query, and the number of queries that can be fully answered using different techniques. All times are in seconds. Materialisation is fast on LUBM [7]: it takes 319s (341s) to materialise the basic lower (upper) bound entailments for LUBM(1000). These bounds match for all 14 standard LUBM queries, and we have used 10 additional queries for which this is not the case; we tested our system on all 24 queries (see Table 3 for a summary of the results). The refined lower bound was materialised in 366s, and it matches the upper bound for 8 of the 10 additional queries; thus, our system could answer 22 of the 24 queries over LUBM(1000) efficiently in 12s on average.3 For the remaining 2 queries, we could scale to LUBM(100) in reasonable time. On LUBM(100) the gaps contain 29 and 14 tuples respectively, none of which were eliminated by summarisation; however, exploiting dependencies between gap tuples reduced the calls to HermiT to only 3 and 1 respectively, with the majority of time taken in extraction (avg. 45s) and HermiT calls (avg. 281s). On LUBM(1000), Pellet ran out of memory. For LUBM(100), Pellet took on average 8.2s to answer the standard queries with an initialisation overhead of 388s. TrOWL timed out after 1h on LUBM(100). 2 http://www.cs.ox.ac.uk/isg/tools/PAGOdA/ 3 Average query answering times are measured after materialisation. UOBM is an extension of LUBM [25]. Query answering over UOBM re- quires equality reasoning (e.g., to deal with cardinality constraints), which is not natively supported by RDFox,4 so we have used a slightly weakened ontol- ogy UOBM− for which equality is not required. Materialisation is still fast on UOBM− (500): it takes 346s (378s) to materialise the basic lower (upper) bound entailments. We have tested the 15 standard queries (see Table 4). The basic lower and upper bounds match for 12 queries; our system is efficient for these queries, with an average query answering time of less than 1s over UOBM− (500). For 2 of the remaining queries, summarisation prunes all candidate answers. Average times for these queries were under 15s for UOBM− (60). For the one remaining query, summarisation rules out 6245 among 6509 answers in the gap, and the dependency analysis groups all the remaining individuals. HermiT, how- ever, takes 20s to check the representative answer for UOBM− (1), and 4000s for UOBM− (10). Pellet times out even on UOBM− (1). TrOWL took 237s on av- erage to answer 14 out of the 15 queries over UOBM− (60).5 Furthermore, a comparison with our system reveals that TrOWL answers may be neither sound nor complete for most test queries. Fly Anatomy is a complex ontology, rich in existential axioms, and including a dataset with over 6,000 facts. We tested it with five queries provided by the developers the ontology. It took 88s (106s) to materialise lower (upper) bound entailments. The basic lower bounds for all queries are empty, whereas the refined lower bounds (which take 185s to materialise) match with the upper bound in all cases; as a result, we can answer the queries in 0.2s on average. Pellet fails to answer queries given a 1h timeout, and TrOWL returns only empty answers. In contrast to Fly, the DBPedia dataset is relatively large, but the ontology is simple. To provide a more challenging test, we have used the LogMap ontol- ogy matching system [9] to extend DBPedia with the tourism ontology which contains both disjunctive and existential axioms. Since the tested systems report errors on datatypes, we have removed all axioms and facts involving datatypes. It takes 45s (47s) to materialise the basic lower (upper) bound entailments. The upper bound was unsatisfiable and it took 2.6s to check satisfiability of the K⊥ fragment. We queried for instances of all 441 atomic concepts. Bounds matched in 439 cases (using the refined lower bound), and these queries were answered in 0.3s on average. Summarisation filtered out all gap tuples for the remaining two queries. The answer time for both queries was less than 3s. Pellet takes 280.9s to initialise and answers each query in an average time of 16.2s. TrOWL times out after 1h. The NPD FactPages ontology describes petroleum activities on the Nor- wegian continental shelf. The ontology is not Horn, and it includes existential axioms. As in the case of DBPedia, we removed axioms involving datatypes. Its dataset has about 4 million triples; it takes 17s (22s) to materialise the lower (upper) bound entailments. The upper bound is unsatisfiable, and it took 30s to check satisfiability of K⊥ . We queried for the instances of the 329 atomic 4 RDFox supports equality via its axiomatisation as a congruence relation. 5 An exception is reported for the remaining query. concepts, and could answer all queries in 2.5s on average. Queries with matching bounds (294 out of 329) could be answered on 0.1s, and average query answer- ing time was 3s. TrOWL took 1.3s to answer queries on average; answers were complete for 320 out of the 329 queries. 9 Related Techniques The Screech system [21] exploits the KAON2 reasoner [8] to rewrite a SHIQ ontology into disjunctive datalog while preserving atomic queries, and then trans- forms ∨ into ∧; the resulting over-approximation can be used to compute up- per bound query answers. This technique is restricted to SHIQ ontologies and atomic queries; furthermore, the set of rules obtained from KAON2 can be ex- pensive to compute, as well as of exponential size. Both the Quill system [17] and the work of [25] under-approximate the ontology into OWL 2 QL; however, neither approximation is independent of both query and data, and using OWL 2 QL increases the chances that the approximated ontology will be unsatisfiable. The SHER system uses summarisation to efficiently compute an upper bound answer, with exact answers then being computed via successive relaxations [3, 4]. The technique has been shown to be scalable, but it is only known to be applicable to SHIN and atomic queries, and is less modular than our approach. In contrast, our approach can profitably exploit the summarisation technique, and could even improve scalability for the hardest queries by replacing HermiT with SHER when the extracted fragment is SHIN . 10 Discussion We have proposed a novel approach for query answering that integrates scalable and complete reasoners to provide pay-as-you-go performance. Our evaluation shows that 772 of the 814 test queries could be answered using highly scalable lower and upper bound computations, 39 of the remaining 42 queries yielded to extraction and summarisation techniques, and even for the remaining 3 queries our fragment extraction and dependency techniques greatly improved scalability. Our approach is complementary to other optimisation efforts, and could immedi- ately benefit from alternative techniques for efficiently computing lower bounds and/or a more efficient OWL reasoner. Our technical results are very general, and hold for any language L captured by generalised rules. There are still many possibilities for future work. For the immediate future, our main focus will be improving the fragment extraction and checking tech- niques so as to improve scalability for the hardest queries. Acknowledgements. This work was supported by the Royal Society, the EP- SRC projects Score!, Exoda, and MaSI3 , and the FP7 project OPTIQUE. References 1. Bishop, B., Kiryakov, A., Ognyanoff, D., Peikov, I., Tashev, Z., Velkov, R.: OWLIM: A family of scalable semantic repositories. Semantic Web 2(1), 33–42 (2011) 2. Cuenca Grau, B., Motik, B., Stoilos, G., Horrocks, I.: Completeness guarantees for incomplete ontology reasoners: Theory and practice. J. Artif. Intell. Res. (JAIR) 43, 419–476 (2012) 3. Dolby, J., Fokoue, A., Kalyanpur, A., Kershenbaum, A., Schonberg, E., Srinivas, K., Ma, L.: Scalable semantic retrieval through summarization and refinement. In: AAAI. pp. 299–304 (2007) 4. Dolby, J., Fokoue, A., Kalyanpur, A., Schonberg, E., Srinivas, K.: Scalable highly expressive reasoner (SHER). J. Web Sem. 7(4), 357–361 (2009) 5. Eiter, T., Ortiz, M., Simkus, M.: Conjunctive query answering in the description logic SH using knots. J. Comput. Syst. Sci. 78(1), 47–85 (2012) 6. Glimm, B., Lutz, C., Horrocks, I., Sattler, U.: Conjunctive query answering for the description logic SHIQ. J. Artif. Intell. Res. (JAIR) 31, 157–204 (2008) 7. Guo, Y., Pan, Z., Heflin, J.: LUBM: A benchmark for OWL knowledge base sys- tems. J. Web Sem. 3(2-3), 158–182 (2005) 8. Hustadt, U., Motik, B., Sattler, U.: Reasoning in description logics by a reduction to disjunctive datalog. J. Autom. Reasoning 39(3), 351–384 (2007) 9. Jiménez-Ruiz, E., Cuenca Grau, B., Zhou, Y., Horrocks, I.: Large-scale interactive ontology matching: Algorithms and implementation. In: ECAI. pp. 444–449 (2012) 10. Kollia, I., Glimm, B.: Optimizing SPARQL query answering over OWL ontologies. J. Artif. Intell. Res. (JAIR) 48, 253–303 (2013) 11. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The com- bined approach to ontology-based data access. In: IJCAI. pp. 2656–2661 (2011) 12. Möller, R., Neuenstadt, C., Özçep, Ö.L., Wandelt, S.: Advances in accessing big data with expressive ontologies. In: Description Logics. pp. 842–853 (2013) 13. Motik, B., Cuenca Grau, B., Horrocks, I., Wu, Z., Fokoue, A., Lutz, C.: OWL 2 Web Ontology Language Profiles. W3C Recommendation (27 October 2009), available at http://www.w3.org/TR/owl2-profiles/ 14. Motik, B., Nenov, Y., Piro, R., Horrocks, I., Olteanu, D.: Parallel materialisation of datalog programs in main-memory rdf databases. In: AAAI (2014) 15. Motik, B., Patel-Schneider, P.F., Parsia, B.: OWL 2 Web Ontology Language Structural Specification and Functional-style Syntax. W3C Recommendation (27 October 2009 2009), available at http://www.w3.org/TR/owl2-syntax/ 16. Motik, B., Shearer, R., Horrocks, I.: Hypertableau reasoning for description logics. J. Artif. Intell. Res. (JAIR) 36, 165–228 (2009) 17. Pan, J.Z., Thomas, E., Zhao, Y.: Completeness guaranteed approximations for OWL-DL query answering. In: Description Logics (2009) 18. Sirin, E., Parsia, B., Cuenca Grau, B., Kalyanpur, A., Katz, Y.: Pellet: A practical OWL-DL reasoner. J. Web Sem. 5(2), 51–53 (2007) 19. Stefanoni, G., Motik, B., Horrocks, I.: Introducing nominals to the combined query answering approaches for EL. In: AAAI (2013) 20. Thomas, E., Pan, J.Z., Ren, Y.: TrOWL: Tractable OWL 2 reasoning infrastruc- ture. In: ESWC (2). pp. 431–435 (2010) 21. Tserendorj, T., Rudolph, S., Krötzsch, M., Hitzler, P.: Approximate OWL- reasoning with Screech. In: RR. pp. 165–180 (2008) 22. Urbani, J., van Harmelen, F., Schlobach, S., Bal, H.E.: QueryPIE: Backward rea- soning for OWL Horst over very large knowledge bases. In: International Semantic Web Conference (1). pp. 730–745 (2011) 23. Urbani, J., Kotoulas, S., Maassen, J., van Harmelen, F., Bal, H.E.: WebPIE: A web- scale parallel inference engine using MapReduce. J. Web Sem. 10, 59–75 (2012) 24. W3C SPARQL Working Group: SPARQL 1.1 Overview. W3C Recommendation (21 March 2013), available at http://www.w3.org/TR/sparql11-overview/ 25. Wandelt, S., Möller, R., Wessel, M.: Towards scalable instance retrieval over on- tologies. Int. J. Software and Informatics 4(3), 201–218 (2010) 26. Wu, Z., Eadon, G., Das, S., Chong, E.I., Kolovski, V., Annamalai, M., Srinivasan, J.: Implementing an inference engine for RDFS/OWL constructs and user-defined rules in oracle. In: ICDE. pp. 1239–1248 (2008) 27. Zhou, Y., Cuenca Grau, B., Horrocks, I., Wu, Z., Banerjee, J.: Making the most of your triple store: query answering in OWL 2 using an RL reasoner. In: WWW. pp. 1569–1580 (2013) 28. Zhou, Y., Nenov, Y., Cuenca Grau, B., Horrocks, I.: Complete query answering over Horn ontologies using a triple store. In: ISWC (1). pp. 720–736 (2013) 29. Zhou, Y., Nenov, Y., Cuenca Grau, B., Horrocks, I.: Pay-as-you-go OWL query answering using a triple store. In: AAAI (2014)