Module Extraction via Query Inseparability in OWL 2 QL Boris Konev1 , Roman Kontchakov2 , Michel Ludwig1 , Thomas Schneider3 , Frank Wolter1 , and Michael Zakharyaschev2 1 University of Liverpool, UK {konev,michel.ludwig,wolter}@liverpool.ac.uk 2 Birkbeck College London, UK {roman,michael}@dcs.bbk.ac.uk 3 University of Bremen, Germany tschneider@informatik.uni-bremen.de Abstract. We show that deciding conjunctive query inseparability for OWL 2 QL ontologies is PSpace-hard and in ExpTime. We give polyno- mial-time (incomplete) algorithms and demonstrate by experiments that they can be used for practical module extraction. 1 Introduction Ontology-based data access (OBDA) has recently emerged as one of the most interesting and challenging applications of description logic. The key idea is to use ontologies for enriching data with background knowledge, and thereby en- able query answering over incomplete and semistructured data via a high-level conceptual interface. The W3C recognised the importance of OBDA by includ- ing in the OWL 2 Web Ontology Language the profile OWL 2 QL, which was designed for OBDA with relational database systems. OWL 2 QL is based on a description logic that was originally introduced under the name DL-LiteR [5, 6] and called DL-LiteH core in the more general classification [1]. It can be described as an optimal sub-language of SROIQ, underlying OWL 2, which includes most of the features of conceptual models, and for which query answering can be done in AC0 for data complexity. Thus, DL-LiteH core is becoming a major language for developing ontologies, and a target language for translation and approxima- tion of existing ontologies formulated in more expressive DLs [11, 4]. One of the consequences of this development is that DL-LiteH core ontologies turn out to be larger and more complex than originally envisaged. As a result, reasoning support for ontology engineering tasks such as composing, re-using, comparing, and extracting ontologies—which so far has been only analysed for expressive DLs [7, 12], EL [10] and DL-Lite dialects without role inclusions [9]—is becoming increasingly important for DL-LiteH core as well. In the context of OBDA, the basic notion underlying many ontology engi- neering tasks is Σ-query inseparability: for a signature (a set of concept and role names) Σ, two ontologies are deemed to be inseparable if they give the same answers to any conjunctive query over any data formulated in Σ. Thus, in ap- plications using Σ-queries and data, one can safely replace any ontology by a Σ-query inseparable one. Note that the relativisation to Σ is very important here. For example, one cannot expect modules of an ontology to be query insep- arable from the whole ontology for arbitrary queries and data sets, whereas this should be the case if we restrict the query and data language to the module’s signature or a specified subset thereof. Similarly, when comparing two versions of one ontology, the subtle and potentially problematic differences are those that concern queries over their common symbols, rather than all symbols occurring in these versions. In applications where ontologies are built using imported parts, a stronger notion of inseparability is required: two ontologies are strongly Σ-query inseparable if they give the same answers to Σ-queries and data when imported to an arbitrary context ontology formulated in Σ. The aim of this paper is to (i) investigate the computational complexity of deciding (strong) Σ-query inseparability for DL-LiteH core ontologies, (ii) develop efficient (though incomplete) algorithms for practical inseparability checking, and (iii) analyse the performance of the algorithms for the challenging task of minimal module extraction. One of our surprising discoveries is that the analysis of Σ-query insepara- bility for DL-LiteH core ontologies requires drastically different logical tools com- pared with the previously considered DLs. It turns out that the new syntactic ingredient—the interaction of role inclusions and inverse roles—makes deciding (strong) query inseparability PSpace-hard, as opposed to the known coNP and Π2p -completeness results for DL-Lite dialects without role inclusions [9]. On the other hand, the obtained ExpTime upper bound is actually the first known decidability result for strong inseparability, which goes beyond the ‘essentially’ Boolean logic and might additionally indicate a way of solving the open problem of strong Σ-query inseparability for EL [10]. For DL-Litecore ontologies (without role inclusions), strong Σ-query inseparability is shown to be only NLogSpace- complete. We give (incomplete) polynomial-time algorithms checking (strong) Σ-inseparability and demonstrate, by a set of minimal module extraction exper- iments, that they are (i ) complete for many existing DL-LiteH core ontologies and signatures, and (ii ) sufficiently fast to be used in module extraction algorithms that require thousands of Σ-query inseparability checks. All omitted proofs can be found at www.dcs.bbk.ac.uk/~roman/owl2ql-modules. 2 Σ-Query Entailment and Inseparability We begin by formally defining DL-LiteH core , underlying OWL 2 QL, and the no- tions of Σ-query inseparability and entailment. The language of DL-LiteH core contains countably infinite sets of individual names ai , concept names Ai , and role names Pi . Roles R and concepts B of this language are defined by: R ::= Pi | Pi− , B ::= ⊥ | > | Ai | ∃R. A DL-LiteH core TBox, T , is a finite set of inclusions B1 v B2 , R1 v R2 , B1 u B2 v ⊥, R1 u R2 v ⊥, where B1 , B2 are concepts and R1 , R2 roles. An ABox, A, is a finite set of asser- tions of the form B(ai ), R(ai , aj ) and ai 6= aj , where ai and aj are individual names, B a concept and R a role. Ind(A) will stand for the set of individual names occurring in A. Taken together, T and A constitute the DL-LiteH core knowledge base (KB, for short) K = (T , A). The sub-language of DL-LiteH core without role inclusions R1 v R2 is denoted by DL-Litecore [6]. The semantics of DL-LiteH core is defined as usual in DL [2]. We only note that, in interpretations I = (∆I , ·I ), we do not have to comply with the UNA, that is, we can have aIi = aIj for i 6= j. We write I |= α to say that an inclusion or assertion α is true in I. The interpretation I is a model of a KB K = (T , A) if I |= α for all α ∈ T ∪ A. K is consistent if it has a model. A concept B is said to be T -consistent if (T , {B(a)}) has a model. K |= α means that I |= α for all models I of K. A conjunctive query (CQ) q(x1 , . . . , xn ) is a first-order formula ∃y1 . . . ∃ym ϕ(x1 , . . . , xn , y1 , . . . , ym ), where ϕ is constructed, using only ∧, from atoms of the form B(t) and R(t1 , t2 ), with B being a concept, R a role, and ti being an individual name or a variable from the list x1 , . . . , xn , y1 , . . . , ym . The variables in ~x = x1 , . . . , xn are called answer variables of q. We say that an n-tuple ~a ⊆ Ind(A) is an answer to q in an interpretation I if I |= q[~a] (here we regard I to be a first-order structure); ~a is a certain answer to q over a KB K = (T , A) if I |= q[~a] for all models I of K; in this case we write K |= q[~a]. To define the main notions of this paper, consider two KBs K1 = (T1 , A) and K2 = (T2 , A). For example, the Ti are different versions of some ontology, or one of them is a refinement of the other by means of new axioms. The question we are interested in is whether they give the same answers to queries formulated in a certain signature, say, in the common vocabulary of the Ti or in a vocabulary relevant to an application. To be precise, by a signature, Σ, we understand any finite set of concept and role names. A concept (inclusion, TBox, etc.) all concept and role names of which are in Σ is called a Σ-concept (inclusion, etc.). We say that K1 Σ-query entails K2 if, for all Σ-queries q(~x) and all ~a ⊆ Ind(A), K2 |= q[~a] implies K1 |= q[~a]. In other words: any certain answer to a Σ-query given by K2 is also given by K1 . As the ABox is typically not fixed or known at the ontology design stage, we may have to compare the TBoxes over arbitrary Σ-ABoxes rather than a fixed one, which gives our central definition: Definition 1. Let T1 and T2 be TBoxes and Σ a signature. T1 Σ-query entails T2 if (T1 , A) Σ-query entails (T2 , A) for any Σ-ABox A. T1 and T2 are Σ-query inseparable if they Σ-query entail each other, in which case we write T1 ≡Σ T2 . In many applications, Σ-query inseparability is enough to ensure that T1 can be safely replaced by T2 . However, if they are developed as part of a larger ontology or are meant to be imported in other ontologies, a stronger notion is required: Definition 2. T1 strongly Σ-query entails T2 if T1 ∪ T Σ-query entails T2 ∪ T , for all Σ-TBoxes T . T1 and T2 are strongly Σ-query inseparable if they strongly Σ-query entail each other, in which case we write T1 ≡sΣ T2 . The following example illustrates the difference between Σ-query and strong Σ-query inseparability. For further discussion and examples, consult [7, 9]. Example 3. Let T1 = ∅, T2 = {> v ∃R, ∃R− v B, B u A v ⊥} and Σ = {A}. T1 and T2 are Σ-query inseparable. However, they are not strongly Σ-query inseparable. Indeed, for the Σ-TBox T = {> v A}, T1 ∪ T is consistent, while T2 ∪ T is inconsistent, and so T1 ∪ T does not Σ-query entail T2 ∪ T , as witnessed by the query q = ⊥. 3 Σ-Query Entailment and Σ-Homomorphisms In this section, we characterise Σ-query entailment between DL-LiteH core TBoxes semantically in terms of (partial) Σ-homomorphisms between certain canonical models. Then, in the next section, we use this characterisation to investigate the complexity of deciding Σ-query entailment. The canonical model, MK , of a consistent KB K = (T , A) gives correct answers to all CQs. In general, MK is infinite; however, it can be folded up into a small generating model GK = (IK , K ) consisting of a finite interpretation ∗ IK and a generating relation K that defines the unfolding. Let vT be the reflexive and transitive closure of the role inclusion relation given by T , and let [R] = {S | R v∗T S and S v∗T R}. We write [R] ≤T [S] if R v∗T S; thus, ≤T is a partial order on the set {[R] | R a role in T }. For each [R], we introduce a witness w[R] and define a generating relation K on the set of these witnesses together with Ind(A) by taking: – a K w[R] if a ∈ Ind(A) and [R] is ≤T -minimal such that K |= ∃R(a) and K 6|= R(a, b) for all b ∈ Ind(A); – w[S] K w[R] if [R] is ≤T -minimal with T |= ∃S − v ∃R and [S − ] 6= [R]. A role R is generating in K if there are a ∈ Ind(A) and R1 , . . . , Rn = R such that a K w[R1 ] K · · · K w[Rn ] . The interpretation IK is defined as follows: ∆IK = Ind(A) ∪ {w[R] | R is generating in K}, aIK = a, for all a ∈ Ind(A), AIK = {a ∈ Ind(A) | K |= A(a)} ∪ {w[R] | T |= ∃R− v A}, P IK = {(a, b) ∈ Ind(A) × Ind(A) | there is R(a, b) ∈ A s.t. [R] ≤T [P ]} ∪ {(x, w[R] ) | x K w[R] and [R] ≤T [P ]} ∪ {(w[R] , x) | x K w[R] and [R] ≤T [P − ]}. GK can be constructed in polynomial time in |K|, and it is not hard to see that IK |= K. To construct the canonical model MK giving the correct answers to all CQs, we unfold the generating model GK = (IK , K ) along K . A path in GK is a finite sequence aw[R1 ] · · · w[Rn ] , n ≥ 0, such that a ∈ Ind(A), a K w[R1 ] and w[Ri ] K w[Ri+1 ] , for i < n. Denote by path(GK ) the set of all paths in GK and by tail(σ) the last element in σ ∈ path(GK ). MK is defined by taking: ∆MK = path(GK ), aMK = a, for all a ∈ Ind(A), AMK = {σ | tail(σ) ∈ AIK }, P MK = {(a, b) ∈ Ind(A) × Ind(A) | (a, b) ∈ P IK } ∪ {(σ, σ · w[R] ) | tail(σ) K w[R] , [R] ≤T [P ]} ∪ {(σ · w[R] , σ) | tail(σ) K w[R] , [R] ≤T [P − ]}. Example 4. For T1 = {A v ∃S, ∃S − v ∃T, ∃T − v ∃T, T v R} and K1 = (T1 , {A(a)}), the models GK1 and MK1 look as follows ( K1 in GK1 is shown as ): R, T A S R, T GK1 a wS wT A S R, T R, T MK1 ... a awS awS wT awS wT wT Theorem 5. For all consistent DL-LiteH core KBs K = (T , A), CQs q(~ x) and ~a ⊆ Ind(A), we have K |= q[~a] iff MK |= q[~a]. Thus, to decide Σ-query entailment between KBs K1 and K2 , it suffices to check whether MK2 |= q[~a] implies MK1 |= q[~a] for all Σ-queries q(~x) and tuples ~a. This relationship between MK2 and MK1 can be characterised semantically in terms of finite Σ-homomorphisms. For an interpretation I and a signature Σ, the Σ-types tIΣ (x) and r IΣ (x, y), for x, y ∈ ∆I , are given by: tIΣ (x) = {Σ-concept B | x ∈ B I }, r IΣ (x, y) = {Σ-role R | (x, y) ∈ RI }. 0 A Σ-homomorphism from an I to I 0 is a function h : ∆I → ∆I such that 0 0 h(aI ) = aI , for all individual names a interpreted in I, tIΣ (x) ⊆ tIΣ (h(x)) and 0 r IΣ (x, y) ⊆ r IΣ (h(x), h(y)), for all x, y ∈ ∆I . It is well-known that answers to conjunctive Σ-queries are preserved under Σ-homomorphisms. Thus, if there is a Σ-homomorphism from MK2 to MK1 , then K1 Σ-query entails K2 . However, the converse does not hold in general. Example 6. Take T1 from Example 4, and let T2 result from replacing R in T1 with R− . Let Σ = {A, R} and Ki = (Ti , {A(a)}). Then the Σ-reduct of MK1 does not contain a Σ-homomorphic image of the Σ-reduct of MK2 , depicted be- low. On the other hand, it is easily seen that T1 and T2 are Σ-query inseparable. A R− R− MK2 a ... Note that the Σ-reduct of MK2 contains points that are not reachable from the ABox by Σ-roles. In fact, using König’s Lemma, one can show that if every point in MK2 is reachable from the ABox by a path of Σ-roles, then K1 Σ-query entails K2 iff there exists a Σ-homomorphism from MK2 to MK1 . We say that I is finitely Σ-homomorphically embeddable into I 0 if, for every finite sub-interpretation I1 of I, there exists a Σ-homomorphism from I1 to I 0 . Theorem 7. Let K1 and K2 be consistent DL-LiteH core KBs. Then K1 Σ-query entails K2 iff MK2 is finitely Σ-homomorphically embeddable into MK1 . Theorem 7 does not yet give a satisfactory semantic characterisation of Σ- query entailment between TBoxes, as one still has to consider infinitely many Σ-ABoxes. However, using the fact that inclusions in DL-LiteH core , different from disjointness axioms, involve only one concept or role in the left-hand side and making sure that the TBoxes entail the same Σ-inclusions, one can show that it is enough to consider singleton Σ-ABoxes of the form {B(a)}. Denote the mod- els G(T ,{B(a)}) and M(T ,{B(a)}) by GTB and MBT , respectively. We thus obtain the following characterisation of Σ-entailment between DL-LiteH core TBoxes: Theorem 8. T1 Σ-query entails T2 iff (p) T2 |= α implies T1 |= α, for all Σ-inclusions α; (h) MB B T2 is finitely Σ-homomorphically embeddable into MT1 , for all T1 -con- sistent Σ-concepts B. By applying condition (p) to B v ⊥, we obtain that every T1 -consistent Σ- concept B is also T2 -consistent. 4 Complexity of Σ-Query Entailment We use Theorem 8 to show that deciding Σ-query entailment for DL-LiteH core TBoxes is PSpace-hard and in ExpTime. Recall that subsumption in DL-LiteH core is NLogSpace-complete [6, 1]; so condition (p) of Theorem 8 can be checked in polynomial time. And, since there are at most 2·|Σ| singleton Σ-ABoxes, we can concentrate on the complexity of checking finite Σ-homomorphic embeddability of canonical models for singleton ABoxes. We begin by considering DL-Litecore , where the existence of Σ-homomorph- isms between canonical models can be expressed in terms of the types of their points; cf. [9]. Let T1 and T2 be DL-Litecore TBoxes and Σ a signature. Theorem 9. T1 Σ-query entails T2 iff (p) holds and, for every T1 -consistent B B IB IB Σ-concept B and every x ∈ ∆IT2 , there is x0 ∈ ∆IT1 with tΣT2 (x) ⊆ tΣT1 (x0 ). The criterion of Theorem 9 can be checked in polynomial time, in NLog- Space, to be more precise. Thus: Theorem 10. Checking Σ-query entailment for TBoxes in DL-Litecore is com- plete for NLogSpace. However, if role inclusions become available, the picture changes dramatically: not only do we have to compare the Σ-types of points in the canonical models, but also the Σ-paths to these points. To illustrate, consider the generating models G1 , G2 in Fig. 1, where the arrows represent the generating relations, and the concept names A, Xi0 , Xi1 and the role names R and Tj are all symbols in Σ. The model G2 contains 4 R-paths from a to w, which are further extended by the infinite Tj -paths. The paths π from a to w can be homomorphically mapped to distinct R-paths h(π) in G1 starting from a. But the extension of such a π with the infinite Tj -chain can only be mapped first to a suffix of h(π) (backward, along Tj− )—because we have to map paths in the unfolding M2 of G2 to paths in M1 —and then to a Tj -loop in G1 . But to check whether this can be done, we may have to ‘remember’ the whole path π. X11 X31 G2 R R R R a w A R R R R T2 X10 X30 T1 T1 T2 T1 T2 T1 T1 T2 − X11 R,Tj− X21 R,Tj− X31 R,Tj− X41 R,T j − − − Tj Tj Tj A R, R R, R R, R a ,T − ,T − ,T − R,T − j j j G1 j X10 R,Tj− X20 R,Tj− X30 R,Tj− X40 Fig. 1. Σ-reducts of generating models G2 and G1 . To see thatVG1 and G2 can be given by DL-LiteH core TBoxes, fix a QBF m Q1 X1 . . . Qn Xn j=1 Cj , where Qi ∈ {∀, ∃} and C1 , . . . , Cm are clauses over the variables X1 , . . . , Xn . Let Σ = {A, Xi0 , Xi1 , R, Tj | i ≤ n, j ≤ m}, T1 contain the inclusions A v ∃S0− , − ∃Si−1 v ∃Qki , ∃(Qki )− v Xik , Qki v Si , Si v R, Xik v ∃Rj if k = 0, ¬Xi ∈ Cj or k = 1, Xi ∈ Cj , ∃Rj− v ∃Rj , Rj v Tj , Si v Tj− , and let T2 contain the inclusions ( ∃Qki , if Qi = ∀, A v ∃S0− , − ∃Si−1 v ∃Si , if Qi = ∃, ∃(Qki )− v Xik , Qki v Si , Si v R, ∃Sn− v ∃Pj , ∃Pj− v ∃Pj , Pj v Tj , for all i ≤ n, j ≤ m, k = 1, 2. The generating models GTA1 and GTA2 , restricted to Σ, look like G1 and G2 in Fig. 1, respectively. Moreover, one can show that MA A T2 is (finitely) Σ-homomorphically embeddable into MT1 iff the QBF above is satisfiable. As satisfiability of QBFs is PSpace-complete, we obtain: Theorem 11. Σ-query entailment for DL-LiteH core TBoxes is PSpace-hard. On the other hand, the problem whether MK2 is finitely Σ-homomorphically embeddable into MK1 can be reduced to the emptiness problem for alternating two-way automata, which belongs to ExpTime [13]. In a way similar to [13, 8], where these automata were employed to prove ExpTime-decidability of the modal µ-calculus with converse and the guarded fixed point logic of finite width, one can use their ability to ‘remember’ paths (in the sense illustrated in the example above) to obtain the ExpTime upper bound: Theorem 12. Σ-query entailment for DL-LiteH core TBoxes is in ExpTime. The precise complexity of Σ-query entailment for DL-LiteH core TBoxes is still unknown. Recall that deciding Σ-query entailment for DL-LiteN horn is coNP- complete [9]. Compared to DL-LiteH N core , DL-Litehorn allows (unqualified) number restrictions and conjunctions in the left-hand side of concept inclusions, but does not have role inclusions: DL-LiteN H horn ∩ DL-Litecore = DL-Litecore . CQ answering 0 is in AC for data complexity in all three languages under the UNA. However, the computational properties of these logics become different as far as Σ-query entailment is concerned: NLogSpace-complete for DL-Litecore , coNP-complete for DL-LiteN H horn , and between PSpace and ExpTime for DL-Litecore . It may be N of interest to note that Σ-query entailment for DL-Litebool , allowing full Booleans as concept constructs, is Π2p -complete. Let us consider strong Σ-query entailment. It is easy to construct an expo- nential-time algorithm checking strong Σ-query entailment between DL-LiteH core TBoxes T1 and T2 : enumerate all Σ-TBoxes T and check whether T1 ∪T Σ-query entails T2 ∪ T . As there are quadratically many Σ-inclusions, this algorithm calls 2 the Σ-query entailment checker ≤ 2|Σ| times. We now show that one can do much better than that. First, it turns out that instead of expensive Σ-query entailment checks for the TBoxes Ti ∪ T , it is enough to check consistency (in polynomial time). More precisely, suppose T1 Σ-query entails T2 . One can show then that T1 does not strongly Σ-query entail T2 iff there exist a Σ-TBox T and a Σ-concept B such that (T1 ∪ T , {B(a)}) is consistent but (T2 ∪ T , {B(a)}) is not (cf. Example 3). Moreover, checking consistency for all Σ-TBoxes T can further be reduced—using the primitive form of DL-LiteH core axioms—to checking consistency for all singleton Σ-TBoxes T . Thus, we obtain the following: Theorem 13. Suppose that T1 Σ-query entails T2 . Then T1 does not strongly Σ- query entail T2 iff there is a Σ-concept B and a Σ-TBox T with a single inclusion of the form B1 v B2 or R1 v R2 such that (T1 ∪ T , {B(a)}) is consistent but (T2 ∪ T , {B(a)}) is inconsistent. So, if we already know that T1 Σ-query entails T2 , then checking whether this entailment is actually strong can be done in polynomial time (and NLogSpace). 5 Incomplete Algorithm for Σ-Query Entailment The interplay between role inclusions and inverse roles, required in the proof of PSpace-hardness, appears to be too artificial compared to how roles are used in ‘real-world’ ontologies. Thus, in conceptual modelling, the number of roles is comparable with the number of concepts, but the number of role inclusions is much smaller. For this reason, instead of a complete (exponential) Σ-query en- tailment checker, we have implemented a polynomial-time correct but incomplete algorithm, which is based on testing simulations between transition systems. Let T1 and T2 be DL-LiteH core TBoxes, Σ a signature, B a Σ-concept. Denote Ki = (Ti , {B(a)}) and Ii = IKi , i = 1, 2. A relation ρ ⊆ ∆I2 × ∆I1 is called a Σ-simulation of GK2 in GK1 if the following conditions hold: (s1) the domain of ρ is ∆I2 and (aI2 , aI1 ) ∈ ρ; (s2) tIΣ2 (x) ⊆ tIΣ1 (x0 ), for all (x, x0 ) ∈ ρ; (s3) if x K2 w[R] and (x, x0 ) ∈ ρ, then there is y 0 ∈ ∆I1 such that (w[R] , y 0 ) ∈ ρ and S ∈ r IΣ1 (x0 , y 0 ) for every Σ-role S with [R] ≤T2 [S]. We call ρ a forward Σ-simulation if it satisfies (s1), (s2) and the condition (s30 ), which strengthens (s3) with the extra requirement: y 0 = w[T ] , for some role T , with x0 K1 w[T ] and [T ] ≤T1 [S] for every Σ-role S with [R] ≤T2 [S]. Example 14. In Example 6, there is a Σ-simulation of GK2 in GK1 , but no forward Σ-simulation. The same applies to G2 and G1 in the proof of the PSpace bound. In contrast to finite Σ-homomorphic embeddability of MK2 in MK1 , the problem of checking the existence of (forward) Σ-simulations of GK2 in GK1 is tractable and well understood from the literature on program verification [3]. Consider now the following conditions, which can be checked in polynomial time: (y) condition (p) holds and there is a forward Σ-simulation of GTB2 in GTB1 , for every T1 -consistent Σ-concept B; (n) condition (p) does not hold or there is no Σ-simulation of GTB2 in GTB1 , for any T1 -consistent Σ-concept B. Theorem 15. Let T1 , T2 be DL-LiteH core TBoxes and Σ a signature. If (y) holds, then T1 Σ-query entails T2 . If (n) holds, then T1 does not Σ-query entail T2 . Thus, an algorithm checking conditions (y) and (n) can be used as a correct but incomplete Σ-query entailment checker. It cannot be complete since neither (y) nor (n) holds in Example 14. On the other hand, condition (n) proves to be a criterion of Σ-query entailment in two important cases: Theorem 16. Let (a) T1 , T2 be DL-Litecore TBoxes, or (b) T1 = ∅ and T2 a DL-LiteH core TBox. Then condition (n) holds iff T1 does not Σ-query entail T2 . 6 Experiments Checking (strong) Σ-query entailment has multiple applications in ontology ver- sioning, re-use, and extraction. We have used the algorithms, suggested by The- orems 15 and 13, for minimal module extraction to see how efficient they are in practice and whether the incompleteness of the (y)–(n) conditions is prob- lematic. Extracting minimal modules from medium-sized real-world ontologies requires thousands of calls of the (strong) Σ-query entailment checker, and thus provides a tough test for our approach. For a TBox T and a signature Σ, a subset M ⊆ T is – a Σ-query module of T if M ≡Σ T ; – a strong Σ-query module of T if M ≡sΣ T ; – a depleting Σ-query module of T if ∅ ≡sΣ∪sig(M) T \ M, where sig(M) is the signature of M. We are concerned with computing a minimal (w.r.t. ⊆) Σ-query (MQM), a mini- mal strong Σ-query (MSQM), and the (uniquely determined) minimal depleting Σ-query (MDQM) module of T . The general extraction algorithms, which call Σ-query entailment checkers, are taken from [9]. For MQMs and MSQMs, the number of calls to the checker coincides with the number of inclusions in T . For MDQMs (where one of the TBoxes given to the checker is empty, and so the checker is complete, by Theorem 16), the number of checker calls is quadratic in the number of inclusions in T . We extracted modules from OWL 2 QL approximations of 3 commercial soft- ware applications called Core, Umbrella and Mimosa (the original ontologies use a few axioms that are not expressible OWL 2 QL). Mimosa is a specialisation of the MIMOSA OSA-EAI specification4 for container shipping. Core is based on a supply-chain management system used by the bookstore chain Ottakar’s (now merged with Waterstone’s), and Umbrella on a research data validation and processing system used by the Intensive Care National Audit and Research Centre.5 The original Core and Umbrella were used for the experiments in [9]. ontology Mimosa Core Umbrella IMDB LUBM concept inclusions 710 1214 1506 45 136 role inclusions 53 19 13 21 9 concept names 106 82 79 14 43 role names 145 76 64 30 31 For comparison, we extracted modules from OWL 2 QL approximations of the well-known IMDB and LUBM ontologies. For each of these ontologies, we ran- domly generated 20 signatures Σ of 5 concept and 5 roles names. We extracted Σ-MQMs, MSQMs, MDQMs as well as the >⊥-module [7] from the whole Mi- mosa, IMBD and LUBM ontologies. For the larger Umbrella and Core on- tologies, we first computed the >⊥-modules, and then employed them to fur- ther extract MQMs, MSQMs, MDQMs, which are all contained in the >⊥- modules. The average size of the resulting modules and its standard devia- tion is shown below. Details of the experiments and ontologies are available at www.dcs.bbk.ac.uk/~roman/owl2ql-modules. Here we briefly comment on efficiency and incompleteness. Checking Σ-query inseparability turned out to be very fast: a single call of the checker never took more than 1s for our ontologies. For strong Σ-query inseparability, the maximal time was less than 1 min. For 4 htpp://www.mimosa.org/?q=resources/specs/osa-eai-v321 5 http://www.icnarc.org comparisons with the empty TBox, the maximal time for strong Σ-query insep- arability tests was less than 10s. In the hardest case, Mimosa, the average total extraction times were 2.5 mins for MQMs, 140 mins for MSQMs, and 317 mins for MDQMs. Finally, only in 9 out of about 75,000 calls, the Σ-query entail- ment checker was not able to give a certain answer due to incompleteness of the (y)–(n) condition, in which case the inclusions in question were added to the module. Core (1233) Mimosa (763) Umbrella (1519) IMDB (66) LUBM (145) 391 375375 34 34 105 90 315 30 31 32 25 20 20 56 47 83 87 98 101 MDQM MDQM MDQM MDQM MDQM MSQM MSQM MSQM MSQM MSQM MQM MQM MQM MQM MQM >⊥M >⊥M >⊥M >⊥M >⊥M References [1] Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite fam- ily and relations. Journal of Artificial Intelligence Research 36, 1–69 (2009) [2] Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (eds.): The Description Logic Handbook. Cambridge University Press (2003) [3] Baier, C., Katoen, J.-P.: Principles of Model Checking. MIT Press (2007) [4] Botoeva, E., Calvanese, D., Rodriguez-Muro, M.: Expressive approximations in DL-Lite ontologies. In: Proc. of AIMSA. pp. 21–31. Springer (2010). [5] Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Data com- plexity of query answering in description logics. In: Proc. of KR. pp. 260–270 (2006) [6] Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning 39(3), 385–429 (2007) [7] Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontolo- gies: Theory and practice. Journal of Artificial Intelligence Research 31, 273–318 (2008) [8] Grädel, E., Walukiewicz, I.: Guarded fixed point logic. In: Proc. of LICS. pp. 45–54 (1999) [9] Kontchakov, R., Wolter, F., Zakharyaschev, M.: Logic-based ontology comparison and module extraction, with an application to DL-Lite. Artificial Intelligence 174, 1093–1141 (2010) [10] Lutz, C., Wolter, F.: Deciding inseparability and conservative extensions in the description logic EL. Journal of Symbolic Computation 45(2):194–228 (2010) [11] Pan, J.Z., Thomas, E.: Approximating OWL-DL Ontologies. In: Proc. of AAAI. pp. 1434–1439 (2007) [12] Stuckenschmidt, H., Parent, C., Spaccapietra, S. (eds.): Modular Ontologies, LNCS, vol. 5445. Springer (2009) [13] Vardi, M.Y.: Reasoning about the past with two-way automata. In: Proc. of ICALP. LNCS, vol. 1443, pp. 628–641. Springer (1998)