Introduction

Ontology Module Extraction via Datalog Reasoning?

Ana Armas Romero

Mark Kaminski

Bernardo Cuenca Grau

Ian Horrocks

0 0 Department of Computer Science, University of Oxford , UK

We propose a novel approach in which module extraction is reduced to a reasoning problem in datalog. Our approach generalises existing approaches and can be tailored to preserve only speci c kinds of entailments, which allows us to extract signi cantly smaller modules. An evaluation on widely-used ontologies shows very encouraging results. Module extraction is the task of computing, given an ontology T and a signature of interest , a (preferably small) subset M of T (a module) that preserves all relevant entailments in T over the set of symbols . Such an M is indistinguishable from T w.r.t. , and T can be safely replaced with M in applications of T that use only the symbols in . Module extraction has received a lot of attention in recent years [27, 6, 25, 17, 10, 9, 23], and modules have found a wide range of applications, including ontology reuse [6, 13], matching [12], debugging [29, 19] and classi cation [1, 30, 4]. Preservation of relevant entailments is formalised via inseparability relations. The strongest notion is model inseparability, which requires that it must be possible to turn any model of M into a model of T by (re-)interpreting only the symbols outside ; such an M preserves all second-order entailments of T w.r.t. [15]. A weaker and more exible notion is deductive inseparability, which requires only that T and M entail the same -formulas in a given query language. Unfortunately, the decision problems associated with module extraction are invariably of high complexity, and often undecidable. For model inseparability, checking whether M is a -module in T is undecidable even if T is restricted to be in the description logic (DL) E L, for which standard reasoning is tractable. For deductive inseparability, the problem is typically decidable for lightweight DLs and \reasonable" query languages, albeit of high worst-case complexity; e.g., the problem is already ExpTime-hard for E L if we consider concept inclusions as the query language [20]. Practical algorithms that ensure minimality of the extracted modules are known only for acyclic E LI [15] and DL-Lite [17]. Practical module extraction techniques are typically based on sound approximations: they ensure that the extracted fragment M is a module (i.e., inseparable

Introduction

? This paper summarises the results of our work published in [ 2, 3 ]. from T w.r.t. ), but they give no minimality guarantee. The most popular such techniques are based on a family of polynomially checkable conditions called syntactic locality [ 5, 6, 24 ]; in particular, ?-locality and >? -locality. Each localitybased module M enjoys a number of desirable properties for applications: (i) it is model inseparable from T ; (ii) it is depleting, i.e., T n M is inseparable from the empty ontology w.r.t. ; (iii) it contains all justi cations (a.k.a. explanations) in T of every -formula entailed by T ; and (iv) last but not least, it can be computed e ciently, even for very expressive ontology languages.

Locality-based techniques are easy to implement, and surprisingly e ective in practice. However, the extracted modules can be rather large, which limits their usefulness in some applications [ 8 ]. One way to address this is to develop techniques that produce smaller modules while still preserving properties (i){(iii). E orts in this direction have con rmed that locality-based modules can be far from optimal in practice [ 10 ]; however, these techniques only apply to rather restricted languages and utilise algorithms with high worst-case complexity.

Another approach to computing smaller modules is to weaken properties (i){(iii), which are stronger than required for many applications. In particular, model inseparability (property (i)) is a very strong condition, and deductive inseparability would usually su ce, with the query language determining which kinds of consequence are preserved; in modular classi cation, for example, only atomic concept inclusions need to be preserved. However, practical module extraction techniques for expressive ontology languages yield modules that satisfy all three properties, and hence are potentially much larger than they need to be.

In this paper, we propose a technique that reduces module extraction to a reasoning problem in datalog. The connection between module extraction and datalog was rst observed in [28], where it was shown that locality ?-module extraction for E L ontologies could be reduced to propositional datalog reasoning. Our approach takes this connection much farther by generalising both localitybased and reachability-based [23] modules for expressive ontology languages in an elegant way. A key distinguishing feature of our technique is that it can extract deductively inseparable modules, with the query language tailored to the requirements of the application at hand, which allows us to relax Property (i) and extract signi cantly smaller modules. In all cases our modules preserve the nice features of locality: they are widely applicable (even beyond DLs), they can be e ciently computed, they are depleting (Property (ii)) and they preserve all justi cations of relevant entailments (Property (iii)). We have implemented our approach using the RDFox datalog engine [22]. Our evaluation shows that module size consistently decreases as we consider weaker inseparability relations, which could signi cantly improve the usefulness of modules in applications. 2

Preliminaries

Ontologies and Queries We use standard rst-order logic and assume familiarity with description logics, ontology languages and theorem proving. A signature is a set of predicates and Sig(F ) denotes the signature of a set of formulas F . To capture a wide range of KR languages, we formalise ontology axioms as rules: function-free sentences of the form 8x:['(x) ! 9y:[Win=1 i(x; y)]], where ', i are conjunctions of distinct atoms. Formula ' is the rule body and 9y:[Win=1 i(x; y)] is the head. Universal quanti cation is omitted for brevity. Rules are required to be safe (all variables in the head occur in the body). A TBox T is a nite set of rules; TBoxes mentioning equality ( ) are extended with its standard axiomatisation. A fact is a function-free ground atom. An ABox A is a nite set of facts. A positive existential query (PEQ) is a formula q(x) = 9y:'(x; y), where ' is built from function-free atoms using only ^ and _. Datalog A rule is datalog if its head has exactly one atom (which could be ?, the nullary falsehood predicate) and all variables are universally quanti ed. A datalog program P is a set of datalog rules. Given P and an ABox A, their materialisation is the set of facts entailed by P [A, which can be computed by means of forwardchaining. A fact is a consequence of a datalog rule r = Vin=1 i0 ! and facts 1; : : : ; n if = with a most-general uni er (MGU) of i; i0 for each 1 i n. A (forward-chaining) proof of in P [ A is a pair = (T; ) where T is a tree, is a mapping from nodes in T to facts such that for each node v the following holds: 1. (v) = if v is the root of T ; 2. if v is a leaf then (v) 2 A or (! (v)) 2 P; and 3. if v has children w1; : : : ; wn then (v) is a consequence of r and (w1); : : : ; (wn). Forward-chaining is sound (if has a proof in P [ A then it is entailed by P [ A) and complete (if is entailed by P [ A then either has a proof in P [ A or so does ?). Finally, the support of is the set of rules occurring in some proof of in P [ A.

Inseparability Relations & Modules We next recapitulate the most common inseparability relations studied in the literature. TBoxes T and T 0 are { -model inseparable (T m T 0), if for every model I of T there is a model

J of T 0 with the same domain s.t. AI = AJ for each A 2 , and vice versa. { -query inseparable (T q T 0) if for every Boolean PEQ q and -ABox A we have T [ A j= q i T 0 [ A j= q. { -fact inseparable (T f T 0) if for every fact and ABox A over we have T [ A j= i T 0 [ A j= . { -implication inseparable (T i T 0) if for each ' of the form A(x) ! B(x) with A; B 2 , T j= ' i T 0 j= '.

These relations are naturally ordered from strongest to weakest: m ( q ( f ( i for each non-trivial .

Given an inseparability relation for , a subset M T is a -module of T if T M. Furthermore, M is minimal if no M0 ( M is a -module of T . 3

Module Extraction via Datalog Reasoning

In this section, we present our approach to module extraction by reduction into a reasoning problem in datalog. Our approach builds on recent techniques that exploit datalog engines for ontology reasoning [ 16, 26, 31 ]. In what follows, we x an arbitrary TBox T and signature Sig(T ). Unless otherwise stated, our de nitions and theorems are parameterised by such T and . We assume w.l.o.g. that rules in T share no existentially quanti ed variables. For simplicity, we also assume that T contains no constants (all results extend to constants [ 3 ]). Our overall strategy to extract a module M of T for an inseparability relation z , with z 2 fm; q; f; ig, can be summarised by the following steps: 1. Pick a substitution mapping all existentially quanti ed variables in T to constants, and transform T into a datalog program P by (i) Skolemising all rules in T using and (ii) turning disjunctions into conjunctions while splitting them into di erent rules, thus replacing each function-free disjunctive rule of the form '(x) ! Win=1 i(x) with datalog rules '(x) ! 1(x); : : : ; '(x) ! n(x). 2. Pick a -ABox A0 and materialise P [ A0. 3. Pick a set Ar of \relevant facts" in the materialisation and compute the supporting rules in P for each such fact. 4. The module M consists of all rules in T that yield a supporting rule in P.

Thus, M is fully determined by the substitution and the ABoxes A0, Ar. The main intuition behind our module extraction approach is that we can pick , A0 and Ar (and hence M) such that each proof of a -consequence ' of T to be preserved can be embedded in a forward chaining proof 0 in P [ A0 of a relevant fact in Ar. Such an embedding satis es the key property that, for each rule r involved in , at least one corresponding datalog rule in P is involved in 0. In this way we ensure that M contains the necessary rules to entail '. This approach, however, does not ensure minimality of M: since P is a strengthening of T there may be proofs of a relevant fact in P [ A0 that do not correspond to a -consequence of T , which may lead to unnecessary rules in M.

To illustrate how our strategy might work in practice, consider T ex in Fig. 1, = fB; C; D; Gg, and that we want a module M that is -implication inseparable from T ex. This is a simple case since ' = D(x) ! G(x) is the only non-trivial -implication entailed by T ex; thus, for M to be a module we only need M j= '. Note that proving T ex j= ' amounts to proving T ex [ fD(a)g j= G(a) (with a a fresh constant). Figure 2(a) depicts a hyper-resolution tree showing how G(a) can be derived from the clauses corresponding to r4{r6 and D(a), with rule r4

r6 (a) r0 4 D(a) transformed into clauses r40 = D(x) ! S(x; f (x3)) and r400 = D(x) ! E(f (x3)). Hence M = fr4{r6g is a -implication inseparable module of T ex, and as G(a) cannot be derived from any subset of fr4{r6g, M is also minimal.

We pick A0 to contain the initial fact D(a), Ar to contain the fact to be proved G(a), and we make map variable y3 in r4 to a fresh constant c, in which case rule r4 corresponds to the rules D(x) ! S(x; c) and D(x) ! E(c) in P.

Figure 2(b) depicts a forward chaining proof 0 of G(a) in P [ fD(a)g. As shown, can be embedded in 0 via by mapping functional terms over f to the fresh constant c. This way, the rules involved in are mapped to the datalog rules involved in 0 via . Hence, we will extract the (minimal) module M = fr4{r6g. The substitution and the ABoxes A0 and Ar, which determine the extracted module, can be chosen in di erent ways to ensure the preservation of di erent kinds of -consequences. The following notion of a module setting captures in a declarative way the main elements of our approach.

De nition 3.1. A module setting for T and is a tuple = h ; A0; Ari with a substitution from existentially quanti ed variables in T to constants, A0 a -ABox, Ar a (Sig(T ) [ f?g)-ABox, and s.t. no constant in occurs in T .

The program of is the smallest datalog program P containing, for each r = '(x) ! 9y:[Win=1 i(x; y)] in T , the rule ' ! ? if n = 0 and all rules ' ! for each 1 i n and each atom in i. The support of is the set of rules r 2 P that support a fact from Ar in P [ A0. The module M of is the set of rules in T that have a corresponding datalog rule in the support of . 3.3

Modules for each Inseparability Relation We next consider each inseparability relation z , where z 2 fm; q; f; ig, and formulate a speci c setting z which provably yields a z -module of T . Implication Inseparability The example in Sect. 3.1 suggests a natural setting i = h ; A0; Ari that guarantees implication inseparability. As in our example, we pick to be as \general" as possible by Skolemising each existentially quanti ed variable to a fresh constant. For A and B predicates of the same arity n, proving that T entails a -implication ' = A(x1; : : : ; xn) ! B(x1; : : : ; xn), amounts to showing that T [ fA(a1; : : : ; an)g j= B(a1; : : : ; an) for fresh constants a1; : : : ; an. Thus, following the ideas of our example, we initialise A0 with a fact A(c1A; : : : ; cnA) for each n-ary predicate A 2 , and Ar with a fact B(c1A; : : : ; cnA) for each pair of n-ary predicates fB; Ag with B 6= A.

De nition 3.2. For each existentially quanti ed variable yj in T , let cyj be a fresh constant. Furthermore, for each A 2 of arity n, let c1A; : : : ; cnA be also fresh constants. The setting i = h i; Ai0; Airi is de ned as follows: { i = f yj 7! cyj j yj existentially quanti ed in T g, { Ai0 = fA(c1A; : : : ; cnA) j A n-ary predicate in g, and { Air = fB(c1A; : : : ; cnA) j A 6= B n-ary predicates in g [ f?g.

The setting i is reminiscent of the datalog encodings typically used to check whether a concept A is subsumed by concept B w.r.t. a \lightweight" ontology T [ 18, 26 ]. There, variables in rules are Skolemised as fresh constants to produce a datalog program P and it is then checked whether P [ fA(a)g j= B(a). Theorem 3.3. M i i T .

Fact Inseparability The setting i in Def. 3.2 cannot be used to ensure fact inseparability. Consider again T ex and = fB; C; D; Gg, for which M i = fr4; r5; r6g. For A = fB(a); C(a)g we have T ex [ A j= G(a) but M i [ A 6j= G(a), and hence M i is not fact inseparable from T ex.

More generally, M i will only preserve -fact entailments T [ A j= where A is a singleton. However, for a module to be fact inseparable from T it must preserve all -facts when coupled with any -ABox. We achieve this by choosing A0 to be the critical ABox for , which consists of all facts that can be constructed using and a single fresh constant [21]. Since every -ABox can be homomorphically mapped into the critical -ABox, we can show that all proofs of a -fact in T [ A are embeddable into a proof of a relevant fact in P [ A0. De nition 3.4. Let constants cyi be as in Def. 3.2, and let be a fresh constant. The setting f = h f ; Af0; Afri is de ned as follows: (i) f = i, (ii) Af0 = f A( ; : : : ; ) j A 2 g, and (iii) Afr = Af0 [ f?g.

The datalog programs for i and f coincide; hence, the only di erence between the two settings is in the de nition of their corresponding ABoxes. In our example, both Af0 and Afr contain facts B( ), C( ), D( ), and G( ). Clearly, P f [A0 j= G( ) and the proof additionally involves rule r3. Thus M f = fr3; r4; r5; r6g. Theorem 3.5. M f f T .

Query Inseparability Positive existential queries constitute a much richer query language than facts as they allow for existentially quanti ed variables. Thus, the query inseparability requirement invariably leads to larger modules.

For instance, let T = T ex and = fA; Bg. Given the -ABox A = fA(a)g and -query q = 9y:B(y) we have T ex [A j= q (due to rule r1). The module M f is, however, empty. Indeed, the materialisation of P f [fA( )g consists of the additional facts R( ; cy1 ) and B(cy1 ) and hence contains no relevant fact mentioning only . Thus, M f [ A 6j= q and M f is not query inseparable from T ex.

Our example suggests that, although the critical ABox is constrained enough to embed every -ABox, we need to consider additional relevant facts to capture all proofs of a -query. In particular, rule r1 implies that B contains an instance whenever A does: a dependency that is then checked by q. This can be captured by considering fact B(cy1 ) as relevant, in which case r1 would be in the module.

More generally, we consider a module setting that di ers from f only in that all -facts (and not just those over ) are considered as relevant. De nition 3.6. Let constants qcy=i and be as in Def. 3.4. The setting q = h q; Aq0; Arqi is as follows: (i) f , (ii) Aq0 = Af0, and (iii) Arq consists of ? and all -facts A(a1; : : : ; an) with each aj either a constant cyi or . Theorem 3.7. M q q T .

Model Inseparability The modules generated by q may not be model inseparable from T . To see this, let T = T ex and = fA; D; Rg, in which case M q = fr1; r2g. The interpretation I where I = fa; bg, AI = fag, BI = CI = fbg, DI = ; and RI = f(a; b)g is a model of M q . However, I cannot be extended to a model of r3 (and hence of T ) without reinterpreting A, R or D.

To achieve model inseparability, it su ces to ensure that each model of the module can be extended to a model of T in a uniform way. Thus, M = fr1; r2; r3g is a m -module of T ex since all its models can be extended by interpreting E, F and G as the domain, H as empty, and S as the Cartesian product of the domain. We can capture this idea in our framework by means of the following setting. De nition 3.8. The setting m = h m; A0m; Armi is as follows: m maps each existentially quanti ed yj to the fresh constant , A0m = A0, and Arm = A0 [f?g. f m In our example, P m [ A0m entails the relevant facts A( ), R( ; ) and D( ), and hence M m = fr1; r2; r3g.

Theorem 3.9. M m

m T . 3.4

Modules for Ontology Classi cation Module extraction has been exploited for optimising ontology classi cation [ 1, 30, 4 ]. In this case, it is not only required that modules are implication inseparable from T , but also that they preserve all implications A(x) ! B(x) with A 2 but B 2= . This requirement can be captured as given next.

De nition 3.10. TBoxes T and T 0 are -classi cation inseparable (T if for each ' of the form A(x) ! where A 2 and either = ? or for B 2 Sig(T [ T 0), we have T j= ' i T 0 j= '.

c T 0) = B(x) Classi cation inseparability is a stronger requirement than implication inseparability. For T = fA(x) ! B(x)g and = fAg, M = ; is implication inseparable from T , whereas classi cation inseparability requires that M = T .

Modular reasoners such as MORe [ 1 ] and Chainsaw [30] rely on locality ?modules, which satisfy this requirement. Each model of a ?-module M can be extended to a model of T by interpreting all additional predicates as empty, which is not possible if A 2 and T entails A(x) ! B(x) but M does not. We can cast ?-modules in our framework with the following setting, which extends m in Def. 3.8 by also considering as relevant facts involving predicates not in . De nition 3.11. The setting b = h b; Ab0; Arbi is as follows: b = A0m, and Ar consists of ? and all facts A( ; : : : ; ) where A 2 Sig(T ). m, Ab0 = The use of ?-modules is, however, stricter than is needed for ontology classi cation. For instance, if we consider T = T ex and = fAg we have that M b contains all rules r1{r6, but since A does not have any subsumers in T ex the empty TBox is already classi cation inseparable from T ex.

The following module setting extends i in Def. 3.2 to ensure classi cation inseparability. As in the case of b in Def. 3.11, the only required modi cation is to also consider as relevant facts involving predicates outside . De nition 3.12. Setting c = ( c; Ac0; Acr) is as follows: c = i, Ac0 = Ai0, and Acr consists of ? and all facts B(c1A; : : : ; cnA) s.t. A 6= B are n-ary predicates, A 2 and B 2 Sig(T ).

Indeed, given T = T ex and Theorem 3.13. M c c T .

= fAg, the module for c is empty, as desired. 3.5

Additional Properties of Modules Although the essential property of a module M is that it captures all relevant -consequences of T , in some applications modules are expected to satisfy additional requirements. For ontology reuse, it is sometimes desirable that a module M does not \leave any relevant information behind" in the sense that T nM does not entail any relevant -consequence|a property known as depletingness [ 17 ].

De nition 3.14. Let

is depleting if T n M z be an inseparability relation. A z ;.

z -module M of T Note that not all modules are depleting: for some relevant -entailment ' we may have M j= ' (as required by the de nition of module), but also (T nM) j= ', in which case M is not depleting. The following theorem establishes that all modules de ned in Sect. 3.3 are depleting.

Theorem 3.15. M z is depleting for each z 2 fm; q; f; i; cg.

Another application of modules is to optimise the computation of justi cations: minimal subsets of a TBox that su ce to entail a given formula [ 14, 29 ]. De nition 3.16. Let T j= '. A justi cation for ' in T is a minimal subset T 0 T such that T 0 j= '.

Justi cations are displayed in ontology development platforms as explanations of why an entailment holds, and tools typically compute all of them. Extracting justi cations is a computationally intensive task, and locality-based modules have been used to reduce the size of the problem: if T 0 is a justi cation of ' in T , then T 0 is contained in a locality module of T for = Sig('). Our modules are also justi cation-preserving, and we can adjust our modules depending on what kind of rst-order sentence ' is.

Theorem 3.17. Let T 0 be a justi cation for a rst-order sentence ' in T and let Sig(') . Then, T 0 M m . Additionally, the following properties hold: (i) if ' is a rule, then T 0 M q ; (ii) if ' is datalog, then T 0 M f ; and (iii) if ' is of the form A(x) ! B(x) or A(x) ! ?, then T 0 M i ; nally, if ' satis es A 2 ; B 2 Sig(T ), then T 0 M c . 3.6

Complexity of Module Extraction We conclude this section by showing that our modules can be e ciently computed in most practically relevant cases.

Theorem 3.18. Let m be a non-negative integer and L a class of TBoxes s.t. each rule in a TBox from L has at most m distinct universally quanti ed variables. The following problem is tractable: given z 2 fq; f; i; cg, T 2 L, and r 2 T , decide whether r 2 M z . The problem is tractable for arbitrary L if z = m.

We now provide a proof sketch for this result. Checking whether a datalog program P and an ABox A entail a fact is feasible in O(jPj nv), with n the number of constants in P [ A and v the maximum number of variables in a rule from P [ 7 ]. Thus, although datalog reasoning is exponential in the size of v (and hence of P), it is tractable if v is bounded by a constant.

Given arbitrary T and , and for z 2 fm; q; f; i; cg, the datalog program P z can be computed in linear time in the size of jT j. The number of constants n in z (and hence in P z [Az0) is linearly bounded in jT j, whereas the maximum number of variables v coincides with the maximum number of universally quanti ed variables in a rule from T . As shown in [31], computing the support of a fact in a datalog program is no harder than fact entailment, and thus module extraction in our approach is feasible in O(jT j nv), i.e., tractable for ontology languages where rules have a bounded number of variables (as is the case for most DLs). Finally, for z = m the setting m involves a single constant and module extraction boils down to propositional datalog reasoning (a tractable problem regardless of T ). 3.7

Module Containment and Optimality Intuitively, the more expressive the language for which preservation of consequences is required the larger modules need to be. The following proposition shows that our modules are consistent with this intuition.

Proposition 3.19. M i

M f

M q

M m

M b ; M i

Finally, we ask ourselves whether each z with z 2 fq; f; i; m; cg is optimal for its inseparability relation in the sense that there is no setting producing smaller modules. To make such statements precise we need to consider families of module settings, i.e., functions that assign a module setting to each pair of T and . De nition 3.20. A setting family is a function that maps a TBox T and signature to a module setting for T and . Family is uniform if (i) for every and pair T ; T 0 with the same number of existentially quanti ed variables, (T ; ) and (T 0; ) are isomorphic; (ii) for every T and 0, (T ; ) is a restriction of (T ; 0) to . Let z 2 fi; f; q; m; cg; then is z-admissible if, for each T and , M (T ; ) is a z -module of T . Finally, is z-optimal if M (T ; ) M 0(T ; ) for every T , and every uniform 0 that is z-admissible. Uniformity ensures that settings do not depend on the speci c shape of rules in T , but rather only on and the number of existentially quanti ed variables in T . In turn, admissibility ensures that each setting yields a module. The (uniform and admissible) family z for each setting z in Sects. 3.3 and 3.4 is de ned in the obvious way: for each T and , z(T ; ) is the setting z for T and .

Theorem 3.21.

z is z-optimal for z 2 fi; m; cg.

In contrast, q and f are not optimal. To see this, let T = fA(x) ! B(x); B(x) ! A(x)g and = fAg. The empty TBox is fact inseparable from T since the only -consequence of T is the tautology A(x) ! A(x). However, M f = T since fact A(a) is in Afr and its support is included in the module. One can provide a family of settings that distinguishes tautological from nontautological inferences; however, this family yields settings of exponential size in jT j, which is undesirable in practice. 4

Evaluation

We have implemented a prototype system for module extraction, called PrisM, that uses RDFox [22] for datalog materialisation and exploits some code from PAGOdA [31] for computing the support of an entailed fact. We have evaluated PrisM on a set of ontologies identi ed by Glimm et al. [ 11 ] as non-trivial for reasoning. We have normalised all ontologies to make axioms equivalent to rules.1

We compared the size of our modules with the locality-based modules computed using the OWL API. We have followed the experimental methodology from [ 8 ] where two kinds of signatures are considered: genuine signatures corresponding to the signature of individual axioms, and random signatures. Unlike in [ 8 ], we extracted random signatures using a randomised graph sampling algorithm provided by RDFox. The advantage of this approach is that the resulting 1 The ontologies used in our experiments are available for download at https://krr-nas.cs.ox.ac.uk/2015/jair/PrisM/testOntologies.zip. signatures are \semantically connected", which is likely to be the case in practical applications. For each type of signature and ontology, we took a sample of 400 runs and averaged module sizes. Random signatures were obtained from samples of size 1=1000 the size of the original ontology. We present results for the 7 largest ontologies from [ 11 ] in terms of signature (this selection is representative for the behaviour on the smaller ontologies; see [ 3 ]). All experiments were performed on a server with 2 Intel Xeon E5-2670 processors and 90GB of allocated RAM, running RDFox on 16 threads.

Table 1 summarises our results. We compared ?-modules with the modules for c (Sect. 3.4) and >? -modules with those for m, q, f , and i (Sect. 3.3). We can see that module size consistently decreases as we consider weaker inseparability relations. In some cases, the modules for c are over 3 orders of magnitude smaller than ?-modules. The di erence between >? -modules and i modules can also be considerable, reaching 2 orders of magnitude. In fact, c, f , and i modules are sometimes empty, meaning that no two predicates in are in an implication relationship (which can happen for large ontologies and small ). Also note that our modules for model inseparability slightly improve on >? -modules. Finally, recall that our modules may not be minimal for their inseparability relation. Since techniques for extracting minimal modules are available only for model inseparability, and for restricted languages, we could not assess how close our modules are to minimal ones.

Computation times were generally higher for c than for the other settings due to the larger signature involved in computing modules for c. For random signatures, average times (over all settings) were 54s for GALEN, 2s for FLY, 5s for FMA, 13s for Gazetteer, 7s for Molecule, 32s for SNOMED, and 44s for NCI, which is considerably higher than for locality-based modules, but still acceptable for some applications. For genuine signatures, the times were accordingly lower.

It should be noted that PrisM currently relies on RDFox, which is not optimised for module extraction and is used in a black-box fashion. We conjecture that the performance of our approach can be considerably improved by dedicated systems. Another interesting task would be integrating our techniques in existing modular reasoners as well as in systems for justi cation extraction. Finally, the issue of optimality of our approach requires further investigation.

Acknowledgements

This work was supported by the Royal Society, the EPSRC projects MaSI3, Score!, DBOnto, and ED3, and by the EU FP7 project OPTIQUE. 19. Ludwig, M.: Just: a tool for computing justi cations w.r.t. ELH ontologies. In:

ORE. pp. 1{7 (2014) 20. Lutz, C., Wolter, F.: Deciding inseparability and conservative extensions in the description logic EL. J. Symb. Comput. 45(2), 194{228 (2010) 21. Marnette, B.: Generalized schema-mappings: From termination to tractability. In:

PODS. pp. 13{22 (2009) 22. Motik, B., Nenov, Y., Piro, R., Horrocks, I., Olteanu, D.: Parallel materialisation of datalog programs in centralised, main-memory RDF systems. In: AAAI. pp. 129{137 (2014) 23. Nortje, R., Britz, K., Meyer, T.: Reachability modules for the description logic

SRIQ. In: LPAR. pp. 636{652 (2013) 24. Sattler, U., Schneider, T., Zakharyaschev, M.: Which kind of module should I extract? In: DL (2009) 25. Seidenberg, J., Rector, A.L.: Web ontology segmentation: Analysis, classi cation and use. In: WWW. pp. 13{22 (2006) 26. Stefanoni, G., Motik, B., Horrocks, I.: Introducing nominals to the combined query answering approaches for EL. In: AAAI. pp. 1177{1183 (2013) 27. Stuckenschmidt, H., Parent, C., Spaccapietra, S. (eds.): Modular Ontologies: Concepts, Theories and Techniques for Knowledge Modularization, LNCS, vol. 5445.

Springer (2009) 28. Suntisrivaraporn, B.: Module extraction and incremental classi cation: A pragmatic approach for ontologies. In: ESWC. pp. 230{244 (2008) 29. Suntisrivaraporn, B., Qi, G., Ji, Q., Haase, P.: A modularization-based approach to nding all justi cations for OWL DL entailments. In: ASWC. pp. 1{15 (2008) 30. Tsarkov, D., Palmisano, I.: Chainsaw: A metareasoner for large ontologies. In: ORE (2012) 31. Zhou, Y., Nenov, Y., Cuenca Grau, B., Horrocks, I.: Pay-as-you-go OWL query answering using a triple store. In: AAAI. pp. 1142{1148 (2014)

Armas

Romero , A. , Cuenca Grau , B. , Horrocks , I.: MORe: Modular combination of OWL reasoners for ontology classi cation . In: ISWC . pp. 1 { 16 ( 2012 )

Armas

Romero , A. , Kaminski , M. ,

Cuenca

Grau , B. , Horrocks , I. : Ontology module extraction via datalog reasoning . In: AAAI . pp. 1410 { 1416 ( 2015 )

Armas

Romero , A. , Kaminski , M. ,

Cuenca

Grau , B. , Horrocks , I. : Module extraction in expressive ontology languages via datalog reasoning . J. Artif. Intell. Res . ( 2016 ), to appear

Cuenca

Grau , B. , Halaschek-Wiener , C. , Kazakov , Y. , Suntisrivaraporn , B. : Incremental classi cation of description logics ontologies . J. Autom. Reason . 44 ( 4 ), 337 { 369 ( 2010 )

Cuenca

Grau , B. , Horrocks , I. , Kazakov , Y. , Sattler , U. : Just the right amount: Extracting modules from ontologies . In: WWW . pp. 717 { 726 ( 2007 )

Cuenca

Grau , B. , Horrocks , I. , Kazakov , Y. , Sattler , U. : Modular reuse of ontologies: Theory and practice . J. Artif. Intell. Res . 31 , 273 { 318 ( 2008 )

7. Dantsin , E. , Eiter , T. , Gottlob , G. , Voronkov , A. : Complexity and expressive power of logic programming . ACM Comput. Surv . 33 ( 3 ), 374 { 425 ( 2001 )

Del

Vescovo , C. , Klinov , P. , Parsia , B. , Sattler , U. , Schneider , T. , Tsarkov , D. : Empirical study of logic-based modules: Cheap is cheerful . In: DL. pp. 144 { 155 ( 2013 )

Del

Vescovo , C. , Parsia , B. , Sattler , U. , Schneider , T. : The modular structure of an ontology: Atomic decomposition . In: IJCAI . pp. 2232 { 2237 ( 2011 )

10. Gatens , W. , Konev , B. , Wolter , F. : Lower and upper approximations for depleting modules of description logic ontologies . In: ECAI . pp. 345 { 350 ( 2014 )

11. Glimm , B. , Horrocks , I. , Motik , B. , Stoilos , G. , Wang , Z. : HermiT: An OWL 2 reasoner . J. Autom . Reason. 53 ( 3 ), 245 { 269 ( 2014 )

12. Jimenez-Ruiz , E. , Cuenca Grau , B. : LogMap: Logic-based and scalable ontology matching . In: ISWC . pp. 273 { 288 ( 2011 )

13. Jimenez-Ruiz , E. , Cuenca Grau , B. , Sattler , U. , Schneider , T. , Berlanga

Llavori

, R.: Safe and economic re-use of ontologies: A logic-based methodology and tool support . In: ESWC . pp. 185 { 199 ( 2008 )

14. Kalyanpur , A. , Parsia , B. , Horridge , M. , Sirin , E.: Finding all justi cations of OWL DL entailments . In: ISWC. pp. 267 { 280 ( 2007 )

15. Konev , B. , Lutz , C. , Walther , D. , Wolter , F. : Model-theoretic inseparability and modularity of description logic ontologies . Artif. Intell . 203 , 66 { 103 ( 2013 )

16. Kontchakov , R. , Lutz , C. , Toman , D. , Wolter , F. , Zakharyaschev , M.: The combined approach to ontology-based data access . In: IJCAI . pp. 2656 { 2661 ( 2011 )

17. Kontchakov , R. , Wolter , F. , Zakharyaschev , M. : Logic-based ontology comparison and module extraction, with an application to DL-Lite . Artif. Intell . 174 ( 15 ), 1093 { 1141 ( 2010 )

18. Krotzsch, M. , Rudolph , S. , Hitzler , P. : ELP: Tractable rules for OWL 2 . In: ISWC. pp. 649 { 664 ( 2008 )