=Paper=
{{Paper
|id=None
|storemode=property
|title=Syntactic vs. Semantic Locality: How Good Is a Cheap Approximation?
|pdfUrl=https://ceur-ws.org/Vol-875/regular_paper_4.pdf
|volume=Vol-875
|dblpUrl=https://dblp.org/rec/conf/womo/VescovoKPS0T12
}}
==Syntactic vs. Semantic Locality: How Good Is a Cheap Approximation?==
Chiara Del Vescovo<sup>1</sup>, Pavel Klinov<sup>2</sup>, Bijan Parsia<sup>1</sup>, Uli Sattler<sup>1</sup>, Thomas Schneider<sup>3</sup>, and Dmitry Tsarkov<sup>1</sup>

<sup>1</sup> University of Manchester, UK, {delvescc,bparsia,sattler,tsarkov}@cs.man.ac.uk

<sup>2</sup> University of Ulm, Germany, pavel.klinov@uni-ulm.de

<sup>3</sup> Universität Bremen, Germany, tschneider@informatik.uni-bremen.de

===Abstract===
Extracting a subset of a given OWL ontology that captures all the ontology’s knowledge about a specified set of terms is a well-understood task. This task can be based, for instance, on locality-based modules (LBMs). These come in two flavours, syntactic and semantic, and a syntactic LBM is known to contain the corresponding semantic LBM. For syntactic LBMs, polynomial extraction algorithms are known, implemented in the OWL API, and being used. In contrast, extracting semantic LBMs involves reasoning, which is intractable for OWL 2 DL, and these algorithms had not been implemented yet for expressive ontology languages. We present the first implementation of semantic LBMs and report on experiments that compare them with syntactic LBMs extracted from real-life ontologies. Our study reveals whether semantic LBMs are worth the additional extraction effort, compared with syntactic LBMs.

===1 Introduction===
Extracting a subset of a given OWL ontology that captures all the ontology’s knowledge about a specified set of concept and role names is an interesting task for various applications, and it is by now well-understood [2,10,11]. In general, we consider a setting where, for a given signature, we want to determine a (small) subset of a given ontology such that any axiom over the signature entailed by the ontology is also entailed by the subset. For expressive logics, this task can be implemented by making use of the notion of locality, and results in what is known as locality-based modules (LBMs) [2].
Locality comes in many different flavours; in particular, there are notions of syntactic and semantic locality. A syntactic LBM is known to contain the corresponding semantic LBM, but might also contain extra axioms which are, because they are not in the semantic LBM, superfluous for entailments over the given signature. Algorithms for the extraction of syntactic LBMs are known that run in time polynomial in the size of the ontology (thus much cheaper than reasoning); they are implemented in the OWL API and are in use. In contrast, although algorithms for extracting semantic LBMs are known, to the best of our knowledge they had not yet been implemented. Moreover, these involve entailment checking, and are thus intractable for expressive profiles of OWL 2.

We present the first implementation of semantic LBMs and report on experiments that compare them with syntactic LBMs extracted from real-life ontologies. The contributions of this paper are as follows: we show with statistical significance that, for almost all members of a large corpus of existing ontologies, there is no difference between any syntactic LBM and its corresponding semantic LBM. In the few cases where differences occur, these differences are modest and not worth the increased computation time needed to compute semantic LBMs. In addition, we isolate two types of axioms that lead to differences, where one is a simple tautology that can, in principle, be detected by a straightforward addition to the syntactic locality checker. Furthermore, our results show that the extraction of semantic LBMs, which is in principle hard, seems feasible in practice. The lesson we learn from these results is that “Cheap is Great”!

===2 Preliminaries===
We assume the reader to be familiar with OWL and the underlying description logic SROIQ [1,8], and will define the central notions around locality-based modularity [2].
Let N<sub>C</sub> be a set of concept names, and N<sub>R</sub> a set of role names. A signature Σ is a set of terms, i.e., a set Σ ⊆ N<sub>C</sub> ∪ N<sub>R</sub> of concept and role names. We can think of a signature as specifying a topic of interest. Axioms that only use terms from Σ can be thought of as “on-topic”, and all other axioms as “off-topic”. For instance, if Σ = {Animal, Duck, Grass, eats}, then Duck ⊑ ∃eats.Grass is on-topic, while Duck ⊑ Bird is off-topic.

Any concept, role, or axiom that uses only terms from Σ is called a Σ-concept, Σ-role, or Σ-axiom. Given any such object X, we call the set of terms in X the signature of X and denote it with X̃. Given an interpretation I, we denote its restriction to the terms in a signature Σ with I|<sub>Σ</sub>. Two interpretations I and J are said to coincide on a signature Σ, in symbols I|<sub>Σ</sub> = J|<sub>Σ</sub>, if ∆<sup>I</sup> = ∆<sup>J</sup> and X<sup>I</sup> = X<sup>J</sup> for all X ∈ Σ.

There are a number of variants of the notion of conservative extensions, which capture the desired preservation of knowledge to different degrees. We focus on the deductive variant.

Definition 1. Let M ⊆ O be SROIQ-ontologies and Σ a signature.

(1) O is a deductive Σ-conservative extension (Σ-dCE) of M if, for all SROIQ-axioms α with α̃ ⊆ Σ, it holds that M ⊨ α if and only if O ⊨ α.

(2) M is a dCE-based module for Σ of O if O is a Σ-dCE of M.

Unfortunately, deciding in general if a set of axioms is a module in this sense is hard or even impossible for expressive DLs [6,12], and finding a minimal one is even more so. However, “good sized” modules that are efficiently computable have been introduced [2]. They are based on the locality of single axioms, which means that, given Σ, the axiom can always be satisfied independently of the interpretation of the Σ-terms, but in a restricted way: by interpreting all non-Σ terms either as the empty set (∅-locality) or as the full domain<sup>4</sup> (∆-locality).

Definition 2. A SROIQ-axiom α is called ∅-local (∆-local) w.r.t.
signature Σ if, for each interpretation I, there exists an interpretation J such that I|<sub>Σ</sub> = J|<sub>Σ</sub>, J ⊨ α, and for each X ∈ α̃ \ Σ, X<sup>J</sup> = ∅ (for each C ∈ α̃ \ Σ, C<sup>J</sup> = ∆<sup>J</sup> and for each R ∈ α̃ \ Σ, R<sup>J</sup> = ∆<sup>J</sup> × ∆<sup>J</sup>).

It has been shown in [2] that M ⊆ O and all axioms in O \ M being ∅-local (or all axioms being ∆-local) w.r.t. Σ ∪ M̃ is sufficient for O to be a Σ-dCE of M. The converse does not hold: e.g., the axiom A ≡ B is neither ∅- nor ∆-local w.r.t. {A}, but the ontology {A ≡ B} is an {A}-dCE of the empty ontology.

Furthermore, locality can be tested using available DL-reasoners [2], which makes this problem considerably easier than testing conservativity. However, reasoning in expressive DLs is still complex, e.g. N2ExpTime-complete for SROIQ [9]. In order to achieve tractable module extraction, a syntactic approximation of locality has been introduced in [2]. The following definition captures only the case of SHQ-TBoxes and can straightforwardly be extended to SROIQ ontologies.

Definition 3. An axiom α is called syntactically ⊥-local (⊤-local) w.r.t. signature Σ if it is of the form C<sup>⊥</sup> ⊑ C, C ⊑ C<sup>⊤</sup>, C<sup>⊥</sup> ≡ C<sup>⊥</sup>, C<sup>⊤</sup> ≡ C<sup>⊤</sup>, R<sup>⊥</sup> ⊑ R (R ⊑ R<sup>⊤</sup>), or Trans(R<sup>⊥</sup>) (Trans(R<sup>⊤</sup>)), where C is an arbitrary concept, R is an arbitrary role name, R<sup>⊥</sup> ∉ Σ (R<sup>⊤</sup> ∉ Σ), and C<sup>⊥</sup> and C<sup>⊤</sup> are from Bot(Σ) and Top(Σ) as defined in Part (a) (resp. (b)) of the table below.

(a) ⊥-Locality. Let A<sup>⊥</sup>, R<sup>⊥</sup> ∉ Σ, C<sup>⊥</sup> ∈ Bot(Σ), C<sup>⊤</sup><sub>(i)</sub> ∈ Top(Σ), n̄ ∈ ℕ \ {0}.

Bot(Σ) ::= A<sup>⊥</sup> | ⊥ | ¬C<sup>⊤</sup> | C ⊓ C<sup>⊥</sup> | C<sup>⊥</sup> ⊓ C | ∃R.C<sup>⊥</sup> | ≥n̄ R.C<sup>⊥</sup> | ∃R<sup>⊥</sup>.C | ≥n̄ R<sup>⊥</sup>.C

Top(Σ) ::= ⊤ | ¬C<sup>⊥</sup> | C<sup>⊤</sup><sub>1</sub> ⊓ C<sup>⊤</sup><sub>2</sub> | ≥0 R.C

(b) ⊤-Locality. Let A<sup>⊤</sup>, R<sup>⊤</sup> ∉ Σ, C<sup>⊥</sup> ∈ Bot(Σ), C<sup>⊤</sup><sub>(i)</sub> ∈ Top(Σ), n̄ ∈ ℕ \ {0}.

Bot(Σ) ::= ⊥ | ¬C<sup>⊤</sup> | C ⊓ C<sup>⊥</sup> | C<sup>⊥</sup> ⊓ C | ∃R.C<sup>⊥</sup> | ≥n̄ R.C<sup>⊥</sup>

Top(Σ) ::= A<sup>⊤</sup> | ⊤ | ¬C<sup>⊥</sup> | C<sup>⊤</sup><sub>1</sub> ⊓ C<sup>⊤</sup><sub>2</sub> | ∃R<sup>⊤</sup>.C<sup>⊤</sup> | ≥n̄ R<sup>⊤</sup>.C<sup>⊤</sup> | ≥0 R.C

It has been shown in [2] that ⊥-locality (⊤-locality) of an axiom α w.r.t. Σ implies ∅-locality (∆-locality) of α w.r.t. Σ. Therefore, all axioms in O \ M being ⊥-local (or all axioms being ⊤-local) w.r.t.
Σ ∪ M̃ is sufficient for O to be a Σ-dCE of M. The converse does not hold; examples can be found in [2].

For each of the four locality notions, modules of O are obtained by starting with an empty set of axioms and subsequently adding axioms from O that are Σ-non-local. In order for this procedure to be correct, the signature against which locality is checked has to be extended with the terms in the axioms that are added in each step, so that the resulting module M consists of all the non-local axioms with respect to Σ ∪ M̃. Definition 4 (1) introduces locality-based modules, which are always dCE-based modules [2], although not necessarily minimal ones.

<sup>4</sup> Or, in the case of roles, the set of all pairs of domain elements.

Modules based on syntactic (semantic) locality can be made smaller by iteratively nesting ⊤- and ⊥-extraction (∆- and ∅-extraction), and the result is still a dCE-based module [2,13]. These so-called ⊤⊥<sup>∗</sup>-modules (∆∅<sup>∗</sup>-modules) are introduced in Definition 4 (3).

Definition 4. Let x ∈ {∅, ∆, ⊥, ⊤}, yz ∈ {⊤⊥, ∆∅}, O an ontology and Σ a signature.

(1) An ontology M is the x-module of O w.r.t. Σ if it is the output of Algorithm 1. We write M = x-mod(Σ, O).

(2) An ontology M is the yz-module of O w.r.t. Σ, written M = yz-mod(Σ, O), if M = y-mod(Σ, z-mod(Σ, O)).

(3) Let (M<sub>i</sub>)<sub>i≥0</sub> be a sequence of ontologies such that M<sub>0</sub> = O and M<sub>i+1</sub> = yz-mod(Σ, M<sub>i</sub>) for every i ≥ 0. For the smallest n ≥ 0 with M<sub>n</sub> = M<sub>n+1</sub>, we call M<sub>n</sub> the yz<sup>∗</sup>-module of O w.r.t. Σ, written M = yz<sup>∗</sup>-mod(Σ, O).

Algorithm 1. Extract a locality-based module

: Input: ontology O, signature Σ, x ∈ {∅, ∆, ⊥, ⊤}
: Output: x-module M of O w.r.t. Σ
: M ← ∅; O′ ← O
: repeat
:: changed ← false
:: for all α ∈ O′ do
::: if α is not x-local w.r.t. Σ ∪ M̃ then
:::: M ← M ∪ {α}; O′ ← O′ \ {α}; changed ← true
: until changed = false
: return M

As for (1), it has been shown in [2] that the output M of Algorithm 1 does not depend on the order in which the axioms α are selected.<sup>5</sup> Furthermore, the integer n in (3) exists because the sequence (M<sub>i</sub>)<sub>i≥0</sub> is decreasing (more precisely, we have M<sub>0</sub> ⊃ · · · ⊃ M<sub>n</sub> = M<sub>n+1</sub> = . . . ). Due to monotonicity properties of locality-based modules, the dual notions of ⊥⊤<sup>∗</sup>- and ∅∆<sup>∗</sup>-modules are uninteresting because they coincide with those of ⊤⊥<sup>∗</sup>- and ∆∅<sup>∗</sup>-modules.

Roughly speaking, a ∆- or ⊤-module for Σ gives a view from above because it contains all subconcepts of concept names in Σ, while a ∅- or ⊥-module for Σ gives a view from below since it contains all superconcepts of concept names in Σ.

Modulo the locality check, Algorithm 1 runs in time cubic in |O| + |Σ| [2]. Modules based on ⊥/⊤-locality are therefore a feasible approximation for modules based on ∅/∆-locality. In both cases, modules are extracted axiom by axiom but, as said above, the ∅/∆-locality check is more complex. A module extractor is implemented in the OWL API<sup>6</sup> and SSWAP<sup>7</sup>.

<sup>5</sup> Our algorithm is a special case of the one in [2, Figure 4].

To summarize:

1. Given an ontology O, the semantic module M<sup>sem</sup><sub>Σ</sub> for a signature Σ is contained in the corresponding syntactic module M<sup>syn</sup><sub>Σ</sub> for the same seed signature.<sup>8</sup> This means that, in principle, syntactic modules can contain more axioms that are unnecessary for preserving entailments over Σ than semantic modules do.

2. The extraction of a syntactic module can be done in polynomial time w.r.t. the size of the ontology O. In contrast, the extraction of a semantic module is as hard as reasoning.

===3 Experimental design===
The main aim of this paper is to investigate how well syntactic locality approximates semantic locality.
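Because the comparison rests on swapping the locality check inside the extraction loop of Algorithm 1, that loop can be illustrated with a minimal Python sketch. This is not the OWL API implementation. It assumes a toy ontology consisting only of atomic subsumptions A ⊑ B between concept names, for which syntactic ⊥-locality in the sense of Definition 3 reduces to A ∉ Σ. The names `extract_module`, `bot_local_atomic` and `signature` are hypothetical and chosen for this illustration only.

```python
from typing import Callable, Iterable, Set, Tuple

# Assumption: an axiom is an atomic subsumption (sub, sup), read as sub ⊑ sup.
Axiom = Tuple[str, str]


def signature(axiom: Axiom) -> Set[str]:
    """Terms occurring in the axiom (its signature)."""
    return set(axiom)


def bot_local_atomic(axiom: Axiom, sigma: Set[str]) -> bool:
    """Toy syntactic ⊥-locality check for A ⊑ B: local iff A is not in sigma."""
    sub, _ = axiom
    return sub not in sigma


def extract_module(
    ontology: Iterable[Axiom],
    seed: Iterable[str],
    is_local: Callable[[Axiom, Set[str]], bool],
) -> Set[Axiom]:
    """Loop of Algorithm 1 with a pluggable locality check."""
    module: Set[Axiom] = set()
    sigma: Set[str] = set(seed)        # plays the role of Σ ∪ M̃
    remaining: Set[Axiom] = set(ontology)
    changed = True
    while changed:
        changed = False
        # sorted() copies the set, so removing axioms inside the loop is safe
        for axiom in sorted(remaining):
            if not is_local(axiom, sigma):
                module.add(axiom)
                sigma |= signature(axiom)   # extend the signature
                remaining.discard(axiom)
                changed = True
    return module


toy_ontology = [("Duck", "Bird"), ("Bird", "Animal"), ("Grass", "Plant")]
toy_module = extract_module(toy_ontology, {"Duck"}, bot_local_atomic)
print(sorted(toy_module))  # [('Bird', 'Animal'), ('Duck', 'Bird')]
```

In this toy setting the ⊥-module for {Duck} collects the superconcept chain of Duck, which matches the "view from below" reading given above. A semantic variant would pass a reasoner-backed check as `is_local` and leave the loop unchanged.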
In particular, we want to see how (un)likely it is that syntactic locality-based modules are larger than semantic locality-based ones and how large these differences are. We also want to understand empirically how much more costly semantic locality is in terms of performance.

Selection of the Corpus. For our experiments, we have built a corpus containing: (1) from the TONES repository,<sup>9</sup> those ontologies that have already been studied in a previous work on modularity [4]: Koala, Mereology, University, People, miniTambis, OWL-S, Tambis, Galen; (2) all ontologies from the NCBO BioPortal ontology repository.<sup>10</sup>

We then filter out all those ontologies for which at least one of the following problems occurs: the ontology is impossible to download; the .owl file is corrupted when downloaded; the file is not parseable; the ontology is inconsistent. Furthermore, due to time constraints, we exclude from this preliminary investigation all ontologies whose size exceeds 10,000 axioms.

This selection results in a corpus of 156 ontologies, which greatly differ in size and expressivity [7], as summarized in Table 1. For a full list of the corpus, please refer to the technical report: http://arxiv.org/abs/1207.1641

{| class="wikitable"
|+ Table 1. Ontology corpus
! Repository !! Range of expressivity !! Range #axs. !! Range sig. size
|-
| BioPortal || ALCN – SHIN(D)/SOIN(D) || 38–4,735 || 21–3,161
|-
| TONES || AL – SROIF(D)/SHOIQ(D) || 13–9,629 || 14–9,221
|}

<sup>6</sup> http://owlapi.sourceforge.net

<sup>7</sup> http://sswap.info

<sup>8</sup> Recall that ⊥-syntactic modules approximate ∅-semantic modules, while ⊤-syntactic modules approximate ∆-semantic modules.

<sup>9</sup> http://owl.cs.manchester.ac.uk/repository/

<sup>10</sup> http://bioportal.bioontology.org

Comparing Syntactic and Semantic Locality. In order to compare syntactic and semantic locality, we want to understand:

1.
whether, for a given seed signature Σ, the semantic Σ-module is likely to be smaller than the syntactic Σ-module, and if so by how much,<sup>11</sup>

2. how feasible the extraction of semantic modules is.

Here, we focus on the two corresponding notions of ∅-semantic locality and ⊥-syntactic locality. In particular, ⊥-syntactic locality has been thoroughly investigated in previous work [3], and it has proven to have many interesting properties. A completion of the investigation described in this paper for all fundamental notions of modules is planned in our future work. Due to the recursive nature of the locality-based module extraction algorithm, we want to investigate locality both on a

– per-axiom basis: given an axiom α and a signature Σ, is it likely that α is semantically ∅-local w.r.t. Σ but not syntactically ⊥-local w.r.t. Σ?

– per-module basis: given a signature Σ, is it likely that ⊥-mod(Σ, O) ≠ ∅-mod(Σ, O)? If yes, is it likely that the difference is large?

Hence we need to pick, for each ontology in our corpus, a suitable set of signatures, and this poses a significant problem. First, we do not yet have enough insight into what typical seed signatures are for module extraction. One could assume that large ones are rarely relevant for module extraction (why bother with extracting a large module?), but this still leaves a large, i.e., exponential, space of possible seed signatures. If m = #Õ, there are 2<sup>m</sup> possible seed signatures for which axioms can be tested for locality and for which modules can be extracted. Hence a full investigation is infeasible.

One could assume that the comparison between semantic and syntactic modules could be easier since many signatures can lead to the same module. In other words, the statistically significant number of modules w.r.t. the total number of modules is not larger than that of seed signatures needed w.r.t. the total number of seed signatures.
In previous work [4,5], however, modules have been studied with respect to how numerous they are in real-world ontologies. The experiments carried out suggest that the number of modules in ontologies is, in general, exponential w.r.t. the size of the ontology. Moreover, the extraction of enough different modules can be hard, because by looking just at seed signatures there is no way to avoid extracting the same module many times. In particular, for a module M there can be exponentially many seed signatures w.r.t. #M̃ that generate M [3].

As a consequence, we compare the two kinds of locality, both on a per-axiom basis and on a per-module basis, w.r.t. random signatures. To avoid any bias, we select a random signature as follows: we set each named entity E in the ontology to have probability p = 1/2 of being included in the signature. Thus each seed signature has the same probability of being chosen. For ontologies whose signature exceeds 9 entities, in order to get results where the true proportion of differences between the two notions of locality lies in the confidence interval (±5%) with confidence level 95%, we have to select only 400 random signatures [14]. That is, we need to test only 400 random signatures to have a confidence of 95% (±5%) that the differences/equalities we observe reflect the real ones.

<sup>11</sup> Recall that the semantic Σ-module is always a subset of the syntactic Σ-module.

Non-random seed signatures. A module, in general, does not necessarily show any internal coherence: intuitively, if we had an ontology describing some knowledge from both the domains of Geology and of Philosophy, we could still extract the module for the signature Σ = {Epistemology, Mineral}. This module is likely to be the union of the two disjoint modules for Σ<sub>1</sub> = {Epistemology} and Σ<sub>2</sub> = {Mineral}.
This combinatorial behaviour can lead to exponentially many modules in the size of the signature of the ontology and indeed, as mentioned above, the number of modules in ontologies seems to be exponential [4,5].

In contrast to general modules, genuine modules can be called coherent: they are defined as those modules that cannot be decomposed into the union of two different modules. Notably, there are only linearly many genuine modules in the size of the ontology O, and the set of genuine modules is a base for all general modules: any module is either genuine or the union of genuine modules. The linear bound on the number of genuine modules is due to the fact that, for each genuine x-module M, there is an axiom α such that M = x-mod(α̃, O). Thus genuine modules can be said to be interesting modules that we can fully investigate.

Hence, in addition to the above-mentioned investigation of ⊥- and ∅-modules for random signatures, we also look at all axiom signatures. In summary, we test:

(T1) for random seed signatures Σ, (a) for each axiom α in our corpus, is α semantically ∅-local w.r.t. Σ but not syntactically ⊥-local w.r.t. Σ? (b) is ⊥-mod(Σ, O) ≠ ∅-mod(Σ, O)? If yes, we determine the difference and its size.

(T2) for each axiom signature from our corpus, is ⊥-mod(α̃, O) ≠ ∅-mod(α̃, O)? If yes, we determine the difference and its size.

===4 Experimental comparison===
No differences. The main result of the experiment is that, for 151 of the 156 ontologies we tested, no difference between ⊥- and ∅-locality can be observed. These 151 ontologies exclude the two NCBO BioPortal ontologies EFO (Experimental Factor Ontology) and SWO (Software Ontology), as well as Koala, miniTambis, and Tambis. More specifically, for every generated seed signature, the corresponding ⊥- and ∅-module agree, and every axiom is either both ⊥- and ∅-local, or neither.
This statement applies to all randomly generated seed signatures as well as to all axiom signatures, which are seed signatures for all genuine modules. We can therefore draw the following conclusions for the 151 ontologies with respect to (T1) and (T2) above.

(T1) Given an arbitrary seed signature Σ, there is no difference (a) between ⊥- and ∅-locality of any given axiom w.r.t. Σ and (b) between the ⊥- and ∅-modules for Σ, both times at a significance level of 0.05.

(T2) Given any axiom signature Σ, there is no difference between the ⊥- and ∅-modules for Σ.

In the case of the 151 ontologies, the extraction of a ∅-module (with tautology tests performed by FaCT++) often took considerably longer than the extraction of the corresponding ⊥-module. For example, for MoleculeRole, the largest of the 151 ontologies, times to extract a ⊥-module (test all axioms for ⊥-locality, respectively) ranged between 27 and 169 ms (21 and 77 ms, respectively), while the extraction of a ∅-module (test of all axioms for ∅-locality, resp.) took up to 6× as long, on average 2.7× (2.0×, resp.).

It is also worth noting that the ontologies Galen and People, which are renowned for having particularly large ⊥-modules [2,5], are among those without differences between ⊥- and ∅-locality.

Differences. For the five ontologies where differences between ⊥- and ∅-modules (or -locality) occur, we isolated two types of culprits, i.e., axioms which are not ⊥-local w.r.t. some signature Σ, but which are ∅-local w.r.t. Σ. Type-1 culprits are simple tautologies that have accidentally entered the “inferred view” (i.e., closure under certain entailments) of two ontologies. They do not occur in the original “asserted” versions and can, in principle, be detected by a slightly refined syntactic locality check. Type-2 culprits are definitions of concept names via a conjunction that satisfies certain conditions explained below.
There are not many type-1 and type-2 axioms in the affected ontologies, and the observed differences are comparatively small. Table 2 gives an overview of the differences observed.

{| class="wikitable"
|+ Table 2. Overview of the differences observed
! Ontology !! #axs !! Test !! #differences !! Difference size (#axs) !! Difference size (rel.) !! Avg. time ratio !! Culprit type and frequency
|-
| SWO || 3446 || T1 a || 400 || 6–22 || 0–1% || 3.31 || 1 (30×)
|-
| || || T1 b || 400 || 23–29 || 1–2% || 5.11 ||
|-
| || || T2 || 3446 || 3–1 || 1–5% || 5.86 ||
|-
| EFO || 6008 || T1 a || 400 || 8–24 || 0–1% || 1.42 || 1 (32×)
|-
| || || T1 b || 400 || 13–30 || 0–1% || 1.38 ||
|-
| || || T2 || 128 || 1–4 || 9–17% || n/a ||
|-
| Koala || 42 || T1 a || 0 || 0 || 0% || n/a || 2 (1×)
|-
| || || T1 b || 2 || 1 || 3% || n/a ||
|-
| || || T2 || 0 || 0 || 0% || n/a ||
|-
| miniTambis || 170 || T1 a || 68 || 1–2 || 1–3% || n/a || 2 (3×)
|-
| || || T1 b || 93 || 1–4 || 1–3% || n/a ||
|-
| || || T2 || 26 || 1–7 || 6–75% || n/a ||
|-
| Tambis || 592 || T1 a || 58 || 1–3 || 0–1% || 3.31 || 2 (11×)
|-
| || || T1 b || 229 || 2–11 || 0–2% || 5.01 ||
|-
| || || T2 || 191 || 4–41 || 2–26% || n/a ||
|}

The columns of Table 2 show: the ontology name; the overall number of axioms; the name of the test (see the list (T1), (T2) in Section 3); the number of cases with differences; the number of axioms in the differences (absolute and relative to the ⊥-case); the average time ratio ∅ : ⊥ (“n/a” indicates that no reliable statement is possible: the time for ⊥ is only a few, often 0, milliseconds); the type of culprit present and the number of axioms of this type.

Type-1 culprits are axioms InverseObjectProperties(P, InverseOf(P)), where P is a role. This translates into the tautology P ≡ (P<sup>−</sup>)<sup>−</sup> in DL notation. Such an axiom is therefore ∅-local w.r.t. any signature. However, it behaves differently for ⊥-locality: if the signature Σ contains P, then both sides of the equation are neither in Bot(Σ) nor in Top(Σ), hence the axiom is considered non-local; otherwise, both sides are ⊥-equivalent, hence the axiom is local. Type-1 axioms occur in the “inferred view” of the ontologies EFO and SWO. Table 2 shows the relatively modest differences caused by these axioms. In all cases, there are no other axioms in the differences. This means that no differences occur for the non-inferred original versions of EFO and SWO.

Type-2 culprits are complex definitions A ≡ C of a concept name A where C is a conjunction that contains both a universal and an existential (or minimum cardinality) restriction on the same role. This affects the ontologies Koala, miniTambis, and Tambis. The effect is best illustrated for Koala, which contains exactly one such axiom, namely M ≡ S ⊓ ∀c.F ⊓ ∀g.{m} ⊓ =3 c.⊤, where we have abbreviated the concept names MaleStudentWith3Daughters, Student, Female, the roles hasChildren, hasGender, and the nominal male. Now if the signature against which the axiom is tested for locality contains {S, c, g} but neither M nor F, then this axiom is not ⊥-local because none of the conjuncts on the right-hand side is in Bot(Σ). On the other hand, this axiom is a tautology when M and F are replaced by ⊥: the conjunction ∀c.⊥ ⊓ =3 c.⊤ cannot have any instances, regardless of how c is interpreted.

For Koala, this effect only causes two singleton differences between sets of local axioms for the randomly generated seed signatures, as shown in Table 2. For axiom signatures, there is no difference. Interestingly, this effect does not propagate to modules: for all signatures, ⊥- and ∅-modules are the same. The reason might be that (a) g is used in many axioms and is thus very likely to contribute to the extended signature during module extraction, and (b) then the axiom defining F is no longer local, which “pulls” F into the extended signature, preventing the observed effect.

In miniTambis and Tambis, this effect is much stronger and affects a large proportion of modules, as shown in Table 2.
The differences in these cases do not only consist of culprit axioms, but also of axioms that become non-local after the signature has been extended by the terms in the culprit axioms. Still, the size of the differences is mostly modest while, for Tambis, the ∅-locality test (∅-module extraction) takes on average over three times (five times) as long as the ⊥-locality test (⊥-module extraction).

===5 Conclusion and Outlook===
Summary. We obtain two main observations from the experiments carried out.

– In practice, there is no or little difference between semantic and syntactic locality. That is, the computationally cheaper syntactic locality is a good approximation of semantic locality.

– Though in principle hard to compute, semantic modules can be extracted rather fast in practice.

These results make it questionable whether semantic locality should be preferred to syntactic locality. In terms of computation time, there is often a benefit in using syntactic locality: the average speed-up over the extraction of a semantic-locality-based module is a factor of up to 6. For some particular module pairs, it is higher by an order of magnitude. The gain in module size is zero or so small that it is hard to justify the extra time spent. In particular, there is no gain in size for the ontologies Galen and People, which are “renowned” for having disproportionately large modules [2,5].

Our results are interesting not only because they provide an evaluation of how well the cheap syntactic locality approximates semantic locality, but also because they enabled us to fix bugs in the implementation of syntactic modularity. For example, earlier data from the experiment showed that reflexivity axioms had been treated incorrectly by the syntactic locality checker.

Future Work. It is evident that this work is preliminary.
It investigates only the differences between the related notions of ⊥- and ∅-locality. We plan to extend the same study to other notions of locality, in particular, nested modules (⊤⊥<sup>∗</sup>- vs. ∆∅<sup>∗</sup>-modules), since these notions are the most economical in terms of module size. Moreover, we want to extend the investigation to the remaining larger ontologies in the BioPortal repository and further large ontologies, e.g., some versions of the NCI Thesaurus<sup>12</sup>. Preliminary results with a version that is not among the regular releases show differences due to type-2 culprits, but we have not included them here because the differences disappear after removing axioms that were introduced due to a problem with object and annotation properties when the ontology file is parsed by the OWL API. This behaviour is yet to be investigated and explained.

<sup>12</sup> Downloadable from http://evs.nci.nih.gov/ftp1/NCI_Thesaurus

Another interesting extension is to modify the seed signature sampling. Currently, the random variable “size of the seed signature generated” follows the binomial distribution with expected value m/2 and variance m/4. Hence, most signatures in the sample have size around m/2; small and large signatures are underrepresented. For example, for one ontology with 915 terms, all signature sizes lay between 422 and 509. One might argue that, for big ontologies, the typical module extraction scenario does not require large seed signatures, but it does sometimes require relatively small seed signatures, for example, when a module is extracted to efficiently answer a given entailment query of typically small size. On the other hand, large modules resulting from larger seed signatures may be more likely to differ.
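The concentration of signature sizes around m/2 can be checked with a short Python sketch. It assumes the sampling rule described in Section 3 (each term is included independently with probability p = 1/2); the function names `sample_signature` and `size_statistics` are hypothetical, and the figures 915, 422 and 509 are the ones quoted above.

```python
import math
import random
from typing import Iterable, Optional, Set, Tuple


def sample_signature(
    terms: Iterable[str], p: float = 0.5, rng: Optional[random.Random] = None
) -> Set[str]:
    """Include each term independently with probability p."""
    if rng is None:
        rng = random.Random()
    return {t for t in terms if rng.random() < p}


def size_statistics(m: int, p: float = 0.5) -> Tuple[float, float]:
    """Mean and standard deviation of a Binomial(m, p) signature size."""
    mean = m * p
    std = math.sqrt(m * p * (1 - p))
    return mean, std


m = 915                        # number of terms quoted in the text
mean, std = size_statistics(m)
low_z = (422 - mean) / std     # smallest observed size, in standard deviations
high_z = (509 - mean) / std    # largest observed size, in standard deviations
print(f"mean={mean}, std={std:.2f}, low_z={low_z:.2f}, high_z={high_z:.2f}")
```

With m = 915 the mean is 457.5 and the standard deviation is about 15.1, so the reported range 422 to 509 lies within roughly 2.3 standard deviations below and 3.4 above the mean. Signatures with only a handful of terms are therefore practically never drawn under this scheme.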
We therefore plan an alternative seed signature sampling via bins for average signature sizes: repeat the current sampling procedure scaled to several subintervals of the range of possible signature sizes.

Our current results answer the question whether there is a significant difference between the two locality notions with respect to a given signature. It is also interesting to ask the same question relative to a given module. To answer it, the sampling of modules instead of seed signatures requires further investigation.

Acknowledgment. We thank Rafael Gonçalves and the anonymous reviewers for helpful comments.

===References===
1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.F. (eds.): The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press (2003)

2. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontologies: Theory and practice. J. of Artif. Intell. Research 31, 273–318 (2008)

3. Del Vescovo, C., Gessler, D., Klinov, P., Parsia, B., Sattler, U., Schneider, T., Winget, A.: Decomposition and Modular Structure of BioPortal Ontologies. In: Proc. ISWC-11 (2011)

4. Del Vescovo, C., Parsia, B., Sattler, U., Schneider, T.: The modular structure of an ontology: an empirical study. In: Proc. of WoMO-10. Frontiers in AI and Appl., vol. 211, pp. 11–24. IOS Press (2010)

5. Del Vescovo, C., Parsia, B., Sattler, U., Schneider, T.: The modular structure of an ontology: atomic decomposition and module count. In: Proc. of WoMO-11. Frontiers in AI and Appl., vol. 230, pp. 25–39. IOS Press (2011)

6. Ghilardi, S., Lutz, C., Wolter, F.: Did I damage my ontology? A case for conservative extensions in description logics. In: Proc. of KR-06. pp. 187–197 (2006)

7. Horridge, M., Parsia, B., Sattler, U.: The state of bio-medical ontologies. In: Proc. of 2011 ISMB Bio-Ontologies SIG (2011)

8. Horrocks, I., Kutz, O., Sattler, U.: The even more irresistible SROIQ. In: Proc. of KR-06. pp. 57–67 (2006)

9. Kazakov, Y.: RIQ and SROIQ are harder than SHOIQ. In: Proc. of KR-08. pp. 274–284 (2008)

10. Konev, B., Lutz, C., Walther, D., Wolter, F.: Semantic modularity and module extraction in description logics. In: Proc. of ECAI-08. Frontiers in AI and Appl., vol. 178, pp. 55–59. IOS Press (2008)

11. Kontchakov, R., Wolter, F., Zakharyaschev, M.: Logic-based ontology comparison and module extraction, with an application to DL-Lite. Artificial Intelligence 174(15), 1093–1141 (2010)

12. Lutz, C., Walther, D., Wolter, F.: Conservative extensions in expressive description logics. In: Proc. of IJCAI-07. pp. 453–458 (2007)

13. Sattler, U., Schneider, T., Zakharyaschev, M.: Which kind of module should I extract? In: Proc. of DL 2009. ceur-ws.org, vol. 477 (2009)

14. Smithson, M.: Confidence Intervals. Quantitative Applications in the Social Sciences, Sage Publications (2003)