Empirical Study of Logic-Based Modules: Cheap Is Cheerful

Chiara Del Vescovo¹, Pavel Klinov², Bijan Parsia¹, Ulrike Sattler¹, Thomas Schneider³, and Dmitry Tsarkov¹

¹ University of Manchester, UK, {delvescc|bparsia|sattler|tsarkov}@cs.man.ac.uk
² University of Ulm, Germany, pavel.klinov@uni-ulm.de
³ Universität Bremen, Germany, tschneider@informatik.uni-bremen.de

Abstract. For ontology reuse and integration, a number of approaches have been devised that aim at identifying modules, i.e., suitably small sets of “relevant” axioms from ontologies. Here we consider three logically sound notions of modules: MEX modules, only applicable to inexpressive ontologies; modules based on semantic locality, a sound approximation of the first; and modules based on syntactic locality, a sound approximation of the second (and thus the first), widely used since these modules can be extracted from SROIQ ontologies in time polynomial in the size of the ontology. In this paper we investigate the quality of both approximations over a large corpus of ontologies. In particular, we show with statistical significance that, in most cases, there is no difference between the two module notions based on locality; where they differ, the additional axioms are in general unproblematic since either they can be easily ruled out or their number is relatively small. Finally, we show that the same can be said about the relation between MEX and locality-based modules.

1 Introduction

Some notable examples of ontologies describe large and loosely connected domains, as is the case for SNOMED–CT, the Systematized Nomenclature Of MEDicine, Clinical Terms,⁴ which describes the terminology used in medicine including diseases, drugs, etc. Users are often not interested in a whole ontology O, but only in a limited relevant part of it.
In this context, the idea of using modules has recently been explored. Modules are suitably small subsets of ontologies that behave, for specific purposes, like the original ontologies over a given signature Σ, i.e., a set of terms (non-logical symbols: concept and role names). The notion of logical module [9,4] focuses on providing coverage, i.e., on preserving all the entailments of O over Σ. In [19] the authors comment on the crucial role played by coverage and by two additional properties of modules for ontology reuse and integration. Let M be a subset of O. We say that: (1) M is self-contained if it provides coverage for its signature; (2) M is depleting if the remainder O \ M of the ontology does not entail any non-tautological axiom η over Σ. Under some mild conditions a minimal depleting and self-contained module is also uniquely determined [15].

⁴ http://www.ihtsdo.org/snomed-ct/

Extracting the uniquely determined module for a signature Σ is, however, hard or even impossible for expressive languages [18,10,17]. To identify notions of modules whose extraction is feasible, we can either restrict the expressivity of the ontology language, or look for feasible sufficient conditions that guarantee M to be a module for Σ, even though not necessarily the smallest.

For inexpressive logics, one can make use of the module extractor implemented in the MEX system [13]. This module extractor works on acyclic ELI terminologies and extracts the minimal module in polynomial time.

For expressive logics, module extraction can be implemented using the notion of locality. The resulting modules, known as locality-based modules (LBMs) [2], are in general not minimal. Locality comes in two flavors, semantic and syntactic locality, each of which has a bottom-, a top-, and a star-variant; the star-variant is contained in both the top- and the bottom-variants. For any of the three variants, a syntactic LBM contains the corresponding semantic LBM.
Algorithms for the extraction of syntactic LBMs are known that run in time polynomial in the size of the ontology (and are thus much cheaper than reasoning); they are implemented in the OWL API⁵ and are currently used for ontology reuse and integration. In contrast, although algorithms for extracting semantic LBMs are known, to the best of our knowledge they had not been implemented until now. They require entailment checks against an empty ontology and thus involve reasoning of a kind that is rather unusual for DL reasoners.⁶

We know that the MEX module for a signature Σ is contained in the star semantic LBM which, in turn, is contained in the star syntactic LBM. Thus, syntactic locality can be seen as an approximation of semantic locality which, in turn, is an approximation of MEX modules. An interesting question arising here is how good these approximations are: how much larger are the modules extracted by the approximations, and how much faster is the extraction?

To answer these questions, we present the first implementation of a semantic LBM extractor and report on experiments on a large corpus of ontologies. We compare the performance and results of the semantic LBM extractor with those of a syntactic LBM extractor and with those of MEX. A comparison between MEX- and syntactic bottom-modules only for SNOMED–CT is reported in [13].

The contributions of this paper are as follows.
• We provide the first implementation for the extraction of semantic LBMs, which is embedded in the latest release of the FaCT++ reasoner. Our results show that the extraction of semantic LBMs, which is in principle hard, is feasible in practice: on average, it is between 3 times (for top-modules) and 15 times (for bottom- and star-modules) slower than the extraction of syntactic LBMs, and both only take milliseconds to seconds for most ontologies below 10K axioms.
• We show with statistical significance that, for almost all members of a large corpus of existing ontologies, there is no difference between any syntactic LBM and its semantic counterpart. In the few cases where differences occur, they are extremely modest, so that it is questionable whether extracting semantic LBMs is worth the increased computational cost.
• We isolate four patterns of axioms that completely explain those rare differences. One includes simple tautologies that can be removed in a straightforward preprocessing step.
• We modify the original corpus to obtain for each ontology an acyclic EL version suitable for use with the MEX system. We then compare MEX-modules and the star-variants of LBMs, and find differences in only ∼27% of the corpus. We explain one reason for the largest differences observed.

⁵ http://owlapi.sourceforge.net/
⁶ DL reasoners usually classify an ontology: they test it for consistency and test all concept names for satisfiability and mutual subsumption.

2 Preliminaries

We assume the reader to be familiar with Description Logic languages (e.g. SROIQ [1,12]), and aim here at fixing the notation and at defining the key notions around module extraction, with a focus on locality-based modules [2] and MEX modules [14].

Let O denote a SROIQ ontology, N_C a set of concept names, and N_R a set of role names. We do not consider individual names as they do not play any role in the extraction of the modules analyzed here. We refer to either a concept or a role name by using the word “term”. A signature is a set Σ ⊆ N_C ∪ N_R. Given a concept, role, or axiom X, we call the set of terms in X the signature of X, denoted X̃.

Given a set M ⊆ O of axioms from O, and a signature Σ, we say that O is a deductive Σ-conservative extension (Σ-dCE) of M if, for all SROIQ-axioms α with α̃ ⊆ Σ, it holds that O |= α if and only if M |= α. O is a model Σ-conservative extension (Σ-mCE) of M if {I|Σ | I |= O} = {I|Σ | I |= M}.
Dually, M is a dCE-based module of O for Σ if O is a Σ-dCE of M, and it is an mCE-based module for Σ if O is a Σ-mCE of M. All mCE-based modules are also dCE-based modules, whilst the converse is not always true. A module M ⊆ O for Σ is called depleting if there is no non-trivial entailment η over Σ such that O \ M |= η; M is called self-contained if M is a module for Σ = M̃.

Since M ⊆ O, the monotonicity of SROIQ implies that every entailment η over Σ derivable from M is also derivable from O. Deciding the converse direction is in general computationally hard, or even undecidable for expressive DLs [18,10,17]. Since we do not need to find all the subsets of O that are a module for Σ, we can use easier conditions which guarantee that a set of axioms M ⊆ O is a module for Σ.

One strategy to extract a module M ⊆ O for a seed signature Σ consists of defining a suitable oracle x to decide whether a single axiom α ∈ O is x-relevant for preserving the non-tautological entailments over Σ: clearly, if the x-check is positive for α, then by depletion M needs to contain α. To guarantee self-containment as well, the fixpoint procedure described in Algorithm 1 needs to be performed. The module extracted will be called an x-module, parameterized by the notion x of relevance. Algorithm 1 is a special case of the one in [2, Figure 4], and its output M does not depend on the order in which the axioms α are selected [2]. Due to space limitations, we cannot provide a full description of the notions of modules investigated here, and refer the interested reader to [2,14] for further details.

Algorithm 1 Extraction of an x-module for Σ
Input: Ontology O, seed signature Σ, notion of relevance x
Output: x-module M of O w.r.t. Σ
  M ← ∅; O′ ← O
  repeat
    changed ← false
    for all α ∈ O′ do
      if α is x-relevant w.r.t. Σ ∪ M̃ then
        M ← M ∪ {α}; O′ ← O′ \ {α}; changed ← true
  until changed = false
  return M
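The fixpoint procedure of Algorithm 1 can be illustrated with a minimal Python sketch. The axiom encoding, the helper `signature_of`, and the oracle `toy_relevant` are assumptions made for illustration only: an axiom “A ⊑ C” is stored as the pair of its left-hand name and the set of terms occurring in C, and the toy oracle treats an axiom as relevant exactly when its left-hand name is in the current signature. This is not the MEX, ∅-, or ⊥-relevance check of [2,14]; any of those could be plugged in as `is_relevant`.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

# Toy axiom encoding (an assumption of this sketch): "A ⊑ C" is stored as
# (A, frozenset of the terms occurring in C).
Axiom = Tuple[str, FrozenSet[str]]


def signature_of(axiom: Axiom) -> Set[str]:
    """Return all terms occurring in the axiom (its signature)."""
    lhs, rhs_terms = axiom
    return {lhs} | set(rhs_terms)


def toy_relevant(axiom: Axiom, sigma: Set[str]) -> bool:
    """Hypothetical oracle: relevant iff the left-hand name is in sigma."""
    return axiom[0] in sigma


def extract_module(
    ontology: Iterable[Axiom],
    seed: Iterable[str],
    is_relevant: Callable[[Axiom, Set[str]], bool],
) -> Set[Axiom]:
    """Fixpoint loop of Algorithm 1 for a pluggable relevance oracle."""
    module: Set[Axiom] = set()
    remaining = list(ontology)          # plays the role of O'
    sigma = set(seed)                   # seed signature plus signature of M
    changed = True
    while changed:                      # repeat ... until changed = false
        changed = False
        for axiom in list(remaining):   # iterate over a snapshot of O'
            if is_relevant(axiom, sigma):
                module.add(axiom)
                remaining.remove(axiom)
                sigma |= signature_of(axiom)  # signature grows with M
                changed = True
    return module


# Toy ontology: A ⊑ B, B ⊑ ∃r.C, D ⊑ A, E ⊑ F
ONTOLOGY = [
    ("A", frozenset({"B"})),
    ("B", frozenset({"r", "C"})),
    ("D", frozenset({"A"})),
    ("E", frozenset({"F"})),
]
MODULE = extract_module(ONTOLOGY, {"A"}, toy_relevant)
```

For the seed signature {A}, the loop first pulls in A ⊑ B, which extends the signature with B and thereby makes B ⊑ ∃r.C relevant; D ⊑ A and E ⊑ F stay outside. Processing the axioms in reverse order takes one more pass but yields the same set, in line with the order-independence stated above.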
We briefly sketch the intuition behind each notion of relevance and the corresponding results of interest for this paper.

The MEX system. In [14], a procedure called MEX for extracting minimal mCE-based modules from acyclic ELI ontologies is given. The notion of MEX-relevance is based on a relation depend_O, which associates each concept name A with the set depend_O(A) of all the symbols X that are used in the definition⁷ of A in O. Then, an axiom α ∈ O is MEX-relevant for a signature Σ if it defines a term in Σ. The authors prove that, if O is an acyclic ELI ontology, then using MEX-relevance in Algorithm 1 generates the minimal depleting self-contained module for each signature Σ, and that the extraction runs in polynomial time.

Semantic locality. In [2], the authors define the notion of locality and distinguish two flavors, here called ∅- and ∆-locality. Intuitively, a SROIQ axiom α is ∅-local (resp. ∆-local) w.r.t. a signature Σ if the axiom α′ obtained by replacing all terms in α̃ \ Σ with ⊥ (resp. ⊤) is a tautology. The authors prove that, if all axioms in O \ M are ∅-local (or all axioms are ∆-local) w.r.t. Σ ∪ M̃, then M is an mCE-based (and hence dCE-based) module of O for Σ. Since deciding ∅- or ∆-locality requires tautology checks, this problem is as hard as standard reasoning. In some cases, α′ is not a SROIQ axiom, so standard reasoners need to be extended.

Syntactic locality. In order to achieve tractable module extraction, [2] define the two syntactic notions of ⊥- and ⊤-locality. Those approximate the two notions of semantic ∅- and ∆-locality. Intuitively, the syntactic rules provided describe axioms α for which α′ is obviously a tautology, in which case α is said to be ⊥- or ⊤-local w.r.t. Σ. Thus, every ⊥-local (⊤-local, resp.) axiom w.r.t. Σ is also ∅-local (∆-local, resp.) w.r.t. Σ, but not vice versa. As a consequence, ⊥- and ⊤-modules are also mCE- and dCE-based modules for Σ.
Applying the syntactic rules requires polynomial time, hence the extraction of this kind of module can be performed in time polynomial in the size of the ontology.

⁷ This notion of “definition” is specific to MEX modules.

Modules based on syntactic (semantic) locality can be made smaller by iteratively nesting ⊤- and ⊥-extraction (∆- and ∅-extraction), again obtaining mCE- and dCE-based modules [2,19], called ⊤⊥∗- and ∆∅∗-modules. The ∆∅∗-module for Σ is always contained in the corresponding ⊤⊥∗-module. Moreover, for acyclic ELI ontologies, the MEX-module for Σ is always contained in the corresponding ∆∅∗-module.

3 Research questions and experimental design

A natural question is whether syntactic modules are likely to be much larger than the semantic ones, which are, in theory, computationally more costly to extract. Hence, a second question is whether semantic module extraction is noticeably more costly in practice: the tautology test has to be carried out often (once per axiom and signature that the algorithm goes through), and it is thus hard to predict the feasibility of semantic LBM extraction. Altogether, we want to know whether syntactic LBMs are a necessary approximation and how good an approximation of semantic LBMs they are. For acyclic ELI terminologies the analogous question arises: how good an approximation of MEX modules are LBMs?

One can always construct ontologies with huge differences in size and time between syntactic and semantic LBMs and between LBMs and MEX modules. Here, we are interested in these differences in currently available ontologies, and thus need to design, run, and analyse suitable experiments.

Selection of the corpus.
For our experiments, we have built a corpus containing: (1) all the ontologies from the NCBO BioPortal ontology repository,⁸ version of November 2012; (2) ontologies from the TONES repository⁹ which have already been studied in previous work on modularity [6]: Koala, Mereology, University, People, miniTambis, OWL-S, Tambis, Galen. From this corpus, we have removed ontologies that cannot be downloaded, whose .owl file is corrupted or impossible to parse, or which are inconsistent. Furthermore, we have excluded those large ontologies (exceeding 10K axioms) where the extraction of a semantic LBM repeatedly took more than 2 minutes: for each such ontology, the estimated time needed to perform our experiments would have exceeded 300 hours. This selection results in a corpus of 242 ontologies, which vary greatly in expressivity (from AL to SROIQ(D)) and in size (10–16,066 axioms, 10–16,068 terms) [11]. For a full list of the corpus, please refer to [5].

As mentioned before, for some ontologies it is not possible to test ∆-locality (and thus to extract ∆- and ∆∅∗-modules) using standard DL reasoners; see [5] for details. To cover these cases, we have extended the reasoner FaCT++ to handle the uses of the ⊤-role specific to the semantic locality tests.

Since MEX handles only acyclic ELI ontologies, we created an ELI version ELI(O) of each ontology O in our corpus by filtering unsupported axioms and breaking terminological cycles. While a principled way of doing this is beyond the scope of this paper, we have used a heuristic, which is described in [5].

⁸ http://bioportal.bioontology.org
⁹ http://owl.cs.manchester.ac.uk/repository/

Comparing modules and locality.
In order to compare syntactic and semantic locality, as well as LBMs and MEX modules, we want to understand (1) whether, for a given seed signature Σ, it is likely that there is a difference between the syntactic and the semantic Σ-module, or between the semantic Σ-module and the MEX module for Σ, and, if so, the size of the difference;¹⁰ and (2) how feasible the extraction of semantic LBMs is. For this purpose, we compare (a) ∅-semantic and ⊥-syntactic locality, ∆-semantic and ⊤-syntactic locality, (b) ∅- and ⊥-modules, ∆- and ⊤-modules, ∆∅∗- and ⊤⊥∗-modules, (c) MEX modules and ∆∅∗-modules.

Due to the recursive nature of Algorithm 1, our investigation is both on a
– per-axiom basis: given axiom α and signature Σ, is it likely that α is semantically ∅-local (∆-local, resp.) w.r.t. Σ but not syntactically ⊥-local (⊤-local, resp.) w.r.t. Σ?
– per-module basis: given a signature Σ, is it likely that
  – ⊥-mod(Σ, O) ≠ ∅-mod(Σ, O), or
  – ⊤-mod(Σ, O) ≠ ∆-mod(Σ, O), or
  – ⊤⊥∗-mod(Σ, O) ≠ ∆∅∗-mod(Σ, O), or
  – ∆∅∗-mod(Σ, O) ≠ MEX-mod(Σ, O)?
  If yes, is it likely that the difference is large?

Clearly we need to pick, for each ontology in our corpus, a suitable set of signatures, and this poses a significant problem. A full investigation is infeasible: if m = #Õ, there are 2^m possible seed signatures, so that testing axioms for locality against all the signatures is already impossible for m ∼ 100. One could assume that comparing modules is easier since many signatures can lead to the same module. However, previous work [6,8] has shown that the number of modules in ontologies is, in general, exponential w.r.t. the size of the ontology. Still, different seed signatures can lead to the same module, which makes it hard to extract enough different modules. We will consider seed signatures of two kinds: genuine seed signatures and random seed signatures.

Genuine seed signatures.
A module does not necessarily show internal coherence: e.g., if we had an ontology O about the domains of geology and philosophy, we could extract the module for the signature Σ = {Epistemology, Mineral}. That module is likely to be the union of the two disjoint modules for Σ1 = {Epistemology} and Σ2 = {Mineral}. In contrast, genuine modules can be said to be coherent: they are those modules that cannot be decomposed into the union of two “⊆”-incomparable modules. Notably, the number of genuine modules is only linear in the size of O since each genuine x-module equals x-mod(α̃, O) for some axiom α ∈ O. Moreover, all modules of O are composed from genuine modules [7]. Thus, genuine modules are of special interest, and we can investigate them, and the corresponding genuine signatures, in full.

¹⁰ Recall: the MEX module is always a subset of the semantic Σ-module, which is always a subset of the syntactic Σ-module.

Random seed signatures. Since a full investigation of all the signatures is impossible, we compare locality (both on a per-axiom and on a per-module basis) as well as LBMs and MEX modules on a random signature Σ, which we select by giving each named entity E in the ontology probability p = 1/2 of being included in Σ. This ensures that each Σ has the same probability of being chosen. This approach has a clear drawback: the random variable “size of the seed signature generated” follows a binomial distribution, so a random seed signature is highly likely to be rather large and to contain about half the terms of the ontology. However, we do not yet have enough insight into what typical seed signatures are for module extraction, so biasing the selection of signatures towards, for example, those of a certain size has no rationale. In contrast, selecting random seed signatures avoids the introduction of any bias. Moreover, this choice is complementary to the selection of all the genuine signatures, which are in general small.
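The random selection just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the fixed `seed`, and the artificial term list are hypothetical and do not come from the authors' tooling. Each term is included independently with probability p = 1/2, so every subset of the m terms is equally likely and the signature size is binomially distributed with mean m/2.

```python
import random
from typing import FrozenSet, Iterable, List


def random_seed_signature(
    terms: Iterable[str], rng: random.Random, p: float = 0.5
) -> FrozenSet[str]:
    """Include each term independently with probability p."""
    return frozenset(t for t in terms if rng.random() < p)


def sample_signatures(
    terms: Iterable[str], n: int, seed: int = 0
) -> List[FrozenSet[str]]:
    """Draw n random seed signatures reproducibly (seed is an assumption)."""
    rng = random.Random(seed)
    ordered = sorted(terms)  # fixed iteration order for reproducibility
    return [random_seed_signature(ordered, rng) for _ in range(n)]


# Hypothetical ontology signature with m = 1000 terms
TERMS = [f"T{i}" for i in range(1000)]
SAMPLE = sample_signatures(TERMS, 400)
MEAN_SIZE = sum(len(s) for s in SAMPLE) / len(SAMPLE)  # close to m/2 = 500
```

With m = 1000 the standard deviation of a single signature size is √(m/4) ≈ 16, so essentially all sampled signatures contain roughly half the terms, which is the drawback discussed above.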
Whilst the selection of genuine signatures is complete, we can only aim at selecting a number of random signatures large enough to obtain statistically significant statements about modules. To reach a confidence level of 95% that the true proportion of differences between modules lies in the confidence interval (±5%) of the observed proportion, we have to sample at least 385 seed signatures (see the detailed explanations in [5]). For ontologies with at least 9 elements in the signature, we will therefore draw a sample of size 400. For all other ontologies, we will look at all of the ≤ 400 signatures.

Summary. We compare, for every ontology O in our corpus,
(T1) for random seed signatures Σ from O,
  (a) for each axiom α in O, is α
    – ∅-local w.r.t. Σ but not ⊥-local w.r.t. Σ?
    – ∆-local w.r.t. Σ but not ⊤-local w.r.t. Σ?
  (b) is
    – ⊥-mod(Σ, O) ≠ ∅-mod(Σ, O)?
    – ⊤-mod(Σ, O) ≠ ∆-mod(Σ, O)?
    – ⊤⊥∗-mod(Σ, O) ≠ ∆∅∗-mod(Σ, O)?
    – ∆∅∗-mod(Σ, ELI(O)) ≠ MEX-mod(Σ, ELI(O))?
(T2) the same, with Σ ranging over all the genuine signatures β̃ for β ∈ O.

4 Results of the Experiments

No differences in locality. The main result of the experiment is that, for the vast majority of the ontologies in our corpus, no difference between syntactic and semantic locality is observed, for all three variants ⊥ vs. ∅, ⊤ vs. ∆, and ⊤⊥∗ vs. ∆∅∗. More precisely, for 210 out of 242 ontologies, we obtain that:
(T1) for random seed signatures, there is no statistically significant difference (a) between semantic and syntactic locality of any kind, (b) between semantic and syntactic LBMs of any kind;
(T2) given any genuine signature, there is no such difference.
More specifically, for all randomly generated seed signatures and all genuine signatures, the corresponding bottom-modules (and the corresponding top- and star-modules, respectively) agree, and every axiom is either both ⊥- and ∅-local or neither (and either both ⊤- and ∆-local or neither).
The 210 ontologies include Galen and People, which are renowned for having unusually large ⊥-modules [2,8]. In most cases, extracting a semantic and a syntactic LBM each took only a few milliseconds; hence, a performance comparison is not meaningful. For some ontologies, the semantic LBM took considerably longer to extract than the syntactic one: up to 5 times for star-modules in Molecule Role, and up to 34 times in Galen.

Differences in locality. We have observed differences between syntactic and semantic locality for 32 ontologies in our corpus. We call the axioms that cause these differences culprits: patterns of axioms which are not ⊥-local (⊤-local, respectively) w.r.t. some signature Σ, but which are ∅-local (∆-local, respectively) w.r.t. Σ. We have identified four types of patterns, a–d, which we describe in the following. Sometimes, culprit axioms pull additional axioms into the syntactic LBM, due to signature extension during module extraction.

We denote concept names by A, B, complex concepts by C, D, roles by r, s, . . . , nominals by a, and non-empty data ranges (e.g., int or int0..9) by R, possibly with indices. Σ denotes a signature for which a module is extracted or against which an axiom is checked for locality. Terms outside Σ are overlined; we further use the notation C⊥ and C⊤ to denote concepts that are bottom- or top-equivalent due to the grammar defining syntactic locality in [3, Def. 6] and the analogous grammar for semantic locality.

Culprits of type a are simple tautologies that accidentally entered the “inferred view” (closure under certain entailments) of an ontology. These axioms do not occur in the original “asserted” versions and could, in principle, be detected in a simple preprocessing step. Type-a culprits occur in 10 ontologies of the above 32, and are of the following kinds: A ⊑ A, r ≡ (r⁻)⁻, and A ⊓ C ⊓ D ⊑ A ⊓ C. Each such tautology is trivially ∅-local and ∆-local w.r.t.
any Σ, but not always ⊥- or ⊤-local: if Σ contains all terms in α, then both sides of the subsumption (equivalence) are neither ⊥- nor ⊤-equivalent.

Differences caused not solely by culprits of type a have been observed for 26 ontologies. In only 6 of these cases, the differences affect modules; in the remaining 20, they only affect locality of single axioms (tests T1 a and T2 a). We will focus on the former 6, listed in Table 1, and refer to [5] for details on all 26.

Ontology               Abbreviation  DL expressivity  #axioms  #terms
MiniTambis-repaired    MiniT         ALCN             170      226
Tambis-full            Tambis        SHIN(D)          592      496
Bleeding History P...  BHO           ALCIF(D)         1,925    581
Neuro Behavior O...    NBO           AL               1,314    970
Pharmacogenomic...     PhaRe         ALCHIF(D)        459      311
Terminological and...  TOK           SRIQ(D)          466      330

Table 1. Ontologies that exhibit differences in modules

According to Table 1, differences between modules occur for ontologies of medium to large size and medium to high expressivity. Differences in locality alone additionally affect small ontologies such as Koala (42 axioms) and Pilot Ontology (85 axioms), as well as large ontologies such as Galen (4,735 axioms) and Experimental Factor Ontology (7,156 axioms). The number of axioms causing these differences (i.e., matching the culprit patterns) in the affected ontologies is small except for Galen, and most of the observed differences are relatively small. Table 2 gives a representative selection of the differences in modules observed. For a complete overview, including differences in locality of single axioms, consult the table in [5].

Ontology  Types affected  #diffs   Size of diffs (#axs)  Size of diffs (rel.)  Culprit type  Freq.
miniT     bot, star       14–25%   1–7                   0–600%ᵇ               c             3
Tambis    bot, star       32–57%   2–41ᶜ                 1–62%ᶜ                c             8
BHOᵃ      star            17%      1–12                  0–300%                b             31
NBOᵃ      star            3%       2                     0–200%                d             3
PhaReᵃ    top, star       1–8%     1–326ᵈ                0–6,520%ᵈ             d             10
TOK       top, star       49–100%  1–7                   0–9%                  d             3

ᵃ differences only for genuine modules
ᵇ differences > 5% only for genuine modules
ᶜ differences > 11 axioms (> 2%) only for genuine modules
ᵈ differences > 13 axioms (> 1,300%) only for top-modules

The columns show: ontology name (abbreviations: see Table 1); type of modules affected; relative number of module pairs with differences; number of axioms in the differences (absolute and relative to the ∅- or ∆- or ∆∅∗-case); type of culprit present and number of axioms of this type involved in differences.

Table 2. Overview of observed differences between modules

Table 2 shows small absolute differences for miniT, BHO, NBO, and TOK. In Tambis, large differences occur only for genuine modules, which suggests that they are unlikely to occur in practical cases with usually larger seed signatures. Finally, in PhaRe, large differences occur only for top-modules. For all these ontologies, a single syntactic or semantic module was extracted within only a few milliseconds, making module extraction times roughly equal.

Culprits of type b are axioms with an ∃-restriction on a set of nominals or a non-empty data range on the right-hand side, such as A ⊑ ∃r.{a1, . . . , an} or A ⊑ ∃r.R. These axioms are ∆-local w.r.t. any signature Σ that does not contain r because they become tautologies if r is replaced by ⊤. However, they can only be ⊤-local when A is a ⊥-equivalent concept w.r.t. Σ. Culprit-b axioms affect genuine modules of BHO, and (only) locality of single axioms for 4 more ontologies. We observed a slightly more sophisticated variant of the form A ≡ C⊤ ⊓ ∃r.R.
Culprits of type c are axioms α that contain a concept description C such that (a) C becomes equivalent to ⊥ (or ⊤) if all terms outside Σ are replaced by ⊥ (or ⊤); (b) this causes α to be semantically ∅-local (or ∆-local); but (c) the grammars for syntactic locality do not “detect” C to be a C⊥ (or C⊤). For example, C = ∀r.A ⊓ ∃r.⊤ becomes ⊥-equivalent if A is replaced by ⊥; the same holds with cardinality restrictions in place of “∃”. Consequently, axioms such as A⊥ ≡ B ⊓ ∀r.C⊥ ⊓ ∀s.{a} ⊓ =3 r.⊤ (taken from Koala) are ∅-local but not ⊥-local. We found this pattern in 8 ontologies. Only in miniT and Tambis does it affect a large proportion of bottom- and star-modules, with additional axioms “pulled in”. Still, the size of the differences is modest, as argued above. Some of the remaining 6 ontologies contain different kinds of complex concepts that cause differences in top-locality of single axioms.

Culprits of type d are axioms where a concept (or role) name from the left-hand side occurs on the right-hand side together with a top-equivalent role (or concept), causing differences in top-modules. The simplest kind of axiom of this type is A ⊑ ∃r.A, which is ∆-local because replacing r with ⊤ makes it a tautology. The axiom is only ⊤-local if Σ contains neither r nor A. We have found further examples of increasing complexity in Adverse Event Reporting Ontology and Galen; see [5]. We have observed culprits of type d in 17 ontologies; see the detailed overview in [5]. Only in 3 cases (NBO, PhaRe, and TOK) are modules affected. Galen contains 121 culprit-d axioms, but they only affect locality of single axioms. In addition, the time differences for Galen are remarkable: checking all axioms for ∆-locality takes up to 70 times longer than checking them for ⊤-locality.

Summary. Culprits hardly ever cause significant differences in modules. Only for PhaRe are differences between semantic and syntactic modules not negligible, but we were able to relativize them.
Table 1 may suggest that culprits occur only in expressive ontologies. However, patterns a, c, d can, in principle, already occur in simple terminologies in EL and ALC, respectively. Evidently, type-a culprits can easily be filtered out in a preprocessing step. For types c and d, there is no hope for an exhaustive extension to locality because they can (and do) occur in arbitrarily complex shapes and contexts.

Patterns of type b rely on nominals or datatypes, but they are repairable by a straightforward extension to the definition of syntactic locality: one can extend the locality definition to distinguish ⊥- and ⊤-distinct concepts, by adding appropriate grammars to the definition of syntactic locality, and adding more cases of ⊥- and ⊤-equivalent concepts to the existing grammars. However, from the small numbers of differences observed, we doubt that such an extension of syntactic locality will have any significant effects in practice.

LBMs vs MEX results. The results of the experimental comparison of syntactic/semantic LBMs and MEX modules are summarized in Table 3. They show that MEX modules smaller than the corresponding LBMs can be found in ∼27% of the preprocessed ontologies, for either random or axiom-based seed signatures. At the same time, unsurprisingly, syntactic and semantic LBMs do not differ at all for these simple ELI ontologies.

Experiment         #ontol. with diffs.  % tests with diffs.  Avg size of diffs (#axs)  Avg size of diffs (rel.)
Random signatures  66                   84%                  0–26                      0–13%
Axiom signatures   61                   12%                  0–13                      0–80%

The results from the third column on are averaged over all ontologies with differences LBM–MEX in at least one module. For example, the last two columns show the average min and max absolute (resp. relative) difference between LBMs and MEX modules.

Table 3. Differences between MEX and LBMs (⊤⊥∗, ∆∅∗)

In experiments with random seed signatures, it can be seen that for those ontologies where there are differences (most notably, Galen), they occur in many tests.
Thus, the difference appears to be caused by features of the ontology, not by particular seed signatures. The difference is also sometimes large in individual tests, including for genuine modules. For example, for the signature of the following axiom in Galen, both ∆∅∗-mod and ⊤⊥∗-mod contain 127 axioms while the MEX-module only contains the axiom itself:¹¹ RICF ≡ ICF ⊓ ∃ISFO.RSH. The likely reason is the proliferation of concept equivalence axioms in Galen. For example, A ≡ B will end up in the ∆∅∗-mod for any seed signature containing either A or B. It is, however, an mCE of ∅ w.r.t. either {A} or {B}.

5 Conclusion and outlook

Summary. We obtain three main observations from our experiments. (1) In general, there is no or little difference between semantic and syntactic locality. Hence, the computationally cheaper syntactic locality is a good approximation of semantic locality. For the ontologies Galen and People, which are “renowned” for having disproportionately large modules, syntactic and semantic LBMs do not differ. (2) In most cases, there is no or little difference between LBMs and MEX modules. Only for Galen are MEX modules considerably smaller than LBMs. (3) Though in principle hard to compute, semantic LBMs can be extracted rather fast in practice. Still, their extraction often takes considerably longer than for syntactic LBMs. We cannot make any statement about MEX module extraction times because we use the original MEX implementation, which combines loading and module extraction.

Due to (1), hardly any benefit can be expected from preferring the potentially smaller semantic LBMs to the cheaper syntactic LBMs. From (2) we can say that semantic LBMs can be seen as the best available approximation of MEX modules for ontologies in highly expressive languages.
Not only does our study evaluate how well the cheap syntactic locality approximates semantic locality and model conservativity, it also enabled us to fix bugs in the implementation of syntactic modularity.

Future work. Two questions are interesting for future work: (1) How can we redesign the experiments so that we can include the very large ontologies? (2) How do LBMs compare to other types of conservativity-based modules? Concerning (2), one could include, for example, the technique based on reduction to QBF for the OWL 2 QL profile [16] when an off-the-shelf implementation becomes available.

¹¹ The acronyms denote RightIneffectiveCardiacFunction, IneffectiveCardiacFunction, isSpecificFunctionOf, RightSideOfHeart.

References

1. Franz Baader, Diego Calvanese, Deborah McGuinness, Daniele Nardi, and Peter F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003.
2. Bernardo Cuenca Grau, Ian Horrocks, Yevgeny Kazakov, and Ulrike Sattler. Modular reuse of ontologies: Theory and practice. J. of Artif. Intell. Research, 31(1):273–318, 2008.
3. Bernardo Cuenca Grau, Ian Horrocks, Yevgeny Kazakov, and Ulrike Sattler. Extracting modules from ontologies: A logic-based approach. In Stuckenschmidt et al. [20], pages 159–186.
4. Bernardo Cuenca Grau, Bijan Parsia, Evren Sirin, and Aditya Kalyanpur. Modularity and Web ontologies. In Proc. of KR-06. AAAI Press/The MIT Press, 2006.
5. Chiara Del Vescovo, Pavel Klinov, Bijan Parsia, Ulrike Sattler, Thomas Schneider, and Dmitry Tsarkov. Empirical study of logic-based modules: Cheap is cheerful. Technical report, 2013. https://sites.google.com/site/cheapischeerful/.
6. Chiara Del Vescovo, Bijan Parsia, Ulrike Sattler, and Thomas Schneider. The modular structure of an ontology: an empirical study. Volume 573 of ceur-ws.org, 2010.
7. Chiara Del Vescovo, Bijan Parsia, Ulrike Sattler, and Thomas Schneider. The modular structure of an ontology: Atomic decomposition. In Proc. of IJCAI-11, pages 2232–2237, 2011.
8. Chiara Del Vescovo, Bijan Parsia, Ulrike Sattler, and Thomas Schneider. The modular structure of an ontology: Atomic decomposition and module count. Volume 230 of FAIA, pages 25–39, 2011.
9. James Garson. Modularity and relevant logic. 30(2):207–223, 1989.
10. Silvio Ghilardi, Carsten Lutz, and Frank Wolter. Did I damage my ontology? A case for conservative extensions in Description Logics. In Proc. of KR-06, pages 187–197. AAAI Press/The MIT Press, 2006.
11. Matthew Horridge, Bijan Parsia, and Ulrike Sattler. The state of bio-medical ontologies. 2011.
12. Ian Horrocks, Oliver Kutz, and Ulrike Sattler. The even more irresistible SROIQ. In Proc. of KR-06, pages 57–67, 2006.
13. Boris Konev, Carsten Lutz, Dirk Walther, and Frank Wolter. Semantic modularity and module extraction in description logics. In Proc. of ECAI-08, pages 55–59, 2008.
14. Boris Konev, Carsten Lutz, Dirk Walther, and Frank Wolter. Formal properties of modularization. In Stuckenschmidt et al. [20], pages 25–66.
15. Roman Kontchakov, Luca Pulina, Ulrike Sattler, Thomas Schneider, Petra Selmer, Frank Wolter, and Michael Zakharyaschev. Minimal module extraction from DL-Lite ontologies using QBF solvers. In Proc. of IJCAI-09, pages 836–841, 2009.
16. Roman Kontchakov, Frank Wolter, and Michael Zakharyaschev. Logic-based ontology comparison and module extraction, with an application to DL-Lite. Artificial Intelligence, 174(15):1093–1141, 2010.
17. Carsten Lutz, Dirk Walther, and Frank Wolter. Conservative extensions in expressive Description Logics. In Proc. of IJCAI-07, pages 453–458, 2007.
18. Carsten Lutz and Frank Wolter. Deciding inseparability and conservative extensions in the description logic EL. 45(2):194–228, 2010.
19. Ulrike Sattler, Thomas Schneider, and Michael Zakharyaschev. Which kind of module should I extract? Volume 477 of ceur-ws.org, 2009.
20. Heiner Stuckenschmidt, Christine Parent, and Stefano Spaccapietra, editors. Volume 5445 of LNCS. Springer-Verlag, 2009.