
Empirical Study of Logic-Based Modules: Cheap Is Cheerful

Chiara Del Vescovo

delvescc@cs.man.ac.uk 1

Pavel Klinov

pavel.klinov@uni-ulm.de 2

Bijan Parsia

bparsia@cs.man.ac.uk 1

Ulrike Sattler

sattler@cs.man.ac.uk 1

Thomas Schneider

tschneider@informatik.uni-bremen.de 0

Dmitry Tsarkov

tsarkov@cs.man.ac.uk 1

0 Universität Bremen, Germany
1 University of Manchester, UK
2 University of Ulm, Germany

For ontology reuse and integration, a number of approaches have been devised that aim at identifying modules, i.e., suitably small sets of "relevant" axioms from ontologies. Here we consider three logically sound notions of modules: MEX modules, only applicable to inexpressive ontologies; modules based on semantic locality, a sound approximation of the first; and modules based on syntactic locality, a sound approximation of the second (and thus the first), widely used since these modules can be extracted from SROIQ ontologies in time polynomial in the size of the ontology. In this paper we investigate the quality of both approximations over a large corpus of ontologies. In particular, we show with statistical significance that, in most cases, there is no difference between the two module notions based on locality; where they differ, the additional axioms are in general unproblematic since either they can be easily ruled out or their number is relatively small. Finally, we show that the same can be said about the relation between MEX and locality-based modules.

Some notable examples of ontologies describe large and loosely connected domains, as is the case for SNOMED-CT, the Systematized Nomenclature Of MEDicine, Clinical Terms,4 which describes the terminology used in medicine including diseases, drugs, etc. Users are often not interested in a whole ontology $\mathcal{O}$, but only in a limited relevant part of it. In this context, the idea has recently been explored to use modules, i.e., suitably small subsets of ontologies that behave, for specific purposes, like the original ontologies over a given signature $\Sigma$, i.e., a set of terms (non-logical symbols: concept and role names). The notion of logical module [9,4] focuses on providing coverage, i.e., on preserving all the entailments of $\mathcal{O}$ over $\Sigma$.

In [19] the authors comment on the crucial role played by coverage and by two additional properties of modules for ontology reuse and integration. Let $\mathcal{M}$ be a subset of $\mathcal{O}$. We say that: (1) $\mathcal{M}$ is self-contained if it provides coverage for its signature; (2) $\mathcal{M}$ is depleting if the remainder $\mathcal{O} \setminus \mathcal{M}$ of the ontology does not entail any non-tautological axiom over $\Sigma$. Under some mild conditions a minimal depleting and self-contained module is also uniquely determined [15].

4 http://www.ihtsdo.org/snomed-ct/

Extracting the uniquely determined module for a signature is, however, hard or even impossible for expressive languages [18,10,17]. For identifying notions of modules whose extraction is feasible, we can either restrict the expressivity of the ontology language, or look for feasible sufficient conditions that guarantee $\mathcal{M}$ to be a module for $\Sigma$, even though not necessarily the smallest.

For inexpressive logics, one can make use of the module extractor implemented in the MEX system [ 13 ]. This module extractor works on acyclic ELI terminologies and extracts the minimal module in polynomial time.

For expressive logics, module extraction can be implemented making use of the notion of locality. The resulting modules, known as locality-based modules (LBMs) [2], are, in general, not minimal. Locality comes in two flavors, semantic and syntactic locality, each of which has a bottom-, a top-, and a star-variant; the star-variant is contained in both the top- and the bottom-variants. For any of the three variants, a syntactic LBM contains the corresponding semantic LBM. Algorithms for the extraction of syntactic LBMs are known that run in time polynomial in the size of the ontology (thus much cheaper than reasoning), are implemented in the OWL API,5 and are currently used for ontology reuse and integration. In contrast, although algorithms for extracting semantic LBMs are known, to the best of our knowledge they had not been implemented until now. They require entailment checks against an empty ontology and thus involve reasoning of a kind that is rather unusual for DL reasoners.6

We know that the MEX module for a signature $\Sigma$ is contained in the star semantic LBM which, in turn, is contained in the star syntactic LBM. Thus, syntactic locality can be seen as an approximation of semantic locality which, in turn, is an approximation of MEX modules. An interesting question arising here is how good these approximations are: how much larger are the modules extracted by the approximations, and how much faster is the extraction?

To answer these questions, we present the first implementation of a semantic LBM extractor and report on experiments on a large corpus of ontologies. We compare the performance and results of the semantic LBM extractor with those of a syntactic LBM extractor and with those of MEX. A comparison between MEX- and syntactic bottom-modules only for SNOMED-CT is reported in [13].

The contributions of this paper are as follows.

We provide the first implementation for the extraction of semantic LBMs, which is embedded in the latest release of the FaCT++ reasoner. Our results show that the extraction of semantic LBMs, which is in principle hard, is feasible in practice: on average, it is between 3 times (for top-modules) and 15 times (for bottom- and star-modules) slower than the extraction of syntactic LBMs, and both only take milliseconds to seconds for most ontologies below 10K axioms.

5 http://owlapi.sourceforge.net/

6 DL reasoners usually classify an ontology: test it for consistency and all concept names for satisfiability/mutual subsumption.

We show with statistical significance that, for almost all members of a large corpus of existing ontologies, there is no difference between any syntactic LBM and its semantic counterpart. In the few cases where differences occur, those are extremely modest so that it is questionable whether extracting semantic LBMs is worth the increased computational cost.

We isolate four patterns of axioms that completely explain those rare differences. One includes simple tautologies that can be removed in a straightforward preprocessing step.

We modify the original corpus to obtain for each ontology an acyclic EL version suitable for use with the MEX system. We then compare MEX-modules and the star-variants of LBMs, and find differences in only 27% of the corpus. We explain one reason for the largest differences observed.

2 Preliminaries

We assume the reader to be familiar with Description Logic languages (e.g. SROIQ [1,12]), and aim here at fixing the notation and at defining the key notions around module extraction, with a focus on locality-based modules [2] and MEX modules [14].

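Let $\mathcal{O}$ denote a SROIQ ontology, $N_C$ a set of concept names, and $N_R$ a set of role names. We do not consider individual names as they do not play any role in the extraction of the modules analyzed here. We refer to either a concept or a role name by using the word "term". A signature $\Sigma$ is a set $\Sigma \subseteq N_C \cup N_R$. Given a concept, role, or axiom $X$, we call the set of terms in $X$ the signature of $X$, denoted $\widetilde{X}$. Given a set $\mathcal{M} \subseteq \mathcal{O}$ of axioms from $\mathcal{O}$, and a signature $\Sigma$, we say that $\mathcal{O}$ is a deductive $\Sigma$-conservative extension ($\Sigma$-dCE) of $\mathcal{M}$ if, for all SROIQ axioms $\alpha$ with $\widetilde{\alpha} \subseteq \Sigma$, it holds that $\mathcal{O} \models \alpha$ if and only if $\mathcal{M} \models \alpha$. $\mathcal{O}$ is a model $\Sigma$-conservative extension ($\Sigma$-mCE) of $\mathcal{M}$ if $\{\mathcal{I}|_\Sigma \mid \mathcal{I} \models \mathcal{O}\} = \{\mathcal{I}|_\Sigma \mid \mathcal{I} \models \mathcal{M}\}$. Dually, $\mathcal{M}$ is a dCE-based module of $\mathcal{O}$ for $\Sigma$ if $\mathcal{O}$ is a $\Sigma$-dCE of $\mathcal{M}$, and it is an mCE-based module for $\Sigma$ if $\mathcal{O}$ is a $\Sigma$-mCE of $\mathcal{M}$. All mCE-based modules are also dCE-based modules, whilst the converse is not always true. A module $\mathcal{M} \subseteq \mathcal{O}$ for $\Sigma$ is called depleting if there is no non-trivial entailment $\alpha$ over $\Sigma$ such that $\mathcal{O} \setminus \mathcal{M} \models \alpha$; $\mathcal{M}$ is called self-contained if $\mathcal{M}$ is a module for its own signature $\widetilde{\mathcal{M}}$.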

Since $\mathcal{M} \subseteq \mathcal{O}$, the monotonicity of SROIQ implies that every entailment over $\Sigma$ derivable from $\mathcal{M}$ is also derivable from $\mathcal{O}$. Deciding the converse direction is in general computationally hard, or even undecidable for expressive DLs [18,10,17]. Since we do not need to find all the subsets of $\mathcal{O}$ that are a module for $\Sigma$, we can use easier conditions which guarantee that a set of axioms $\mathcal{M} \subseteq \mathcal{O}$ is a module for $\Sigma$.

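One strategy to extract a module $\mathcal{M} \subseteq \mathcal{O}$ for a seed signature $\Sigma$ consists of defining a suitable oracle $x$ to decide whether a single axiom $\alpha \in \mathcal{O}$ is $x$-relevant for preserving the non-tautological entailments over $\Sigma$: clearly, if the $x$-check is positive for $\alpha$, then by depletion $\mathcal{M}$ needs to contain $\alpha$. To guarantee also self-containment, the fixpoint procedure described in Algorithm 1 needs to be performed. The module extracted will be called an $x$-module, parameterized by the notion $x$ of relevance. Algorithm 1 is a special case of the one in [2].

Algorithm 1 Extraction of an $x$-module for $\Sigma$

Input: Ontology $\mathcal{O}$, seed signature $\Sigma$, notion of relevance $x$
Output: $x$-module $\mathcal{M}$ of $\mathcal{O}$ w.r.t. $\Sigma$
$\mathcal{M} \leftarrow \emptyset$; $\mathcal{O}' \leftarrow \mathcal{O}$
repeat
    changed $\leftarrow$ false
    for all $\alpha \in \mathcal{O}'$ do
        if $\alpha$ is $x$-relevant w.r.t. $\Sigma \cup \widetilde{\mathcal{M}}$ then
            $\mathcal{M} \leftarrow \mathcal{M} \cup \{\alpha\}$; $\mathcal{O}' \leftarrow \mathcal{O}' \setminus \{\alpha\}$; changed $\leftarrow$ true
until changed = false
return $\mathcal{M}$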
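The fixpoint loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the encoding of an atomic subsumption $A \sqsubseteq B$ as a pair of strings, the helper names `axiom_signature`, `extract_module` and `lhs_in_signature`, and the toy ontology are all hypothetical. For atomic subsumptions between concept names, the toy oracle (an axiom is relevant iff its left-hand side is in the current signature) coincides with the failure of syntactic $\bot$-locality from [2].

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical encoding: the pair (A, B) stands for the atomic subsumption A ⊑ B.
Axiom = Tuple[str, str]


def axiom_signature(axiom: Axiom) -> Set[str]:
    """Terms occurring in the axiom (its signature)."""
    return set(axiom)


def extract_module(
    ontology: Iterable[Axiom],
    seed: Iterable[str],
    is_relevant: Callable[[Axiom, Set[str]], bool],
) -> Set[Axiom]:
    """Fixpoint extraction: add relevant axioms until nothing changes."""
    module: Set[Axiom] = set()
    remaining: Set[Axiom] = set(ontology)
    signature: Set[str] = set(seed)  # plays the role of Σ ∪ sig(M)
    changed = True
    while changed:
        changed = False
        # sorted() copies the set, so removing elements inside the loop is safe
        for axiom in sorted(remaining):
            if is_relevant(axiom, signature):
                module.add(axiom)
                remaining.discard(axiom)
                signature |= axiom_signature(axiom)
                changed = True
    return module


def lhs_in_signature(axiom: Axiom, signature: Set[str]) -> bool:
    """Toy oracle (assumption): A ⊑ B is relevant iff A is in the signature."""
    return axiom[0] in signature


toy_ontology = [
    ("Cat", "Mammal"),
    ("Mammal", "Animal"),
    ("Dog", "Mammal"),
    ("Rock", "Mineral"),
]
cat_module = extract_module(toy_ontology, {"Cat"}, lhs_in_signature)
```

Because the signature grows as axioms are added, `("Mammal", "Animal")` is pulled in only after `("Cat", "Mammal")` has extended the signature; this is the self-containment effect that the fixpoint is meant to guarantee.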

The MEX system. In [14], a procedure called MEX for extracting minimal mCE-based modules from acyclic ELI ontologies is given. The notion of MEX-relevance is based on a relation $\mathit{depend}_{\mathcal{O}}$, which associates each concept name $A$ with the set $\mathit{depend}_{\mathcal{O}}(A)$ of all the symbols $X$ that are used in the definition7 of $A$ in $\mathcal{O}$. Then, an axiom $\alpha \in \mathcal{O}$ is MEX-relevant for a signature $\Sigma$ if it defines a term in $\Sigma$. The authors prove that, if $\mathcal{O}$ is an acyclic ELI ontology, then using MEX-relevance in Algorithm 1 generates the minimal depleting self-contained module for each signature $\Sigma$, and that the extraction runs in polynomial time.

7 This notion of "definition" is specific to MEX-modules.

Semantic locality. In [2], the authors define the notion of locality and distinguish two flavors, here called $\emptyset$- and $\Delta$-locality. Intuitively, a SROIQ axiom $\alpha$ is $\emptyset$-local (resp. $\Delta$-local) w.r.t. a signature $\Sigma$ if the axiom $\alpha'$ obtained by replacing all terms in $\widetilde{\alpha} \setminus \Sigma$ with $\bot$ (resp. $\top$) is a tautology. The authors prove that, if all axioms in $\mathcal{O} \setminus \mathcal{M}$ are $\emptyset$-local (or all axioms are $\Delta$-local) w.r.t. $\Sigma \cup \widetilde{\mathcal{M}}$, then $\mathcal{M}$ is an mCE-based (and hence dCE-based) module of $\mathcal{O}$ for $\Sigma$. Since deciding $\emptyset$- or $\Delta$-locality requires tautology checks, this problem is as hard as standard reasoning. In some cases, $\alpha'$ is not a SROIQ axiom, so standard reasoners need to be extended.

Syntactic locality. In order to achieve tractable module extraction, [2] define the two syntactic notions of $\bot$- and $\top$-locality. Those approximate the two notions of semantic $\emptyset$- and $\Delta$-locality. Intuitively, the syntactic rules provided describe axioms $\alpha$ for which $\alpha'$ is obviously a tautology, in which case $\alpha$ is said to be $\bot$- or $\top$-local w.r.t. $\Sigma$. Thus, every $\bot$-local ($\top$-local, resp.) axiom w.r.t. $\Sigma$ is also $\emptyset$-local ($\Delta$-local, resp.) w.r.t. $\Sigma$, but not vice versa. As a consequence, $\bot$- and $\top$-modules are also mCE- and dCE-based modules for $\Sigma$. Applying the syntactic rules requires polynomial time, hence the extraction of this kind of module can be performed in time polynomial in the size of the ontology.
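To make the idea of a purely syntactic check concrete, here is a small Python sketch of $\bot$-locality for a tiny fragment with only concept names, conjunction, and existential restrictions. It is an illustration under assumptions: the classes `Name`, `Exists`, `And` and the functions `is_bot_equivalent` and `is_bot_local` are hypothetical names, and the rules are a hand-picked subset of the grammar in [2] (a name outside the signature, an existential restriction whose role is outside the signature or whose filler is $\bot$-equivalent, and a conjunction with a $\bot$-equivalent conjunct). The full grammar for SROIQ has many more cases.

```python
from dataclasses import dataclass
from typing import AbstractSet, Union


@dataclass(frozen=True)
class Name:
    """Concept name A."""
    name: str


@dataclass(frozen=True)
class Exists:
    """Existential restriction ∃role.filler."""
    role: str
    filler: "Concept"


@dataclass(frozen=True)
class And:
    """Conjunction left ⊓ right."""
    left: "Concept"
    right: "Concept"


Concept = Union[Name, Exists, And]


def is_bot_equivalent(concept: Concept, sig: AbstractSet[str]) -> bool:
    """Syntactic test: does the concept collapse to ⊥ once all terms
    outside `sig` are read as ⊥ (empty concept / empty role)?"""
    if isinstance(concept, Name):
        return concept.name not in sig
    if isinstance(concept, Exists):
        return concept.role not in sig or is_bot_equivalent(concept.filler, sig)
    if isinstance(concept, And):
        return is_bot_equivalent(concept.left, sig) or is_bot_equivalent(concept.right, sig)
    raise TypeError(f"unsupported concept: {concept!r}")


def is_bot_local(lhs: Concept, rhs: Concept, sig: AbstractSet[str]) -> bool:
    """lhs ⊑ rhs is ⊥-local if lhs is ⊥-equivalent. In this ⊤-free fragment
    the right-hand side can never be syntactically ⊤-equivalent, so it is
    not inspected."""
    return is_bot_equivalent(lhs, sig)


sig = {"A", "r"}
local_1 = is_bot_local(Name("B"), Name("A"), sig)                  # B is outside sig
local_2 = is_bot_local(Name("A"), Exists("r", Name("B")), sig)     # A is inside sig
```

Each call inspects the axiom once, which is why this kind of check runs in time linear in the size of the axiom, in contrast to the tautology test needed for semantic locality.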

Modules based on syntactic (semantic) locality can be made smaller by iteratively nesting $\top$- and $\bot$-extraction ($\Delta$- and $\emptyset$-extraction), again obtaining mCE- and dCE-based modules [2,19], called $\top\bot^*$- and $\emptyset\Delta^*$-modules. The $\emptyset\Delta^*$-module for $\Sigma$ is always contained in the corresponding $\top\bot^*$-module. Moreover, for acyclic ELI ontologies, the MEX-module for $\Sigma$ is always contained in the corresponding $\emptyset\Delta^*$-module.

3 Research questions and experimental design

A natural question is whether syntactic modules are likely to be much larger than the semantic ones, which are, in theory, computationally more costly. Hence, a second question is whether semantic module extraction is noticeably more costly: the tautology test has to be carried out often (once per axiom and signature that the algorithm goes through), and it is thus hard to predict the feasibility of semantic LBM extraction. Altogether, we want to know whether syntactic LBMs are a necessary approximation and how good an approximation of semantic LBMs they are. Similarly, for acyclic ELI terminologies the analogous question arises: how good an approximation of MEX modules are LBMs?

One can always construct ontologies with huge differences in size and time between syntactic and semantic LBMs and between LBMs and MEX modules. Here, we are interested in these differences in currently available ontologies, and thus need to design, run, and analyse suitable experiments.

Selection of the corpus. For our experiments, we have built a corpus containing: (1) all the ontologies from the NCBO BioPortal ontology repository,8 version of November 2012; (2) ontologies from the TONES repository9 which have already been studied in previous work on modularity [6]: Koala, Mereology, University, People, miniTambis, OWL-S, Tambis, Galen. From this corpus, we have removed ontologies that cannot be downloaded, whose .owl file is corrupted or impossible to parse, or which are inconsistent. Furthermore, we have excluded those large ontologies (exceeding 10K axioms) where the extraction of a semantic LBM repeatedly took more than 2 minutes: for each such ontology, the estimated time needed to perform our experiments would have exceeded 300 hours.

This selection results in a corpus of 242 ontologies, which greatly vary in expressivity (from AL to SROIQ(D)) and in size (10 to 16,066 axioms, 10 to 16,068 terms) [11]. For a full list of the corpus, please refer to [5].

As mentioned before, for some ontologies it is not possible to test $\Delta$-locality (and thus to extract $\Delta$- and $\emptyset\Delta^*$-modules) using standard DL reasoners; see [5] for details. To cover these cases, we have extended the reasoner FaCT++ to cover the uses of the $\top$-role specific to the semantic locality tests.

Since MEX handles only acyclic ELI ontologies, we created an ELI version ELI($\mathcal{O}$) of each ontology $\mathcal{O}$ in our corpus by filtering unsupported axioms and breaking terminological cycles. While a principled way of doing this is beyond the scope of this paper, we have used a heuristic, which is described in [5].

8 http://bioportal.bioontology.org
9 http://owl.cs.manchester.ac.uk/repository/

Comparing modules and locality. In order to compare syntactic and semantic locality, as well as LBMs and MEX modules, we want to understand (1) whether, for a given seed signature $\Sigma$, it is likely that there is a difference between the syntactic and the semantic $\Sigma$-module, or between the latter and the MEX module for $\Sigma$, and, if so, the size of the difference;10 and (2) how feasible the extraction of semantic LBMs is. For this purpose, we compare (a) $\emptyset$-semantic and $\bot$-syntactic locality, $\Delta$-semantic and $\top$-syntactic locality, (b) $\emptyset$- and $\bot$-modules, $\Delta$- and $\top$-modules, $\emptyset\Delta^*$- and $\top\bot^*$-modules, (c) MEX modules and $\emptyset\Delta^*$-modules. Due to the recursive nature of Algorithm 1, our investigation is carried out both on a

- per-axiom basis: given an axiom $\alpha$ and a signature $\Sigma$, is it likely that $\alpha$ is semantically $\emptyset$-local ($\Delta$-local, resp.) w.r.t. $\Sigma$ but not syntactically $\bot$-local ($\top$-local, resp.) w.r.t. $\Sigma$?
- per-module basis: given a signature $\Sigma$, is it likely that
  - $\bot$-mod$(\Sigma, \mathcal{O}) \neq \emptyset$-mod$(\Sigma, \mathcal{O})$, or
  - $\top$-mod$(\Sigma, \mathcal{O}) \neq \Delta$-mod$(\Sigma, \mathcal{O})$, or
  - $\top\bot^*$-mod$(\Sigma, \mathcal{O}) \neq \emptyset\Delta^*$-mod$(\Sigma, \mathcal{O})$, or
  - $\emptyset\Delta^*$-mod$(\Sigma, \mathcal{O}) \neq$ MEX-mod$(\Sigma, \mathcal{O})$?
  If yes, is it likely that the difference is large?

Clearly we need to pick, for each ontology in our corpus, a suitable set of signatures, and this poses a significant problem. A full investigation is infeasible: if $m = \#\widetilde{\mathcal{O}}$, there are $2^m$ possible seed signatures, so that testing axioms for locality against all the signatures is already impossible for $m$ around 100. One could assume that comparing modules is easier since many signatures can lead to the same module. However, previous work [6,8] has shown that the number of modules in ontologies is, in general, exponential w.r.t. the size of the ontology. Still, different seed signatures can lead to the same module, which makes it hard to extract enough different modules.

We will consider seed signatures of two kinds: genuine seed signatures and random seed signatures.

Genuine seed signatures. A module does not necessarily show an internal coherence: e.g., if we had an ontology $\mathcal{O}$ about the domains of geology and philosophy, we could extract the module for the signature $\Sigma = \{\text{Epistemology}, \text{Mineral}\}$. That module is likely to be the union of the two disjoint modules for $\Sigma_1 = \{\text{Epistemology}\}$ and $\Sigma_2 = \{\text{Mineral}\}$.

In contrast, genuine modules can be said to be coherent: they are those modules that cannot be decomposed into the union of two "$\subseteq$"-incomparable modules. Notably, there are only linearly many genuine modules in the size of $\mathcal{O}$ since each genuine $x$-module equals $x$-mod$(\widetilde{\alpha}, \mathcal{O})$, for some axiom $\alpha \in \mathcal{O}$. Moreover, all modules of $\mathcal{O}$ are composed from genuine modules [7]. Thus, genuine modules are of special interest, and we can investigate them, and the corresponding genuine signatures, in full.

10 Recall: the MEX module is always a subset of the semantic $\Sigma$-module, which is always a subset of the syntactic $\Sigma$-module.

Random seed signatures. Since a full investigation of all the signatures is impossible, we compare locality (both on a per-axiom and per-module basis) as well as LBMs and MEX modules on a random signature $\Sigma$, which we select by setting each named entity $E$ in the ontology to have probability $p = 1/2$ of being included in $\Sigma$. This ensures that each $\Sigma$ will have the same probability to be chosen. This approach has a clear drawback: the random variable "size of the seed signature generated" follows a binomial distribution, so a random seed signature is highly likely to be rather large and to contain about half the terms of the ontology. However, we do not yet have enough insight into what typical seed signatures are for module extraction, so biasing the selection of signatures to, for example, those of a certain size has no rationale. In contrast, selecting random seed signatures avoids the introduction of any bias. Moreover, this choice is complementary to the selection of all the genuine signatures, which are in general small.
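The random selection described above can be sketched as follows. This is only an illustration: the function names `random_seed_signature` and `expected_size`, and the use of Python's `random.Random` as the source of randomness, are assumptions made for the example and say nothing about how the experiments were actually implemented.

```python
import math
import random
from typing import FrozenSet, Iterable, Tuple


def random_seed_signature(
    terms: Iterable[str], rng: random.Random, p: float = 0.5
) -> FrozenSet[str]:
    """Include each term independently with probability p.
    With p = 1/2 every subset of the terms is equally likely."""
    return frozenset(t for t in sorted(terms) if rng.random() < p)


def expected_size(num_terms: int, p: float = 0.5) -> Tuple[float, float]:
    """Mean and standard deviation of the signature size,
    which follows a Binomial(num_terms, p) distribution."""
    return num_terms * p, math.sqrt(num_terms * p * (1 - p))


terms = [f"T{i}" for i in range(1000)]
signature = random_seed_signature(terms, random.Random(0))
mean, std = expected_size(len(terms))
```

For an ontology with 1,000 terms, the mean size is 500 with a standard deviation of roughly 16, which illustrates why random seed signatures are almost always large and close to half the terms.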

Whilst the selection of genuine signatures is complete, we can only aim at selecting a number of random signatures to obtain statistically significant statements about modules. To reach a confidence level of 95% that the true proportion of differences between modules lies in the confidence interval ($\pm 5\%$) of the observed proportion, we have to sample at least 385 seed signatures (see the detailed explanations in [5]). For ontologies with at least 9 elements in the signature, we will therefore draw a sample of size 400. For all other ontologies, we will look at all of the $\leq 400$ signatures.
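The figure of 385 is consistent with the standard normal-approximation formula for estimating a proportion, $n = z^2 p(1-p)/e^2$, with $z = 1.96$ (95% confidence), worst case $p = 0.5$, and margin $e = 0.05$. The detailed derivation is given in [5], so the formula below is an assumption about how the number arises; the function names are hypothetical.

```python
import math


def required_sample_size(z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> int:
    """Smallest n with z^2 * p * (1 - p) / margin^2 <= n (normal approximation)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def exhaustive_is_feasible(num_terms: int, sample_size: int = 400) -> bool:
    """True if all 2^num_terms signatures fit within the sampling budget."""
    return 2 ** num_terms <= sample_size


n = required_sample_size()
small = exhaustive_is_feasible(8)   # 2^8 = 256 signatures
large = exhaustive_is_feasible(9)   # 2^9 = 512 signatures
```

The second helper reflects the threshold of 9 terms mentioned in the text: with 8 terms there are 256 signatures, fewer than the sample of 400, whereas 9 terms already give 512.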

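Summary. We compare, for every ontology $\mathcal{O}$ in our corpus,

(T1) for random seed signatures $\Sigma$ from $\mathcal{O}$,
  (a) for each axiom $\alpha$ in $\mathcal{O}$, is $\alpha$
    - $\emptyset$-local w.r.t. $\Sigma$ but not $\bot$-local w.r.t. $\Sigma$?
    - $\Delta$-local w.r.t. $\Sigma$ but not $\top$-local w.r.t. $\Sigma$?
  (b) is
    - $\bot$-mod$(\Sigma, \mathcal{O}) \neq \emptyset$-mod$(\Sigma, \mathcal{O})$?
    - $\top$-mod$(\Sigma, \mathcal{O}) \neq \Delta$-mod$(\Sigma, \mathcal{O})$?
    - $\top\bot^*$-mod$(\Sigma, \mathcal{O}) \neq \emptyset\Delta^*$-mod$(\Sigma, \mathcal{O})$?
    - $\emptyset\Delta^*$-mod$(\Sigma, \text{ELI}(\mathcal{O})) \neq$ MEX-mod$(\Sigma, \text{ELI}(\mathcal{O}))$?
(T2) the same, ranging over all the genuine signatures $\widetilde{\alpha}$ for $\alpha \in \mathcal{O}$.

4 Results of the Experiments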

No differences in locality. The main result of the experiment is that, for the vast majority of the ontologies in our corpus, no difference between syntactic and semantic locality is observed, for all three variants $\bot$ vs. $\emptyset$, $\top$ vs. $\Delta$, and $\top\bot^*$ vs. $\emptyset\Delta^*$. More precisely, for 210 out of 242 ontologies, we obtain that: (T1) for random seed signatures, there is no statistically significant difference (a) between semantic and syntactic locality of any kind, (b) between semantic and syntactic LBMs of any kind; (T2) given any genuine signature, there is no such difference.

More specifically, for all randomly generated seed signatures and all genuine signatures, the corresponding bottom-modules (and the corresponding top- and star-modules, respectively) agree, and every axiom is either both $\bot$- and $\emptyset$-local or neither (and either both $\top$- and $\Delta$-local or neither).

The 210 ontologies include Galen and People, which are renowned for having unusually large $\bot$-modules [2,8]. In most cases, extracting a semantic and a syntactic LBM each took only a few milliseconds; hence, a performance comparison is not meaningful. For some ontologies, the semantic LBM took considerably longer to extract than the syntactic one: up to 5 times for star-modules in Molecule Role, and up to 34 times in Galen.

Differences in locality. We have observed differences between syntactic and semantic locality for 32 ontologies in our corpus. We call the axioms that cause these differences culprits: patterns of axioms which are not $\bot$-local ($\top$-local, respectively) w.r.t. some signature $\Sigma$, but which are $\emptyset$-local ($\Delta$-local, respectively) w.r.t. $\Sigma$. We have identified four types of patterns, a to d, and we describe them in the following. Sometimes, culprit axioms pull additional axioms into the syntactic LBM, due to signature extension during module extraction.

We denote concept names by $A, B$, complex concepts by $C, D$, roles by $r, s, \ldots$, nominals by $a$, non-empty data ranges (e.g., int or int0..9) by $R$, possibly with indices. $\Sigma$ denotes a signature for which a module is extracted or against which an axiom is checked for locality. Terms outside $\Sigma$ are overlined; we further use the notation $C^\bot$ and $C^\top$ to denote concepts that are bottom- or top-equivalent due to the grammar defining syntactic locality in [3, Def. 6] and the analogous grammar for semantic locality.

Culprits of type a are simple tautologies that accidentally entered the "inferred view" (closure under certain entailments) of an ontology. These axioms do not occur in the original "asserted" versions and could, in principle, be detected in a simple preprocessing step. Type-a culprits occur in 10 ontologies of the above 32, and are of the following kinds: $A \sqsubseteq A$, $r \equiv (r^-)^-$, and $A \sqcap C \sqcap D \sqsubseteq A \sqcap C$. Each such tautology $\alpha$ is trivially $\emptyset$-local and $\Delta$-local w.r.t. any $\Sigma$, but not always $\bot$- or $\top$-local: if $\Sigma$ contains all terms in $\alpha$, then both sides of the subsumption (equivalence) are neither $\bot$- nor $\top$-equivalent.
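The preprocessing step mentioned for type-a culprits could look roughly like the following sketch, which covers only the two subsumption patterns $A \sqsubseteq A$ and $A \sqcap C \sqcap D \sqsubseteq A \sqcap C$. The encoding is an assumption made for illustration: each side of a subsumption is a set of conjuncts represented as opaque strings, and the names `is_trivial_subsumption` and `drop_trivial` are hypothetical. The role-equivalence pattern is not handled here.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: a conjunction is the set of its conjuncts,
# and (lhs, rhs) stands for the subsumption lhs ⊑ rhs.
Conjunction = FrozenSet[str]
Subsumption = Tuple[Conjunction, Conjunction]


def is_trivial_subsumption(lhs: Conjunction, rhs: Conjunction) -> bool:
    """A conjunction is subsumed by any conjunction of a subset of its conjuncts."""
    return rhs <= lhs


def drop_trivial(axioms: List[Subsumption]) -> List[Subsumption]:
    """Remove subsumptions that are tautologies by the subset test above."""
    return [ax for ax in axioms if not is_trivial_subsumption(*ax)]


axioms = [
    (frozenset({"A"}), frozenset({"A"})),                 # A ⊑ A
    (frozenset({"A", "C", "D"}), frozenset({"A", "C"})),  # A ⊓ C ⊓ D ⊑ A ⊓ C
    (frozenset({"A"}), frozenset({"B"})),                 # A ⊑ B, kept
]
cleaned = drop_trivial(axioms)
```

The subset test is sound but deliberately incomplete: it recognizes only tautologies that are visible from the syntactic shape of the two conjunctions.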

Differences caused not solely by culprits of type a have been observed for 26 ontologies. In only 6 of these cases, the differences affect modules; in the remaining 20, they only affect locality of single axioms (tests T1 a and T2 a). We will focus on the former 6, listed in Table 1, and refer to [5] for details on all 26.

Table 1

Ontology               Abbreviation  DL expressivity  #axioms  #terms
MiniTambis-repaired    MiniT                          170      226
Tambis-full            Tambis                         592      496
Bleeding History P...  BHO                            1,925    581
Neuro Behavior O...    NBO                            1,314    970
Pharmacogenomic...     PhaRe                          459      311
Terminological and...  TOK                            466      330

According to Table 1, differences between modules occur for ontologies of medium to large size and medium to high expressivity. Differences in locality alone additionally affect small ontologies such as Koala (42 axioms) and Pilot Ontology (85 axioms), as well as large ontologies such as Galen (4,735 axioms) and Experimental Factor Ontology (7,156 axioms). The number of axioms causing these differences (i.e., matching the culprit patterns) in the affected ontologies is small except for Galen, and most of the observed differences are relatively small.

Table 2 gives a representative selection of the differences in modules observed. For a complete overview, including differences in locality of single axioms, consult the table in [5].

Ontology  Types affected
miniT     bot, star
Tambis    bot, star
BHO a     star
NBO a     star
PhaRe a   top, star
TOK       top, star

a differences only for genuine modules
b differences > 5% only for genuine modules
c differences > 11 axioms (> 2%) only for genuine modules
d differences > 13 axioms (> 1,300%) only for top-modules

The columns show: ontology name (abbreviations: see Table 1); type of modules affected; relative number of module pairs with differences; number of axioms in the differences (absolute and relative to the $\emptyset$- or $\Delta$- or $\emptyset\Delta^*$-case); type of culprit present and number of axioms of this type involved in differences.

Table 2. Overview of observed differences between modules

Table 2 shows small absolute differences for miniT, BHO, NBO, and TOK. In Tambis, large differences occur only for genuine modules, which suggests that they are unlikely to occur in practical cases with usually larger seed signatures. Finally, in PhaRe, large differences occur only for top-modules. For all these ontologies, a single syntactic or semantic module was extracted within only a few milliseconds, making module extraction times roughly equal.

Culprits of type b are axioms with an $\exists$-restriction on a set of nominals or a non-empty data range on the right-hand side, such as $A \sqsubseteq \exists r.\{a_1, \ldots, a_n\}$ or $A \sqsubseteq \exists r.R$. These axioms are $\Delta$-local w.r.t. any signature $\Sigma$ that does not contain $r$ because they become tautologies if $r$ is replaced by $\top$. However, they can only be $\top$-local when $A$ is a $\bot$-equivalent concept w.r.t. $\Sigma$.

Culprit-b axioms affect genuine modules of BHO, and (only) locality of single axioms for 4 more ontologies. We observed a slightly more sophisticated variant of the form $A \equiv C^\top \sqcap \exists r.R$.

Culprits of type c are axioms $\alpha$ that contain a concept description $C$ such that (a) $C$ becomes equivalent to $\bot$ (or $\top$) if all terms outside $\Sigma$ are replaced by $\bot$ (or $\top$); (b) this causes $\alpha$ to be semantically $\emptyset$-local (or $\Delta$-local); but (c) the grammars for syntactic locality do not "detect" $C$ to be a $C^\bot$ (or $C^\top$). For example, $C = \forall r.A \sqcap \exists r.\top$ becomes $\bot$-equivalent if $A$ is replaced by $\bot$; the same holds with cardinality restrictions in place of "$\exists$". Consequently, axioms such as $A^\bot \equiv B \sqcap \forall r.C^\bot \sqcap \forall s.\{a\} \sqcap\, ({=}3\, r.\top)$ (taken from Koala) are $\emptyset$-local but not $\bot$-local.

We found this pattern in 8 ontologies. Only in miniT and Tambis does it affect a large proportion of bottom- and star-modules, with additional axioms "pulled in". Still, the size of the differences is modest, as argued above. Some of the remaining 6 ontologies contain different kinds of complex concepts that cause differences in top-locality of single axioms.

Culprits of type d are axioms where a concept (or role) name from the left-hand side occurs on the right-hand side together with a top-equivalent role (or concept), causing differences in top-modules. The simplest kind of axiom of this type is $A \sqsubseteq \exists r.A$, which is $\Delta$-local because replacing $r$ with $\top$ makes it a tautology. The axiom is only $\top$-local if $\Sigma$ contains neither $r$ nor $A$. We have found further examples of increasing complexity in Adverse Event Reporting Ontology and Galen; see [5].
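The simplest type-d culprit can be worked through explicitly. The following is a sketch for the assumed signature $\Sigma = \{A\}$, writing $u$ for the top role (a symbol introduced here only for illustration) and $\mathcal{I}$ for an arbitrary interpretation.

```latex
\begin{align*}
\alpha  &= A \sqsubseteq \exists r.A, \qquad \Sigma = \{A\},\quad r \notin \Sigma \\
\alpha' &= A \sqsubseteq \exists u.A \qquad \text{($r$ replaced by the top role $u$)} \\
x \in A^{\mathcal{I}}
        &\;\Longrightarrow\; (x,x) \in u^{\mathcal{I}} \text{ and } x \in A^{\mathcal{I}}
         \;\Longrightarrow\; x \in (\exists u.A)^{\mathcal{I}}
\end{align*}
```

Hence $\alpha'$ holds in every interpretation and $\alpha$ is $\Delta$-local w.r.t. $\Sigma$. Syntactically, however, $\exists r.A$ counts as top-equivalent only if both $r$ and $A$ lie outside $\Sigma$; since $A \in \Sigma$ here, $\alpha$ is not $\top$-local, which is exactly the gap between the two notions that this culprit type exposes.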

We have observed culprits of type d in 17 ontologies; see the detailed overview in [5]. Only in 3 cases (NBO, PhaRe, and TOK) are modules affected.

Galen contains 121 culprit-d axioms, but they only affect locality of single axioms. In addition, the time differences for Galen are remarkable: checking all axioms for $\Delta$-locality takes up to 70 times longer than checking them for $\top$-locality.

Summary. Culprits hardly ever cause significant differences in modules. Only for PhaRe are differences between semantic and syntactic modules not negligible, but we were able to put them into perspective.

Table 1 may suggest that culprits occur only in expressive ontologies. However, patterns a, c, d can, in principle, already occur in simple terminologies in EL and ALC, respectively. Evidently, type-a culprits can easily be filtered out in a preprocessing step. For types c and d, there is no hope for an exhaustive extension to locality because they can (and do) occur in arbitrarily complex shapes and contexts.

Patterns of type b rely on nominals or datatypes, but they are repairable by a straightforward extension to the definition of syntactic locality: one can extend the locality definition to distinguish $\bot$- and $\top$-distinct concepts, by adding appropriate grammars to the definition of syntactic locality, and adding more cases of $\bot$- and $\top$-equivalent concepts to the existing grammars. However, given the small numbers of differences observed, we doubt that such an extension of syntactic locality will have any significant effects in practice.

LBMs vs MEX results. The results of the experimental comparison of syntactic/semantic LBMs and MEX modules are summarized in Table 3. They show that MEX modules smaller than the corresponding LBMs can be found in 27% of the preprocessed ontologies, for either random or axiom-based seed signatures. At the same time, unsurprisingly, syntactic and semantic LBMs do not differ at all for these simple ELI ontologies.

Table 3 (experiments: random signatures, axiom signatures). The results from the third column on are averaged over all ontologies with differences LBM-MEX in at least one module. For example, the last two columns show the average min and max absolute (resp. relative) difference between LBMs and MEX modules.

In experiments with random seed signatures, it can be seen that for those ontologies where there are differences (most notably, Galen), they occur in many tests. Thus, the difference appears to be caused by features of the ontology, not some particular seed signatures. Also, the difference sometimes comes out large in certain tests, also for genuine modules. For example, for the signature of the following axiom in Galen, both $\emptyset\Delta^*$-mod and $\top\bot^*$-mod contain 127 axioms while the MEX-module only contains the axiom itself:11 $\mathrm{RICF} \equiv \mathrm{ICF} \sqcap \exists \mathrm{ISFO}.\mathrm{RSH}$. The likely reason is the proliferation of concept equivalence axioms in Galen. For example, $A \equiv B$ will end up in the $\emptyset\Delta^*$-mod for any seed signature containing either $A$ or $B$. It is, however, an mCE of $\emptyset$ w.r.t. either $\{A\}$ or $\{B\}$.

5 Conclusion and outlook

Summary. We obtain three main observations from our experiments. (1) In general, there is no or little difference between semantic and syntactic locality. Hence, the computationally cheaper syntactic locality is a good approximation of semantic locality. For the ontologies Galen and People, which are "renowned" for having disproportionately large modules, syntactic and semantic LBMs do not differ. (2) In most cases, there is no or little difference between LBMs and MEX modules. Only for Galen are MEX modules considerably smaller than LBMs. (3) Though in principle hard to compute, semantic LBMs can be extracted rather fast in practice. Still, their extraction often takes considerably longer than for syntactic LBMs. We cannot make any statement about MEX module extraction times because we use the original MEX implementation, which combines loading and module extraction. Due to (1), hardly any benefit can be expected from preferring the potentially smaller semantic LBMs to the cheaper syntactic LBMs. From (2) we can say that semantic LBMs can be seen as the best available approximation of MEX modules for ontologies in highly expressive languages.

Not only does our study evaluate how well the cheap syntactic locality approximates semantic locality and model conservativity, it also enabled us to fix bugs in the implementation of syntactic modularity.

Future work. Two questions are interesting for future work: (1) How can we redesign the experiments so that we can include the very large ontologies? (2) How do LBMs compare to other types of conservativity-based modules?

Concerning (2), one could include, for example, the technique based on reduction to QBF for the OWL 2 QL profile [16] when an off-the-shelf implementation becomes available.

11 The acronyms denote RightIneffectiveCardiacFunction, IneffectiveCardiacFunction, isSpecificFunctionOf, RightSideOfHeart.

References

1. Franz Baader, Diego Calvanese, Deborah McGuinness, Daniele Nardi, and Peter F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003.

2. Bernardo Cuenca Grau, Ian Horrocks, Yevgeny Kazakov, and Ulrike Sattler. Modular reuse of ontologies: Theory and practice. J. of Artif. Intell. Research, 31(1):273-318, 2008.

3. Bernardo Cuenca Grau, Ian Horrocks, Yevgeny Kazakov, and Ulrike Sattler. Extracting modules from ontologies: A logic-based approach. In Stuckenschmidt et al. [20], pages 159-186.

4. Bernardo Cuenca Grau, Bijan Parsia, Evren Sirin, and Aditya Kalyanpur. Modularity and Web ontologies. In Proc. of KR-06. AAAI Press/The MIT Press, 2006.

5. Chiara Del Vescovo, Pavel Klinov, Bijan Parsia, Ulrike Sattler, Thomas Schneider, and Dmitry Tsarkov. Empirical study of logic-based modules: Cheap is cheerful. Technical report, 2013. https://sites.google.com/site/cheapischeerful/.

6. Chiara Del Vescovo, Bijan Parsia, Ulrike Sattler, and Thomas Schneider. The modular structure of an ontology: an empirical study. Volume 573 of ceur-ws.org, 2010.

7. Chiara Del Vescovo, Bijan Parsia, Ulrike Sattler, and Thomas Schneider. The modular structure of an ontology: Atomic decomposition. In Proc. of IJCAI-11, pages 2232-2237, 2011.

8. Chiara Del Vescovo, Bijan Parsia, Ulrike Sattler, and Thomas Schneider. The modular structure of an ontology: Atomic decomposition and module count. Volume 230 of FAIA, pages 25-39, 2011.

9. James Garson. Modularity and relevant logic. 30(2):207-223, 1989.

10. Silvio Ghilardi, Carsten Lutz, and Frank Wolter. Did I damage my ontology? A case for conservative extensions in Description Logics. In Proc. of KR-06, pages 187-197. AAAI Press/The MIT Press, 2006.

11. Matthew Horridge, Bijan Parsia, and Ulrike Sattler. The state of bio-medical ontologies. 2011.

12. Ian Horrocks, Oliver Kutz, and Ulrike Sattler. The even more irresistible SROIQ. In Proc. of KR-06, pages 57-67, 2006.

13. Boris Konev, Carsten Lutz, Dirk Walther, and Frank Wolter. Semantic modularity and module extraction in description logics. In Proc. of ECAI-08, pages 55-59, 2008.

14. Boris Konev, Carsten Lutz, Dirk Walther, and Frank Wolter. Formal properties of modularization. In Stuckenschmidt et al. [20], pages 25-66.

15. Roman Kontchakov, Luca Pulina, Ulrike Sattler, Thomas Schneider, Petra Selmer, Frank Wolter, and Michael Zakharyaschev. Minimal module extraction from DL-Lite ontologies using QBF solvers. In Proc. of IJCAI-09, pages 836-841, 2009.

16. Roman Kontchakov, Frank Wolter, and Michael Zakharyaschev. Logic-based ontology comparison and module extraction, with an application to DL-Lite. Artificial Intelligence, 174(15):1093-1141, 2010.

17. Carsten Lutz, Dirk Walther, and Frank Wolter. Conservative extensions in expressive Description Logics. In Proc. of IJCAI-07, pages 453-458, 2007.

18. Carsten Lutz and Frank Wolter. Deciding inseparability and conservative extensions in the description logic EL. 45(2):194-228, 2010.

19. Ulrike Sattler, Thomas Schneider, and Michael Zakharyaschev. Which kind of module should I extract? Volume 477 of ceur-ws.org, 2009.

20. Heiner Stuckenschmidt, Christine Parent, and Stefano Spaccapietra, editors. Volume 5445 of LNCS. Springer-Verlag, 2009.