<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Empirical Study of Logic-Based Modules: Cheap Is Cheerful</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chiara Del Vescovo</string-name>
          <email>delvescc@cs.man.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Klinov</string-name>
          <email>pavel.klinov@uni-ulm.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bijan Parsia</string-name>
          <email>bparsia@cs.man.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ulrike Sattler</string-name>
          <email>sattler@cs.man.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Schneider</string-name>
          <email>tschneider@informatik.uni-bremen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry Tsarkov</string-name>
          <email>tsarkov@cs.man.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitat Bremen</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Manchester</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Ulm</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>For ontology reuse and integration, a number of approaches have been devised that aim at identifying modules, i.e., suitably small sets of "relevant" axioms from ontologies. Here we consider three logically sound notions of modules: MEX modules, only applicable to inexpressive ontologies; modules based on semantic locality, a sound approximation of the first; and modules based on syntactic locality, a sound approximation of the second (and thus the first), widely used since these modules can be extracted from SROIQ ontologies in time polynomial in the size of the ontology. In this paper we investigate the quality of both approximations over a large corpus of ontologies. In particular, we show with statistical significance that, in most cases, there is no difference between the two module notions based on locality; where they differ, the additional axioms are in general unproblematic since either they can be easily ruled out or their number is relatively small. Finally, we show that the same can be said about the relation between MEX and locality-based modules.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Some notable examples of ontologies describe large and loosely connected
domains, as is the case for SNOMED-CT, the Systematized Nomenclature Of
MEDicine, Clinical Terms,<sup>4</sup> which describes the terminology used in medicine
including diseases, drugs, etc. Users are often not interested in a whole ontology
O, but only in a limited relevant part of it. In this context, the idea of using
modules has recently been explored: modules are suitably small subsets of ontologies that
behave, for specific purposes, like the original ontologies over a given signature Σ,
i.e., a set of terms (non-logical symbols: concept and role names). The notion
of logical module [
        <xref ref-type="bibr" rid="ref4 ref9">9,4</xref>
        ] focuses on providing coverage, i.e., on preserving all the
entailments of O over Σ.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] the authors comment on the crucial role played by coverage and by
two additional properties of modules for ontology reuse and integration. Let M
      </p>
    </sec>
    <sec id="sec-2">
      <title>4 http://www.ihtsdo.org/snomed-ct/</title>
      <p>
        be a subset of O. We say that: (1) M is self-contained if it provides coverage for
its signature; (2) M is depleting if the remainder O ∖ M of the ontology does
not entail any non-tautological axiom over Σ. Under some mild conditions, a
minimal depleting and self-contained module is also uniquely determined [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        Extracting the uniquely determined module for a signature Σ is, however,
hard or even impossible for expressive languages [
        <xref ref-type="bibr" rid="ref10 ref17 ref18">18,10,17</xref>
        ]. To identify
notions of modules whose extraction is feasible, we can either restrict the
expressivity of the ontology language, or look for feasible sufficient conditions that
guarantee M to be a module for Σ, even though not necessarily the smallest.
      </p>
      <p>
        For inexpressive logics, one can make use of the module extractor
implemented in the MEX system [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. This module extractor works on acyclic ELI
terminologies and extracts the minimal module in polynomial time.
      </p>
      <p>
        For expressive logics, module extraction can be implemented making use of
the notion of locality. The resulting modules, known as locality-based modules
(LBMs) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] are, in general, not minimal. Locality comes in two flavors, semantic
and syntactic locality, each of which has a bottom-, a top-, and a star-variant; a star-module is
contained in both the corresponding top- and bottom-modules. For any of the three variants,
a syntactic LBM contains the corresponding semantic LBM. Algorithms for the
extraction of syntactic LBMs are known that run in time polynomial in the size of
the ontology (thus much cheaper than reasoning), are implemented in the OWL
API,<sup>5</sup> and are currently used for ontology reuse and integration. In contrast,
although algorithms for extracting semantic LBMs are known, to the best of our
knowledge they had not been implemented until now. They
require entailment checks against an empty ontology and thus involve reasoning
of a kind that is rather unusual for DL reasoners.<sup>6</sup>
      </p>
      <p>We know that the MEX module for a signature Σ is contained in the star
semantic LBM which, in turn, is contained in the star syntactic LBM. Thus,
syntactic locality can be seen as an approximation of semantic locality which,
in turn, is an approximation of MEX modules. An interesting question arising
here is how good these approximations are: how much larger are the modules
extracted by the approximations, and how much faster is the extraction?</p>
      <p>
        To answer these questions, we present the first implementation of a semantic
LBM extractor and report on experiments on a large corpus of ontologies. We
compare the performance and results of the semantic LBM extractor with those
of a syntactic LBM extractor and with those of MEX. A comparison between
MEX- and syntactic bottom-modules, for SNOMED-CT only, is reported in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>The contributions of this paper are as follows.</p>
      <p>We provide the first implementation for the extraction of semantic LBMs,
which is embedded in the latest release of the FaCT++ reasoner. Our results
show that the extraction of semantic LBMs, which is in principle hard, is feasible
in practice: on average, it is between 3 times (for top-modules) and 15 times (for
bottom- and star-modules) slower than the extraction of syntactic LBMs, and
both only take milliseconds to seconds for most ontologies below 10K axioms.</p>
    </sec>
    <sec id="sec-3">
      <title>5 http://owlapi.sourceforge.net/</title>
      <p>6 DL reasoners usually classify an ontology: they test it for consistency and test all concept
names for satisfiability and mutual subsumption.</p>
      <p>We show with statistical significance that, for almost all members of a large
corpus of existing ontologies, there is no difference between any syntactic LBM
and its semantic counterpart. In the few cases where differences occur, those are
extremely modest so that it is questionable whether extracting semantic LBMs
is worth the increased computational cost.</p>
      <p>We isolate four patterns of axioms that completely explain those rare
differences. One includes simple tautologies that can be removed in a straightforward
preprocessing step.</p>
      <p>We modify the original corpus to obtain for each ontology an acyclic EL version
suitable for use with the MEX system. We then compare MEX-modules and
the star-variants of LBMs, and find differences in only 27% of the corpus. We
explain one reason for the largest differences observed.</p>
      <sec id="sec-3-1">
        <title>Preliminaries</title>
        <p>
          We assume the reader to be familiar with Description Logic languages (e.g.
SROIQ [
          <xref ref-type="bibr" rid="ref1 ref12">1,12</xref>
          ]), and aim here at fixing the notation and at defining the key
notions around module extraction, with a focus on locality-based modules [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
and MEX modules [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>Let O denote a SROIQ ontology, N<sub>C</sub> a set of concept names, and N<sub>R</sub> a set of
role names. We do not consider individual names as they do not play any role
in the extraction of the modules analyzed here. We refer to either a concept or
a role name by using the word "term". A signature Σ is a set Σ ⊆ N<sub>C</sub> ∪ N<sub>R</sub>. Given
a concept, role, or axiom X, we call the set of terms in X the signature of X,
denoted X̃. Given a set M ⊆ O of axioms from O, and a signature Σ, we say
that O is a deductive Σ-conservative extension (Σ-dCE) of M if, for all
SROIQ axioms α with α̃ ⊆ Σ, it holds that O ⊨ α if and only if M ⊨ α. O is a model
Σ-conservative extension (Σ-mCE) of M if {I|<sub>Σ</sub> | I ⊨ O} = {I|<sub>Σ</sub> | I ⊨ M}.
Dually, M is a dCE-based module of O for Σ if O is a Σ-dCE of M, and it is
an mCE-based module for Σ if O is a Σ-mCE of M. All mCE-based modules
are also dCE-based modules, whilst the converse is not always true. A module
M ⊆ O for Σ is called depleting if there is no non-trivial entailment α over Σ
such that O ∖ M ⊨ α; M is called self-contained if M is a module for Σ ∪ M̃.</p>
        <p>
          Since M ⊆ O, the monotonicity of SROIQ implies that every entailment
over Σ derivable from M is also derivable from O. Deciding the converse
direction is in general computationally hard, or even undecidable for expressive
DLs [
          <xref ref-type="bibr" rid="ref10 ref17 ref18">18,10,17</xref>
          ]. Since we do not need to find all the subsets of O that are a
module for Σ, we can use easier conditions which guarantee that a set of axioms
M ⊆ O is a module for Σ.
        </p>
        <p>One strategy to extract a module M ⊆ O for a seed signature Σ consists
of defining a suitable oracle x to decide whether a single axiom α ∈ O is
x-relevant for preserving the non-tautological entailments over Σ: clearly, if the
x-check is positive for α, then by depletion M needs to contain α. To also
guarantee self-containment, the fixpoint procedure described in Algorithm 1 needs to
be performed. The module extracted will be called an x-module, parameterized
by the notion x of relevance. Algorithm 1 is a special case of the one in [2].</p>
        <preformat>Algorithm 1 Extraction of an x-module for Σ
Input: Ontology O, seed signature Σ, notion of relevance x
Output: x-module M of O w.r.t. Σ
M ← ∅; O′ ← O
repeat
  changed ← false
  for all α ∈ O′ do
    if α is x-relevant w.r.t. Σ ∪ M̃ then
      M ← M ∪ {α}; O′ ← O′ ∖ {α}; changed ← true
until changed = false
return M</preformat>
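        <p>The fixpoint procedure of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the <monospace>Axiom</monospace> record, the function <monospace>extract_module</monospace>, and the toy relevance notion <monospace>shares_term</monospace> are hypothetical names introduced here, and an axiom is modelled only by its signature.</p>
        <preformat>
```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Axiom:
    """Hypothetical stand-in for an axiom: a label plus its signature."""
    name: str
    signature: FrozenSet[str]


def extract_module(
    ontology: Iterable[Axiom],
    seed: Set[str],
    is_relevant: Callable[[Axiom, Set[str]], bool],
) -> Set[Axiom]:
    """Sketch of Algorithm 1: collect x-relevant axioms until nothing changes."""
    module: Set[Axiom] = set()
    remaining: Set[Axiom] = set(ontology)
    # sigma holds the seed signature extended by the signature of the module.
    sigma: Set[str] = set(seed)
    changed = True
    while changed:
        changed = False
        for axiom in list(remaining):
            if is_relevant(axiom, sigma):
                module.add(axiom)
                remaining.discard(axiom)
                sigma |= axiom.signature
                changed = True
    return module


def shares_term(axiom: Axiom, sigma: Set[str]) -> bool:
    """Toy relevance notion (an assumption, not MEX or locality):
    an axiom is relevant if it mentions any term of sigma."""
    return not axiom.signature.isdisjoint(sigma)
```
        </preformat>
        <p>Any of the relevance notions discussed below (MEX-relevance, or non-locality in its semantic or syntactic form) could be plugged in as <monospace>is_relevant</monospace>; only the oracle changes, not the loop.</p>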
        <p>
          The MEX system. In [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], a procedure called MEX for extracting minimal
mCE-based modules from acyclic ELI ontologies is given. The notion of MEX-relevance
is based on a relation depend<sub>O</sub>, which associates each concept name A with the
set depend<sub>O</sub>(A) of all the symbols X that are used in the definition<sup>7</sup> of A in O.
Then, an axiom α ∈ O is MEX-relevant for a signature Σ if it defines a term
in Σ. The authors prove that, if O is an acyclic ELI ontology, then using
MEX-relevance in Algorithm 1 generates the minimal depleting self-contained
module for each signature Σ, and that the extraction runs in polynomial time.
Semantic locality. In [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], the authors define the notion of locality and
distinguish two flavors, here called ∅- and Δ-locality. Intuitively, a SROIQ axiom α is
∅-local (resp. Δ-local) w.r.t. a signature Σ if α′, obtained by replacing all terms in
α̃ ∖ Σ with ⊥ (resp. ⊤), is a tautology. The authors prove that, if all axioms in
O ∖ M are ∅-local (or all axioms are Δ-local) w.r.t. Σ ∪ M̃, then M is an
mCE-based (and hence dCE-based) module of O for Σ. Since deciding ∅- or Δ-locality
requires tautology checks, this problem is as hard as standard reasoning. In some
cases, α′ is not a SROIQ axiom, so standard reasoners need to be extended.
Syntactic locality. In order to achieve tractable module extraction, [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] define the
two syntactic notions of ⊥- and ⊤-locality. Those approximate the two notions
of semantic ∅- and Δ-locality. Intuitively, the syntactic rules provided describe
axioms α for which α′ is obviously a tautology, in which case α is said to be
⊥- or ⊤-local w.r.t. Σ. Thus, every ⊥-local (⊤-local, resp.) axiom w.r.t. Σ is also
∅-local (Δ-local, resp.) w.r.t. Σ, but not vice versa. As a consequence,
⊥- and ⊤-modules are also mCE- and dCE-based modules for Σ. Applying the syntactic
rules requires polynomial time, hence the extraction of this kind of module can
be performed in time polynomial in the size of the ontology.
7 This notion of "definition" is specific to MEX modules.
        </p>
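        <p>To illustrate how syntactic ⊥-locality avoids reasoning, the following Python sketch checks it for a small EL-like fragment only (⊤, concept names, conjunction, and existential restrictions). The tuple encoding of concepts and all function names are assumptions made for this illustration; the full grammar for SROIQ is the one given in [2], and here ⊤ is simply treated as the only atomic top-equivalent concept.</p>
        <preformat>
```python
from typing import Set, Tuple

# Hypothetical concept encoding:
#   ("top",) | ("name", A) | ("and", C, D) | ("some", r, C)
Concept = Tuple


def is_bot_equivalent(c: Concept, sigma: Set[str]) -> bool:
    """True if the grammar recognises c as equivalent to bottom
    once all terms outside sigma are read as bottom (empty)."""
    kind = c[0]
    if kind == "name":
        return c[1] not in sigma
    if kind == "and":
        return is_bot_equivalent(c[1], sigma) or is_bot_equivalent(c[2], sigma)
    if kind == "some":
        # An empty role or an empty filler makes the restriction empty.
        return c[1] not in sigma or is_bot_equivalent(c[2], sigma)
    return False  # "top" is never bottom-equivalent


def is_top_equivalent(c: Concept, sigma: Set[str]) -> bool:
    """In this fragment only top and conjunctions of top-equivalent
    concepts are recognised as top-equivalent."""
    kind = c[0]
    if kind == "top":
        return True
    if kind == "and":
        return is_top_equivalent(c[1], sigma) and is_top_equivalent(c[2], sigma)
    return False


def is_bot_local(lhs: Concept, rhs: Concept, sigma: Set[str]) -> bool:
    """A subsumption lhs SubClassOf rhs is syntactically bottom-local if the
    left side is bottom-equivalent or the right side is top-equivalent."""
    return is_bot_equivalent(lhs, sigma) or is_top_equivalent(rhs, sigma)
```
        </preformat>
        <p>Each check is a single recursive pass over the axiom, which is why applying the syntactic rules takes polynomial time; the semantic counterpart would instead replace the terms outside Σ and call a reasoner to test whether the result is a tautology.</p>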
        <p>
          Modules based on syntactic (semantic) locality can be made smaller by
iteratively nesting ⊤- and ⊥-extraction (Δ- and ∅-extraction), again obtaining
mCE- and dCE-based modules [
          <xref ref-type="bibr" rid="ref19 ref2">2,19</xref>
          ], called ⊤⊥*- and ∅Δ*-modules. The ∅Δ*-module
for Σ is always contained in the corresponding ⊤⊥*-module. Moreover,
for acyclic ELI ontologies, the MEX-module for Σ is always contained in the
corresponding ∅Δ*-module.
        </p>
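        <p>The iterated nesting that yields star-modules can be sketched as a second fixpoint loop around two given extractors. The sketch below is an illustration under assumptions: <monospace>extract_bot</monospace> and <monospace>extract_top</monospace> are hypothetical callables that return the bottom- and top-module of a set of axioms for a seed signature, and the order in which they are nested is a choice made here.</p>
        <preformat>
```python
from typing import Callable, FrozenSet, Set

# Hypothetical extractor type: (axioms, seed signature) -> module
Extractor = Callable[[FrozenSet[str], Set[str]], FrozenSet[str]]


def extract_star_module(
    ontology: FrozenSet[str],
    seed: Set[str],
    extract_bot: Extractor,
    extract_top: Extractor,
) -> FrozenSet[str]:
    """Alternate bottom- and top-extraction until the module stops shrinking."""
    module = frozenset(ontology)
    while True:
        smaller = extract_top(extract_bot(module, seed), seed)
        if smaller == module:
            return module
        module = smaller
```
        </preformat>
        <p>Since every round returns a subset of its input and the ontology is finite, the loop terminates; the result is contained in both the bottom- and the top-module of the original ontology, provided the extractors are monotone in their first argument.</p>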
      </sec>
      <sec id="sec-3-2">
        <title>Research questions and experimental design</title>
        <p>A natural question is whether syntactic modules are likely to be much larger than
the semantic ones, which are, in theory, computationally more costly. Hence, a
second question is whether semantic module extraction is noticeably more costly in practice:
the tautology test has to be carried out often (once per axiom and signature
that the algorithm goes through), and it is thus hard to predict the feasibility
of semantic LBM extraction. Altogether, we want to know whether syntactic
LBMs are a necessary approximation and how good an approximation of
semantic LBMs they are. Similarly, for acyclic ELI terminologies the analogous
question arises: how good an approximation of MEX modules are LBMs?</p>
        <p>One can always construct ontologies with huge differences in size and time
between syntactic and semantic LBMs and between LBMs and MEX modules.
Here, we are interested in these differences in currently available ontologies, and
thus need to design, run, and analyse suitable experiments.</p>
        <p>
          Selection of the corpus. For our experiments, we have built a corpus
containing: (1) all the ontologies from the NCBO BioPortal ontology repository,<sup>8</sup>
version of November 2012; (2) ontologies from the TONES repository<sup>9</sup> which
have already been studied in previous work on modularity [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]: Koala, Mereology,
University, People, miniTambis, OWL-S, Tambis, Galen. From this corpus, we have
removed ontologies that cannot be downloaded, whose .owl file is corrupted or
impossible to parse, or which are inconsistent. Furthermore, we have excluded
those large ontologies (exceeding 10K axioms) where the extraction of a semantic
LBM repeatedly took more than 2 minutes: for each such ontology, the estimated
time needed to perform our experiments would have exceeded 300 hours.
        </p>
        <p>
          This selection results in a corpus of 242 ontologies, which vary greatly in
expressivity (from AL to SROIQ(D)) and in size (10–16,066 axioms, 10–16,068
terms) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. For a full list of the corpus, please refer to [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          As mentioned before, for some ontologies it is not possible to test Δ-locality
(and thus to extract Δ- and ∅Δ*-modules) using standard DL reasoners; see
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] for details. To cover these cases, we have extended the reasoner FaCT++ to
support the uses of the ⊤-role specific to the semantic locality tests.
        </p>
        <p>
          Since MEX handles only acyclic ELI ontologies, we created an ELI version
ELI(O) of each ontology O in our corpus by filtering unsupported axioms and
breaking terminological cycles. While a principled way of doing this is beyond
the scope of this paper, we have used a heuristic, which is described in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>8 http://bioportal.bioontology.org</title>
    </sec>
    <sec id="sec-5">
      <title>9 http://owl.cs.manchester.ac.uk/repository/</title>
      <p>Comparing modules and locality. In order to compare syntactic and
semantic locality, as well as LBMs and MEX modules, we want to understand
(1) whether, for a given seed signature Σ, it is likely that there is a difference
between the syntactic and the semantic Σ-module, or between the MEX module for Σ
and the latter and, if so, the size of the difference;<sup>10</sup> and (2) how feasible the
extraction of semantic LBMs is. For this purpose, we compare (a) ∅-semantic and
⊥-syntactic locality, Δ-semantic and ⊤-syntactic locality, (b) ∅- and ⊥-modules,
Δ- and ⊤-modules, ∅Δ*- and ⊤⊥*-modules, (c) MEX modules and ∅Δ*-modules.
Due to the recursive nature of Algorithm 1, our investigation is carried out both on a</p>
      <list list-type="bullet">
        <list-item><p>per-axiom basis: given an axiom α and a signature Σ, is it likely that α is
semantically ∅-local (Δ-local, resp.) w.r.t. Σ but not syntactically ⊥-local (⊤-local,
resp.) w.r.t. Σ?</p></list-item>
        <list-item><p>per-module basis: given a signature Σ, is it likely that
⊥-mod(Σ, O) ≠ ∅-mod(Σ, O), or
⊤-mod(Σ, O) ≠ Δ-mod(Σ, O), or
⊤⊥*-mod(Σ, O) ≠ ∅Δ*-mod(Σ, O), or
∅Δ*-mod(Σ, O) ≠ MEX-mod(Σ, O)?
If yes, is it likely that the difference is large?</p></list-item>
      </list>
      <p>
        Clearly we need to pick, for each ontology in our corpus, a suitable set of
signatures, and this poses a significant problem. A full investigation is infeasible:
if m = #Õ, there are 2<sup>m</sup> possible seed signatures, so that testing axioms for
locality against all the signatures is already impossible for m around 100. One could
assume that comparing modules is easier since many signatures can lead to
the same module. However, previous work [
        <xref ref-type="bibr" rid="ref6 ref8">6,8</xref>
        ] has shown that the number of
modules in ontologies is, in general, exponential w.r.t. the size of the ontology.
Still, different seed signatures can lead to the same module, which makes it hard
to extract enough different modules.
      </p>
      <p>We will consider seed signatures of two kinds: genuine seed signatures and
random seed signatures.</p>
      <p>Genuine seed signatures. A module does not necessarily show an internal
coherence: e.g., if we had an ontology O about the domains of geology and
philosophy, we could extract the module for the signature Σ = {Epistemology, Mineral}.
That module is likely to be the union of the two disjoint modules for Σ<sub>1</sub> =
{Epistemology} and Σ<sub>2</sub> = {Mineral}.</p>
      <p>
        In contrast, genuine modules can be said to be coherent: they are those
modules that cannot be decomposed into the union of two "⊆"-incomparable
modules. Notably, there are only linearly many genuine modules in the size
of O since each genuine x-module equals x-mod(α̃, O), for some axiom α ∈
O. Moreover, all modules of O are composed from genuine modules [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Thus,
genuine modules are of special interest, and we can investigate them, and the
corresponding genuine signatures, in full.
      </p>
      <p>10 Recall: the MEX module is always a subset of the semantic Σ-module, which is
always a subset of the syntactic Σ-module.</p>
      <p>
        Random seed signatures. Since a full investigation of all the signatures is
impossible, we compare locality (both on a per-axiom and on a per-module basis)
as well as LBMs and MEX modules on a random signature Σ, which we select by
setting each named entity E in the ontology to have probability p = 1/2 of being
included in Σ. This ensures that each Σ has the same probability of being
chosen. This approach has a clear drawback: the random variable "size of the seed
signature generated" follows a binomial distribution, so a random seed signature
is highly likely to be rather large and to contain about half the terms of the ontology.
However, we do not yet have enough insight into what typical seed signatures are
for module extraction, so biasing the selection of signatures to, for example, those
of a certain size has no rationale. In contrast, selecting random seed signatures
avoids the introduction of any bias. Moreover, this choice is complementary to
the selection of all the genuine signatures, which are in general small.
      </p>
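      <p>The random selection described above amounts to one independent coin flip per term. A minimal Python sketch, in which the function names and the optional random generator argument are assumptions made for illustration, could look as follows.</p>
      <preformat>
```python
import random
from typing import Iterable, Optional, Set


def random_seed_signature(
    terms: Iterable[str],
    p: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Set[str]:
    """Include each term independently with probability p.

    With p = 1/2 every subset of the terms is equally likely.
    """
    if rng is None:
        rng = random.Random()
    return {t for t in terms if p > rng.random()}


def expected_size(num_terms: int, p: float = 0.5) -> float:
    """Mean of the binomial distribution of the signature size."""
    return num_terms * p
```
      </preformat>
      <p>For the largest ontology in the corpus (16,068 terms), <monospace>expected_size</monospace> gives 8,034 terms, which makes the drawback mentioned above concrete: random seed signatures are large compared with genuine ones.</p>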
      <p>
        Whilst the selection of genuine signatures is complete, we can only aim at
selecting a number of random signatures to obtain statistically significant
statements about modules. To reach a confidence level of 95% that the true
proportion of differences between modules lies in the confidence interval (±5%) of
the observed proportion, we have to sample at least 385 seed signatures (see
the detailed explanations in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]). For ontologies with at least 9 elements in the
signature, we will therefore draw a sample of size 400. For all other ontologies,
we will look at all of the ≤ 400 signatures.
      </p>
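      <p>The figure of 385 can be reproduced with the standard normal-approximation formula for estimating a proportion, n = z² · p(1 − p) / e², taking the worst case p = 0.5, z = 1.96 for 95% confidence, and e = 0.05. Whether [5] uses exactly this derivation is an assumption; the Python sketch below, with hypothetical function names, only shows that the numbers are consistent, including the threshold of 9 terms (2<sup>8</sup> = 256 signatures, 2<sup>9</sup> = 512).</p>
      <preformat>
```python
import math


def required_sample_size(
    z: float = 1.96, margin: float = 0.05, proportion: float = 0.5
) -> int:
    """Smallest n with z^2 * p * (1 - p) / margin^2 not exceeding n."""
    return math.ceil(z * z * proportion * (1.0 - proportion) / (margin * margin))


def signatures_to_test(num_terms: int, sample_size: int = 400) -> int:
    """All 2^m signatures if there are at most sample_size, else a sample."""
    return min(2 ** num_terms, sample_size)
```
      </preformat>
      <p>With these defaults, <monospace>required_sample_size()</monospace> yields 385, so a sample of 400 leaves a small safety margin.</p>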
      <p>Summary. We compare, for every ontology O in our corpus,</p>
      <list list-type="simple">
        <list-item><p>(T1) for random seed signatures Σ from O,</p>
          <list list-type="simple">
            <list-item><p>(a) for each axiom α in O: is α ∅-local w.r.t. Σ but not ⊥-local w.r.t. Σ? Is α Δ-local w.r.t. Σ but not ⊤-local w.r.t. Σ?</p></list-item>
            <list-item><p>(b) is ⊥-mod(Σ, O) ≠ ∅-mod(Σ, O)? Is ⊤-mod(Σ, O) ≠ Δ-mod(Σ, O)? Is ⊤⊥*-mod(Σ, O) ≠ ∅Δ*-mod(Σ, O)? Is ∅Δ*-mod(Σ, ELI(O)) ≠ MEX-mod(Σ, ELI(O))?</p></list-item>
          </list>
        </list-item>
        <list-item><p>(T2) the same, ranging over all the genuine signatures α̃ for α ∈ O.</p></list-item>
      </list>
      <sec id="sec-5-1">
        <title>Results of the Experiments</title>
        <p>No differences in locality. The main result of the experiment is that, for the
vast majority of the ontologies in our corpus, no difference between syntactic
and semantic locality is observed, for all three variants ⊥ vs. ∅, ⊤ vs. Δ, and
⊤⊥* vs. ∅Δ*. More precisely, for 210 out of 242 ontologies, we obtain that:
(T1) for random seed signatures, there is no statistically significant difference
(a) between semantic and syntactic locality of any kind,
(b) between semantic and syntactic LBMs of any kind;
(T2) given any genuine signature, there is no such difference.</p>
        <p>More specifically, for all randomly generated seed signatures and all genuine
signatures, the corresponding bottom-modules (and the corresponding top- and
star-modules, respectively) agree, and every axiom is either both ⊥- and ∅-local or
neither (and either both ⊤- and Δ-local or neither).</p>
        <p>
          The 210 ontologies include Galen and People, which are renowned for having
unusually large ⊥-modules [
          <xref ref-type="bibr" rid="ref2 ref8">2,8</xref>
          ]. In most cases, extracting a semantic and
syntactic LBM each took only a few milliseconds; hence, a performance comparison
is not meaningful. For some ontologies, the semantic LBM took considerably
longer to extract than the syntactic: up to 5 times for star-modules in Molecule
Role, and up to 34 times in Galen.
        </p>
        <p>Differences in locality. We have observed differences between syntactic and
semantic locality for 32 ontologies in our corpus. We call the axioms that cause
these differences culprits: they match patterns of axioms which are not ⊥-local (⊤-local,
respectively) w.r.t. some signature Σ, but which are ∅-local (Δ-local,
respectively) w.r.t. Σ. We have identified four types of patterns, a–d, and we describe
them in the following. Sometimes, culprit axioms pull additional axioms into the
syntactic LBM, due to signature extension during module extraction.</p>
        <p>We denote concept names by A, B, complex concepts by C, D, roles by r, s, …,
nominals by a, and non-empty data ranges (e.g., int or int<sub>0..9</sub>) by R, possibly with
indices. Σ denotes a signature for which a module is extracted or against which
an axiom is checked for locality. Terms outside Σ are overlined; we further use
the notation C<sup>⊥</sup> and C<sup>⊤</sup> to denote concepts that are bottom- or top-equivalent due to
the grammar defining syntactic locality in [3, Def. 6] and the analogous grammar
for semantic locality.</p>
        <p>Culprits of type a are simple tautologies that accidentally entered the
"inferred view" (closure under certain entailments) of an ontology. These axioms do
not occur in the original "asserted" versions and could, in principle, be detected
in a simple preprocessing step. Type-a culprits occur in 10 of the
above 32 ontologies, and are of the following kinds: A ⊑ A, r ≡ (r<sup>−</sup>)<sup>−</sup>, and A ⊓ C ⊓ D ⊑ A ⊓ C.
Each such tautology α is trivially ∅-local and Δ-local w.r.t. any Σ, but not always
⊥- or ⊤-local: if Σ contains all terms in α, then both sides of the subsumption
(equivalence) are neither ⊥- nor ⊤-equivalent.</p>
        <p>
          Differences caused not solely by culprits of type a have been observed
for 26 ontologies. In only 6 of these cases, the differences affect modules; in the
remaining 20, they only affect locality of single axioms (tests T1 a and T2 a). We
will focus on the former 6, listed in Table 1, and refer to [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] for details on all 26.
        </p>
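        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>Ontology</th><th>Abbreviation</th><th>DL expressivity</th><th>#axioms</th><th>#terms</th></tr>
            </thead>
            <tbody>
              <tr><td>MiniTambis-repaired</td><td>MiniT</td><td></td><td>170</td><td>226</td></tr>
              <tr><td>Tambis-full</td><td>Tambis</td><td></td><td>592</td><td>496</td></tr>
              <tr><td>Bleeding History P...</td><td>BHO</td><td></td><td>1,925</td><td>581</td></tr>
              <tr><td>Neuro Behavior O...</td><td>NBO</td><td></td><td>1,314</td><td>970</td></tr>
              <tr><td>Pharmacogenomic...</td><td>PhaRe</td><td></td><td>459</td><td>311</td></tr>
              <tr><td>Terminological and...</td><td>TOK</td><td></td><td>466</td><td>330</td></tr>
            </tbody>
          </table>
        </table-wrap>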
        <p>According to Table 1, differences between modules occur for ontologies of
medium to large size and medium to high expressivity. Differences in locality
alone additionally affect small ontologies such as Koala (42 axioms) and Pilot
Ontology (85 axioms), as well as large ontologies such as Galen (4,735 axioms)
and Experimental Factor Ontology (7,156 axioms). The number of axioms causing
these differences (i.e., matching the culprit patterns) in the affected ontologies is
small except for Galen, and most of the observed differences are relatively small.</p>
        <p>
          Table 2 gives a representative selection of the differences in modules observed.
For a complete overview, including differences in locality of single axioms, consult
the table in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
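        <table-wrap>
          <label>Table 2</label>
          <caption><p>Overview of observed differences between modules</p></caption>
          <table>
            <thead>
              <tr><th>Ontology</th><th>Types affected</th></tr>
            </thead>
            <tbody>
              <tr><td>miniT</td><td>bot, star</td></tr>
              <tr><td>Tambis</td><td>bot, star</td></tr>
              <tr><td>BHO<sup>a</sup></td><td>star</td></tr>
              <tr><td>NBO<sup>a</sup></td><td>star</td></tr>
              <tr><td>PhaRe<sup>a</sup></td><td>top, star</td></tr>
              <tr><td>TOK</td><td>top, star</td></tr>
            </tbody>
          </table>
          <table-wrap-foot>
            <p><sup>a</sup> differences only for genuine modules</p>
            <p><sup>b</sup> differences &gt; 5% only for genuine modules</p>
            <p><sup>c</sup> differences &gt; 11 axioms (&gt; 2%) only for genuine modules</p>
            <p><sup>d</sup> differences &gt; 13 axioms (&gt; 1,300%) only for top-modules</p>
            <p>The columns show: ontology name (abbreviations: see Table 1); type of modules
affected; relative number of module pairs with differences; number of axioms in the
differences (absolute and relative to the ∅-, Δ-, or ∅Δ*-case); type of culprit present
and number of axioms of this type involved in differences.</p>
          </table-wrap-foot>
        </table-wrap>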
        <p>Table 2 shows small absolute differences for miniT, BHO, NBO, and TOK.
In Tambis, large differences occur only for genuine modules, which suggests that
they are unlikely to occur in practical cases, where seed signatures are usually larger.
Finally, in PhaRe, large differences occur only for top-modules. For all these
ontologies, a single syntactic or semantic module was extracted within only a
few milliseconds, making module extraction times roughly equal.</p>
        <p>Culprits of type b are axioms with an ∃-restriction on a set of nominals or
a non-empty data range on the right-hand side, such as A ⊑ ∃r.{a<sub>1</sub>, …, a<sub>n</sub>} or
A ⊑ ∃r.R. These axioms are Δ-local w.r.t. any signature Σ that does not contain
r because they become tautologies if r is replaced by ⊤. However, they can only
be ⊤-local when A is a ⊥-equivalent concept w.r.t. Σ.</p>
        <p>Culprit-b axioms affect genuine modules of BHO, and (only) locality of single
axioms for 4 more ontologies. We observed a slightly more sophisticated variant
of the form A ≡ C<sup>⊤</sup> ⊓ ∃r.R.</p>
        <p>Culprits of type c are axioms α that contain a concept description C such
that (a) C becomes equivalent to ⊥ (or ⊤) if all terms outside Σ are replaced
by ⊥ (or ⊤); (b) this causes α to be semantically local, i.e., ∅-local (or Δ-local); but (c)
the grammars for syntactic locality do not "detect" C to be a C<sup>⊥</sup> (or C<sup>⊤</sup>). For
example, C = ∀r.A ⊓ ∃r.⊤ becomes ⊥-equivalent if A is replaced by ⊥; the same
holds with cardinality restrictions in place of "∃". Consequently, axioms such as
A<sup>⊥</sup> ≡ B ⊓ ∀r.C<sup>⊥</sup> ⊓ ∀s.{a} ⊓ =3 r.⊤ (taken from Koala) are ∅-local but not ⊥-local.</p>
        <p>We found this pattern in 8 ontologies. Only in miniT and Tambis does it affect
a large proportion of bottom- and star-modules, with additional axioms "pulled
in". Still, the size of the differences is modest, as argued above. Some of the
remaining 6 ontologies contain different kinds of complex concepts that cause
differences in top-locality of single axioms.</p>
        <p>
          Culprits of type d are axioms where a concept (or role) name from the
left-hand side occurs on the right-hand side together with a top-equivalent role
(or concept), causing differences in top-modules. The simplest kind of axiom of
this type is A ⊑ ∃r.A, which is Δ-local because replacing r with ⊤ makes it a
tautology. The axiom is only ⊤-local if Σ contains neither r nor A. We have found
further examples of increasing complexity in Adverse Event Reporting Ontology
and Galen; see [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          We have observed culprits of type d in 17 ontologies, see the detailed overview
in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Only in 3 cases (NBO, PhaRe, and TOK) are modules affected.
        </p>
        <p>Galen contains 121 culprit-d axioms, but they only affect locality of single
axioms. In addition, the time differences for Galen are remarkable: checking all
axioms for Δ-locality takes up to 70 times longer than checking them for
⊤-locality.</p>
        <p>Summary. Culprits of all types hardly ever cause significant differences in modules. Only
for PhaRe are the differences between semantic and syntactic modules not negligible,
but we were able to relativize them.</p>
        <p>Table 1 may suggest that culprits occur only in expressive ontologies.
However, patterns a, c, d can, in principle, already occur in simple terminologies in
EL and ALC, respectively. Evidently, type-a culprits can easily be filtered out
in a preprocessing step. For types c and d, there is no hope for an exhaustive
extension to locality because they can (and do) occur in arbitrarily complex
shapes and contexts.</p>
        <p>Patterns of type b rely on nominals or datatypes, but they are repairable
by a straightforward extension to the definition of syntactic locality: one can
extend the locality definition to distinguish ⊥- and ⊤-distinct concepts, by adding
appropriate grammars to the definition of syntactic locality, and adding more
cases of ⊥- and ⊤-equivalent concepts to the existing grammars. However, from
the small numbers of differences observed, we doubt that such an extension of
syntactic locality will have any significant effects in practice.</p>
        <p>LBMs vs MEX results. The results of the experimental comparison of
syntactic/semantic LBMs and MEX modules are summarized in Table 3. They show
that MEX modules smaller than the corresponding LBMs can be found in 27%
of the preprocessed ontologies, for either random or axiom-based seed signatures.
At the same time, unsurprisingly, syntactic and semantic LBMs do not differ at
all for these simple ELI ontologies.</p>
        <p>Table 3 (experiments: random signatures, axiom signatures). The results from the third column on are averaged over all ontologies with differences
LBM–MEX in at least one module. For example, the last two columns show the average
min and max absolute (resp. relative) difference between LBMs and MEX modules.</p>
        <p>In experiments with random seed signatures, it can be seen that for those
ontologies where there are differences (most notably, Galen), they occur in many
tests. Thus, the difference appears to be caused by features of the ontology,
not some particular seed signatures. Also, the difference sometimes comes out
large in certain tests, also for genuine modules. For example, for the signature of
the following axiom in Galen, both ∆∅<sup>*</sup>-mod and ⊤⊥<sup>*</sup>-mod contain 127 axioms
while the MEX-module only contains the axiom itself:<sup>11</sup> RICF ≡ ICF ⊓ ∃ISFO.RSH.
The likely reason is the proliferation of concept equivalence axioms in Galen. For
example, A ≡ B will end up in the ∆∅<sup>*</sup>-mod for any seed signature containing
either A or B. It is, however, an mCE of ∅ w.r.t. either {A} or {B}.</p>
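        <p>The observation about concept equivalences can be checked on a toy scale. The following sketch, with hypothetical helper names and a bound of three domain elements chosen only for illustration, tests by brute force whether the equivalence of A and B holds in all small interpretations once B (outside the seed signature {A}) is replaced by bottom or by top, and whether every interpretation of A can be extended to a model. It is a bounded sanity check, not a locality or conservativity test as implemented in any tool.</p>
        <preformat>
```python
from itertools import product

def small_interpretations(max_size=3):
    """Yield (domain, extension of A) for every interpretation with at most max_size elements."""
    for n in range(1, max_size + 1):
        dom = frozenset(range(n))
        for bits in product([False, True], repeat=n):
            yield dom, frozenset(x for x, keep in zip(range(n), bits) if keep)

def satisfies(a_ext, b_ext):
    """The axiom A ≡ B holds when both names have the same extension."""
    return a_ext == b_ext

def holds_everywhere(b_ext, max_size=3):
    """Does A ≡ B hold in all small interpretations when B is interpreted as b_ext(dom)?"""
    return all(satisfies(a, b_ext(dom)) for dom, a in small_interpretations(max_size))

# Seed signature {A}: B is outside the signature and gets substituted.
empty_local = holds_everywhere(lambda dom: frozenset())  # B replaced by bottom
delta_local = holds_everywhere(lambda dom: dom)          # B replaced by top

# Model conservativity w.r.t. {A}: each interpretation of A extends to a model
# of A ≡ B by choosing the extension of B to be that of A.
extendable = all(satisfies(a, a) for _, a in small_interpretations())

print(empty_local, delta_local, extendable)
```
</preformat>
        <p>Both substitutions fail to turn the axiom into a tautology, which is why it is kept by locality-based extraction, while the extension step always succeeds, matching the remark that the axiom is not needed for model conservativity.</p>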
      </sec>
      <sec id="sec-5-2">
        <title>Conclusion and outlook</title>
        <p>Summary. We obtain three main observations from our experiments. (1) In
general, there is no or little difference between semantic and syntactic locality.
Hence, the computationally cheaper syntactic locality is a good approximation of
semantic locality. For the ontologies Galen and People, which are "renowned" for
having disproportionately large modules, syntactic and semantic LBMs do not
differ. (2) In most cases, there is no or little difference between LBMs and MEX
modules. Only for Galen are MEX modules considerably smaller than LBMs. (3)
Though in principle hard to compute, semantic LBMs can be extracted rather
fast in practice. Still, their extraction often takes considerably longer than for
syntactic LBMs. We cannot make any statement about MEX module extraction
times because we use the original MEX implementation, which combines loading
and module extraction. Due to (1), hardly any benefit can be expected from
preferring the potentially smaller semantic LBMs to the cheaper syntactic LBMs.
From (2) we can say that semantic LBMs can be seen as the best available
approximation of MEX modules for ontologies in highly expressive languages.</p>
        <p>Not only does our study evaluate how well the cheap syntactic locality
approximates semantic locality and model conservativity, it also enabled us to fix
bugs in the implementation of syntactic modularity.</p>
        <p>Future work. Two questions are interesting for future work: (1) How can we
redesign the experiments so that we can include the very large ontologies? (2) How
do LBMs compare to other types of conservativity-based modules?</p>
        <p>
          Concerning (2), one could include, for example, the technique based on
reduction to QBF for the OWL 2 QL profile [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] when an off-the-shelf implementation
becomes available.
        </p>
        <p><sup>11</sup> The acronyms denote RightIneffectiveCardiacFunction, IneffectiveCardiacFunction,
isSpecificFunctionOf, RightSideOfHeart.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Franz</given-names>
            <surname>Baader</surname>
          </string-name>
          , Diego Calvanese, Deborah McGuinness, Daniele Nardi, and
          <string-name>
            <given-names>Peter F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          , editors.
          <source>The Description Logic Handbook: Theory, Implementation, and Applications</source>
          . Cambridge University Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Bernardo</given-names>
            <surname>Cuenca Grau</surname>
          </string-name>
          , Ian Horrocks, Yevgeny Kazakov, and
          <string-name>
            <given-names>Ulrike</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>Modular reuse of ontologies: Theory and practice</article-title>
          .
          <source>J. of Artif. Intell. Research</source>
          ,
          <volume>31</volume>
          (
          <issue>1</issue>
          ):
          <fpage>273</fpage>
          –
          <lpage>318</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Bernardo</given-names>
            <surname>Cuenca Grau</surname>
          </string-name>
          , Ian Horrocks, Yevgeny Kazakov, and
          <string-name>
            <given-names>Ulrike</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>Extracting modules from ontologies: A logic-based approach</article-title>
          . In Stuckenschmidt et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], pages
          <fpage>159</fpage>
          –
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Bernardo</given-names>
            <surname>Cuenca Grau</surname>
          </string-name>
          , Bijan Parsia, Evren Sirin, and
          <string-name>
            <given-names>Aditya</given-names>
            <surname>Kalyanpur</surname>
          </string-name>
          .
          <article-title>Modularity and Web ontologies</article-title>
          .
          In
          <source>Proc. of KR-06</source>
          . AAAI Press/The MIT Press,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Chiara</given-names>
            <surname>Del Vescovo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Klinov</surname>
          </string-name>
          , Bijan Parsia, Ulrike Sattler, Thomas Schneider, and
          <string-name>
            <given-names>Dmitry</given-names>
            <surname>Tsarkov</surname>
          </string-name>
          .
          <article-title>Empirical study of logic-based modules: Cheap is cheerful</article-title>
          .
          <source>Technical report</source>
          ,
          <year>2013</year>
          . https://sites.google.com/site/cheapischeerful/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Chiara</given-names>
            <surname>Del Vescovo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bijan</given-names>
            <surname>Parsia</surname>
          </string-name>
          , Ulrike Sattler, and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Schneider</surname>
          </string-name>
          .
          <article-title>The modular structure of an ontology: an empirical study</article-title>
          . Volume
          <volume>573</volume>
          of ceur-ws.org,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Chiara</given-names>
            <surname>Del Vescovo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bijan</given-names>
            <surname>Parsia</surname>
          </string-name>
          , Ulrike Sattler, and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Schneider</surname>
          </string-name>
          .
          <article-title>The modular structure of an ontology: Atomic decomposition</article-title>
          .
          In
          <source>Proc. of IJCAI-11</source>
          , pages
          <fpage>2232</fpage>
          –
          <lpage>2237</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Chiara</given-names>
            <surname>Del Vescovo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bijan</given-names>
            <surname>Parsia</surname>
          </string-name>
          , Ulrike Sattler, and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Schneider</surname>
          </string-name>
          .
          <article-title>The modular structure of an ontology: Atomic decomposition and module count</article-title>
          . Volume
          <volume>230</volume>
          of
          <source>FAIA</source>
          , pages
          <fpage>25</fpage>
          –
          <lpage>39</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>James</given-names>
            <surname>Garson</surname>
          </string-name>
          .
          <article-title>Modularity and relevant logic</article-title>
          .
          <volume>30</volume>
          (
          <issue>2</issue>
          ):
          <fpage>207</fpage>
          –
          <lpage>223</lpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Silvio</given-names>
            <surname>Ghilardi</surname>
          </string-name>
          , Carsten Lutz, and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Did I damage my ontology? A case for conservative extensions in Description Logics</article-title>
          .
          In
          <source>Proc. of KR-06</source>
          , pages
          <fpage>187</fpage>
          –
          <lpage>197</lpage>
          . AAAI Press/The MIT Press,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Horridge</surname>
          </string-name>
          , Bijan Parsia, and
          <string-name>
            <given-names>Ulrike</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>The state of bio-medical ontologies</article-title>
          .
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Ian</given-names>
            <surname>Horrocks</surname>
          </string-name>
          , Oliver Kutz, and
          <string-name>
            <given-names>Ulrike</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>The even more irresistible SROIQ</article-title>
          .
          In
          <source>Proc. of KR-06</source>
          , pages
          <fpage>57</fpage>
          –
          <lpage>67</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Boris</given-names>
            <surname>Konev</surname>
          </string-name>
          , Carsten Lutz, Dirk Walther, and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Semantic modularity and module extraction in description logics</article-title>
          .
          In
          <source>Proc. of ECAI-08</source>
          , pages
          <fpage>55</fpage>
          –
          <lpage>59</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Boris</given-names>
            <surname>Konev</surname>
          </string-name>
          , Carsten Lutz, Dirk Walther, and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Formal properties of modularization</article-title>
          . In Stuckenschmidt et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], pages
          <fpage>25</fpage>
          –
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Roman</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          , Luca Pulina, Ulrike Sattler, Thomas Schneider, Petra Selmer, Frank Wolter, and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          .
          <article-title>Minimal module extraction from DL-Lite ontologies using QBF solvers</article-title>
          .
          In
          <source>Proc. of IJCAI-09</source>
          , pages
          <fpage>836</fpage>
          –
          <lpage>841</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Roman</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          , Frank Wolter, and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          .
          <article-title>Logic-based ontology comparison and module extraction, with an application to DL-Lite</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>174</volume>
          (
          <issue>15</issue>
          ):
          <fpage>1093</fpage>
          –
          <lpage>1141</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Carsten</given-names>
            <surname>Lutz</surname>
          </string-name>
          , Dirk Walther, and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Conservative extensions in expressive Description Logics</article-title>
          .
          In
          <source>Proc. of IJCAI-07</source>
          , pages
          <fpage>453</fpage>
          –
          <lpage>458</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Carsten</given-names>
            <surname>Lutz</surname>
          </string-name>
          and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Deciding inseparability and conservative extensions in the description logic EL</article-title>
          .
          <volume>45</volume>
          (
          <issue>2</issue>
          ):
          <fpage>194</fpage>
          –
          <lpage>228</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Ulrike</given-names>
            <surname>Sattler</surname>
          </string-name>
          , Thomas Schneider, and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          .
          <article-title>Which kind of module should I extract?</article-title>
          Volume
          <volume>477</volume>
          of ceur-ws.org,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Heiner</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          , Christine Parent, and Stefano Spaccapietra, editors. Volume
          <volume>5445</volume>
          of
          <source>LNCS</source>
          . Springer-Verlag,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>