<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Syntactic vs. Semantic Locality: How Good Is a Cheap Approximation?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chiara Del Vescovo</string-name>
          <email>delvescc@cs.man.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Klinov</string-name>
          <email>pavel.klinov@uni-ulm.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bijan Parsia</string-name>
          <email>bparsia@cs.man.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Uli Sattler</string-name>
          <email>sattler@cs.man.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Schneider</string-name>
          <email>tschneider@informatik.uni-bremen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry Tsarkov</string-name>
          <email>tsarkov@cs.man.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universität Bremen</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Manchester</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Ulm</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Extracting a subset of a given OWL ontology that captures all the ontology's knowledge about a specified set of terms is a well-understood task. This task can be based, for instance, on locality-based modules (LBMs). These come in two flavours, syntactic and semantic, and a syntactic LBM is known to contain the corresponding semantic LBM. For syntactic LBMs, polynomial extraction algorithms are known, implemented in the OWL API, and in use. In contrast, extracting semantic LBMs involves reasoning, which is intractable for OWL 2 DL, and these algorithms had not yet been implemented for expressive ontology languages. We present the first implementation of semantic LBMs and report on experiments that compare them with syntactic LBMs extracted from real-life ontologies. Our study reveals whether semantic LBMs are worth the additional extraction effort, compared with syntactic LBMs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Extracting a subset of a given OWL ontology that captures all the ontology’s
knowledge about a specified set of concept and role names is an interesting task
for various applications, and it is by now well-understood [
        <xref ref-type="bibr" rid="ref10 ref11 ref2">2,10,11</xref>
        ]. In general,
we consider a setting where, for a given signature, we want to determine a (small)
subset of a given ontology such that any axiom over the signature entailed by
the ontology is also entailed by the subset. For expressive logics, this task can
be implemented by making use of the notion of locality, and results in what is
known as locality-based modules (LBMs) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Locality comes in several
flavours; in particular, there are notions of syntactic and semantic locality. A
syntactic LBM is known to contain the corresponding semantic LBM, but might
also contain extra axioms that, not being in the semantic LBM, are
superfluous for entailments over the given signature. Algorithms for the
extraction of syntactic LBMs are known that run in time polynomial in the size
of the ontology (and are thus much cheaper than reasoning); they are implemented
in the OWL API and in use. In contrast, although algorithms for extracting
semantic LBMs are known, to the best of our knowledge they had
not been implemented until now. Moreover, these algorithms involve entailment
checking and are thus intractable for expressive profiles of OWL 2.
      </p>
      <p>We present the first implementation of semantic LBMs and report on
experiments that compare them with syntactic LBMs extracted from real-life
ontologies. The contributions of this paper are as follows: we show with statistical
significance that, for almost all members of a large corpus of existing ontologies,
there is no difference between any syntactic LBM and its corresponding semantic
LBM. In the few cases where differences occur, these differences are modest and
not worth the increased computation time needed to compute semantic LBMs.
In addition, we isolate two types of axioms that lead to differences, where one
is a simple tautology that can, in principle, be detected by a straightforward
addition to the syntactic locality checker. Furthermore, our results show that
the extraction of semantic LBMs, which is in principle hard, seems feasible in
practice. The lesson we learn from these results is that “Cheap is Great”!
</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        We assume the reader to be familiar with OWL and the underlying description
logic SROIQ [
        <xref ref-type="bibr" rid="ref1 ref8">1,8</xref>
        ], and will define the central notions around locality-based
modularity [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Let N<sub>C</sub> be a set of concept names, and N<sub>R</sub> a set of role names. A signature Σ
is a set of terms, i.e., a set Σ ⊆ N<sub>C</sub> ∪ N<sub>R</sub> of concept and role names. We can
think of a signature as specifying a topic of interest. Axioms that only use terms
from Σ can be thought of as “on-topic”, and all other axioms as “off-topic”. For
instance, if Σ = {Animal, Duck, Grass, eats}, then Duck ⊑ ∃eats.Grass is on-topic,
while Duck ⊑ Bird is off-topic.</p>
      <p>Any concept, role, or axiom that uses only terms from Σ is called a Σ-concept,
Σ-role, or Σ-axiom. Given any such object X, we call the set of terms in X the
signature of X and denote it with X̃.</p>
      <p>Given an interpretation I, we denote its restriction to the terms in a signature
Σ with I|<sub>Σ</sub>. Two interpretations I and J are said to coincide on a signature Σ,
in symbols I|<sub>Σ</sub> = J|<sub>Σ</sub>, if Δ<sup>I</sup> = Δ<sup>J</sup> and X<sup>I</sup> = X<sup>J</sup> for all X ∈ Σ.</p>
      <p>There are a number of variants of the notion of conservative extensions, which
capture the desired preservation of knowledge to different degrees. We focus on
the deductive variant.</p>
      <p>Definition 1. Let M ⊆ O be SROIQ-ontologies and Σ a signature.</p>
      <p>(1) O is a deductive Σ-conservative extension (Σ-dCE) of M if, for all
SROIQ-axioms α with α̃ ⊆ Σ, it holds that M ⊨ α if and only if O ⊨ α.</p>
      <p>(2) M is a dCE-based module for Σ of O if O is a Σ-dCE of M.</p>
      <p>
        Unfortunately, deciding in general whether a set of axioms is a module in this sense
is hard or even impossible for expressive DLs [
        <xref ref-type="bibr" rid="ref12 ref6">6,12</xref>
        ], and finding a minimal one
is harder still. However, “good-sized” modules that are efficiently computable
have been introduced [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. They are based on the locality of single axioms, which
means that, given Σ, the axiom can always be satisfied independently of the
interpretation of the Σ-terms, but in a restricted way: by interpreting all
non-Σ-terms either as the empty set (∅-locality) or as the full domain<sup>4</sup> (Δ-locality).
      </p>
      <p>Definition 2. A SROIQ-axiom α is called ∅-local (Δ-local) w.r.t. signature Σ
if, for each interpretation I, there exists an interpretation J such that I|<sub>Σ</sub> = J|<sub>Σ</sub>,
J ⊨ α, and for each X ∈ α̃ ∖ Σ, X<sup>J</sup> = ∅ (for each C ∈ α̃ ∖ Σ, C<sup>J</sup> = Δ
and for each R ∈ α̃ ∖ Σ, R<sup>J</sup> = Δ × Δ).</p>
      <p>
        It has been shown in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] that M ⊆ O and all axioms in O ∖ M being ∅-local
(or all axioms being Δ-local) w.r.t. Σ ∪ M̃ is sufficient for O to be a Σ-dCE of
M. The converse does not hold: e.g., the axiom A ≡ B is neither ∅- nor Δ-local
w.r.t. {A}, but the ontology {A ≡ B} is an {A}-dCE of the empty ontology.
      </p>
      <p>
        Furthermore, locality can be tested using available DL-reasoners [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which
makes this problem considerably easier than testing conservativity. However,
reasoning in expressive DLs is still complex, e.g. N2ExpTime-complete for
SROIQ [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In order to achieve tractable module extraction, a syntactic
approximation of locality has been introduced in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The following definition
captures only the case of SHQ-TBoxes and can straightforwardly be extended to
SROIQ ontologies.
      </p>
      <p>Definition 3. An axiom α is called syntactically ⊥-local (⊤-local) w.r.t.
signature Σ if it is of the form C<sup>⊥</sup> ⊑ C, C ⊑ C<sup>⊤</sup>, C<sup>⊥</sup> ≡ C<sup>⊥</sup>, C<sup>⊤</sup> ≡ C<sup>⊤</sup>, R<sup>⊥</sup> ⊑ R
(R ⊑ R<sup>⊤</sup>), or Trans(R<sup>⊥</sup>) (Trans(R<sup>⊤</sup>)), where C is an arbitrary concept, R is an
arbitrary role name, R<sup>⊥</sup> ∉ Σ (R<sup>⊤</sup> ∉ Σ), and C<sup>⊥</sup> and C<sup>⊤</sup> are from Bot(Σ) and
Top(Σ) as defined in Part (a) (resp. (b)) of the table below.</p>
      <p>(a) ⊥-Locality. Let A<sup>⊥</sup>, R<sup>⊥</sup> ∉ Σ; C<sup>⊥</sup> ∈ Bot(Σ), C<sup>⊤</sup><sub>(i)</sub> ∈ Top(Σ); n ∈ ℕ ∖ {0}.</p>
      <p>Bot(Σ) ::= A<sup>⊥</sup> | ⊥ | ¬C<sup>⊤</sup> | C ⊓ C<sup>⊥</sup> | C<sup>⊥</sup> ⊓ C | ∃R.C<sup>⊥</sup> | ≥n R.C<sup>⊥</sup> | ∃R<sup>⊥</sup>.C | ≥n R<sup>⊥</sup>.C</p>
      <p>Top(Σ) ::= ⊤ | ¬C<sup>⊥</sup> | C<sub>1</sub><sup>⊤</sup> ⊓ C<sub>2</sub><sup>⊤</sup> | ≥0 R.C</p>
      <p>(b) ⊤-Locality. Let A<sup>⊤</sup>, R<sup>⊤</sup> ∉ Σ; C<sup>⊥</sup> ∈ Bot(Σ), C<sup>⊤</sup><sub>(i)</sub> ∈ Top(Σ); n ∈ ℕ ∖ {0}.</p>
      <p>Bot(Σ) ::= ⊥ | ¬C<sup>⊤</sup> | C ⊓ C<sup>⊥</sup> | C<sup>⊥</sup> ⊓ C | ∃R.C<sup>⊥</sup> | ≥n R.C<sup>⊥</sup></p>
      <p>Top(Σ) ::= A<sup>⊤</sup> | ⊤ | ¬C<sup>⊥</sup> | C<sub>1</sub><sup>⊤</sup> ⊓ C<sub>2</sub><sup>⊤</sup> | ∃R<sup>⊤</sup>.C<sup>⊤</sup> | ≥n R<sup>⊤</sup>.C<sup>⊤</sup> | ≥0 R.C</p>
      <p>
        It has been shown in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] that ⊥-locality (⊤-locality) of an axiom α w.r.t. Σ
implies ∅-locality (Δ-locality) of α w.r.t. Σ. Therefore, all axioms in O ∖ M
being ⊥-local (or all axioms being ⊤-local) w.r.t. Σ ∪ M̃ is sufficient for O to
be a Σ-dCE of M. The converse does not hold; examples can be found in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
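      <p>The grammar of part (a) can be read as a pair of mutually recursive tests. The following is a minimal sketch of such a check, not the OWL API implementation: the tuple encoding of concepts and axioms and the function names are assumptions made only for this illustration, and only the constructors listed in Definition 3 are covered.</p>
      <preformat>
```python
# Encoding assumed for this sketch only (not taken from the OWL API):
#   concepts: ("name", "A"), ("bot",), ("top",), ("not", C), ("and", C, D),
#             ("some", "r", C) for an existential restriction,
#             ("min", n, "r", C) for an at-least-n restriction
#   axioms:   ("sub", C, D), ("equiv", C, D), ("subrole", "r", "s"), ("trans", "r")

def in_bot(concept, sig):
    """True if the concept matches the Bot grammar of part (a) w.r.t. sig."""
    kind = concept[0]
    if kind == "bot":
        return True
    if kind == "name":
        return concept[1] not in sig
    if kind == "not":
        return in_top(concept[1], sig)
    if kind == "and":
        return in_bot(concept[1], sig) or in_bot(concept[2], sig)
    if kind == "some":
        return concept[1] not in sig or in_bot(concept[2], sig)
    if kind == "min":
        n, role, filler = concept[1], concept[2], concept[3]
        return n != 0 and (role not in sig or in_bot(filler, sig))
    return False


def in_top(concept, sig):
    """True if the concept matches the Top grammar of part (a) w.r.t. sig."""
    kind = concept[0]
    if kind == "top":
        return True
    if kind == "not":
        return in_bot(concept[1], sig)
    if kind == "and":
        return in_top(concept[1], sig) and in_top(concept[2], sig)
    if kind == "min":
        return concept[1] == 0
    return False


def is_bot_local(axiom, sig):
    """Syntactic bottom-locality of an axiom, following Definition 3."""
    kind = axiom[0]
    if kind == "sub":
        return in_bot(axiom[1], sig) or in_top(axiom[2], sig)
    if kind == "equiv":
        left, right = axiom[1], axiom[2]
        both_bot = in_bot(left, sig) and in_bot(right, sig)
        both_top = in_top(left, sig) and in_top(right, sig)
        return both_bot or both_top
    if kind in ("subrole", "trans"):
        return axiom[1] not in sig
    return False  # unknown axiom shapes are conservatively treated as non-local


duck, bird, grass = ("name", "Duck"), ("name", "Bird"), ("name", "Grass")
sigma = {"Animal", "Duck", "Grass", "eats"}
on_topic = ("sub", duck, ("some", "eats", grass))
a_equiv_b = ("equiv", ("name", "A"), ("name", "B"))
print(is_bot_local(on_topic, sigma), is_bot_local(a_equiv_b, {"A"}))
```
      </preformat>
      <p>With the running signature {Animal, Duck, Grass, eats}, the on-topic axiom Duck ⊑ ∃eats.Grass is reported as non-local, and A ≡ B is non-local w.r.t. {A}, in line with the counterexample given earlier for semantic locality.</p>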
      <p>
        For each of the four locality notions, modules of O are obtained by starting
with an empty set of axioms and successively adding axioms from O that are
non-local. For this procedure to be correct, the signature against which
locality is checked has to be extended with the terms in the axioms that are
added in each step, so that the resulting module M consists of all the axioms
that are non-local with respect to Σ ∪ M̃. Definition 4 (1) introduces locality-based
modules, which are always dCE-based modules [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], although not necessarily minimal
ones. Modules based on syntactic (semantic) locality can be made smaller by
iteratively nesting ⊤- and ⊥-extraction (Δ- and ∅-extraction), and the result
is still a dCE-based module [
        <xref ref-type="bibr" rid="ref13 ref2">2,13</xref>
        ]. These so-called ⊤⊥*-modules (Δ∅*-modules)
are introduced in Definition 4 (3).
      </p>
      <p>Definition 4. Let x ∈ {∅, Δ, ⊥, ⊤}, yz ∈ {⊤⊥, Δ∅}, O an ontology, and Σ a signature.</p>
      <p>(1) An ontology M is the x-module of O w.r.t. Σ if it is the output of
Algorithm 1. We write M = x-mod(Σ, O).</p>
      <p>(2) An ontology M is the yz-module of O w.r.t. Σ, written M = yz-mod(Σ, O),
if M = y-mod(Σ, z-mod(Σ, O)).</p>
      <p>(3) Let (M<sub>i</sub>)<sub>i≥0</sub> be a sequence of ontologies such that M<sub>0</sub> = O and M<sub>i+1</sub> =
yz-mod(Σ, M<sub>i</sub>) for every i ≥ 0. For the smallest n ≥ 0 with M<sub>n</sub> = M<sub>n+1</sub>,
we call M<sub>n</sub> the yz*-module of O w.r.t. Σ, written M<sub>n</sub> = yz*-mod(Σ, O).</p>
      <p>Algorithm 1. Extract a locality-based module</p>
      <preformat>Input: ontology O, signature Σ, x ∈ {∅, Δ, ⊥, ⊤}
Output: x-module M of O w.r.t. Σ

M ← ∅; O′ ← O
repeat
    changed ← false
    for all α ∈ O′ do
        if α is not x-local w.r.t. Σ ∪ M̃ then
            M ← M ∪ {α}; O′ ← O′ ∖ {α}; changed ← true
until changed = false
return M</preformat>
      <p>
        As for (1), it has been shown in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] that the output M of Algorithm 1 does
not depend on the order in which the axioms α are selected.<sup>5</sup> Furthermore,
the integer n in (3) exists because the sequence (M<sub>i</sub>)<sub>i≥0</sub> is decreasing (more
precisely, we have M<sub>0</sub> ⊇ ⋯ ⊇ M<sub>n</sub> = M<sub>n+1</sub> = ⋯). Due to monotonicity
properties of locality-based modules, the dual notions of ⊥⊤*- and ∅Δ*-modules
are uninteresting because they coincide with those of ⊤⊥*- and Δ∅*-modules.
      </p>
      <p>Roughly speaking, a Δ- or ⊤-module for Σ gives a view from above because
it contains all subconcepts of concept names in Σ, while a ∅- or ⊥-module for Σ gives
a view from below since it contains all superconcepts of concept names in Σ.</p>
      <p>
        Modulo the locality check, Algorithm 1 runs in time cubic in |O| + |Σ| [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Modules based on ⊥/⊤-locality are therefore a feasible approximation for
modules based on ∅/Δ-locality. In both cases, modules are extracted axiom by axiom
but, as said above, the ∅/Δ-locality check is more complex. A module extractor
is implemented in the OWL API<sup>6</sup> and SSWAP<sup>7</sup>. To summarize:
      </p>
      <p>1. Given an ontology O, the semantic module M<sub>sem</sub> for a signature Σ is
contained in the corresponding syntactic module M<sub>syn</sub> for the same seed
signature.<sup>8</sup> This means that, in principle, more axioms that are unnecessary for preserving
entailments over Σ can end up in syntactic modules than in semantic
modules.</p>
      <p>2. The extraction of a syntactic module can be done in polynomial time w.r.t.
the size of the ontology O. In contrast, the extraction of a semantic module
is as hard as reasoning.</p>
      <p><sup>4</sup> Or, in the case of roles, the set of all pairs of domain elements.</p>
      <p><sup>5</sup> Our algorithm is a special case of the one in [2, Figure 4].</p>
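      <p>Algorithm 1 and the nesting of Definition 4 (3) can be sketched in a few lines. The sketch below is illustrative only and rests on a toy assumption: every axiom is a pair (A, B) of concept names standing for the atomic subsumption A ⊑ B, for which Definition 3 reduces ⊥-locality to "A is not in the signature" and ⊤-locality to "B is not in the signature". The function names are hypothetical.</p>
      <preformat>
```python
def bot_local(axiom, sig):
    """Atomic A subsumed-by B is bottom-local iff A is outside the signature."""
    return axiom[0] not in sig


def top_local(axiom, sig):
    """Atomic A subsumed-by B is top-local iff B is outside the signature."""
    return axiom[1] not in sig


def extract_module(ontology, seed, is_local):
    """Algorithm 1: collect non-local axioms, extending the signature as we go."""
    module, remaining, sig = [], list(ontology), set(seed)
    changed = True
    while changed:
        changed = False
        for axiom in list(remaining):
            if not is_local(axiom, sig):
                module.append(axiom)
                remaining.remove(axiom)
                sig.update(axiom)  # add the terms of the axiom to the signature
                changed = True
    return module


def nested_module(ontology, seed):
    """Definition 4 (3): alternate bottom- and top-extraction until a fixpoint."""
    current = list(ontology)
    while True:
        inner = extract_module(current, seed, bot_local)
        step = extract_module(inner, seed, top_local)
        if set(step) == set(current):
            return step
        current = step


ontology = [("Duck", "Bird"), ("Bird", "Animal"),
            ("Mallard", "Duck"), ("Rock", "Mineral")]
print(extract_module(ontology, {"Duck"}, bot_local))
print(extract_module(ontology, {"Duck"}, top_local))
print(nested_module(ontology, {"Duck", "Animal"}))
```
      </preformat>
      <p>In this toy ontology, the ⊥-module for {Duck} collects the axioms about superconcepts of Duck and the ⊤-module collects the axiom about its subconcept Mallard, which matches the "view from below" and "view from above" reading. A semantic variant would keep the same loop and replace the locality predicate by a reasoner call.</p>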
    </sec>
    <sec id="sec-3">
      <title>Experimental design</title>
      <p>The main aim of this paper is to investigate how well syntactic locality
approximates semantic locality. In particular, we want to see how (un)likely it is that
syntactic locality-based modules are larger than semantic locality-based ones
and how large these differences are. We also want to understand empirically how
much more costly semantic locality is in terms of performance.</p>
      <p>
        Selection of the Corpus. For our experiments, we have built a corpus containing:
(1) from the TONES repository,<sup>9</sup> those ontologies that have already been studied
in previous work on modularity [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]: Koala, Mereology, University, People,
miniTambis, OWL-S, Tambis, Galen; (2) all ontologies from the NCBO BioPortal
ontology repository.<sup>10</sup>
      </p>
      <p>We then filter out all those ontologies for which at least one of the
following problems occurs: the ontology is impossible to download; the .owl file
is corrupted when downloaded; the file is not parseable; the ontology is
inconsistent. Furthermore, due to time constraints, we exclude from this preliminary
investigation all ontologies whose size exceeds 10,000 axioms.</p>
      <p>
        This selection results in a corpus of 156 ontologies, which differ greatly in
size and expressivity [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], as summarized in Table 1. For a full list of the corpus,
please refer to the technical report: http://arxiv.org/abs/1207.1641
      </p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Repository</th><th>Range of expressivity</th><th>Range #axs.</th><th>Range sig. size</th></tr>
          </thead>
          <tbody>
            <tr><td>TONES</td><td>ALCN – SHIN(D) / SOIN(D)</td><td>38–4,735</td><td>21–3,161</td></tr>
            <tr><td>BioPortal</td><td>AL – SROIF(D) / SHOIQ(D)</td><td>13–9,629</td><td>14–9,221</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p><sup>6</sup> http://owlapi.sourceforge.net</p>
      <p><sup>7</sup> http://sswap.info</p>
      <p><sup>8</sup> Recall that ⊥-syntactic modules approximate ∅-semantic modules, while ⊤-syntactic
modules approximate Δ-semantic modules.</p>
      <p><sup>9</sup> http://owl.cs.manchester.ac.uk/repository/</p>
      <p><sup>10</sup> http://bioportal.bioontology.org</p>
      <p>Comparing Syntactic and Semantic Locality. In order to compare syntactic and
semantic locality, we want to understand:</p>
      <p>1. whether, for a given seed signature Σ, the semantic Σ-module is likely to be
smaller than the syntactic Σ-module, and if so by how much,<sup>11</sup></p>
      <p>2. how feasible the extraction of semantic modules is.</p>
      <p>
        Here, we focus on the two corresponding notions of ∅-semantic locality and
⊥-syntactic locality. In particular, ⊥-syntactic locality has been thoroughly
investigated in previous work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and it has proven to have many interesting
properties. A completion of the investigation described in this paper for all
fundamental notions of modules is planned in our future work.
      </p>
      <p>Due to the recursive nature of the locality-based module extraction algorithm,
we want to investigate locality both on a</p>
      <p>– per-axiom basis: given an axiom α and a signature Σ, is it likely that α is
semantically ∅-local w.r.t. Σ but not syntactically ⊥-local w.r.t. Σ?</p>
      <p>– per-module basis: given a signature Σ, is it likely that ⊥-mod(Σ, O) ≠
∅-mod(Σ, O)? If yes, is it likely that the difference is large?</p>
      <p>Hence we need to pick, for each ontology in our corpus, a suitable set of
signatures, and this poses a significant problem. First, we do not yet have enough
insight into what typical seed signatures are for module extraction. One could
assume that large ones are rarely relevant for module extraction (why bother
with extracting a large module?), but this still leaves a large, i.e., exponential,
space of possible seed signatures. If m = #Õ, there are 2<sup>m</sup> possible seed
signatures for which axioms can be tested for locality and for which modules can be
extracted. Hence a full investigation is infeasible.</p>
      <p>
        One could assume that the comparison between semantic and syntactic
modules could be easier since many signatures can lead to the same module. In other
words, the statistically significant number of modules w.r.t. the total number
of modules is not larger than that of seed signatures needed w.r.t. the total
number of seed signatures. In previous work [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ], however, modules have been
studied with respect to how numerous they are in real-world ontologies. The
experiments carried out suggest that the number of modules in ontologies is, in
general, exponential w.r.t. the size of the ontology. Moreover, the extraction of
enough different modules can be hard, because by looking just at seed signatures
there is no chance to avoid the extraction of the same module many times. In
particular, for a module M there can be exponentially many seed signatures
w.r.t. #M̃ that generate M [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        As a consequence, we compare the two kinds of locality of axioms, both
on a per-axiom basis and a per-module basis, w.r.t. random signatures. To
avoid any bias, we select a random signature as follows: each named
entity E in the ontology is included in the signature with probability p = 1/2.
Thus each seed signature has the same probability of being chosen. For
ontologies whose signature exceeds 9 entities, in order to get results where the
true proportion of differences between the two notions of locality lies in the
confidence interval (±5%) with confidence level 95%, we have to select only 400
random signatures [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. That is, we need to test only 400 random signatures to
have a confidence of 95% (±5%) that the differences/equalities we observe reflect
the real ones.
      </p>
      <p><sup>11</sup> Recall that the semantic Σ-module is always a subset of the syntactic Σ-module.</p>
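      <p>The sampling step and the order of magnitude of the sample size can be illustrated as follows. This is a sketch under stated assumptions: the text only cites [14] for the figure of 400, so the textbook formula n = z²·p(1−p)/e² for estimating a proportion is an assumption here, as are the function names. With the worst case p = 1/2 and margin e = 0.05, it yields 385 for z = 1.96 and exactly 400 for the common rounding z = 2.</p>
      <preformat>
```python
import math
import random
from fractions import Fraction


def sample_size(z, margin, p=Fraction(1, 2)):
    """Sample size for estimating a proportion p within +/- margin (assumed formula)."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def random_signature(entities, rng, p=0.5):
    """Include each named entity independently with probability p."""
    return {entity for entity in entities if p > rng.random()}


margin = Fraction(5, 100)
print(sample_size(Fraction(196, 100), margin))  # z = 1.96
print(sample_size(Fraction(2), margin))         # z rounded to 2

rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
entities = ["Animal", "Duck", "Grass", "eats", "Bird"]
signatures = [random_signature(entities, rng) for _ in range(400)]
print(len(signatures))
```
      </preformat>
      <p>Exact rational arithmetic is used for the sample size so that the rounding up is not affected by floating-point error.</p>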
      <p>
        Non-random seed signatures. A module, in general, does not necessarily show any
internal coherence: intuitively, if we had an ontology describing some knowledge
from both the domains of Geology and of Philosophy, we could still extract the
module for the signature Σ = {Epistemology, Mineral}. This module is likely
to be the union of the two disjoint modules for Σ<sub>1</sub> = {Epistemology} and
Σ<sub>2</sub> = {Mineral}. This combinatorial behaviour can lead to exponentially many
modules in the size of the signature of the ontology and indeed, as mentioned
above, the number of modules in ontologies seems to be exponential [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ].
      </p>
      <p>In contrast to general modules, genuine modules can be called coherent: they
are defined as those modules that cannot be decomposed into the union of two
different modules. Notably, there are only linearly many genuine modules in the
size of the ontology O, and the set of genuine modules is a base for all general
modules: any module is either genuine or the union of genuine modules. The
linear bound on the number of genuine modules is due to the fact that, for each
genuine x-module M, there is an axiom α such that M = x-mod(α̃, O).</p>
      <p>Thus genuine modules can be said to be interesting modules that we can
fully investigate. Hence, in addition to the above-mentioned investigation of
⊥- and ∅-modules for random signatures, we also look at all axiom signatures.</p>
      <p>In summary, we test:</p>
      <p>(T1) for random seed signatures Σ,</p>
      <p>(a) for each axiom α in our corpus, is α semantically ∅-local w.r.t. Σ but
not syntactically ⊥-local w.r.t. Σ?</p>
      <p>(b) is ⊥-mod(Σ, O) ≠ ∅-mod(Σ, O)? If yes, we determine the difference and
its size.</p>
      <p>(T2) for each axiom signature α̃ from our corpus, is ⊥-mod(α̃, O) ≠ ∅-mod(α̃, O)?
If yes, we determine the difference and its size.</p>
    </sec>
    <sec id="sec-4">
      <title>Experimental comparison</title>
      <p>No differences. The main result of the experiment is that, for 151 of the 156
ontologies we tested, no difference between ⊥- and ∅-locality can be observed.
These 151 ontologies exclude the two NCBO BioPortal ontologies EFO
(Experimental Factor Ontology) and SWO (Software Ontology), as well as Koala,
miniTambis, and Tambis. More specifically, for every generated seed signature,
the corresponding ⊥- and ∅-modules agree, and every axiom is either both ⊥- and
∅-local, or neither. This statement applies to all randomly generated seed
signatures as well as to all axiom signatures, which are seed signatures for all
genuine modules. We can therefore draw the following conclusions for the 151
ontologies with respect to (T1) and (T2) above.</p>
      <p>(T1) Given an arbitrary seed signature Σ, there is no difference (a) between
⊥- and ∅-locality of any given axiom w.r.t. Σ and (b) between the ⊥- and
∅-modules for Σ, both times at a significance level of 0.05.</p>
      <p>(T2) Given any axiom signature Σ, there is no difference between the ⊥- and
∅-modules for Σ.</p>
      <p>
        In the case of the 151 ontologies, the extraction of a ∅-module (with tautology
tests performed by FaCT++) often took considerably longer than the extraction
of the corresponding ⊥-module. For example, for MoleculeRole, the largest of
the 151 ontologies, times to extract a ⊥-module (to test all axioms for ⊥-locality,
respectively) ranged between 27 and 169 ms (21 and 77 ms, respectively), while
the extraction of a ∅-module (the test of all axioms for ∅-locality, resp.) took up
to 6 times as long, on average 2.7 times (2.0 times, resp.). It is also worth noting that the
ontologies Galen and People, which are renowned for having particularly large
⊥-modules [
        <xref ref-type="bibr" rid="ref2 ref5">2,5</xref>
        ], are among those without differences between ⊥- and ∅-locality.
      </p>
      <p>Differences. For the five ontologies where differences between ⊥- and ∅-modules
(or ⊥- and ∅-locality) occur, we isolated two types of culprits: axioms which are not
⊥-local w.r.t. some signature Σ, but which are ∅-local w.r.t. Σ. Type-1 culprits
are simple tautologies that have accidentally entered the “inferred view” (i.e.,
the closure under certain entailments) of two ontologies. They do not occur in the
original “asserted” versions and can, in principle, be detected by a slightly refined
syntactic locality check. Type-2 culprits are definitions of concept names via a
conjunction that satisfies certain conditions explained below. There are not many
type-1 and type-2 axioms in the affected ontologies, and the observed differences
are comparably small. Table 2 gives an overview of the differences observed.</p>
      <p>Type-1 culprits are axioms InverseObjectProperties(P, InverseOf(P)),
where P is a role. This translates into the tautology P ≡ (P<sup>−</sup>)<sup>−</sup> in DL
notation. Such an axiom is therefore ∅-local w.r.t. any signature. However, it behaves
differently for ⊥-locality: if the signature Σ contains P, then both sides of the
equation are neither in Bot(Σ) nor in Top(Σ), hence the axiom is considered
non-local; otherwise, both sides are ⊥-equivalent, hence the axiom is local.</p>
      <p>Type-1 axioms occur in the “inferred view” of the ontologies EFO and SWO.
Table 2 shows the relatively modest differences caused by these axioms. In all
cases, there are no other axioms in the differences. This means that no differences
occur for the non-inferred original versions of EFO and SWO.</p>
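      <p>The "slightly refined syntactic locality check" for type-1 culprits could, for instance, cancel double inverses before comparing both sides of the axiom. The following is a minimal sketch of that idea and not the refinement used by any existing checker; the encoding of role expressions and the function names are assumptions of this illustration.</p>
      <preformat>
```python
# Encoding assumed for this sketch: a role name is a string, and
# ("inv", r) denotes the inverse of the role expression r.

def normalise(role):
    """Cancel pairs of nested inverses, leaving at most one."""
    depth = 0
    while isinstance(role, tuple) and role[0] == "inv":
        depth += 1
        role = role[1]
    return role if depth % 2 == 0 else ("inv", role)


def is_trivial_inverse_axiom(first, second):
    """InverseObjectProperties(first, second) states: first equals inverse of second.

    The axiom is a tautology if both sides coincide after normalisation.
    """
    return normalise(first) == normalise(("inv", second))


# InverseObjectProperties(P, InverseOf(P)) is a type-1 culprit.
print(is_trivial_inverse_axiom("P", ("inv", "P")))
# InverseObjectProperties(hasChild, hasParent) carries real information.
print(is_trivial_inverse_axiom("hasChild", "hasParent"))
```
      </preformat>
      <p>An axiom recognised in this way could be treated as local w.r.t. every signature, which agrees with its ∅-locality.</p>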
      <p>[Table 2. Overview of the differences observed. Rows: SWO, EFO, Koala, miniTambis, Tambis. Recoverable column headings: ontology, #axs, average time ratio, culprit type and frequency.]</p>
      <p>Type-2 culprits are complex definitions A ≡ C of a concept name A where
C is a conjunction that contains both a universal and an existential (or
minimum cardinality) restriction on the same role. This affects the ontologies Koala,
miniTambis, and Tambis. The effect is best illustrated for Koala, which contains
exactly one such axiom, namely M ≡ S ⊓ ∀c.F ⊓ ∀g.{m} ⊓ =3 c.⊤, where we
have abbreviated the concept names MaleStudentWith3Daughters, Student,
Female, the roles hasChildren, hasGender, and the nominal male. Now if the
signature Σ against which the axiom is tested for locality contains {S, c, g} but
neither M nor F, then this axiom is not ⊥-local because none of the conjuncts on
the right-hand side is in Bot(Σ). On the other hand, this axiom is a tautology
when M and F are replaced by ⊥: the conjunction ∀c.⊥ ⊓ =3 c.⊤ cannot have any
instances, regardless of how c is interpreted.</p>
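      <p>The argument for the Koala axiom can be spelled out as a short derivation. It is a sketch that uses the abbreviations from the text and assumes an arbitrary interpretation I in which M and F, being outside the signature, are interpreted as the empty set; it requires the amsmath package.</p>
      <preformat>
```latex
\begin{gather*}
M \equiv S \sqcap \forall c.F \sqcap \forall g.\{m\} \sqcap (= 3\, c.\top)
  \qquad\text{with } M, F \notin \Sigma \\
(\forall c.\bot)^{\mathcal{I}} = \{\, d \mid d \text{ has no } c\text{-successor} \,\}, \qquad
(= 3\, c.\top)^{\mathcal{I}} = \{\, d \mid d \text{ has exactly three } c\text{-successors} \,\} \\
\bigl(\forall c.\bot \sqcap (= 3\, c.\top)\bigr)^{\mathcal{I}} = \emptyset
  \quad\Longrightarrow\quad
  \bigl(S \sqcap \forall c.\bot \sqcap \forall g.\{m\} \sqcap (= 3\, c.\top)\bigr)^{\mathcal{I}}
  = \emptyset = \bot^{\mathcal{I}}
\end{gather*}
```
      </preformat>
      <p>Both sides of the definition are therefore interpreted as the empty set, so the axiom is satisfied and hence ∅-local, although no single conjunct matches the Bot(Σ) grammar.</p>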
      <p>For Koala, this effect only causes two singleton differences between sets of
local axioms for the randomly generated seed signatures, as shown in Table 2.
For axiom signatures, there is no difference. Interestingly, this effect does not
propagate to modules: for all signatures, ⊥- and ∅-modules are the same. The
reason might be that (a) g is used in many axioms and is thus very likely to
contribute to the extended signature during module extraction, and (b) then the
axiom defining F is no longer local, which “pulls” F into the extended signature,
preventing the observed effect.</p>
      <p>In miniTambis and Tambis, this effect is much stronger and affects a large
proportion of modules, as shown in Table 2. The differences in these cases do
not only consist of culprit axioms, but also of axioms that become non-local
after the signature has been extended by the terms in the culprit axioms. Still,
the size of the differences is mostly modest while, for Tambis, the ∅-locality test
(∅-module extraction) takes on average over three times (five times) as long as
the ⊥-locality test (⊥-module extraction).</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Outlook</title>
      <p>Summary. We obtain two main observations from the experiments carried out.
– In practice, there is no or little difference between semantic and syntactic
locality. That is, the computationally cheaper syntactic locality is a good
approximation of semantic locality.
– Though in principle hard to compute, semantic modules can be extracted
rather fast in practice.</p>
      <p>
        These results suggest that it is questionable to conclude that semantic locality
should be preferred to syntactic locality. In terms of computation time, there is
often a benefit in using syntactic locality: the average speed-up compared to the
extraction of a semantic-locality based module is by a factor of up to 6. For
some particular module pairs, it is higher by an order of magnitude. The gain
in module size is zero or so small that it is hard to justify the extra time spent.
In particular, there is no gain in size for the ontologies Galen and People, which
are “renowned” for having disproportionately large modules [
        <xref ref-type="bibr" rid="ref2 ref5">2,5</xref>
        ].
      </p>
      <p>Our results are interesting not only because they provide an evaluation of
how well the cheap syntactic locality approximates semantic locality, but also
because they enabled us to fix bugs in the implementation of syntactic
modularity. For example, earlier data from the experiment have shown that reflexivity
axioms had been treated incorrectly by the syntactic locality checker.
Future Work. It is evident that this work is preliminary. It investigates only
the differences between the related notions of ⊥- and ∅-locality. We plan to
extend the same study to other notions of locality, in particular, nested modules
(⊤⊥*- vs. Δ∅*-modules), since these notions are the most economical in terms of
module size. Moreover, we want to extend the investigation to the remaining
larger ontologies in the BioPortal repository and further large ontologies, e.g.,
some versions of the NCI Thesaurus<sup>12</sup>. Preliminary results with a version that
is not among the regular releases show differences due to type-2 culprits, but we
have not included them here because the differences disappear after removing
axioms that were introduced due to a problem with object and annotation
properties when the ontology file is parsed by the OWL API. This behaviour is yet to
be investigated and explained.</p>
      <p>Another interesting extension is to modify the seed signature sampling.
Currently, the random variable “size of the seed signature generated” follows the
binomial distribution with expected value m/2 and variance m/4. Hence, most
signatures in the sample have size around m/2; small and large signatures are
underrepresented. For example, for one ontology with 915 terms, all signature sizes
lay between 422 and 509. One might argue that, for big ontologies, the typical
module extraction scenario does not require large seed signatures, but it does
sometimes require relatively small seed signatures, for example, when a module
is extracted to efficiently answer a given entailment query of typically small size.
On the other hand, large modules resulting from larger seed signatures may be
more likely to differ. We therefore plan an alternative seed signature sampling
via bins for average signature sizes: repeat the current sampling procedure scaled
to several subintervals of the range of possible signature sizes.</p>
      <p><sup>12</sup> Downloadable from http://evs.nci.nih.gov/ftp1/NCI_Thesaurus</p>
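      <p>The concentration of signature sizes around m/2 follows directly from the binomial distribution and can be checked numerically. The sketch below computes the mean and standard deviation for the 915-term example and shows one possible reading of "scaled" sampling, in which the inclusion probability is chosen so that the expected size falls in the middle of a bin; this reading and all function names are assumptions, not the planned procedure itself.</p>
      <preformat>
```python
import math
import random


def size_stats(m, p=0.5):
    """Mean and standard deviation of the signature size, Binomial(m, p)."""
    return m * p, math.sqrt(m * p * (1 - p))


def bin_probability(m, low, high):
    """Inclusion probability whose expected signature size is the bin midpoint."""
    return (low + high) / (2 * m)


def random_signature(entities, rng, p):
    """Include each entity independently with probability p."""
    return {entity for entity in entities if p > rng.random()}


mean, sd = size_stats(915)
print(mean, round(sd, 2))
# Distance of the observed extremes from the mean, in standard deviations.
print(round((422 - mean) / sd, 2), round((509 - mean) / sd, 2))

# Hypothetical bin of small signatures: sizes between 10 and 50 out of 915 terms.
p_small = bin_probability(915, 10, 50)
rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
entities = ["t" + str(i) for i in range(915)]
print(len(random_signature(entities, rng, p_small)))
```
      </preformat>
      <p>For m = 915 the standard deviation is about 15, so the observed range of 422 to 509 stays within roughly three and a half standard deviations of the mean 457.5, which is why very small and very large signatures practically never occur under the current sampling.</p>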
      <p>Our current results answer the question of whether there is a significant
difference between the two locality notions with respect to a given signature. It is also
interesting to ask the same question relative to a given module. To answer it, the
sampling of modules instead of seed signatures requires further investigation.</p>
      <p>Acknowledgment. We thank Rafael Gonçalves and the anonymous reviewers for
helpful comments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel-Schneider</surname>
            ,
            <given-names>P.F</given-names>
          </string-name>
          . (eds.):
          <article-title>The Description Logic Handbook: Theory, Implementation, and Applications</article-title>
          . Cambridge University Press (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazakov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Modular reuse of ontologies: Theory and practice</article-title>
          .
          <source>J. of Artif. Intell. Research</source>
          <volume>31</volume>
          ,
          <fpage>273</fpage>
          -
          <lpage>318</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Del Vescovo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gessler</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klinov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winget</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Decomposition and Modular Structure of BioPortal Ontologies</article-title>
          .
          <source>In: Proc. ISWC-11</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Del Vescovo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The modular structure of an ontology: an empirical study</article-title>
          .
          <source>In: Proc. of WoMO-10. Frontiers in AI and Appl.</source>
          , vol.
          <volume>211</volume>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>24</lpage>
          . IOS Press (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Del Vescovo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The modular structure of an ontology: atomic decomposition and module count</article-title>
          .
          <source>In: Proc. of WoMO-11. Frontiers in AI and Appl.</source>
          , vol.
          <volume>230</volume>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>39</lpage>
          . IOS Press (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ghilardi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Did I damage my ontology? A case for conservative extensions in description logics</article-title>
          .
          <source>In: Proc. of KR-06</source>
          . pp.
          <fpage>187</fpage>
          -
          <lpage>197</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Horridge</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>The state of bio-medical ontologies</article-title>
          .
          <source>In: Proc. of 2011 ISMB Bio-Ontologies SIG</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kutz</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>The even more irresistible SROIQ</article-title>
          .
          <source>In: Proc. of KR-06</source>
          . pp.
          <fpage>57</fpage>
          -
          <lpage>67</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kazakov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>RIQ and SROIQ are harder than SHOIQ</article-title>
          .
          <source>In: Proc. of KR-08</source>
          . pp.
          <fpage>274</fpage>
          -
          <lpage>284</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Konev</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walther</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Semantic modularity and module extraction in description logics</article-title>
          .
          <source>In: Proc. of ECAI-08. Frontiers in AI and Appl.</source>
          , vol.
          <volume>178</volume>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>59</lpage>
          . IOS Press (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kontchakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Logic-based ontology comparison and module extraction, with an application to DL-Lite</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>174</volume>
          (
          <issue>15</issue>
          ),
          <fpage>1093</fpage>
          -
          <lpage>1141</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walther</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Conservative extensions in expressive description logics</article-title>
          .
          <source>In: Proc. of IJCAI-07</source>
          . pp.
          <fpage>453</fpage>
          -
          <lpage>458</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Which kind of module should I extract?</article-title>
          <source>In: Proc. of DL 2009. ceur-ws.org</source>
          , vol.
          <volume>477</volume>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Smithson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Confidence Intervals</article-title>
          .
          <source>Quantitative Applications in the Social Sciences</source>
          , Sage Publications
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>