=Paper=
{{Paper
|id=None
|storemode=property
|title=Syntactic vs. Semantic Locality: How Good Is a Cheap Approximation?
|pdfUrl=https://ceur-ws.org/Vol-875/regular_paper_4.pdf
|volume=Vol-875
|dblpUrl=https://dblp.org/rec/conf/womo/VescovoKPS0T12
}}
==Syntactic vs. Semantic Locality: How Good Is a Cheap Approximation?==
<pdf width="1500px">https://ceur-ws.org/Vol-875/regular_paper_4.pdf</pdf>
<pre>
          Syntactic vs. Semantic Locality:
        How Good Is a Cheap Approximation?

             Chiara Del Vescovo¹, Pavel Klinov², Bijan Parsia¹,
            Uli Sattler¹, Thomas Schneider³, and Dmitry Tsarkov¹

                      ¹ University of Manchester, UK
              {delvescc,bparsia,sattler,tsarkov}@cs.man.ac.uk
                      ² University of Ulm, Germany
                           pavel.klinov@uni-ulm.de
                      ³ Universität Bremen, Germany
                    tschneider@informatik.uni-bremen.de


      Abstract Extracting a subset of a given OWL ontology that captures
      all the ontology’s knowledge about a specified set of terms is a well-
      understood task. This task can be based, for instance, on locality-based
      modules (LBMs). These come in two flavours, syntactic and semantic,
      and a syntactic LBM is known to contain the corresponding semantic
      LBM. For syntactic LBMs, polynomial extraction algorithms are known,
      implemented in the OWL API, and being used. In contrast, extracting
      semantic LBMs involves reasoning, which is intractable for OWL 2 DL,
      and these algorithms had not been implemented yet for expressive onto-
      logy languages.
      We present the first implementation of semantic LBMs and report on
      experiments that compare them with syntactic LBMs extracted from
      real-life ontologies. Our study reveals whether semantic LBMs are worth
      the additional extraction effort, compared with syntactic LBMs.


1   Introduction

Extracting a subset of a given OWL ontology that captures all the ontology’s
knowledge about a specified set of concept and role names is an interesting task
for various applications, and it is by now well-understood [2,10,11]. In general,
we consider a setting where, for a given signature, we want to determine a (small)
subset of a given ontology such that any axiom over the signature entailed by
the ontology is also entailed by the subset. For expressive logics, this task can
be implemented by making use of the notion of locality, and results in what is
known as locality-based modules (LBMs) [2]. Locality comes in many different
flavours, in particular there are notions of syntactic and semantic locality. A
syntactic LBM is known to contain the corresponding semantic LBM, but might
also contain extra axioms which are, because they are not in the semantic LBM,
superfluous for entailments over the given signature. Algorithms for the extrac-
tion of syntactic LBMs are known that run in time that is polynomial in the size
of the ontology (thus much cheaper than reasoning), implemented in the OWL
API, and being used. In contrast, despite the fact that algorithms for extracting
semantic LBMs are known, until now and to the best of our knowledge, they had
not yet been implemented. Moreover, these involve entailment checking, and are
thus intractable for expressive profiles of OWL 2.
    We present the first implementation of semantic LBMs and report on exper-
iments that compare them with syntactic LBMs extracted from real-life onto-
logies. The contributions of this paper are as follows: we show with statistical
significance that, for almost all members of a large corpus of existing ontologies,
there is no difference between any syntactic LBM and its corresponding semantic
LBM. In the few cases where differences occur, these differences are modest and
not worth the increased computation time needed to compute semantic LBMs.
In addition, we isolate two types of axioms that lead to differences, where one
is a simple tautology that can, in principle, be detected by a straightforward
addition to the syntactic locality checker. Furthermore, our results show that
the extraction of semantic LBMs, which is in principle hard, seems feasible in
practice. The lesson we learn from these results is that “Cheap is Great”!


2   Preliminaries

We assume the reader to be familiar with OWL and the underlying description
logic SROIQ [1,8], and will define the central notions around locality-based
modularity [2].
    Let NC be a set of concept names, and NR a set of role names. A signature
Σ is a set of terms, i.e., a set Σ ⊆ NC ∪ NR of concept and role names. We can
think of a signature as specifying a topic of interest. Axioms that only use terms
from Σ can be thought of as “on-topic”, and all other axioms as “off-topic”. For
instance, if Σ = {Animal, Duck, Grass, eats}, then Duck ⊑ ∃eats.Grass is on-topic,
while Duck ⊑ Bird is off-topic.
    Any concept, role, or axiom that uses only terms from Σ is called a Σ-concept,
Σ-role, or Σ-axiom. Given any such object X, we call the set of terms in X the
signature of X and denote it with X̃.
    Given an interpretation I, we denote its restriction to the terms in a signature
Σ with I|Σ. Two interpretations I and J are said to coincide on a signature Σ,
in symbols I|Σ = J|Σ, if ∆^I = ∆^J and X^I = X^J for all X ∈ Σ.
    There are a number of variants of the notion of conservative extensions, which
capture the desired preservation of knowledge to different degrees. We focus on
the deductive variant.

Definition 1. Let M ⊆ O be SROIQ-ontologies and Σ a signature.

(1) O is a deductive Σ-conservative extension (Σ-dCE) of M if, for all
    SROIQ-axioms α with α̃ ⊆ Σ, it holds that M |= α if and only if O |= α.
(2) M is a dCE-based module for Σ of O if O is a Σ-dCE of M.

    Unfortunately, deciding in general if a set of axioms is a module in this sense
is hard or even impossible for expressive DLs [6,12], and finding a minimal one
is even more so. However, “good sized” modules that are efficiently computable
have been introduced [2]. They are based on the locality of single axioms, which
means that, given Σ, the axiom can always be satisfied independently of the
interpretation of the Σ-terms, but in a restricted way: by interpreting all non-Σ
terms either as the empty set (∅-locality) or as the full domain⁴ (∆-locality).

Definition 2. A SROIQ-axiom α is called ∅-local (∆-local) w.r.t. signature Σ
if, for each interpretation I, there exists an interpretation J such that I|Σ =
J|Σ, J |= α, and for each X ∈ α̃ \ Σ, X^J = ∅ (for each C ∈ α̃ \ Σ, C^J = ∆
and for each R ∈ α̃ \ Σ, R^J = ∆ × ∆).

    It has been shown in [2] that M ⊆ O and all axioms in O \ M being ∅-local
(or all axioms being ∆-local) w.r.t. Σ ∪ M̃ is sufficient for O to be a Σ-dCE of
M. The converse does not hold: e.g., the axiom A ≡ B is neither ∅- nor ∆-local
w.r.t. {A}, but the ontology {A ≡ B} is an {A}-dCE of the empty ontology.
    Furthermore, locality can be tested using available DL-reasoners [2], which
makes this problem considerably easier than testing conservativity. However,
reasoning in expressive DLs is still complex, e.g. N2ExpTime-complete for
SROIQ [9]. In order to achieve tractable module extraction, a syntactic ap-
proximation of locality has been introduced in [2]. The following definition cap-
tures only the case of SHQ-TBoxes and can straightforwardly be extended to
SROIQ ontologies.

Definition 3. An axiom α is called syntactically ⊥-local (⊤-local) w.r.t.
signature Σ if it is of the form C^⊥ ⊑ C, C ⊑ C^⊤, C^⊥ ≡ C^⊥, C^⊤ ≡ C^⊤, R^⊥ ⊑ R
(R ⊑ R^⊤), or Trans(R^⊥) (Trans(R^⊤)), where C is an arbitrary concept, R is an
arbitrary role name, R^⊥ ∉ Σ (R^⊤ ∉ Σ), and C^⊥ and C^⊤ are from Bot(Σ) and
Top(Σ) as defined in Part (a) (resp. (b)) of the table below.

 (a) ⊥-Locality    Let A^⊥, R^⊥ ∉ Σ, C^⊥ ∈ Bot(Σ), C^⊤_(i) ∈ Top(Σ), n̄ ∈ ℕ \ {0}
     Bot(Σ) ::= A^⊥ | ⊥ | ¬C^⊤ | C ⊓ C^⊥ | C^⊥ ⊓ C | ∃R.C^⊥ | ≥n̄ R.C^⊥ | ∃R^⊥.C | ≥n̄ R^⊥.C
     Top(Σ) ::= ⊤ | ¬C^⊥ | C^⊤_1 ⊓ C^⊤_2 | ≥0 R.C

 (b) ⊤-Locality    Let A^⊤, R^⊤ ∉ Σ, C^⊥ ∈ Bot(Σ), C^⊤_(i) ∈ Top(Σ), n̄ ∈ ℕ \ {0}
     Bot(Σ) ::= ⊥ | ¬C^⊤ | C ⊓ C^⊥ | C^⊥ ⊓ C | ∃R.C^⊥ | ≥n̄ R.C^⊥
     Top(Σ) ::= A^⊤ | ⊤ | ¬C^⊥ | C^⊤_1 ⊓ C^⊤_2 | ∃R^⊤.C^⊤ | ≥n̄ R^⊤.C^⊤ | ≥0 R.C

   It has been shown in [2] that ⊥-locality (⊤-locality) of an axiom α w.r.t.
Σ implies ∅-locality (∆-locality) of α w.r.t. Σ. Therefore, all axioms in O \ M
being ⊥-local (or all axioms being ⊤-local) w.r.t. Σ ∪ M̃ is sufficient for O to
be a Σ-dCE of M. The converse does not hold; examples can be found in [2].
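
As an illustration, the grammar in part (a) of the table can be read as a small recursive check. The following is a minimal sketch under assumptions that are not in the paper: concepts are encoded as nested Python tuples, the names `in_bot`, `in_top` and `is_bot_local` are hypothetical, and only concept inclusions and equivalences over the constructors listed in the table are handled (role axioms and the extension to SROIQ are omitted). It is not the OWL API implementation.

```python
from typing import Set, Tuple

# Hypothetical tuple encoding of concepts (an assumption of this sketch):
#   ("bot",)  ("top",)  ("name", "A")  ("not", C)  ("and", C, D)
#   ("some", "r", C) for ∃r.C          ("min", n, "r", C) for ≥n r.C
Concept = Tuple


def in_bot(c: Concept, sigma: Set[str]) -> bool:
    """True if c matches Bot(Σ) from part (a) of the table."""
    tag = c[0]
    if tag == "bot":
        return True
    if tag == "name":                 # A^⊥: a concept name outside Σ
        return c[1] not in sigma
    if tag == "not":                  # ¬C^⊤
        return in_top(c[1], sigma)
    if tag == "and":                  # C ⊓ C^⊥ or C^⊥ ⊓ C
        return in_bot(c[1], sigma) or in_bot(c[2], sigma)
    if tag == "some":                 # ∃R^⊥.C or ∃R.C^⊥
        return c[1] not in sigma or in_bot(c[2], sigma)
    if tag == "min":                  # ≥n̄ R^⊥.C or ≥n̄ R.C^⊥, with n̄ ≥ 1
        return c[1] >= 1 and (c[2] not in sigma or in_bot(c[3], sigma))
    return False


def in_top(c: Concept, sigma: Set[str]) -> bool:
    """True if c matches Top(Σ) from part (a) of the table."""
    tag = c[0]
    if tag == "top":
        return True
    if tag == "not":                  # ¬C^⊥
        return in_bot(c[1], sigma)
    if tag == "and":                  # C^⊤_1 ⊓ C^⊤_2
        return in_top(c[1], sigma) and in_top(c[2], sigma)
    if tag == "min":                  # ≥0 R.C
        return c[1] == 0
    return False


def is_bot_local(axiom: Tuple, sigma: Set[str]) -> bool:
    """Syntactic ⊥-locality for ("sub", C, D) and ("equiv", C, D) axioms."""
    kind, lhs, rhs = axiom
    if kind == "sub":                 # C^⊥ ⊑ C  or  C ⊑ C^⊤
        return in_bot(lhs, sigma) or in_top(rhs, sigma)
    if kind == "equiv":               # C^⊥ ≡ C^⊥  or  C^⊤ ≡ C^⊤
        return (in_bot(lhs, sigma) and in_bot(rhs, sigma)) or \
               (in_top(lhs, sigma) and in_top(rhs, sigma))
    raise ValueError("axiom type not covered by this sketch")


duck_eats = ("sub", ("name", "Duck"), ("some", "eats", ("name", "Grass")))
duck_bird = ("sub", ("name", "Duck"), ("name", "Bird"))
a_equiv_b = ("equiv", ("name", "A"), ("name", "B"))
```

With Σ = {Animal, Duck, Grass, eats}, both example axioms from the beginning of this section are reported as not ⊥-local, whereas Duck ⊑ Bird is ⊥-local w.r.t. Σ = {Bird} because Duck then falls into Bot(Σ). The axiom A ≡ B is not ⊥-local w.r.t. {A}, in line with the remark above that it is not ∅-local either.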
   For each of the four locality notions, modules of O are obtained by starting
with an empty set of axioms and subsequently adding axioms from O that are
non-local w.r.t. Σ. In order for this procedure to be correct, the signature
against which locality is checked has to be extended with the terms in the axioms
that are added in each step, so that the resulting module M consists of all the
non-local axioms with respect to Σ ∪ M̃. Definition 4 (1) introduces locality-based
modules, which are always dCE-based modules [2], although not necessarily minimal
ones. Modules based on syntactic (semantic) locality can be made smaller by
iteratively nesting ⊤- and ⊥-extraction (∆- and ∅-extraction), and the result
is still a dCE-based module [2,13]. These so-called ⊤⊥*-modules (∆∅*-modules)
are introduced in Definition 4 (3).

   ⁴ Or, in the case of roles, the set of all pairs of domain elements.

Definition 4. Let x ∈ {∅, ∆, ⊥, ⊤}, yz ∈ {⊤⊥, ∆∅}, O an ontology and Σ a
signature.

(1) An ontology M is the x-module of O w.r.t. Σ if it is the output of
    Algorithm 1. We write M = x-mod(Σ, O).
(2) An ontology M is the yz-module of O w.r.t. Σ, written M = yz-mod(Σ, O),
    if M = y-mod(Σ, z-mod(Σ, O)).
(3) Let (M_i)_{i≥0} be a sequence of ontologies such that M_0 = O and M_{i+1} =
    yz-mod(Σ, M_i) for every i ≥ 0. For the smallest n ≥ 0 with M_n = M_{n+1},
    we call M_n the yz*-module of O w.r.t. Σ, written M = yz*-mod(Σ, O).
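
Parts (2) and (3) of Definition 4 amount to a fixpoint iteration over the extraction procedure of Algorithm 1. The sketch below illustrates this for ⊤⊥*-modules under strong simplifying assumptions that are ours, not the paper's: axioms are atomic subsumptions A ⊑ B encoded as pairs `(sub, sup)`, for which Definition 3 gives ⊥-locality exactly when A ∉ Σ and ⊤-locality exactly when B ∉ Σ. All function names are hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple

Axiom = Tuple[str, str]                    # toy encoding: (sub, sup) is sub ⊑ sup
LocalityTest = Callable[[Axiom, Set[str]], bool]


def extract_module(ontology: Iterable[Axiom], sigma: Set[str],
                   is_local: LocalityTest) -> Set[Axiom]:
    """x-mod(Σ, O) in the style of Algorithm 1, for a pluggable locality test."""
    module: Set[Axiom] = set()
    extended = set(sigma)                  # Σ ∪ M̃
    changed = True
    while changed:
        changed = False
        for axiom in set(ontology) - module:
            if not is_local(axiom, extended):
                module.add(axiom)
                extended |= set(axiom)     # add the axiom's terms to the signature
                changed = True
    return module


def nested_star_module(ontology: Iterable[Axiom], sigma: Set[str],
                       y_local: LocalityTest, z_local: LocalityTest) -> Set[Axiom]:
    """yz*-mod(Σ, O): iterate y-mod(Σ, z-mod(Σ, ·)) until M_n = M_{n+1}."""
    current = set(ontology)                # M_0 = O
    while True:
        inner = extract_module(current, sigma, z_local)
        nxt = extract_module(inner, sigma, y_local)    # M_{i+1}
        if nxt == current:
            return current
        current = nxt


def bot_local(axiom: Axiom, sigma: Set[str]) -> bool:
    return axiom[0] not in sigma           # A ⊑ B is ⊥-local iff A ∉ Σ


def top_local(axiom: Axiom, sigma: Set[str]) -> bool:
    return axiom[1] not in sigma           # A ⊑ B is ⊤-local iff B ∉ Σ


toy = [("Duck", "Bird"), ("Bird", "Animal"), ("Cat", "Animal"), ("Mallard", "Duck")]
seed = {"Duck", "Animal"}
nested = nested_star_module(toy, seed, top_local, bot_local)
```

For this toy ontology and Σ = {Duck, Animal}, the plain ⊤-module contains all four axioms, while the ⊤⊥*-module keeps only Duck ⊑ Bird and Bird ⊑ Animal, which is enough to entail Duck ⊑ Animal.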


Algorithm 1 Extract a locality-based module
    Input: Ont. O, sig. Σ, x ∈ {∅, ∆, ⊥, ⊤}       Output: x-module M of O w.r.t. Σ
    M ← ∅; O′ ← O
    repeat
      changed ← false
      for all α ∈ O′ do
        if α not x-local w.r.t. Σ ∪ M̃ then
           M ← M ∪ {α}; O′ ← O′ \ {α}; changed ← true
    until changed = false
    return M
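
A direct transcription of Algorithm 1 into Python might look as follows. This is a sketch, not the OWL API code: the locality test is passed in as a function, axioms are represented by a hypothetical toy encoding (pairs `(sub, sup)` standing for atomic subsumptions sub ⊑ sup), and the example test `bot_local_atomic` is only valid for such atomic subsumptions, where ⊥-locality reduces to the left-hand side being outside the signature.

```python
from typing import Callable, Iterable, Set, Tuple

Axiom = Tuple[str, str]                    # toy encoding: (sub, sup) is sub ⊑ sup
LocalityTest = Callable[[Axiom, Set[str]], bool]


def extract_module(ontology: Iterable[Axiom], sigma: Set[str],
                   is_local: LocalityTest) -> Set[Axiom]:
    """Collect all axioms that are non-local w.r.t. Σ ∪ M̃ (Algorithm 1)."""
    module: Set[Axiom] = set()             # M
    remaining = set(ontology)              # O′
    extended = set(sigma)                  # Σ ∪ M̃, grown as axioms are added
    changed = True
    while changed:                         # repeat ... until changed = false
        changed = False
        for axiom in list(remaining):      # copy: 'remaining' shrinks in the loop
            if not is_local(axiom, extended):
                module.add(axiom)
                remaining.discard(axiom)
                extended |= set(axiom)     # signature of the added axiom
                changed = True
    return module


def bot_local_atomic(axiom: Axiom, sigma: Set[str]) -> bool:
    """Toy ⊥-locality test: A ⊑ B is ⊥-local iff A is not in the signature."""
    return axiom[0] not in sigma


toy = [("Duck", "Bird"), ("Bird", "Animal"), ("Cat", "Animal")]
module = extract_module(toy, {"Duck"}, bot_local_atomic)
```

For the seed signature {Duck}, the result contains Duck ⊑ Bird and Bird ⊑ Animal but not Cat ⊑ Animal: Bird enters the extended signature only after the first axiom has been added, which is why the signature has to grow during extraction.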

    As for (1), it has been shown in [2] that the output M of Algorithm 1 does
not depend on the order in which the axioms α are selected.⁵ Furthermore,
the integer n in (3) exists because the sequence (M_i)_{i≥0} is decreasing (more
precisely, we have M_0 ⊃ · · · ⊃ M_n = M_{n+1} = . . . ). Due to monotonicity
properties of locality-based modules, the dual notions of ⊥⊤*- and ∅∆*-modules
are uninteresting because they coincide with those of ⊤⊥*- and ∆∅*-modules.

   ⁵ Our algorithm is a special case of the one in [2, Figure 4].

    Roughly speaking, a ∆- or ⊤-module for Σ gives a view from above because
it contains all subconcepts of concept names in Σ, while a ∅- or ⊥-module for Σ
gives a view from below since it contains all superconcepts of concept names in Σ.
    Modulo the locality check, Algorithm 1 runs in time cubic in |O| + |Σ| [2].
Modules based on ⊥/⊤-locality are therefore a feasible approximation for
modules based on ∅/∆-locality. In both cases, modules are extracted axiom by
axiom but, as said above, the ∅/∆-locality check is more complex. A module
extractor is implemented in the OWL API⁶ and SSWAP⁷. To summarize:
 1. Given an ontology O, the semantic module M^sem_Σ for a signature Σ is
    contained in the corresponding syntactic module M^syn_Σ for the same seed
    signature.⁸ This means that, in principle, syntactic modules can contain
    more axioms that are unnecessary for preserving entailments over Σ than
    semantic modules.
 2. The extraction of a syntactic module can be done in polynomial time w.r.t.
    the size of the ontology O. In contrast, the extraction of a semantic module
    is as hard as reasoning.


3    Experimental design
The main aim of this paper is to investigate how well syntactic locality approx-
imates semantic locality. In particular, we want to see how (un)likely it is that
syntactic locality-based modules are larger than semantic locality-based ones
and how large these differences are. We also want to understand empirically how
much more costly semantic locality is in terms of performance.

Selection of the Corpus. For our experiments, we have built a corpus containing:
(1) from the TONES repository,⁹ those ontologies that have already been studied
in a previous work on modularity [4]: Koala, Mereology, University, People,
miniTambis, OWL-S, Tambis, Galen; (2) all ontologies from the NCBO BioPortal
ontology repository.¹⁰
    We then filter out all those ontologies for which at least one of the
following problems occurs: the ontology is impossible to download; the .owl file
is corrupted when downloaded; the file is not parseable; the ontology is
inconsistent. Furthermore, due to time constraints, we exclude from this
preliminary investigation all ontologies whose size exceeds 10,000 axioms.
    This selection results in a corpus of 156 ontologies, which greatly differ in
size and expressivity [7], as summarized in Table 1. For a full list of the corpus,
please refer to the technical report: http://arxiv.org/abs/1207.1641
     Repository   Range of expressivity          Range #axs.   Range sig. size
     BioPortal    ALCN – SHIN(D)/SOIN(D)         38–4,735      21–3,161
     TONES        AL – SROIF(D)/SHOIQ(D)         13–9,629      14–9,221

                             Table 1. Ontology corpus


   ⁶ http://owlapi.sourceforge.net
   ⁷ http://sswap.info
   ⁸ Recall that ⊥-syntactic modules approximate ∅-semantic modules, while
     ⊤-syntactic modules approximate ∆-semantic modules.
   ⁹ http://owl.cs.manchester.ac.uk/repository/
  ¹⁰ http://bioportal.bioontology.org

Comparing Syntactic and Semantic Locality. In order to compare syntactic and
semantic locality, we want to understand:
 1. whether, for a given seed signature Σ, the semantic Σ-module is likely to be
    smaller than the syntactic Σ-module, and if so by how much,¹¹
 2. how feasible the extraction of semantic modules is.
   Here, we focus on the two corresponding notions of ∅-semantic locality and
⊥-syntactic locality. In particular, ⊥-syntactic locality has been thoroughly
investigated in previous work [3], and it has proven to have many interesting
properties. A completion of the investigation described in this paper for all
fundamental notions of modules is planned in our future work.
   Due to the recursive nature of the locality-based module extraction algorithm,
we want to investigate locality both on a
    – per-axiom basis: given an axiom α and a signature Σ, is it likely that α is
      semantically ∅-local w.r.t. Σ but not syntactically ⊥-local w.r.t. Σ?
    – per-module basis: given a signature Σ, is it likely that ⊥-mod(Σ, O) ≠
      ∅-mod(Σ, O)? If yes, is it likely that the difference is large?
    Hence we need to pick, for each ontology in our corpus, a suitable set of
signatures, and this poses a significant problem. First, we do not yet have enough
insight into what typical seed signatures are for module extraction. One could
assume that large ones are rarely relevant for module extraction (why bother
with extracting a large module?), but this still leaves a large, i.e., exponential,
space of possible seed signatures. If m = #Õ, there are 2^m possible seed
signatures for which axioms can be tested for locality and for which modules can
be extracted. Hence a full investigation is infeasible.
    One could assume that the comparison between semantic and syntactic mod-
ules could be easier since many signatures can lead to the same module. In other
words, the statistically significant number of modules w.r.t. the total number
of modules is not larger than that of seed signatures needed w.r.t. the total
number of seed signatures. In previous work [4,5], however, modules have been
studied with respect to how numerous they are in real-world ontologies. The
experiments carried out suggest that the number of modules in ontologies is, in
general, exponential w.r.t. the size of the ontology. Moreover, the extraction of
enough different modules can be hard, because by looking just at seed signatures
there is no chance to avoid the extraction of the same module many times. In
particular, for a module M there can be exponentially many seed signatures
w.r.t. #M̃ that generate M [3].

   ¹¹ Recall that the semantic Σ-module is always a subset of the syntactic Σ-module.

    As a consequence, we compare the two kinds of locality of axioms, both
on a per-axiom basis and a per-module basis, w.r.t. random signatures. To
avoid any bias, we select a random signature as follows: we set each named
entity E in the ontology to have probability p = 1/2 of being included in the
signature. Thus each seed signature has the same probability to be chosen. For
ontologies whose signature exceeds 9 entities, in order to get results where the
true proportion of differences between the two notions of locality lies in the
confidence interval (±5%) with confidence level 95%, we have to select only 400
random signatures [14]. That is, we need to test only 400 random signatures to
have a confidence of 95% (±5%) that the differences/equalities we observe reflect
the real ones.
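
The figure of 400 signatures is consistent with the usual sample-size estimate for a proportion. The sketch below is our own back-of-the-envelope check and assumes the standard normal-approximation formula n = z² · p(1 − p) / e² with the worst case p = 1/2; the paper itself only cites [14] for the number, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Smallest n with z^2 * p * (1 - p) / margin^2 <= n (normal approximation)."""
    # Two-sided critical value, e.g. about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


n = sample_size(0.05)   # 385, so 400 signatures are slightly more than needed
```

Under these assumptions, 385 samples suffice for a ±5% interval at the 95% level, so 400 is a comfortable round number.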

Non-random seed signatures. A module, in general, does not necessarily show any
internal coherence: intuitively, if we had an ontology describing some knowledge
from both the domains of Geology and of Philosophy, we could still extract the
module for the signature Σ = {Epistemology, Mineral}. This module is likely
to be the union of the two disjoint modules for Σ1 = {Epistemology} and
Σ2 = {Mineral}. This combinatorial behaviour can lead to exponentially many
modules in the size of the signature of the ontology and indeed, as mentioned
above, the number of modules in ontologies seems to be exponential [4,5].
    In contrast to general modules, genuine modules can be called coherent: they
are defined as those modules that cannot be decomposed into the union of two
different modules. Notably, there are only linearly many genuine modules in the
size of the ontology O, and the set of genuine modules is a base for all general
modules: any module is either genuine or the union of genuine modules. The
linear bound on the number of genuine modules is due to the fact that, for each
genuine x-module M, there is an axiom α such that M = x-mod(α̃, O).
    Thus genuine modules can be said to be interesting modules that we can
fully investigate. Hence in addition to the above mentioned investigation of ⊥-
and ∅-modules for random signatures, we also look at all axiom signatures.
    In summary, we test:
(T1) for random seed signatures Σ,
   (a) for each axiom α in our corpus, is α semantically ∅-local w.r.t. Σ but
       not syntactically ⊥-local w.r.t. Σ?
   (b) is ⊥-mod(Σ, O) ≠ ∅-mod(Σ, O)? If yes, we determine the difference and
       its size.
(T2) for each axiom signature from our corpus, is ⊥-mod(α̃, O) ≠ ∅-mod(α̃, O)?
   If yes, we determine the difference and its size.
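
Test (T1)(a) can be pictured as the following loop. This is a toy sketch, not the experimental code: axioms are atomic subsumptions encoded as pairs `(sub, sup)`, the "semantic" test is a stand-in that merely recognises the tautology A ⊑ A in addition to the syntactic criterion (no reasoner is called), and all names are hypothetical. Signatures are sampled as described above, with each term included with probability 1/2.

```python
import random
from typing import List, Set, Tuple

Axiom = Tuple[str, str]                    # toy encoding: (sub, sup) is sub ⊑ sup


def bot_local(axiom: Axiom, sigma: Set[str]) -> bool:
    """Toy syntactic test: A ⊑ B is ⊥-local iff A is outside Σ."""
    return axiom[0] not in sigma


def empty_local(axiom: Axiom, sigma: Set[str]) -> bool:
    """Toy semantic test: additionally recognises the tautology A ⊑ A."""
    return axiom[0] == axiom[1] or bot_local(axiom, sigma)


def count_signatures_with_culprits(ontology: List[Axiom],
                                   trials: int = 400, seed: int = 0) -> int:
    """Number of random signatures with an axiom that is ∅-local but not ⊥-local."""
    terms = sorted({t for axiom in ontology for t in axiom})
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sigma = {t for t in terms if rng.random() < 0.5}   # p = 1/2 per term
        if any(empty_local(a, sigma) and not bot_local(a, sigma) for a in ontology):
            hits += 1
    return hits


clean = [("Duck", "Bird"), ("Bird", "Animal")]
with_tautology = clean + [("Bird", "Bird")]
```

On the `clean` list the count is zero, while adding the tautology Bird ⊑ Bird produces a difference for every sampled signature that contains Bird.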


4    Experimental comparison
No differences. The main result of the experiment is that, for 151 of the 156
ontologies we tested, no difference between ⊥- and ∅-locality can be observed.
These 151 ontologies exclude the two NCBO BioPortal ontologies EFO (Ex-
perimental Factor Ontology) and SWO (Software Ontology), as well as Koala,
miniTambis, and Tambis. More specifically, for every generated seed signature,
the corresponding ⊥- and ∅-modules agree, and every axiom is either both ⊥- and
∅-local, or neither. This statement applies to all randomly generated seed
signatures as well as to all axiom signatures, which are seed signatures for all
genuine modules. We can therefore draw the following conclusions for the 151
ontologies with respect to (T1) and (T2) above.

(T1) Given an arbitrary seed signature Σ, there is no difference (a) between
   ⊥- and ∅-locality of any given axiom w.r.t. Σ and (b) between the ⊥- and
   ∅-modules for Σ, both times at a significance level of 0.05.
(T2) Given any axiom signature Σ, there is no difference between the ⊥- and
   ∅-modules for Σ.

    In the case of the 151 ontologies, the extraction of a ∅-module (with tautology
tests performed by FaCT++) often took considerably longer than the extraction
of the corresponding ⊥-module. For example, for MoleculeRole, the largest of
the 151 ontologies, times to extract a ⊥-module (test all axioms for ⊥-locality,
respectively) ranged between 27 and 169 ms (21 and 77 ms, respectively), while
the extraction of a ∅-module (test of all axioms for ∅-locality, resp.) took up
to 6× as long, on average 2.7× (2.0×, resp.). It is also worth noting that the
ontologies Galen and People, which are renowned for having particularly large
⊥-modules [2,5], are among those without differences between ⊥- and ∅-locality.

Differences. For the five ontologies where differences between ⊥- and ∅-modules
(or -locality) occur, we isolated two types of culprits, i.e., axioms which are not
⊥-local w.r.t. some signature Σ, but which are ∅-local w.r.t. Σ. Type-1 culprits
are simple tautologies that have accidentally entered the “inferred view” (i.e.,
the closure under certain entailments) of two ontologies. They do not occur in the
original “asserted” versions and can, in principle, be detected by a slightly refined
syntactic locality check. Type-2 culprits are definitions of concept names via a
conjunction that satisfies certain conditions explained below. There are not many
type-1 and type-2 axioms in the affected ontologies, and the observed differences
are comparatively small. Table 2 gives an overview of the differences observed.

Type-1 culprits are axioms InverseObjectProperties(P, InverseOf(P)),
where P is a role. This translates into the tautology P ≡ (P⁻)⁻ in DL
notation. Such an axiom is therefore ∅-local w.r.t. any signature. However, it
behaves differently for ⊥-locality: if the signature Σ contains P, then neither
side of the equivalence is in Bot(Σ) or in Top(Σ), hence the axiom is considered
non-local; otherwise, both sides are ⊥-equivalent, hence the axiom is local.
    Type-1 axioms occur in the “inferred view” of the ontologies EFO and SWO.
Table 2 shows the relatively modest differences caused by these axioms. In all
cases, there are no other axioms in the differences. This means that no differences
occur for the non-inferred original versions of EFO and SWO.
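
One way to picture the "slightly refined" syntactic check for type-1 culprits is to cancel double inverses before comparing the two sides. The sketch below is only an illustration under our own assumptions: roles are encoded either as a name (a string) or as a hypothetical tuple `("inv", role)`, and the function names are invented here; it does not reflect how the OWL API or FaCT++ represent axioms.

```python
from typing import Tuple, Union

Role = Union[str, Tuple]                   # "P" or ("inv", Role), a toy encoding


def normalise(role: Role) -> Role:
    """Cancel pairs of nested inverses: ((P⁻)⁻)⁻ becomes P⁻, (P⁻)⁻ becomes P."""
    depth = 0
    while isinstance(role, tuple) and role[0] == "inv":
        depth += 1
        role = role[1]
    return ("inv", role) if depth % 2 else role


def is_trivial_inverse_axiom(first: Role, second: Role) -> bool:
    """InverseObjectProperties(first, second) states first ≡ second⁻.

    The axiom is a tautology if both sides coincide after normalisation.
    """
    return normalise(first) == normalise(("inv", second))


culprit = is_trivial_inverse_axiom("P", ("inv", "P"))    # P ≡ (P⁻)⁻
ordinary = is_trivial_inverse_axiom("P", "Q")            # P ≡ Q⁻
```

A locality checker extended in this way could treat such an axiom as local w.r.t. every signature, matching its semantic behaviour; an axiom such as InverseObjectProperties(P, P), which states that P is symmetric, is correctly not flagged.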

Type-2 culprits are complex definitions A ≡ C of a concept name A where
C is a conjunction that contains both a universal and an existential (or
minimum cardinality) restriction on the same role. This affects the ontologies
Koala, miniTambis, and Tambis. The effect is best illustrated for Koala, which
contains exactly one such axiom, namely M ≡ S ⊓ ∀c.F ⊓ ∀g.{m} ⊓ =3 c.⊤, where we
have abbreviated the concept names MaleStudentWith3Daughters, Student,
Female, the roles hasChildren, hasGender, and the nominal male. Now if the
signature against which the axiom is tested for locality contains {S, c, g} but
neither M nor F, then this axiom is not ⊥-local because none of the conjuncts on
the right-hand side is in Bot(Σ). On the other hand, this axiom is a tautology
when M and F are replaced by ⊥: the conjunction ∀c.⊥ ⊓ =3 c.⊤ cannot have any
instances, regardless of how c is interpreted.

  Ontology     #axs   Test   #differences   Difference sizes    Time ratio   Culprit type
                                             #axs      rel.      (avg.)       and frequency
  SWO          3446   T1 a        400        6–22      0–1%       3.31        1 (30×)
                      T1 b        400       23–29      1–2%       5.11
                      T2         3446        3–1       1–5%       5.86
  EFO          6008   T1 a        400        8–24      0–1%       1.42        1 (32×)
                      T1 b        400       13–30      0–1%       1.38
                      T2          128        1–4       9–17%      n/a
  Koala          42   T1 a          0        0         0%         n/a         2 (1×)
                      T1 b          2        1         3%         n/a
                      T2            0        0         0%         n/a
  miniTambis    170   T1 a         68        1–2       1–3%       n/a         2 (3×)
                      T1 b         93        1–4       1–3%       n/a
                      T2           26        1–7       6–75%      n/a
  Tambis        592   T1 a         58        1–3       0–1%       3.31        2 (11×)
                      T1 b        229        2–11      0–2%       5.01
                      T2          191        4–41      2–26%      n/a

Table 2. Overview table of differences observed. The columns show: the ontology name;
the overall number of axioms; the name of the test (see the list at the end of
Section 3); the number of cases with differences; the number of axioms in the
differences (absolute and relative to the ⊥-case); the average time ratio ∅ : ⊥
(“n/a” indicates that no reliable statement is possible: the time for ⊥ is only a
few, often 0, milliseconds); the type of culprit present and the number of axioms
of this type.

    For Koala, this effect only causes two singleton differences between sets of
local axioms for the randomly generated seed signatures, as shown in Table 2.
For axiom signatures, there is no difference. Interestingly, this effect does not
propagate to modules: for all signatures, ⊥- and ∅-modules are the same. The
reason might be that (a) g is used in many axioms and is thus very likely to
contribute to the extended signature during module extraction, and (b) then the
axiom defining F is no longer local, which “pulls” F into the extended signature,
preventing the observed effect.
    In miniTambis and Tambis, this effect is much stronger and affects a large
proportion of modules, as shown in Table 2. The differences in these cases do
not only consist of culprit axioms, but also of axioms that become non-local
after the signature has been extended by the terms in the culprit axioms. Still,
the size of the differences is mostly modest while, for Tambis, the ∅-locality test
(∅-module extraction) takes on average over three times (five times) as long as
the ⊥-locality test (⊥-module extraction).

5      Conclusion and Outlook

Summary. We obtain two main observations from the experiments carried out.

 – In practice, there is no or little difference between semantic and syntactic
   locality. That is, the computationally cheaper syntactic locality is a good
   approximation of semantic locality.
 – Though in principle hard to compute, semantic modules can be extracted
   rather fast in practice.

    These results suggest that it is questionable to conclude that semantic locality
should be preferred to syntactic locality. In terms of computation time, there is
often a benefit in using syntactic locality: the average speed-up compared to the
extraction of a semantic-locality based module is by a factor of up to 6. For
some particular module pairs, it is higher by an order of magnitude. The gain
in module size is zero or so small that it is hard to justify the extra time spent.
In particular, there is no gain in size for the ontologies Galen and People, which
are “renowned” for having disproportionately large modules [2,5].
    Our results are interesting not only because they provide an evaluation of
how well the cheap syntactic locality approximates semantic locality, but also
because they enabled us to fix bugs in the implementation of syntactic
modularity. For example, earlier data from the experiment showed that reflexivity
axioms had been treated incorrectly by the syntactic locality checker.

Future Work. It is evident that this work is preliminary. It investigates only
the differences between the related notions of ⊥- and ∅-locality. We plan to
extend the same study to other notions of locality, in particular, nested modules
(⊤⊥*- vs. ∆∅*-modules); these notions are the most economical in terms of
module size. Moreover, we want to extend the investigation to the remaining
larger ontologies in the BioPortal repository and further large ontologies, e.g.,
some versions of the NCI Thesaurus.¹² Preliminary results with a version that
is not among the regular releases show differences due to type-2 culprits, but we
have not included them here because the differences disappear after removing
axioms that were introduced due to a problem with object and annotation
properties when the ontology file is parsed by the OWL API. This behaviour is yet
to be investigated and explained.

   ¹² Downloadable from http://evs.nci.nih.gov/ftp1/NCI_Thesaurus

    Another interesting extension is to modify the seed signature sampling.
Currently, the random variable “size of the seed signature generated” follows the
binomial distribution with expected value m/2 and variance m/4. Hence, most
signatures in the sample have size around m/2; small and large signatures are
underrepresented. For example, for one ontology with 915 terms, all signature sizes
lay between 422 and 509. One might argue that, for big ontologies, the typical
module extraction scenario does not require large seed signatures, but it does
sometimes require relatively small seed signatures, for example, when a module
is extracted to efficiently answer a given entailment query of typically small size.
On the other hand, large modules resulting from larger seed signatures may be
more likely to differ. We therefore plan an alternative seed signature sampling
via bins for average signature sizes: repeat the current sampling procedure scaled
to several subintervals of the range of possible signature sizes.
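
The concentration of signature sizes around m/2 follows directly from the binomial distribution. As a quick check of the 915-term example (our own arithmetic, not taken from the paper; the helper `z_score` is hypothetical), the observed extremes 422 and 509 can be expressed in standard deviations from the mean:

```python
import math

m = 915                        # number of terms in the example ontology
p = 0.5                        # inclusion probability per term

mean = m * p                   # expected signature size, m/2 = 457.5
variance = m * p * (1 - p)     # m/4 = 228.75
sd = math.sqrt(variance)       # roughly 15.1


def z_score(size: int) -> float:
    """Distance of a signature size from the mean, in standard deviations."""
    return (size - mean) / sd


low, high = z_score(422), z_score(509)
```

The smallest observed size lies roughly 2.3 standard deviations below the mean and the largest roughly 3.4 above it, so signatures with, say, fewer than 300 or more than 600 terms are practically never drawn by the current procedure.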
    Our current results answer the question whether there is a significant differ-
ence between the two locality notions with respect to a given signature. It is also
interesting to ask the same question relative to a given module. To answer it, the
sampling of modules instead of seed signatures requires further investigation.

Acknowledgment. We thank Rafael Gonçalves and the anonymous reviewers for
helpful comments.


References
 1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.F. (eds.):
    The Description Logic Handbook: Theory, Implementation, and Applications.
    Cambridge University Press (2003)
 2. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontolo-
    gies: Theory and practice. J. of Artif. Intell. Research 31, 273–318 (2008)
 3. Del Vescovo, C., Gessler, D., Klinov, P., Parsia, B., Sattler, U., Schneider, T.,
    Winget, A.: Decomposition and Modular Structure of BioPortal Ontologies. In:
    Proc. ISWC-11 (2011)
 4. Del Vescovo, C., Parsia, B., Sattler, U., Schneider, T.: The modular structure of
    an ontology: an empirical study. In: Proc. of WoMO-10. Frontiers in AI and Appl.,
    vol. 211, pp. 11–24. IOS Press (2010)
 5. Del Vescovo, C., Parsia, B., Sattler, U., Schneider, T.: The modular structure of
    an ontology: atomic decomposition and module count. In: Proc. of WoMO-11.
    Frontiers in AI and Appl., vol. 230, pp. 25–39. IOS Press (2011)
 6. Ghilardi, S., Lutz, C., Wolter, F.: Did I damage my ontology? A case for conser-
    vative extensions in description logics. In: Proc. of KR-06. pp. 187–197 (2006)
 7. Horridge, M., Parsia, B., Sattler, U.: The state of bio-medical ontologies. In: Proc.
    of 2011 ISMB Bio-Ontologies SIG (2011)
 8. Horrocks, I., Kutz, O., Sattler, U.: The even more irresistible SROIQ. In: Proc.
    of KR-06. pp. 57–67 (2006)
 9. Kazakov, Y.: RIQ and SROIQ are harder than SHOIQ. In: Proc. of KR-08. pp.
    274–284 (2008)
10. Konev, B., Lutz, C., Walther, D., Wolter, F.: Semantic modularity and module
    extraction in description logics. In: Proc. of ECAI-08. Frontiers in AI and Appl.,
    vol. 178, pp. 55–59. IOS Press (2008)
11. Kontchakov, R., Wolter, F., Zakharyaschev, M.: Logic-based ontology compar-
    ison and module extraction, with an application to DL-Lite. Artificial Intelligence
    174(15), 1093–1141 (2010)
12. Lutz, C., Walther, D., Wolter, F.: Conservative extensions in expressive description
    logics. In: Proc. of IJCAI-07. pp. 453–458 (2007)
13. Sattler, U., Schneider, T., Zakharyaschev, M.: Which kind of module should I
    extract? In: Proc. of DL 2009. ceur-ws.org, vol. 477 (2009)
14. Smithson, M.: Confidence Intervals. Quantitative Applications in the Social Sci-
    ences, Sage Publications (2003)

</pre>