Empirical Study of Logic-Based Modules:
                  Cheap Is Cheerful

               Chiara Del Vescovo¹, Pavel Klinov², Bijan Parsia¹,
           Ulrike Sattler¹, Thomas Schneider³, and Dmitry Tsarkov¹

                        ¹ University of Manchester, UK
               {delvescc|bparsia|sattler|tsarkov}@cs.man.ac.uk
                         ² University of Ulm, Germany
                            pavel.klinov@uni-ulm.de
                        ³ Universität Bremen, Germany
                     tschneider@informatik.uni-bremen.de


       Abstract. For ontology reuse and integration, a number of approaches
       have been devised that aim at identifying modules, i.e., suitably small
       sets of “relevant” axioms from ontologies. Here we consider three logically
       sound notions of modules: MEX modules, only applicable to inexpressive
       ontologies; modules based on semantic locality, a sound approximation of
       the first; and modules based on syntactic locality, a sound approximation
       of the second (and thus the first), widely used since these modules can
       be extracted from SROIQ ontologies in time polynomial in the size of
       the ontology.
       In this paper we investigate the quality of both approximations over a
       large corpus of ontologies. In particular, we show with statistical signifi-
       cance that, in most cases, there is no difference between the two module
       notions based on locality; where they differ, the additional axioms are in
       general unproblematic since either they can be easily ruled out or their
       number is relatively small. Finally, we show that the same can be said
       about the relation between MEX and locality-based modules.


1     Introduction
Some notable ontologies describe large and loosely connected domains, as is
the case for SNOMED–CT, the Systematized Nomenclature Of MEDicine, Clinical
Terms,4 which describes the terminology used in medicine, including diseases,
drugs, etc. Users are often not interested in a whole ontology O, but only in
a limited relevant part of it. In this context, the idea of using modules has
recently been explored: modules are suitably small subsets of an ontology that
behave, for specific purposes, like the original ontology over a given
signature Σ, i.e., a set of terms (non-logical symbols – concept and role
names). The notion of logical module [9,4] focuses on providing coverage,
i.e., on preserving all the entailments of O over Σ.
     In [19] the authors comment on the crucial role played by coverage and by
two additional properties of modules for ontology reuse and integration. Let M
be a subset of O. We say that: (1) M is self-contained if it provides coverage
for its signature; (2) M is depleting if the remainder O \ M of the ontology
does not entail any non-tautological axiom η over Σ. Under some mild conditions
a minimal depleting and self-contained module is also uniquely determined [15].
4
    http://www.ihtsdo.org/snomed-ct/
    Extracting the uniquely determined module for a signature Σ is, however,
hard or even impossible for expressive languages [18,10,17]. For identifying no-
tions of modules whose extraction is feasible, we can either restrict the expres-
sivity of the ontology language, or look for feasible sufficient conditions that
guarantee M to be a module for Σ, even though not necessarily the smallest.
    For inexpressive logics, one can make use of the module extractor imple-
mented in the MEX system [13]. This module extractor works on acyclic ELI
terminologies and extracts the minimal module in polynomial time.
    For expressive logics, module extraction can be implemented making use of
the notion of locality. The resulting modules, known as locality-based modules
(LBMs) [2] are, in general, not minimal. Locality comes in two flavors, semantic
and syntactic, each of which has a bottom-, a top-, and a star-variant; the
star-variant yields modules contained in both the top- and the bottom-variants.
For any of the three variants, a syntactic LBM contains the corresponding
semantic LBM. Algorithms for the
extraction of syntactic LBMs are known that run in time polynomial in the size of
the ontology (thus much cheaper than reasoning), are implemented in the OWL
API,5 and are currently used for ontology reuse and integration. In contrast,
despite the fact that algorithms for extracting semantic LBMs are known, until
now and to the best of our knowledge they had not been implemented. They
require entailment checks against an empty ontology and thus involve reasoning
of a kind that is rather unusual for DL reasoners.6
    We know that the MEX module for a signature Σ is contained in the star
semantic LBM which, in turn, is contained in the star syntactic LBM. Thus,
syntactic locality can be seen as an approximation of semantic locality which,
in turn, is an approximation of MEX modules. An interesting question arising
here is how good these approximations are: how much larger are the modules
extracted by the approximations, and how much faster is the extraction?
    To answer these questions, we present the first implementation of a semantic
LBM extractor and report on experiments on a large corpus of ontologies. We
compare the performance and results of the semantic LBM extractor with those
of a syntactic LBM extractor and with those of MEX. A comparison between
MEX- and syntactic bottom-modules only for SNOMED–CT is reported in [13].
    The contributions of this paper are as follows.
• We provide the first implementation for the extraction of semantic LBMs,
which is embedded in the latest release of the FaCT++ reasoner. Our results
show that the extraction of semantic LBMs, which is in principle hard, is feasible
in practice: on average, it is between 3 times (for top-modules) and 15 times (for
bottom- and star-modules) slower than the extraction of syntactic LBMs, and
both only take milliseconds to seconds for most ontologies below 10K axioms.
5
    http://owlapi.sourceforge.net/
6
    DL reasoners usually classify an ontology: test it for consistency and all concept
    names for satisfiability/mutual subsumption.
• We show with statistical significance that, for almost all members of a large
corpus of existing ontologies, there is no difference between any syntactic LBM
and its semantic counterpart. In the few cases where differences occur, those are
extremely modest so that it is questionable whether extracting semantic LBMs
is worth the increased computational cost.
• We isolate four patterns of axioms that completely explain those rare differ-
ences. One includes simple tautologies that can be removed in a straightforward
preprocessing step.
• We modify the original corpus to obtain for each ontology an acyclic EL version
suitable for use with the MEX system. We then compare MEX-modules and
the star-variants of LBMs, and find differences in only ∼27% of the corpus. We
explain one reason for the largest differences observed.

2   Preliminaries
We assume the reader to be familiar with Description Logic languages (e.g.
SROIQ [1,12]), and aim here at fixing the notations and at defining the key
notions around module extraction, with a focus on locality-based modules [2]
and MEX modules [14].
    Let O denote a SROIQ ontology, N_C a set of concept names, and N_R a set of
role names. We do not consider individual names as they do not play any role
in the extraction of the modules analyzed here. We refer to either a concept or
a role name by using the word “term”. A signature is a set Σ ⊆ N_C ∪ N_R. Given
a concept, role, or axiom X, we call the set of terms in X the signature of X,
denoted X̃. Given a set M ⊆ O of axioms from O, and a signature Σ, we say
that O is a deductive Σ-conservative extension (Σ-dCE) of M if, for all
SROIQ-axioms α with α̃ ⊆ Σ, it holds that O |= α if and only if M |= α. O is a
model Σ-conservative extension (Σ-mCE) of M if {I|Σ | I |= O} = {I|Σ | I |= M}.
Dually, M is a dCE-based module of O for Σ if O is a Σ-dCE of M, and it is
an mCE-based module for Σ if O is a Σ-mCE of M. All mCE-based modules
are also dCE-based modules, whilst the converse is not always true. A module
M ⊆ O for Σ is called depleting if there is no non-trivial entailment η over Σ
such that O \ M |= η; M is called self-contained if M is a module for Σ = M̃.
    Since M ⊆ O, the monotonicity of SROIQ implies that every entailment
η over Σ derivable from M is also derivable from O. Deciding the converse
direction is in general computationally hard, or even undecidable for expressive
DLs [18,10,17]. Since we do not need to find all the subsets of O that are a
module for Σ, we can use easier conditions which guarantee that a set of axioms
M ⊆ O is a module for Σ.
    One strategy to extract a module M ⊆ O for a seed signature Σ consists
of defining a suitable oracle x to decide whether a single axiom α ∈ O is
x-relevant for preserving the non-tautological entailments over Σ: clearly, if
the x-check is positive for α, then by depletion M needs to contain α. To
guarantee also self-containment, the fixpoint procedure described in Algorithm 1
needs to be performed. The module extracted will be called an x-module,
parameterized by the notion x of relevance. Algorithm 1 is a special case of the
one in [2, Figure 4], and its output M does not depend on the order in which the
axioms α are selected [2].

Algorithm 1 Extraction of an x-module for Σ
    Input: Ontology O, seed signature Σ, notion of relevance x
    Output: x-module M of O w.r.t. Σ
    M ← ∅; O′ ← O
    repeat
      changed ← false
      for all α ∈ O′ do
        if α is x-relevant w.r.t. Σ ∪ M̃ then
           M ← M ∪ {α}; O′ ← O′ \ {α}; changed ← true
    until changed = false
    return M
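
The fixpoint loop of Algorithm 1 can be sketched in a few lines of Python.
This is only an illustration under simplifying assumptions: an axiom is
modelled as a hypothetical pair of a label and its set of terms, and the oracle
shares_term is a crude stand-in for x-relevance (it is neither locality nor
MEX-relevance). The names extract_module and shares_term are introduced here
for illustration and do not refer to any existing implementation.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

# Hypothetical toy encoding: an axiom is (label, set of terms occurring in it).
Axiom = Tuple[str, FrozenSet[str]]
Relevance = Callable[[Axiom, Set[str]], bool]


def extract_module(ontology: Iterable[Axiom], seed: Set[str],
                   is_relevant: Relevance) -> Set[Axiom]:
    """Fixpoint loop of Algorithm 1 for a pluggable relevance oracle."""
    module: Set[Axiom] = set()
    remaining = set(ontology)          # plays the role of O'
    signature = set(seed)              # plays the role of Sigma united with sig(M)
    changed = True
    while changed:
        changed = False
        for axiom in list(remaining):  # iterate over a copy while removing
            if is_relevant(axiom, signature):
                module.add(axiom)
                remaining.discard(axiom)
                signature |= axiom[1]  # extend the signature by the axiom's terms
                changed = True
    return module


def shares_term(axiom: Axiom, signature: Set[str]) -> bool:
    """Placeholder oracle (an assumption): relevant iff a term is shared."""
    return bool(axiom[1] & signature)


toy_ontology = [
    ("ax1", frozenset({"A", "B"})),
    ("ax2", frozenset({"B", "C"})),
    ("ax3", frozenset({"D", "E"})),
]
toy_module = extract_module(toy_ontology, {"A"}, shares_term)
print(sorted(label for label, _ in toy_module))
```

With the seed signature {A}, the first pass pulls in ax1, which extends the
signature by B and thereby makes ax2 relevant; ax3 is never reached. Replacing
shares_term by a locality test would give the corresponding locality-based
extractor.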
   Due to space limitations, we cannot provide a full description of the notions
of modules here investigated, and refer the interested reader to [2,14] for further
details. We briefly sketch the intuition behind each notion of relevance and the
corresponding results of interest for this paper.

The MEX system. In [14], a procedure called MEX for extracting minimal mCE-
based modules from acyclic ELI ontologies is given. The notion of MEX-relevance
is based on a relation depend_O, which associates each concept name A with the
set depend_O(A) of all the symbols X that are used in the definition7 of A in O.
Then, an axiom α ∈ O is MEX-relevant for a signature Σ if it defines a term
in Σ. The authors prove that, if O is an acyclic ELI ontology, then using
MEX-relevance in Algorithm 1 generates the minimal depleting self-contained
module for each signature Σ, and that the extraction runs in polynomial time.

Semantic locality. In [2], the authors define the notion of locality and
distinguish two flavors, here called ∅- and ∆-locality. Intuitively, a SROIQ
axiom α is ∅-local (resp. ∆-local) w.r.t. signature Σ if the axiom α′ obtained
by replacing all terms in α̃ \ Σ with ⊥ (resp. ⊤) is a tautology. The authors
prove that, if all axioms in O \ M are ∅-local (or all axioms are ∆-local)
w.r.t. Σ ∪ M̃, then M is an mCE-based (and hence dCE-based) module of O for Σ.
Since deciding ∅- or ∆-locality requires tautology checks, this problem is as
hard as standard reasoning. In some cases, α′ is not a SROIQ axiom, so standard
reasoners need to be extended.

Syntactic locality. In order to achieve tractable module extraction, [2] define
the two syntactic notions of ⊥- and ⊤-locality. Those approximate the two
notions of semantic ∅- and ∆-locality. Intuitively, the syntactic rules provided
describe axioms α for which α′ is obviously a tautology, in which case α is said
to be ⊥- or ⊤-local w.r.t. Σ. Thus, every ⊥-local (⊤-local, resp.) axiom w.r.t.
Σ is also ∅-local (∆-local, resp.) w.r.t. Σ, but not vice versa. As a
consequence, ⊥- and ⊤-modules are also mCE- and dCE-based modules for Σ.
Applying the syntactic rules requires polynomial time, hence the extraction of
this kind of module can be performed in time polynomial in the size of the
ontology.
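
To make the idea of a purely syntactic check concrete, the following Python
sketch tests ⊥-locality for a small EL-like fragment only. It is an assumption
of this sketch that concepts are built from ⊤, concept names, conjunction, and
existential restriction, and that axioms are inclusions C ⊑ D; the full grammar
in [2] covers many more constructors and is not reproduced here. The tuple
encoding and the function names are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical encoding of concepts in a small EL-like fragment:
#   ("top",)             the top concept
#   ("name", "A")        a concept name
#   ("and", C, D)        conjunction of concepts C and D
#   ("some", "r", C)     existential restriction on role r with filler C
Concept = tuple
Inclusion = Tuple[Concept, Concept]   # (C, D) stands for the axiom C ⊑ D


def bot_equivalent(c: Concept, sigma: Set[str]) -> bool:
    """True if c is recognised syntactically as equivalent to bottom
    once every term outside sigma is read as bottom / the empty role."""
    kind = c[0]
    if kind == "name":
        return c[1] not in sigma
    if kind == "and":
        return bot_equivalent(c[1], sigma) or bot_equivalent(c[2], sigma)
    if kind == "some":
        return c[1] not in sigma or bot_equivalent(c[2], sigma)
    return False                      # ("top",) is never bottom-equivalent


def top_equivalent(c: Concept, sigma: Set[str]) -> bool:
    """True if c is recognised syntactically as equivalent to top
    (in this fragment: top itself or a conjunction of such concepts)."""
    kind = c[0]
    if kind == "top":
        return True
    if kind == "and":
        return top_equivalent(c[1], sigma) and top_equivalent(c[2], sigma)
    return False


def is_bot_local(axiom: Inclusion, sigma: Set[str]) -> bool:
    """C ⊑ D is bottom-local if C is bottom-equivalent or D is top-equivalent."""
    lhs, rhs = axiom
    return bot_equivalent(lhs, sigma) or top_equivalent(rhs, sigma)


A, B = ("name", "A"), ("name", "B")
axiom = (A, ("some", "r", B))         # A ⊑ ∃r.B
print(is_bot_local(axiom, {"B"}))     # A lies outside the signature
print(is_bot_local(axiom, {"A"}))     # A lies inside the signature
print(is_bot_local((A, A), {"A"}))    # the tautology A ⊑ A
```

The last call illustrates the gap between the two notions: A ⊑ A is a
tautology and hence ∅-local w.r.t. any signature, yet this syntactic check does
not recognise it as ⊥-local when A is in Σ.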
7
    This notion of “definition” is specific for the MEX-modules
    Modules based on syntactic (semantic) locality can be made smaller by
iteratively nesting ⊤- and ⊥-extraction (∆- and ∅-extraction), again obtaining
mCE- and dCE-based modules [2,19], called ⊤⊥∗- and ∆∅∗-modules. The ∆∅∗-module
for Σ is always contained in the corresponding ⊤⊥∗-module. Moreover,
for acyclic ELI ontologies, the MEX-module for Σ is always contained in the
corresponding ∆∅∗-module.


3     Research questions and experimental design
A natural question is whether syntactic modules are likely to be much larger than
the semantic ones, which are, in theory, computationally more costly. Hence, a
second question is whether semantic module extraction is noticeably more costly:
the tautology test has to be carried out often—once per axiom and signature
that the algorithm goes through— and it is thus hard to predict the feasibility
of semantic LBM extraction. Altogether, we want to know whether syntactic
LBMs are a necessary approximation and how good an approximation of se-
mantic LBMs they are. Similarly, for acyclic ELI terminologies the analogous
question arises: how good an approximation of MEX modules are LBMs?
    One can always construct ontologies with huge differences in size and time
between syntactic and semantic LBMs and between LBMs and MEX modules.
Here, we are interested in these differences in currently available ontologies, and
thus need to design, run, and analyse suitable experiments.
Selection of the corpus. For our experiments, we have built a corpus con-
taining: (1) all the ontologies from the NCBO BioPortal ontology repository,8
version of November 2012; (2) ontologies from the TONES repository9 which
have already been studied in previous work on modularity [6]: Koala, Mereology,
University, People, miniTambis, OWL-S, Tambis, Galen. From this corpus, we have
removed ontologies that cannot be downloaded, whose .owl file is corrupted or
impossible to parse, or which are inconsistent. Furthermore, we have excluded
those large ontologies (exceeding 10K axioms) where the extraction of a semantic
LBM repeatedly took more than 2 minutes: for each such ontology, the estimated
time needed to perform our experiments would have exceeded 300 hours.
    This selection results in a corpus of 242 ontologies, which greatly vary in
expressivity (from AL to SROIQ(D)) and in size (10–16,066 axioms, 10–16,068
terms) [11]. For a full list of the corpus, please refer to [5].
    As mentioned before, for some ontologies it is not possible to test
∆-locality (and thus to extract ∆- and ∆∅∗-modules) using standard DL reasoners;
see [5] for details. To cover these cases, we have extended the reasoner FaCT++
to cover the uses of the ⊤-role specific to the semantic locality tests.
    Since MEX handles only acyclic ELI ontologies, we created an ELI version
ELI(O) of each ontology O in our corpus by filtering unsupported axioms and
breaking terminological cycles. While a principled way of doing this is beyond
the scope of this paper, we have used a heuristic, which is described in [5].
8
    http://bioportal.bioontology.org
9
    http://owl.cs.manchester.ac.uk/repository/
Comparing modules and locality. In order to compare syntactic and semantic
locality, as well as LBMs and MEX modules, we want to understand
(1) whether, for a given seed signature Σ, it is likely that there is a
difference between the syntactic and the semantic Σ-module, or between the
latter and the MEX module for Σ and, if so, the size of the difference;10 and
(2) how feasible the extraction of semantic LBMs is. For this purpose, we
compare (a) ∅-semantic and ⊥-syntactic locality, ∆-semantic and ⊤-syntactic
locality, (b) ∅- and ⊥-modules, ∆- and ⊤-modules, ∆∅∗- and ⊤⊥∗-modules,
(c) MEX modules and ∆∅∗-modules.
Due to the recursive nature of Algorithm 1, our investigation is both on a

per-axiom basis: given axiom α and signature Σ, is it likely that α is
   semantically ∅-local (∆-local, resp.) w.r.t. Σ but not syntactically ⊥-local
   (⊤-local, resp.) w.r.t. Σ?
per-module basis: given a signature Σ, is it likely that
     – ⊥-mod(Σ, O) ≠ ∅-mod(Σ, O), or
     – ⊤-mod(Σ, O) ≠ ∆-mod(Σ, O), or
     – ⊤⊥∗-mod(Σ, O) ≠ ∆∅∗-mod(Σ, O), or
     – ∆∅∗-mod(Σ, O) ≠ MEX-mod(Σ, O)?
   If yes, is it likely that the difference is large?

    Clearly we need to pick, for each ontology in our corpus, a suitable set of
signatures, and this poses a significant problem. A full investigation is infeasible:
if m = #Õ, there are 2^m possible seed signatures, so that testing axioms for
locality against all the signatures is already impossible for m ∼ 100. One could
assume that comparing modules is easier since many signatures can lead to
the same module. However, previous work [6,8] has shown that the number of
modules in ontologies is, in general, exponential w.r.t. the size of the ontology.
Still, different seed signatures can lead to the same module, which makes it hard
to extract enough different modules.
    We will consider seed signatures of two kinds: genuine seed signatures and
random seed signatures.
Genuine seed signatures. A module does not necessarily show an internal co-
herence: e.g., if we had an ontology O about the domains of geology and philoso-
phy, we could extract the module for the signature Σ = {Epistemology, Mineral}.
That module is likely to be the union of the two disjoint modules for Σ1 =
{Epistemology} and Σ2 = {Mineral}.
   In contrast, genuine modules can be said to be coherent: they are those
modules that cannot be decomposed into the union of two “⊆”-incomparable
modules. Notably, there are only linearly many genuine modules in the size
of O since each genuine x-module equals x-mod(α̃, O), for some axiom α ∈
O. Moreover, all modules of O are composed from genuine modules [7]. Thus,
genuine modules are of special interest, and we can investigate them, and the
corresponding genuine signatures, in full.
10
     Recall: the MEX module is always a subset of the semantic Σ-module, which is
     always a subset of the syntactic Σ-module.
Random seed signatures. Since a full investigation of all the signatures is
impossible, we compare locality (both on a per-axiom and a per-module basis)
as well as LBMs and MEX modules on a random signature Σ, which we select by
including each named entity E of the ontology in Σ with probability p = 1/2.
This ensures that every Σ has the same probability of being chosen. This
approach has a clear drawback: the random variable “size of the seed signature
generated” follows a binomial distribution, so a random seed signature is highly
likely to be rather large and to contain about half the terms of the ontology.
However, we do not yet have enough insight into what typical seed signatures are
for module extraction, so biasing the selection of signatures to, for example,
those of a certain size has no rationale. In contrast, selecting random seed
signatures avoids the introduction of any bias. Moreover, this choice is
complementary to the selection of all the genuine signatures, which are in
general small.
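
The sampling scheme just described can be sketched as follows. The function
name random_seed_signature and the term names T0, T1, ... are hypothetical;
the only ingredient taken from the text is the independent inclusion of each
term with probability p = 1/2.

```python
import random
from typing import Iterable, Optional, Set


def random_seed_signature(terms: Iterable[str], p: float = 0.5,
                          rng: Optional[random.Random] = None) -> Set[str]:
    """Include each term independently with probability p.

    For p = 1/2 every subset of the terms is equally likely, and the size of
    the result follows a binomial distribution with mean len(terms) / 2.
    """
    if rng is None:
        rng = random.Random()
    return {t for t in terms if rng.random() < p}


# Illustrative run on 1,000 made-up term names with a fixed seed.
all_terms = [f"T{i}" for i in range(1000)]
signature = random_seed_signature(all_terms, rng=random.Random(0))
print(len(signature))
```

For an ontology with n terms the standard deviation of the signature size is
√n / 2, so sampled signatures concentrate tightly around n/2 terms; this is the
drawback noted above.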
    Whilst the selection of genuine signatures is complete, we can only aim at
selecting a number of random signatures to obtain statistically significant state-
ments about modules. To reach a confidence level of 95% that the true pro-
portion of differences between modules lies in the confidence interval (±5%) of
the observed proportion, we have to sample at least 385 seed signatures (see
the detailed explanations in [5]). For ontologies with at least 9 elements in
the signature, we will therefore draw a sample of size 400. For all other
ontologies, we will look at all of the ≤ 400 signatures.
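
The figure of 385 is consistent with the usual normal-approximation formula
for estimating a proportion, n = z² · p(1 − p) / e², with z = 1.96 for 95%
confidence, margin e = 0.05, and the worst case p = 0.5. That this is the
formula behind the number is an assumption of the sketch below; the actual
derivation is given in [5]. The function names are hypothetical.

```python
import math


def sample_size(z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> int:
    """Smallest n with z^2 * p * (1 - p) / margin^2 <= n (assumed formula)."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def signatures_to_test(num_terms: int, sample: int = 400) -> int:
    """Sample `sample` signatures unless all 2^num_terms subsets are fewer."""
    return min(2 ** num_terms, sample)


print(sample_size())            # 1.96^2 * 0.25 / 0.0025 = 384.16, rounded up
print(signatures_to_test(8))    # 2^8 = 256 subsets: enumerate all of them
print(signatures_to_test(9))    # 2^9 = 512 subsets: draw a sample of 400
```

The threshold of 9 terms follows from the sample size of 400: with 8 terms
there are only 256 possible signatures, whereas 9 terms already give 512.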
Summary. We compare, for every ontology O in our corpus,

(T1) for random seed signatures Σ from O,
   (a) for each axiom α in O, is α
        – ∅-local w.r.t. Σ but not ⊥-local w.r.t. Σ?
        – ∆-local w.r.t. Σ but not ⊤-local w.r.t. Σ?
   (b) is
        – ⊥-mod(Σ, O) ≠ ∅-mod(Σ, O)?
        – ⊤-mod(Σ, O) ≠ ∆-mod(Σ, O)?
        – ⊤⊥∗-mod(Σ, O) ≠ ∆∅∗-mod(Σ, O)?
        – ∆∅∗-mod(Σ, ELI(O)) ≠ MEX-mod(Σ, ELI(O))?
(T2) the same, Σ ranging over all the genuine signatures β̃ for β ∈ O.


4   Results of the Experiments

No differences in locality. The main result of the experiment is that, for the
vast majority of the ontologies in our corpus, no difference between syntactic
and semantic locality is observed, for all three variants ⊥ vs. ∅, ⊤ vs. ∆, and
⊤⊥∗ vs. ∆∅∗. More precisely, for 210 out of 242 ontologies, we obtain that:

(T1) for random seed signatures, there is no statistically significant difference
   (a) between semantic and syntactic locality of any kind,
   (b) between semantic and syntactic LBMs of any kind;
(T2) given any genuine signature, there is no such difference.
More specifically, for all randomly generated seed signatures and all genuine
signatures, the corresponding bottom-modules (and the corresponding top- and
star-modules, respectively) agree, and every axiom is either both ⊥- and
∅-local or neither (and either both ⊤- and ∆-local or neither).
    The 210 ontologies include Galen and People, which are renowned for having
unusually large ⊥-modules [2,8]. In most cases, extracting a semantic and syn-
tactic LBM each took only a few milliseconds; hence, a performance comparison
is not meaningful. For some ontologies, the semantic LBM took considerably
longer to extract than the syntactic: up to 5 times for star-modules in Molecule
Role, and up to 34 times in Galen.

Differences in locality. We have observed differences between syntactic and
semantic locality for 32 ontologies in our corpus. We call the axioms that cause
these differences culprits – patterns of axioms which are not ⊥-local (⊤-local,
respectively) w.r.t. some signature Σ, but which are ∅-local (∆-local,
respectively) w.r.t. Σ. We have identified four types of patterns, a–d, and we
describe them in the following. Sometimes, culprit axioms pull additional axioms
into the syntactic LBM, due to signature extension during module extraction.
    We denote concept names by A, B, complex concepts by C, D, roles by r, s, ...,
nominals by a, non-empty data ranges (e.g., int or int0..9) by R, possibly with
indices. Σ denotes a signature for which a module is extracted or against which
an axiom is checked for locality. Terms outside Σ are overlined; we further use
notation C⊥ and C⊤ to denote concepts that are bottom- or top-equivalent due to
the grammar defining syntactic locality in [3, Def. 6] and the analogous grammar
for semantic locality.

Culprits of type a are simple tautologies that accidentally entered the
“inferred view” (closure under certain entailments) of an ontology. These axioms
do not occur in the original “asserted” versions and could, in principle, be
detected in a simple preprocessing step. Type-a culprits occur in 10 ontologies
of the above 32, and are of the following kinds: A ⊑ A, r ≡ (r⁻)⁻, and
A ⊓ C ⊓ D ⊑ A ⊓ C. Each such tautology is trivially ∅-local and ∆-local w.r.t.
any Σ, but not always ⊥- or ⊤-local: if Σ contains all terms in α, then both
sides of the subsumption (equivalence) are neither ⊥- nor ⊤-equivalent.

Differences caused not solely by culprits of type a have been observed
for 26 ontologies. In only 6 of these cases, the differences affect modules; in the
remaining 20, they only affect locality of single axioms (tests T1 a and T2 a). We
will focus on the former 6, listed in Table 1, and refer to [5] for details on all 26.

        Ontology                Abbreviation   DL expressivity   #axioms   #terms
        MiniTambis-repaired     MiniT          ALCN                  170      226
        Tambis-full             Tambis         SHIN(D)               592      496
        Bleeding History P...   BHO            ALCIF(D)            1,925      581
        Neuro Behavior O...     NBO            AL                  1,314      970
        Pharmacogenomic...      PhaRe          ALCHIF(D)             459      311
        Terminological and...   TOK            SRIQ(D)               466      330
              Table 1. Ontologies that exhibit differences in modules
   According to Table 1, differences between modules occur for ontologies of
medium to large size and medium to high expressivity. Differences in locality
alone additionally affect small ontologies such as Koala (42 axioms) and Pilot
Ontology (85 axioms), as well as large ontologies such as Galen (4,735 axioms)
and Experimental Factor Ontology (7,156 axioms). The number of axioms causing
these differences (i.e., matching the culprit patterns) in the affected ontologies is
small except for Galen, and most of the observed differences are relatively small.
   Table 2 gives a representative selection of the differences in modules observed.
For a complete overview, including differences in locality of single axioms, consult
the table in [5].
        Ontology    Types affected   #diffs    size of diffs              culprit   freq.
                                               #axs        (rel.)         type
        miniT       bot, star        14–25%    1–7         0–600% [b]     c             3
        Tambis      bot, star        32–57%    2–41 [c]    1–62% [c]      c             8
        BHO [a]     star             17%       1–12        0–300%         b            31
        NBO [a]     star             3%        2           0–200%         d             3
        PhaRe [a]   top, star        1–8%      1–326 [d]   0–6,520% [d]   d            10
        TOK         top, star        49–100%   1–7         0–9%           d             3
[a] differences only for genuine modules
[b] differences > 5% only for genuine modules
[c] differences > 11 axioms (> 2%) only for genuine modules
[d] differences > 13 axioms (> 1,300%) only for top-modules
The columns show: ontology name (abbreviations: see Table 1); type of modules
affected; relative number of module pairs with differences; number of axioms in
the differences (absolute and relative to the ∅- or ∆- or ∆∅∗-case); type of
culprit present and number of axioms of this type involved in differences.
              Table 2. Overview of observed differences between modules
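
As a small illustration of the two size measures in Table 2, the following
sketch computes the absolute difference between a syntactic and a semantic
module and the same difference relative to the size of the semantic module.
Modules are modelled as plain sets of axiom labels, which is an assumption of
this sketch; the function name and the example labels are hypothetical and the
numbers are not taken from the experiments.

```python
from typing import Hashable, Iterable, Optional, Tuple


def module_difference(syntactic: Iterable[Hashable],
                      semantic: Iterable[Hashable]) -> Tuple[int, Optional[float]]:
    """Return (#extra axioms, extra axioms in % of the semantic module size).

    The relative value is None when the semantic module is empty.
    """
    syn, sem = set(syntactic), set(semantic)
    if not sem <= syn:
        # A syntactic LBM always contains its semantic counterpart.
        raise ValueError("semantic module must be contained in syntactic module")
    extra = len(syn - sem)
    relative = 100.0 * extra / len(sem) if sem else None
    return extra, relative


# Made-up example: the syntactic module has two axioms more than the semantic one.
print(module_difference({"ax1", "ax2", "ax3"}, {"ax1"}))
```

Because the measure is relative to the semantic module, a handful of extra
axioms can yield a very large percentage when the semantic module is tiny, which
is worth keeping in mind when reading the relative column.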

   Table 2 shows small absolute differences for miniT, BHO, NBO, and TOK.
In Tambis, large differences occur only for genuine modules, which suggests that
they are unlikely to occur in practical cases with usually larger seed signatures.
Finally, in PhaRe, large differences occur only for top-modules. For all these
ontologies, a single syntactic or semantic module was extracted within only a
few milliseconds, making module extraction times roughly equal.
Culprits of type b are axioms with an ∃-restriction on a set of nominals or
a non-empty data range on the right-hand side, such as A ⊑ ∃r.{a1, ..., an} or
A ⊑ ∃r.R. These axioms are ∆-local w.r.t. any signature Σ that does not contain
r because they become tautologies if r is replaced by ⊤. However, they can only
be ⊤-local when A is a ⊥-equivalent concept w.r.t. Σ.
    Culprit-b axioms affect genuine modules of BHO, and (only) locality of single
axioms for 4 more ontologies. We observed a slightly more sophisticated variant
of the form A ≡ C⊤ ⊓ ∃r.R.
Culprits of type c are axioms α that contain a concept description C such
that (a) C becomes equivalent to ⊥ (or ⊤) if all terms outside Σ are replaced
by ⊥ (or ⊤); (b) this causes α to be ∅-local (or ∆-local); but (c)
the grammars for syntactic locality do not “detect” C to be a C⊥ (or C⊤). For
example, C = ∀r.A ⊓ ∃r.⊤ becomes ⊥-equivalent if A is replaced by ⊥; the same
holds with cardinality restrictions in place of “∃”. Consequently, axioms such
as A⊥ ≡ B ⊓ ∀r.C⊥ ⊓ ∀s.{a} ⊓ =3 r.⊤ (taken from Koala) are ∅-local but not
⊥-local.
    We found this pattern in 8 ontologies. Only in miniT and Tambis does it
affect a large proportion of bottom- and star-modules, with additional axioms
“pulled in”. Still, the size of the differences is modest, as argued above. Some
of the remaining 6 ontologies contain different kinds of complex concepts that
cause differences in top-locality of single axioms.
Culprits of type d are axioms where a concept (or role) name from the
left-hand side occurs on the right-hand side together with a top-equivalent role
(or concept), causing differences in top-modules. The simplest kind of axiom of
this type is A ⊑ ∃r.A, which is ∆-local because replacing r with ⊤ makes it a
tautology. The axiom is only ⊤-local if Σ contains neither r nor A. We have
found further examples of increasing complexity in Adverse Event Reporting
Ontology and Galen; see [5].
    We have observed culprits of type d in 17 ontologies; see the detailed
overview in [5]. Only in 3 cases (NBO, PhaRe, and TOK) are modules affected.
    Galen contains 121 culprit-d axioms, but they only affect locality of single
axioms. In addition, the time differences for Galen are remarkable: checking all
axioms for ∆-locality takes up to 70 times longer than checking them for
⊤-locality.
Summary. Culprits hardly ever cause significant differences in modules. Only
for PhaRe are differences between semantic and syntactic modules not negligible,
but we were able to put them into perspective.
    Table 1 may suggest that culprits occur only in expressive ontologies. How-
ever, patterns a, c, d can, in principle, already occur in simple terminologies in
EL and ALC, respectively. Evidently, type-a culprits can easily be filtered out
in a preprocessing step. For types c and d , there is no hope for an exhaustive
extension to locality because they can (and do) occur in arbitrarily complex
shapes and contexts.
    Patterns of type b rely on nominals or datatypes – but they are repairable
by a straightforward extension to the definition of syntactic locality: one can
extend the locality definition to distinguish ⊥- and ⊤-distinct concepts, by
adding appropriate grammars to the definition of syntactic locality, and adding
more cases of ⊥- and ⊤-equivalent concepts to the existing grammars. However,
from the small numbers of differences observed, we doubt that such an extension
of syntactic locality will have any significant effects in practice.
LBMs vs MEX results. The results of the experimental comparison of syntac-
tic/semantic LBMs and MEX modules are summarized in Table 3. They show
that MEX modules smaller than the corresponding LBMs can be found in ∼ 27%
of the preprocessed ontologies, for either random or axiom-based seed signatures.
At the same time, unsurprisingly, syntactic and semantic LBMs do not differ at
all for these simple ELI ontologies.
                 Experiment          #ontol.      % tests      avg size of diffs
                                     with diffs   with diffs   #axs     rel.
                 Random signatures   66           84%          0–26     0–13%
                 Axiom signatures    61           12%          0–13     0–80%
The results from the third column on are averaged over all ontologies with differences
LBM–MEX in at least one module. For example, the last two columns show the average
min and max absolute (resp. relative) difference between LBMs and MEX modules.
              Table 3. Differences between MEX and LBMs (⊤⊥∗, ∆∅∗)
    In experiments with random seed signatures, it can be seen that for those
ontologies where there are differences (most notably, Galen), they occur in many
tests. Thus, the difference appears to be caused by features of the ontology,
not by particular seed signatures. The difference can also be large in individual
tests, including for genuine modules. For example, for the signature of
the following axiom in Galen, both ∆∅∗-mod and ⊤⊥∗-mod contain 127 axioms
while the MEX module contains only the axiom itself (footnote 11):
RICF ≡ ICF ⊓ ∃ISFO.RSH.
The likely reason is the proliferation of concept equivalence axioms in Galen. For
example, A ≡ B will end up in the ∆∅∗-mod for any seed signature containing
either A or B. It is, however, an mCE of ∅ w.r.t. either {A} or {B}.
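The effect of equivalence axioms on locality-based modules can be illustrated with a minimal sketch. It assumes a toy ontology whose axioms relate atomic concept names only, and it implements plain ∅-locality with the standard fixpoint extraction loop, not the nested ∗ variants or the MEX procedure used in the experiments. The names is_empty_local, extract_module and ONTOLOGY are hypothetical and chosen for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Toy axiom over atomic concept names: (kind, left, right),
# where kind is "equiv" for left ≡ right or "sub" for left ⊑ right.
Axiom = Tuple[str, str, str]


def is_empty_local(axiom: Axiom, sigma: Set[str]) -> bool:
    """Check ∅-locality for an atomic axiom (illustrative assumption).

    Every name outside sigma is read as ⊥; the axiom is local if the
    result is a tautology.
    """
    kind, left, right = axiom
    if left == right:
        return True  # A ⊑ A and A ≡ A always hold
    if kind == "sub":
        # ⊥ ⊑ C is a tautology; A ⊑ C with A in sigma is not.
        return left not in sigma
    if kind == "equiv":
        # Only ⊥ ≡ ⊥ is a tautology.
        return left not in sigma and right not in sigma
    raise ValueError(f"unknown axiom kind: {kind}")


def extract_module(ontology: Iterable[Axiom], seed: Set[str]) -> List[Axiom]:
    """Collect non-local axioms until the signature stops growing."""
    axioms = list(ontology)
    module: List[Axiom] = []
    sigma = set(seed)
    changed = True
    while changed:
        changed = False
        for axiom in axioms:
            if axiom not in module and not is_empty_local(axiom, sigma):
                module.append(axiom)
                sigma.update(axiom[1:])  # the module's names join the signature
                changed = True
    return module


# Hypothetical three-axiom ontology: A ≡ B, B ⊑ C, D ⊑ A.
ONTOLOGY: List[Axiom] = [
    ("equiv", "A", "B"),
    ("sub", "B", "C"),
    ("sub", "D", "A"),
]

if __name__ == "__main__":
    for seed in ({"A"}, {"B"}, {"C"}):
        print(sorted(seed), extract_module(ONTOLOGY, seed))
```

In this sketch, A ≡ B is pulled into the module for the seed {A} as well as for {B}, because replacing the other name by ⊥ does not yield a tautology. The seed signature then grows to contain both names, which in turn pulls in B ⊑ C. Long chains of equivalences of this kind are one plausible way for locality-based modules to grow well beyond the corresponding MEX module.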


5      Conclusion and outlook
Summary. We obtain three main observations from our experiments. (1) In
general, there is no or little difference between semantic and syntactic locality.
Hence, the computationally cheaper syntactic locality is a good approximation of
semantic locality. For the ontologies Galen and People, which are “renowned” for
having disproportionately large modules, syntactic and semantic LBMs do not
differ. (2) In most cases, there is no or little difference between LBMs and MEX
modules. Only for Galen are MEX modules considerably smaller than LBMs. (3)
Though in principle hard to compute, semantic LBMs can be extracted rather
fast in practice. Still, their extraction often takes considerably longer than for
syntactic LBMs. We cannot make any statement about MEX module extraction
times because we use the original MEX implementation, which combines loading
and module extraction. Due to (1), hardly any benefit can be expected from
preferring the potentially smaller semantic LBMs to the cheaper syntactic LBMs.
From (2) we can say that semantic LBMs can be seen as the best available
approximation of MEX modules for ontologies in highly expressive languages.
    Not only does our study evaluate how well the cheap syntactic locality
approximates semantic locality and model conservativity, it also enabled us to fix
bugs in the implementation of syntactic modularity.
Future work. Two questions are interesting for future work: (1) How can we re-
design the experiments so that we can include the very large ontologies? (2) How
do LBMs compare to other types of conservativity-based modules?
    Concerning (2), one could include, for example, the technique based on reduc-
tion to QBF for the OWL 2 QL profile [16] when an off-the-shelf implementation
becomes available.
11 The acronyms denote RightIneffectiveCardiacFunction, IneffectiveCardiacFunction,
   isSpecificFunctionOf, RightSideOfHeart.
References
 1. Franz Baader, Diego Calvanese, Deborah McGuinness, Daniele Nardi, and
    Peter F. Patel-Schneider, editors. The Description Logic Handbook: Theory,
    Implementation, and Applications. Cambridge University Press, 2003.
 2. Bernardo Cuenca Grau, Ian Horrocks, Yevgeny Kazakov, and Ulrike Sattler. Modu-
    lar reuse of ontologies: Theory and practice. J. of Artif. Intell. Research, 31(1):273–
    318, 2008.
 3. Bernardo Cuenca Grau, Ian Horrocks, Yevgeny Kazakov, and Ulrike Sattler. Ex-
    tracting modules from ontologies: A logic-based approach. In Stuckenschmidt et al.
    [20], pages 159–186.
 4. Bernardo Cuenca Grau, Bijan Parsia, Evren Sirin, and Aditya Kalyanpur. Modu-
    larity and Web ontologies. In Proc. of KR-06. AAAI Press/The MIT Press, 2006.
 5. Chiara Del Vescovo, Pavel Klinov, Bijan Parsia, Ulrike Sattler, Thomas Schneider,
    and Dmitry Tsarkov. Empirical study of logic-based modules: Cheap is cheerful.
    Technical report, 2013. https://sites.google.com/site/cheapischeerful/.
 6. Chiara Del Vescovo, Bijan Parsia, Ulrike Sattler, and Thomas Schneider. The
    modular structure of an ontology: an empirical study. volume 573 of ceur-ws.org,
    2010.
 7. Chiara Del Vescovo, Bijan Parsia, Ulrike Sattler, and Thomas Schneider. The
    modular structure of an ontology: Atomic decomposition. In Proc. of IJCAI-11,
    pages 2232–2237, 2011.
 8. Chiara Del Vescovo, Bijan Parsia, Ulrike Sattler, and Thomas Schneider. The mod-
    ular structure of an ontology: Atomic decomposition and module count. volume
    230 of FAIA, pages 25–39, 2011.
 9. James Garson. Modularity and relevant logic. 30(2):207–223, 1989.
10. Silvio Ghilardi, Carsten Lutz, and Frank Wolter. Did I damage my ontology? A
    case for conservative extensions in Description Logics. In Proc. of KR-06, pages
    187–197. AAAI Press/The MIT Press, 2006.
11. Matthew Horridge, Bijan Parsia, and Ulrike Sattler. The state of bio-medical
    ontologies. 2011.
12. Ian Horrocks, Oliver Kutz, and Ulrike Sattler. The even more irresistible SROIQ.
    In Proc. of KR-06, pages 57–67, 2006.
13. Boris Konev, Carsten Lutz, Dirk Walther, and Frank Wolter. Semantic modularity
    and module extraction in description logics. In Proc. of ECAI-08, pages 55–59,
    2008.
14. Boris Konev, Carsten Lutz, Dirk Walther, and Frank Wolter. Formal properties
    of modularization. In Stuckenschmidt et al. [20], pages 25–66.
15. Roman Kontchakov, Luca Pulina, Ulrike Sattler, Thomas Schneider, Petra Selmer,
    Frank Wolter, and Michael Zakharyaschev. Minimal module extraction from DL-
    Lite ontologies using QBF solvers. In Proc. of IJCAI-09, pages 836–841, 2009.
16. Roman Kontchakov, Frank Wolter, and Michael Zakharyaschev. Logic-based ontol-
    ogy comparison and module extraction, with an application to DL-Lite. Artificial
    Intelligence, 174(15):1093–1141, 2010.
17. Carsten Lutz, Dirk Walther, and Frank Wolter. Conservative extensions in expres-
    sive Description Logics. In Proc. of IJCAI-07, pages 453–458, 2007.
18. Carsten Lutz and Frank Wolter. Deciding inseparability and conservative exten-
    sions in the description logic EL. 45(2):194–228, 2010.
19. Ulrike Sattler, Thomas Schneider, and Michael Zakharyaschev. Which kind of
    module should I extract? volume 477 of ceur-ws.org, 2009.
20. Heiner Stuckenschmidt, Christine Parent, and Stefano Spaccapietra, editors. vol-
    ume 5445 of LNCS. Springer-Verlag, 2009.