=Paper=
{{Paper
|id=Vol-477/paper-37
|storemode=property
|title=Which Kind of Module Should I Extract?
|pdfUrl=https://ceur-ws.org/Vol-477/paper_33.pdf
|volume=Vol-477
|dblpUrl=https://dblp.org/rec/conf/dlog/SattlerSZ09
}}
==Which Kind of Module Should I Extract?==
      Which Kind of Module Should I Extract??
        Ulrike Sattler1 , Thomas Schneider1 , and Michael Zakharyaschev2
        1
            University of Manchester, UK, {sattler,schneider}@cs.man.ac.uk
                2
                  Birkbeck College London, UK, michael@dcs.bbk.ac.uk
        Abstract There are various techniques for specifying a module of an on-
        tology that covers all knowledge about a given set of terms. These differ
        with respect to the size of the module, the complexity of its computa-
        tion, and certain robustness properties. In this paper, we survey existing
        logic-based approaches, focus on syntactic approximations, and compare
        different kinds of modules with respect to their properties. This is in-
        tended to give guidelines on how to choose “the right kind of module”.
1     Introduction
An ontology provides a common vocabulary (signature) for a domain of interest
and describes the relationships between the terms built from that vocabulary.
When developing an ontology, it is helpful for the engineer if she can reuse
information from external, already existing, ontologies. Ideally, she should be
able to extract modules that represent (only) the knowledge she wants to reuse.
The problem of module extraction, therefore, can be phrased as follows: given
a subset Σ of the vocabulary of an ontology, find a (minimal) subset of that
ontology that is relevant for the terms3 in Σ. This means that we are considering
ontologies to be sets of axioms, and their modules to be subsets thereof.
    The above requirement of “relevance” for Σ is of course still vague and open
to interpretation. We will now provide intuitions for this requirement, using
different phrasings. We will later model it precisely, using a set of equivalence
relations between ontologies, called inseparability relations. There are different
approaches to relevance for Σ; they can be grouped into structural ones, e.g. [13,
16], and logic-based ones, e.g. [4, 3, 9]. While the former focus and depend on the
syntax of the axioms in the ontology and on the induced concept hierarchy, the
latter are concerned with preserving entailments or models over a signature Σ.
The latter acts as an interface for communication with the ontology in question:
with Σ we specify a set of terms, and we expect to obtain a module—a subset
that represents all knowledge about these terms from the original ontology. This
is how we understand relevance, and we usually phrase it as “has the same
entailments with respect to”—or sometimes as “has the same models modulo”,
which is a stronger condition. When we say that a logic-based module M of
the ontology O with respect to a signature Σ “is relevant for” the terms in
?
    This work has been supported in part by the UK EPSRC grant no. EP/E065155/1.
3
    Terms are concept (class) names and role (property) names.
Σ, we mean that all consequences of O that can be expressed over Σ are also
consequences of M. Then O is said to be a conservative extension (CE) of M.
     The logic-based view seems theoretically sound and elegant and provides
a desirable guarantee: reusing only terms from Σ, we will not be able to dis-
tinguish between importing M and importing O into our ontology. However,
deciding CEs is computationally expensive in general: the problem of deciding
whether two ontologies entail the same concept inclusions over a given signature
is usually harder than standard reasoning tasks. For DL-Litehorn , for instance,
complexity grows from polynomial-time to coNP-complete, and for DL-Litebool
it grows from NP-complete to Π2p -complete [11]. For ALC and extensions, it is
usually one exponential higher or even undecidable [5, 12]. These computational
obstacles have led to syntactic approximations via locality [3]. Locality-based
modules are in general not minimal, but provide the same guarantee, namely
the preservation of certain entailments. They have been shown to be useful for
economically reusing ontologies [7]. Syntactic locality comes in two variants,
“top” and “bottom”, and there are several ways of extracting a locality-based
module (“top”, “bottom”, “bottom of top”, vice versa, and further nesting).
     This paper surveys existing logic-based approaches, focusing on syntactic
approximations. We will compare different kinds of locality-based modules with
each other and with CE-based modules. In particular, we will explore nested
locality-based modules more in-depth than the existing literature. The compari-
son of module kinds will be based on examining properties relevant for ontology
reuse. It will help learning which modules are best suited for which requirements.
     The relevant properties have been identified in [8] and are sketched below.
For our import scenario, we can consider a module M of the ontology O w.r.t.
a signature Σ to be a subset of O that is indistinguishable from O w.r.t. Σ. The
above mentioned inseparability relations generalise the import scenario and thus
compare arbitrary ontologies that are not necessarily in the subset relation. They
provide a unifying framework for comparing arbitrary definitions of modules.
Robustness under vocabulary restrictions. This property implies that a module
of an ontology w.r.t. a signature Σ is also a module of this ontology w.r.t. any
subset of Σ. This is important because it means that we do not need to import
a different module when we restrict the set of terms that we are interested in.
Robustness under vocabulary extensions. This implies that a module of an ontol-
ogy O w.r.t. Σ is also a module of O w.r.t. any Σ 0 ⊇ Σ as long as Σ 0 \ Σ does
not share terms with O. This means that we do not need to import a different
module when extending the set of relevant terms with terms not from O.
Robustness under replacement for a logic L. This property implies that, if M is
a module of O w.r.t. Σ, then the result of importing M into an L-ontology O0
is a module of the result of importing O into O0 . This is called module coverage
in [7]: importing a module does not affect its property of being a module.
Robustness under joins. If two ontologies are indistinguishable w.r.t. Σ and they
share only terms from Σ, then each of them is indistinguishable from their union
w.r.t. Σ. This property, together with robustness under replacement, implies that
it is not necessary to import two indistinguishable versions of the same ontology.
2     Preliminaries
In the following, we will use L to denote a description logic contained in SHOIQ,
whose standard syntax and semantics are defined in [1]. The reason for this
choice is that locality-based modules are defined for logics up to SHOIQ. We
will consider an ontology as a TBox, i.e., a set of axioms.
    Let NC be a set of concept names, and NR a set of role names. A signa-
ture Σ is a set of terms, i.e., Σ ⊆ NC ∪ NR . We can think of a signature
as specifying a topic of interest. Axioms that only use terms from Σ can be
thought of as “on-topic”, and all other axioms as “off-topic”. For instance,
if Σ = {Animal, Duck, Grass, eats}, then Duck v ∃eats.Grass is on-topic, while
Duck v Bird is off-topic.
    Any concept name, role name, TBox, or axiom that uses only terms from Σ
is called a Σ-concept, Σ-role, Σ-TBox, or Σ-axiom. Given any such object X,
we call the set of terms in X the signature of X and denote it with sig(X) or X.  e
    Given an interpretation I, we denote its restriction to the terms in a signature
Σ with I|Σ . Two interpretations I and J are said to coincide on a signature Σ,
in symbols I|Σ = J |Σ , if ∆I = ∆J and X I = X J for all X ∈ Σ.
2.1     Conservative extensions and locality
There are a number of variants of the notion of conservative extensions, which
capture the desired preservation of knowledge to different degrees. We focus on
the following basic ones.
Definition 1. Let L be a DL, M ⊆ T be L-TBoxes, and Σ a signature.
(1) T is a deductive Σ-conservative extension (Σ-dCE ) of M w.r.t. L if for all
    GCI axioms α over L with αe ⊆ Σ, it holds that M |= α if and only if T |= α.
(2) T is a model Σ-conservative extension (Σ-mCE ) of M if {I|Σ | I |= M} =
    {I|Σ | I |= T }.
(3) M is a dCE-based (mCE-based ) module for Σ of T if T is a Σ-dCE (Σ-mCE)
    of M w.r.t. L.
    It is clear that T being a Σ-mCE of M implies that T is a Σ-dCE of M.
    Due to the computational difficulty to decide both kinds of CEs, approxima-
tions have been introduced [3]. They are based on locality of single axioms, which
means that, given Σ, the axiom can always be satisfied independently of the in-
terpretation of the Σ-terms, but in a restricted way: by interpreting all non-Σ
terms either as the empty set (∅-locality) or as the full domain4 (∆-locality).
Definition 2. An axiom α over a logic L is called ∅-local (∆-local) w.r.t. signa-
ture Σ if, for each interpretation I, there exists an interpretation J such that
I|Σ = J |Σ , J |= α, and for each X ∈ α   e \ Σ, X J = ∅ (for each C ∈ α   e \ Σ,
  J                                 J
C = ∆ and for each R ∈ α    e \ Σ, R = ∆ × ∆).
4
    Or, in the case of roles, the set of all pairs of domain individuals.
    It has been shown in [3] that M ⊆ O and all axioms in O \ M being ∅-local
(or all axioms being ∆-local) w.r.t. Σ ∪ M
                                         f is sufficient for O to be a Σ-mCE of
M. The converse does not hold: e.g., the axiom A ≡ B is neither ∅- nor ∆-local
w.r.t. {A}, but the ontology {A ≡ B} is an {A}-mCE of the empty ontology.
    Furthermore, locality can be tested using available DL-reasoners [3], which
makes this problem considerably easier than testing conservativity, see also Sec-
tion 3.2. However, reasoning in expressive DLs is still complex, e.g. NExpTime-
complete for SHOIQ. In order to achieve tractable module extraction, the fol-
lowing syntactic approximation of locality has been introduced in [3].
Definition 3. An axiom α is called syntactically ⊥-local (>-local ) w.r.t. signa-
ture Σ if it is of the form C ⊥ v C, C v C > , C ⊥ ≡ C ⊥ , C > ≡ C > , R⊥ v R
(R v R> ), or Trans(R⊥ ) (Trans(R> )), where C is an arbitrary concept, R is an
arbitrary role name, R⊥ ∈ / Σ (R> ∈/ Σ), and C ⊥ and C > are from Bot(Σ) and
Top(Σ) as defined in Figure 1 (a) (Figure 1 (b)).
 (a) ⊥-Locality       Let A⊥ , R⊥ ∈
                                  / Σ, C ⊥ ∈ Bot(Σ), C(i)
                                                      >
                                                          ∈ Top(Σ), n̄ ∈ N \ {0}
Bot(Σ) ::= A⊥ | ⊥ | ¬C > | C u C ⊥ | C ⊥ u C | ∃R.C ⊥ | >n̄ R.C ⊥ | ∃R⊥ .C | >n̄ R⊥ .C
Top(Σ) ::= > | ¬C ⊥ | C1> u C2> | >0 R.C
 (b) >-Locality        Let A> , R> ∈
                                   / Σ, C ⊥ ∈ Bot(Σ), C(i)
                                                       >
                                                           ∈ Top(Σ), n̄ ∈ N \ {0}
 Bot(Σ) ::= ⊥ | ¬C > | C u C ⊥ | C ⊥ u C | ∃R.C ⊥ | >n̄ R.C ⊥
Top(Σ) ::= A> | > | ¬C ⊥ | C1> u C2> | ∃R> .C > | >n̄ R> .C > | >0 R.C
                       Figure 1. Syntactic locality conditions
    It has been shown in [3] that ⊥-locality (>-locality) of an axiom α w.r.t.
Σ implies ∅-locality (∆-locality) of α w.r.t. Σ. Therefore, all axioms in O \ M
being ⊥-local (or all axioms being >-local) w.r.t. Σ ∪ M  f is sufficient for O to
be a Σ-mCE of M. The converse does not hold; examples can be found in [3].
    Modules of O for each of the four locality notions are obtained by start-
ing with an empty set of axioms and subsequently adding axioms from O that
are non-local. In order for this procedure to be correct, the signature against
which locality is checked has to be extended with the terms in the axiom that is
added in each step. Definition 4 (1) introduces locality-based modules, which are
always mCE-based (and therefore dCE-based) modules [3], although not neces-
sarily minimal ones. Modules based on syntactic locality can be made smaller
by nesting >-extraction into ⊥-extraction and vice versa, and the result is still
an mCE-based module. These so-called >⊥-modules and ⊥>-modules are in-
troduced in Definition 4 (2). Finally, we will see in Section 3.2 that iterated
nesting of the latter can lead to even smaller (still CE-based) modules. These
>⊥∗ -modules and ⊥>∗ -modules are introduced in Definition 4 (3).
Definition 4. Let x ∈ {∅, ∆, ⊥, >}; let y, z ∈ {⊥, >} with y 6= z; let T be a
TBox and Σ a signature.
(1) A TBox M is the x-module of T w.r.t. Σ if it is the output of Algorithm 1.
    We write M = x-mod(Σ, T ).
(2) A TBox M is the yz-module of T w.r.t. Σ, written M = yz-mod(Σ, T ), if
    M = y-mod(Σ, z-mod(Σ, T )).
(3) Let (Mi )i>0 be a sequence of TBoxes such that M0 = T and Mi+1 =
    yz-mod(Σ, Mi ) for every i > 0. For the smallest n > 0 with Mn = Mn+1 ,
    we call Mn the yz ∗ -module of T w.r.t. Σ, written M = yz ∗ -mod(Σ, T ).
Algorithm 1 Extract a locality-based module
    Input: TBox T , signature Σ, x ∈ {∅, ∆, ⊥, >}
    Output: x-module M of O w.r.t. Σ
    M ← ∅; T 0 ← T
    repeat
      changed ← false
      for all α ∈ T 0 do
        if α not x-local w.r.t. Σ ∪ M
                                    f then
           M ← M ∪ {α}
           T 0 ← T 0 \ {α}
           changed ← true
        end if
      end for
    until changed = false
    return M
    As for (1), it has been shown in [3] that the output M of Algorithm 1 does
not depend on the order in which the axioms α are selected.5 Furthermore, the
integer n in (3) exists because the sequence (Mi )i>0 is decreasing. (We even
have M0 ⊃ · · · ⊃ Mn = Mn+1 = . . . )
    Modulo the locality check, Algorithm 1 runs in time cubic in |T | + |Σ| [3].
Modules based on ⊥/>-locality are therefore a feasible approximation for mod-
ules based on ∅/∆-locality. In both cases, modules are extracted axiom by axiom
but, as said above, the ∅/∆-locality check is more complex.
2.2     Inseparability relations
The notions of modules defined so far were induced by CEs and different notions
of locality. We will now put them into a more general context of modules gen-
erated by inseparability relations. For a given logic L, an inseparability relation
is a family S = {≡SΣ | Σ is a signature} of equivalence relations on the set of
L-TBoxes. The intuition behind this notion is as follows: T1 ≡SΣ T2 means that
T1 and T2 are indistinguishable w.r.t. Σ, i.e., they represent the same knowledge
about the topic represented by Σ. The exact meaning of the terms “indistin-
guishable” and “the same knowledge” depends on the precise definition of the
5
    Our algorithm is a special case of the one in [3, Figure 4].
inseparability relation. M being a module for Σ of T should be equivalent to
M ⊆ T and M being inseparable w.r.t. Σ from T .
    The requirement to preserve entailments or models leads to the following
inseparability relations, which have been examined in [8].
 – T1 and T2 are Σ-concept name inseparable, written T1 ≡cΣ T2 , if for all Σ-
   concept names C, D, it holds that T1 |= C v D if and only if T2 |= C v D.
 – T1 and T2 are Σ-subsumption inseparable w.r.t. a logic L, written T1 ≡sΣ T2 ,
   if for all terms X and Y that are concept expressions over Σ or role names
   from Σ, it holds that T1 |= X v Y if and only if T2 |= X v Y .
 – T1 and T2 are Σ-query inseparable, written T1 ≡qΣ T2 , if for all Σ-ABoxes A, Σ-
   queries q(x) and tuples a of object names from A, it holds that (T1 , A) |= q(a)
   if and only if (T2 , A) |= q(a).
 – T1 and T2 are Σ-model inseparable, written T1 ≡sem Σ T2 , if {I|Σ | I |= T1 } =
   {I|Σ | I |= T2 }.
We denote the respective sets of inseparability relations with S c , S s , and S sem .
It is easy to see that, for each signature Σ, it holds that ≡sem      s
                                                             Σ ⊆ ≡Σ ⊆ ≡Σ .
                                                                              c
    Inseparability relations induce modules as follows.
Definition 5. Let S be an inseparability relation, T a TBox, M ⊆ T , and Σ
a signature. We call M
(1) an SΣ -module of T if M ≡SΣ T ;
(2) a self-contained SΣ -module of T if M ≡SΣ∪MfT;
                                      S
(3) a depleting SΣ -module of T if ∅ ≡Σ∪Mf T \ M.
M is called a minimal (self-contained, depleting) ≡Σ -module of T if M, but no
proper subset of M, is a (self-contained, depleting) ≡Σ -module of T .
    Due to the shift from Σ to Σ ∪ M  f, it is not necessarily the case that every
self-contained (or depleting) SΣ -module of T is an SΣ -module of T . However,
under certain robustness properties, this implication holds.
    While self-contained and depleting SΣ -modules tend to be bigger than SΣ -
modules, they have important applications. One of those is the computation
of all justifications for an entailment η of the ontology T [6], where depleting
modules are essential. For appropriate inseparability relations S , Definition 5
(1) ensures that each Sηe-module of T contains at least one justification for η,
but not necessarily all. We have good reasons to believe that each self-contained
depleting Sηe-module of T contains all justifications for η.
    Another application of depleting modules is the import scenario [3, 10]: if
M is a depleting SΣ -module of T and S satisfies certain robustness properties,
then, for every module M0 with M                 f, we can import M0 into T \ M
                                   g0 ∪ Te ⊆ Σ ∩ M
because they do not interfere with each other: (T \ M) ∪ M0 ≡SΣ 0 M0 .
    Finally, while there can be exponentially many minimal SΣ -modules, minimal
depleting modules are uniquely determined—under mild conditions involving
inseparability relations.
3     Modules and their properties
In order to find natural candidates for inseparability relations for locality-based
modules, we proceed by analogy to inseparability relations for conservativity:
ontologies T1 and T2 are inseparable if they have the same modules, i.e., if the
module extraction algorithm returns the same output for each of them. We have
replaced the semantic criterion “the same entailments/models” in conservativity-
based inseparability with a syntactic criterion “the same extraction result” sim-
ply because locality-based modules are defined algorithmically. Furthermore, we
will see that two of the thus obtained inseparability relations have almost all
desired properties—which makes them even superior to dCE-based modules.
    We consider the following inseparability relations for locality-based modules,
where x stands for one of the locality notions ∅, ∆, ⊥, and >, and yz stands for
one of the combinations ⊥> and >⊥.
                  Relation      T1 and T2 are in relation if . . .
                  ≡xΣ           x-mod(Σ, T1 ) = x-mod(Σ, T2 )
                  ≡yz
                    Σ           yz-mod(Σ, T1 ) = yz-mod(Σ, T2 )
                      ∗
                  ≡yz
                    Σ           yz ∗ -mod(Σ, T1 ) = yz ∗ -mod(Σ, T2 )
Evidently, they are all equivalence relations.
3.1   Robustness properties of inseparability relations
In the introduction, we have already sketched four important properties of in-
separability relations and have seen why they are of interest for applications of
modules. We will now define them.
Definition 6. Let L be a DL. The inseparability relation S is called
(1) robust under vocabulary restrictions if, for all L-TBoxes T1 , T2 and all signa-
    tures Σ, Σ 0 with Σ ⊆ Σ 0 , the following holds: if T1 ≡SΣ 0 T2 , then T1 ≡SΣ T2 .
(2) robust under vocabulary extensions if, for all L-TBoxes T1 , T2 and all signa-
    tures Σ, Σ 0 with Σ 0 ∩ (Te1 ∪ Te2 ) ⊆ Σ : if T1 ≡SΣ T2 , then T1 ≡SΣ 0 T2 .
(3) robust under replacement if, for all L-TBoxes T1 , T2 , all signatures Σ and
    every L-TBox T with Te ∩ (Te1 ∪ Te2 ) ⊆ Σ, the following holds: if T1 ≡SΣ
    T2 , then T1 ∪ T ≡SΣ T2 ∪ T .
(4) robust under joins if, for all L-TBoxes T1 , T2 and all signatures Σ with Te1 ∩
    Te2 ⊆ Σ and every i = 1, 2, the following holds: if T1 ≡SΣ T2 , then Ti ≡SΣ T1 ∪T2 .
    As mentioned in Section 2.2, some of these robustness properties have an
effect on the relation between the three different kinds of induced SΣ -modules
from Definition 5, see [15].
    Let us now examine the properties of the inseparability relations S sem , S s ,
                       ∗
S , S x , S yz and S yz , where x ∈ {∅, ∆, ⊥, >} and yz ∈ {⊥>, >⊥}. First, in
  c
addition to the inclusions ≡semΣ   ⊆ ≡sΣ ⊆ ≡cΣ from Section 2.2, the following
inclusions hold, see [15] for a proof.
Theorem 7. Let Σ be a signature, x ∈ {∅, ∆, ⊥, >} and yz ∈ {>⊥, ⊥>}. Then
                                                              ∗
the following properties hold: ≡xΣ ⊆ ≡sem
                                      Σ ;   ≡yz     sem
                                             Σ ⊆ ≡Σ ;     ≡yz      sem
                                                            Σ ⊆ ≡Σ .
                                                                                              ∗
     property                     relation   S sem    Ss     Sc        Sx       S yz   S yz
     corresponding module notion             mCE dCE         —        x-mod yz-mod yz ∗ -mod
     (min.) modules                          3− 3−           —          3      7        3
     (min.) self-contained mod.s              7   7          —          3      3        3
     (min.) depleting modules                 7   7          —          3      3        3
     robustness voc. restr.                    3       3     3          3        7      3
     robustness voc. ext.                      3      (7)   (3)         3        7      3
     robustness replacement                    3      (7)   (7)         3        3      3
     robustness joins                          3      (7)   (7)         3        3      3
Symbols:        Sx                 stands for S ∅ , S ∆ , S ⊥ , S >
                              ∗                                   ∗         ∗
                S yz , S yz        stand for S >⊥ , S ⊥> , S ⊥> , S >⊥
                3, 7               property holds/fails
                (3), (7)           property holds/fails for many standard description logics
                3−                 property holds except for minimality
                —                  property not considered: no corresponding module notion
    Figure 2. Properties of inseparability relations for different module notions.
     The properties of the equivalence relations defined in Subsection 2.2 are sum-
marised in Figure 2. The first four lines give the module notion which is gen-
eralised by this inseparability relation, and indicate whether each such module
for Σ is a (minimal) SΣ -module, or a (minimal) self-contained SΣ -module, or
a (minimal) depleting SΣ -module. For S c , this question is meaningless because
there is no corresponding standard module notion, apart from redefining Σ-dCEs
to take only Σ-concept inclusions into account. For S sem and S s , it is clear that
Definition 1 (3) leads to SΣ -modules, but not to minimal, self-contained, or de-
pleting ones. This can, however, easily be achieved by adopting stronger module
notions as in [9, 10]. Hence, the negative entries in this part of the table are
not problematic, and we can say that locality-based and conservativity-based
                                                                                  yz
modules are equally “good”—except for yz-modules: they are not always SΣ             -
modules, which is critical. However, they are always minimal self-contained (de-
             yz
pleting) SΣ     -modules.
     The remaining four lines indicate whether the respective inseparability re-
lation satisfies the four robustness properties from Definition 6. The results for
S sem , S s and S c are taken from [3, 8]; those for the locality-based relations will
be proven in the following. This part of the table reveals the following insights:
while it is not surprising and known that S c lacks some of the important robust-
ness properties, it is interesting (but known as well) that this is also the case for
S s . As for the locality-based inseparability relations, it is truly surprising that
                                                                          ∗
the S yz lack two of four robustness properties, while the S x and S yz appear to
be flawless and as good as S sem .
   The following theorem states the results in Figure 2 for locality-based insep-
arability relations. Their proofs can be found in [15].
Theorem 8. Let x ∈ {∅, ∆, ⊥, >} and yz ∈ {⊥>, >⊥}.
                                       x
(1) The inseparability relation SΣ         is robust under replacement, vocabulary re-
    strictions, vocabulary extensions and joins.
                                     yz
(2) The inseparability relation SΣ        is not robust under vocabulary restrictions or
    extensions, but under replacement and joins.
                                      yz ∗
(3) The inseparability relation SΣ          is robust under replacement, vocabulary re-
    strictions, vocabulary extensions and joins.
(4) Let T be a TBox, Σ a signature, and M = x-mod(Σ, T ). Then M is a
                x                                            x
    minimal SΣ     -module, a minimal self-contained SΣ        -module, and a minimal
                 x
    depleting SΣ   -module of T .
(5) Let T be a TBox, Σ a signature, and M = yz-mod(Σ, T ). Then M is
                                  yz                                                 yz
    not generally a minimal SΣ       -module, but it is a minimal self-contained SΣ     -
                                              yz
    module and a minimal depleting SΣ -module of T .
(6) Let T be a TBox, Σ a signature, and M = yz ∗ -mod(Σ, T ). Then M is a
                yz ∗                                         yz ∗
    minimal SΣ       -module, a minimal self-contained SΣ         -module, and a minimal
                    ∗
                 yz
    depleting SΣ      -module of T .
3.2    Minimality versus efficient computability
Theoretical results. For ALC and ALCQI, the problem of deciding whether two
ontologies entail the same concept inclusions over a given signature Σ is 2Exp-
Time-complete [5, 12], which is one exponential harder than standard reasoning
tasks. For ALCQIO, and hence for the more expressive OWL 1 and 2, this prob-
lem is undecidable [12]. The related problem of deciding whether two ontologies
have the same models with respect to Σ is undecidable already for ALC [12].
    Fortunately, extracting mCE-based modules can be tractable, for instance
for acyclic EL ontologies [9]. We will examine how much locality-based modules
differ from minimal modules in size and desired properties in this case.
    For the DL-Lite family [2], deciding dCEs is Π2p -complete for DL-Litebool
and coNP-complete for DL-Litehorn [11], i.e., it is most likely intractable in both
cases. However, there is experimental evidence [10] that minimal modules of DL-
Lite ontologies can be extracted quite efficiently via locality-based modules as a
preprocessing step and using QBF solvers.
Experimental results. Since CE-based modules are so hard to extract, it is dif-
ficult to perform experiments comparing modules of different notions extracted
from the same ontology for the same signature. However, in two cases this has
been possible. We will briefly describe the two cases and show the difference
between conservativity-based and locality-based modules.
    Snomed CT, the Systematized Nomenclature of Medicine, Clinical Terms,
is an ontology that consists of approximately 360,000 concepts and 1.4 million
relationships, and defines the medical terminology for health care systems in the
US, the UK, and other countries. It is an acyclic EL TBox; therefore extracting
mCE-based modules is tractable: the system MEX [9] extracts mCE-based mod-
ules from acyclic EL ontologies in polynomial time. Three signatures containing
some 4,000, 16,000, and 24,000 concept names from Snomed CT have been pro-
vided by the Intensive Care Unit of the Royal Prince Alfred Hospital in Sydney,
Australia, from a corpus of 60 million tokens complied from 6 years of notes. (For
the process of converting the text to Snomed CT codes, see [14]). For the given
signatures, we have extracted mCE-based modules with MEX, and locality-based
modules via the OWL API6 . In these cases, >-modules comprised almost the
whole ontology, and all yz-modules and yz ∗ -modules coincided. Furthermore,
yz ∗ -modules were only about 1.5 times as big as mCE-modules for signatures
greater than 15,000 symbols. The runtime was 1–4 seconds on an average PC in
all cases. The sizes of mCE-modules, ⊥-modules and >⊥∗ -modules are given in
the following two tables containing exact sizes and percentages, respectively.
          |Σ|     ⊥     >⊥∗      MEX              |Σ|         ⊥    >⊥∗    MEX
        2,687 15,351   15,011    6,374           0.7% 4.0%         4.0%   1.7%
       15,747 38,912   38,534   26,183           4.1% 10.2%       10.1%   6.9%
       23,859 55,504   54,958   37,129           6.3% 14.6%       14.5%   9.8%
    The version of Snomed CT underlying these figures was dated 9 February
2005. A version of 30 December 2006 led to similar results.
    In [10], dCE-based modules have been extracted from two ontologies in
DL-Litebool , Core and Umbrella containing 1283 and 1247 axioms, for system-
atically and randomly generated signatures of size 1 and 10. These modules are
                                               c                       q
MCM, MQM and MDQM, which are minimal SΣ          -modules, minimal SΣ    -modules
          q                                                  q
(where SΣ is based on query-CEs), and minimal depleting SΣ -modules. The ta-
ble below gives the average sizes for modules extracted for (a) all singleton sig-
natures, (b) 30 randomly generated signatures consisting of 10 concept names,
and (c) 30 random signatures consisting of 5 concept and 5 role names. The
entry “—” means that no data is available due to performance reasons.
                         (a) |Σ| = 1  (b) |Σ| = 10 (c) |Σ| = 5 + 5
                       Core Umbrella Core Umbrella Core Umbrella
            MCM           2          2    34       39    81          96
            MQM           5          2    54       66    —           —
            MDQM         80         57   294      319   497         485
            >⊥∗ -mod    226         69   465      351   671         526
    The results show that, on average, locality-based modules can be considerably
bigger than dCE-based modules. On the other hand, since locality-based modules
are always depleting (see Section 3.1), it is only fair to compare them with
MDQM, and in only one of six cases are >⊥∗ -modules more than one and a
half times as big as MDQMs on average. In [10], the time needed to extract the
modules can be found in addition to their sizes.
6
    http://owlapi.sourceforge.net
On the nesting of locality-based modules. We have stated in Section 2.1 that the
size of locality-based modules can be reduced by nesting the extraction of >- and
⊥-modules. In the first instance, this led to the notion of >⊥- and ⊥>-modules.
To the best of our knowledge, this has not been done before. As shown in the
following example, it indeed leads to smaller modules.
Example 9. Let Σ = {A, D} and T = {A v B, C v D}.
Then we have >-mod(Σ, T ) = {C v D} and ⊥-mod(Σ, T ) = {A v B}, but
>⊥-mod(Σ, T ) = ⊥>-mod(Σ, T ) = ∅.
   There are cases where repeated nesting of >⊥-modules (⊥>-modules) de-
creases module size even further, see Example 10.
Example 10. Consider Σ = {A} and T = {A v B t C, B v A}. Then we have
T ⊃ >⊥-mod(Σ, T ) ⊃ >⊥-mod(Σ, >⊥-mod(Σ, T )) because:
                 N1 = ⊥-mod(Σ, T )      = {A v B t C, B v A}
                M1 = >-mod(Σ, N1 ) = {B v A}
                 N2 = ⊥-mod(Σ, M1 ) = ∅
                M2 = >-mod(Σ, N2 ) = ∅
An analogous example for ⊥>-modules is Σ = {A}; T = {B v A t ¬C, A v B}.
   It therefore makes sense to continue this nesting up to a fixpoint. The follow-
ing lemma shows that the number of steps until this fixpoint is reached can be
asymptotically as big as the number of axioms in T . The proof is given in [15].
Lemma 11. For every integer n > 1 and yz ∈ {>⊥, ⊥>}, there exist an ALC-
TBox M0 of size O(n) and a signature Σ of size O(n) such that Mi+1 =
yz-mod(Σ, Mi ), for each i = 0, . . . , n − 1, and M0 ⊃ · · · ⊃ Mn .
4   Conclusion
We have compared important properties of conservativity-based and locality-
based modules via the more general notion of inseparability relations. It has
turned out that, while modules based on locality are in general larger than
conservativity-based ones, they are very robust. Two out of three versions of
locality-based modules enjoy the same robustness properties as mCE-based mod-
ules and are therefore more robust than modules based on dCEs. In addition,
they are self-contained and depleting. However, their robustness does not sim-
ply follow from the fact that they are a special case of mCE-based modules:
yz-modules do not “inherit” all robustness properties.
    Furthermore, modules based on syntactic locality can be extracted efficiently
for all logics up to SHOIQ. They can thus also serve as an intermediate step
for extracting conservativity-based modules [10]. Except for the few cases where
mCE-based modules can be extracted efficiently, locality-based modules seem
best suited to module extraction scenarios because they combine desirable prop-
erties with computational feasibility.
References
 [1] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider,
     editors. The Description Logic Handbook: Theory, Implementation, and Applica-
     tions. Cambridge University Press, 2003.
 [2] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. DL-Lite:
     Tractable description logics for ontologies. In Proc. of AAAI-05, pages 602–607,
     2005.
 [3] B. Cuenca Grau, I. Horrocks, Y. Kazakov, and U. Sattler. Modular reuse of
     ontologies: Theory and practice. J. of Artificial Intelligence Research, 31:273–
     318, 2008.
 [4] B. Cuenca Grau, B. Parsia, E. Sirin, and A. Kalyanpur. Modularity and web
     ontologies. In Proc. of KR-06, pages 198–209, 2006.
 [5] S. Ghilardi, C. Lutz, and F. Wolter. Did I damage my ontology? A case for
     conservative extensions in description logics. In Proc. of KR-06, pages 187–197,
     2006.
 [6] M. Horridge, B. Parsia, and U. Sattler. Laconic and precise justifications in OWL.
     In Proc. of ISWC-08, volume 5318 of LNCS, pages 323–338, 2008.
 [7] E. Jiménez-Ruiz, B. Cuenca Grau, U. Sattler, T. Schneider, and R. Berlanga
     Llavori. Safe and economic re-use of ontologies: A logic-based methodology and
     tool support. In Proc. of ESWC-08, volume 5021 of LNCS, pages 185–199, 2008.
 [8] B. Konev, C. Lutz, D. Walther, and F. Wolter. Formal properties of modular-
     ization. In H. Stuckenschmidt, S. Spacciapietra, and C. Parent, editors, Ontology
     modularization, volume 5445 of LNCS. To be published.
 [9] B. Konev, C. Lutz, D. Walther, and F. Wolter. Semantic modularity and module
     extraction in description logics. In Proc. of ECAI-08, pages 55–59, 2008.
[10] R. Kontchakov, L. Pulina, U. Sattler, T. Schneider, P. Selmer, F. Wolter, and
     M. Zakharyaschev. Minimal module extraction from DL-Lite ontologies using
     QBF solvers. In Proc. of IJCAI-09, 2009. To appear.
[11] R. Kontchakov, F. Wolter, and M. Zakharyaschev. Can you tell the difference
     between DL-Lite ontologies? In Proc. of KR-08, pages 285–295, 2008.
[12] C. Lutz, D. Walther, and F. Wolter. Conservative extensions in expressive de-
     scription logics. In Proc. of IJCAI-07, pages 453–458, 2007.
[13] N. F. Noy and M. A. Musen. Specifying ontology views by traversal. In Proc. of
     ISWC-04, volume 3298 of LNCS, pages 713–725, 2004.
[14] J. Patrick, Y. Wang, and P. Budd. An automated system for conversion of clinical
     notes into SNOMED clinical terminology. In Proc. of HKMD-07, volume 68 of
     CRPIT, pages 219–226, 2007.
[15] U. Sattler, T. Schneider, and M. Zakharyaschev. Which kind of module should I
     extract? Technical report, University of Manchester, 2009. http://www.cs.man.
     ac.uk/%7Eschneidt/publ/ssz09 insep report.pdf .
[16] J. Seidenberg and A. L. Rector. Web ontology segmentation: analysis, classifica-
     tion and use. In Proc. of WWW-06, pages 13–22, 2006.