=Paper= {{Paper |id=Vol-2373/paper-21 |storemode=property |title=How Modular Are Modular Ontologies? Logic-Based Metrics for Ontologies with Imports |pdfUrl=https://ceur-ws.org/Vol-2373/paper-21.pdf |volume=Vol-2373 |authors=Robin Nolte,Thomas Schneider |dblpUrl=https://dblp.org/rec/conf/dlog/NolteS19 }} ==How Modular Are Modular Ontologies? Logic-Based Metrics for Ontologies with Imports== https://ceur-ws.org/Vol-2373/paper-21.pdf
         How Modular Are Modular Ontologies?
     Logic-Based Metrics for Ontologies with Imports

                            Robin Nolte and Thomas Schneider

              Department of Computer Science, University of Bremen, Germany
                         {nolte,tschneider}@uni-bremen.de



       Abstract. Many large ontologies are developed modularly, often using import
       statements, which are supported by the OWL standard. However, import statements
       do not provide logical guarantees such as local completeness, which is an estab-
       lished quality criterion for ontology modules: an ontology is locally complete if it
       uses terms from imported ontologies without changing the knowledge reused from
       them. To measure the extent to which ontologies separated by import statements
       are logically modular, we present four new quantitative logic-based metrics: two
       are strongly related to local completeness and based on module extraction, using
       some established module notion as a reference; the other two exploit the depen-
       dency relation of the atomic decomposition. We formally study the relationship
       between the measures and evaluate them on a set of ontologies.


1   Introduction

Modularity of ontologies has received much attention in the past decade, given the exis-
tence of large and comprehensive ontologies such as SNOMED CT [41] and NCI [16],
and considering the observation that modular ontologies can be maintained, compre-
hended, and reasoned over more easily. There are several ways to develop an ontology
modularly. The simplest one is certainly the distribution of the axioms over several files
and the use of OWL’s import statements. More principled approaches include the use of
(a-priori) extensions of DLs supporting modular development [2,4,40] or (a-posteriori)
decomposition methods [7,11]. The simple approach based on import statements seems
to be used frequently: out of the 438 ontologies in the NCBO BioPortal ontology repository [29],
at least 69 are built modularly using imports; each of them imports (directly and
indirectly) up to 31 other ontologies from within and outside the repository. For example,
the Cell Ontology (CL) imports 8 ontologies, including the Gene Ontology (GO).
    The use of import statements allows developers not only to reuse an existing ontology
in (several) other ontologies, but also to follow the design principle separation of
concerns [12] or, in ontological terms, to separate (sub-)domains of interest. However,
this separation does not provide any logical guarantees, e.g., GO does not need to be a
module of CL in the strict logical sense that all knowledge about genes that follows from
CL already follows from GO. In other words, CL might reuse the vocabulary “borrowed
from” GO in a way that changes the knowledge in GO. More generally, if an ontology O
imports a module M, then it is reasonable to require that O reuses the vocabulary from
M in a “safe” way, in the sense that (a) O does not entail any knowledge about that
vocabulary or (b) O does not entail anything new about that vocabulary (i.e., knowledge
not already entailed by M). Guarantee (a) is known as safety [5,19] and (b) as local
completeness [7,25]. Both are strongly related to encapsulation as known from software
engineering [36]; moreover, (b) can be formalised using conservative extensions [15].
     Alas, conservativity is undecidable already for fragments of the DL SROIQ [26]
underlying OWL. Approximations are known; e.g., locality [6] is a sufficient condition
for conservativity, and its syntactic variant can be computed in polynomial time. Locality
is the foundation for the successful family of locality-based modules [6]; it can and has
been used to evaluate local completeness qualitatively for ontologies with imports [5,19],
concluding that essentially all ontologies satisfy (b) from above [5].
     In this paper we introduce four quantitative measures, aka metrics, for the quality
of imports in ontologies or repositories. Our metrics are based on the general idea
of determining how close imported modules are to being modules in a strict sense,
i.e., determined by some “reference” notion of a module which itself guarantees local
completeness (including, but not restricted to, locality-based modules). Hence they
determine the extent to which an ontology with imports uses the vocabulary from the
imported ontologies in a “safe” way. The first two metrics capture local completeness
by determining the similarity of two graphs: one graph represents the import structure
of an ontology or repository, i.e., the nodes are the imported subontologies, and the
edges are induced by the imports; the other graph is a reference graph that represents
a logical significance relation between importing and imported ontologies which is
defined based on an arbitrary logically encapsulating notion of a module. The other two
metrics use a reference graph defined via the atomic decomposition of an ontology [11],
which constitutes a partition into subontologies that are atomic w.r.t. some underlying
module notion, together with a dependency relation between those atoms. This way they
determine the similarity of the import graph to an “ideal” graph that captures the logical
dependencies within the ontology, which is different from measuring local completeness.
     We define the new metrics and evaluate them on a recent BioPortal snapshot [29]. We
expect them to help ontology engineers assess whether and to which degree the import-
induced modular structure of their ontology actually reflects the logical dependencies
between the constituent ontologies, as represented by the underlying reference graphs.
This paper is based on the first author’s bachelor thesis [30].


2     Related Work

There is a large amount of work introducing quality criteria for modules and evaluating
modules against these criteria; informative overviews are given in [9,20]. Here, the term
“module” is to be conceived in a broad sense, referring to ontologies that can be used in
combination with other ontologies. These quality criteria can be divided into qualitative
and quantitative ones (metrics): qualitative criteria can be either satisfied or violated,
and metrics are satisfied to some degree. Some metrics have analogues in software
engineering. Criteria are also grouped by the characterised or measured aspects:

    – Logical criteria such as local correctness and local completeness [7,25]
    – Structural criteria such as size and redundancy [39]
    – Criteria transferred from software engineering to ontologies, based on or considering
      the semantics, such as cohesion and coupling [25,33,34,45]
    – User- or developer-centric criteria such as comprehensibility [25], readability [43],
      or domain coverage [8]

Many of the metrics have been implemented in ontology assessment tools and evaluated
empirically [13,20,43].


3     Preliminaries

3.1    Graph Theory

We use digraphs, i.e., directed graphs G = (V, E), where V is the (non-empty) set of
nodes and E ⊆ V × V the set of edges. In the following, let G = (V, E) and G′ = (V′, E′)
be digraphs. If V ⊆ V′ and E ⊆ E′, then G is called a subgraph of G′. We denote the
digraph (V, E \ E′) by G \ G′. To measure the similarity between two digraphs G and
G′ that share the same nodes, we use the following (asymmetric) variant of the Tversky
index [44] of their edge sets, which relates to the notion of specificity from test theory.

Definition 1. Given digraphs G = (V, E) and G′ = (V′, E′) with V = V′, the relative
similarity of G with G′ is RSim(G, G′) := |E ∩ E′| / |E′| if this term is defined, i.e., if
E′ ≠ ∅. In case E′ = ∅, we set RSim(G, G′) := 0 if E ≠ ∅ and RSim(G, G′) := 1 if E = ∅.

As a consequence, if E = E′, then RSim(G, G′) = 1; if E ∩ E′ = ∅, then RSim(G, G′) = 0.
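
The case distinction in Definition 1 can be made concrete with a small sketch. It follows the definition exactly as stated above; the function name `rsim` and the toy edge sets are illustrative assumptions, not part of the original work.

```python
def rsim(edges, ref_edges):
    """Relative similarity per Definition 1, computed from the two edge sets only."""
    edges, ref_edges = set(edges), set(ref_edges)
    if ref_edges:
        # Regular case: share of the reference edges that also occur in `edges`.
        return len(edges & ref_edges) / len(ref_edges)
    # The reference edge set is empty: 1 if `edges` is empty as well, otherwise 0.
    return 1.0 if not edges else 0.0


# Hypothetical toy digraphs over the nodes a, b, c.
g = {("a", "b"), ("b", "c")}
g_ref = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")}

print(rsim(g, g_ref))      # 0.5: two of the four reference edges are matched
print(rsim(g, g))          # 1.0: identical edge sets
print(rsim(g, set()))      # 0.0: empty reference, non-empty edge set
```

Because the measure is asymmetric, swapping the arguments generally changes the value.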


3.2    OWL and Import Structures

We assume that the reader is familiar with OWL and the syntax and semantics of the
underlying description logic SROIQ, for details see [17,18,24]. An ontology O is a
finite set of general concept and role inclusions as well as concept and role assertions.
Let NC be a set of concept names, NR a set of role names and NI a set of individual
names. A signature is a set Σ ⊆ NC ∪ NR ∪ NI of terms. Given a concept, role, axiom, or
ontology X, the set of terms occurring in X is called the signature of X, denoted X̃.
    OWL ontologies may contain import statements, which can be used transitively and
even cyclically. The import closure of an ontology O is the union of O and all ontologies
imported directly and indirectly by O. The import structure of O can be represented by
a digraph whose nodes are the imported/importing ontologies and the edges denote the
import relation. More generally, an ograph is a digraph G whose nodes are ontologies.
Example 2. The ograph in Figure 1a consists of ontologies O1 , . . . , O5 and represents
the situation where O1 imports O2 and O4 , O2 imports O3 , and O4 imports O5 .
In general, an ograph is a means to denote some kind of (logical) significance relation
between ontologies. We assume that such relations are reflexive and transitive and thus
will often work with the reflexive transitive closure of an ograph G = (V, E), denoted
G∗ = (V, E ∗ ) and depicted in Figure 1b for the ontologies from Example 2. Import-
induced ographs as in Example 2 are a special case, representing the logical significance
relation that is to be expected from the import (e.g., O2 should be significant for O1
but not vice versa). Furthermore, an ograph in general does not need to have a unique
root or be connected. In this sense, the notion of an ograph even captures repositories of
ontologies. For an ograph G = (V, E), let OG := ⋃V. If G represents the import structure
of an ontology O, then OG is the import closure of O. Note that ontologies contained in
an ograph may share symbols regardless of whether or not they are adjacent.

      Fig. 1: (a) an exemplary ograph G and (b) its reflexive transitive closure G∗
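
As an illustration of the closure G∗, the following sketch computes the reflexive transitive closure of the import graph from Example 2. The function name and the string labels for the ontologies are assumptions made for this example only.

```python
def reflexive_transitive_closure(nodes, edges):
    """Return the edge set E* of the reflexive transitive closure of (nodes, edges)."""
    successors = {n: set() for n in nodes}
    for source, target in edges:
        successors[source].add(target)
    closure = set()
    for start in nodes:
        # Depth-first search; `reached` contains `start` itself (reflexivity).
        reached, stack = {start}, [start]
        while stack:
            for nxt in successors[stack.pop()]:
                if nxt not in reached:
                    reached.add(nxt)
                    stack.append(nxt)
        closure.update((start, r) for r in reached)
    return closure


# Import structure of Example 2: O1 imports O2 and O4, O2 imports O3, O4 imports O5.
nodes = ["O1", "O2", "O3", "O4", "O5"]
edges = {("O1", "O2"), ("O1", "O4"), ("O2", "O3"), ("O4", "O5")}
closure = reflexive_transitive_closure(nodes, edges)

print(len(closure))                                   # 11
print(sorted(e for e in closure - edges if e[0] != e[1]))
# [('O1', 'O3'), ('O1', 'O5')]
```

The closure adds the five reflexive edges and the two indirect imports of O1; cyclic imports are handled as well because visited nodes are not expanded twice.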

3.3   Modules and Atomic Decomposition
For an interpretation I and an ontology O, we write I |= O if I is a model of O, and
denote the interpretation obtained by restricting I to the signature Σ with I|Σ . The central
notion that is used to define a module is that of a conservative extension [15] or, more
generally, of Σ-inseparability [22], which is defined as follows.
    Let O1 and O2 be ontologies and Σ a signature. O1 and O2 are model inseparable
w.r.t. Σ, written O1 ≡^mCE_Σ O2, if {I|Σ | I |= O1} = {I|Σ | I |= O2}. O1 and O2 are
deductively inseparable w.r.t. Σ, written O1 ≡^dCE_Σ O2, if, for all SROIQ-entailments η
over Σ, we have O1 |= η if and only if O2 |= η. The equivalence relations ≡^R_Σ are defined
upon the notion R ∈ {mCE, dCE}, which is called an inseparability relation.
    Other notions of inseparability relations can be defined, see, e.g., [22,38]; in this
paper we will consider only R ∈ {mCE, dCE}. An inseparability relation R induces
modules defined as follows [23]. Let M and O be ontologies with M ⊆ O and Σ a
signature. We call M an RΣ-module of O if M ≡^R_Σ O, and a minimal RΣ-module of O if
M, but no proper subset of M, is an RΣ-module of O. Stronger module notions such as
self-contained and depleting RΣ-modules exist [23], but are not needed in the following.
    Since both notions of inseparability are undecidable already for DLs of moderate
expressivity [26,27], extracting RΣ-modules is a computationally very hard problem. Still,
there are several successful module extraction methods: some are restricted to fragments
of SROIQ, such as the MEX/AMEX approach [14,21]; others are approximation
methods, which guarantee that the output is an RΣ-module (often with additional useful
properties), but that module is not necessarily minimal. These approaches include
locality-based modules (LBMs) [6], reachability-based modules (RBMs) [28,32],
modules based on datalog reasoning [1], and minimal subsumption modules [3]. LBMs
come in two flavours (semantic and syntactic), and three variants per flavour (⊥, ⊤,
nested). We will refer to LBMs in some examples, but the precise definitions are not
relevant for understanding those.
    Since the techniques developed in the following do not depend on a concrete module
extraction approach, we use the general notation x-mod(·, ·), i.e., M = x-mod(Σ, O)
denotes the module extracted for the signature Σ from the ontology O using approach x.
Usually Σ is called the seed signature for M. More precisely, the module extraction
function x-mod(·, ·) maps every pair (Σ, O) to a subset of O.
     In contrast to the extraction of a single module, there are techniques for decomposing
an ontology into a collection of subontologies which, in some sense, represent all
modules. Atomic decomposition (AD) [11] is one such technique. It partitions the input
ontology O into a set of atoms and computes a dependency relation between them, also
yielding a base for the set of all modules of O [10, Lemma 4.15]. AD can be used with
any module notion x whose function x-mod(·, ·) satisfies certain properties, which are
called (M0)–(M6) in [10]. LBMs and MEX modules satisfy all of them [10].
     Let O be an ontology and F^x(O) = {x-mod(Σ, O) | Σ ⊆ Õ}. If x is clear from the
context or we do not have a specific x in mind, we simply write F(O). Given two axioms
α, β ∈ O, we write α ∼O β if, for all M ∈ F(O), we have α ∈ M iff β ∈ M. Obviously
∼O is an equivalence relation. The atoms of O are the equivalence classes of ∼O , i.e.,
maximal subsets of axioms that are not separated by any module. We denote atoms by
a, b, . . . and the set of atoms of O by A(O). It is immediate that A(O) is a partition of O
into linearly many atoms, and every module M ∈ F(O) is a disjoint union of atoms. In
contrast, not every atom needs to occur in some module, but this is only the case if O
contains certain tautologies [10]. We assume the absence of such tautologies.
     Let a, b be atoms of O. We say that a depends on b and write a ≽ b if a ⊆ M implies
b ⊆ M, for every M ∈ F(O). The relation ≽ is called the dependency relation of O; it
is obviously a partial order. The atomic decomposition (AD) of O is the poset (A(O), ≻),
where a ≻ b iff a ≽ b and a ≠ b. It can be represented using a Hasse diagram.
     Although an ontology can have exponentially many modules [37], its AD can always
be computed using a linear number of module extractions: it suffices to compute the
genuine modules of O, which are the Mα := mod(α̃, O) for all α ∈ O, and compute the
atoms and the dependency relation from only the Mα [11]. This observation is based on
several properties of the AD that follow from (M1)–(M5), including the following.
Lemma 3 ([11]). For all α ∈ a ∈ A(O), Mα is the smallest module of O containing a.
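
To make the definitions of atoms and of the dependency relation tangible, here is a minimal sketch that applies them literally to an explicitly given finite family of modules. The axiom labels and the module family are invented toy data, and the sketch does not perform any module extraction itself; it also assumes, as in the text, that every axiom occurs in some module.

```python
from collections import defaultdict


def atoms_and_dependency(axioms, modules):
    """Group axioms into atoms and derive the dependency relation between atoms.

    `modules` is a finite family of axiom sets standing in for F(O).
    """
    modules = [frozenset(m) for m in modules]
    # Two axioms are equivalent iff they occur in exactly the same modules.
    groups = defaultdict(set)
    for axiom in axioms:
        membership = tuple(axiom in m for m in modules)
        groups[membership].add(axiom)
    atoms = [frozenset(g) for g in groups.values()]
    # a depends on b iff every module that contains a also contains b.
    depends = {
        (a, b)
        for a in atoms
        for b in atoms
        if all(b <= m for m in modules if a <= m)
    }
    return atoms, depends


# Hypothetical ontology with four axioms and a made-up family of modules.
axioms = ["ax1", "ax2", "ax3", "ax4"]
modules = [
    {"ax1", "ax2"},
    {"ax1", "ax2", "ax3"},
    {"ax4"},
    {"ax1", "ax2", "ax3", "ax4"},
]
atoms, depends = atoms_and_dependency(axioms, modules)

for atom in sorted(atoms, key=sorted):
    print(sorted(atom))
# ['ax1', 'ax2']
# ['ax3']
# ['ax4']
```

In this toy family, the atom {ax3} depends on {ax1, ax2} because every module containing ax3 also contains ax1 and ax2, while the converse does not hold.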


4   Metrics for Assessing Imports
Our aim is to develop metrics that assess whether an ontology O1 imports another
ontology O2 in a reasonable way. There are many possible meanings of “reasonable”.
We focus on the following two: local completeness requires that O1 does not add new
knowledge about the terms from O2 ; relevance requires that O2 adds to the knowledge
in O1 about those terms. These two conditions are orthogonal to each other, and they
should also hold when the ograph contains further ontologies, as we demonstrate in
Example 4 below. We thus postulate a condition that is stricter than local completeness
and call it completeness.
    A logically sound definition of “add knowledge” should arguably best be based
on inseparability. Since the latter is hard to impossible to decide, approximations are
needed and have been used already, e.g., locality as a qualitative approximation of local
completeness [5]. Since locality is a sufficient condition for inseparability, it implies that
the import is (locally) complete, but if locality is violated, we do not know and have to
be cautious. Contrariwise, a necessary condition would be a useful approximation.
    In the following, we devise four metrics based on an arbitrary module notion that
guarantees inseparability, but not necessarily minimality. By not committing to a par-
ticular module notion, we leave room for using better module notions that may be
developed in the future. Each of our metrics assigns every given ograph G (e.g., import
structure) a rational number between 0 and 1. This is done by comparing G with a refer-
ence ograph G0 that has the same nodes as G and whose edges denote the “reasonable”
import relations between the ontologies from G. For the first two measures, we base
our understanding of “reasonable” on the notion of significance, which we will define
using inseparability and approximate using modules. The second two measures will
use atoms from the atomic decomposition instead of modules, thus achieving an even
stronger notion of reasonableness that abstracts away from the specific signature of
the imported ontology. The actual metrics will then be given by a standard notion of
difference between the input and reference ographs.
    In the following, let R be an arbitrary inseparability relation. We use ≡ as a
shorthand for ≡^R. We assume that R is monotone [23], i.e., if O1 ⊆ O2 ⊆ O3 and
O1 ≡Σ O3, then O1 ≡Σ O2. Furthermore, let x be an arbitrary module notion that
yields unique RΣ-modules, i.e., for all O and Σ, we have that x-mod(Σ, O) is a uniquely
determined subset of O with x-mod(Σ, O) ≡Σ O, which is guaranteed, e.g., by LBMs
and MEX modules. In the following, we omit x where no confusion can arise.


4.1     Module-Induced Modularity

For our first two metrics, the edges of the reference ograph capture a variant of com-
pleteness between the respective nodes of G. To explain the underlying intuitions, we
continue Example 2.
Example 4. Consider the ograph G from Example 2. Given that O2 (directly) imports
O3, as represented by G’s edges, this import would be “safe” if O3 were locally complete
with respect to O2, i.e., if the import into O2 did not change the meaning of the symbols
in O3, that is, O2 ∪ O3 ≡_Õ3 O3 (1). Similarly, since O1 imports the other four ontologies
(directly or indirectly), local completeness would require that the meaning of the symbols
in those is not changed, i.e.,

    ⋃_{i=1,...,5} Oi  ≡_Σ  ⋃_{i=2,...,5} Oi,  where Σ is the signature of ⋃_{i=2,...,5} Oi   (2).

In general, (1) and (2) cannot be decided. They can be approximated using locality, as in
Cuenca Grau et al.’s approach [5]. However, the authors applied their approach only to
“top-level ontologies”, i.e., (2) would have been tested for OG , but not (1). Furthermore,
our metrics should not rely on locality, as explained above. We will therefore measure
statements such as (1) and (2) in a different way, using sufficient conditions, based on
the above properties of the module notion mod:
Example 5. In Example 4, the following are sufficient for (1) and (2): mod(Õ3, O2 ∪
O3) = O3 (1′) and mod(Σ, ⋃_{i=1,...,5} Oi) = ⋃_{i=2,...,5} Oi (2′), where Σ is the
signature of ⋃_{i=2,...,5} Oi. Let x be the “top”
version of syntactic locality, and let O1 = {A ⊔ B ⊑ C}, O2 = {D ⊑ B, A ⊑ E},
O3 = {F ⊑ A}, O4 = {B ⊑ ¬A} and O5 = ∅.¹ Then both (1′) and (2′) hold.
 ¹ O5 might still contain non-logical axioms, such as annotations or declarations. This case does
   occur, e.g. in DC Terms (http://purl.org/dc/elements/1.1/).
Note that we made expectation (1) implicitly based on the assumption that O1 and O4 are
irrelevant for the local completeness of O2 ∪ O3 w.r.t. O3 . However, if we take them into
account too and extract the module from the whole ontology OG, then ⊤-mod(Õ3, OG)
consists of O3 ∪ O4 plus the first axiom of O2 . The overlap with O2 suggests that O2
does change the knowledge of the terms from O3 in the context of all of OG .
     This last observation admits the following conclusions in view of our desired metrics:
(1) it is not enough to consider local completeness; (2) for testing completeness and
relevance, one needs to check edges as well as non-edges in an ograph. In order to
accommodate these conclusions, we use the following notion as a basis for our metric.

Definition 6. Let Σ be a signature and O, O′ be ontologies such that O′ ⊆ O. O′ is
Σ-significant in O iff O ≢Σ O \ O′.

Intuitively, a Σ-significant ontology O′ in O adds knowledge about Σ to O (relevance).
Contrariwise, Σ-insignificance is similar to completeness but lets us specify a signature.
Based on our considerations above and the notion of significance, we can put our
expectation precisely: Given an ograph G = (V, E), for any two ontologies O1, O2 ∈ V
we expect O1 to be Õ2-significant in OG iff (O1, O2) ∈ E∗. That is, O2 should import
O1 directly or indirectly if, and only if, O1 adds knowledge about the terms in Õ2 to OG.
In particular, if O1 and O2 do not share terms, we would not expect any path between
them in G; if they do share terms and both contain knowledge about those shared terms,
we would expect paths both ways.

Example 7. In Example 5, O1 is, as expected, Õ2-, Õ3-, Õ4- and Õ5-insignificant in OG,
but O4 is Õ2- and Õ3-significant. Analogously, O2 is Õ3- and Õ4-significant, and O3 is
Õ2- and Õ4-significant. Note that O2, O3, O4 are all Õ5-significant but O5 is not.

Since Σ-significance is defined based on inseparability, which is undecidable already
for DLs of moderate expressivity, we can only hope to find a sufficient condition for
insignificance. Indeed, due to the above properties of modules, the following holds.

Lemma 8. Let Σ be a signature and O, O′ ontologies such that O′ ⊆ O.

(1) If O′ ∩ mod(Σ, O) = ∅, then O′ is Σ-insignificant in O.
(2) If O′ is Σ-insignificant in O, then there is some RΣ-module M of O with O′ ∩ M = ∅.

Proof. (1) Let M := mod(Σ, O). Then M ≡Σ O. With O′ ∩ M = ∅, i.e., M = M \ O′,
we have M \ O′ ≡Σ O. Since M \ O′ ⊆ O \ O′ ⊆ O, monotonicity of R yields
M \ O′ ≡Σ O \ O′ and hence O \ O′ ≡Σ O. (2) Set M = O \ O′. □

The converse of Point (1) cannot be expected to hold since mod is not required to yield
minimal RΣ-modules; therefore it had to be reformulated as (2).
   Based on Lemma 8, we can construct an ograph that approximates our expectation
and can be calculated using only |V| module extractions:

Definition 9. Let G = (V, E) be an ograph. The module-induced dependency graph of
G is the ograph MDG(G) := (V, E′) with

                        E′ := {(O1, O2) | O1 ∩ mod(Õ2, OG) ≠ ∅}.

Fig. 2: (a) the MDG of the example ontology and (b) the visualisation of its MIC and MIR
(edges of MDG(G) \ G∗ and of G \ MDG(G))


Note that MDG(G) is not a repair of G, but a representation of the modular dependen-
cies given the partitioning of axioms induced by G. Therefore, rather than restructuring
import statements to make the import structure match the MDG, an ontology developer
should consider moving axioms responsible for unintentional dependencies from one
ontology to another. In this paper, we do not investigate repairs further.
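
The construction of the MDG with one module extraction per node can be sketched as follows. The extractor is passed in as a parameter; `toy_mod` is a deliberately crude placeholder (it collects axioms reachable from the seed signature via shared terms) and is neither a locality-based extractor nor guaranteed to yield inseparable modules. The ontologies P, Q, R and the encoding of axioms as (label, terms) pairs are likewise assumptions for illustration.

```python
def signature(ontology):
    """Terms of an ontology; each axiom is modelled as a (label, terms) pair."""
    return frozenset(term for _label, terms in ontology for term in terms)


def toy_mod(seed, ontology):
    """Placeholder extractor (NOT a real module notion): term-reachability from the seed."""
    sig, module = set(seed), set()
    changed = True
    while changed:
        changed = False
        for axiom in ontology:
            if axiom not in module and sig & set(axiom[1]):
                module.add(axiom)
                sig |= set(axiom[1])
                changed = True
    return frozenset(module)


def module_induced_dependency_graph(ontologies, mod):
    """Edges (O1, O2) such that O1 shares an axiom with mod(signature(O2), O_G)."""
    whole = frozenset().union(*ontologies.values())   # O_G, the union of all nodes
    edges = set()
    for name2, onto2 in ontologies.items():
        module = mod(signature(onto2), whole)          # one extraction per node
        for name1, onto1 in ontologies.items():
            if onto1 & module:
                edges.add((name1, name2))
    return edges


# Hypothetical ograph nodes; P and Q share the term B, R is unrelated.
ontologies = {
    "P": frozenset({("p1", ("A", "B"))}),
    "Q": frozenset({("q1", ("B", "C"))}),
    "R": frozenset({("r1", ("D",))}),
}
mdg = module_induced_dependency_graph(ontologies, toy_mod)
print(sorted(mdg))
# [('P', 'P'), ('P', 'Q'), ('Q', 'P'), ('Q', 'Q'), ('R', 'R')]
```

With a real extractor, `mod` would wrap a call to the chosen module extraction tool; the surrounding loop stays the same.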
    There are now two ways to compare an ograph with its MDG, leading to two measures.
For capturing completeness, we determine the number of edges that are in MDG (i.e.,
denote significances) but not in G, relative to the overall number of edges in MDG. Since
we do not want to penalise non-transitive and non-reflexive imports, we have to consider
the edges in MDG \ G∗ . For capturing relevance, we determine the number of edges in
G \ MDG relative to those in G. Here we have to use G rather than G∗ , again to avoid
penalising non-transitive and non-reflexive imports, as in the following situation.
Example 10. Assume that Oa directly imports Ob , and Ob directly imports Oc . Further-
more, Oa reuses only knowledge from M ⊆ Ob and no knowledge from Oc , while
Ob \ M reuses knowledge from Oc . Hence both direct imports satisfy relevance, but the
indirect import would not.
Example 11. Consider the ograph G from Examples 2–5. Figure 2a shows MDG(G), and
Figure 2b shows MDG(G) \ G∗ (full arrows) and G \ MDG(G) (dashed arrows).
The actual metric is defined by dividing the size (number of edges) of one of the two
differences above by the size of MDG(G) or G, respectively, and subtracting it from 1:
Definition 12. Let G = (V, E) be an ograph. We call
 1. MIC(G) := RSim(MDG(G), G∗ ) the module-induced completeness of G;
 2. MIR(G) := RSim(G, MDG(G)) the module-induced relevance of G.
For the situation in Example 11 and Figure 2, we obtain MIR(G) = 0.5 and MIC(G) =
0.75. The MIR and MIC values can be considered as an “aggregated” measure for the
edge-wise similarity between the actual ograph and the reference MDG. In cases where
they clearly differ from the “ideal” value 1, as in the example, ontology developers can
use them as an indicator for reconsidering the import structure of their ontology if that
structure was meant to capture logical dependencies. The precise numerical values are of
minor interest; in particular, low values can be caused by few or many “structuring errors”
and do not pinpoint the precise cause. However, we will make use of the quantitative
nature of the MIR and MIC values in Section 5 when we empirically analyse the extent to
which adherence to completeness and relevance depends on certain ontology properties.
    One might wonder whether the global nature of significance might cause our mea-
sures to count the same fault several times. This is not necessarily so; see Example 10.
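
The edge differences behind the two measures can be sketched as below. The code follows the prose description given before Definition 12 (size of MDG(G) \ G∗ or G \ MDG(G), divided by the size of MDG(G) or G, subtracted from 1). The function names, the treatment of an empty denominator, and the three-node example are assumptions for illustration and are unrelated to the ontologies of Example 5.

```python
def closure(nodes, edges):
    """Reflexive transitive closure E* of the digraph (nodes, edges)."""
    result = {(n, n) for n in nodes} | set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(result):
            for c, d in list(result):
                if b == c and (a, d) not in result:
                    result.add((a, d))
                    changed = True
    return result


def one_minus_fraction(offending, total):
    # Assumption: if there are no edges to compare against, the score is 1.
    return 1.0 if not total else 1 - len(offending) / len(total)


def completeness(nodes, g_edges, mdg_edges):
    """One minus the share of MDG edges not covered by direct or indirect imports."""
    return one_minus_fraction(mdg_edges - closure(nodes, g_edges), mdg_edges)


def relevance(g_edges, mdg_edges):
    """One minus the share of direct imports without a matching MDG edge."""
    return one_minus_fraction(g_edges - mdg_edges, g_edges)


# Hypothetical import graph P -> Q -> R and a made-up reference graph.
nodes = ["P", "Q", "R"]
g_edges = {("P", "Q"), ("Q", "R")}
mdg_edges = {("P", "P"), ("Q", "Q"), ("P", "Q"), ("R", "Q")}

print(completeness(nodes, g_edges, mdg_edges))  # 0.75: only (R, Q) is missing from G*
print(relevance(g_edges, mdg_edges))            # 0.5: the import (Q, R) has no MDG edge
```

Printing the two difference sets alongside the scores is usually more helpful for an ontology developer than the numbers alone, since it names the offending pairs.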
4.2   Atom-Induced Modularity

We now additionally assume that mod satisfies (M0)–(M6) required by the AD [10].
    We developed our previous two metrics by extending the existing notion of local
completeness to significance, which is used to define the underlying MDG. We now focus on
significance. Let O1, O2, OG be ontologies with O1 ∪ O2 ⊆ OG and O1 ∩ mod(Õ2, OG) ≠ ∅.
Intuitively, O1 contains knowledge about the terms in Õ2 and should be considered
when arguing about those. This intuitive criterion can be strengthened by abstracting
away from the signature of O2: is there some signature Σ such that, whenever O2 contains
knowledge about Σ, then so does O1? The AD allows us to verify that criterion without
having to check every signature in OG. The dependency relation of the AD captures
the exact same property, but between atoms: An atom a depends on an atom b (a ≽ b)
if a ⊆ M implies b ⊆ M. Hence the new criterion can be approximated by checking
whether or not every atom associated with O2 depends on some atom associated with O1.
Since O2 may overlap with some atoms, we need to define “associated with” as follows.

Definition 13. Let O, O′ be ontologies such that O′ ⊆ O. We call the set

                         AC(O′, O) := {a ∈ A(O) | a ∩ O′ ≠ ∅}

the atom cover of O′ in O. For ontologies O, O1, O2 with O1 ∪ O2 ⊆ O, we write
AC(O2, O) ≽ AC(O1, O) iff there are a ∈ AC(O1, O) and b ∈ AC(O2, O) such that b ≽ a.

Since O′ ⊆ O and A(O) is a partitioning of O, the atom cover of O′ in O is the unique
minimal cover of O′ by atoms of A(O) in the topological sense. It is easy to see that it can
be computed in polynomial time modulo the AD. Note that AC(O2, OG) ≽ AC(O1, OG)
as a logical dependency between O1 and O2 is orthogonal to completeness: being
stronger than significance, it would at best yield a necessary condition for insignificance
and thus cannot serve as a useful approximation for completeness (see above).
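
Given a precomputed atomic decomposition, the atom cover and the lifted dependency check of Definition 13 reduce to simple set operations. The sketch below assumes that atoms are given as frozensets of axiom labels and that the dependency relation is a set of (dependent, dependee) pairs; all names and data are hypothetical.

```python
def atom_cover(sub_ontology, atoms):
    """Atoms that share at least one axiom with the given sub-ontology."""
    sub_ontology = set(sub_ontology)
    return {atom for atom in atoms if atom & sub_ontology}


def cover_depends(cover2, cover1, depends):
    """True iff some atom b in cover2 depends on some atom a in cover1."""
    return any((b, a) in depends for a in cover1 for b in cover2)


# Hypothetical AD: three atoms, where atom_b depends on atom_a (plus reflexive pairs).
atom_a = frozenset({"ax1", "ax2"})
atom_b = frozenset({"ax3"})
atom_c = frozenset({"ax4"})
atoms = [atom_a, atom_b, atom_c]
depends = {(atom_a, atom_a), (atom_b, atom_b), (atom_c, atom_c), (atom_b, atom_a)}

o1 = {"ax1"}            # overlaps atom_a only
o2 = {"ax2", "ax3"}     # overlaps atom_a and atom_b
o3 = {"ax4"}            # overlaps atom_c only

print(atom_cover(o2, atoms) == {atom_a, atom_b})                              # True
print(cover_depends(atom_cover(o2, atoms), atom_cover(o1, atoms), depends))   # True
print(cover_depends(atom_cover(o3, atoms), atom_cover(o1, atoms), depends))   # False
```

Note that two ontologies overlapping the same atom are related in both directions, because the dependency relation is reflexive.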
    Based on AC and ≽, we define the atom-induced counterpart of the MDG:

Definition 14. Let G = (V, E) be an ograph. The atom-induced dependency graph of G
is the ograph ADG(G) := (V, E′) with

                     E′ := {(O1, O2) | AC(O2, OG) ≽ AC(O1, OG)}.



Example 15. The ADG for the ograph G from Example 5, as shown in Figure 3a, is a
proper subgraph of the MDG (Figure 2a). This is due to O5 and its atom cover being
empty, while O4 is ∅-significant in OG .
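
A minimal sketch of the ADG construction from Definition 14 is given below. It assumes the atoms and the dependency relation of OG have already been computed; the four ontologies, including an empty one that mirrors the role of O5 in Example 15, are invented toy data.

```python
def atom_induced_dependency_graph(ontologies, atoms, depends):
    """Edges (O1, O2) such that some atom covering O2 depends on some atom covering O1."""
    cover = {
        name: [atom for atom in atoms if atom & onto]
        for name, onto in ontologies.items()
    }
    return {
        (n1, n2)
        for n1 in ontologies
        for n2 in ontologies
        if any((b, a) in depends for a in cover[n1] for b in cover[n2])
    }


# Hypothetical AD of the union: atom_b depends on atom_a (plus reflexive pairs).
atom_a = frozenset({"ax1", "ax2"})
atom_b = frozenset({"ax3"})
atom_c = frozenset({"ax4"})
atoms = [atom_a, atom_b, atom_c]
depends = {(atom_a, atom_a), (atom_b, atom_b), (atom_c, atom_c), (atom_b, atom_a)}

ontologies = {
    "N1": frozenset({"ax1", "ax2"}),
    "N2": frozenset({"ax3"}),
    "N3": frozenset({"ax4"}),
    "N4": frozenset(),            # no logical axioms, hence an empty atom cover
}
adg = atom_induced_dependency_graph(ontologies, atoms, depends)
print(sorted(adg))
# [('N1', 'N1'), ('N1', 'N2'), ('N2', 'N2'), ('N3', 'N3')]
```

An ontology with an empty atom cover, such as N4 here, has no incoming or outgoing ADG edges, not even a reflexive one.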

We obtain the remaining two metrics analogously to MIC and MIR:

Definition 16. Let G = (V, E) be an ograph. We call

 1. AIC(G) := RSim(ADG(G), G∗ ) the atom-induced completeness of G;
 2. AIR(G) := RSim(G, ADG(G)) the atom-induced relevance of G.

For the situation in Figure 3, we obtain AIR(G) ≈ 0.62 and AIC(G) = 0.75.

Fig. 3: (a) the ADG of the example ontology and (b) the visualisation of its AIC and AIR
(edges of ADG(G) \ G∗ and of G \ ADG(G))


4.3   Relation Between the Metrics
The coincidence between ADG and MDG in the previous examples is not accidental:
Lemma 17. Let O1, O2, O be ontologies such that O1 ∪ O2 ⊆ O. If AC(O2, O) ≽
AC(O1, O), then O1 ∩ mod(Õ2, O) ≠ ∅.

Proof. Let AC(O2, O) ≽ AC(O1, O), i.e., b ≽ a for some a ∈ AC(O1, O) and b ∈
AC(O2, O). Since b ∈ AC(O2, O) and by Definition 13, there is some β ∈ b ∩ O2. By
Lemma 3, β ∈ mod(β̃, O). Since β̃ ⊆ Õ2 and mod is monotonic in the first argument (M3),
we have β ∈ mod(Õ2, O) =: M. By the definition of atoms, we have b ⊆ M, and b ≽ a
implies a ⊆ M. Since a ∈ AC(O1, O) and thus a ∩ O1 ≠ ∅, we have O1 ∩ M ≠ ∅. □

The following corollary is a direct result of Lemma 17:
Corollary 18. Let G be an ograph. ADG(G) is a subgraph of MDG(G).
As shown in Example 15, the MDG is, in general, not a subgraph of the ADG and
therefore the converse of Lemma 17 does not hold.


5     Implementation and Evaluation
We implemented MIC, MIR, AIC and AIR based on the OWL API [35] implementation
of atomic decomposition and ⊤⊥∗-locality-based module extraction.
    We then analysed the transitive and reflexive import closure of 438 ontologies in a
recent snapshot of the NCBO BioPortal ontology repository [29]. While the snapshot
also provides pre-gathered ontologies as single OWL XML files, we needed the import
structure and therefore used the original files. This failed for 45 ontologies, e.g., because
they referenced at least one file that was not available online any more. Furthermore, we
excluded 321 ontologies without import statements and 24 ontologies that violated the
OWL 2 DL standard, e.g. by using punning in a prohibited way. A further 3 ontologies
timed out after 20 minutes when calculating their AD. This left us with 45 ontologies.
Their import closures consist of 2 to 32 ontologies each, adding up to 263. Since some
ontologies were imported several times (e.g., IAO Metadata 10×), the number of unique
ontologies was 211. These multiple occurrences may have distorted our results.
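
The filtering steps above can be re-traced with a few lines of arithmetic. The following sketch only restates the counts given in the text; the variable names and the labels of the exclusion reasons are illustrative.

```python
# Counts are taken from the text above; names and labels are illustrative.
corpus_size = 438

excluded = {
    "import structure could not be retrieved": 45,
    "no import statements": 321,
    "violates the OWL 2 DL standard": 24,
    "AD computation timed out after 20 minutes": 3,
}

# Ontologies that remain after all four exclusion steps.
remaining = corpus_size - sum(excluded.values())

# Ontologies summed over all import closures vs. unique ontologies.
total_in_closures = 263
unique_ontologies = 211
repeated_occurrences = total_in_closures - unique_ontologies

print(remaining, repeated_occurrences)
```

The difference between the two closure counts is the number of occurrences that are repetitions of an ontology already counted elsewhere.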
    We found that the median of both MIC and AIC was ≈ 0.75 and that of both MIR and
AIR was ≈ 0.89, with standard deviations of ≈ 0.28 and ≈ 0.22, respectively. These medians
cannot be compared directly since MIC/AIC are defined differently from MIR/AIR. A total
of 18 ontologies achieved an MIC and AIC of 1, i.e., they use imports in a “safe” way. Note that
none of them contained more than four ontologies in their import closure, with PEAO
having the largest one. 21 ontologies had an MIR and AIR of 1, with COGPO having the
largest import closure (size 9). NMOBR, the ontology with the largest import closure
(32), had the lowest MIC and AIC, both ≈ 0.09 (see [31]). The DC ontology had the
lowest MIR and AIR, both ≈ 0.22.
    Given Lemma 17, we were not surprised to observe that ADG ⊆ MDG for all tested
ontologies. In addition, Spearman’s rank correlation coefficient of MIC and AIC was
as high as ≈ 0.997 with p < 10^−49, and that of MIR and AIR was ≈ 0.98 with p < 10^−32.
However, the ADG was a proper subgraph of the MDG in only ten cases. In six cases, this
was due to some axioms being non-local w.r.t. the empty signature (see Example 15). We
were unable to identify the reason for the remaining four ontologies because the number
of axioms in their import closure made both manual and automatic analysis infeasible.
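
The subgraph checks reported above can be sketched as follows, assuming that both dependency graphs are given as sets of directed edges over the same node set (the ontologies of an import closure). The edge sets are hypothetical toy data, not taken from any evaluated ontology, and the function names are illustrative.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (source ontology, target ontology)


def is_subgraph(edges_a: Set[Edge], edges_b: Set[Edge]) -> bool:
    """True if every edge of the first graph also occurs in the second."""
    return edges_a <= edges_b


def is_proper_subgraph(edges_a: Set[Edge], edges_b: Set[Edge]) -> bool:
    """True if the first graph is a subgraph and the second has extra edges."""
    return edges_a < edges_b


def extra_edges(edges_a: Set[Edge], edges_b: Set[Edge]) -> Set[Edge]:
    """Edges of the second graph that are missing from the first."""
    return edges_b - edges_a


# Hypothetical toy graphs over the same nodes O1, O2, O3.
adg = {("O2", "O1"), ("O3", "O1")}
mdg = {("O2", "O1"), ("O3", "O1"), ("O3", "O2")}

print(is_subgraph(adg, mdg))         # ADG is contained in MDG
print(is_proper_subgraph(adg, mdg))  # MDG has at least one additional edge
print(extra_edges(adg, mdg))         # candidates for manual inspection
```

Listing the extra edges is a natural starting point for identifying which axioms cause the two graphs to differ.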
    We evaluated two more hypotheses: (A) Are larger import closures less likely to be
constructed modularly? We found that MIC and AIC tended to decline with larger import
closures, as indicated by correlation coefficients of ≈ −0.8 and ≈ −0.79 with p < 10^−10.
This effect could not be observed for MIR and AIR (≈ 0.06 at p < 0.7 and ≈ 0.04 at
p < 0.8). A reason might be the difference between the “global” nature of completeness
(considering dependencies between ontologies unrelated via the import structure) and
the “local” nature of relevance (applying only to an ontology and its direct imports).
Therefore, a more complex import closure may make completeness harder to ensure,
while having no effect on relevance. (B) Do “non-modular” ontologies tend to have both
a low relevance and a low completeness? We cannot confirm this hypothesis: there was
no significant correlation between MIC and MIR, or AIC and AIR.
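
As an illustration of the rank correlations used in this section, the following minimal sketch computes Spearman's coefficient as the Pearson correlation of average ranks, which also handles ties. It is not the code used for the evaluation, it does not compute p-values, and the data at the end are hypothetical toy values.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rank correlation coefficient of two equally long samples."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data: import closure sizes and completeness scores.
closure_sizes = [2, 3, 5, 9, 32]
completeness = [1.0, 0.9, 0.7, 0.4, 0.1]
rho = spearman(closure_sizes, completeness)
print(rho)
```

In the toy data the completeness score strictly decreases as the closure grows, so the coefficient is at its negative extreme; real data with ties and noise yield values strictly between the extremes.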


6   Conclusion and Future Work

With the MDG and the ADG we introduced two new views on the logical structure of a
modular ontology. Developers may find them helpful to investigate the logical dependen-
cies between imports in detail, while researchers may use the metrics based upon them
to analyse large ontology corpora similarly to what we did above. Nevertheless, there is
no precise general understanding of the terms “modularity” and “logical dependency”,
and our definitions capture only two of the possible variants.
    While we used a generalisation of local completeness, other modularity criteria
may be investigated using the same techniques; e.g., one might want to check whether
ontologies reuse all the imported knowledge. Moreover, since ontology developers
might not have control over the import structure of an imported foreign ontology,
it might make sense to evaluate certain import statements in a special way.
Such scenarios can be handled by refining our approach with labelled ographs.
Further questions for continuing this work in progress include: In which cases do the
MDG and ADG actually differ? How are the experimental results affected by using a
module notion that provides minimal modules, such as MEX [21]? Can our metrics be
used in an optimisation problem for automatically calculating a “good” import structure
of a given ontology with maximal values of some/all measures? The last one does not
seem easy, as further parameters are needed to avoid trivial cases, such as constructing
an import structure without import statements.
Acknowledgements. We thank the anonymous reviewers for the constructive comments.
References

 1. Armas Romero, A., Kaminski, M., Cuenca Grau, B., Horrocks, I.: Module extraction in
    expressive ontology languages via Datalog reasoning. J. of Artificial Intelligence Research
    55, 499–564 (2016)
 2. Bao, J., Voutsadakis, G., Slutzki, G., Honavar, V.: Package-based description logics. In:
    Stuckenschmidt et al. [42], pp. 349–371
 3. Chen, J., Ludwig, M., Walther, D.: Computing minimal subsumption modules of ontologies.
    In: Proc. of GCAI-18. EPiC Series in Computing, vol. 55, pp. 41–53. EasyChair (2018)
 4. Cuenca Grau, B., Parsia, B., Sirin, E.: Ontology integration using E-connections. In: Stucken-
    schmidt et al. [42], pp. 293–320
 5. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: A logical framework for modularity
    of ontologies. In: Proc. of IJCAI-07. pp. 298–303 (2007)
 6. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontologies: Theory
    and practice. J. of Artificial Intelligence Research 31(1), 273–318 (2008)
 7. Cuenca Grau, B., Parsia, B., Sirin, E., Kalyanpur, A.: Modularity and web ontologies. In:
    Proc. of KR-06. pp. 198–209. AAAI Press (2006)
 8. d’Aquin, M., Schlicht, A., Stuckenschmidt, H., Sabou, M.: Ontology modularization for
    knowledge selection: Experiments and evaluations. In: Proc. of DEXA-07. LNCS, vol. 4653,
    pp. 874–883. Springer (2007)
 9. d’Aquin, M., Schlicht, A., Stuckenschmidt, H., Sabou, M.: Criteria and evaluation for ontology
    modularization techniques. In: Stuckenschmidt et al. [42], pp. 67–89
10. Del Vescovo, C., Horridge, M., Parsia, B., Sattler, U., Schneider, T., Zhao, H.: Modular
    structures and atomic decomposition in ontologies. Manuscript, University of Bremen (2019),
    http://www.informatik.uni-bremen.de/~schneidt/dl2019/AD.pdf
11. Del Vescovo, C., Parsia, B., Sattler, U., Schneider, T.: The modular structure of an ontology:
    Atomic decomposition. In: Proc. of IJCAI-11. pp. 2232–2237 (2011)
12. Dijkstra, E.W.: On the role of scientific thought. In: Selected writings on Computing: A
    Personal Perspective, pp. 60–66. Springer (1982)
13. Ensan, F., Du, W.: A semantic metrics suite for evaluating modular ontologies. Inf. Syst. 38(5),
    745–770 (2013)
14. Gatens, W., Konev, B., Wolter, F.: Lower and upper approximations for depleting modules of
    description logic ontologies. In: Proc. of DL-14. CEUR, vol. 1193 (2014)
15. Ghilardi, S., Lutz, C., Wolter, F.: Did I damage my ontology? A case for conservative
    extensions in description logics. In: Proc. of KR-06. pp. 187–197. AAAI Press (2006)
16. Golbeck, J., Fragoso, G., Hartel, F., Hendler, J., Oberthaler, J., Parsia, B.: The National Cancer
    Institute’s thesaurus and ontology. J. of Web Semantics 1(1), 75–80 (2003)
17. Hitzler, P., Krötzsch, M., Parsia, B., Patel-Schneider, P.F., Rudolph, S. (eds.): OWL 2 Web
    Ontology Language: Primer. W3C Recommendation (27 October 2009), available at
    http://www.w3.org/TR/owl2-primer/
18. Horrocks, I., Kutz, O., Sattler, U.: The even more irresistible SROIQ. In: Proc. of KR-06.
    pp. 57–67. AAAI Press (2006)
19. Jiménez-Ruiz, E., Cuenca Grau, B., Sattler, U., Schneider, T., Berlanga Llavori, R.: Safe and
    economic re-use of ontologies: A logic-based methodology and tool support. In: Proc. of
    ESWC-08. LNCS, vol. 5021, pp. 185–199. Springer (2008)
20. Khan, Z.C., Keet, C.M.: Dependencies between modularity metrics towards improved modules.
    In: Proc. of EKAW-16. LNCS, vol. 10024, pp. 400–415 (2016)
21. Konev, B., Lutz, C., Walther, D., Wolter, F.: Semantic modularity and module extraction in
    description logics. In: Proc. of ECAI-08. pp. 55–59 (2008)
22. Konev, B., Lutz, C., Walther, D., Wolter, F.: Formal properties of modularization. In: Stucken-
    schmidt et al. [42], pp. 25–66
23. Kontchakov, R., Pulina, L., Sattler, U., Schneider, T., Selmer, P., Wolter, F., Zakharyaschev,
    M.: Minimal module extraction from DL-Lite ontologies using QBF solvers. In: Proc. of
    IJCAI-09. pp. 836–841 (2009)
24. Krötzsch, M., Simančik, F., Horrocks, I.: A description logic primer. CoRR abs/1201.4089
    (2012), http://arxiv.org/abs/1201.4089
25. Loebe, F.: Requirements for logical modules. In: Proc. of WoMO-06. CEUR, vol. 232 (2006)
26. Lutz, C., Walther, D., Wolter, F.: Conservative extensions in expressive description logics. In:
    Proc. of IJCAI-07. pp. 453–458 (2007)
27. Lutz, C., Wolter, F.: Deciding inseparability and conservative extensions in the description
    logic EL. J. of Symbolic Computation 45(2), 194–228 (2010)
28. Martín-Recuerda, F., Walther, D.: Fast modularisation and atomic decomposition of ontologies
    using axiom dependency hypergraphs. In: Proc. of ISWC-14, Part II. LNCS, vol. 8797, pp.
    49–64. Springer (2014)
29. Matentzoglu, N., Parsia, B.: BioPortal Snapshot 30 March 2017 (data set) (2017),
    http://doi.org/10.5281/zenodo.439510
30. Nolte, R.: Modules, Imports, Atoms: Structural Comparison for Ontologies. Bachelor thesis,
    University of Bremen (2017), in German
31. Nolte, R., Schneider, T.: Supplemental material, webpage with ograph and MDG of the
    NMOBR ontology at http://www.informatik.uni-bremen.de/~schneidt/dl2019
32. Nortjé, R., Britz, K., Meyer, T.: Reachability modules for the description logic SRIQ. In:
    Proc. of LPAR-19. LNCS, vol. 8312, pp. 636–652. Springer (2013)
33. Oh, S., Yeom, H.Y., Ahn, J.: Cohesion and coupling metrics for ontology modules. Information
    Technology and Management 12(2), 81–96 (2011)
34. Orme, A.M., Yao, H., Etzkorn, L.H.: Coupling metrics for ontology-based systems. IEEE
    Software 23(2), 102–108 (2006)
35. owl.cs Developer Team: The OWL API, GitHub repository
    https://owlcs.github.io/owlapi/
36. Page-Jones, M.: Fundamentals of Object-Oriented Design in UML. Addison-Wesley (1999)
37. Parsia, B., Schneider, T.: The modular structure of an ontology: an empirical study. In: Proc.
    of KR-10. pp. 584–586. AAAI Press (2010)
38. Sattler, U., Schneider, T., Zakharyaschev, M.: Which kind of module should I extract? In:
    Proc. of DL-09. CEUR, vol. 477 (2009)
39. Schlicht, A., Stuckenschmidt, H.: Towards structural criteria for ontology modularization. In:
    Proc. of WoMO-06. CEUR, vol. 232 (2006)
40. Serafini, L., Tamilin, A.: Composing modular ontologies with Distributed Description Logics.
    In: Stuckenschmidt et al. [42], pp. 321–347
41. Spackman, K.A., Campbell, K.E., Côté, R.A.: SNOMED RT: a reference terminology for
    health care. In: Proc. of 1st Amer. Medical Inform. Assoc. Annual Symposium (AMIA-97)
    (1997)
42. Stuckenschmidt, H., Parent, C., Spaccapietra, S. (eds.): Modular Ontologies: Concepts, Theo-
    ries and Techniques for Knowledge Modularization, LNCS, vol. 5445. Springer (2009)
43. Tartir, S., Arpinar, I.B., Moore, M., Sheth, A.P., Aleman-Meza, B.: OntoQA: Metric-based
    ontology quality analysis. In: Proc. of IEEE Workshop on Knowledge Acquisition from
    Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources. pp.
    45–53 (2005)
44. Tversky, A.: Features of similarity. Psychological Review 84(4), 327–352 (1977)
45. Yao, H., Orme, A.M., Etzkorn, L.H.: Cohesion metrics for ontology design and application. J.
    of Computer Science 1(1), 107–113 (2005)