=Paper=
{{Paper
|id=Vol-2373/paper-21
|storemode=property
|title=How Modular Are Modular Ontologies? Logic-Based Metrics for Ontologies with Imports
|pdfUrl=https://ceur-ws.org/Vol-2373/paper-21.pdf
|volume=Vol-2373
|authors=Robin Nolte,Thomas Schneider
|dblpUrl=https://dblp.org/rec/conf/dlog/NolteS19
}}
==How Modular Are Modular Ontologies? Logic-Based Metrics for Ontologies with Imports==
Robin Nolte and Thomas Schneider, Department of Computer Science, University of Bremen, Germany, {nolte,tschneider}@uni-bremen.de

===Abstract===
Many large ontologies are developed modularly, often using import statements, which are supported by the OWL standard. However, import statements do not provide logical guarantees such as local completeness, which is an established quality criterion for ontology modules: an ontology is locally complete if it uses terms from imported ontologies without changing the knowledge reused from them. To measure the extent to which ontologies separated by import statements are logically modular, we present four new quantitative logic-based metrics: two are strongly related to local completeness and based on module extraction, using some established module notion as a reference; the other two exploit the dependency relation of the atomic decomposition. We formally study the relationship between the measures and evaluate them on a set of ontologies.

===1 Introduction===
Modularity of ontologies has received much attention in the past decade, given the existence of large and comprehensive ontologies such as SNOMED CT [41] and NCI [16], and considering the observation that modular ontologies can be maintained, comprehended, and reasoned over more easily. There are several ways to develop an ontology modularly. The simplest one is certainly the distribution of the axioms over several files and the use of OWL’s import statements. More principled approaches include the use of (a-priori) extensions of DLs supporting modular development [2,4,40] or (a-posteriori) decomposition methods [7,11].
The simple approach based on import statements seems to be used frequently: out of the 438 ontologies in the NCBO BioPortal ontology repository [29], at least 69 are built modularly using imports; each of them imports (directly and indirectly) up to 31 other ontologies from within and outside the repository. For example, the Cell Ontology (CL) imports 8 ontologies, including the Gene Ontology (GO). The use of import statements allows developers not only to reuse an existing ontology in (several) other ontologies, but also to follow the design principle separation of concerns [12] or, in ontological terms, to separate (sub-)domains of interest. However, this separation does not provide any logical guarantees, e.g., GO does not need to be a module of CL in the strict logical sense that all knowledge about genes that follows from CL already follows from GO. In other words, CL might reuse the vocabulary “borrowed from” GO in a way that changes the knowledge in GO. More generally, if an ontology O imports a module M, then it is reasonable to require that O reuses the vocabulary from M in a “safe” way, in the sense that (a) O does not entail any knowledge about that vocabulary or (b) O does not entail anything new about that vocabulary (i.e., knowledge not already entailed by M). Guarantee (a) is known as safety [5,19] and (b) as local completeness [7,25]. Both are strongly related to encapsulation as known from software engineering [36]; moreover, (b) can be formalised using conservative extensions [15]. Alas, conservativity is undecidable already for fragments of the DL SROIQ [26] underlying OWL. Approximations are known; e.g., locality [6] is a sufficient condition for conservativity, and its syntactic variant can be computed in polynomial time.
Locality is the foundation for the successful family of locality-based modules [6]; it can and has been used to evaluate local completeness qualitatively for ontologies with imports [5,19], concluding that essentially all ontologies satisfy (b) from above [5]. In this paper we introduce four quantitative measures, aka metrics, for the quality of imports in ontologies or repositories. Our metrics are based on the general idea of determining how close imported modules are to being modules in a strict sense, i.e., determined by some “reference” notion of a module which itself guarantees local completeness (including, but not restricted to, locality-based modules). Hence they determine the extent to which an ontology with imports uses the vocabulary from the imported ontologies in a “safe” way. The first two metrics capture local completeness by determining the similarity of two graphs: one graph represents the import structure of an ontology or repository, i.e., the nodes are the imported subontologies, and the edges are induced by the imports; the other graph is a reference graph that represents a logical significance relation between importing and imported ontologies which is defined based on an arbitrary logically encapsulating notion of a module. The other two metrics use a reference graph defined via the atomic decomposition of an ontology [11], which constitutes a partition into subontologies that are atomic w.r.t. some underlying module notion, together with a dependency relation between those atoms. This way they determine the similarity of the import graph to an “ideal” graph that captures the logical dependencies within the ontology, which is different from measuring local completeness. We define the new metrics and evaluate them on a recent BioPortal snapshot [29]. 
We expect them to help ontology engineers assess whether and to which degree the import-induced modular structure of their ontology actually reflects the logical dependencies between the constituent ontologies, as represented by the underlying reference graphs. This paper is based on the first author’s bachelor thesis [30].

===2 Related Work===
There is a large amount of work introducing quality criteria for modules and evaluating modules against these criteria; informative overviews are given in [9,20]. Here, the term “module” is to be conceived in a broad sense, referring to ontologies that can be used in combination with other ontologies. These quality criteria can be divided into qualitative and quantitative ones (metrics): qualitative criteria can be either satisfied or violated, and metrics are satisfied to some degree. Some metrics have analogues in software engineering. Criteria are also grouped by the characterised or measured aspects:
* Logical criteria such as local correctness and local completeness [7,25]
* Structural criteria such as size and redundancy [39]
* Criteria transferred from software engineering to ontologies, based on or considering the semantics, such as cohesion and coupling [25,33,34,45]
* User- or developer-centric criteria such as comprehensibility [25], readability [43], or domain coverage [8]
Many of the metrics have been implemented in ontology assessment tools and evaluated empirically [13,20,43].

===3 Preliminaries===
====3.1 Graph Theory====
We use digraphs, i.e., directed graphs G = (V, E), where V is the (non-empty) set of nodes and E ⊆ V × V the set of edges. In the following, let G = (V, E) and G′ = (V′, E′) be digraphs. If V ⊆ V′ and E ⊆ E′, then G is called a subgraph of G′. We denote the digraph (V, E \ E′) by G \ G′.
To measure the similarity between two digraphs G and G′ that share the same nodes, we use the following (asymmetric) variant of the Tversky index [44] of their edge sets, which relates to the notion of specificity from test theory.

Definition 1. Given digraphs G = (V, E) and G′ = (V′, E′) with V = V′, the relative similarity of G with G′ is RSim(G, G′) := |E ∩ E′| / |E| if this term is defined, i.e., if E ≠ ∅. In case E = ∅, we set RSim(G, G′) := 0 if E′ ≠ ∅ and RSim(G, G′) := 1 if E′ = ∅.

As a consequence, if E = E′, then RSim(G, G′) = 1; if E ∩ E′ = ∅ and the two edge sets are not both empty, then RSim(G, G′) = 0.

====3.2 OWL and Import Structures====
We assume that the reader is familiar with OWL and the syntax and semantics of the underlying description logic SROIQ; for details see [17,18,24]. An ontology O is a finite set of general concept and role inclusions as well as concept and role assertions. Let NC be a set of concept names, NR a set of role names and NI a set of individual names. A signature is a set Σ ⊆ NC ∪ NR ∪ NI of terms. Given a concept, role, axiom, or ontology X, the set of terms occurring in X is called the signature of X, denoted X̃. OWL ontologies may contain import statements, which can be used transitively and even cyclically. The import closure of an ontology O is the union of O and all ontologies imported directly and indirectly by O. The import structure of O can be represented by a digraph whose nodes are the imported/importing ontologies and whose edges denote the import relation. More generally, an ograph is a digraph G whose nodes are ontologies.

Example 2. The ograph in Figure 1a consists of ontologies O1, ..., O5 and represents the situation where O1 imports O2 and O4, O2 imports O3, and O4 imports O5.

In general, an ograph is a means to denote some kind of (logical) significance relation between ontologies.
We assume that such relations are reflexive and transitive and thus will often work with the reflexive transitive closure of an ograph G = (V, E), denoted G∗ = (V, E∗) and depicted in Figure 1b for the ontologies from Example 2. Import-induced ographs as in Example 2 are a special case, representing the logical significance relation that is to be expected from the import (e.g., O2 should be significant for O1 but not vice versa). Furthermore, an ograph in general does not need to have a unique root or be connected. In this sense, the notion of an ograph even captures repositories of ontologies.

[Figure 1: (a) an exemplary ograph G over the ontologies O1, ..., O5 and (b) its reflexive transitive closure G∗]

For an ograph G = (V, E), let OG := ⋃V. If G represents the import structure of an ontology O, then OG is the import closure of O. Note that ontologies contained in an ograph may share symbols regardless of whether or not they are adjacent.

====3.3 Modules and Atomic Decomposition====
For an interpretation I and an ontology O, we write I |= O if I is a model of O, and denote the interpretation obtained by restricting I to the signature Σ with I|Σ. The central notion that is used to define a module is that of a conservative extension [15] or, more generally, of Σ-inseparability [22], which is defined as follows. Let O1 and O2 be ontologies and Σ a signature. O1 and O2 are model inseparable w.r.t. Σ, written O1 ≡^mCE_Σ O2, if {I|Σ | I |= O1} = {I|Σ | I |= O2}. O1 and O2 are deductively inseparable w.r.t. Σ, written O1 ≡^dCE_Σ O2, if, for all SROIQ-entailments η over Σ, we have O1 |= η if and only if O2 |= η. The equivalence relations ≡^R are defined upon the notion R ∈ {mCE, dCE}, which is called an inseparability relation. Other notions of inseparability relations can be defined, see, e.g., [22,38]; in this paper we will consider only R ∈ {mCE, dCE}. An inseparability relation R induces modules defined as follows [23].
Let M and O be ontologies with M ⊆ O and Σ a signature. We call M an RΣ-module of O if M ≡^R_Σ O, and a minimal RΣ-module of O if M, but no proper subset of M, is an RΣ-module of O. Stronger module notions such as self-contained and depleting RΣ-modules exist [23], but are not needed in the following. Since both notions of inseparability are undecidable already for DLs of moderate expressivity [26,27], extracting RΣ-modules is a computationally very hard problem. Still, there are several successful module extraction methods: some are restricted to fragments of SROIQ, such as the MEX/AMEX approach [14,21]; others are approximation methods, which guarantee that the output is an RΣ-module (often with additional useful properties), but that module is not necessarily minimal. These approaches include locality-based modules (LBMs) [6], reachability-based modules (RBMs) [28,32], modules based on Datalog reasoning [1], and minimal subsumption modules [3]. LBMs come in two flavours (semantic and syntactic), and three variants per flavour (⊥, ⊤, nested). We will refer to LBMs in some examples, but the precise definitions are not relevant for understanding those. Since the techniques developed in the following do not depend on a concrete module extraction approach, we use the general notation x-mod(·, ·), i.e., M = x-mod(Σ, O) denotes the module extracted for the signature Σ from the ontology O using approach x. Usually Σ is called the seed signature for M. More precisely, the module extraction function x-mod(·, ·) maps every pair (Σ, O) to a subset of O. In contrast to the extraction of a single module, there are techniques for decomposing an ontology into a collection of subontologies which, in some sense, represent all modules. Atomic decomposition (AD) [11] is one such technique. It partitions the input ontology O into a set of atoms and computes a dependency relation between them, also yielding a base for the set of all modules of O [10, Lemma 4.15].
AD can be used with any module notion x whose function x-mod(·, ·) satisfies certain properties, which are called (M0)–(M6) in [10]. LBMs and MEX modules satisfy all of them [10]. Let O be an ontology and F^x(O) = {x-mod(Σ, O) | Σ ⊆ Õ}. If x is clear from the context or we do not have a specific x in mind, we simply write F(O). Given two axioms α, β ∈ O, we write α ∼O β if, for all M ∈ F(O), we have α ∈ M iff β ∈ M. Obviously ∼O is an equivalence relation. The atoms of O are the equivalence classes of ∼O, i.e., maximal subsets of axioms that are not separated by any module. We denote atoms by a, b, ... and the set of atoms of O by A(O). It is immediate that A(O) is a partition of O into linearly many atoms, and every module M ∈ F(O) is a disjoint union of atoms. In contrast, not every atom needs to occur in some module, but this is only the case if O contains certain tautologies [10]. We assume the absence of such tautologies. Let a, b be atoms of O. We say that a depends on b and write a ≽ b if a ⊆ M implies b ⊆ M, for every M ∈ F(O). The relation ≽ is called the dependency relation of O; it is obviously a partial order. The atomic decomposition (AD) of O is the poset (A(O), ≻), where a ≻ b iff a ≽ b and a ≠ b. It can be represented using a Hasse diagram. Although an ontology can have exponentially many modules [37], its AD can always be computed using a linear number of module extractions: it suffices to compute the genuine modules of O, which are the Mα := mod(α̃, O) for all α ∈ O, and compute the atoms and the dependency relation from only the Mα [11]. This observation is based on several properties of the AD that follow from (M1)–(M5), including the following.

Lemma 3 ([11]). For all α ∈ a ∈ A(O), Mα is the smallest module of O containing a.

===4 Metrics for Assessing Imports===
Our aim is to develop metrics that assess whether an ontology O1 imports another ontology O2 in a reasonable way. There are many possible meanings of “reasonable”.
We focus on the following two: local completeness requires that O1 does not add new knowledge about the terms from O2; relevance requires that O2 adds to the knowledge in O1 about those terms. These two conditions are orthogonal to each other, and they should also hold when the ograph contains further ontologies, as we demonstrate in Example 4 below. We thus postulate a condition that is stricter than local completeness and call it completeness. A logically sound definition of “add knowledge” should arguably best be based on inseparability. Since the latter is hard to impossible to decide, approximations are needed and have been used already, e.g., locality as a qualitative approximation of local completeness [5]. Since locality is a sufficient condition for inseparability, it implies that the import is (locally) complete, but if locality is violated, we do not know and have to be cautious. Contrariwise, a necessary condition would be a useful approximation. In the following, we devise four metrics based on an arbitrary module notion that guarantees inseparability, but not necessarily minimality. By not committing to a particular module notion, we leave room for using better module notions that may be developed in the future. Each of our metrics assigns every given ograph G (e.g., import structure) a rational number between 0 and 1. This is done by comparing G with a reference ograph G′ that has the same nodes as G and whose edges denote the “reasonable” import relations between the ontologies from G. For the first two measures, we base our understanding of “reasonable” on the notion of significance, which we will define using inseparability and approximate using modules. The second two measures will use atoms from the atomic decomposition instead of modules, thus achieving an even stronger notion of reasonableness that abstracts away from the specific signature of the imported ontology.
The actual metrics will then be given by a standard notion of difference between the input and reference ographs.

In the following, let R be an arbitrary inseparability relation. We use ≡ as a shorthand for ≡^R. We assume that R is monotone [23], i.e., if O1 ⊆ O2 ⊆ O3 and O1 ≡Σ O3, then O1 ≡Σ O2. Furthermore, let x be an arbitrary module notion that yields unique RΣ-modules, i.e., for all O and Σ, we have that x-mod(Σ, O) is a uniquely determined subset of O with x-mod(Σ, O) ≡Σ O, which is guaranteed, e.g., by LBMs and MEX modules. In the following, we omit x where no confusion can arise.

====4.1 Module-Induced Modularity====
For our first two metrics, the edges of the reference ograph capture a variant of completeness between the respective nodes of G. To explain the underlying intuitions, we continue Example 2.

Example 4. Consider the ograph G from Example 2. Given that O2 (directly) imports O3, as represented by G’s edges, this import would be “safe” if O3 were locally complete with respect to O2, i.e., if the import into O2 did not change the meaning of the symbols in O3, that is, O2 ∪ O3 ≡Σ O3 with Σ = Õ3 (1). Similarly, since O1 imports the other four ontologies (directly or indirectly), local completeness would require that the meaning of the symbols in those is not changed, i.e., ⋃i=1,…,5 Oi ≡Σ ⋃i=2,…,5 Oi with Σ = ⋃i=2,…,5 Õi (2).

In general, (1) and (2) cannot be decided. They can be approximated using locality, as in Cuenca Grau et al.’s approach [5]. However, the authors applied their approach only to “top-level ontologies”, i.e., (2) would have been tested for OG, but not (1). Furthermore, our metrics should not rely on locality, as explained above. We will therefore measure statements such as (1) and (2) in a different way, using sufficient conditions, based on the above properties of the module notion mod:

Example 5.
In Example 4, the following are sufficient for (1) and (2): mod(Õ3, O2 ∪ O3) = O3 (1′) and mod(⋃i=2,…,5 Õi, ⋃i=1,…,5 Oi) = ⋃i=2,…,5 Oi (2′). Let x be the “top” version of syntactic locality, and let O1 = {A ⊔ B ⊑ C}, O2 = {D ⊑ B, A ⊑ E}, O3 = {F ⊑ A}, O4 = {B ⊑ ¬A} and O5 = ∅. (O5 might still contain non-logical axioms, such as annotations or declarations. This case does occur, e.g., in DC Terms, http://purl.org/dc/elements/1.1/.) Then both (1′) and (2′) hold. Note that we made expectation (1) implicitly based on the assumption that O1 and O4 are irrelevant for the local completeness of O2 ∪ O3 w.r.t. O3. However, if we take them into account too and extract the module from the whole ontology OG, then ⊤-mod(Õ3, OG) consists of O3 ∪ O4 plus the first axiom of O2. The overlap with O2 suggests that O2 does change the knowledge of the terms from O3 in the context of all of OG.

This last observation admits the following conclusions in view of our desired metrics: (1) it is not enough to consider local completeness; (2) for testing completeness and relevance, one needs to check edges as well as non-edges in an ograph. In order to accommodate these conclusions, we use the following notion as a basis for our metric.

Definition 6. Let Σ be a signature and O, O′ be ontologies such that O′ ⊆ O. O′ is Σ-significant in O iff O ≢Σ O \ O′.

Intuitively, a Σ-significant ontology O′ in O adds knowledge about Σ to O (relevance). Contrariwise, Σ-insignificance is similar to completeness but lets us specify a signature. Based on our considerations above and the notion of significance, we can put our expectation precisely: Given an ograph G = (V, E), for any two ontologies O1, O2 ∈ V we expect O1 to be Õ2-significant in OG iff (O1, O2) ∈ E∗. That is, O2 should import O1 directly or indirectly if, and only if, O1 adds knowledge about the terms in Õ2 to OG.
In particular, if O1 and O2 do not share terms, we would not expect any path between them in G; if they do share terms and both contain knowledge about those shared terms, we would expect paths both ways.

Example 7. In Example 5, O1 is, as expected, Õ2-, Õ3-, Õ4- and Õ5-insignificant in OG, but O4 is Õ2- and Õ3-significant. Analogously, O2 is Õ3- and Õ4-significant, and O3 is Õ4-significant. Note that O2, O3, O4 are all Õ2- and Õ5-significant but O5 is not.

Since Σ-significance is defined based on inseparability, which is undecidable already for DLs of moderate expressivity, we can only hope to find a sufficient condition for insignificance. Indeed, due to the above properties of modules, the following holds.

Lemma 8. Let Σ be a signature and O, O′ ontologies such that O′ ⊆ O. (1) If O′ ∩ mod(Σ, O) = ∅, then O′ is Σ-insignificant in O. (2) If O′ is Σ-insignificant in O, then there is some RΣ-module M of O with O′ ∩ M = ∅.

Proof. (1) Let M := mod(Σ, O). Then M ≡Σ O. With O′ ∩ M = ∅, i.e., M = M \ O′, we have M \ O′ ≡Σ O. By monotonicity of R, O \ O′ ≡Σ O. (2) Set M = O \ O′. □

The converse of Point (1) cannot be expected to hold since mod is not required to yield minimal RΣ-modules; therefore it had to be reformulated as (2). Based on Lemma 8, we can construct an ograph that approximates our expectation and can be calculated using only |V| module extractions:

Definition 9. Let G = (V, E) be an ograph. The module-induced dependency graph of G is the ograph MDG(G) := (V, E′) with E′ := {(O1, O2) | O1 ∩ mod(Õ2, OG) ≠ ∅}.

[Figure 2: (a) the MDG of the example ontology and (b) the visualisation of its MIC and MIR, showing the edges of MDG(G) \ G∗ and of G \ MDG(G)]

Note that MDG(G) is not a repair of G, but a representation of the modular dependencies given the partitioning of axioms induced by G.
Therefore, rather than restructuring import statements to make the import structure match the MDG, an ontology developer should consider moving axioms responsible for unintentional dependencies from one ontology to another. In this paper, we do not investigate repairs further.

There are now two ways to compare an ograph with its MDG, leading to two measures. For capturing completeness, we determine the number of edges that are in MDG (i.e., denote significances) but not in G, relative to the overall number of edges in MDG. Since we do not want to penalise non-transitive and non-reflexive imports, we have to consider the edges in MDG \ G∗. For capturing relevance, we determine the number of edges in G \ MDG relative to those in G. Here we have to use G rather than G∗, again to avoid penalising non-transitive and non-reflexive imports, as in the following situation.

Example 10. Assume that Oa directly imports Ob, and Ob directly imports Oc. Furthermore, Oa reuses only knowledge from M ⊆ Ob and no knowledge from Oc, while Ob \ M reuses knowledge from Oc. Hence both direct imports satisfy relevance, but the indirect import would not.

Example 11. Consider the ograph G from Examples 2–5. Figure 2a shows MDG(G), and Figure 2b shows MDG(G) \ G∗ (full arrows) and G \ MDG(G) (dashed arrows).

The actual metric is defined by dividing the size (number of edges) of one of the two differences above by the size of MDG(G) or G, respectively, and subtracting it from 1:

Definition 12. Let G = (V, E) be an ograph. We call
# MIC(G) := RSim(MDG(G), G∗) the module-induced completeness of G;
# MIR(G) := RSim(G, MDG(G)) the module-induced relevance of G.
For the situation in Example 11 and Figure 2, we obtain MIC(G) = 0.5 and MIR(G) = 0.75. The MIR and MIC values can be considered as an “aggregated” measure for the edge-wise similarity between the actual ograph and the reference MDG.
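The two ratios described above can be sketched directly on edge sets. The following is an illustrative sketch under stated assumptions, not the authors' code: the function names are invented, the three ontologies A, B, C and the reference edges are hypothetical, and edges are assumed to point from the imported to the importing ontology, following the reading that (O1, O2) ∈ E∗ means that O2 imports O1.

```python
def closure(nodes, edges):
    """Reflexive transitive closure of a digraph given as (source, target) pairs."""
    reach = set(edges) | {(v, v) for v in nodes}
    while True:
        new = {(a, d) for a, b in reach for c, d in reach if b == c} - reach
        if not new:
            return reach
        reach |= new

def share(edges, other):
    """Fraction of `edges` that also occur in `other`, i.e. 1 minus the
    relative size of the difference; empty edge sets are handled separately."""
    if not edges:
        return 1.0 if not other else 0.0
    return len(edges & other) / len(edges)

def completeness(nodes, imports, reference):
    """Share of reference edges covered by the closure of the import graph."""
    return share(reference, closure(nodes, imports))

def relevance(imports, reference):
    """Share of direct import edges confirmed by the reference graph."""
    return share(imports, reference)

# Hypothetical repository: B imports A, and C imports B.
nodes = {"A", "B", "C"}
imports = {("A", "B"), ("B", "C")}
# Hypothetical reference graph, e.g. an MDG computed elsewhere.
reference = {("A", "A"), ("B", "B"), ("C", "C"),
             ("A", "B"), ("A", "C"), ("C", "A")}

print(completeness(nodes, imports, reference))
print(relevance(imports, reference))
```

In this made-up example the reference edge (C, A) is not covered by the import closure, which lowers completeness, and the import edge (B, C) is not confirmed by the reference graph, which lowers relevance.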
In cases where they clearly differ from the “ideal” value 1, as in the example, ontology developers can use them as an indicator for reconsidering the import structure of their ontology if that structure was meant to capture logical dependencies. The precise numerical values are of minor interest; in particular, low values can be caused by few or many “structuring errors” and do not pinpoint the precise cause. However, we will make use of the quantitative nature of the MIR and MIC values in Section 5 when we empirically analyse the extent to which adherence to completeness and relevance depends on certain ontology properties. One might wonder whether the global nature of significance might cause our measures to count the same fault several times. This is not necessarily so; see Example 10.

====4.2 Atom-Induced Modularity====
We now additionally assume that mod satisfies (M0)–(M6) required by the AD [10]. We developed our previous two metrics by extending the existing notion of local completeness to significance, which is used to define the underlying MDG. We now focus on significance. Let O1, O2, OG be ontologies with O1 ∪ O2 ⊆ OG and O1 ∩ mod(Õ2, OG) ≠ ∅. Intuitively, O1 contains knowledge about the terms in Õ2 and should be considered when arguing about those. This intuitive criterion can be strengthened by abstracting away from the signature of O2: is there some signature Σ such that, whenever O2 contains knowledge about Σ, then so does O1? The AD allows us to verify that criterion without having to check every signature in OG. The dependency relation of the AD captures the exact same property, but between atoms: An atom a depends on an atom b (a ≽ b) if a ⊆ M implies b ⊆ M. Hence the new criterion can be approximated by checking whether or not every atom associated with O2 depends on some atom associated with O1. Since O2 may overlap with some atoms, we need to define “associated with” as follows.

Definition 13. Let O, O′ be ontologies such that O′ ⊆ O.
We call the set AC(O′, O) := {a ∈ A(O) | a ∩ O′ ≠ ∅} the atom cover of O′ in O. For ontologies O, O1, O2 with O1 ∪ O2 ⊆ O, we write AC(O2, O) ≽ AC(O1, O) iff there are a ∈ AC(O1, O) and b ∈ AC(O2, O) such that b ≽ a.

Since O′ ⊆ O and A(O) is a partitioning of O, the atom cover of O′ in O is the unique minimal cover of O′ by atoms of A(O) in the topological sense. It is easy to see that it can be computed in polynomial time modulo the AD. Note that AC(O2, OG) ≽ AC(O1, OG) as a logical dependency between O1 and O2 is orthogonal to completeness: being stronger than significance, it would at best yield a necessary condition for insignificance and thus cannot serve as a useful approximation for completeness (see above). Based on AC and ≽, we define the atom-induced counterpart of the MDG:

Definition 14. Let G = (V, E) be an ograph. The atom-induced dependency graph of G is the ograph ADG(G) := (V, E′) with E′ := {(O1, O2) | AC(O2, OG) ≽ AC(O1, OG)}.

Example 15. The ADG for the ograph G from Example 5, as shown in Figure 3a, is a proper subgraph of the MDG (Figure 2a). This is due to O5 and its atom cover being empty, while O4 is ∅-significant in OG.

We obtain the second two metrics analogously to MIC and MIR:

Definition 16. Let G = (V, E) be an ograph. We call
# AIC(G) := RSim(ADG(G), G∗) the atom-induced completeness of G;
# AIR(G) := RSim(G, ADG(G)) the atom-induced relevance of G.
For the situation in Figure 3, we obtain AIC(G) ≈ 0.62 and AIR(G) = 0.75.

[Figure 3: (a) the ADG of the example ontology and (b) the visualisation of its AIC and AIR, showing the edges of ADG(G) \ G∗ and of G \ ADG(G)]

====4.3 Relation between the Metrics====
The coincidence between ADG and MDG in the previous examples is not accidental:

Lemma 17. Let O1, O2, O be ontologies such that O1 ∪ O2 ⊆ O. If AC(O2, O) ≽ AC(O1, O), then O1 ∩ mod(Õ2, O) ≠ ∅.

Proof. Let AC(O2, O) ≽ AC(O1, O), i.e., b ≽ a for some a ∈ AC(O1, O) and b ∈ AC(O2, O).
Since b ∈ AC(O2, O) and by Definition 13, there is some β ∈ b ∩ O2. By Lemma 3, β ∈ mod(β̃, O). Since β̃ ⊆ Õ2 and mod is monotonic in the first argument (M3), we have β ∈ mod(Õ2, O) =: M. By the definition of atoms, we have b ⊆ M, and b ≽ a implies a ⊆ M. Since a ∈ AC(O1, O) and thus a ∩ O1 ≠ ∅, we have O1 ∩ M ≠ ∅. □

The following corollary is a direct result of Lemma 17:

Corollary 18. Let G be an ograph. ADG(G) is a subgraph of MDG(G).

As shown in Example 15, the MDG is, in general, not a subgraph of the ADG and therefore the converse of Lemma 17 does not hold.

===5 Implementation and Evaluation===
We implemented MIC, MIR, AIC and AIR based on the OWL API [35] implementation of atomic decomposition and ⊤⊥∗-locality-based module extraction. We then analysed the transitive and reflexive import closure of 438 ontologies in a recent snapshot of the NCBO BioPortal ontology repository [29]. While the snapshot also provides pre-gathered ontologies as single OWL XML files, we needed the import structure and therefore used the original files. This failed for 45 ontologies, e.g., because they referenced at least one file that was not available online any more. Furthermore, we excluded 321 ontologies without import statements and 24 ontologies that violated the OWL 2 DL standard, e.g., by using punning in a prohibited way. A further 3 ontologies timed out after 20 minutes when calculating their AD. This left us with 45 ontologies. Their import closures consist of 2 to 32 ontologies each, adding up to 263. Since some ontologies were imported several times (e.g., IAO Metadata 10×), the number of unique ontologies was 211. These multiple occurrences may have distorted our results. We found that the median MIC and AIC was ≈ 0.75 and the median MIR and AIR was ≈ 0.89, with a standard deviation of ≈ 0.28 and ≈ 0.22, respectively. These medians cannot be compared directly since MIC/AIC are defined differently from MIR/AIR.
18 ontologies achieved an MIC and AIC of 1, i.e., they use imports in a “safe” way. Note that none of them contained more than four ontologies in their import closure, with PEAO having the largest one. 21 ontologies had an MIR and AIR of 1, with COGPO having the largest import closure (size 9). The NMOBR ontology with the largest import closure (32) had both the lowest MIC and AIC, ≈ 0.09, see [31]. The DC ontology was scored with the lowest MIR and AIR, both having the value ≈ 0.22. Given Lemma 17 we were not surprised to observe that ADG ⊆ MDG for all tested ontologies. In addition, the Spearman’s rank correlation coefficient of MIC and AIC was as high as ≈ 0.997 with p < 10^−49 and that of MIR and AIR was ≈ 0.98 with p < 10^−32. However, in only ten cases the ADG was a proper subgraph of the MDG. In six cases, this was due to some axioms being non-local w.r.t. the empty signature (see Example 15). We were unable to identify the reason for the remaining four ontologies because the number of axioms in their import closure made both manual and automatic analysis infeasible.

We evaluated two more hypotheses: (A) Are larger import closures less likely to be constructed modularly? We found that MIC and AIC tended to decline with larger import closures, indicated by the correlation coefficients of ≈ −0.8 and ≈ −0.79 with p < 10^−10. This effect could not be observed for MIR and AIR (≈ 0.06 at p < 0.7 and ≈ 0.04 at p < 0.8). A reason might be the difference between the “global” nature of completeness (considering dependencies between ontologies unrelated via the import structure) and the “local” nature of relevance (applying only to an ontology and its direct imports). Therefore, a more complex import closure may make completeness harder to ensure, while having no effect on relevance. (B) Do “non-modular” ontologies tend to have both a low relevance and a low completeness?
We cannot confirm this hypothesis: there was no significant correlation between MIC and MIR, or between AIC and AIR.

6 Conclusion and Future Work

With the MDG and the ADG we introduced two new views on the logical structure of a modular ontology. Developers may find them helpful to investigate the logical dependencies between imports in detail, while researchers may use the metrics based upon them to analyse large ontology corpora similarly to what we did above.

Nevertheless, there is no precise general understanding of the terms “modularity” and “logical dependency”, and our definitions capture only two of the possible variants. While we used a generalisation of local completeness, other modularity criteria may be investigated using the same techniques, e.g., one might want to check whether ontologies reuse all the imported knowledge. Moreover, since ontology developers might not have control over the import structure of an imported foreign ontology, it might make sense to evaluate certain import statements in a special way. Such scenarios can be taken care of by refining our approach with labelled ographs.

Further questions for continuing this work in progress include: In which cases do the MDG and ADG actually differ? How are the experimental results affected by using a module notion that provides minimal modules, such as MEX [21]? Can our metrics be used in an optimisation problem for automatically calculating a “good” import structure of a given ontology with maximal values of some or all measures? The last question does not seem easy, as further parameters are needed to avoid trivial cases, such as constructing an import structure without import statements.

Acknowledgements. We thank the anonymous reviewers for their constructive comments.

References

1. Armas Romero, A., Kaminski, M., Cuenca Grau, B., Horrocks, I.: Module extraction in expressive ontology languages via Datalog reasoning. J. of Artificial Intelligence Research 55, 499–564 (2016)
2. Bao, J., Voutsadakis, G., Slutzki, G., Honavar, V.: Package-based description logics. In: Stuckenschmidt et al. [42], pp. 349–371
3. Chen, J., Ludwig, M., Walther, D.: Computing minimal subsumption modules of ontologies. In: Proc. of GCAI-18. EPiC Series in Computing, vol. 55, pp. 41–53. EasyChair (2018)
4. Cuenca Grau, B., Parsia, B., Sirin, E.: Ontology integration using E-connections. In: Stuckenschmidt et al. [42], pp. 293–320
5. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: A logical framework for modularity of ontologies. In: Proc. of IJCAI-07. pp. 298–303 (2007)
6. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontologies: Theory and practice. J. of Artificial Intelligence Research 31(1), 273–318 (2008)
7. Cuenca Grau, B., Parsia, B., Sirin, E., Kalyanpur, A.: Modularity and web ontologies. In: Proc. of KR-06. pp. 198–209. AAAI Press (2006)
8. d’Aquin, M., Schlicht, A., Stuckenschmidt, H., Sabou, M.: Ontology modularization for knowledge selection: Experiments and evaluations. In: Proc. of DEXA-07. LNCS, vol. 4653, pp. 874–883. Springer (2007)
9. d’Aquin, M., Schlicht, A., Stuckenschmidt, H., Sabou, M.: Criteria and evaluation for ontology modularization techniques. In: Stuckenschmidt et al. [42], pp. 67–89
10. Del Vescovo, C., Horridge, M., Parsia, B., Sattler, U., Schneider, T., Zhao, H.: Modular structures and atomic decomposition in ontologies. Manuscript, University of Bremen (2019), http://www.informatik.uni-bremen.de/~schneidt/dl2019/AD.pdf
11. Del Vescovo, C., Parsia, B., Sattler, U., Schneider, T.: The modular structure of an ontology: Atomic decomposition. In: Proc. of IJCAI-11. pp. 2232–2237 (2011)
12. Dijkstra, E.W.: On the role of scientific thought. In: Selected writings on Computing: A Personal Perspective, pp. 60–66. Springer (1982)
13. Ensan, F., Du, W.: A semantic metrics suite for evaluating modular ontologies. Inf. Syst. 38(5), 745–770 (2013)
14. Gatens, W., Konev, B., Wolter, F.: Lower and upper approximations for depleting modules of description logic ontologies. In: Proc. of DL-14. CEUR, vol. 1193 (2014)
15. Ghilardi, S., Lutz, C., Wolter, F.: Did I damage my ontology? A case for conservative extensions in description logics. In: Proc. of KR-06. pp. 187–197. AAAI Press (2006)
16. Golbeck, J., Fragoso, G., Hartel, F., Hendler, J., Oberthaler, J., Parsia, B.: The National Cancer Institute’s thesaurus and ontology. J. of Web Semantics 1(1), 75–80 (2003)
17. Hitzler, P., Krötzsch, M., Parsia, B., Patel-Schneider, P.F., Rudolph, S. (eds.): OWL 2 Web Ontology Language: Primer. W3C Recommendation (27 October 2009), available at http://www.w3.org/TR/owl2-primer/
18. Horrocks, I., Kutz, O., Sattler, U.: The even more irresistible SROIQ. In: Proc. of KR-06. pp. 57–67. AAAI Press (2006)
19. Jiménez-Ruiz, E., Cuenca Grau, B., Sattler, U., Schneider, T., Berlanga Llavori, R.: Safe and economic re-use of ontologies: A logic-based methodology and tool support. In: Proc. of ESWC-08. LNCS, vol. 5021, pp. 185–199. Springer (2008)
20. Khan, Z.C., Keet, C.M.: Dependencies between modularity metrics towards improved modules. In: Proc. of EKAW-16. LNCS, vol. 10024, pp. 400–415 (2016)
21. Konev, B., Lutz, C., Walther, D., Wolter, F.: Semantic modularity and module extraction in description logics. In: Proc. of ECAI-08. pp. 55–59 (2008)
22. Konev, B., Lutz, C., Walther, D., Wolter, F.: Formal properties of modularization. In: Stuckenschmidt et al. [42], pp. 25–66
23. Kontchakov, R., Pulina, L., Sattler, U., Schneider, T., Selmer, P., Wolter, F., Zakharyaschev, M.: Minimal module extraction from DL-Lite ontologies using QBF solvers. In: Proc. of IJCAI-09. pp. 836–841 (2009)
24. Krötzsch, M., Simančik, F., Horrocks, I.: A description logic primer. CoRR abs/1201.4089 (2012), http://arxiv.org/abs/1201.4089
25. Loebe, F.: Requirements for logical modules. In: Proc. of WoMO-06. CEUR, vol. 232 (2006)
26. Lutz, C., Walther, D., Wolter, F.: Conservative extensions in expressive description logics. In: Proc. of IJCAI-07. pp. 453–458 (2007)
27. Lutz, C., Wolter, F.: Deciding inseparability and conservative extensions in the description logic EL. J. of Symbolic Computation 45(2), 194–228 (2010)
28. Martín-Recuerda, F., Walther, D.: Fast modularisation and atomic decomposition of ontologies using axiom dependency hypergraphs. In: Proc. of ISWC-14, Part II. LNCS, vol. 8797, pp. 49–64. Springer (2014)
29. Matentzoglu, N., Parsia, B.: BioPortal Snapshot 30 March 2017 (data set) (2017), http://doi.org/10.5281/zenodo.439510
30. Nolte, R.: Modules, Imports, Atoms: Structural Comparison for Ontologies. Bachelor thesis, University of Bremen (2017), in German
31. Nolte, R., Schneider, T.: Supplemental material, webpage with ograph and MDG of the NMOBR ontology at http://www.informatik.uni-bremen.de/~schneidt/dl2019
32. Nortjé, R., Britz, K., Meyer, T.: Reachability modules for the description logic SRIQ. In: Proc. of LPAR-19. LNCS, vol. 8312, pp. 636–652. Springer (2013)
33. Oh, S., Yeom, H.Y., Ahn, J.: Cohesion and coupling metrics for ontology modules. Information Technology and Management 12(2), 81–96 (2011)
34. Orme, A.M., Yao, H., Etzkorn, L.H.: Coupling metrics for ontology-based systems. IEEE Software 23(2), 102–108 (2006)
35. owl.cs Developer Team: The OWL API, GitHub repository https://owlcs.github.io/owlapi/
36. Page-Jones, M.: Fundamentals of Object-Oriented Design in UML. Addison-Wesley (1999)
37. Parsia, B., Schneider, T.: The modular structure of an ontology: an empirical study. In: Proc. of KR-10. pp. 584–586. AAAI Press (2010)
38. Sattler, U., Schneider, T., Zakharyaschev, M.: Which kind of module should I extract? In: Proc. of DL-09. CEUR, vol. 477 (2009)
39. Schlicht, A., Stuckenschmidt, H.: Towards structural criteria for ontology modularization. In: Proc. of WoMO-06. CEUR, vol. 232 (2006)
40. Serafini, L., Tamilin, A.: Composing modular ontologies with Distributed Description Logics. In: Stuckenschmidt et al. [42], pp. 321–347
41. Spackman, K.A., Campbell, K.E., Côté, R.A.: SNOMED RT: a reference terminology for health care. In: Proc. of 1st Amer. Medical Inform. Assoc. Annual Symposium (AMIA-97) (1997)
42. Stuckenschmidt, H., Parent, C., Spaccapietra, S. (eds.): Modular Ontologies: Concepts, Theories and Techniques for Knowledge Modularization, LNCS, vol. 5445. Springer (2009)
43. Tartir, S., Arpinar, I.B., Moore, M., Sheth, A.P., Aleman-Meza, B.: OntoQA: Metric-based ontology quality analysis. In: Proc. of IEEE Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources. pp. 45–53 (2005)
44. Tversky, A.: Features of similarity. Psychological Review 84(4), 327–352 (1977)
45. Yao, H., Orme, A.M., Etzkorn, L.H.: Cohesion metrics for ontology design and application. J. of Computer Science 1(1), 107–113 (2005)