=Paper=
{{Paper
|id=Vol-2211/paper-19
|storemode=property
|title=Ontology Partitioning Using E-Connections Revisited
|pdfUrl=https://ceur-ws.org/Vol-2211/paper-19.pdf
|volume=Vol-2211
|authors=Sascha Jongebloed,Thomas Schneider
|dblpUrl=https://dblp.org/rec/conf/dlog/Jongebloed018
}}
==Ontology Partitioning Using E-Connections Revisited==
Ontology Partitioning Using E-Connections Revisited Sascha Jongebloed and Thomas Schneider? Department of Computer Science, University of Bremen, Germany {sasjonge,ts}@cs.uni-bremen.de Abstract. We revisit the technique for partitioning ontologies using E-connections introduced by Cuenca Grau et al. (2006). We extend the underlying notions of E-connections and partitionings to the latest OWL 2 ontology language, with the exception of the universal role, which cannot be accommodated for reasons that we explain. Besides presenting a simplified notation for the theoretical foundations, we devise a new algorithm for computing partitions which, in contrast to the origi- nal quadratic-time algorithm, can be implemented in linear time, is deterministic, and admits simple rigorous correctness and maximality proofs. We further discuss possible extensions beyond OWL. 1 Introduction Modularity is an important paradigm for ontology development and use, which has received much attention in the past decade. Modular ontologies are usually easier to maintain, comprehend, and reason over, and extracting a module of a given ontology is an important task in scenarios where only a part of the knowledge in that ontology is to be reused efficiently. If an ontology is given as a monolithic entity, the task of decomposing it into modules is the first step towards making it modular and being able to enjoy the previously mentioned benefits. Partitionings based on E-connections (henceforth: E-partitions) have been developed for the purpose of automatically and efficiently decomposing an ontology [6,9,10]. The result is a graph whose nodes are (pairwise disjoint and mutually covering) components (i.e., subsets) of the ontology, and whose edges represent “semantic links” between the components in the style of E-connections [20]. The adoption of the E-connection frame- work ensures that the resulting partitions provide strong logical guarantees; thus (certain combinations of) the components are encapsulating modules of the input ontology [10]. E-Connections were defined for abstract description systems (ADSs), a notion that generalises many DLs, modal logics, and further formalisms [3]. An E-connection is a combination of heterogeneous logical theories via semantic links established by a desig- nated set of relations, called link relations. Their semantics is given by interpretations that consist of pairwise disjoint components and which interpret each component locally and the link relations as relations between the respective components. In contrast to the general nature of E-connections, E-partitions and the underlying framework have been defined specifically for the DL SHOIQ(D), a fragment of the latest OWL 2 ontology language. The partitioning procedure starts from a monolithic ontology O and attempts ? Supported by DFG project SCHN 1234/3. to turn it into an (as fine as possible) E-connection by identifying link relations among the roles in O. An efficient algorithm for computing E-partitions is given in [6,9,10]: is traverses the subconcepts of the input ontology, and assigns to the current concept either a fresh component (if this is witnessed by some freshly identified link relation) or an existing component (when necessary for conformance with the definition of E- connections). When all concepts are dealt with, the assignment of components induces a distribution of the ontology’s axioms over the components, constituting the partition. To ensure that the resulting E-partition and the input ontology are equivalent under the E-connection semantics, an additional condition has to be imposed on the input ontology, which was called safety in [6,9,10] and coincides with domain-independence (DI) known from first-order logic and database theory [1]. Contrary to DI in first-order logic, DI for SHOIQ(D) is decidable via a neat syntactic criterion called locality [10]. The partitioning algorithm was implemented as an experimental feature of the (dis- continued) ontology editor Swoop.1 In initial experiments [10], some ontologies admitted useful E-partitions, and some of those revealed modelling deficiencies. However, some modelling patterns – e.g., the use of few top-level concepts or exhaustive declaration of disjoint concepts or distinct individuals – are notoriously problematic: they admit only coarse E-partitions, which follows directly from the definitions underlying E-connections. This may be one of the reasons for the limited success of E-partitions. Another reason certainly lies in the preliminary nature of the existing implementation: it is incomplete (not all input axioms are preserved), incorrect (sometimes axioms are spuriously assigned to distinct components), and nondeterministic (we sometimes obtained different results, re-running the algorithm on the same input). Hence some observed “poor” E-partitions might be due to a bug in the code rather than a “feature” of the general approach. When trying to separate bugs from “features”, we found a considerably simpler way of computing E-partitions, based on the idea of creating an undirected graph G whose edges connect concepts and/or roles that must be part of the same component, and reading the minimal E-partition off G’s connected components. We were able to extend the underlying framework to most of OWL 2, with the only exception of the universal role u. This exception is unavoidable: with u, locality no longer characterises domain- independence, thus destroying the guarantee that the resulting E-partition and the input ontology are equivalent. We consider this exception insignificant because u “typically plays a minor role in modelling” [19], confirmed by a large corpus of ontologies.2 Contributions and overview. Our new approach offers the following advantages over the original one: – Ready applicability to the latest OWL 2 standard, under two restrictions ensuring equivalence (domain-independence, absence of the universal role) – A simplified notation of the theoretical foundations – A new, simpler partitioning algorithm which is deterministic – i.e., does not depend on the traversal order of concepts and axioms – and can be implemented to run in linear time (the original one is quadratic [10]) – Simple rigorous proofs that the algorithm is correct and outputs the maximal E- connection that is equivalent to the input ontology under the E-connection semantics 1 https://github.com/ronwalf/swoop 2 In last year’s BioPortal corpus [21], u occurs in only 322 of all 16.3 million axioms. Given these advantages, we expect that the algorithm is easy to implement and to evaluate on an up-to-date ontology corpus. Hence, the work reported here is in progress. We briefly introduce OWL (§2), define our extension of E-connections and E-partitions to SROIQ (§3), present the new algorithm (§4), discuss extensions to OWL and beyond (§5), and conclude in §6. Implementation and tests are subject to future work. Due to space restrictions, we provide most proofs and further examples in a technical report: http://www.informatik.uni-bremen.de/tdki/research/papers.html Related work. Numerous module extraction and modularisation approaches are known [18,30,7,8,11,29,12], as well as ontology languages supporting modular development [5,27]. Several decomposition approaches have been developed, some of which provide strong logical guarantees (such as encapsulation), e.g., signature decomposition [17], partitionings based on E-connections [10,8], and atomic decomposition [11]; and others are rather based on statistical parameters, such as structure-based partitioning [29,2]. 2 Preliminaries The ontology language OWL is based on the DL SROIQ [15]. OWL additionally provides datatypes [23] and keys [25], whose treatment we defer to Section 5. Here and in Section 3, we consider SROIQ without the universal role (see above). We give only a short overview of the syntax and refer to [19] for semantics and further details. We denote concept names with A, B, . . . , complex concepts with C, D, . . . , role names with r, s, . . . , complex roles (role names r or inverses r− ) with R, S , . . . , individual names with a, b, . . . , and axioms with α, β, . . . . Concepts and axioms are built according to the following grammars, where m ∈ N and the remaining letters are as explained above. C ::= A | ¬C | C u D | >m R.C | ∃R.Self | {a} α ::= C v D | C ≡ D | R v S | R ≡ S | Disjoint(R, S ) | R1 ◦ R2 v S | C(a) | R(a, b) | a ≈ b | a 0 b, The remaining operators >, ⊥, t, ∃R, ∀R, 6n R and axiom types (role transitivity, symme- try, etc.) are “syntactic sugar”; hence we do not treat them explicitly, thus simplifying the presentation without losing generality. We furthermore ignore OWL’s global restrictions (regularity, restricted use of non-simple roles) as E-connections and partitionings do not rely on them. An ontology is a set of axioms. 3 E-Connections and Partitionings for SROIQ The definitions in this section largely follow those in [6,9]; however, we simplify notation to allow, as we believe, a more concise presentation. The differences and their motivation are explained at the end of this section. In the following, we work with a fixed signature, which is a finite set Σ of concept names, role names, and individual names (together: terms), i.e., Σ = ΣC ] ΣR ] ΣI . Here and in the following, ] denotes disjoint union. Given a natural number n ≥ 1, an n-numbering of Σ is a function ν that assigns to each concept and individual name a number ν(A), ν(a) ∈ {1, . . . , n} and to each role name a pair ν(r) ∈ {1, . . . , n} × {1, . . . , n} of numbers. For what follows, the order between the numbers is irrelevant; an arbitrary finite index set I instead of {1, . . . , n} would suffice. Numberings are extended to arbitrary concepts and axioms inductively: Definition 1. Let ν be an n-numbering of Σ. – ν is extended to arbitrary roles R and concepts C as follows. 1. If R = r, then ν(R) = ν(r). 2. If R = r− , and ν(r) = (i, j), then ν(R) = ( j, i). 3. If C = ¬D, then ν(C) = ν(D). 4. If C = D u E and ν(D) = ν(E), then ν(C) = ν(D). 5. If C = >m R.D and ν(R) = (i, j) and ν(D) = j, then ν(C) = i. 6. If C = ∃R.Self and ν(R) = (i, i), then ν(C) = i. 7. If C = {a}, then ν(C) = ν(a). A concept C is called an i-concept if ν(C) is defined3 and ν(C) = i. – ν is further extended to axioms as follows. An i-axiom is of the form 8. C v D or C ≡ D where ν(C) = ν(D) = i; 9. R v S , R ≡ S , or Disjoint(R, S ) where ν(R) = ν(S ) = (i, j); 10. R1 ◦ R2 v S where ν(R1 ) = (i, j), ν(R2 ) = ( j, k), and ν(S ) = (i, k); 11. C(a) where ν(C) = ν(a) = i; 12. R(a, b) where ν(a) = i4 and ν(R) = (ν(a), ν(b)); 13. a ≈ b or a 0 b where ν(a) = ν(b) = i. The central notion of an ontology consisting of several parts depends on ν: Definition 2. Given an n-numbering ν, a ν-ontology is a tuple O = (O1 , . . . , On ), each of whose components Oi is a nonempty set of i-axioms. To distinguish (monolithic) ontologies O from (combined) ν-ontologies O, we sometimes call the former simple ontologies. The semantics of ν-ontologies is given by ν-interpretations, which are partitioned according to the numbering of the signature given by ν: Definition 3. A ν-interpretation is an interpretation (∆I , ·I,ν ) such that – ∆I = i≤n ∆Ii with ∆i , ∅ for each component ∆i U – A ⊆ ∆Ii for all A ∈ ΣC with ν(A) = i I,ν – rI,ν ⊆ ∆Ii × ∆Ij for all r ∈ ΣR with ν(r) = (i, j) – aI,ν ∈ ∆Ii for all a ∈ ΣI with ν(a) = i ν-interpretations are used for two purposes: (a) for defining the semantics of simple ontologies given ν, and (b) for defining the semantics of ν-ontologies, which is the usual semantics of E-connections [20]. In case (a), we use the standard DL semantics and the usual satisfaction symbol |=. We say that O has a ν-model if there is a ν-interpretation I |= O. The class of ν-models of O is contained in the class of its (unrestricted) models. For example, if O contains > v A, then all ν-models have a single component; thus, if ν is an n-numbering with n ≥ 2, then O has no ν-models but may still be consistent (i.e., have unrestricted models). For purpose (b), the interpretation function is extended ensuring that the extension of arbitrary i-concepts is a subset of ∆Ii . For this purpose, (only) the case of negation has to differ from the standard semantics. 3 ν(C) is undefined, e.g., for concepts ∃r.D with ν(r) = (i, j) and ν(D) , j. 4 This choice is arbitrary; it could also be “ν(b) = i”. Analogously for item 10. Definition 4. The interpretation function of a ν-interpretation I is extended to arbitrary ν-concepts as follows. – (¬C)I,ν = ∆Ii \ C I,ν for i-concepts C – (C u D)I,ν = C I,ν ∩ DI,ν – (>m R.C)I,ν = {d ∈ ∆Ii | #{e ∈ ∆Ij | (d, e) ∈ RI and e ∈ C I,ν } ≥ m} for (i, j)-roles R and j-concepts C – (∃R.Self)I,ν = {d ∈ ∆Ii | (d, d) ∈ RI } for (i, i)-roles R – {a}I,ν = aI,ν Satisfaction of i-axioms in ν-interpretations is defined as follows. – I |=ν C v D if C I,ν ⊆ DI,ν – I |=ν C ≡ D if C I,ν = DI,ν – I |=ν C(a) if aI ∈ C I,ν – For i-axioms α of all other types (role inclusion, equivalence, disjointness, assertion, or individual (in)equality), I |=ν α if I |= α. I is a model of a ν-ontology O, written I |=ν O, if I |=ν α for all axioms α in O. O is consistent if it has a model. The correspondence between simple and ν-ontologies is captured by compatibility and equivalence. The former notion is syntactic and only requires that the components of O partition O. The latter is semantic and relative to ν-interpretations. Definition 5. Let O be an ontology, ν an n-numbering, and O = (O1 , . . . , On ) a ν- ontology. 1. O and O are compatible, written O ∼ O, if O = i≤n Oi . S 2. O and O are equivalent, written O ≈ O if, for all ν-interpretations I, it holds that I |= O iff I |=ν O. Unsurprisingly, compatibility does not imply equivalence, and neither does the converse hold. The latter is trivial: if O = {A v B} and O = {¬B v ¬A}, then O ≈ O but O / O. For the other direction, take O = {¬A v A0 , B v B0 } and O = ({¬A v A0 }, {B v B0 }). Then O is well-formed according to Definitions 1 and 2 and O ∼ O, but O 0 O because O has ν-models I with 2 components but no such I is a model of O. The reason is that the axiom ¬A v A0 cannot be satisfied if the extension of both A and A0 is restricted to only one component, as required by Definition 3. As a consequence of this observation, an additional assumption has to be made, which is the DL equivalent of domain-independence known from the first-order and database worlds [1]. For (most of) SROIQ, this assumption admits an efficiently decidable syntactic characterisation. For domain-independent ontologies, compatibility implies equivalence; see Theorem 9. Definition 6. A concept C (axiom α) is domain-independent (DI) if C I = C J (I |= α iff J |= α) for all interpretations I, J with X I = X J for all terms X. An ontology is DI if so are all its axioms. The syntactic characterisation from [6,9,10] is the following. Definition 7. The set of local concepts is defined inductively as follows. – Every concept name is local. – ¬C is local if C is not. – C u D is local if C or D is local. – >m R.C, ∃R.Self, and {a} are local. An ontology O is safe if the following hold. – For all axioms C v D, if D is local, then so is C. – For all axioms C ≡ D, C is local iff so is D. Given an interpretation I and a set S , we write J = I ] S if ∆J = ∆I ] S and X J = X I for all terms X. Theorem 8 ([6,9,10]). For all concepts C and ontologies O: 1. C is DI iff C is local. 2. If C is DI, then C J = C I for all I and J = I ] S . 3. If C is not DI, then C J = C I ∪ S for all I and J = I ] S . 4. O is DI iff O is safe. Not only does Theorem 8 provide a syntactic characterisation of DI; with Definition 7 it also yields a linear-time decision procedure. Here it becomes clear why the universal role u (with the semantics uI = ∆I × ∆I ) cannot be accommodated by the framework: the concept C = ∃u.A is not DI but violates point 3 in Theorem 8: there are interpretations I and J = I ] S with C J = C I ∪ S (e.g., ∆I = AI = {d} and S = {e}); but if AI = ∅ then C J = C I = ∅. However, point 3 is an essential ingredient not just in the syntactic characterisation witnessing decidability of DI, but also in the following theorem linking compatibility and equivalence. Its proof completes the proof of [6, Theorem 7.2]. Theorem 9. Let O be an ontology, ν an n-numbering, O = (O1 , . . . , On ) a ν-ontology. 1. (a) If O is DI and O ∼ O, then O ≈ O. (b) If additionally O is consistent, then so is O. 2. If O is consistent and not DI, and O ∼ O and O ≈ O, then n = 1, i.e., O = O. Main differences in notation compared to [6,9,10]. n-numbered signatures correspond to “partitioned vocabularies” in [6,9]. We consider it easier to carry along the parameter ν instead of a partitioned vocabulary with its rather long denotation. Numberings also extend more naturally to complex concepts and axioms. Consequently, “combined knowledge bases” (a special case of E-connections) are now called ν-ontologies. ν-interpretations conflate “partitioned interpretations” and “combined interpreta- tions”, making “corresponding interpretations” redundant. Consequently we distinguish two semantics: I |= O (the standard semantics for simple ontologies) and I |=ν O (the E-connection semantics for ν-ontologies). Equivalence replaces “semantic compatibility” because it indeed captures equiva- lence under the class of ν-interpretations. As per the usual understanding of equivalence, O is not required to have (ν-)models. The implication “if O consistent, then also O” (under certain assumptions) is now captured by Theorem 9. As a result we reduced “syntactically compatible” to “compatible”. Finally, domain-independence replaces “invariance under domain expansions”. We decided to reflect the relationship with the fundamental first-order notion. 4 The New Partitioning Algorithm We now present the partitioning algorithm, based on the preceding definitions. As in [6,9,10], our algorithm receives as input an ontology O and returns a ν-ontology O = (O1 , . . . , On ) such that O ∼ O and n is maximal with this property. If O is domain- independent, O ≈ O follows by Theorem 9. The numbering ν is not computed by the algorithm but is implicit in the created data structure. Our algorithm determines n, the Oi , and the implicit ν by “gathering” constraints between the ν-values of the concepts and roles occurring in O in an undirected graph G. Let sub(O) be the set of concepts (atomic or complex) occurring in O. The graph G has one node for each concept in sub(O) or individual occurring in O, and two nodes r0 , r1 per role in O. The edges represent the constraints imposed by Definition 1. For example, given the axioms α = A v ∃r.B and β = B v B0 , then G has nodes A, ∃r.B, r0 , r1 , B, B0 , and α, β induce edges {A, ∃r.B}, {∃r.B, r0 }, {r1 , B}, {B, B0 }, representing cases 8 and 5 of Definition 1. The edges {A, ∃r.B} and {B, B0 } are labelled α and β, respectively. Now G has 2 connected components (CCs): G1 with nodes A, ∃r.B, r0 and label α; G2 with r1 , B, B0 and β. Hence O = ({α}, {β}). The 2-numbering follows from membership in the Gi : A, ∃r.B are 1-concepts, r is a 12-role, and B, B0 are 2-concepts. This procedure is given by the main routine partition(O) of Algorithm 1. It first creates the graph G and then adds all edges induced by the structure of the concepts (subroutine addSubConceptEdges) and axioms (addAxiomEdges) in O to G. For each role R, both subroutines use the notation Ri , which equals ri if R = r and r1−i if R = r− , for i = 0, 1. Additionally, addAxiomEdges labels, for each axiom α, one of the created edges with α. Next, the CCs of G are determined (e.g., via breadth-first search). Finally, the partitioning is read off the axiom labels in the CCs. Not all CCs need to be labelled with some axiom: e.g., if B v B0 is omitted from the above example, we will get the same G1 , G2 , but G2 will not contain any axiom label. Therefore, the partitioning of O is determined using only those CCs with ≥ 1 axiom label. We revisit this case below. 4.1 Correctness Now we want to show that the algorithm returns a combined ontology O that is compati- ble with the input ontology O. With Theorem 9 we also have that O and O are equivalent, provided that O is domain-independent. Theorem 10. If O = partition(O), then O is a ν-ontology for some ν, and O ∼ O. Proof. O ∼ O follows directly from lines 4–8 of partition. For the existence of ν, let G = (V, E, L) be the graph created in lines 1–4 of partition, G1 , . . . , Gn the CCs with ≥ 1 axiom label and Gn+1 , . . . , Gm with m ≥ n the remaining CCs. Let ν be a function assigning a number ≤ n to each node in G and each axiom in O: – for every i ≤ n: ν(x) = i for all nodes x in Gi – for every i > n: fix some j ≤ n and let ν(x) = j for all nodes in x in Gi – for all edges {x, y} in Gi with L({x, y}) = α: ν(α) = i ν contains an n-numbering of Σ if we additionally set: – for all roles r ∈ ΣR : ν(r) = (ν(r0 ), ν(r1 )) Algorithm 1: Partitioning an ontology O 1 Function partition(O): input : O with signature Σ output : ν-ontology O 2 V ← {C | C ∈ sub(O)} ∪ ΣI ∪ {r0 , r1 | r ∈ ΣR }; E ← ∅; L ← ∅; G ← (V, E, L) 3 forall C ∈ sub(O) do addSubConceptEdges(G, C) 4 forall α ∈ O do addAxiomEdges(G, α) 5 {G1 , . . . , Gn } ← all connected components (CCs) of G with ≥ 1 axiom label 6 forall i ≤ n do Oi ← {α | L(v, v0 ) = α for some edge (v, v0 ) in Gi } 7 O ← (O1 , . . . , On ) 8 return (O) 9 Function addSubConceptEdges(G, C): 10 switch C do 11 case ¬D do E ← E ∪ {C, D} 12 case D u F do E ← E ∪ {{C, D}, {C, F}} 13 case >m R.D do E ← E ∪ {{C, R0 }, {R1 , D}} 14 case ∃R.Self do E ← E ∪ {{C, R0 }, {C, R1 }} 15 case {a} do E ← E ∪ {{C, a}} 16 Function addAxiomEdges(G, α): 17 switch α do 18 case C v D or C ≡ D do E ← E ∪ {C, D}; L(C, D) ← α 19 case R v S , R ≡ S or Disjoint(R, S ) do 20 E ← E ∪ {{R0 , S 0 }, {R1 , S 1 }}; L(R0 , S 0 ) ← α 21 case R ◦ S v T do E ← E ∪ {{R1 , S 0 }, {R0 , T 0 }, {S 1 , T 1 }}; L(R0 , T 0 ) ← α 22 case C(a) do E ← E ∪ {C, a}; L(C, a) ← α 23 case R(a, b) do E ← E ∪ {{a, R0 }, {R1 , b}}; L(a, R0 ) ← α 24 case a ≈ b or a 0 b do E ← E ∪ {{a, b}}; L(a, b) ← α It remains to show that ν respects Definition 1. 1. ν is a n-numbering of Σ = sig(O). 2. ν on arbitrary concepts is an extension of ν on Σ that respects Definition 1, i.e., every C ∈ sub(O) is a ν(C)-concept. This is an easy induction on the structure of C. – C = ¬D: By line 11 of of the algorithm G contains the edge {C, D}. Hence C, D are in the same CC. Then ν(C) = ν(D). – C = >m R.D: By line 13, G contains the edges {C, R0 }, {R1 , D}. Hence C, R0 are in the same CC, and so are R1 , D. Then ν(C) = ν(R0 ) = i, ν(D) = ν(R1 ) = j and ν(R) = (i, j). – All other cases are analogous. 3. ν(α) respects the second part of Definition 1, i.e., every α ∈ O is a ν(α)-axiom. This is an easy case distinction on the type of α. – α = C v D: By line 18, G contains the edge {C, D} with L({C, D}) = α. Hence C, D and {C, D} are in the same CC and ν(C) = ν(D) = ν(α). – All other cases are anologous. o 4.2 Maximality The result of the algorithm is the maximal O with O ∼ O in the following sense. Let O = (O1 , . . . , On ), O0 = (O10 , . . . , Om0 ) be a ν- and a ν0 -ontology with i≤n Oi = i≤m Oi0 . S S We write O O0 (“O is at least as coarse as O0 ”) if for every Oi there is an Oi0 s.t. Oi ⊆ Oi0 . In other words: if O O0 then every Oi0 is the union of one or several Oi . Theorem 11. If O = partition(O) then, for every O0 with O ∼ O0 , we have O O0 . Proof. Let G = (V, E, L) the graph created in lines 1–4 of partition and O = (O1 , . . . , On ); furthermore let O0 = (O10 , . . . , Om0 ) be a ν0 -ontology. It suffices to show: For all i ≤ n and all α, β ∈ Oi : ν0 (α) = ν0 (β) (∗) To prove (∗) we will first show an auxiliary property, which involves the obvious extension of ν0 to all nodes in G; in particular, if r is a role name with ν0 (r) = (i, j), then ν0 (r0 ) := i and ν0 (r1 ) := j. Claim 1. For every x, y ∈ V: if x, y are in the same CC of G, then for all numberings ν0 of Σ we have ν0 (x) = ν0 (y). Claim 1 is proven via induction on the distance between x, y in their CC. The base case x = y is trivial. For the induction step, suppose that the claim holds for x, x0 (by IH) and there is an edge between x0 and y. Now ν0 (x0 ) = ν0 (y) can be shown via a straightforward case distinction on the creation of the edge in the functions addSubConceptEdges and addAxiomEdges together with Definition 1. Hence ν0 (x) = ν0 (y). We can now prove (∗) proceeding by axiom types and analysing the respective cases of addAxiomEdges. We just show one case; the remaining ones are very similar. – α = C1 v D1 and β = C2 v D2 : Since α, β are in the same CC, then so are C1 , C2 (by construction of ν and G). Claim 1 implies ν0 (C1 ) = ν0 (C2 ) and by Definition 1: ν0 (α) = ν0 (β). o 4.3 Complexity Let O be the input ontology, and G = (V, E, L) the graph created in lines 1–4 of partition. Set k = |O|, ` = |sub(O)| and m = |Σ|. We first consider the size of the created graph. The number of nodes is restricted by ` + 2m (2 nodes per role). It is easy to see that addAxiomEdges is called k-times and each call adds at most 3 edges. addSubConceptEdges adds at most 2 edges and is called ` times. Hence the number of edges is limited by 3k + 2`. Each call to addAxiomEdges and addSubConceptEdges is in constant time. The connected components of G can be calculated using breadth-first search in O(|V| + |E|) = O(k + ` + m) [13]. The components with ≥ 1 axiom label can be found going through all edges in O(|E|) = O(` + m). To collect the axioms for each component in line 5–6 every edge needs to be checked once. Therefore, this step is also in O(|E|) = O(` + m). Altogether the partition algorithm takes linear time, limited by O(k + ` + m), in contrast to [6, Theorem 7.2], which is quadratic. 4.4 Discussion We conclude this section with remarks concerning the implicit labelling of terms in the input ontology, the treatment of >, and the deterministic character of our algorithm. partition(O) creates a ν-ontology O with O ∼ O. The corresponding numbering ν is induced by the CCs of the created graph G. As we have already pointed out in the description of the algorithm, some CCs may not contain any axiom label. In this case the numbering of the “unlabelled” CCs is arbitrary, as seen in the proof of Theorem 10. If a unique numbering is required, e.g., in order to assign “home components” in O to the terms in O, then user intervention is required: suppose O = {A v ∃r.B, A0 v ∃r0 .B}. Then there are 3 CCs: G1 with nodes A, r0 ; G2 with A0 , r00 ; G3 with r1 , r10 , B. Now G3 is not axiom-labelled, and so r1 , r10 , B could be assigned to either G1 or G2 . Only the user can make that decision, and they need to know the respective subdomains: if the axioms are rewritten into HappyChild v ∃hasPet.Puppy and HappyDog v ∃hasChild.Puppy, then G3 should be merged with G1 . Algorithm 1 does not treat > explicitly. The required additions are straightforward but, from a theoretical point of view, unnecessary since > can be rewritten as At¬A, as usual. In that case, a fresh concept name A should be used for each occurrence of >; Otherwise nodes and the corresponding subdomains might be spuriously connected, e.g., hasToy1 and hasFood1 if O contains HappyChild v ∃hasToy.> and HappyDog v ∃hasFood.> Compared with the previous algorithms [6,9,10], neither partition nor its sub- routines make any nondeterministic choices. In particular, the computed partition is unique up to permutation of the components because the generated graph G does not depend on the order in which concepts and axioms are traversed. This last property is much less obvious in the previous algorithms, which contain a nondeterministic choice without a rigorous argument that the result is still deterministic (and we did observe a nondeterministic behaviour of the prototype implementation). 5 Extensions to the Language It is straightforward to add datatypes and keys. For datatypes this is shown in [6,9,10]. In essence, ν-interpretations require an additional component where all datatypes are interpreted; data properties are link relations directed towards the datatype component. Definitions, proofs, and algorithm require straightforward additional cases for concepts and axioms involving datatypes. Keys [25] are captured by one dedicated axiom type, whose semantics guarantees domain-independence. It suffices to add the respective case to definitions and algorithm. For details, see the appendix of the technical report. As remarked after Theorem 8, the universal role u would compromise the syntactic characterisation of domain-independence and thus the link between compatibility and equivalence. This problem does not occur with the > concept, as discussed in §4.4. We can pinpoint even more exactly the limits of partitioning based on E-connections: if role negation and disjunction were available, then u = r t ¬r for a fresh role name r. The “culprit” for the problem with C = ∃u.A as described after Theorem 8 is role negation; in fact, that argument remains valid if we instead consider C = ∃¬r.A. Negated roles compromise only the syntactic characterisation of domain-independence and thus the link between ∼ and ≈; E-connections themselves should be extensible to first-order (even higher-order) logic. If we forbid negation, we can even afford positive Boolean operators on roles and allow complex role expressions built using u, t, ◦: those are always domain- independent and can thus safely be used in number restrictions and all role axioms. Hence E-partitions can easily be extended to encompass DLs that allow a more general use of role composition [22] or (positive) Boolean roles [26]. 6 Conclusions and Future Work We have extended the original approach underlying E-partitions in [6,9,10] to all of OWL 2 except the universal role (which cannot be accommodated, as shown). We have presented a new linear-time algorithm for computing the maximal E-connection that is syntactically compatible (and, assuming domain-independence, equivalent) with the input ontology. The output of our algorithm does not depend on the traversal order of axioms and concepts. En route we have introduced a simplified notation. We have further shown that theory and algorithm extend to expressive operators on roles considered in the literature, and identified role negation as not amenable to this approach. For future work, we expect a straightforward implementation of our algorithm in the OWL API5 [14] and as a Protégé6 [24] plugin, as a basis for experiments on a representative up-to-date ontology corpus such as [21]. We conjecture that existing ontologies generally decompose well when allowing slight deviations from (syntactic) compatibility, to circumvent the notorious problematic modelling patterns, see §1. The implementation will also enable a comparison with other decomposition approaches, such as atomic decomposition [11]. We furthermore plan to revisit module extraction, extending the existing procedure in [10] to arbitrary E-connections, independently of a specific partitioning algorithm. Given the linear runtime of our algorithm, module extraction might even compete in performance with syntactic locality [7,32]. Finally, the “culprit” of role negation identified in §5 suggests the hypothesis that the partitioning approach can be transferred to logics that allow only unary negation, such as frontier-one existential rules [4] or even the unary negation fragment of first-order logic, UNFO [31]. Acknowledgement We thank the anonymous reviewers for their helpful comments. References 1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995), http: //webdam.inria.fr/Alice/ 2. Amato, F., De Santo, A., Moscato, V., Persia, F., Picariello, A., Poccia, S.R.: Partitioning of ontologies driven by a structure-based approach. In: Proc. of ICSC-15. pp. 320–323. IEEE Computer Society (2015) 3. Baader, F., Lutz, C., Sturm, H., Wolter, F.: Fusions of description logics and abstract descrip- tion systems. J. Artif. Intell. Res. 16, 1–58 (2002) 5 http://owlcs.github.io/owlapi/ 6 https://protege.stanford.edu/ 4. Baget, J., Leclère, M., Mugnier, M., Salvat, E.: Extending decidable cases for rules with existential variables. In: Proc. of IJCAI-09. pp. 677–682 (2009) 5. Bao, J., Voutsadakis, G., Slutzki, G., Honavar, V.: Package-based description logics. In: Stuckenschmidt et al. [28], pp. 349–371 6. Cuenca Grau, B.: Combination and integration of ontologies on the semantic web. Ph.D. thesis, Universidad de Valencia (2005) 7. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Extracting modules from ontologies: A logic-based approach. In: Stuckenschmidt et al. [28], pp. 159–186 8. Cuenca Grau, B., Parsia, B., Sirin, E.: Ontology integration using E-Connections. In: Stucken- schmidt et al. [28], pp. 293–320 9. Cuenca Grau, B., Parsia, B., Sirin, E., Kalyanpur, A.: Automatic partitioning of OWL ontolo- gies using E-Connections. In: Proc. of DL-05. CEUR-WS.org, vol. 147 (2005) 10. Cuenca Grau, B., Parsia, B., Sirin, E., Kalyanpur, A.: Modularity and web ontologies. In: Proc. of KR-06. pp. 198–209. AAAI Press (2006) 11. Del Vescovo, C., Parsia, B., Sattler, U., Schneider, T.: The modular structure of an ontology: Atomic decomposition. In: Proc. of IJCAI-11. pp. 2232–2237 (2011) 12. Gatens, W., Konev, B., Wolter, F.: Module extraction for acyclic ontologies. In: Proc. of WoMO-13. CEUR-WS.org, vol. 1081 (2013) 13. Hopcroft, J.E., Tarjan, R.E.: Efficient algorithms for graph manipulation [H] (algorithm 447). Commun. ACM 16(6), 372–378 (1973) 14. Horridge, M., Bechhofer, S.: The OWL API: A Java API for OWL ontologies. Semantic Web 2(1), 11–21 (2011) 15. Horrocks, I., Kutz, O., Sattler, U.: The even more irresistible SROIQ. In: Proc. of KR-06. pp. 57–67. AAAI Press (2006) 16. Klinov, P., Del Vescovo, C., Schneider, T.: Incrementally updateable and persistent decompo- sition of OWL ontologies. In: Proc. of OWLED-12. CEUR-WS.org, vol. 849 (2012) 17. Konev, B., Lutz, C., Ponomaryov, D., Wolter, F.: Decomposing description logic ontologies. In: Proc. of KR-10. pp. 236–246. AAAI Press (2010) 18. Konev, B., Lutz, C., Walther, D., Wolter, F.: Semantic modularity and module extraction in description logics. In: Proc. of ECAI-08. FAIA, vol. 178, pp. 55–59. IOS Press (2008) 19. Krötzsch, M., Simančik, F., Horrocks, I.: A description logic primer. CoRR abs/1201.4089 (2012), http://arxiv.org/abs/1201.4089 20. Kutz, O., Lutz, C., Wolter, F., Zakharyaschev, M.: E-connections of abstract description systems. Artif. Intell. 156(1), 1–73 (2004) 21. Matentzoglu, N., Parsia, B.: BioPortal Snapshot 30 March 2017 (data set) (2017), http: //doi.org/10.5281/zenodo.439510 22. Mosurović, M., Krdžavac, N., Graves, H., Zakharyaschev, M.: A decidable extension of SROIQ with complex role chains and unions. J. Artif. Intell. Res. 47, 809–851 (2013) 23. Motik, B., Horrocks, I.: OWL datatypes: Design and implementation. In: Proc. of ISWC-08. LNCS, vol. 5318, pp. 307–322. Springer (2008) 24. Musen, M.A.: The Protégé project: a look back and a look forward. AI Matters 1(4), 4–12 (2015) 25. Parsia, B., Sattler, U., Schneider, T.: Easy keys for OWL. In: Proc. of OWLED-08EU. CEUR- WS.org, vol. 432 (2008) 26. Schmidt, R.A., Tishkovsky, D.: Using tableau to decide description logics with full role negation and identity. ACM Trans. Comput. Log. 15(1), 7:1–7:31 (2014) 27. Serafini, L., Tamilin, A.: Composing modular ontologies with Distributed Description Logics. In: Stuckenschmidt et al. [28], pp. 321–347 28. Stuckenschmidt, H., Parent, C., Spaccapietra, S. (eds.): Modular Ontologies: Concepts, Theo- ries and Techniques for Knowledge Modularization, LNCS, vol. 5445. Springer (2009) 29. Stuckenschmidt, H., Schlicht, A.: Structure-based partitioning of large ontologies. In: Stuck- enschmidt et al. [28], pp. 187–210 30. Suntisrivaraporn, B.: Module extraction and incremental classification: A pragmatic approach for EL ontologies. In: Proc. of ESWC-08. LNCS, vol. 5021, pp. 230–244. Springer (2008) 31. ten Cate, B., Segoufin, L.: Unary negation. Logical Methods in Computer Science 9(3) (2013) 32. Tsarkov, D.: Improved algorithms for module extraction and atomic decomposition. In: Proc. of DL-12. CEUR-WS.org, vol. 846 (2012)