=Paper= {{Paper |id=Vol-2211/paper-19 |storemode=property |title=Ontology Partitioning Using E-Connections Revisited |pdfUrl=https://ceur-ws.org/Vol-2211/paper-19.pdf |volume=Vol-2211 |authors=Sascha Jongebloed,Thomas Schneider |dblpUrl=https://dblp.org/rec/conf/dlog/Jongebloed018 }} ==Ontology Partitioning Using E-Connections Revisited== https://ceur-ws.org/Vol-2211/paper-19.pdf
    Ontology Partitioning Using E-Connections Revisited

                        Sascha Jongebloed and Thomas Schneider?

               Department of Computer Science, University of Bremen, Germany
                           {sasjonge,ts}@cs.uni-bremen.de



        Abstract. We revisit the technique for partitioning ontologies using E-connections
        introduced by Cuenca Grau et al. (2006). We extend the underlying notions of
        E-connections and partitionings to the latest OWL 2 ontology language, with the
        exception of the universal role, which cannot be accommodated for reasons that we
        explain. Besides presenting a simplified notation for the theoretical foundations,
        we devise a new algorithm for computing partitions which, in contrast to the origi-
        nal quadratic-time algorithm, can be implemented in linear time, is deterministic,
        and admits simple rigorous correctness and maximality proofs. We further discuss
        possible extensions beyond OWL.


1    Introduction

Modularity is an important paradigm for ontology development and use, which has
received much attention in the past decade. Modular ontologies are usually easier to
maintain, comprehend, and reason over, and extracting a module of a given ontology
is an important task in scenarios where only a part of the knowledge in that ontology
is to be reused efficiently. If an ontology is given as a monolithic entity, the task of
decomposing it into modules is the first step towards making it modular and being able
to enjoy the previously mentioned benefits.
     Partitionings based on E-connections (henceforth: E-partitions) have been developed
for the purpose of automatically and efficiently decomposing an ontology [6,9,10]. The
result is a graph whose nodes are (pairwise disjoint and mutually covering) components
(i.e., subsets) of the ontology, and whose edges represent “semantic links” between the
components in the style of E-connections [20]. The adoption of the E-connection frame-
work ensures that the resulting partitions provide strong logical guarantees; thus (certain
combinations of) the components are encapsulating modules of the input ontology [10].
     E-Connections were defined for abstract description systems (ADSs), a notion that
generalises many DLs, modal logics, and further formalisms [3]. An E-connection is a
combination of heterogeneous logical theories via semantic links established by a desig-
nated set of relations, called link relations. Their semantics is given by interpretations
that consist of pairwise disjoint components and which interpret each component locally
and the link relations as relations between the respective components. In contrast to the
general nature of E-connections, E-partitions and the underlying framework have been
defined specifically for the DL SHOIQ(D), a fragment of the latest OWL 2 ontology
language. The partitioning procedure starts from a monolithic ontology O and attempts
?
    Supported by DFG project SCHN 1234/3.
to turn it into an (as fine as possible) E-connection by identifying link relations among
the roles in O. An efficient algorithm for computing E-partitions is given in [6,9,10]:
is traverses the subconcepts of the input ontology, and assigns to the current concept
either a fresh component (if this is witnessed by some freshly identified link relation)
or an existing component (when necessary for conformance with the definition of E-
connections). When all concepts are dealt with, the assignment of components induces
a distribution of the ontology’s axioms over the components, constituting the partition.
To ensure that the resulting E-partition and the input ontology are equivalent under the
E-connection semantics, an additional condition has to be imposed on the input ontology,
which was called safety in [6,9,10] and coincides with domain-independence (DI) known
from first-order logic and database theory [1]. Contrary to DI in first-order logic, DI for
SHOIQ(D) is decidable via a neat syntactic criterion called locality [10].
     The partitioning algorithm was implemented as an experimental feature of the (dis-
continued) ontology editor Swoop.1 In initial experiments [10], some ontologies admitted
useful E-partitions, and some of those revealed modelling deficiencies. However, some
modelling patterns – e.g., the use of few top-level concepts or exhaustive declaration of
disjoint concepts or distinct individuals – are notoriously problematic: they admit only
coarse E-partitions, which follows directly from the definitions underlying E-connections.
This may be one of the reasons for the limited success of E-partitions. Another reason
certainly lies in the preliminary nature of the existing implementation: it is incomplete
(not all input axioms are preserved), incorrect (sometimes axioms are spuriously assigned
to distinct components), and nondeterministic (we sometimes obtained different results,
re-running the algorithm on the same input). Hence some observed “poor” E-partitions
might be due to a bug in the code rather than a “feature” of the general approach.
     When trying to separate bugs from “features”, we found a considerably simpler
way of computing E-partitions, based on the idea of creating an undirected graph G
whose edges connect concepts and/or roles that must be part of the same component, and
reading the minimal E-partition off G’s connected components. We were able to extend
the underlying framework to most of OWL 2, with the only exception of the universal
role u. This exception is unavoidable: with u, locality no longer characterises domain-
independence, thus destroying the guarantee that the resulting E-partition and the input
ontology are equivalent. We consider this exception insignificant because u “typically
plays a minor role in modelling” [19], confirmed by a large corpus of ontologies.2
Contributions and overview. Our new approach offers the following advantages over
the original one:
 – Ready applicability to the latest OWL 2 standard, under two restrictions ensuring
   equivalence (domain-independence, absence of the universal role)
 – A simplified notation of the theoretical foundations
 – A new, simpler partitioning algorithm which is deterministic – i.e., does not depend
   on the traversal order of concepts and axioms – and can be implemented to run in
   linear time (the original one is quadratic [10])
 – Simple rigorous proofs that the algorithm is correct and outputs the maximal E-
   connection that is equivalent to the input ontology under the E-connection semantics
 1
     https://github.com/ronwalf/swoop
 2
     In last year’s BioPortal corpus [21], u occurs in only 322 of all 16.3 million axioms.
Given these advantages, we expect that the algorithm is easy to implement and to evaluate
on an up-to-date ontology corpus. Hence, the work reported here is in progress. We
briefly introduce OWL (§2), define our extension of E-connections and E-partitions to
SROIQ (§3), present the new algorithm (§4), discuss extensions to OWL and beyond
(§5), and conclude in §6. Implementation and tests are subject to future work.
    Due to space restrictions, we provide most proofs and further examples in a technical
report: http://www.informatik.uni-bremen.de/tdki/research/papers.html
Related work. Numerous module extraction and modularisation approaches are known
[18,30,7,8,11,29,12], as well as ontology languages supporting modular development
[5,27]. Several decomposition approaches have been developed, some of which provide
strong logical guarantees (such as encapsulation), e.g., signature decomposition [17],
partitionings based on E-connections [10,8], and atomic decomposition [11]; and others
are rather based on statistical parameters, such as structure-based partitioning [29,2].


2    Preliminaries

The ontology language OWL is based on the DL SROIQ [15]. OWL additionally
provides datatypes [23] and keys [25], whose treatment we defer to Section 5. Here and
in Section 3, we consider SROIQ without the universal role (see above). We give only
a short overview of the syntax and refer to [19] for semantics and further details. We
denote concept names with A, B, . . . , complex concepts with C, D, . . . , role names with
r, s, . . . , complex roles (role names r or inverses r− ) with R, S , . . . , individual names
with a, b, . . . , and axioms with α, β, . . . . Concepts and axioms are built according to the
following grammars, where m ∈ N and the remaining letters are as explained above.

         C ::= A | ¬C | C u D | >m R.C | ∃R.Self | {a}
         α ::= C v D | C ≡ D | R v S | R ≡ S | Disjoint(R, S ) | R1 ◦ R2 v S |
               C(a) | R(a, b) | a ≈ b | a 0 b,

The remaining operators >, ⊥, t, ∃R, ∀R, 6n R and axiom types (role transitivity, symme-
try, etc.) are “syntactic sugar”; hence we do not treat them explicitly, thus simplifying the
presentation without losing generality. We furthermore ignore OWL’s global restrictions
(regularity, restricted use of non-simple roles) as E-connections and partitionings do not
rely on them. An ontology is a set of axioms.


3 E-Connections and Partitionings for SROIQ

The definitions in this section largely follow those in [6,9]; however, we simplify notation
to allow, as we believe, a more concise presentation. The differences and their motivation
are explained at the end of this section. In the following, we work with a fixed signature,
which is a finite set Σ of concept names, role names, and individual names (together:
terms), i.e., Σ = ΣC ] ΣR ] ΣI . Here and in the following, ] denotes disjoint union.
    Given a natural number n ≥ 1, an n-numbering of Σ is a function ν that assigns to
each concept and individual name a number ν(A), ν(a) ∈ {1, . . . , n} and to each role name
a pair ν(r) ∈ {1, . . . , n} × {1, . . . , n} of numbers. For what follows, the order between the
numbers is irrelevant; an arbitrary finite index set I instead of {1, . . . , n} would suffice.
    Numberings are extended to arbitrary concepts and axioms inductively:
Definition 1. Let ν be an n-numbering of Σ.
 – ν is extended to arbitrary roles R and concepts C as follows.
    1. If R = r, then ν(R) = ν(r).
    2. If R = r− , and ν(r) = (i, j), then ν(R) = ( j, i).
    3. If C = ¬D, then ν(C) = ν(D).
    4. If C = D u E and ν(D) = ν(E), then ν(C) = ν(D).
    5. If C = >m R.D and ν(R) = (i, j) and ν(D) = j, then ν(C) = i.
    6. If C = ∃R.Self and ν(R) = (i, i), then ν(C) = i.
    7. If C = {a}, then ν(C) = ν(a).
   A concept C is called an i-concept if ν(C) is defined3 and ν(C) = i.
 – ν is further extended to axioms as follows. An i-axiom is of the form
    8. C v D or C ≡ D where ν(C) = ν(D) = i;
    9. R v S , R ≡ S , or Disjoint(R, S ) where ν(R) = ν(S ) = (i, j);
   10. R1 ◦ R2 v S where ν(R1 ) = (i, j), ν(R2 ) = ( j, k), and ν(S ) = (i, k);
   11. C(a) where ν(C) = ν(a) = i;
   12. R(a, b) where ν(a) = i4 and ν(R) = (ν(a), ν(b));
   13. a ≈ b or a 0 b where ν(a) = ν(b) = i.
The central notion of an ontology consisting of several parts depends on ν:
Definition 2. Given an n-numbering ν, a ν-ontology is a tuple O = (O1 , . . . , On ), each
of whose components Oi is a nonempty set of i-axioms.
To distinguish (monolithic) ontologies O from (combined) ν-ontologies O, we sometimes
call the former simple ontologies.
    The semantics of ν-ontologies is given by ν-interpretations, which are partitioned
according to the numbering of the signature given by ν:
Definition 3. A ν-interpretation is an interpretation (∆I , ·I,ν ) such that
 – ∆I = i≤n ∆Ii with ∆i , ∅ for each component ∆i
          U
 – A ⊆ ∆Ii for all A ∈ ΣC with ν(A) = i
     I,ν

 – rI,ν ⊆ ∆Ii × ∆Ij for all r ∈ ΣR with ν(r) = (i, j)
 – aI,ν ∈ ∆Ii for all a ∈ ΣI with ν(a) = i
ν-interpretations are used for two purposes: (a) for defining the semantics of simple
ontologies given ν, and (b) for defining the semantics of ν-ontologies, which is the usual
semantics of E-connections [20]. In case (a), we use the standard DL semantics and the
usual satisfaction symbol |=. We say that O has a ν-model if there is a ν-interpretation
I |= O. The class of ν-models of O is contained in the class of its (unrestricted) models.
For example, if O contains > v A, then all ν-models have a single component; thus,
if ν is an n-numbering with n ≥ 2, then O has no ν-models but may still be consistent
(i.e., have unrestricted models). For purpose (b), the interpretation function is extended
ensuring that the extension of arbitrary i-concepts is a subset of ∆Ii . For this purpose,
(only) the case of negation has to differ from the standard semantics.
 3
     ν(C) is undefined, e.g., for concepts ∃r.D with ν(r) = (i, j) and ν(D) , j.
 4
     This choice is arbitrary; it could also be “ν(b) = i”. Analogously for item 10.
Definition 4. The interpretation function of a ν-interpretation I is extended to arbitrary
ν-concepts as follows.

 – (¬C)I,ν = ∆Ii \ C I,ν for i-concepts C
 – (C u D)I,ν = C I,ν ∩ DI,ν
 – (>m R.C)I,ν = {d ∈ ∆Ii | #{e ∈ ∆Ij | (d, e) ∈ RI and e ∈ C I,ν } ≥ m} for (i, j)-roles
   R and j-concepts C
 – (∃R.Self)I,ν = {d ∈ ∆Ii | (d, d) ∈ RI } for (i, i)-roles R
 – {a}I,ν = aI,ν

Satisfaction of i-axioms in ν-interpretations is defined as follows.

 – I |=ν C v D if C I,ν ⊆ DI,ν
 – I |=ν C ≡ D if C I,ν = DI,ν
 – I |=ν C(a) if aI ∈ C I,ν
 – For i-axioms α of all other types (role inclusion, equivalence, disjointness, assertion,
   or individual (in)equality), I |=ν α if I |= α.

I is a model of a ν-ontology O, written I |=ν O, if I |=ν α for all axioms α in O.
O is consistent if it has a model.

The correspondence between simple and ν-ontologies is captured by compatibility and
equivalence. The former notion is syntactic and only requires that the components of O
partition O. The latter is semantic and relative to ν-interpretations.
Definition 5. Let O be an ontology, ν an n-numbering, and O = (O1 , . . . , On ) a ν-
ontology.

 1. O and O are compatible, written O ∼ O, if O = i≤n Oi .
                                                   S
 2. O and O are equivalent, written O ≈ O if, for all ν-interpretations I, it holds that
    I |= O iff I |=ν O.

Unsurprisingly, compatibility does not imply equivalence, and neither does the converse
hold. The latter is trivial: if O = {A v B} and O = {¬B v ¬A}, then O ≈ O but O / O.
For the other direction, take O = {¬A v A0 , B v B0 } and O = ({¬A v A0 }, {B v B0 }).
Then O is well-formed according to Definitions 1 and 2 and O ∼ O, but O 0 O
because O has ν-models I with 2 components but no such I is a model of O. The
reason is that the axiom ¬A v A0 cannot be satisfied if the extension of both A and A0
is restricted to only one component, as required by Definition 3. As a consequence of
this observation, an additional assumption has to be made, which is the DL equivalent of
domain-independence known from the first-order and database worlds [1]. For (most of)
SROIQ, this assumption admits an efficiently decidable syntactic characterisation. For
domain-independent ontologies, compatibility implies equivalence; see Theorem 9.
Definition 6. A concept C (axiom α) is domain-independent (DI) if C I = C J (I |= α
iff J |= α) for all interpretations I, J with X I = X J for all terms X. An ontology is DI
if so are all its axioms.
The syntactic characterisation from [6,9,10] is the following.
Definition 7. The set of local concepts is defined inductively as follows.
  – Every concept name is local.
  – ¬C is local if C is not.
  – C u D is local if C or D is local.
  – >m R.C, ∃R.Self, and {a} are local.
An ontology O is safe if the following hold.
  – For all axioms C v D, if D is local, then so is C.
  – For all axioms C ≡ D, C is local iff so is D.
Given an interpretation I and a set S , we write J = I ] S if ∆J = ∆I ] S and X J = X I
for all terms X.
Theorem 8 ([6,9,10]). For all concepts C and ontologies O:
 1. C is DI iff C is local.
 2. If C is DI, then C J = C I for all I and J = I ] S .
 3. If C is not DI, then C J = C I ∪ S for all I and J = I ] S .
 4. O is DI iff O is safe.
Not only does Theorem 8 provide a syntactic characterisation of DI; with Definition 7 it
also yields a linear-time decision procedure. Here it becomes clear why the universal role
u (with the semantics uI = ∆I × ∆I ) cannot be accommodated by the framework: the
concept C = ∃u.A is not DI but violates point 3 in Theorem 8: there are interpretations
I and J = I ] S with C J = C I ∪ S (e.g., ∆I = AI = {d} and S = {e}); but if AI = ∅
then C J = C I = ∅. However, point 3 is an essential ingredient not just in the syntactic
characterisation witnessing decidability of DI, but also in the following theorem linking
compatibility and equivalence. Its proof completes the proof of [6, Theorem 7.2].
Theorem 9. Let O be an ontology, ν an n-numbering, O = (O1 , . . . , On ) a ν-ontology.
 1. (a) If O is DI and O ∼ O, then O ≈ O.
    (b) If additionally O is consistent, then so is O.
 2. If O is consistent and not DI, and O ∼ O and O ≈ O, then n = 1, i.e., O = O.
Main differences in notation compared to [6,9,10]. n-numbered signatures correspond
to “partitioned vocabularies” in [6,9]. We consider it easier to carry along the parameter
ν instead of a partitioned vocabulary with its rather long denotation. Numberings also
extend more naturally to complex concepts and axioms. Consequently, “combined
knowledge bases” (a special case of E-connections) are now called ν-ontologies.
    ν-interpretations conflate “partitioned interpretations” and “combined interpreta-
tions”, making “corresponding interpretations” redundant. Consequently we distinguish
two semantics: I |= O (the standard semantics for simple ontologies) and I |=ν O (the
E-connection semantics for ν-ontologies).
    Equivalence replaces “semantic compatibility” because it indeed captures equiva-
lence under the class of ν-interpretations. As per the usual understanding of equivalence,
O is not required to have (ν-)models. The implication “if O consistent, then also O”
(under certain assumptions) is now captured by Theorem 9. As a result we reduced
“syntactically compatible” to “compatible”.
    Finally, domain-independence replaces “invariance under domain expansions”. We
decided to reflect the relationship with the fundamental first-order notion.
4      The New Partitioning Algorithm
We now present the partitioning algorithm, based on the preceding definitions. As
in [6,9,10], our algorithm receives as input an ontology O and returns a ν-ontology
O = (O1 , . . . , On ) such that O ∼ O and n is maximal with this property. If O is domain-
independent, O ≈ O follows by Theorem 9. The numbering ν is not computed by the
algorithm but is implicit in the created data structure.
     Our algorithm determines n, the Oi , and the implicit ν by “gathering” constraints
between the ν-values of the concepts and roles occurring in O in an undirected graph G.
Let sub(O) be the set of concepts (atomic or complex) occurring in O. The graph G has
one node for each concept in sub(O) or individual occurring in O, and two nodes r0 , r1
per role in O. The edges represent the constraints imposed by Definition 1. For example,
given the axioms α = A v ∃r.B and β = B v B0 , then G has nodes A, ∃r.B, r0 , r1 , B, B0 ,
and α, β induce edges {A, ∃r.B}, {∃r.B, r0 }, {r1 , B}, {B, B0 }, representing cases 8 and 5 of
Definition 1. The edges {A, ∃r.B} and {B, B0 } are labelled α and β, respectively. Now
G has 2 connected components (CCs): G1 with nodes A, ∃r.B, r0 and label α; G2 with
r1 , B, B0 and β. Hence O = ({α}, {β}). The 2-numbering follows from membership in the
Gi : A, ∃r.B are 1-concepts, r is a 12-role, and B, B0 are 2-concepts.
     This procedure is given by the main routine partition(O) of Algorithm 1. It first
creates the graph G and then adds all edges induced by the structure of the concepts
(subroutine addSubConceptEdges) and axioms (addAxiomEdges) in O to G. For each
role R, both subroutines use the notation Ri , which equals ri if R = r and r1−i if R = r− ,
for i = 0, 1. Additionally, addAxiomEdges labels, for each axiom α, one of the created
edges with α. Next, the CCs of G are determined (e.g., via breadth-first search). Finally,
the partitioning is read off the axiom labels in the CCs. Not all CCs need to be labelled
with some axiom: e.g., if B v B0 is omitted from the above example, we will get the
same G1 , G2 , but G2 will not contain any axiom label. Therefore, the partitioning of O is
determined using only those CCs with ≥ 1 axiom label. We revisit this case below.

4.1     Correctness
Now we want to show that the algorithm returns a combined ontology O that is compati-
ble with the input ontology O. With Theorem 9 we also have that O and O are equivalent,
provided that O is domain-independent.
Theorem 10. If O = partition(O), then O is a ν-ontology for some ν, and O ∼ O.
Proof. O ∼ O follows directly from lines 4–8 of partition. For the existence of ν, let
G = (V, E, L) be the graph created in lines 1–4 of partition, G1 , . . . , Gn the CCs with
≥ 1 axiom label and Gn+1 , . . . , Gm with m ≥ n the remaining CCs. Let ν be a function
assigning a number ≤ n to each node in G and each axiom in O:
    – for every i ≤ n: ν(x) = i for all nodes x in Gi
    – for every i > n: fix some j ≤ n and let ν(x) = j for all nodes in x in Gi
    – for all edges {x, y} in Gi with L({x, y}) = α: ν(α) = i
ν contains an n-numbering of Σ if we additionally set:
    – for all roles r ∈ ΣR : ν(r) = (ν(r0 ), ν(r1 ))
 Algorithm 1: Partitioning an ontology O
 1    Function partition(O):
         input : O with signature Σ
         output : ν-ontology O
 2       V ← {C | C ∈ sub(O)} ∪ ΣI ∪ {r0 , r1 | r ∈ ΣR };      E ← ∅;      L ← ∅; G ← (V, E, L)
 3       forall C ∈ sub(O) do addSubConceptEdges(G, C)
 4       forall α ∈ O      do addAxiomEdges(G, α)
 5       {G1 , . . . , Gn } ← all connected components (CCs) of G with ≥ 1 axiom label
 6       forall i ≤ n do Oi ← {α | L(v, v0 ) = α for some edge (v, v0 ) in Gi }
 7       O ← (O1 , . . . , On )
 8       return (O)
 9  Function addSubConceptEdges(G, C):
 10    switch C do
 11        case ¬D      do E ← E ∪ {C, D}
 12        case D u F do E ← E ∪ {{C, D}, {C, F}}
 13        case >m R.D do E ← E ∪ {{C, R0 }, {R1 , D}}
 14        case ∃R.Self do E ← E ∪ {{C, R0 }, {C, R1 }}
 15        case {a}     do E ← E ∪ {{C, a}}

16  Function addAxiomEdges(G, α):
 17    switch α do
 18        case C v D or C ≡ D do E ← E ∪ {C, D};                                       L(C, D) ← α
 19        case R v S , R ≡ S or Disjoint(R, S ) do
 20                                  E ← E ∪ {{R0 , S 0 }, {R1 , S 1 }};                L(R0 , S 0 ) ← α
 21          case R ◦ S v T          do E ← E ∪ {{R1 , S 0 }, {R0 , T 0 }, {S 1 , T 1 }}; L(R0 , T 0 ) ← α
 22          case C(a)               do E ← E ∪ {C, a};                                   L(C, a) ← α
 23          case R(a, b)            do E ← E ∪ {{a, R0 }, {R1 , b}};                     L(a, R0 ) ← α
 24          case a ≈ b or a 0 b     do E ← E ∪ {{a, b}};                                 L(a, b) ← α


It remains to show that ν respects Definition 1.
 1. ν is a n-numbering of Σ = sig(O).
 2. ν on arbitrary concepts is an extension of ν on Σ that respects Definition 1, i.e., every
    C ∈ sub(O) is a ν(C)-concept. This is an easy induction on the structure of C.
      – C = ¬D: By line 11 of of the algorithm G contains the edge {C, D}. Hence C, D
         are in the same CC. Then ν(C) = ν(D).
      – C = >m R.D: By line 13, G contains the edges {C, R0 }, {R1 , D}. Hence C, R0 are
         in the same CC, and so are R1 , D. Then ν(C) = ν(R0 ) = i, ν(D) = ν(R1 ) = j and
         ν(R) = (i, j).
      – All other cases are analogous.
 3. ν(α) respects the second part of Definition 1, i.e., every α ∈ O is a ν(α)-axiom. This
    is an easy case distinction on the type of α.
      – α = C v D: By line 18, G contains the edge {C, D} with L({C, D}) = α. Hence
         C, D and {C, D} are in the same CC and ν(C) = ν(D) = ν(α).
      – All other cases are anologous.                                                     o
4.2   Maximality

The result of the algorithm is the maximal O with O ∼ O in the following sense. Let
O = (O1 , . . . , On ), O0 = (O10 , . . . , Om0 ) be a ν- and a ν0 -ontology with i≤n Oi = i≤m Oi0 .
                                                                                 S        S
We write O  O0 (“O is at least as coarse as O0 ”) if for every Oi there is an Oi0 s.t.
Oi ⊆ Oi0 . In other words: if O  O0 then every Oi0 is the union of one or several Oi .

Theorem 11. If O = partition(O) then, for every O0 with O ∼ O0 , we have O  O0 .

Proof. Let G = (V, E, L) the graph created in lines 1–4 of partition and O =
(O1 , . . . , On ); furthermore let O0 = (O10 , . . . , Om0 ) be a ν0 -ontology. It suffices to show:

                        For all i ≤ n and all α, β ∈ Oi : ν0 (α) = ν0 (β)                        (∗)

To prove (∗) we will first show an auxiliary property, which involves the obvious
extension of ν0 to all nodes in G; in particular, if r is a role name with ν0 (r) = (i, j), then
ν0 (r0 ) := i and ν0 (r1 ) := j.
Claim 1. For every x, y ∈ V: if x, y are in the same CC of G, then for all numberings ν0
of Σ we have ν0 (x) = ν0 (y).
Claim 1 is proven via induction on the distance between x, y in their CC. The base case
x = y is trivial. For the induction step, suppose that the claim holds for x, x0 (by IH) and
there is an edge between x0 and y. Now ν0 (x0 ) = ν0 (y) can be shown via a straightforward
case distinction on the creation of the edge in the functions addSubConceptEdges and
addAxiomEdges together with Definition 1. Hence ν0 (x) = ν0 (y).
We can now prove (∗) proceeding by axiom types and analysing the respective cases of
addAxiomEdges. We just show one case; the remaining ones are very similar.

  – α = C1 v D1 and β = C2 v D2 : Since α, β are in the same CC, then so are C1 , C2
    (by construction of ν and G). Claim 1 implies ν0 (C1 ) = ν0 (C2 ) and by Definition 1:
    ν0 (α) = ν0 (β).                                                                    o


4.3   Complexity

Let O be the input ontology, and G = (V, E, L) the graph created in lines 1–4 of
partition. Set k = |O|, ` = |sub(O)| and m = |Σ|. We first consider the size of the
created graph. The number of nodes is restricted by ` + 2m (2 nodes per role). It is
easy to see that addAxiomEdges is called k-times and each call adds at most 3 edges.
addSubConceptEdges adds at most 2 edges and is called ` times. Hence the number of
edges is limited by 3k + 2`. Each call to addAxiomEdges and addSubConceptEdges
is in constant time. The connected components of G can be calculated using breadth-first
search in O(|V| + |E|) = O(k + ` + m) [13]. The components with ≥ 1 axiom label can
be found going through all edges in O(|E|) = O(` + m). To collect the axioms for each
component in line 5–6 every edge needs to be checked once. Therefore, this step is also
in O(|E|) = O(` + m). Altogether the partition algorithm takes linear time, limited by
O(k + ` + m), in contrast to [6, Theorem 7.2], which is quadratic.
4.4   Discussion

We conclude this section with remarks concerning the implicit labelling of terms in the
input ontology, the treatment of >, and the deterministic character of our algorithm.
    partition(O) creates a ν-ontology O with O ∼ O. The corresponding numbering
ν is induced by the CCs of the created graph G. As we have already pointed out in the
description of the algorithm, some CCs may not contain any axiom label. In this case
the numbering of the “unlabelled” CCs is arbitrary, as seen in the proof of Theorem 10.
If a unique numbering is required, e.g., in order to assign “home components” in O to
the terms in O, then user intervention is required: suppose O = {A v ∃r.B, A0 v ∃r0 .B}.
Then there are 3 CCs: G1 with nodes A, r0 ; G2 with A0 , r00 ; G3 with r1 , r10 , B. Now G3 is not
axiom-labelled, and so r1 , r10 , B could be assigned to either G1 or G2 . Only the user can
make that decision, and they need to know the respective subdomains: if the axioms are
rewritten into HappyChild v ∃hasPet.Puppy and HappyDog v ∃hasChild.Puppy,
then G3 should be merged with G1 .
    Algorithm 1 does not treat > explicitly. The required additions are straightforward but,
from a theoretical point of view, unnecessary since > can be rewritten as At¬A, as usual.
In that case, a fresh concept name A should be used for each occurrence of >; Otherwise
nodes and the corresponding subdomains might be spuriously connected, e.g., hasToy1
and hasFood1 if O contains HappyChild v ∃hasToy.> and HappyDog v ∃hasFood.>
    Compared with the previous algorithms [6,9,10], neither partition nor its sub-
routines make any nondeterministic choices. In particular, the computed partition is
unique up to permutation of the components because the generated graph G does not
depend on the order in which concepts and axioms are traversed. This last property is
much less obvious in the previous algorithms, which contain a nondeterministic choice
without a rigorous argument that the result is still deterministic (and we did observe a
nondeterministic behaviour of the prototype implementation).


5     Extensions to the Language

It is straightforward to add datatypes and keys. For datatypes this is shown in [6,9,10].
In essence, ν-interpretations require an additional component where all datatypes are
interpreted; data properties are link relations directed towards the datatype component.
Definitions, proofs, and algorithm require straightforward additional cases for concepts
and axioms involving datatypes. Keys [25] are captured by one dedicated axiom type,
whose semantics guarantees domain-independence. It suffices to add the respective case
to definitions and algorithm. For details, see the appendix of the technical report.
     As remarked after Theorem 8, the universal role u would compromise the syntactic
characterisation of domain-independence and thus the link between compatibility and
equivalence. This problem does not occur with the > concept, as discussed in §4.4.
     We can pinpoint even more exactly the limits of partitioning based on E-connections:
if role negation and disjunction were available, then u = r t ¬r for a fresh role name
r. The “culprit” for the problem with C = ∃u.A as described after Theorem 8 is role
negation; in fact, that argument remains valid if we instead consider C = ∃¬r.A. Negated
roles compromise only the syntactic characterisation of domain-independence and thus
the link between ∼ and ≈; E-connections themselves should be extensible to first-order
(even higher-order) logic.
    If we forbid negation, we can even afford positive Boolean operators on roles
and allow complex role expressions built using u, t, ◦: those are always domain-
independent and can thus safely be used in number restrictions and all role axioms.
Hence E-partitions can easily be extended to encompass DLs that allow a more general
use of role composition [22] or (positive) Boolean roles [26].


6     Conclusions and Future Work

We have extended the original approach underlying E-partitions in [6,9,10] to all of
OWL 2 except the universal role (which cannot be accommodated, as shown). We have
presented a new linear-time algorithm for computing the maximal E-connection that
is syntactically compatible (and, assuming domain-independence, equivalent) with the
input ontology. The output of our algorithm does not depend on the traversal order of
axioms and concepts. En route we have introduced a simplified notation. We have further
shown that theory and algorithm extend to expressive operators on roles considered in
the literature, and identified role negation as not amenable to this approach.
    For future work, we expect a straightforward implementation of our algorithm in
the OWL API5 [14] and as a Protégé6 [24] plugin, as a basis for experiments on a
representative up-to-date ontology corpus such as [21]. We conjecture that existing
ontologies generally decompose well when allowing slight deviations from (syntactic)
compatibility, to circumvent the notorious problematic modelling patterns, see §1. The
implementation will also enable a comparison with other decomposition approaches,
such as atomic decomposition [11]. We furthermore plan to revisit module extraction,
extending the existing procedure in [10] to arbitrary E-connections, independently of
a specific partitioning algorithm. Given the linear runtime of our algorithm, module
extraction might even compete in performance with syntactic locality [7,32]. Finally, the
“culprit” of role negation identified in §5 suggests the hypothesis that the partitioning
approach can be transferred to logics that allow only unary negation, such as frontier-one
existential rules [4] or even the unary negation fragment of first-order logic, UNFO [31].
Acknowledgement We thank the anonymous reviewers for their helpful comments.


References
 1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995), http:
    //webdam.inria.fr/Alice/
 2. Amato, F., De Santo, A., Moscato, V., Persia, F., Picariello, A., Poccia, S.R.: Partitioning of
    ontologies driven by a structure-based approach. In: Proc. of ICSC-15. pp. 320–323. IEEE
    Computer Society (2015)
 3. Baader, F., Lutz, C., Sturm, H., Wolter, F.: Fusions of description logics and abstract descrip-
    tion systems. J. Artif. Intell. Res. 16, 1–58 (2002)
 5
     http://owlcs.github.io/owlapi/
 6
     https://protege.stanford.edu/
 4. Baget, J., Leclère, M., Mugnier, M., Salvat, E.: Extending decidable cases for rules with
    existential variables. In: Proc. of IJCAI-09. pp. 677–682 (2009)
 5. Bao, J., Voutsadakis, G., Slutzki, G., Honavar, V.: Package-based description logics. In:
    Stuckenschmidt et al. [28], pp. 349–371
 6. Cuenca Grau, B.: Combination and integration of ontologies on the semantic web. Ph.D.
    thesis, Universidad de Valencia (2005)
 7. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Extracting modules from ontologies:
    A logic-based approach. In: Stuckenschmidt et al. [28], pp. 159–186
 8. Cuenca Grau, B., Parsia, B., Sirin, E.: Ontology integration using E-Connections. In: Stucken-
    schmidt et al. [28], pp. 293–320
 9. Cuenca Grau, B., Parsia, B., Sirin, E., Kalyanpur, A.: Automatic partitioning of OWL ontolo-
    gies using E-Connections. In: Proc. of DL-05. CEUR-WS.org, vol. 147 (2005)
10. Cuenca Grau, B., Parsia, B., Sirin, E., Kalyanpur, A.: Modularity and web ontologies. In:
    Proc. of KR-06. pp. 198–209. AAAI Press (2006)
11. Del Vescovo, C., Parsia, B., Sattler, U., Schneider, T.: The modular structure of an ontology:
    Atomic decomposition. In: Proc. of IJCAI-11. pp. 2232–2237 (2011)
12. Gatens, W., Konev, B., Wolter, F.: Module extraction for acyclic ontologies. In: Proc. of
    WoMO-13. CEUR-WS.org, vol. 1081 (2013)
13. Hopcroft, J.E., Tarjan, R.E.: Efficient algorithms for graph manipulation [H] (algorithm 447).
    Commun. ACM 16(6), 372–378 (1973)
14. Horridge, M., Bechhofer, S.: The OWL API: A Java API for OWL ontologies. Semantic Web
    2(1), 11–21 (2011)
15. Horrocks, I., Kutz, O., Sattler, U.: The even more irresistible SROIQ. In: Proc. of KR-06.
    pp. 57–67. AAAI Press (2006)
16. Klinov, P., Del Vescovo, C., Schneider, T.: Incrementally updateable and persistent decompo-
    sition of OWL ontologies. In: Proc. of OWLED-12. CEUR-WS.org, vol. 849 (2012)
17. Konev, B., Lutz, C., Ponomaryov, D., Wolter, F.: Decomposing description logic ontologies.
    In: Proc. of KR-10. pp. 236–246. AAAI Press (2010)
18. Konev, B., Lutz, C., Walther, D., Wolter, F.: Semantic modularity and module extraction in
    description logics. In: Proc. of ECAI-08. FAIA, vol. 178, pp. 55–59. IOS Press (2008)
19. Krötzsch, M., Simančik, F., Horrocks, I.: A description logic primer. CoRR abs/1201.4089
    (2012), http://arxiv.org/abs/1201.4089
20. Kutz, O., Lutz, C., Wolter, F., Zakharyaschev, M.: E-connections of abstract description
    systems. Artif. Intell. 156(1), 1–73 (2004)
21. Matentzoglu, N., Parsia, B.: BioPortal Snapshot 30 March 2017 (data set) (2017), http:
    //doi.org/10.5281/zenodo.439510
22. Mosurović, M., Krdžavac, N., Graves, H., Zakharyaschev, M.: A decidable extension of
    SROIQ with complex role chains and unions. J. Artif. Intell. Res. 47, 809–851 (2013)
23. Motik, B., Horrocks, I.: OWL datatypes: Design and implementation. In: Proc. of ISWC-08.
    LNCS, vol. 5318, pp. 307–322. Springer (2008)
24. Musen, M.A.: The Protégé project: a look back and a look forward. AI Matters 1(4), 4–12
    (2015)
25. Parsia, B., Sattler, U., Schneider, T.: Easy keys for OWL. In: Proc. of OWLED-08EU. CEUR-
    WS.org, vol. 432 (2008)
26. Schmidt, R.A., Tishkovsky, D.: Using tableau to decide description logics with full role
    negation and identity. ACM Trans. Comput. Log. 15(1), 7:1–7:31 (2014)
27. Serafini, L., Tamilin, A.: Composing modular ontologies with Distributed Description Logics.
    In: Stuckenschmidt et al. [28], pp. 321–347
28. Stuckenschmidt, H., Parent, C., Spaccapietra, S. (eds.): Modular Ontologies: Concepts, Theo-
    ries and Techniques for Knowledge Modularization, LNCS, vol. 5445. Springer (2009)
29. Stuckenschmidt, H., Schlicht, A.: Structure-based partitioning of large ontologies. In: Stuck-
    enschmidt et al. [28], pp. 187–210
30. Suntisrivaraporn, B.: Module extraction and incremental classification: A pragmatic approach
    for EL ontologies. In: Proc. of ESWC-08. LNCS, vol. 5021, pp. 230–244. Springer (2008)
31. ten Cate, B., Segoufin, L.: Unary negation. Logical Methods in Computer Science 9(3) (2013)
32. Tsarkov, D.: Improved algorithms for module extraction and atomic decomposition. In: Proc.
    of DL-12. CEUR-WS.org, vol. 846 (2012)