=Paper=
{{Paper
|id=Vol-1660/womocoe-paper6
|storemode=property
|title=On Computing Minimal EL-Subsumption Modules
|pdfUrl=https://ceur-ws.org/Vol-1660/womocoe-paper6.pdf
|volume=Vol-1660
|authors=Jieying Chen,Michel Ludwig,Dirk Walther
|dblpUrl=https://dblp.org/rec/conf/fois/ChenL016
}}
==On Computing Minimal EL-Subsumption Modules==
<pdf width="1500px">https://ceur-ws.org/Vol-1660/womocoe-paper6.pdf</pdf>
<pre>
                  On Computing Minimal
                E L -Subsumption Modules
              Jieying CHEN a , Michel LUDWIG b , and Dirk WALTHER b
       a Laboratoire de Recherche en Informatique, Université Paris-Sud, France

                                   jieying.chen@lri.fr
                 b Theoretical Computer Science, TU Dresden, Germany

                           {michel,dirk}@tcs.inf.tu-dresden.de

           Abstract. In the paper we study algorithms for computing minimal modules that
           are minimal w.r.t. set inclusion and that preserve the entailment of all E L -
           subsumptions over a signature of interest. We follow the black-box approach for
           finding one or all justifications by replacing the entailment tests with logical dif-
           ference checks, obtaining modules that preserve not only a given consequence but
           all entailments over a signature. Such minimal modules can serve to improve our
           understanding of the internal structure of large and complex ontologies. Addition-
           ally, several optimisations to speed up the computation of minimal modules are
           investigated. We present an experimental evaluation of an implementation of our
           algorithms by applying them on the medical ontologies Snomed CT and NCI.


1. Introduction

A module is a subset of an ontology that can act as a substitute for the ontology in cer-
tain contexts. A basic requirement on modules is to be indistinguishable from the origi-
nal ontology w.r.t. an inseparability relation. Such basic modules are also called ‘plain’
modules. Further module properties such as self-containment and depletion have been
proposed in the literature [9, 11] (also called weak and strong in [8]). These properties
together with inseparability relations give rise to a family of module notions. Several
inseparability notions have been considered, e.g., model-theoretic inseparability w.r.t. a
signature [9], or inseparability w.r.t. answers to queries [13] (i.e., two ontologies are in-
separable iff they entail the same queries). Popular query types are subsumption, instance
and conjunctive queries. In particular, for E L -TBoxes model-theoretic inseparability
w.r.t. a signature Σ coincides with entailment of second-order sentences over Σ (cf. The-
orem 4 in [9]). We call modules based on model-theoretic inseparability semantic mod-
ules. In this paper, however, we consider a weaker inseparability relation that is based on
subsumption queries between E L -concepts over a given signature. We call the resulting
modules E L -subsumption modules.
     An important requirement on modules is that they should be as small as possible,
which is particularly useful, e.g., in the ontology re-use scenario [4]. As smallest modules
are not necessarily unique, we are interested in computing all basic E L -subsumption
modules that are minimal w.r.t. set inclusion. Computing minimal basic semantic mod-
ules of E L -terminologies that are additionally self-contained and depleting has been
investigated in [8, 9]. Algorithms for computing minimal modules of DL-Lite ontolo-
gies have been studied in [11]. However, to the best of our knowledge, no practical ap-
proach for computing one or all basic E L -subsumption modules of E L -terminologies
has been considered.
     Minimal modules can serve as explanations of the entire set of entailments over a
signature, similar to the justifications for one consequence (i.e., minimal sets of axioms
sufficient to entail the consequence) [5]. In this sense, minimal modules can improve our
understanding of the internal structure of large and complex ontologies. Moreover, being
able to compute all minimal modules allows us to select the smallest minimal module.
     In general, extracting minimal modules is intractable, which is the reason why effi-
ciently extractable approximations of the (union of all) minimal modules have been in-
troduced. Among such approximations are the family of syntactic locality-based mod-
ules [4]. Such modules may contain more axioms than necessary to ensure the preser-
vation of entailments over a signature. For instance, the size of the syntactic ⊥>∗ -
locality modules [16] of Snomed CT (version Jan 2016) for 100 signatures consisting of
50 concept names selected at random together with all roles names ranges from 1 075 to
2 456 axioms. This is in contrast to the size of the minimal basic E L -subsumption mod-
ules for these signatures that ranges from around 50 to 118 axioms. Hence, such mini-
mal modules of Snomed CT may be more than 20 times smaller than the corresponding
syntactic ⊥>∗ -locality modules.
     The system MEX has been introduced to compute minimal depleting semantic mod-
ules (which are unique for a given signature) from acyclic E L -terminologies (possibly
extended with inverse roles) such as Snomed CT. The MEX-modules contain all mini-
mal basic E L -subsumption modules, i.e., the module notion that we are interested in.
The size of the MEX-modules of Snomed CT for the same signatures as above ranges
from 401 to 720 axioms. However, the corresponding minimal basic E L -subsumption
modules are still at least 6 times smaller. Moreover, MEX cannot handle cyclic E L -
terminologies such as some recent versions of NCI. For instance, the size of the syn-
tactic ⊥>∗ -locality module of NCI (version 14.01d) for 100 random signatures selected
from NCI (as before for Snomed CT just with 100 concept names) ranges from 679 to
3 895 axioms, whereas the size of the corresponding minimal basic E L -subsumption
modules ranges from around 0 to 64 axioms. Clearly, the ratio of the size of the syntactic
⊥>∗ -locality based modules compared to the size of the minimal basic E L -subsumption
modules is even larger than 20 in this case. Another approach for extracting minimal
depleting modules from DL-Lite ontologies is based on using QBF-solvers [11].
     In this paper, in order to compute minimal basic E L -subsumption modules we ex-
tend the black-box approach for finding one or all justifications in [5], which is based
on Reiter’s hitting set algorithm [15]. Instead of ensuring that a given entailment is pre-
served, we introduce an oracle to determine the inseparability between the original on-
tology and the resulting module. As an oracle we use a variant of the system CEX, which
is the only currently available tool for deciding whether two E L -terminologies are log-
ically different w.r.t. a signature [7, 10]. Additionally, several optimisations to speed up
the computation of minimal modules are investigated. We present an experimental evalu-
ation of our algorithms by applying them on the prominent and large medical ontologies
Snomed CT and NCI. We note that our algorithms are applicable to ontologies formu-
lated in any ontology language provided that a tool is available that can effectively de-
cide the inseparability relation. As CEX works with variants of E L -terminologies, we
restrict the presentation of our algorithms to E L -terminologies.
     We proceed as follows. We start by reviewing E L -terminologies together with the
notion of logical difference. In Section 3 we define the notion of basic E L -subsumption
module and we introduce algorithms for extracting one or all minimal such modules. In
Section 4 we present the results of an evaluation of our algorithms using Snomed CT and
NCI. We close the paper with a conclusion.


2. Preliminaries

Let NC and NR be mutually disjoint (countably infinite) sets of concept names and role
names. In the following we use upper case letters A, B, X, Y , Z to denote concept names,
and lower case letters r, s stand for role names. The set of E L -concepts C and the set
of E L -inclusions α are built according to the following grammar rules: C ::= > | A |
C uC | ∃r.C and α ::= C v C | C ≡ C, where A ∈ NC and r, s ∈ NR . An E L -TBox T is
a finite set of E L -inclusions. We also refer to E L -inclusions as axioms when they are
contained in an E L -TBox.
     The semantics is defined using interpretations I = (∆I , ·I ), where the domain
∆ is a non-empty set, and ·I is a function mapping each concept name A to a subset
  I

AI of ∆I and every role name r to a binary relation rI over ∆I . The extension CI
of a possibly complex concept C is defined inductively as: (>)I := ∆I , (C u D)I :=
CI ∩ DI , and (∃r.C)I := {x ∈ ∆I | ∃y ∈ CI : (x, y) ∈ rI }.
     An interpretation I satisfies an E L -concept C, an E L -inclusion C v D, or C ≡ D
if CI 6= 0,/ CI ⊆ DI , or CI = DI , respectively. We write I |= α if I satisfies the
axiom α. Note that every E L -concept and E L -inclusion is satisfiable, but a particular
interpretation does not necessarily satisfy a concept or inclusion. An interpretation I
is a model of T if I satisfies all axioms in T . An E L -inclusion α follows from an
E L -TBox T , written T |= α, if for all models I of T , we have that I |= α.
     A signature Σ is a finite set of symbols from NC and NR . The signature sig(ϕ) is the
set of concept and role names occurring in ϕ, where ϕ ranges over any syntactic object.
We set sigNC (ϕ) := sig(ϕ) ∩ NC . The symbol Σ is used as a subscript to a set of concepts
or axioms to denote that the elements only use symbols from Σ, e.g., E L Σ , etc.
     An E L -terminology T is an E L -TBox consisting of axioms of the forms X v C
or X ≡ C, where X is a concept name in NC and C is an E L -concept, and no concept
name X occurs more than once on the left-hand side of an axiom. A terminology is
said to be acyclic if it can be unfolded (i.e., the process of substituting concept names
by the right-hand sides of their defining axioms terminates). Formally, we define the
relation ≺T : sigNC (T ) × sigNC (T ) by setting (X,Y ) ∈ ≺T iff there exists an axiom of
the form X ≡ C or X v C in T such that Y ∈ sig(C). Then, a terminology T is acyclic iff
the transitive closure (≺T )+ of ≺T is irreflexive. For instance, the prominent medical
ontology Snomed CT (version Jan 2016) is an acyclic E L -terminology, whereas the
ontology NCI (version 14.01d) is a cyclic E L -terminology.
     We now recall basic notions related to the logical difference between two E L -
terminologies for E L -inclusions over a given signature as query language [6, 10].

Definition 1 (Logical Difference) Let T1 and T2 be two E L -terminologies, and let Σ
be a signature. The E L -concept inclusion difference between T1 and T2 w.r.t. Σ is the
set cDiff Σ (T1 , T2 ) of all E L -inclusions α of the form C v D for E L -concepts C and D
such that sig(α) ⊆ Σ, T1 |= α, and T2 6|= α.

     Two E L -terminologies T1 and T2 are also called inseparable w.r.t. E L -concept
inclusions over Σ iff cDiff Σ (T1 , T2 ) = 0/ and cDiff Σ (T2 , T1 ) = 0/ [13]. If there exists an
E L -inclusion α such that sig(α) ⊆ Σ, T1 |= α, and T2 6|= α, then the set cDiff Σ (T1 , T2 )
consists of infinitely many concept inclusions.
     For acyclic E L -terminologies T1 and T2 , the version 2.5 of the system CEX [7,10]
can decide whether the set cDiff Σ (T1 , T2 ) is empty. In this paper we use a variant of CEX
that works with cyclic E L -terminologies, implementing a hypergraph-based approach
to the logical difference problem as introduced in [3] and further extended in [12].


3. Minimal Modules

We now give a formal definition of the module notion that we consider in this paper.

Definition 2 (Basic E L -Subsumption Module) Let T be an E L -terminology, and
let Σ be a signature. A subset M ⊆ T is called a basic E L -subsumption module of T
w.r.t. Σ iff for all E L -inclusions α of the form C v D for E L -concepts C and D with
sig(α) ⊆ Σ it holds that T |= α iff M |= α.

     Every subset M of a terminology T that preserves the entailment of all E L -
subsumptions over a given signature Σ is a basic E L -subsumption module of T w.r.t. Σ.
In particular, T itself is a basic E L -subsumption module of T w.r.t. any signature.
It can readily be seen that M is a basic E L -subsumption module of T w.r.t. Σ iff
cDiff Σ (T , M ) = 0/ (cf. Definition 1). We have that M and T are inseparable w.r.t. E L -
inclusions over Σ. More precisely, as M ⊆ T , it holds that T is a conservative exten-
sion of M w.r.t. E L -inclusions over Σ [13]. There may be exponentially many (in the
size of T ) subsets of T that satisfy that criterion (see Example 6). For the use-case of
ontology re-use, however, we are most interested in modules that are as small as pos-
sible [4]. Note that smallest modules (regarding the number of axioms) are also mini-
mal w.r.t. ⊆, whereas the converse does not hold in general, i.e., there may be minimal
modules w.r.t. ⊆ that contain more axioms than other minimal modules w.r.t. ⊆.

Example 3 Let T = {A v X u Y, X v B, Y v Z, Z v B} be an E L -terminology, and
Σ = {A, B} be a signature. It holds that both sets, M1 = {A v X uY, X v B} and M2 =
{A v X uY, Y v Z, Z v B}, are minimal basic E L -subsumption modules of T w.r.t. Σ,
whereas M1 is the smallest minimal basic E L -subsumption module of T w.r.t. Σ as
|M1 | < |M2 |.

     The notion of a justification for a concept inclusion α has been introduced as a mini-
mal subset of a TBox that entails a given concept inclusion [2]. We can understand a min-
imal module as a more general notion of justification: a minimal basic E L -subsumption
module of T w.r.t. Σ is a justification for all the concept inclusions over Σ entailed by T .
     Semantic modules of E L -terminologies that are self-contained or depleting (in fact,
such modules have both properties [9]) can be larger than basic E L -subsumption mod-
ules as introduced in Definition 2.
Example 4 Let T = {A v ∃r.B} be an E L -terminology, and Σ = {A, B} be a signature.
It is easy to verify that T itself is a basic, self-contained, and depleting semantic module
of T w.r.t. Σ [8,9], whereas the empty set is the minimal basic E L -subsumption module
of T w.r.t. Σ.

     The following example extends Example 3 to show that the modules computed by
the system MEX [8] as well as the modules based on syntactic locality can be larger than
minimal basic E L -subsumption modules as introduced in Definition 2. Note that MEX-
modules are semantic modules that are self-contained as well as depleting [9] (equiva-
lently, weak and strong [8]).

Example 5 Let T = {A v X u Y u U, X v B, Y v Z, Z v B, U ≡ V u W } be an E L -
terminology, and Σ = {A, B} be a signature. It holds that both sets, M1 = {A v X u
Y, X v B} and M2 = {A v X uY, Y v Z, Z v B}, are minimal basic E L -subsumption
modules of T w.r.t. Σ. Moreover, MEX outputs M3 = M1 ∪ M2 = {A v X uY uU, X v
B, Y v Z, Z v B} as module of T w.r.t. Σ. Finally, T itself is the ⊥>∗ -local module
of T w.r.t. Σ.

    In general, there can be several minimal basic E L -subsumption modules of an
acyclic E L -terminology for a signature, and even the smallest of such modules are not
necessarily unique. The next example shows a sequence of acyclic E L -terminologies
whose number of minimal basic E L -subsumption modules for a given signature is ex-
ponentially increasing.

Example 6 Let Tn = {A v X0 } ∪ { Xi−1 v Yi u Zi | 1 ≤ i ≤ n } ∪ {Yi v Xi , Zi v Xi | 1 ≤
i ≤ n } ∪ {Xn v B} with n ≥ 0 be E L -terminologies, and let Σ = {A, B} be a signature.
      It holds that the set {A v X0 , X0 v B} is the minimal basic E L -subsumption module
of T0 w.r.t. Σ, the sets {A v X0 , X0 v Y1 u Z1 , Y1 v X1 , X1 v B} and {A v X0 , X0 v Y1 u
Z1 , Z1 v X1 , X1 v B} are the two minimal basic E L -subsumption modules of T1 , and the
sets {A v X0 , X0 v Y1 u Z1 , Y1 v X1 , X1 v Y2 u Z2 , Y2 v X2 , X2 v B}, {A v X0 , X0 v Y1 u
Z1 , Y1 v X1 , X1 v Y2 u Z2 , Z2 v X2 , X2 v B}, {A v X0 , X0 v Y1 u Z1 , Z1 v X1 , X1 v Y2 u
Z2 , Y2 v X2 , X2 v B}, and {A v X0 , X0 v Y1 u Z1 , Z1 v X1 , X1 v Y2 u Z2 , Z2 v X2 , X2 v
B} are the four minimal basic E L -subsumption modules of T2 , etc. In general, it can
readily be verified that Tn has 2n many distinct minimal basic E L -subsumption modules
w.r.t. Σ.

    In the remainder of this section, we present algorithms for computing minimal basic
E L -subsumption modules. In Section 4 we analyse the number of minimal basic E L -
subsumption modules in large medical ontologies for certain signatures. We will simply
write module instead of ‘basic E L -subsumption module’.

3.1. Computing a Single Minimal Module

A first straightforward procedure S INGLE -M INIMAL -M ODULE for computing a mini-
mal module of an E L -terminology T w.r.t. a signature Σ is given in Algorithm 1.1 The
procedure operates as follows. First, the variable M is initialised with T . Subsequently,
  1 A similar algorithm for DL-Lite ontologies has already been described in Theorem 67 of [11].
the procedure iterates over every axiom α ∈ T and checks whether cDiff Σ (T , M \
{α}) = 0,/ in which case the axiom α is removed from M . During the execution of the
while-loop the set M is hence shrunk by removing axioms that do not lead to a logical
difference until a minimal module of T for Σ remains.

Algorithm 1 Computing a Single Minimal Module w.r.t. a Signature
INPUT: E L -terminology T , signature Σ
 1: function S INGLE -M INIMAL -M ODULE(T , Σ)
 2:    M := T
 3:    for every axiom α ∈ T do
 4:        if cDiff Σ (T , M \ {α}) = 0/ then
 5:            M := M \ {α}
 6:        end if
 7:    end for
 8:    return M
 9: end function


     Note that the minimal module that is extracted by Algorithm 1 depends on the or-
der in which axioms were chosen during the iteration (Line 3), i.e. by iterating over
the axioms in a different order one can potentially obtain a different minimal module.
Moreover, one can show that all minimal modules can be computed by using all possible
orderings on the axioms α ∈ T in the for-loop in Line 3.
     It is easy to see that Algorithm 1 always terminates and that it runs in polynomial
time in the size of T and Σ since deciding the existence of a logical difference between
E L -terminologies can be performed in polynomial time in the size of T and Σ.
     Regarding correctness, if we assume towards a contradiction that a set Mmin ⊆ T
computed by Algorithm 1 applied on T and Σ is not a minimal module of T w.r.t. Σ,
then there would exist an axiom α ∈ M such that cDiff Σ (T , Mmin \ {α}) = 0. / However,
when α was analysed in the for-loop in Line 3, cDiff Σ (T , M 0 \ {α}) must have been
empty as well by monotonicity of |=, where M 0 with Mmin ⊆ M 0 represents the value
of the variable M in Algorithm 1 at the time α was inspected. Consequently, it would
hold that α 6∈ Mmin and we have derived a contradiction. We hence obtain the following
result.

Theorem 7 Let T be an E L -terminology and let Σ be a signature. Then Algorithm 1
applied on T and Σ computes a minimal module of T for Σ.

     As checking the existence of a logical difference can be costly in practice, we now
introduce a refinement of the previous algorithm that potentially allows it to reduce the
number of logical difference checks that are required for computing a minimal module.
The refined procedure S INGLE -M INIMAL -M ODULE -B UBBLE is shown in Algorithm 2.
     Intuitively, instead of checking whether the removal of a single axiom leads to a
logical difference, the refined procedure removes a set B of axioms from T at once.
Such a set B is also called a bubble. As an additional optimisation we introduce the
notion of logical difference core, which will become relevant in the context of computing
all minimal modules when the algorithm for computing one minimal module has to be
executed frequently.
Algorithm 2 Computing a Single Minimal Module w.r.t. a Signature using Axiom Bubbles
INPUT: E L -terminology T , signature Σ, n ≥ 1, logical difference core C ⊆ T w.r.t. Σ
 1: function S INGLE -M INIMAL -M ODULE -B UBBLE(T , Σ, n, C )
 2:    M := T
 3:    Q := S PLIT(T \ C , n)
 4:    while Q 6= [ ] do
 5:        B := H EAD(Q)
 6:        Q := TAIL(Q)
 7:        if cDiff Σ (T , M \ B) = 0/ then
 8:            M := M \ B
 9:        else
10:            if |B| > 1 then
11:                 (Bl , Br ) := S PLIT H ALF(B)
12:                 Q := Bl :: Br :: Q
13:            end if
14:        end if
15:    end while
16:    return M
17: end function


Definition 8 (Logical Difference Core) Let T be an E L -terminology and let Σ be a
signature. A subset C ⊆ T is said to be a logical difference core of T w.r.t. Σ iff for
every α ∈ C it holds that cDiff Σ (T , T \ {α}) 6= 0.
                                                   /

     Given a logical difference core C of T w.r.t. Σ and a minimal module M of T
w.r.t. Σ, it is easy to see that C ⊆ M must hold. The maximal logical difference core can
be computed by collecting all the axioms α ∈ T for which cDiff Σ (T , T \ {α}) 6= 0.    /
     Now, the procedure S INGLE -M INIMAL -M ODULE -B UBBLE applied on a terminol-
ogy T , a signature Σ, an initial size parameter n for the bubbles, and a logical difference
core C of T w.r.t. Σ operates as follows. First, the variable M is set to contain all the
axioms of T and the bubble queue Q is initialised by partitioning the axioms contained
in T \ C into bubbles of size n. Note that the size of one bubble may be different from n
if n is not a divisor of |T |, or if n > |T |. The resulting bubbles are then stored in the
queue Q. As long as Q is not empty, the first bubble B is extracted from the queue
(lines 5 and 6). Note that the empty queue is denoted with [ ]. Subsequently, it is veri-
fied in Line 7 whether the removal of the axioms in B from the minimal module candi-
date M leads to a logical difference. If not, all the axioms in B can safely be removed
from M in Line 8. Otherwise, if the bubble contained more than one axiom (Line 10),
we have to identify the subsets of B whose removal does not yield a logical difference.
To that end, B is split into two        Bl and Br (Line 11) such that Bl , Br ⊆ B,
                                    bubbles
|Bl | = 12 · |B| , and |Br | = 12 · |B| . The bubbles Bl and Br are then prepended to
the queue (Line 12), and the algorithm continues with the next iteration.
     The correctness of Algorithm 2 can be shown as before with Algorithm 1. Termi-
nation on any input follows from the fact that every axiom in T appears in at most one
bubble in Q and that in each iteration either the overall number of bubbles is reduced,
or one bubble that contains more than one axiom is split into two smaller bubbles. Note
that once a bubble B of size 1 has been selected in Line 5, it will not be contained in Q
in subsequent iterations. We obtain the following result.

Theorem 9 Let T be an E L -terminology and let Σ be a signature. Additionally, let
C ⊆ T be a logical difference core of T w.r.t. Σ.
   Then Algorithm 2 applied on T , Σ, and C computes a minimal module of T for Σ.

     Regarding computational complexity, we observe that the decomposition of every
bubble B induces a binary tree in which the nodes are labelled with the bubbles resulting
from splitting the parent bubble. In our algorithm, given a bubble B, such a decomposi-
tion tree has a depth of at most blog2 |B|c and the number of nodes in a decomposition
tree corresponds to the number of logical difference checks. As the number of nodes
in a binary tree of depth h is bounded by 2h+1 − 1, we hence obtain that every initial
bubble B results in at most 2 · |B| − 1 logical difference checks. Overall, we can infer
that the procedure S INGLE -M INIMAL -M ODULE -B UBBLE runs in polynomial time in
the size of T , Σ, and n.

3.2. Computing All Minimal Modules

A naı̈ve way to compute all minimal modules is to enumerate all subsets of the input
TBox T and to check their logical difference w.r.t. T and a given signature. For E L -
terminologies the logical difference problem can be decided in polynomial time [6]. Ex-
ample 6 shows that there are E L -terminologies with exponentially many minimal mod-
ules. Consequently, computing all minimal modules of an E L -terminology can only be
achieved in time exponential in the size of the terminology in the worst case.
     For that reason, upper approximations of (the union of) all minimal modules such
as the syntactic locality-based module notions that can be extracted more efficiently
have been introduced [4]. In our algorithm for computing all minimal modules (and
in our experiments for extracting one minimal module) we will make use of syntactic
⊥>∗ -locality modules to speed up computations. These modules are among the small-
est modules based on syntactic locality notions [16]. They can be obtained by iterating
the process of extracting a syntactic ⊥-local module followed by extracting a syntac-
tic >-local module until a fixpoint is reached. We will extract syntactic ⊥>∗ -locality
modules using the OWLAPI.2 Note that any syntactic ⊥>∗ -locality module Ms of an
E L -terminology T w.r.t. a signature Σ contains all the minimal modules of T w.r.t. Σ.
     In our algorithm for computing all minimal modules, we make use of a technique
developed for computing all minimal hitting sets [15]. Our algorithm is based on the
following observation: given a minimal module M of T w.r.t. a signature Σ, then for
any other minimal module M 0 of T w.r.t. Σ there must exist α ∈ M 0 such that α 6∈ M ,
i.e. M 0 must be contained in T \ {α} for some α ∈ M .
     Similarly to [15], our algorithm organises the search space using a labelled, di-
rected tree τ, called module search tree for T , that is extended during the run of the
algorithm. Formally, τ is a tuple (V , E , L , ρ), where V is a non-empty, finite set of
nodes, E ⊆ V × V is a set of edges, L is an edge labelling function, mapping ev-
ery edge e ∈ E to an axiom α ∈ T , and ρ ∈ V is the root node of τ. The procedure
A LL -M INIMAL -M ODULES shown in Algorithm 3 operates on a queue Q that contains
  2 http://owlapi.sourceforge.net
the nodes of τ that still have to be expanded. Intuitively, the labels of the edges on the
unique path from the root node to a node v ∈ V are the axioms that should be excluded
from the search for minimal modules. In each iteration a node v is extracted from Q and
the set Tex ⊆ T of exclusion axioms is computed by analysing the path from the root
node to v. The procedure S INGLE -M INIMAL -M ODULE -B UBBLE is then used to find a
minimal module M of T \ Tex w.r.t. Σ. Subsequently, the tree τ is extended by adding
a child vα of v for every α ∈ M and the search for all minimal modules continues in the
next iteration on the newly added nodes vα .


Algorithm 3 Computing All Minimal Modules w.r.t. a Signature
INPUT: E L -terminology T , signature Σ, n ≥ 1, logical difference core C ⊆ T w.r.t. Σ
 1: function A LL -M INIMAL -M ODULES(T , Σ, n, C )
 2:    TΣ := S YNTACTIC L OCALITY M ODULE(T , Σ)
 3:    M := 0;/ τ := ({ρ}, 0,   / ρ); Q := [ρ]; W := 0/
                             / 0,
 4:    while Q 6= [ ] do
 5:        v := H EAD(Q)
 6:        Q := TAIL(Q)
 7:        W := W ∪ {v}
 8:        Tex := L ABELS(PATH(τ, ρ, v))
 9:        if I S -PATH -R EDUNDANT(τ, ρ, Tex , W) then
10:             continue
11:        end if
12:        if cDiff Σ (TΣ , TΣ \ Tex ) 6= 0/ then
13:             continue
14:        end if
15:        M := 0/
16:        if there exists M 0 ∈ M such that Tex ∩ M 0 = 0/ then
17:             M := M 0
18:        else
19:             M := S INGLE -M INIMAL -M ODULE -B UBBLE(TΣ \ Tex , Σ, n, C )
20:             if M = C then
21:                 return {C }
22:             end if
23:             M := M ∪ {M }
24:        end if
25:        for every α ∈ M \ C do
26:             vα := A DD C HILD(τ, v, α)
27:             Q := vα :: Q
28:        end for
29:    end while
30:    return M
31: end function
     We now describe the A LL -M INIMAL -M ODULES procedure in detail, together with
the optimisations that we implemented. Some of the improvements to prune the search
space have been proposed in [15] already.
     Given an E L -terminology T , a signature Σ, a bubble size n ≥ 1, and a logical
difference core C ⊆ T of T w.r.t. Σ as input, in the lines 2 and 3 a syntactic ⊥>∗ -
locality module TΣ of T w.r.t. Σ is extracted from T , the variable τ is initialised to
represent a module search tree for T having only one node ρ. Moreover, the variables
M ⊆ 2TΣ , containing the minimal modules that have been computed so far, and W ⊆ V ,
containing the already explored nodes of τ, are both initialised with the empty set. The
queue Q of nodes in τ that still have to be explored is also set to contain the node ρ as
its only element.
     The algorithm then enters a while-loop in the lines 4 to 29 in which it remains as
long as Q is not empty. In each iteration the first element v is extracted from Q and v is
added to W (lines 5 to 7). Subsequently, the axioms labelling the edges of the path πv
from ρ to v in τ are collected in the set Tex (Line 8). The algorithm then checks whether
πv is redundant, in which case the next iteration of the while-loop starts.
     The path πv is redundant iff there exists an already explored node w ∈ W such that (a)
the axioms in Tex are exactly the axioms labelling the edges of the path πw from ρ to w
in τ, or (b) w is a leaf node of τ and the edges of πw are only labelled with axioms
from Tex . Condition (a) corresponds to early path termination in [5, 15]: the existence
of πw implies that all possible extensions of πv have already been considered. Condi-
tion (b) implies that the axioms labelling the edges of πw lead to a logical difference
when removed from TΣ . Consequently, removing Tex from TΣ also induces a logical
difference by monotonicity of |=, implying that πv and all its extensions do not have
to be explored. Moreover, the current iteration can also be terminated immediately if
cDiff Σ (TΣ , TΣ \ Tex ) 6= 0/ (lines 12 to 14) as no subset of TΣ \ Tex can be a module of TΣ
(and therefore of T ) w.r.t. Σ.
     Subsequently, in Line 15 the variable M that will hold a minimal module of TΣ \Tex
                    / At this point we can check if a minimal module M 0 ∈ M has already
is initialised with 0.
been computed for which Tex ∩ M 0 = 0/ (lines 16 and 17) holds, in which case we set M
to M 0 . This optimisation step can also be found in [5,15] and it allows us to avoid a costly
call to the S INGLE -M INIMAL -M ODULE -B UBBLE procedure. Otherwise, in the lines 18
to 24 we have to apply S INGLE -M INIMAL -M ODULE -B UBBLE on TΣ \ Tex to obtain a
minimal module of TΣ \ Tex w.r.t. Σ. The algorithm then checks whether M is equal to
C (lines 20 to 22), in which case the search for additional modules can be aborted. If the
logical difference core C is a minimal module itself, we can infer that no other minimal
module exists since C is a subset of all the minimal modules. Otherwise, the module M
is added to M in Line 23. Finally, in the lines 25 to 28 the tree τ is extended by adding
a child vα to v for every α ∈ M \ C , connected by an edge labelled with α. Note that it
is sufficient to take α 6∈ C as a set M with C 6⊆ M cannot be a minimal module of T
w.r.t. Σ. The procedure finishes by returning the set M in Line 30.
     Regarding correctness of Algorithm 3, we note that only minimal modules are added
to M. For completeness, one can show that the locality-based module TΣ of T w.r.t. Σ
contains all the minimal modules of T w.r.t. Σ. Moreover, it is easy to see that the
proposed optimisations do not lead to a minimal module not being computed. Overall,
we obtain the following result.
|Σ ∩ NC |                                                        50
Bubble Size               10                        25                         50                       100
Time (s)         67 / 523 / 200 / 87.2     68 / 483 / 197 / 82.5      70 / 505 / 202 / 85.3     71 / 507 / 204 / 86.4
Sizes             50 / 118 / 77 / 14.6      50 / 118 / 77 / 14.5       50 / 118 / 77 / 14.6      50 / 118 / 77 / 14.6
Size MEX-Mod                                            401 / 720 / 587 / 60.7
Size ⊥>∗ -Mod                                        1075 / 2456 / 1803 / 300.2

|Σ ∩ NC |                                                        75
Bubble Size               10                        25                         50                       100
Time (s)        225 / 584 / 434 / 102.0   216 / 1359 / 531 / 209.8 226 / 575 / 447 / 101.0     231 / 586 / 450 / 101.8
Sizes            78 / 177 / 110 / 16.4      75 / 216 / 113 / 20.8      78 / 177 / 110 / 15.8    78 / 178 / 110 / 15.9
Size MEX-Mod                                             679 / 971 / 825 / 72.0
Size ⊥>∗ -Mod                                         1939 / 3779 / 2641 / 373.2


Table 1. Computation of one minimal basic E L -subsumption module of Snomed CT for 100 random signa-
tures containing 50/75 concept names and all role names (minimal / maximal / median / standard deviation)


Theorem 10 Let T be an E L -terminology and let Σ be a signature. Additionally,
let n ≥ 1, and let C ⊆ T be a logical difference core of T w.r.t. Σ.
     Then the procedure A LL -M INIMAL -M ODULES shown in Algorithm 3 and applied
on T , Σ, n, and C , exactly computes all the minimal modules of T for Σ.

     Algorithm 3 terminates on any input as the paths in the module search tree τ for T
that is constructed during the execution represent all the permutations of the axioms in T
that are relevant for finding all minimal modules. It is easy to see that the procedure A LL -
M INIMAL -M ODULES runs in exponential time in size of T (and polynomially in Σ, n,
and C ) in the worst case.

4. Evaluation
To demonstrate the practical applicability of our approach, we have implemented Al-
gorithms 2 and 3 in a Java prototype to compute one and all minimal basic E L -
subsumption modules of E L -versions (i.e., without role axioms) of two prominent
biomedical ontologies: Snomed CT (version Jan 2016), an acyclic E L -terminology con-
sisting of 317 891 axioms, and NCI (version 14.01d), a cyclic E L -terminology contain-
ing 105 280 axioms. The experiments have been carried out on machines equipped with
an Intel Xeon Core 4 Duo CPU running at 2.50GHz and with 64GiB of RAM.
     Tables 1 and 2 show the results for computing one minimal basic E L -subsumption
module of Snomed CT and NCI for 100 random signatures of different sizes. When the
size of the signature increases, it takes more time in general to compute one minimal
module and the size of their minimal module is also increasing. Moreover, in our experi-
ments the median computation times were decreasing with an increasing bubble size for
signatures with 200 concept names.
     Table 3 shows that there exist several minimal basic E L -subsumption modules of
Snomed CT for the selected signatures (which contain concept names connected to at
most 8 other axioms). Note that we did not consider signatures that are extracted at
random as they usually yield one minimal module only. In our experiments the number
of minimal modules rose up to 32, and the size of the minimal modules varied from one
signature to another.
|Σ ∩ NC |                                                      100
Bubble Size                  10                      50                     100                       200
Time (s)            5 / 258 / 116 / 42.3      2 / 84 / 15 / 14.8     1 / 89 / 2 / 22.2        1 / 86 / 9 / 15.7
Sizes                  0 / 17 / 0 / 3.1        0 / 17 / 0 / 3.1      0 / 64 / 0 / 17.3         0 / 17 / 0 / 3.1
Size ⊥>∗ -Mod                                       679 / 3895 / 3510 / 915.2

|Σ ∩ NC |                                                      200
Bubble Size                  10                      50                     100                       200
Time (s)          125 / 581 / 210 / 100.7   28 / 546 / 59 / 99.4 16 / 552 / 39 / 105.3      9 / 562 / 29 / 109.5
Sizes               0 / 103 / 3 / 18.4       0 / 103 / 3 / 19.7    0 / 103 / 3 / 19.7        0 / 103 / 3 / 19.7
Size ⊥>∗ -Mod                                     3670 / 4619 / 4003 / 196.7


Table 2. Computation of one minimal basic E L -subsumption module of NCI for 100 random signatures
containing 100/200 concept names and all role names (minimal / maximal / median / standard deviation)


              Optimisation                     Core C = 0/                         Core C 6= 0/

              Time (s)                      5 / 709 / 24 / 238.7                 2 / 118 / 7 / 36.0
              Number of Modules                                 1 / 32 / 6 / 9.2
              Size of Modules                                81 / 265 / 126 / 46.5

Table 3. Computation of all minimal basic E L -subsumption modules of Snomed CT for 20 selected signa-
tures consisting of 30 concept names and all role names using a bubble size of 50 (min / max / med / std dev)


     Although a precomputation of the maximal logical difference core has the potential
of narrowing down the search space, it requires extra computational effort, which can be
potentially very time-consuming. In order to check whether the use of the logical dif-
ference core can help to speed up the process of searching for all minimal modules, we
computed all the minimal modules of Snomed CT with and without precomputing the
maximal logical difference core for the same signatures. It turns out that in our experi-
ments the precomputation of the maximal core was beneficial to the overall performance:
the overall computation process was sped up by more than three times.


5. Conclusion
We have reused the black-box approach for computing justifications in order to de-
vise two algorithms for computing one or all basic E L -subsumption modules. We de-
ploy a version of CEX as an oracle for determining whether two possibly cyclic E L -
terminologies are logically different (i.e. not inseparable). Our algorithms are applicable
to ontologies formulated in any ontology language provided that a tool is available that
can effectively decide the inseparability notion of interest.
     Our algorithms may require many costly calls to a logical difference tool. One way
to reduce the overall computation time would be to use a tool that allows for an iterative
computation of the logical difference (i.e., a tool that utilises previous computations on
similar input to determine the existence of a logical difference faster). Another possible
optimisation is refining the single module search algorithm by deploying a strategy for
selecting the sets of axioms (bubbles) that are to be removed next from the minimal mod-
ule candidate. Moreover, when creating bubbles (Algorithm 2) or selecting axioms that
are to be excluded from minimal modules (Algorithm 3) one can ensure that axioms that
always co-occur in minimal basic E L -subsumption modules are not separated. Finally,
instead of searching for minimal modules in the entire ontology, our algorithm first ex-
tracts modules that are based on the notion of syntactic locality. A further optimisation
might be achieved by exploring ways to compute such modules faster [14].
     Acknowledgements. This work was partially supported by the German Research
Foundation (DFG) within the Cluster of Excellence ‘Center for Advancing Electronics
Dresden’ and the China Scholarships Council. We would also like to thank the reviewers
of the workshop WOMoCoE 2016 and Yue Ma (Laboratoire de Recherche en Informa-
tique, Université Paris-Sud, France) for helpful feedback and input.

References

 [1]   F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope further. In In Proceedings of the OWLED
       2008 DC Workshop on OWL: Experiences and Directions, 2008.
 [2]   F. Baader, R. Peñaloza, and B. Suntisrivaraporn. Pinpointing in the description logic EL. In Proceedings
       of KI’07, volume 4667 of LNAI, pages 52–67, Osnabrück, Germany, 2007. Springer-Verlag.
 [3]   A. Ecke, M. Ludwig, and D. Walther. The concept difference for EL-terminologies using hypergraphs.
       In Proceedings of the International workshop on (Document) Changes: modeling, detection, storage
       and visualization (DChanges 2013), volume 1008 of CEUR-WS, 2013.
 [4]   B. C. Grau, I. Horrocks, Y. Kazakov, and U. Sattler. Modular reuse of ontologies: Theory and practice.
       Journal of Artificial Intelligence Research (JAIR), 31(1):273–318, 2008.
 [5]   A. Kalyanpur, B. Parsia, M. Horridge, and E. Sirin. Finding all justifications of OWL DL entailments. In
       Proceedings of the 6th International Semantic Web Conference & 2nd Asian Semantic Web Conference
       (ISWC 2007 & ASWC 2007), volume 4825 of LNCS, pages 267–280. Springer, 2007.
 [6]   B. Konev, M. Ludwig, D. Walther, and F. Wolter. The logical difference for the lightweight description
       logic EL. Journal of Artificial Intelligence Research (JAIR), 44:633–708, 2012.
 [7]   B. Konev, M. Ludwig, and F. Wolter. Logical difference computation with CEX2.5. In Proceedings of
       IJCAR’12, pages 371–377, Berlin, Heidelberg, 2012. Springer-Verlag.
 [8]   B. Konev, C. Lutz, D. Walther, and F. Wolter. Semantic modularity and module extraction in description
       logics. In Proceedings of ECAI’08, pages 55–59, Amsterdam, The Netherlands, 2008. IOS Press.
 [9]   B. Konev, C. Lutz, D. Walther, and F. Wolter. Model-theoretic inseparability and modularity of descrip-
       tion logic ontologies. Artificial Intelligence, 203:66–103, 2013.
[10]   B. Konev, D. Walther, and F. Wolter. The logical difference problem for description logic terminologies.
       In Proceedings of IJCAR’08, pages 259–274, Berlin, Heidelberg, 2008. Springer-Verlag.
[11]   R. Kontchakov, F. Wolter, and M. Zakharyaschev. Logic-based ontology comparison and module ex-
       traction, with an application to DL-Lite. Artificial Intelligence, 174(15):1093–1141, 2010.
[12]   M. Ludwig and D. Walther. The logical difference for ELHr-terminologies using hypergraphs. In
       Proceedings of ECAI’14, volume 263 of Frontiers in Artificial Intelligence and Applications, pages
       555–560. IOS Press, 2014.
[13]   C. Lutz and F. Wolter. Deciding inseparability and conservative extensions in the description logic EL.
       Journal of Symbolic Computation, 45(2):194–228, Feb. 2010.
[14]   F. Martin-Recuerda and D. Walther. Fast modularisation and atomic decomposition of ontologies using
       axiom dependency hypergraphs. In Proceedings of ISWC’14, Part II, volume 8797 of LNCS, pages
       49–64. Springer-Verlag, 2014.
[15]   R. Reiter. A theory of diagnosis from first principles. Artificial Intelligence, 32(1):57–95, 1987.
[16]   U. Sattler, T. Schneider, and M. Zakharyaschev. Which kind of module should I extract? In Proceedings
       of DL’09, volume 477 of CEUR Workshop Proceedings. CEUR-WS.org, 2009.

</pre>