<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Debugging Snomed ct Using Axiom Pinpointing in the Description Logic E L+</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Franz Baader</string-name>
          <email>baader@tcs.inf.tu-dresden.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Boontawee Suntisrivaraporn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Theoretical Computer Science</institution>
          ,
          <addr-line>TU Dresden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2008</year>
      </pub-date>
      <abstract>
        <p>Snomed ct is a large-scale medical ontology, which is developed using a variant of the inexpressive Description Logic E L. Description Logic reasoning can not only be used to compute subsumption relationships between Snomed concepts, but also to pinpoint the reason why a certain subsumption relationship holds by computing the axioms responsible for this relationship. This helps developers and users of Snomed ct to understand why a given subsumption relationship follows from the ontology, which can be seen as a first step toward removing unwanted subsumption relationships. In this paper, we describe a new method for axiom pinpointing in the Description Logic E L+, which is based on the computation of so-called reachabilitybased modules. Our experiments on Snomed ct show that the sets of axioms explaining subsumption are usually quite small, and that our method is fast enough to compute such sets on demand.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Description Logics (DLs) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are a family of
logicbased knowledge representation formalisms, which
can be used to develop ontologies in a formally
well-founded way. This is true both for
expressive DLs, which are the logical basis of the Web
Ontology Language OWL [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and for
inexpressive DLs of the E L family [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which are used in
the design of large-scale medical ontologies such as
Snomed ct1 and the National Cancer Institute’s
ontology.2
One of the main advantages of employing a
logicbased ontology language is that reasoning services
can be used to derive implicit knowledge from the
one explicitly represented. DL systems can, for
example, classify a given ontology, i.e., compute all
1http://www.ihtsdo.org/our-standards/
2http://www.nci.nih.gov/cancerinfo/terminologyresources
the subsumption (subconcept–superconcept)
relationships between the concepts defined in the
ontology. The advantage of using an inexpressive DL
of the E L family is that classification is tractable,
i.e., E L reasoners such as CEL [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] can compute
the subsumption hierarchy of a given ontology in
polynomial time.
      </p>
      <p>
        Similar to writing large programs, building
largescale ontologies is an error-prone endeavor.
Classification can help to alert the developer or user
of an ontology to the existence of errors. For
example, the subsumption relationship between
“amputation of finger” and “amputation of
upper limb” in Snomed ct is clearly unintended
[
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], and thus reveals a modeling error.
However, given an unintended subsumption
relationship in a large ontology like Snomed ct with
almost four hundred thousand axioms, it is not
always easy to find the erroneous axioms responsible
for it by hand. To overcome this problem, the DL
community has recently invested quite some work
on automating this process. Given a
subsumption relationship or another questionable
consequence, axiom pinpointing computes a minimal
subset (all minimal subsets) of the ontology that
have this consequence (called MinAs in the
following). Most of the work on axiom pinpointing
in DLs was concerned with rather expressive DLs
(see, e.g., [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8, 9, 10</xref>
        ]). The only work that
concentrated on pinpointing in the E L family of DLs was
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In addition to providing complexity results
for pinpointing, [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] introduces a “pragmatic”
algorithm for computing one MinA, which is based
on a modified version of the classification
algorithm used by the CEL reasoner [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Though this
approach worked quite well for mid-size ontologies
(see the experiments on a variant of the Galen
medical ontology described in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]), it was not
efficient enough to deal with large-scale ontologies
like Snomed ct.
      </p>
      <p>
        In the present paper, we describe a new method
for axiom pinpointing in the Description Logic
EL+, which is based on the computation of
socalled reachability-based modules [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Our
experiments on Snomed ct show that the sets of axioms
explaining a given subsumption are usually quite
small (78% of the MinAs we computed were of size
ten or less), and that our method is fast enough
(on average, it took one second to obtain a MinA)
to compute these sets on demand, i.e., whenever
the user asks for a MinA for a suspect
subsumption relationship.
      </p>
      <p>Axiom pinpointing in E L+
In this section, we first introduce the DL EL+,
which is an extension of the DL EL used to define
Snomed ct. Then, we define minimal axiom sets
(MinAs) for subsumption, and recall some of the
known results about computing MinAs in EL+.</p>
      <p>Syntax
C D
∃r.C</p>
      <p>Semantics
ΔI
CI ∩ DI
{x ∈ ΔI | ∃y ∈ ΔI :</p>
      <p>(x, y) ∈ rI ∧ y ∈ CI}
C</p>
      <p>D
r1 ◦ · · · ◦ rn
s</p>
      <p>CI ⊆ DI
r1I ◦ · · · ◦ rnI ⊆ sI</p>
      <p>Starting with a set of concept names CN and a
set of role names RN, EL+ concept descriptions
can be built using the constructors shown in the
upper part of Table 1, i.e., every concept name
A ∈ CN and the top concept are EL+
concept descriptions, and if C, D are EL+ concept
descriptions and r ∈ RN is a role name, then
C D (conjunction) and ∃r.C (existential
restriction) are EL+ concept descriptions. Role chains
of the form r1 ◦ · · · ◦ rn for n ≥ 0 are called role
descriptions. An EL+ ontology is a finite set of
axioms of the form shown in the lower part of
Table 1, where axioms of the form C D are called
general concept inclusions (GCIs) and of the form
r1 ◦ · · · ◦ rn s role inclusions (RIs). An EL
ontology is an EL+ ontology that does not contain
RIs. We use C ≡ D as an abbreviation for the two
GCIs C D, D C.</p>
      <p>The semantics of EL+ is defined in terms of
interpretations I = (ΔI, ·I), where the domain ΔI is a
non-empty set of individuals, and the
interpretation function ·I maps each concept name A ∈ CN
to a subset AI of ΔI and each role name r ∈ RN
to a binary relation rI on ΔI. The extension of ·I
to arbitrary concept descriptions is inductively
defined, as shown in the semantics column of Table 1.
An interpretation I is a model of an ontology O if,
for each inclusion axiom in O, the conditions given
in the semantics column of Table 1 are satisfied.
The main reasoning problem in EL+ is the
subsumption problem: given an EL+ ontology O and
two EL+ concept descriptions C, D, check whether
C is subsumed by D w.r.t. O (written C O D),
i.e., whether CI ⊆ DI holds in all models of O.
The computation of all subsumption relationships
between the concept names occurring in O is called
classification of O.</p>
      <p>
        Figure 1 shows a small EL ontology defining
concepts related to amputation/injury of hand and
finger. It uses the so-called SEP-triplet encoding
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], in which anatomical concepts (like hand) are
represented by three concepts: the structure
concept (e.g, HandS, which stands for the hand and
its proper parts), the part concept (e.g, HandP ,
which stands for the proper parts of the hand),
and the entity concept (e.g, HandE, which stands
for the entire hand). The axioms α5–α10
constitute a completed SEP-triplet encoding for finger,
hand, and upper limb. For example, axiom α8 says
that proper parts of the hand belong to the
structure concept HandS, and they are parts of hand
(i.e., linked with the role part to the entity
concept HandE). Given this encoding, the fact that
the finger is part of the hand can be expressed
using axiom α11. The main reason for using this
encoding in Snomed ct is that it can simulate
transitivity reasoning for the role part, although
transitivity of part cannot be expressed in EL. For
example, it is easy to see that the ontology OAmp
implies that the finger is part of the upper limb,
i.e., FingerE OAmp ∃part.ULimbE . As a side-effect,
the SEP-triplet encoding can also be used to
simulate so-called right-identity rules [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which allow
to inherit properties along the part role. Consider
the following subsumption relationships that hold
in our example ontology:
      </p>
      <sec id="sec-1-1">
        <title>AmpOfFinger</title>
      </sec>
      <sec id="sec-1-2">
        <title>InjToFinger</title>
        <sec id="sec-1-2-1">
          <title>OAmp</title>
        </sec>
        <sec id="sec-1-2-2">
          <title>OAmp</title>
        </sec>
      </sec>
      <sec id="sec-1-3">
        <title>AmpOfHand,</title>
      </sec>
      <sec id="sec-1-4">
        <title>InjToHand.</title>
        <p>
          (1)
(2)
While subsumption (2) actually makes sense (it
is sensible to say that an injury to the finger is
an injury to the hand), subsumption (1) is clearly
undesirable. Subsumption (1) is an example of a
false positive subsumption, which does indeed
occur in Snomed ct. It has been argued [
          <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
          ] that
this subsumption is due to a faulty SEP-triplet
encoding. In fact, using the entity concepts instead
of the structure concepts in the axioms α1 and α2
would have avoided this problem.
        </p>
        <p>
          In E L+, one could actually dispense with the
SEPtriplet encoding altogether since both
transitivity and right-identity rules can be expressed using
RIs. For example, part◦part part expresses
transitivity of the role part. An alternative and direct
representation of anatomical concepts, as well as
referring concepts like clinical findings and
procedures, based on this additional expressive power of
the DL E L+ is proposed in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The new modeling
is succinct and also avoids the above false positive
subsumption (1).
        </p>
        <p>For a small ontology like OAmp, it is not hard to
do the subsumption reasoning manually, and thus
also to find the axioms responsible for a given
subsumption relationship by hand. For a very large
ontology like Snomed ct, this manual approach
to pinpointing the responsible axioms is very
timeconsuming, and thus should be automated. First,
we give a formal definition of what automated
pinpointing is actually supposed to compute.
Definition 1 (MinA). Let O be an E L+
ontology, and A, B concept names such that A O B.
The set S ⊆ O is a minimal axiom set (MinA) for
A O B if, and only if, A S B and, for every
S ⊂ S, A S B.</p>
        <p>
          In our example, {α1, α2, α8, α11} is the only MinA
for subsumption (1), whereas {α3, α4, α8, α11} is
the only MinA for subsumption (2). As shown
in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], a given subsumption relationship w.r.t. an
E L+ ontology may have exponentially many
MinAs, and even deciding whether there is a MinA
of cardinality ≤ k is an NP-complete problem. In
contrast, one MinA can always be extracted in
Algorithm 1 Naive linear extraction of a MinA.
function lin-extract-mina(A, B, O)
1: S := O
2: for each axiom α ∈ O do
3: if A S\{α} B then
4: S := S \ {α}
5: return S
Algorithm 2 Logarithmic extraction of a MinA.
function log-extract-mina(A, B, O)
1: return log-extract-mina-r(A, B, ∅, O)
function log-extract-mina-r(A, B, S, O)
1: if |O| = 1 then
2: return O
3: S1, S2 := halve(O)
4: if A S∪S1 B then
5: return log-extract-mina-r(A, B, S, S1)
6: if A S∪S2 B then
7: return log-extract-mina-r(A, B, S, S2)
8: S1 := log-extract-mina-r(A, B, S ∪ S2, S1)
9: S2 := log-extract-mina-r(A, B, S ∪ S1, S2)
10: return S1 ∪ S1
polynomial time. In [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], this was shown using the
simple Algorithm 1, which requires linearly many
(polynomial) subsumption tests. For a large
ontology, however, this naive approach is not
feasible. For example, for Snomed ct it would require
almost half a million subsumption tests for each
MinA extraction.
        </p>
        <p>
          We can do much better by adopting the algorithm
for computing prime implicates described in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]
to our problem. Basically, this algorithm applies
binary search to find a MinA. Instead of taking
out one axiom at a time, it partitions the
ontology into two halves, and checks whether one of
them entails the subsumption. If yes, it
immediately recurses on that half, throwing away half
of the axioms in one step. Otherwise, essential
axioms are in both halves. In this case, the
algorithm recurses on each half, using the other half as
the “support set”. Algorithm 2 describes this
approach in more detail, where the function halve(O)
partitions O into S1 ∪ S2 with ||S1| − |S2|| ≤ 1. It
follows from the results in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] that computing a
MinA S for a given subsumption A O B with
Algorithm 2 requires O ((|S| − 1) + |S|log (|O|/|S|))
subsumption tests. This greatly improves on the
naive algorithm. For instance, computing a MinA
consisting of nine axioms for Snomed ct requires
about one hundred subsumption tests. Though
this is much better than the almost half a million
required by the naive algorithm, it is still not good
enough to compute MinAs on demand (see below).
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Modularization-based axiom</title>
      <p>
        pinpointing in E L+
Instead of applying Algorithm 1 or 2 directly to
the whole ontology O, one can first try to find a
non-minimal (but hopefully small) subset S ⊆ O
with A S B (called nMinA in the following),
and then apply Algorithm 1 or 2 to this subset to
obtain a MinA. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], we have sketched a
modified version of the classification algorithm for E L+
[
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] that extracts such nMinAs. In the
experiments on a version of Galen described in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
Algorithm 1 was then used to minimize these sets.
Whereas the nMinA extraction was fast and
produced quite small sets for Galen, it crashed after
a few hours because of space problems when
applied to Snomed ct.
      </p>
      <p>
        To overcome this problem, we propose an
algorithm for extracting nMinAs that is based on
modularization. In the following, we introduce
only those notions regarding modularization that
are strictly necessary in the context of this
paper. More details regarding the reachability-based
modularization approach from which these notions
are derived, as well as its connection to other work
on modularization, can be found in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Let O be an E L+ ontology, and A a concept name
occurring in O. We say that S ⊆ O is a
subsumption module for A in O whenever A O B if,
and only if, A S B holds for all concept names
B occurring in O. Obviously, if S is a
subsumption module for A in O and A O B, then S is
an nMinA for this subsumption, and Algorithm 1
or 2 can be used to compute a MinA S ⊆ S from
S. Thus, we know that a subsumption module
for A contains a MinA for every valid
subsumption relationship A O B. The reachability-based
modules introduced below satisfy an even stronger
property: they contain all MinAs for all valid
subsumptions.</p>
      <p>Definition 2. Let O be an E L+ ontology and A
a concept name occurring in O. The subsumption
module S for A in O is called strong if the
following holds for all concept names B occurring in
O: if A O B, then every MinA for A O B is a
subset of S.</p>
      <p>
        Obviously, O itself is a strong subsumption
module for every concept name A occurring in O.
The following definition (first introduced in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ])
yields strong subsumption modules that are
usually much smaller than the whole ontology. For
an E L+ entity X—i.e., either a (concept or role)
description, a (concept or role) inclusion axiom, or
an ontology—we write Sig(X) to denote the set of
concept and role names occurring in the entity X.
Definition 3 (Reachability-based modules).
Let O be an E L+ ontology and A a concept name
occurring in O. The set of A-reachable names in
O is the smallest set N of concept and role names
such that
• A belongs to N;
• for all (concept/role) inclusion axioms αL
in O, if Sig(αL) ⊆ N then Sig(αR) ⊆ N.
αR
We call an axiom αL αR A-reachable in O if
every element of Sig(αL) is A-reachable in O. The
reachability-based module for A in O, denoted by
OAreach, consists of all A-reachable axioms from O.
reach is indeed a
In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], it has been shown that OA
subsumption module for A in O. Here, we show
the following stronger results.
      </p>
      <p>Theorem 4. Let O be an E L+ ontology and A a
reach is a strong
subsumpconcept name. Then OA
tion module for A in O.</p>
      <p>
        Proof. The fact that OAreach is a subsumption
module was already shown in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To show that
it is strong, assume that A O B holds, but there
is a MinA S for A O B that is not contained in
OAreach. Thus, there is an axiom α ∈ S \ OAreach.
Let S1 be the subset of S that contains the
Areachable axioms. Note that S1 is a strict subset
of S since α ∈ S1. We claim that A S B implies
A S1 B, which contradicts the assumption that
S is a MinA for A O B.
      </p>
      <p>To show the claim, we assume to the contrary that
A S1 B, i.e., there is a model I1 of S1 such that
AI1 ⊆ BI1 . We modify I1 to I by setting yI := ∅
for all (concept or role) names that are not
Areachable. It is easy to see that AI ⊆ BI . In fact,
we have AI = AI1 (since A is A-reachable), and
BI = BI1 or BI = ∅.</p>
      <p>It remains to show that I is indeed a model of
S, i.e. satisfies all axioms βL βR in S. If
βL contains a name that is not A-reachable, then
(βL)I = ∅, and the axiom is trivially satisfied.
Otherwise, this axiom belongs to S1, and the
definition of A-reachability implies that all names in
βR are A-reachable as well. Consequently, I1 and
I coincide on the names occurring in βL βR.
Since I1 is a model of S1, we thus have (βL)I =
(βL)I1 ⊆ (βR)I1 = (βR)I . ❏
As an immediate consequence of this theorem,
instead of extracting a MinA for A O B from O, it
is sufficient to extract a MinA for A OAreach B from
reach. This is what the function extract-mina in
OA</p>
      <p>Algorithm 3 Modularization-based extraction of
a MinA
function extract-mina(A, B, O)
1: OAreach ← extract-module(O, A)
2: return log-extract-mina(A, B, OAreach)
function second-mina?(A, B, OAreach, S1)
1: for each axiom α ∈ S1 do
2: O ← OAreach\{α}
3: if A B then
4: retuOrn “second MinA exists”
5: return “MinA unique”
function extract-module(O, A)
8: return OA
21:: qOuAeu←e←∅ active-axioms({A})
3: while not empty(queue) do
4: (αL αR) ← fetch(queue)
5: if Sig(αL) ⊆ {A} ∪ Sig(OA) then
6: OA ← OA ∪ {αL αR}
7: queue ← queue ∪</p>
      <p>(active-axioms(Sig(αR)) \ OA)
Algorithm 3 does. Note that, instead of the
logarithmic extraction algorithm (Algorithm 2), we
could also use the linear extraction algorithm
(Algorithm 1). Since reachability-based modules are
usually quite small, it is not a priori clear whether
using the more complicated logarithmic algorithm
really pays off (see the results of our experiments
below). The function second-mina? in Algorithm 3
takes the extracted module and the first MinA as
input, and checks if the subsumption in question
still holds in the absence of one of the axioms in
the MinA. In this case, this subsumption obviously
must have more than one MinA. Note that, for this
function to be correct, we really need to know that
reach is a strong subsumption module.</p>
      <p>
        OA
The function extract-module in Algorithm 3
realizes one way of computing reachability-based
modules. The function call active-axioms used
there yields, for a given set of names, all
axioms that contain at least one of these names in
their left-hand side. It is not hard to show that
the call extract-module(O, A) indeed computes the
reachability-based module for A in O (see [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for
more details). The experiments described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
show that extraction of reachability-based
modules in Snomed ct is usually quite fast, and the
modules obtained this way are quite small. In the
next section, we show that these positive results
extend to the modularization-based extraction of
MinAs.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Experimental Results</title>
      <p>
        We have implemented the three algorithms
described in this paper, using CEL [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to
compute subsumption. Our experiments use the
January/2005 release of the DL version of
Snomed ct, which contains 379,691 concept
names, 62 role names, and 379,704 axioms.3 In
the following, we call this ontology OSnomed. The
experiments were carried out on a PC with 2.40
GHz Pentium-4 processor and 1 GB of memory.
      </p>
      <p>As stand-alone algorithms for computing a MinA,
we applied Algorithm 1 and 2 only to the
false positive subsumption AmpOfFinger OSnomed
AmpOfHand. Algorithm 1 did not terminate on
this input after 24 hours, whereas Algorithm 2
required 26:05 minutes (1,565 seconds) to compute a
MinA of cardinality 6. (Note that the actual
modelling of “amputation of finger” and “amputation
of hand” in Snomed ct differs from the one given
in Fig. 1 due to the use of role groups and of two
different roles to express location in Snomed ct.</p>
      <p>Thus, the computed MinA also differs from the
one given above. However, it also shows that the
reason for the unintended subsumption is the
incorrect use of the SEP-triplet encoding.)
Algorithm 3 performs much better for the
amputation example. The reachability-based
module OASnmopmOefdFinger contains 57 axioms, and was
computed in 0.04 seconds. Extracting a MinA
for AmpOfFinger OASnmopmOefdFinger AmpOfHand from</p>
      <p>Snomed
OAmpOfFinger using the logarithmic minimization
algorithm then took only half a second. An
application of second-mina? then showed that the
extracted MinA is the only one for this subsumption.</p>
      <p>We have also applied Algorithm 3 to a large
number of subsumption relationships that follow from
OSnomed. Since there are more than five million
such subsumptions, testing the algorithm on all
of them was not feasible: assuming an average
extraction time of 1 second, this would have
required 58 days. For this reason, we sampled 0.5%
of all concepts in each top-level category C in
Snomed ct. Let us denote the set of samples for
category C by c-samples(C). For each sampled
concept A, all positive subsumptions A OSnomed B
with A as subsumee were considered.</p>
      <p>The first column of Table 2 shows the top-level
categories and the second the number of sampled
subsumption relationships with the subsumee in
this category. The next four columns give the
time needed to compute and the size of the
corresponding modules and MinAs. The values in
square brackets give the time required by the</p>
      <p>3The DL version is also known in the SNOMED
lingo as the ‘stated form,’ while axioms here boil down
to (primitive) concept definitions.
nA 100 .416 .300 .576 100 100 .012 .501 .1 0 8 9 6 5 0 8 0 2 6
5 .0 .93 .26 .97 .47 01 .15 10 .28 .82
6 6 9</p>
      <p>B / 1 3 1
inA edom )x .16 / / / 2 2 8 / / 3 / /
MSnA a 3 .54 .00 .10 .83 .23 .36 .72 .68 .82 .18 .65 .267 .359 .52 .55 .2 .17 .11
5 9 4 4 3 4 8 4 4 7
7 8 4 3 8 5 8 3 0 5 1 6 7 9 4 3 7 2 6 f
/ / / 1 1 / 1 3 / / / 1 / 1 3 o
/ / / r
8 e
b
.
s
b h i
u it M
S w e
% n</p>
      <p>o
f O /m
o g
e v
z A a
iS ro (</p>
      <p>f
m o (
i
T m
.s le
b p
Su am
s
] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ;
8 4 4 7 9 6 4 6 1 7 4 8 2 8 4 1 8 1 4 s
e fo / / / / / / / / / / / / / / / / / / / t
m
i
T
) 8 6 3 8 9 6 2 1 0 7 2 6 1 5 5 7 8 2 6 n
ifzeoS eSndomAO(ag/avxm .512/ .4076/7 .1507/14 .014/10 .765/ .464/ .8261/ .31032/ .13416/ .070/ .3591/ .1987/14 .6681/ .10111/ .050/ .7746/12 .560/ .4176/3 .3521/14 itgonnnS
i
o
to le /</p>
      <p>g
u v .
e d a 0 &lt;
&lt;
&lt; &lt; &lt;
trceagoyeSndoCm ()l-ssceapmCA∈ itttrbeuA ttsrrceydouuB lliiiicgadnnnFC tttt-eeecegexpddann litsrgegceoaphan&amp;m tsvenE littsrvbeeebyanO isrganOm lliiitcbgceca/ooau lifsrcecyahoP lijtscbceyahoP rrceeoduP llireveaauufiQ littccexaooSn litcceceaoppnS iecepSnm litssggceaadannS tsbecauSn illreavnecSndom iil:lrrscee2apuEm
s
t lt
O b
a
T
25000
20000
21377
modularization-based pinpointing algorithm, but
with the naive linear minimization algorithm
instead of the logarithmic one. In all four columns,
we give both average and maximum values. The
last column shows the percentage of
subsumptions that have only one MinA. Interestingly, more
than two thirds of all subsumptions have only
one MinA. The overall empirical results for the
27,477 sampled subsumptions (about 0.5% of all
subsumptions) are given in the last row of the
table. These results show that, on average, a MinA
can be computed within one second, and its size is
smaller than 10. Thus, MinAs can indeed be
computed on demand, and their size is small enough
such that they can then be inspected by hand.</p>
      <p>Surprisingly, the linear minimization algorithm
performed better in our experiments than the
logarithmic one. An explanation for this is probably
that, unlike the experiments of Algorithm 1 and 2
on the whole ontology, the modules are already
quite small, and thus the overhead required by the
logarithmic algorithm does not pay off. Figure 2
depicts the size distribution of our sampled
modules and MinAs. As easily visible from the chart,
the modules are quite small, but the MinAs are
even smaller. In fact, the majority of all
subsumptions (78%) have a MinA of size ten or less.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>We have introduced a new method for axiom
pinpointing in the DL E L+ that is based on the
computation of reachability-based modules. The
experiments carried out on Snomed ct show that
this method is fast enough to extract a minimal
axiom set (MinA) for a given subsumption on
demand. In addition, the extracted MinAs are
usually quite small and can therefore be inspected by
users and designers of Snomed ct by hand. In the
future, we will extend the approach such that it
can (i) extract all MinAs, (ii) provide natural
language explanations for subsumption, and (iii) give
suggestions for how to revise the ontology to get
rid of an unwanted subsumption.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>The first author was partially supported by NICTA,
Canberra Research Lab, and the second by the
German Research Foundation (DFG) under grant BA
1122/11-1.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-</surname>
          </string-name>
          Schneider, editors.
          <source>The Description Logic Handbook: Theory</source>
          , Implementation, and
          <string-name>
            <surname>Applications</surname>
          </string-name>
          . Cambridge University Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. F. P.-Schneider</surname>
            , and
            <given-names>F. van Harmelen. From SHIQ</given-names>
          </string-name>
          and
          <article-title>RDF to OWL: The making of a web ontology language</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <fpage>7</fpage>
          -
          <lpage>26</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brandt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          .
          <article-title>Pushing the EL envelope</article-title>
          .
          <source>In Proc. IJCAI</source>
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Suntisrivaraporn</surname>
          </string-name>
          .
          <article-title>CEL-a polynomial-time reasoner for life science ontologies</article-title>
          .
          <source>In Proc. IJCAR</source>
          <year>2006</year>
          , Springer LNAI 4130,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Suntisrivaraporn</surname>
          </string-name>
          .
          <article-title>Module extraction and incremental classification: A pragmatic approach for EL+ ontologies</article-title>
          .
          <source>In Proc. ESWC</source>
          <year>2008</year>
          ,
          <string-name>
            <surname>Springer</surname>
            <given-names>LNCS</given-names>
          </string-name>
          ,
          <year>2008</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Suntisrivaraporn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schulz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Spackman</surname>
          </string-name>
          .
          <article-title>Replacing SEP-triplets in SNOMED CT using tractable description logic operators</article-title>
          .
          <source>In Proc. AIME</source>
          <year>2007</year>
          , Springer LNCS 4594,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>U.</given-names>
            <surname>Hahn S. Schulz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mark</surname>
          </string-name>
          .
          <article-title>Spatial location and its relevance for terminological inferences in bio-ontologies</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schlobach</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Cornet</surname>
          </string-name>
          .
          <article-title>Non-standard reasoning services for the debugging of description logic terminologies</article-title>
          .
          <source>In Proc. IJCAI</source>
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          , E. Sirin,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalyanpur</surname>
          </string-name>
          .
          <article-title>Debugging OWL ontologies</article-title>
          .
          <source>In Proc. WWW</source>
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] T. Meyer,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Booth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          .
          <article-title>Finding maximally satisfiable terminologies for the description logic ALC</article-title>
          .
          <source>In Proc. (AAAI</source>
          <year>2006</year>
          ). AAAI Press/The MIT Press,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          , R. Pen˜aloza, and
          <string-name>
            <given-names>B.</given-names>
            <surname>Suntisrivaraporn</surname>
          </string-name>
          .
          <article-title>Pinpointing in the description logic EL+</article-title>
          .
          <source>In Proc. KI</source>
          <year>2007</year>
          , Springer LNAI 4667,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schulz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Romacker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Hahn</surname>
          </string-name>
          .
          <article-title>Partwhole reasoning in medical ontologies revisited: Introducing SEP triplets into classification-based description logics</article-title>
          .
          <source>JAMIA</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.A.</given-names>
            <surname>Spackman</surname>
          </string-name>
          .
          <article-title>Managing clinical terminology hierarchies using algorithmic calculation of subsumption: Experience with SNOMED-RT</article-title>
          . JAMIA,
          <year>2000</year>
          . Fall Symposium Special Issue.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Bradley</surname>
          </string-name>
          and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Manna</surname>
          </string-name>
          .
          <article-title>Checking safety by inductive generalization of counterexamples to induction</article-title>
          .
          <source>In Proc. FMCAD</source>
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>