Representing and sharing knowledge using SNOMED
Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008)
R. Cornet, K.A. Spackman (Eds)


                                    Debugging Snomed ct
                      Using Axiom Pinpointing in the Description Logic EL+


                                    Franz Baader and Boontawee Suntisrivaraporn
                                  Institute for Theoretical Computer Science, TU Dresden, Germany


                               Abstract                                   the subsumption (subconcept–superconcept) rela-
          Snomed ct is a large-scale medical ontology,                    tionships between the concepts deﬁned in the on-
          which is developed using a variant of the inexpres-             tology. The advantage of using an inexpressive DL
          sive Description Logic EL. Description Logic rea-               of the EL family is that classiﬁcation is tractable,
          soning can not only be used to compute subsump-                 i.e., EL reasoners such as CEL [4] can compute
          tion relationships between Snomed concepts, but                 the subsumption hierarchy of a given ontology in
          also to pinpoint the reason why a certain subsump-              polynomial time.
          tion relationship holds by computing the axioms
          responsible for this relationship. This helps devel-            Similar to writing large programs, building large-
          opers and users of Snomed ct to understand why                  scale ontologies is an error-prone endeavor. Clas-
          a given subsumption relationship follows from the               siﬁcation can help to alert the developer or user
          ontology, which can be seen as a ﬁrst step toward               of an ontology to the existence of errors. For
          removing unwanted subsumption relationships.                    example, the subsumption relationship between
          In this paper, we describe a new method for axiom               “amputation of ﬁnger” and “amputation of up-
          pinpointing in the Description Logic EL+ , which is             per limb” in Snomed ct is clearly unintended
          based on the computation of so-called reachability-             [6, 7], and thus reveals a modeling error. How-
          based modules. Our experiments on Snomed ct                     ever, given an unintended subsumption relation-
          show that the sets of axioms explaining subsump-                ship in a large ontology like Snomed ct with al-
          tion are usually quite small, and that our method               most four hundred thousand axioms, it is not al-
          is fast enough to compute such sets on demand.                  ways easy to ﬁnd the erroneous axioms responsible
                                                                          for it by hand. To overcome this problem, the DL
                              Introduction                                community has recently invested quite some work
                                                                          on automating this process. Given a subsump-
          Description Logics (DLs) [1] are a family of logic-             tion relationship or another questionable conse-
          based knowledge representation formalisms, which                quence, axiom pinpointing computes a minimal
          can be used to develop ontologies in a formally                 subset (all minimal subsets) of the ontology that
          well-founded way. This is true both for expres-                 have this consequence (called MinAs in the fol-
          sive DLs, which are the logical basis of the Web                lowing). Most of the work on axiom pinpointing
          Ontology Language OWL [2], and for inexpres-                    in DLs was concerned with rather expressive DLs
          sive DLs of the EL family [3], which are used in                (see, e.g., [8, 9, 10]). The only work that concen-
          the design of large-scale medical ontologies such as            trated on pinpointing in the EL family of DLs was
          Snomed ct1 and the National Cancer Institute’s                  [11]. In addition to providing complexity results
          ontology.2                                                      for pinpointing, [11] introduces a “pragmatic” al-
          One of the main advantages of employing a logic-                gorithm for computing one MinA, which is based
          based ontology language is that reasoning services              on a modiﬁed version of the classiﬁcation algo-
          can be used to derive implicit knowledge from the               rithm used by the CEL reasoner [4]. Though this
          one explicitly represented. DL systems can, for ex-             approach worked quite well for mid-size ontologies
          ample, classify a given ontology, i.e., compute all             (see the experiments on a variant of the Galen
             1
                 http://www.ihtsdo.org/our-standards/                     medical ontology described in [11]), it was not ef-
             2
                 http://www.nci.nih.gov/cancerinfo/terminologyresources   ﬁcient enough to deal with large-scale ontologies
                                                                          like Snomed ct.


                                                                          1
Representing and sharing knowledge using SNOMED
Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008)
R. Cornet, K.A. Spackman (Eds)


          In the present paper, we describe a new method
          for axiom pinpointing in the Description Logic               α1    AmpOfFinger     ≡   Amp  ∃site.FingerS
          EL+ , which is based on the computation of so-               α2     AmpOfHand      ≡   Amp  ∃site.HandS
          called reachability-based modules [5]. Our experi-           α3      InjToFinger   ≡   Inj  ∃site.FingerS
          ments on Snomed ct show that the sets of axioms              α4       InjToHand    ≡   Inj  ∃site.HandS
                                                                       α5          FingerE      FingerS
          explaining a given subsumption are usually quite
                                                                       α6          FingerP      FingerS  ∃part.FingerE
          small (78% of the MinAs we computed were of size             α7           HandE       HandS
          ten or less), and that our method is fast enough             α8           HandP       HandS  ∃part.HandE
          (on average, it took one second to obtain a MinA)            α9         ULimbE        ULimbS
          to compute these sets on demand, i.e., whenever              α10         ULimbP       ULimbS  ∃part.ULimbE
          the user asks for a MinA for a suspect subsump-              α11         FingerS      HandP
          tion relationship.                                           α12          HandS       ULimbP

                  Axiom pinpointing in EL+                         Figure 1: Ontology OAmp illustrating a faulty
          In this section, we ﬁrst introduce the DL EL+ ,          SEP-triplet encoding in Snomed ct.
          which is an extension of the DL EL used to deﬁne
          Snomed ct. Then, we deﬁne minimal axiom sets
          (MinAs) for subsumption, and recall some of the          to a subset AI of ΔI and each role name r ∈ RN
          known results about computing MinAs in EL+ .             to a binary relation rI on ΔI . The extension of ·I
                                                                   to arbitrary concept descriptions is inductively de-
            Syntax                 Semantics                       ﬁned, as shown in the semantics column of Table 1.
                                                                   An interpretation I is a model of an ontology O if,
                                  ΔI                              for each inclusion axiom in O, the conditions given
            C D                   C I ∩ DI                        in the semantics column of Table 1 are satisﬁed.
            ∃r.C                   {x ∈ ΔI | ∃y ∈ ΔI :             The main reasoning problem in EL+ is the sub-
                                        (x, y) ∈ rI ∧ y ∈ C I }    sumption problem: given an EL+ ontology O and
                                                                   two EL+ concept descriptions C, D, check whether
            CD                    C I ⊆ DI                        C is subsumed by D w.r.t. O (written C O D),
            r 1 ◦ · · · ◦ rn  s   r1I ◦ · · · ◦ rnI ⊆ sI          i.e., whether C I ⊆ DI holds in all models of O.
                                                                   The computation of all subsumption relationships
                Table 1: Syntax and semantics of EL+ .             between the concept names occurring in O is called
                                                                   classiﬁcation of O.
          Starting with a set of concept names CN and a            Figure 1 shows a small EL ontology deﬁning con-
          set of role names RN, EL+ concept descriptions           cepts related to amputation/injury of hand and
          can be built using the constructors shown in the         ﬁnger. It uses the so-called SEP-triplet encoding
          upper part of Table 1, i.e., every concept name          [12], in which anatomical concepts (like hand) are
          A ∈ CN and the top concept  are EL+ con-                represented by three concepts: the structure con-
          cept descriptions, and if C, D are EL+ concept           cept (e.g, HandS , which stands for the hand and
          descriptions and r ∈ RN is a role name, then             its proper parts), the part concept (e.g, HandP ,
          C  D (conjunction) and ∃r.C (existential restric-       which stands for the proper parts of the hand),
          tion) are EL+ concept descriptions. Role chains          and the entity concept (e.g, HandE , which stands
          of the form r1 ◦ · · · ◦ rn for n ≥ 0 are called role    for the entire hand). The axioms α5 –α10 consti-
          descriptions. An EL+ ontology is a ﬁnite set of          tute a completed SEP-triplet encoding for ﬁnger,
          axioms of the form shown in the lower part of Ta-        hand, and upper limb. For example, axiom α8 says
          ble 1, where axioms of the form C  D are called         that proper parts of the hand belong to the struc-
          general concept inclusions (GCIs) and of the form        ture concept HandS , and they are parts of hand
          r1 ◦ · · · ◦ rn  s role inclusions (RIs). An EL on-     (i.e., linked with the role part to the entity con-
          tology is an EL+ ontology that does not contain          cept HandE ). Given this encoding, the fact that
          RIs. We use C ≡ D as an abbreviation for the two         the ﬁnger is part of the hand can be expressed us-
          GCIs C  D, D  C.                                       ing axiom α11 . The main reason for using this
          The semantics of EL+ is deﬁned in terms of inter-        encoding in Snomed ct is that it can simulate
          pretations I = (ΔI , ·I ), where the domain ΔI is a      transitivity reasoning for the role part, although
          non-empty set of individuals, and the interpreta-        transitivity of part cannot be expressed in EL. For
          tion function ·I maps each concept name A ∈ CN           example, it is easy to see that the ontology OAmp


                                                                   2
Representing and sharing knowledge using SNOMED
Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008)
R. Cornet, K.A. Spackman (Eds)


          implies that the ﬁnger is part of the upper limb,        Algorithm 1 Naive linear extraction of a MinA.
          i.e., FingerE OAmp ∃part.ULimbE . As a side-eﬀect,      function lin-extract-mina(A, B, O)
          the SEP-triplet encoding can also be used to simu-        1: S := O
          late so-called right-identity rules [13], which allow     2: for each axiom α ∈ O do
                                                                    3:   if A S\{α} B then
          to inherit properties along the part role. Consider       4:      S := S \ {α}
          the following subsumption relationships that hold         5: return S
          in our example ontology:

                 AmpOfFinger     OAmp    AmpOfHand,        (1)    Algorithm 2 Logarithmic extraction of a MinA.
                   InjToFinger   OAmp    InjToHand.        (2)    function log-extract-mina(A, B, O)
                                                                    1: return log-extract-mina-r(A, B, ∅, O)
          While subsumption (2) actually makes sense (it           function log-extract-mina-r(A, B, S, O)
          is sensible to say that an injury to the ﬁnger is         1: if |O| = 1 then
          an injury to the hand), subsumption (1) is clearly        2:    return O
                                                                    3: S1 , S2 := halve(O)
          undesirable. Subsumption (1) is an example of a           4: if A S∪S1 B then
          false positive subsumption, which does indeed oc-         5:    return log-extract-mina-r(A, B, S, S1 )
          cur in Snomed ct. It has been argued [6, 7] that          6: if A S∪S2 B then
          this subsumption is due to a faulty SEP-triplet en-       7:    return log-extract-mina-r(A, B, S, S2 )
          coding. In fact, using the entity concepts instead        8: S1 := log-extract-mina-r(A, B, S ∪ S2 , S1 )
                                                                    9: S2 := log-extract-mina-r(A, B, S ∪ S1 , S2 )
          of the structure concepts in the axioms α1 and α2        10: return S1 ∪ S1
          would have avoided this problem.
          In EL+ , one could actually dispense with the SEP-
          triplet encoding altogether since both transitiv-
          ity and right-identity rules can be expressed using      polynomial time. In [11], this was shown using the
          RIs. For example, part◦part  part expresses tran-       simple Algorithm 1, which requires linearly many
          sitivity of the role part. An alternative and direct     (polynomial) subsumption tests. For a large on-
          representation of anatomical concepts, as well as        tology, however, this naive approach is not feasi-
          referring concepts like clinical ﬁndings and proce-      ble. For example, for Snomed ct it would require
          dures, based on this additional expressive power of      almost half a million subsumption tests for each
          the DL EL+ is proposed in [6]. The new modeling          MinA extraction.
          is succinct and also avoids the above false positive     We can do much better by adopting the algorithm
          subsumption (1).                                         for computing prime implicates described in [14]
          For a small ontology like OAmp , it is not hard to       to our problem. Basically, this algorithm applies
          do the subsumption reasoning manually, and thus          binary search to ﬁnd a MinA. Instead of taking
          also to ﬁnd the axioms responsible for a given sub-      out one axiom at a time, it partitions the ontol-
          sumption relationship by hand. For a very large          ogy into two halves, and checks whether one of
          ontology like Snomed ct, this manual approach            them entails the subsumption. If yes, it imme-
          to pinpointing the responsible axioms is very time-      diately recurses on that half, throwing away half
          consuming, and thus should be automated. First,          of the axioms in one step. Otherwise, essential
          we give a formal deﬁnition of what automated pin-        axioms are in both halves. In this case, the algo-
          pointing is actually supposed to compute.                rithm recurses on each half, using the other half as
          Deﬁnition 1 (MinA). Let O be an EL+ ontol-               the “support set”. Algorithm 2 describes this ap-
          ogy, and A, B concept names such that A O B.            proach in more detail, where the function halve(O)
          The set S ⊆ O is a minimal axiom set (MinA) for          partitions O into S1 ∪ S2 with ||S1 | − |S2 || ≤ 1. It
          A O B if, and only if, A S B and, for every            follows from the results in [14] that computing a
          S  ⊂ S, A S  B.                                  3   MinA S for a given subsumption A O B with Al-
          In our example, {α1 , α2 , α8 , α11 } is the only MinA   gorithm 2 requires O ((|S| − 1) + |S|log(|O|/|S|))
          for subsumption (1), whereas {α3 , α4 , α8 , α11 } is    subsumption tests. This greatly improves on the
          the only MinA for subsumption (2). As shown              naive algorithm. For instance, computing a MinA
          in [11], a given subsumption relationship w.r.t. an      consisting of nine axioms for Snomed ct requires
          EL+ ontology may have exponentially many Mi-             about one hundred subsumption tests. Though
          nAs, and even deciding whether there is a MinA           this is much better than the almost half a million
          of cardinality ≤ k is an NP-complete problem. In         required by the naive algorithm, it is still not good
          contrast, one MinA can always be extracted in            enough to compute MinAs on demand (see below).


                                                                   3
Representing and sharing knowledge using SNOMED
Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008)
R. Cornet, K.A. Spackman (Eds)


                Modularization-based axiom                         concept and role names occurring in the entity X.
                   pinpointing in EL+                              Deﬁnition 3 (Reachability-based modules).
                                                                   Let O be an EL+ ontology and A a concept name
          Instead of applying Algorithm 1 or 2 directly to
                                                                   occurring in O. The set of A-reachable names in
          the whole ontology O, one can ﬁrst try to ﬁnd a
                                                                   O is the smallest set N of concept and role names
          non-minimal (but hopefully small) subset S ⊆ O
                                                                   such that
          with A S B (called nMinA in the following),
          and then apply Algorithm 1 or 2 to this subset to        • A belongs to N;
          obtain a MinA. In [11], we have sketched a modi-
          ﬁed version of the classiﬁcation algorithm for EL+       • for all (concept/role) inclusion axioms αL  αR
          [3, 4] that extracts such nMinAs. In the exper-            in O, if Sig(αL ) ⊆ N then Sig(αR ) ⊆ N.
          iments on a version of Galen described in [11],
                                                                   We call an axiom αL  αR A-reachable in O if
          Algorithm 1 was then used to minimize these sets.
                                                                   every element of Sig(αL ) is A-reachable in O. The
          Whereas the nMinA extraction was fast and pro-
                                                                   reachability-based module for A in O, denoted by
          duced quite small sets for Galen, it crashed after          reach
                                                                   OA       , consists of all A-reachable axioms from O.
          a few hours because of space problems when ap-
                                                                                                                             3
          plied to Snomed ct.
                                                                                                            reach
          To overcome this problem, we propose an algo-            In [5], it has been shown that OA              is indeed a
          rithm for extracting nMinAs that is based on             subsumption module for A in O. Here, we show
          modularization. In the following, we introduce           the following stronger results.
          only those notions regarding modularization that         Theorem 4. Let O be an EL+ ontology and A a
                                                                                                reach
          are strictly necessary in the context of this pa-        concept name. Then OA              is a strong subsump-
          per. More details regarding the reachability-based       tion module for A in O.
                                                                                                    reach
          modularization approach from which these notions         Proof. The fact that OA                is a subsumption
          are derived, as well as its connection to other work     module was already shown in [5]. To show that
          on modularization, can be found in [5].                  it is strong, assume that A O B holds, but there
          Let O be an EL+ ontology, and A a concept name           is a MinA S for A O B that is not contained in
                                                                      reach                                              reach
          occurring in O. We say that S ⊆ O is a sub-              OA       . Thus, there is an axiom α ∈ S \ OA               .
          sumption module for A in O whenever A O B if,           Let S1 be the subset of S that contains the A-
          and only if, A S B holds for all concept names          reachable axioms. Note that S1 is a strict subset
          B occurring in O. Obviously, if S is a subsump-          of S since α ∈ S1 . We claim that A S B implies
          tion module for A in O and A O B, then S is             A S1 B, which contradicts the assumption that
          an nMinA for this subsumption, and Algorithm 1           S is a MinA for A O B.
          or 2 can be used to compute a MinA S  ⊆ S from          To show the claim, we assume to the contrary that
          S. Thus, we know that a subsumption module               A S1 B, i.e., there is a model I1 of S1 such that
          for A contains a MinA for every valid subsump-           AI1 ⊆ B I1 . We modify I1 to I by setting y I := ∅
          tion relationship A O B. The reachability-based         for all (concept or role) names that are not A-
          modules introduced below satisfy an even stronger        reachable. It is easy to see that AI ⊆ B I . In fact,
          property: they contain all MinAs for all valid sub-      we have AI = AI1 (since A is A-reachable), and
          sumptions.                                               B I = B I1 or B I = ∅.
          Deﬁnition 2. Let O be an EL+ ontology and A              It remains to show that I is indeed a model of
          a concept name occurring in O. The subsumption           S, i.e. satisﬁes all axioms βL  βR in S. If
          module S for A in O is called strong if the fol-         βL contains a name that is not A-reachable, then
          lowing holds for all concept names B occurring in        (βL )I = ∅, and the axiom is trivially satisﬁed.
          O: if A O B, then every MinA for A O B is a            Otherwise, this axiom belongs to S1 , and the def-
          subset of S.                                   3         inition of A-reachability implies that all names in
                                                                   βR are A-reachable as well. Consequently, I1 and
          Obviously, O itself is a strong subsumption mod-         I coincide on the names occurring in βL  βR .
          ule for every concept name A occurring in O.             Since I1 is a model of S1 , we thus have (βL )I =
          The following deﬁnition (ﬁrst introduced in [5])         (βL )I1 ⊆ (βR )I1 = (βR )I .                              ❏
          yields strong subsumption modules that are usu-
          ally much smaller than the whole ontology. For           As an immediate consequence of this theorem, in-
          an EL+ entity X—i.e., either a (concept or role)         stead of extracting a MinA for A O B from O, it
          description, a (concept or role) inclusion axiom, or     is suﬃcient to extract a MinA for A OAreach B from
                                                                      reach
          an ontology—we write Sig(X) to denote the set of         OA       . This is what the function extract-mina in


                                                                   4
Representing and sharing knowledge using SNOMED
Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008)
R. Cornet, K.A. Spackman (Eds)


          Algorithm 3 Modularization-based extraction of           pute subsumption. Our experiments use the
          a MinA                                                   January/2005 release of the DL version of
          function extract-mina(A, B, O)                           Snomed ct, which contains 379,691 concept
                reach
           1: OA      ← extract-module(O, A)                       names, 62 role names, and 379,704 axioms.3 In
                                                 reach
           2: return log-extract-mina(A, B, OA         )           the following, we call this ontology OSnomed . The
                                           reach
          function second-mina?(A, B, OA , S1 )                    experiments were carried out on a PC with 2.40
           1: for each axiom α ∈ S1 do                             GHz Pentium-4 processor and 1 GB of memory.
           2:    O  ← OAreach
                               \{α}                                As stand-alone algorithms for computing a MinA,
           3:    if A O B then
           4:       return “second MinA exists”                    we applied Algorithm 1 and 2 only to the
           5: return “MinA unique”                                 false positive subsumption AmpOfFinger OSnomed
          function extract-module(O, A)                            AmpOfHand. Algorithm 1 did not terminate on
           1: OA ← ∅                                               this input after 24 hours, whereas Algorithm 2 re-
           2: queue ← active-axioms({A})
           3: while not empty(queue) do                            quired 26:05 minutes (1,565 seconds) to compute a
           4:    (αL  αR ) ← fetch(queue)                         MinA of cardinality 6. (Note that the actual mod-
           5:    if Sig(αL ) ⊆ {A} ∪ Sig(OA ) then                 elling of “amputation of ﬁnger” and “amputation
           6:       OA ← OA ∪ {αL  αR }                           of hand” in Snomed ct diﬀers from the one given
           7:       queue ← queue ∪
                             (active-axioms(Sig(αR )) \ OA )       in Fig. 1 due to the use of role groups and of two
           8: return OA                                            diﬀerent roles to express location in Snomed ct.
                                                                   Thus, the computed MinA also diﬀers from the
                                                                   one given above. However, it also shows that the
          Algorithm 3 does. Note that, instead of the log-         reason for the unintended subsumption is the in-
          arithmic extraction algorithm (Algorithm 2), we          correct use of the SEP-triplet encoding.)
          could also use the linear extraction algorithm (Al-      Algorithm 3 performs much better for the am-
          gorithm 1). Since reachability-based modules are         putation example. The reachability-based mod-
          usually quite small, it is not a priori clear whether    ule OAmpOfFinger
                                                                          Snomed
                                                                                      contains 57 axioms, and was
          using the more complicated logarithmic algorithm         computed in 0.04 seconds. Extracting a MinA
          really pays oﬀ (see the results of our experiments       for AmpOfFinger OAmpOfFinger
                                                                                           Snomed   AmpOfHand from
          below). The function second-mina? in Algorithm 3         OAmpOfFinger
                                                                     Snomed
                                                                                using the logarithmic minimization al-
          takes the extracted module and the ﬁrst MinA as          gorithm then took only half a second. An appli-
          input, and checks if the subsumption in question         cation of second-mina? then showed that the ex-
          still holds in the absence of one of the axioms in       tracted MinA is the only one for this subsumption.
          the MinA. In this case, this subsumption obviously       We have also applied Algorithm 3 to a large num-
          must have more than one MinA. Note that, for this        ber of subsumption relationships that follow from
          function to be correct, we really need to know that      OSnomed . Since there are more than ﬁve million
            reach
          OA      is a strong subsumption module.                  such subsumptions, testing the algorithm on all
          The function extract-module in Algorithm 3 re-           of them was not feasible: assuming an average
          alizes one way of computing reachability-based           extraction time of 1 second, this would have re-
          modules. The function call active-axioms used            quired 58 days. For this reason, we sampled 0.5%
          there yields, for a given set of names, all ax-          of all concepts in each top-level category C in
          ioms that contain at least one of these names in         Snomed ct. Let us denote the set of samples for
          their left-hand side. It is not hard to show that        category C by c-samples(C). For each sampled
          the call extract-module(O, A) indeed computes the        concept A, all positive subsumptions A OSnomed B
          reachability-based module for A in O (see [5] for        with A as subsumee were considered.
          more details). The experiments described in [5]
                                                                   The ﬁrst column of Table 2 shows the top-level
          show that extraction of reachability-based mod-
                                                                   categories and the second the number of sampled
          ules in Snomed ct is usually quite fast, and the
                                                                   subsumption relationships with the subsumee in
          modules obtained this way are quite small. In the
                                                                   this category. The next four columns give the
          next section, we show that these positive results
                                                                   time needed to compute and the size of the cor-
          extend to the modularization-based extraction of
                                                                   responding modules and MinAs. The values in
          MinAs.
                                                                   square brackets give the time required by the
                     Experimental Results                              3
                                                                       The DL version is also known in the SNOMED
          We have implemented the three algorithms de-             lingo as the ‘stated form,’ while axioms here boil down
          scribed in this paper, using CEL [4] to com-             to (primitive) concept deﬁnitions.


                                                                   5
                                                                                                                                                       R. Cornet, K.A. Spackman (Eds)


                                                       Time to extract         Size of      Time to extract MinA        Size of MinA        %Subs.
            Snomed category C                Subs.              Snomed         Snomed
                                                       module OA              OA                for A OASnomed B      for A OASnomed B     with
             A ∈ c-samples(C)               samples
                                                         (avg/max)          (avg/max)              (avg/max)              (avg/max)        one MinA
                 Attribute                        25    < 0.01 / < 0.01           5.12/8   0.05 / 0.09 [0.15 / 0.18]             3.16/7          100
              Body structure                   4 738      < 0.01 / 0.01      40.76 / 76    0.41 / 4.19 [0.63 / 2.24]          5.54 / 18        64.16
             Clinical Finding                 11 112         0.03 / 3.97   71.50 / 143     1.66 / 9.58 [1.15 / 5.04]          9.00 / 34        63.00
        Context-dependent categories             208         0.01 / 0.03     0.14 / 108    0.37 / 1.43 [0.63 / 1.77]          4.10 / 13        95.67
                                                                                                                                                       Representing and sharing knowledge using SNOMED


    Environments & geographical locations         51    < 0.01 / < 0.01         7.65 / 9   0.07 / 0.12 [0.17 / 0.19]           3.82 / 8          100
                   Events                         28    < 0.01 / < 0.01         4.64 / 6   0.04 / 0.08 [0.13 / 0.16]           2.32 / 5          100
             Observable entity                   253    < 0.01 / < 0.01       8.26 / 12    0.08 / 0.18 [0.18 / 0.24]           3.68 / 8        90.12
                 Organism                      1 429      < 0.01 / 0.01      13.03 / 21    0.09 / 0.20 [0.25 / 0.36]          4.72 / 13        65.01
      Pharmaceutical/biologic product          1 233      < 0.01 / 0.01      31.41 / 60    0.16 / 0.47 [0.50 / 0.91]          3.68 / 10        81.51
               Physical force                      6    < 0.01 / < 0.01         7.00 / 7   0.38 / 0.58 [0.16 / 0.17]           2.83 / 5        50.00
              Physical object                    166                          9.35 / 12    0.09 / 0.19 [0.19 / 0.24]          4.18 / 11        93.98


6
                                                        < 0.01 / < 0.01
                 Procedure                     5 183         0.02 / 0.05   71.89 / 146     0.62 / 5.21 [0.15 / 4.38]          8.65 / 36        66.29
              Qualiﬁer value                     216      < 0.01 / 0.01       6.68 / 11    0.06 / 0.13 [0.16 / 0.22]           2.67 / 7        87.96
               Social context                    204      < 0.01 / 0.01      10.11 / 15    0.18 / 0.47 [0.21 / 0.28]           3.59 / 9        77.45
              Special concept                  1 272      < 0.01 / 0.01         5.00 / 5   0.05 / 0.09 [0.14 / 0.14]            2.5 / 4          100
                 Specimen                         38         0.01 / 0.02   67.74 / 127     0.19 / 0.59 [1.06 / 2.11]          4.55 / 13        81.58
            Staging and scales                    20    < 0.01 / < 0.01         5.60 / 8   0.04 / 0.08 [0.14 / 0.18]            2.8 / 7          100
                 Substance                     1 295      < 0.01 / 0.01      14.76 / 32    0.16 / 0.41 [0.19 / 0.51]          4.17 / 12        72.82
           Overall in Snomed ct               27 477         0.02 / 3.97   53.21 / 146     1.03 / 9.58 [0.67 / 5.04]          7.11 / 36        68.26

        Table 2: Empirical results of the modularization-based axiom pinpointing on Snomed ct (time in seconds; size in number of axioms).
                                                                                                                                                       Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008)
Representing and sharing knowledge using SNOMED
Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008)
R. Cornet, K.A. Spackman (Eds)


               25000                                                                Acknowledgements
                              21377          #Modules               The ﬁrst author was partially supported by NICTA,
                20000
                                             #MinA                  Canberra Research Lab, and the second by the Ger-
                 15000                                              man Research Foundation (DFG) under grant BA
                  10000
                                                                    1122/11-1.
                       5000

                          0
                                                   Size (#axioms)            Address for Correspondence
                                                                          Franz Baader and Boontawee Suntisrivaraporn
                                  11 0
                                  21 0
                                 31 0
                                 41 0
                                        1


                                51 0
                                      -2


                                61 0
                                      -3


                               71 0
                                     -4


                                                                           TU Dresden, Theoretical Computer Science,
                                     1-


                               81 0
                                     -5


                             91 90
                                    -6


                           10 00
                                    -7


                           11 10
                                   -8


                          12 20

                          13 30

                                  0
                                  -
                                -1


                               14
                                1
                                1
                               1
                              0-
                             1-
                             1-


                                                                                    01062 Dresden, Germany
                            1-


            Figure 2: Module and MinA size distribution.                     {baader,meng}@tcs.inf.tu-dresden.de

                                                                                          References
          modularization-based pinpointing algorithm, but               [1] F. Baader, D. Calvanese, D. McGuinness, D.
                                                                            Nardi, and P. F. Patel-Schneider, editors. The
          with the naive linear minimization algorithm in-                  Description Logic Handbook: Theory, Implemen-
          stead of the logarithmic one. In all four columns,                tation, and Applications. Cambridge University
                                                                            Press, 2003.
          we give both average and maximum values. The
          last column shows the percentage of subsump-                  [2] I. Horrocks, P. F. P.-Schneider, and F. van
                                                                            Harmelen. From SHIQ and RDF to OWL: The
          tions that have only one MinA. Interestingly, more                making of a web ontology language. Journal of
          than two thirds of all subsumptions have only                     Web Semantics, 1(1):7–26, 2003.
          one MinA. The overall empirical results for the               [3] F. Baader, S. Brandt, and C. Lutz. Pushing the
          27,477 sampled subsumptions (about 0.5% of all                    EL envelope. In Proc. IJCAI 2005.
          subsumptions) are given in the last row of the ta-            [4] F. Baader, C. Lutz, and B. Suntisrivaraporn.
          ble. These results show that, on average, a MinA                  CEL—a polynomial-time reasoner for life science
          can be computed within one second, and its size is                ontologies. In Proc. IJCAR 2006, Springer LNAI
                                                                            4130, 2006.
          smaller than 10. Thus, MinAs can indeed be com-
                                                                        [5] B. Suntisrivaraporn. Module extraction and in-
          puted on demand, and their size is small enough                   cremental classiﬁcation: A pragmatic approach
          such that they can then be inspected by hand.                     for EL+ ontologies.     In Proc. ESWC 2008,
          Surprisingly, the linear minimization algorithm                   Springer LNCS, 2008. To appear.
          performed better in our experiments than the log-             [6] B. Suntisrivaraporn, F. Baader, S. Schulz, and K.
          arithmic one. An explanation for this is probably                 Spackman. Replacing SEP-triplets in SNOMED
                                                                            CT using tractable description logic operators. In
          that, unlike the experiments of Algorithm 1 and 2                 Proc. AIME 2007, Springer LNCS 4594, 2007.
          on the whole ontology, the modules are already
                                                                        [7] U. Hahn S. Schulz, K. Mark. Spatial location
          quite small, and thus the overhead required by the                and its relevance for terminological inferences in
          logarithmic algorithm does not pay oﬀ. Figure 2                   bio-ontologies. BMC Bioinformatics, 2007.
          depicts the size distribution of our sampled mod-             [8] S. Schlobach and R. Cornet. Non-standard rea-
          ules and MinAs. As easily visible from the chart,                 soning services for the debugging of description
          the modules are quite small, but the MinAs are                    logic terminologies. In Proc. IJCAI 2003.
          even smaller. In fact, the majority of all subsump-           [9] B. Parsia, E. Sirin, and A. Kalyanpur. Debugging
          tions (78%) have a MinA of size ten or less.                      OWL ontologies. In Proc. WWW 2005.
                                                                    [10] T. Meyer, K. Lee, R. Booth, and J. Z. Pan. Find-
                                      Conclusions                        ing maximally satisﬁable terminologies for the
                                                                         description logic ALC. In Proc. (AAAI 2006).
          We have introduced a new method for axiom pin-                 AAAI Press/The MIT Press, 2006.
          pointing in the DL EL+ that is based on the com-          [11] F. Baader, R. Peñaloza, and B. Suntisrivaraporn.
          putation of reachability-based modules. The ex-                Pinpointing in the description logic EL+ . In Proc.
          periments carried out on Snomed ct show that                   KI 2007, Springer LNAI 4667, 2007.
          this method is fast enough to extract a minimal           [12] S. Schulz, M. Romacker, and U. Hahn. Part-
          axiom set (MinA) for a given subsumption on de-                whole reasoning in medical ontologies revisited:
                                                                         Introducing SEP triplets into classiﬁcation-based
          mand. In addition, the extracted MinAs are usu-                description logics. JAMIA, 1998.
          ally quite small and can therefore be inspected by
                                                                    [13] K.A. Spackman. Managing clinical terminol-
          users and designers of Snomed ct by hand. In the               ogy hierarchies using algorithmic calculation of
          future, we will extend the approach such that it               subsumption: Experience with SNOMED-RT.
                                                                         JAMIA, 2000. Fall Symposium Special Issue.
          can (i) extract all MinAs, (ii) provide natural lan-
          guage explanations for subsumption, and (iii) give        [14] A. R. Bradley and Z. Manna. Checking safety
                                                                         by inductive generalization of counterexamples to
          suggestions for how to revise the ontology to get              induction. In Proc. FMCAD 2007.
          rid of an unwanted subsumption.


                                                                    7