Representing and sharing knowledge using SNOMED Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008) R. Cornet, K.A. Spackman (Eds) Debugging Snomed ct Using Axiom Pinpointing in the Description Logic EL+ Franz Baader and Boontawee Suntisrivaraporn Institute for Theoretical Computer Science, TU Dresden, Germany Abstract the subsumption (subconcept–superconcept) rela- Snomed ct is a large-scale medical ontology, tionships between the concepts defined in the on- which is developed using a variant of the inexpres- tology. The advantage of using an inexpressive DL sive Description Logic EL. Description Logic rea- of the EL family is that classification is tractable, soning can not only be used to compute subsump- i.e., EL reasoners such as CEL [4] can compute tion relationships between Snomed concepts, but the subsumption hierarchy of a given ontology in also to pinpoint the reason why a certain subsump- polynomial time. tion relationship holds by computing the axioms responsible for this relationship. This helps devel- Similar to writing large programs, building large- opers and users of Snomed ct to understand why scale ontologies is an error-prone endeavor. Clas- a given subsumption relationship follows from the sification can help to alert the developer or user ontology, which can be seen as a first step toward of an ontology to the existence of errors. For removing unwanted subsumption relationships. example, the subsumption relationship between In this paper, we describe a new method for axiom “amputation of finger” and “amputation of up- pinpointing in the Description Logic EL+ , which is per limb” in Snomed ct is clearly unintended based on the computation of so-called reachability- [6, 7], and thus reveals a modeling error. How- based modules. Our experiments on Snomed ct ever, given an unintended subsumption relation- show that the sets of axioms explaining subsump- ship in a large ontology like Snomed ct with al- tion are usually quite small, and that our method most four hundred thousand axioms, it is not al- is fast enough to compute such sets on demand. ways easy to find the erroneous axioms responsible for it by hand. To overcome this problem, the DL Introduction community has recently invested quite some work on automating this process. Given a subsump- Description Logics (DLs) [1] are a family of logic- tion relationship or another questionable conse- based knowledge representation formalisms, which quence, axiom pinpointing computes a minimal can be used to develop ontologies in a formally subset (all minimal subsets) of the ontology that well-founded way. This is true both for expres- have this consequence (called MinAs in the fol- sive DLs, which are the logical basis of the Web lowing). Most of the work on axiom pinpointing Ontology Language OWL [2], and for inexpres- in DLs was concerned with rather expressive DLs sive DLs of the EL family [3], which are used in (see, e.g., [8, 9, 10]). The only work that concen- the design of large-scale medical ontologies such as trated on pinpointing in the EL family of DLs was Snomed ct1 and the National Cancer Institute’s [11]. In addition to providing complexity results ontology.2 for pinpointing, [11] introduces a “pragmatic” al- One of the main advantages of employing a logic- gorithm for computing one MinA, which is based based ontology language is that reasoning services on a modified version of the classification algo- can be used to derive implicit knowledge from the rithm used by the CEL reasoner [4]. Though this one explicitly represented. DL systems can, for ex- approach worked quite well for mid-size ontologies ample, classify a given ontology, i.e., compute all (see the experiments on a variant of the Galen 1 http://www.ihtsdo.org/our-standards/ medical ontology described in [11]), it was not ef- 2 http://www.nci.nih.gov/cancerinfo/terminologyresources ficient enough to deal with large-scale ontologies like Snomed ct. 1 Representing and sharing knowledge using SNOMED Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008) R. Cornet, K.A. Spackman (Eds) In the present paper, we describe a new method for axiom pinpointing in the Description Logic α1 AmpOfFinger ≡ Amp  ∃site.FingerS EL+ , which is based on the computation of so- α2 AmpOfHand ≡ Amp  ∃site.HandS called reachability-based modules [5]. Our experi- α3 InjToFinger ≡ Inj  ∃site.FingerS ments on Snomed ct show that the sets of axioms α4 InjToHand ≡ Inj  ∃site.HandS α5 FingerE  FingerS explaining a given subsumption are usually quite α6 FingerP  FingerS  ∃part.FingerE small (78% of the MinAs we computed were of size α7 HandE  HandS ten or less), and that our method is fast enough α8 HandP  HandS  ∃part.HandE (on average, it took one second to obtain a MinA) α9 ULimbE  ULimbS to compute these sets on demand, i.e., whenever α10 ULimbP  ULimbS  ∃part.ULimbE the user asks for a MinA for a suspect subsump- α11 FingerS  HandP tion relationship. α12 HandS  ULimbP Axiom pinpointing in EL+ Figure 1: Ontology OAmp illustrating a faulty In this section, we first introduce the DL EL+ , SEP-triplet encoding in Snomed ct. which is an extension of the DL EL used to define Snomed ct. Then, we define minimal axiom sets (MinAs) for subsumption, and recall some of the to a subset AI of ΔI and each role name r ∈ RN known results about computing MinAs in EL+ . to a binary relation rI on ΔI . The extension of ·I to arbitrary concept descriptions is inductively de- Syntax Semantics fined, as shown in the semantics column of Table 1. An interpretation I is a model of an ontology O if,  ΔI for each inclusion axiom in O, the conditions given C D C I ∩ DI in the semantics column of Table 1 are satisfied. ∃r.C {x ∈ ΔI | ∃y ∈ ΔI : The main reasoning problem in EL+ is the sub- (x, y) ∈ rI ∧ y ∈ C I } sumption problem: given an EL+ ontology O and two EL+ concept descriptions C, D, check whether CD C I ⊆ DI C is subsumed by D w.r.t. O (written C O D), r 1 ◦ · · · ◦ rn  s r1I ◦ · · · ◦ rnI ⊆ sI i.e., whether C I ⊆ DI holds in all models of O. The computation of all subsumption relationships Table 1: Syntax and semantics of EL+ . between the concept names occurring in O is called classification of O. Starting with a set of concept names CN and a Figure 1 shows a small EL ontology defining con- set of role names RN, EL+ concept descriptions cepts related to amputation/injury of hand and can be built using the constructors shown in the finger. It uses the so-called SEP-triplet encoding upper part of Table 1, i.e., every concept name [12], in which anatomical concepts (like hand) are A ∈ CN and the top concept  are EL+ con- represented by three concepts: the structure con- cept descriptions, and if C, D are EL+ concept cept (e.g, HandS , which stands for the hand and descriptions and r ∈ RN is a role name, then its proper parts), the part concept (e.g, HandP , C  D (conjunction) and ∃r.C (existential restric- which stands for the proper parts of the hand), tion) are EL+ concept descriptions. Role chains and the entity concept (e.g, HandE , which stands of the form r1 ◦ · · · ◦ rn for n ≥ 0 are called role for the entire hand). The axioms α5 –α10 consti- descriptions. An EL+ ontology is a finite set of tute a completed SEP-triplet encoding for finger, axioms of the form shown in the lower part of Ta- hand, and upper limb. For example, axiom α8 says ble 1, where axioms of the form C  D are called that proper parts of the hand belong to the struc- general concept inclusions (GCIs) and of the form ture concept HandS , and they are parts of hand r1 ◦ · · · ◦ rn  s role inclusions (RIs). An EL on- (i.e., linked with the role part to the entity con- tology is an EL+ ontology that does not contain cept HandE ). Given this encoding, the fact that RIs. We use C ≡ D as an abbreviation for the two the finger is part of the hand can be expressed us- GCIs C  D, D  C. ing axiom α11 . The main reason for using this The semantics of EL+ is defined in terms of inter- encoding in Snomed ct is that it can simulate pretations I = (ΔI , ·I ), where the domain ΔI is a transitivity reasoning for the role part, although non-empty set of individuals, and the interpreta- transitivity of part cannot be expressed in EL. For tion function ·I maps each concept name A ∈ CN example, it is easy to see that the ontology OAmp 2 Representing and sharing knowledge using SNOMED Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008) R. Cornet, K.A. Spackman (Eds) implies that the finger is part of the upper limb, Algorithm 1 Naive linear extraction of a MinA. i.e., FingerE OAmp ∃part.ULimbE . As a side-effect, function lin-extract-mina(A, B, O) the SEP-triplet encoding can also be used to simu- 1: S := O late so-called right-identity rules [13], which allow 2: for each axiom α ∈ O do 3: if A S\{α} B then to inherit properties along the part role. Consider 4: S := S \ {α} the following subsumption relationships that hold 5: return S in our example ontology: AmpOfFinger OAmp AmpOfHand, (1) Algorithm 2 Logarithmic extraction of a MinA. InjToFinger OAmp InjToHand. (2) function log-extract-mina(A, B, O) 1: return log-extract-mina-r(A, B, ∅, O) While subsumption (2) actually makes sense (it function log-extract-mina-r(A, B, S, O) is sensible to say that an injury to the finger is 1: if |O| = 1 then an injury to the hand), subsumption (1) is clearly 2: return O 3: S1 , S2 := halve(O) undesirable. Subsumption (1) is an example of a 4: if A S∪S1 B then false positive subsumption, which does indeed oc- 5: return log-extract-mina-r(A, B, S, S1 ) cur in Snomed ct. It has been argued [6, 7] that 6: if A S∪S2 B then this subsumption is due to a faulty SEP-triplet en- 7: return log-extract-mina-r(A, B, S, S2 ) coding. In fact, using the entity concepts instead 8: S1 := log-extract-mina-r(A, B, S ∪ S2 , S1 ) 9: S2 := log-extract-mina-r(A, B, S ∪ S1 , S2 ) of the structure concepts in the axioms α1 and α2 10: return S1 ∪ S1 would have avoided this problem. In EL+ , one could actually dispense with the SEP- triplet encoding altogether since both transitiv- ity and right-identity rules can be expressed using polynomial time. In [11], this was shown using the RIs. For example, part◦part  part expresses tran- simple Algorithm 1, which requires linearly many sitivity of the role part. An alternative and direct (polynomial) subsumption tests. For a large on- representation of anatomical concepts, as well as tology, however, this naive approach is not feasi- referring concepts like clinical findings and proce- ble. For example, for Snomed ct it would require dures, based on this additional expressive power of almost half a million subsumption tests for each the DL EL+ is proposed in [6]. The new modeling MinA extraction. is succinct and also avoids the above false positive We can do much better by adopting the algorithm subsumption (1). for computing prime implicates described in [14] For a small ontology like OAmp , it is not hard to to our problem. Basically, this algorithm applies do the subsumption reasoning manually, and thus binary search to find a MinA. Instead of taking also to find the axioms responsible for a given sub- out one axiom at a time, it partitions the ontol- sumption relationship by hand. For a very large ogy into two halves, and checks whether one of ontology like Snomed ct, this manual approach them entails the subsumption. If yes, it imme- to pinpointing the responsible axioms is very time- diately recurses on that half, throwing away half consuming, and thus should be automated. First, of the axioms in one step. Otherwise, essential we give a formal definition of what automated pin- axioms are in both halves. In this case, the algo- pointing is actually supposed to compute. rithm recurses on each half, using the other half as Definition 1 (MinA). Let O be an EL+ ontol- the “support set”. Algorithm 2 describes this ap- ogy, and A, B concept names such that A O B. proach in more detail, where the function halve(O) The set S ⊆ O is a minimal axiom set (MinA) for partitions O into S1 ∪ S2 with ||S1 | − |S2 || ≤ 1. It A O B if, and only if, A S B and, for every follows from the results in [14] that computing a S  ⊂ S, A S  B. 3 MinA S for a given subsumption A O B with Al- In our example, {α1 , α2 , α8 , α11 } is the only MinA gorithm 2 requires O ((|S| − 1) + |S|log(|O|/|S|)) for subsumption (1), whereas {α3 , α4 , α8 , α11 } is subsumption tests. This greatly improves on the the only MinA for subsumption (2). As shown naive algorithm. For instance, computing a MinA in [11], a given subsumption relationship w.r.t. an consisting of nine axioms for Snomed ct requires EL+ ontology may have exponentially many Mi- about one hundred subsumption tests. Though nAs, and even deciding whether there is a MinA this is much better than the almost half a million of cardinality ≤ k is an NP-complete problem. In required by the naive algorithm, it is still not good contrast, one MinA can always be extracted in enough to compute MinAs on demand (see below). 3 Representing and sharing knowledge using SNOMED Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008) R. Cornet, K.A. Spackman (Eds) Modularization-based axiom concept and role names occurring in the entity X. pinpointing in EL+ Definition 3 (Reachability-based modules). Let O be an EL+ ontology and A a concept name Instead of applying Algorithm 1 or 2 directly to occurring in O. The set of A-reachable names in the whole ontology O, one can first try to find a O is the smallest set N of concept and role names non-minimal (but hopefully small) subset S ⊆ O such that with A S B (called nMinA in the following), and then apply Algorithm 1 or 2 to this subset to • A belongs to N; obtain a MinA. In [11], we have sketched a modi- fied version of the classification algorithm for EL+ • for all (concept/role) inclusion axioms αL  αR [3, 4] that extracts such nMinAs. In the exper- in O, if Sig(αL ) ⊆ N then Sig(αR ) ⊆ N. iments on a version of Galen described in [11], We call an axiom αL  αR A-reachable in O if Algorithm 1 was then used to minimize these sets. every element of Sig(αL ) is A-reachable in O. The Whereas the nMinA extraction was fast and pro- reachability-based module for A in O, denoted by duced quite small sets for Galen, it crashed after reach OA , consists of all A-reachable axioms from O. a few hours because of space problems when ap- 3 plied to Snomed ct. reach To overcome this problem, we propose an algo- In [5], it has been shown that OA is indeed a rithm for extracting nMinAs that is based on subsumption module for A in O. Here, we show modularization. In the following, we introduce the following stronger results. only those notions regarding modularization that Theorem 4. Let O be an EL+ ontology and A a reach are strictly necessary in the context of this pa- concept name. Then OA is a strong subsump- per. More details regarding the reachability-based tion module for A in O. reach modularization approach from which these notions Proof. The fact that OA is a subsumption are derived, as well as its connection to other work module was already shown in [5]. To show that on modularization, can be found in [5]. it is strong, assume that A O B holds, but there Let O be an EL+ ontology, and A a concept name is a MinA S for A O B that is not contained in reach reach occurring in O. We say that S ⊆ O is a sub- OA . Thus, there is an axiom α ∈ S \ OA . sumption module for A in O whenever A O B if, Let S1 be the subset of S that contains the A- and only if, A S B holds for all concept names reachable axioms. Note that S1 is a strict subset B occurring in O. Obviously, if S is a subsump- of S since α ∈ S1 . We claim that A S B implies tion module for A in O and A O B, then S is A S1 B, which contradicts the assumption that an nMinA for this subsumption, and Algorithm 1 S is a MinA for A O B. or 2 can be used to compute a MinA S  ⊆ S from To show the claim, we assume to the contrary that S. Thus, we know that a subsumption module A S1 B, i.e., there is a model I1 of S1 such that for A contains a MinA for every valid subsump- AI1 ⊆ B I1 . We modify I1 to I by setting y I := ∅ tion relationship A O B. The reachability-based for all (concept or role) names that are not A- modules introduced below satisfy an even stronger reachable. It is easy to see that AI ⊆ B I . In fact, property: they contain all MinAs for all valid sub- we have AI = AI1 (since A is A-reachable), and sumptions. B I = B I1 or B I = ∅. Definition 2. Let O be an EL+ ontology and A It remains to show that I is indeed a model of a concept name occurring in O. The subsumption S, i.e. satisfies all axioms βL  βR in S. If module S for A in O is called strong if the fol- βL contains a name that is not A-reachable, then lowing holds for all concept names B occurring in (βL )I = ∅, and the axiom is trivially satisfied. O: if A O B, then every MinA for A O B is a Otherwise, this axiom belongs to S1 , and the def- subset of S. 3 inition of A-reachability implies that all names in βR are A-reachable as well. Consequently, I1 and Obviously, O itself is a strong subsumption mod- I coincide on the names occurring in βL  βR . ule for every concept name A occurring in O. Since I1 is a model of S1 , we thus have (βL )I = The following definition (first introduced in [5]) (βL )I1 ⊆ (βR )I1 = (βR )I . ❏ yields strong subsumption modules that are usu- ally much smaller than the whole ontology. For As an immediate consequence of this theorem, in- an EL+ entity X—i.e., either a (concept or role) stead of extracting a MinA for A O B from O, it description, a (concept or role) inclusion axiom, or is sufficient to extract a MinA for A OAreach B from reach an ontology—we write Sig(X) to denote the set of OA . This is what the function extract-mina in 4 Representing and sharing knowledge using SNOMED Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008) R. Cornet, K.A. Spackman (Eds) Algorithm 3 Modularization-based extraction of pute subsumption. Our experiments use the a MinA January/2005 release of the DL version of function extract-mina(A, B, O) Snomed ct, which contains 379,691 concept reach 1: OA ← extract-module(O, A) names, 62 role names, and 379,704 axioms.3 In reach 2: return log-extract-mina(A, B, OA ) the following, we call this ontology OSnomed . The reach function second-mina?(A, B, OA , S1 ) experiments were carried out on a PC with 2.40 1: for each axiom α ∈ S1 do GHz Pentium-4 processor and 1 GB of memory. 2: O  ← OAreach \{α} As stand-alone algorithms for computing a MinA, 3: if A O B then 4: return “second MinA exists” we applied Algorithm 1 and 2 only to the 5: return “MinA unique” false positive subsumption AmpOfFinger OSnomed function extract-module(O, A) AmpOfHand. Algorithm 1 did not terminate on 1: OA ← ∅ this input after 24 hours, whereas Algorithm 2 re- 2: queue ← active-axioms({A}) 3: while not empty(queue) do quired 26:05 minutes (1,565 seconds) to compute a 4: (αL  αR ) ← fetch(queue) MinA of cardinality 6. (Note that the actual mod- 5: if Sig(αL ) ⊆ {A} ∪ Sig(OA ) then elling of “amputation of finger” and “amputation 6: OA ← OA ∪ {αL  αR } of hand” in Snomed ct differs from the one given 7: queue ← queue ∪ (active-axioms(Sig(αR )) \ OA ) in Fig. 1 due to the use of role groups and of two 8: return OA different roles to express location in Snomed ct. Thus, the computed MinA also differs from the one given above. However, it also shows that the Algorithm 3 does. Note that, instead of the log- reason for the unintended subsumption is the in- arithmic extraction algorithm (Algorithm 2), we correct use of the SEP-triplet encoding.) could also use the linear extraction algorithm (Al- Algorithm 3 performs much better for the am- gorithm 1). Since reachability-based modules are putation example. The reachability-based mod- usually quite small, it is not a priori clear whether ule OAmpOfFinger Snomed contains 57 axioms, and was using the more complicated logarithmic algorithm computed in 0.04 seconds. Extracting a MinA really pays off (see the results of our experiments for AmpOfFinger OAmpOfFinger Snomed AmpOfHand from below). The function second-mina? in Algorithm 3 OAmpOfFinger Snomed using the logarithmic minimization al- takes the extracted module and the first MinA as gorithm then took only half a second. An appli- input, and checks if the subsumption in question cation of second-mina? then showed that the ex- still holds in the absence of one of the axioms in tracted MinA is the only one for this subsumption. the MinA. In this case, this subsumption obviously We have also applied Algorithm 3 to a large num- must have more than one MinA. Note that, for this ber of subsumption relationships that follow from function to be correct, we really need to know that OSnomed . Since there are more than five million reach OA is a strong subsumption module. such subsumptions, testing the algorithm on all The function extract-module in Algorithm 3 re- of them was not feasible: assuming an average alizes one way of computing reachability-based extraction time of 1 second, this would have re- modules. The function call active-axioms used quired 58 days. For this reason, we sampled 0.5% there yields, for a given set of names, all ax- of all concepts in each top-level category C in ioms that contain at least one of these names in Snomed ct. Let us denote the set of samples for their left-hand side. It is not hard to show that category C by c-samples(C). For each sampled the call extract-module(O, A) indeed computes the concept A, all positive subsumptions A OSnomed B reachability-based module for A in O (see [5] for with A as subsumee were considered. more details). The experiments described in [5] The first column of Table 2 shows the top-level show that extraction of reachability-based mod- categories and the second the number of sampled ules in Snomed ct is usually quite fast, and the subsumption relationships with the subsumee in modules obtained this way are quite small. In the this category. The next four columns give the next section, we show that these positive results time needed to compute and the size of the cor- extend to the modularization-based extraction of responding modules and MinAs. The values in MinAs. square brackets give the time required by the Experimental Results 3 The DL version is also known in the SNOMED We have implemented the three algorithms de- lingo as the ‘stated form,’ while axioms here boil down scribed in this paper, using CEL [4] to com- to (primitive) concept definitions. 5 R. Cornet, K.A. Spackman (Eds) Time to extract Size of Time to extract MinA Size of MinA %Subs. Snomed category C Subs. Snomed Snomed module OA OA for A OASnomed B for A OASnomed B with A ∈ c-samples(C) samples (avg/max) (avg/max) (avg/max) (avg/max) one MinA Attribute 25 < 0.01 / < 0.01 5.12/8 0.05 / 0.09 [0.15 / 0.18] 3.16/7 100 Body structure 4 738 < 0.01 / 0.01 40.76 / 76 0.41 / 4.19 [0.63 / 2.24] 5.54 / 18 64.16 Clinical Finding 11 112 0.03 / 3.97 71.50 / 143 1.66 / 9.58 [1.15 / 5.04] 9.00 / 34 63.00 Context-dependent categories 208 0.01 / 0.03 0.14 / 108 0.37 / 1.43 [0.63 / 1.77] 4.10 / 13 95.67 Representing and sharing knowledge using SNOMED Environments & geographical locations 51 < 0.01 / < 0.01 7.65 / 9 0.07 / 0.12 [0.17 / 0.19] 3.82 / 8 100 Events 28 < 0.01 / < 0.01 4.64 / 6 0.04 / 0.08 [0.13 / 0.16] 2.32 / 5 100 Observable entity 253 < 0.01 / < 0.01 8.26 / 12 0.08 / 0.18 [0.18 / 0.24] 3.68 / 8 90.12 Organism 1 429 < 0.01 / 0.01 13.03 / 21 0.09 / 0.20 [0.25 / 0.36] 4.72 / 13 65.01 Pharmaceutical/biologic product 1 233 < 0.01 / 0.01 31.41 / 60 0.16 / 0.47 [0.50 / 0.91] 3.68 / 10 81.51 Physical force 6 < 0.01 / < 0.01 7.00 / 7 0.38 / 0.58 [0.16 / 0.17] 2.83 / 5 50.00 Physical object 166 9.35 / 12 0.09 / 0.19 [0.19 / 0.24] 4.18 / 11 93.98 6 < 0.01 / < 0.01 Procedure 5 183 0.02 / 0.05 71.89 / 146 0.62 / 5.21 [0.15 / 4.38] 8.65 / 36 66.29 Qualifier value 216 < 0.01 / 0.01 6.68 / 11 0.06 / 0.13 [0.16 / 0.22] 2.67 / 7 87.96 Social context 204 < 0.01 / 0.01 10.11 / 15 0.18 / 0.47 [0.21 / 0.28] 3.59 / 9 77.45 Special concept 1 272 < 0.01 / 0.01 5.00 / 5 0.05 / 0.09 [0.14 / 0.14] 2.5 / 4 100 Specimen 38 0.01 / 0.02 67.74 / 127 0.19 / 0.59 [1.06 / 2.11] 4.55 / 13 81.58 Staging and scales 20 < 0.01 / < 0.01 5.60 / 8 0.04 / 0.08 [0.14 / 0.18] 2.8 / 7 100 Substance 1 295 < 0.01 / 0.01 14.76 / 32 0.16 / 0.41 [0.19 / 0.51] 4.17 / 12 72.82 Overall in Snomed ct 27 477 0.02 / 3.97 53.21 / 146 1.03 / 9.58 [0.67 / 5.04] 7.11 / 36 68.26 Table 2: Empirical results of the modularization-based axiom pinpointing on Snomed ct (time in seconds; size in number of axioms). Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008) Representing and sharing knowledge using SNOMED Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008) R. Cornet, K.A. Spackman (Eds) 25000 Acknowledgements 21377 #Modules The first author was partially supported by NICTA, 20000 #MinA Canberra Research Lab, and the second by the Ger- 15000 man Research Foundation (DFG) under grant BA 10000 1122/11-1. 5000 0 Size (#axioms) Address for Correspondence Franz Baader and Boontawee Suntisrivaraporn 11 0 21 0 31 0 41 0 1 51 0 -2 61 0 -3 71 0 -4 TU Dresden, Theoretical Computer Science, 1- 81 0 -5 91 90 -6 10 00 -7 11 10 -8 12 20 13 30 0 - -1 14 1 1 1 0- 1- 1- 01062 Dresden, Germany 1- Figure 2: Module and MinA size distribution. {baader,meng}@tcs.inf.tu-dresden.de References modularization-based pinpointing algorithm, but [1] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The with the naive linear minimization algorithm in- Description Logic Handbook: Theory, Implemen- stead of the logarithmic one. In all four columns, tation, and Applications. Cambridge University Press, 2003. we give both average and maximum values. The last column shows the percentage of subsump- [2] I. Horrocks, P. F. P.-Schneider, and F. van Harmelen. From SHIQ and RDF to OWL: The tions that have only one MinA. Interestingly, more making of a web ontology language. Journal of than two thirds of all subsumptions have only Web Semantics, 1(1):7–26, 2003. one MinA. The overall empirical results for the [3] F. Baader, S. Brandt, and C. Lutz. Pushing the 27,477 sampled subsumptions (about 0.5% of all EL envelope. In Proc. IJCAI 2005. subsumptions) are given in the last row of the ta- [4] F. Baader, C. Lutz, and B. Suntisrivaraporn. ble. These results show that, on average, a MinA CEL—a polynomial-time reasoner for life science can be computed within one second, and its size is ontologies. In Proc. IJCAR 2006, Springer LNAI 4130, 2006. smaller than 10. Thus, MinAs can indeed be com- [5] B. Suntisrivaraporn. Module extraction and in- puted on demand, and their size is small enough cremental classification: A pragmatic approach such that they can then be inspected by hand. for EL+ ontologies. In Proc. ESWC 2008, Surprisingly, the linear minimization algorithm Springer LNCS, 2008. To appear. performed better in our experiments than the log- [6] B. Suntisrivaraporn, F. Baader, S. Schulz, and K. arithmic one. An explanation for this is probably Spackman. Replacing SEP-triplets in SNOMED CT using tractable description logic operators. In that, unlike the experiments of Algorithm 1 and 2 Proc. AIME 2007, Springer LNCS 4594, 2007. on the whole ontology, the modules are already [7] U. Hahn S. Schulz, K. Mark. Spatial location quite small, and thus the overhead required by the and its relevance for terminological inferences in logarithmic algorithm does not pay off. Figure 2 bio-ontologies. BMC Bioinformatics, 2007. depicts the size distribution of our sampled mod- [8] S. Schlobach and R. Cornet. Non-standard rea- ules and MinAs. As easily visible from the chart, soning services for the debugging of description the modules are quite small, but the MinAs are logic terminologies. In Proc. IJCAI 2003. even smaller. In fact, the majority of all subsump- [9] B. Parsia, E. Sirin, and A. Kalyanpur. Debugging tions (78%) have a MinA of size ten or less. OWL ontologies. In Proc. WWW 2005. [10] T. Meyer, K. Lee, R. Booth, and J. Z. Pan. Find- Conclusions ing maximally satisfiable terminologies for the description logic ALC. In Proc. (AAAI 2006). We have introduced a new method for axiom pin- AAAI Press/The MIT Press, 2006. pointing in the DL EL+ that is based on the com- [11] F. Baader, R. Peñaloza, and B. Suntisrivaraporn. putation of reachability-based modules. The ex- Pinpointing in the description logic EL+ . In Proc. periments carried out on Snomed ct show that KI 2007, Springer LNAI 4667, 2007. this method is fast enough to extract a minimal [12] S. Schulz, M. Romacker, and U. Hahn. Part- axiom set (MinA) for a given subsumption on de- whole reasoning in medical ontologies revisited: Introducing SEP triplets into classification-based mand. In addition, the extracted MinAs are usu- description logics. JAMIA, 1998. ally quite small and can therefore be inspected by [13] K.A. Spackman. Managing clinical terminol- users and designers of Snomed ct by hand. In the ogy hierarchies using algorithmic calculation of future, we will extend the approach such that it subsumption: Experience with SNOMED-RT. JAMIA, 2000. Fall Symposium Special Issue. can (i) extract all MinAs, (ii) provide natural lan- guage explanations for subsumption, and (iii) give [14] A. R. Bradley and Z. Manna. Checking safety by inductive generalization of counterexamples to suggestions for how to revise the ontology to get induction. In Proc. FMCAD 2007. rid of an unwanted subsumption. 7