Representation of Part-Whole Relationships in SNOMED CT

A. Patrice Seyed 1∗, Alan Rector 2, Uli Sattler 2, Bijan Parsia 2, and Robert Stevens 2
1 Department of Computer Science and Engineering, University at Buffalo, USA
2 School of Computer Science, University of Manchester, UK
∗ To whom correspondence should be addressed: apseyed@buffalo.edu

ABSTRACT
In this paper we investigate representation of the part-whole relationship in SNOMED CT. We discuss the current approach, based on “SEP” triples, and several translations of it, which involve DLs at different levels of expressivity. We intend that our analysis will concretely inform the SNOMED community about the important tradeoffs of expressivity for their ontology, and help with future decisions about the representation of SNOMED CT’s anatomical taxonomy.

1 INTRODUCTION
A common pattern in knowledge representation is that a fault of a part is considered a fault of the whole. For example, a fault in the battery is a fault in the ignition system, and is a fault in the car. This pattern pervades common medical terminology: “Heart disease” includes diseases of any of the parts of the heart - muscle, valves, walls, etc. Gastrointestinal disease includes any disease of the stomach (gastrum) or any of the parts of the intestine. The same is true of procedures: fixing a heart valve is a kind of heart operation; repair of the retina is a kind of eye operation, etc.

However, the pattern does not always hold. “Amputation of the hand” means amputation of the entire hand. “Amputation of a finger” is not a kind of “Amputation of the hand” (although it is a kind of “Operation on hand”). Similarly, there are diseases that affect an entire organ; for example, “pancarditis” means literally “inflammation throughout (pan) the heart”.

In general, therefore, there is a requirement to represent two cases:
1.“Disorder/Procedure of A and/or any of its parts” and
2.“Disorder/Procedure of the entire A”
where A is any anatomical structure.

In common medical language, the distinction is usually implicit. The distinction between the meaning of “Operation on hand” and “Amputation of hand” is left to the medical knowledge of the reader. It is only in unusual cases such as “pancarditis” (“inflammation throughout the heart”) that the distinction is made explicit in the language. However, when representing diseases and procedures formally, the distinction must be made explicitly and systematically.

Over the past twenty years, there have been at least three mechanisms used to represent this pattern and the associated distinctions:

1.Propagation across transitive properties - the property used for “of”, usually “has locus”, is said to be inherited across the property “part of”. In modern description logics this is achieved by using property paths in subproperty axioms (Horrocks and Sattler, 2004). In earlier languages it was achieved by equivalent mechanisms known as “right identities” (Stearns et al., 2001) or “refined by” (Rogers and Rector, 2000). This amounts to an axiom that the disorder of the part is a disorder of the whole. In this case a mechanism must be provided to cope with the exceptions when the rule does not apply. For example, in this case “Heart disease” is defined simply as “Disorder that has locus some Heart”.
2.Explicit definition of diseases as disjunctions - e.g., “Heart disease” is defined explicitly as “Disease that has locus some Heart OR some part of Heart”.
3.The use of Structure-Entity-Part (SEP) triples - separate classes for the whole or its parts (Structure), just the whole (Entity), or just the parts (Part). In this case “Heart disease” is defined as a “Disorder that has locus some Heart Structure”.

Note that these three methods require different expressiveness in the description logic:

1.Propagation across transitive properties requires property paths, which were not supported in early description logics and are not part of the basic specification of the standard starting description logic, ALC. They were originally thought to be intractable, but have since been shown not only to be tractable (Horrocks and Sattler, 2004) but even to be available in EL++, a maximal description logic with polynomial complexity (Baader et al., 2005).
2.Definition of diseases in terms of disjunctions requires a disjunction operator, which falls within ALC but outside EL++. It also requires transitive properties but not property paths.
3.SEP triples can be implemented within the simplest possible description logic, and do not require transitive properties, disjunction, or property paths (Hahn et al., 1999).

The history of the use of these three methods and their variants is intertwined with the development of description logics for use with medical terminologies. The large description logic based terminology SNOMED CT (Stearns et al., 2001) was originally developed using a variant of propagation along transitive properties (Method 1), as was GALEN, the other large description logic based terminology developed in the mid 1990s (Rector et al., 1997; Rogers and Rector, 2000). SNOMED converted to Method 3, and is now being re-examined in the light of experience, with one format under consideration being a variant of Method 1 (Kent Spackman, personal communication, 2011). Re-examination of these approaches is therefore particularly timely.

The purpose of this paper is to explore variants on the three methods in the light of modern description logics, a question that has also been investigated by Baader et al. (2009). Although we comment briefly on the apparent cognitive complexity for the user of the different representations, any of the three techniques might be “hidden” from users by syntactic and user interface mechanisms.
Our primary concern has been, therefore, with their formal, rather than cognitive, aspects.

2 THE CURRENT APPROACH (SEP TRIPLES)
We view SNOMED’s set of class names C to be partitioned into C_n ∪ C_S ∪ C_E ∪ C_P, where C_S ∪ C_E ∪ C_P are specific to (human) anatomy. We use X_S for class names in C_S, X_E for class names in C_E, and X_P for class names in C_P. We assume that in any occurrence of X_S, X_E, or X_P in an axiom, ‘X’ refers to the same term, e.g., Heart.

The SEP “triple” approach represents parthood implicitly within a class hierarchy (Hahn et al., 1999). For an anatomical entity of a certain kind, X_S represents its Structure class, and refers to any part of the anatomical entity, including the entire entity. For instance, Heart_S refers to any part of a heart or an entire heart. X_E represents its Entire class, and refers to an entire anatomical entity, and X_P represents its Part class, and refers to a certain part of an entity. For instance, Heart_E refers to an entire heart, and Heart_P refers to any part of a heart but not an entire heart. X_E and X_P classes are immediate subclasses of X_S; hence, Heart_E and Heart_P are immediate subclasses of Heart_S. In the OWL version of the SNOMED CT ontology,[1] the SEP notation is part of the class label, for example ‘Heart Structure’, ‘Entire Heart’, and ‘Part of Heart’, but in this paper we apply subscripts for notational convenience. Ideally, a SEP triple is given for each anatomical entity, and every X_S class (except that for the top anatomical class) is a subclass of some Y_P class.[2]

Fig. 1: Illustration of the Human Heart

The heart has as part of it a muscular wall that contracts to pump blood out of the heart, and then relaxes as the heart refills with returning blood. This wall is called the myocardium. The heart and myocardium are illustrated in Figure 1.[3] Applying SEP triples, Myocardium_S is a subclass of Heart_P and Heart_S is a subclass of Body_P. This means that a specific part of a myocardium or a whole myocardium is a part of some heart, a specific part of a heart or a whole heart is a part of some body, and furthermore, a specific part of a myocardium or a whole myocardium is a part of some body. These axioms are also illustrated in Figure 2, and given formally below:

Myocardium_E ⊑ Myocardium_S ⊑ Heart_P ⊑ Heart_S ⊑ ... ⊑ Body_P ⊑ Body_S
Heart_E ⊑ Heart_S ⊑ ... ⊑ Body_P ⊑ Body_S

Fig. 2: Taxonomy of SEP Triple classes for Heart, Myocardium, and Body. Unlabeled arcs represent the subclass relationship.

Note that, in SNOMED CT, we find neither disjointness axioms for classes X_E and X_P nor covering axioms for X_S, X_E, and X_P, although both are assumed to be true under the SEP triple theory. The SEP triples approach is iteratively applied along what is considered a partonomic hierarchy, for example for the anterior myocardium under the SEP triple for myocardium. The subsumption relationships are explicit, as given, but their reading is implicit; in particular, there is no ‘part of’ property that links X_E and X_P. However, transitivity of the subsumption relation implies the transitivity of this implicit part-of reading, and so transitive parthood entailments are determined by subsumption reasoning. We refer to the SEP triple approach from SNOMED CT described so far and sketched in Figure 2 as the Current SEP Triple Approach (A). In the following sections we discuss several alternative approaches to representing part-whole relations and discuss their relative expressivity.

[1] http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html
[2] In SNOMED CT, however, the SEP triples are thus far incompletely populated.

To see how approach A applies to subsumption reasoning for disorders, take for example a disorder specified in some anatomical location that is given as some class X_S.
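The way approach A reduces partonomic reasoning to plain subclass reasoning can be illustrated with a small sketch. The Python fragment below is only an illustration, not a description of how SNOMED CT or any DL reasoner is implemented: the dictionary layout and the helper names `TOLD_SUPERCLASSES`, `is_subclass`, and `disorder_subsumed_by` are assumptions introduced here. It follows told superclass links for the Heart, Myocardium, and Body triples given above, and then compares disorder definitions of the form "genus and has_locus some location class", using the Carditis, Myocarditis, and Pancarditis examples discussed in this section.

```python
# Illustrative sketch only: structural subsumption over told subclass axioms.
# The data layout and helper names are assumptions made for this example.

# Told subclass axioms of the SEP hierarchy, written child -> list of parents.
# Intermediate classes between Heart_S and Body_P are elided, as in the text.
TOLD_SUPERCLASSES = {
    "Myocardium_E": ["Myocardium_S"],
    "Myocardium_S": ["Heart_P"],
    "Heart_E": ["Heart_S"],
    "Heart_P": ["Heart_S"],
    "Heart_S": ["Body_P"],
    "Body_P": ["Body_S"],
}


def is_subclass(sub, sup):
    """Return True if `sub` reaches `sup` by following told superclass links."""
    seen = set()
    stack = [sub]
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(TOLD_SUPERCLASSES.get(current, []))
    return False


# Disorder definitions: name -> (genus, class filling has_locus).
DISORDERS = {
    "Carditis": ("Inflammation", "Heart_S"),
    "Myocarditis": ("Inflammation", "Myocardium_S"),
    "Pancarditis": ("Inflammation", "Heart_E"),
}


def disorder_subsumed_by(sub, sup):
    """Compare two definitions of the form: genus AND has_locus SOME locus."""
    sub_genus, sub_locus = DISORDERS[sub]
    sup_genus, sup_locus = DISORDERS[sup]
    return sub_genus == sup_genus and is_subclass(sub_locus, sup_locus)


if __name__ == "__main__":
    print(is_subclass("Myocardium_S", "Body_P"))
    print(disorder_subsumed_by("Myocarditis", "Carditis"))
    print(disorder_subsumed_by("Myocarditis", "Pancarditis"))
```

Because the only inference used is reachability in the subclass graph, no part_of property, disjunction, or property chain is needed, which is the sense in which SEP triples fit within a very inexpressive description logic.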
Carditis is an inflammation that is located in some specific part of a heart, or a whole heart, therefore Heart_S.[4] These axioms and entailments are illustrated in Figure 3.[5]

Fig. 3: Entailment given the Part-Whole Relationship. In the OWL representation class definition for Carditis, Inflammation is the range restriction for the property Associated morphology. We exclude this expression from the definition of Carditis above in order to simplify our examples.

In SNOMED CT, there are numerous disorders defined in terms of their location. For instance, Myocarditis is inflammation that is located in some specific part of a myocardium or a whole myocardium, therefore, Myocardium_S.

As illustrated in Figure 3, because Myocardium_S is a subclass of Heart_S, the location for Myocarditis is also Heart_S, and further, Myocarditis is a subclass of Carditis. We provide the DL representation for these findings and the corresponding inferences:

Carditis ≡ Inflammation ⊓ ∃has_locus.Heart_S
Myocarditis ≡ Inflammation ⊓ ∃has_locus.Myocardium_S
⊨ Myocarditis ⊑ Inflammation ⊓ ∃has_locus.Heart_S
⊨ Myocarditis ⊑ Carditis

A disorder that occurs at some location that is specified as a class X_E, however, does not have such inferred subclasses. For example, Pancarditis is a disorder that is characterized by inflammation and is specified as being located in the entire heart and not just some part of the heart, therefore Heart_E. Recall that Myocarditis is located in some specific part of the myocardium or the entire myocardium, therefore Myocardium_S. As illustrated in Figure 4, it is correctly not entailed that Myocarditis is a subclass of Pancarditis:

Pancarditis ≡ Inflammation ⊓ ∃has_locus.Heart_E
Myocarditis ≡ Inflammation ⊓ ∃has_locus.Myocardium_S
⊭ Myocarditis ⊑ Pancarditis

Fig. 4: No Entailment given the Part-Whole Relationship

[3] http://texasheart.org/HIC/Topics/Cond/myocard.cfm
[4] When there is any question, SNOMED CT uses the Structure class.
[5] Inferred relationships are given as dotted arcs.

3 ALTERNATIVE APPROACHES FOR REPRESENTING PART-WHOLE RELATIONSHIPS
We discuss four alternative approaches for representing part-whole relationships in SNOMED CT, the first of which is a reformulation of approach A.

3.1 Alternative Approach 1
We define Alternative Approach 1 (A1) such that X_S and X_P are fully defined based on X_E by introducing a transitive part_of property, as described by Seidenberg and Rector (2006). Let SNOMED denote the set-theoretic difference of the original anatomy-specific SNOMED CT axioms from all SNOMED CT axioms. We define A1 as follows:

SNOMED ∪ {X_S ≡ X_E ⊔ ∃part_of.X_E | X_S ∈ C_S, X_E ∈ C_E} ∪ {X_P ≡ ∃part_of.X_E | X_P ∈ C_P}

Heart_S and Heart_P are therefore defined as follows:

Heart_S ≡ Heart_E ⊔ ∃part_of.Heart_E
Heart_P ≡ ∃part_of.Heart_E

Myocardium_S and Myocardium_P are also defined in this manner, and the following axiom connects the two triples:

Myocardium_S ⊑ Heart_P

Therefore Myocardium_E and Myocardium_P are subclasses of the expression ∃part_of.Heart_E. Because Myocarditis is an inflammation located in Myocardium_S, and by inference Heart_S, it appropriately follows that Myocarditis is a subclass of Carditis.

3.2 Alternative Approach 2
Alternative Approach 2 (A2) is obtained from A1 by the following steps:

1.Remove all axioms of the form X_E ⊑ X_S and X_P ⊑ X_S.
2.Replace all connecting axioms of the form X_S ⊑ Y_P (where X and Y are different) with X ⊑ ∃part_of.Y.
3.Replace every occurrence of X_S of a class name in C_S with X ⊔ ∃part_of.X and every occurrence of X_E of a class name in C_E with X.

Applying step (2) in A2, the connecting axiom for our running example classes is:

Myocardium ⊑ ∃part_of.Heart

Applying step (3), the example disorders are defined as:

Carditis ≡ Inflammation ⊓ ∃has_locus.(Heart ⊔ ∃part_of.Heart)
Myocarditis ≡ Inflammation ⊓ ∃has_locus.(Myocardium ⊔ ∃part_of.Myocardium)

And by applying (3) to an inflammation disorder that is located in the entire heart, we apply the X class, Heart:

Pancarditis ≡ Inflammation ⊓ ∃has_locus.Heart

By the connecting axiom, every myocardium is a part of some heart, and because part_of is transitive, every part of some myocardium is a part of some heart. Because Myocarditis is an inflammation of the myocardium or some part, both of which are parts of the heart, as in the prior two approaches, Myocarditis is a subclass of Carditis.

3.3 Alternative Approach 3
Alternative Approach 3 (A3) repeats Step (1) from A2, applies the proper_part_of property as a subproperty of part_of, and includes the following steps for the connecting axiom and treatment of class names in C_S and C_E:

2.Replace all connecting axioms of the form X_S ⊑ Y_P (where X and Y are different) with X ⊑ ∃proper_part_of.Y.
3.Replace every occurrence of X_S of a class name in C_S with ∃part_of.X, and every occurrence of X_E of a class name in C_E with X.

Additionally, for inferences of parthood:

4.Add proper_part_of ⊑ part_of.
5.Add part_of ∘ proper_part_of ⊑ proper_part_of.

A3 differs from A2 in three important respects. First, for (3) ∃part_of.X replaces X ⊔ ∃part_of.X; second, part_of here is defined as reflexive, where it is assumed irreflexive in A2 (and A1); and third, Step (5) introduces a left identity axiom which is necessary because it allows us to infer:[6]

⊨ ∃part_of.Myocardium ⊑ ∃proper_part_of.Heart

and subsequently:

⊨ Myocarditis ⊑ ∃has_locus.∃proper_part_of.Heart

Applying (2), the connecting axiom for Myocardium and Heart is:

Myocardium ⊑ ∃proper_part_of.Heart

But, different from A2, applying (3) for our example disorders results in:

Carditis ≡ Inflammation ⊓ ∃has_locus.∃part_of.Heart
Myocarditis ≡ Inflammation ⊓ ∃has_locus.∃part_of.Myocardium

The definition for Pancarditis remains the same as in A2. By the connecting axiom, along with (4) and the transitivity of part_of, as was the case for A, A1, and A2, Myocarditis is an inferred subclass of Carditis. Note that in this approach (5) in connection with (4) leads to cycles (as described by Baader et al., 2009), which are not allowed in the DL language that underlies OWL 2. Fortunately this does not pose any problems for those reasoners implemented for EL++ expressivity.

[6] A left identity axiom can be formalized in OWL 2 as a property chain axiom.

3.4 Alternative Approach 4
Alternative Approach 4 (A4) introduces the has_locus_entire property, a subproperty of has_locus, which expresses when a finding is located in some X_E class. This approach was first introduced by Baader et al. (2009). A4 repeats Step (1) from A2, as A3 did, and repeats Step (2) from A3, while including the following step for the treatment of class names in C_S and C_E:[7]

3.Replace every occurrence of X_S of a class name in C_S with X and every occurrence of ∃has_locus.X_E of a class name in C_E with ∃has_locus_entire.X.

A4 also repeats (4) and (5) from A3, while including an additional step:

6.Add has_locus ∘ part_of ⊑ has_locus.

A4 differs from A3 in two respects. First, in (3) A4 treats X, instead of ∃part_of.X, as a replacement for X_S, and employs the has_locus_entire property. Second, for A4 in (6) a right identity axiom is applied, where the has_locus property is “transitive over” the part_of relation.

Applying (2), the connecting axiom for Myocardium and Heart is the same as for A3. Different from all other alternative approaches, applying (3) for our example disorders results in:

Carditis ≡ Inflammation ⊓ ∃has_locus.Heart
Myocarditis ≡ Inflammation ⊓ ∃has_locus.Myocardium

Also applying (3) to an inflammation disorder that is located in the entire heart yields:

Pancarditis ≡ Inflammation ⊓ ∃has_locus_entire.Heart

which prevents erroneous propagation via the right identity axiom. By the connecting axiom, along with (4) and (5), the same inferences hold for our example disorders, primarily that Myocarditis is a subclass of Carditis.

[7] Baader et al. (2009) also keep Structure and Part expressions fully defined as X_S ≡ ∃part_of.X and X_P ≡ ∃proper_part_of.X, for legacy reasons.

4 DISCUSSION
In Section 1 we introduced three major methods for representing part-whole relationships, by applying: (1) transitive properties, (2) disjunctions, and (3) SEP triples. In Section 2 we introduced the logic underlying the current approach in SNOMED CT, and in Section 3 the logic underlying four alternative approaches. The approach used in SNOMED CT currently, A, is an application of (3), which is within ALC expressivity. A1 is an application of both (2) and (3), while A2 is an application of just (2); both are within ALC but are outside EL++ due to disjunctions. A3 and A4 are an application of just (1), and fall within EL++.

In general, there is a modeling choice between treating a generalized ‘part of’ property as reflexive or irreflexive. In A1 and A2 the part_of property corresponds to the latter choice, and is assumed irreflexive. It is only assumed because in OWL 2 we cannot assert that a transitive property is irreflexive, but we can assert that a transitive property is reflexive. Therefore we can also introduce approaches (as shown for A3 and A4) which correspond to the former choice, where ‘part of’ is reflexive and can therefore be applied, directly and without disjunctions, for representing the X_S class expression. In these approaches a subproperty proper_part_of, again assumed irreflexive, is also introduced for representing the X_P class expression; subsequently cyclic role chains are required in order for the respective ontologies to entail correct subclasses of the pattern ∃proper_part_of.X.

Also, an important distinction between the approaches A3 and A4 is that while A4 has the same approach as A3 for translating and thus representing SEP class expressions (via patterns ∃part_of.X and ∃proper_part_of.X for Structure and Part expressions, respectively), A4 has a different approach for inheritance of properties along a partonomy. For A4 the inheritance is through a right identity axiom, while for A3 it is through the transitivity of part_of.

5 CONCLUSION
A major difference between the current approach, A, and the alternative approaches, A1 - A4, is that the former offers only a propositional representation and the latter offer a relational representation of parthood. A does not model partonomic structure, but rather partonomic “level”. By modeling partonomic structure explicitly via the part_of property we can make explicit statements of how part_of interacts with other properties (e.g., laterality):

∃hasLat.Left ⊑ ∀part_of.(∃hasLat.⊤ ⇒ ∃hasLat.Left)

says that, if something has a left laterality, then whatever it is a part of, if this ‘whole’ has a laterality at all, has a left laterality. Modelling this kind of interaction requires an explicit part_of, which then can, of course, be used in sub-role and inverse role axioms as well.

It is reported by users of SNOMED-specific browsers that SEP triples are cumbersome to browse and search through. We suggest that this problem can be addressed by providing more intuitive labels. In the context of user navigation, it is simply a rendering issue. It is for this reason that we do not necessarily recommend against the A or A1 approach. Nevertheless, A1 - A4 do provide the benefit of allowing a user to explicitly query parts, whereas for A, queries require knowledge of the SEP class hierarchy.

In preliminary performance testing, A1 performed the worst for classification across all the DL reasoners we tested. This is no doubt attributable to the inclusion of disjuncts in the class definitions, and the corresponding unfolding performed by the reasoner. Despite this, A1 has utility as a representation used for mapping between ontologies that use the propositional approach and those that use the relational approach. Clearly, formulations that include the part_of property facilitate ontology modularity, merging, and enrichment, where A1 can serve as a bridge.

In future work we will empirically measure classification and query performance for these different SNOMED ontology formulations across several DL reasoners. Furthermore, we will apply an evaluation framework across the formulations for various types of information requests. In that work we will address what kinds of information requests are expressible as OWL class expressions, and which require a more expressive query language.

ACKNOWLEDGEMENTS
This work was supported by the National Science Foundation (NSF Grant IIS-1107011) in conjunction with IJCAI 2011. We would like to give thanks to Luigi Iannone for assistance in using the OPPL scripting toolkit and useful advice for using the OWLAPI for the translation work. We would also like to give thanks to Kent Spackman and the reviewers for their helpful feedback.

REFERENCES
Baader, F., Brandt, S., and Lutz, C. (2005). Pushing the EL envelope. In International Joint Conference on Artificial Intelligence, volume 19, page 364. Citeseer.
Baader, F., Schulz, S., Spackman, K., and Suntisrivaraporn, B. (2009). How Should Parthood Relations be Expressed in SNOMED CT? Proceedings of the First Workshop des GI-Arbeitskreises Ontologien in Biomedizin und Lebenswissenschaften.
Hahn, U., Schulz, S., and Romacker, M. (1999). Partonomic reasoning as taxonomic reasoning in medicine. In Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference, pages 271–276. American Association for Artificial Intelligence.
Horrocks, I. and Sattler, U. (2004). Decidability of SHIQ with complex role inclusion axioms. Artificial Intelligence, 160(1-2), 79–104.
Rector, A., Bechhofer, S., Goble, C., Horrocks, I., Nowlan, W., and Solomon, W. (1997). The GRAIL concept modelling language for medical terminology. Artificial Intelligence in Medicine, 9(2), 139–171.
Rogers, J. and Rector, A. (2000). GALEN’s model of parts and wholes: experience and comparisons. In Proceedings of the AMIA Symposium, page 714. American Medical Informatics Association.
Seidenberg, J. and Rector, A. (2006). Representing transitive propagation in OWL. Conceptual Modeling - ER 2006, pages 255–266.
Stearns, M., Price, C., Spackman, K., and Wang, A. (2001). SNOMED clinical terms: overview of the development process and project status. In Proceedings of the AMIA Symposium, page 662. American Medical Informatics Association.
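As a supplementary illustration of the right identity axiom used in Alternative Approach 4 (Section 3.4), the following Python sketch propagates has_locus along the connecting axioms while leaving has_locus_entire unpropagated. It is a toy under stated assumptions, not a DL reasoner and not the translation used in this work: the names `PROPER_PART_OF`, `wholes_of`, `entailed_restrictions`, and `subsumed_by` are hypothetical, the Heart to Body connecting axiom is assumed to result from Step (2), and only definitions of the form "genus and one existential restriction" are handled.

```python
# Illustrative sketch only: propagating has_locus along part_of (A4, step 6).
# All names below are hypothetical; this is not a description logic reasoner.

# Connecting axioms of the form: X SubClassOf proper_part_of SOME Y.
PROPER_PART_OF = {"Myocardium": ["Heart"], "Heart": ["Body"]}

# Disorder definitions: name -> (genus, property, locus class).
DISORDERS = {
    "Carditis": ("Inflammation", "has_locus", "Heart"),
    "Myocarditis": ("Inflammation", "has_locus", "Myocardium"),
    "Pancarditis": ("Inflammation", "has_locus_entire", "Heart"),
}


def wholes_of(anatomy_class):
    """All classes reachable via the connecting axioms (transitive closure)."""
    found = set()
    stack = list(PROPER_PART_OF.get(anatomy_class, []))
    while stack:
        whole = stack.pop()
        if whole not in found:
            found.add(whole)
            stack.extend(PROPER_PART_OF.get(whole, []))
    return found


def entailed_restrictions(disorder):
    """Existential restrictions (property, filler) that follow for a disorder."""
    _, prop, locus = DISORDERS[disorder]
    entailed = {(prop, locus)}
    if prop == "has_locus_entire":
        # has_locus_entire is a subproperty of has_locus.
        entailed.add(("has_locus", locus))
    # has_locus o part_of SubPropertyOf has_locus: only has_locus propagates.
    for whole in wholes_of(locus):
        entailed.add(("has_locus", whole))
    return entailed


def subsumed_by(sub, sup):
    """True if the definition of `sup` follows from the definition of `sub`."""
    sup_genus, sup_prop, sup_locus = DISORDERS[sup]
    same_genus = DISORDERS[sub][0] == sup_genus
    return same_genus and (sup_prop, sup_locus) in entailed_restrictions(sub)


if __name__ == "__main__":
    print(subsumed_by("Myocarditis", "Carditis"))
    print(subsumed_by("Myocarditis", "Pancarditis"))
    print(subsumed_by("Pancarditis", "Carditis"))
```

The design point the sketch tries to mirror is that the exception case is handled by a separate property rather than by a separate class: because no chain axiom mentions has_locus_entire, a disorder of a part is never classified under a disorder of the entire whole.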