=Paper=
{{Paper
|id=Vol-410/paper-7
|storemode=property
|title=Comparing SNOMED CT and the NCI Thesaurus through Semantic Web Technologies
|pdfUrl=https://ceur-ws.org/Vol-410/Paper07.pdf
|volume=Vol-410
|dblpUrl=https://dblp.org/rec/conf/krmed/Bodenreider08
}}
==Comparing SNOMED CT and the NCI Thesaurus through Semantic Web Technologies==
Representing and sharing knowledge using SNOMED
Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008)
R. Cornet, K.A. Spackman (Eds)
Comparing SNOMED CT and the NCI Thesaurus
through Semantic Web Technologies
Olivier Bodenreider
U.S. National Library of Medicine, NIH, Bethesda, Maryland, USA
olivier@nlm.nih.gov
Objective: The objective of this study is to compare two large biomedical terminologies, SNOMED CT and the National Cancer Institute (NCI) Thesaurus, through Semantic Web technologies. Methods: The two terminologies are converted into the Resource Description Framework (RDF) and loaded into a common triple store. The Unified Medical Language System (UMLS) is used to identify correspondences between concepts across terminologies. Concepts common to both terminologies are compared based on shared relations to other concepts. Results: A total of 20,369 pairs of equivalent SNOMED CT and NCI Thesaurus concepts were identified through the UMLS. The highest proportion of shared relata is for the superclasses traversed recursively (75% of the concepts share at least one superclass). Slightly more than half of the concepts studied share at least one associative relation (direct relation or inherited from some ancestor). Conclusions: Overall, SNOMED CT and NCI Thesaurus concepts exhibit a relatively small proportion of shared relata. Semantic Web technologies, including RDF and triple stores, are suitable for comparing large biomedical ontologies, at least from a quantitative perspective.

INTRODUCTION

In the era of translational medicine, i.e., the application of the discoveries of basic research (made at the bench) to clinical medicine (the patient’s bedside) and the refinement of research hypotheses based on clinical findings, basic researchers and healthcare practitioners need to exchange information back and forth. In order to be processed efficiently, both research data and clinical data must be annotated to some reference terminology or ontology. Although some research ontologies and clinical ontologies have a significant degree of overlap, there has typically been little coordination between the groups developing them. As a consequence, the definitions – textual or formal – provided in research ontologies and clinical ontologies for the same biomedical entity may vary significantly, which constitutes a hindrance to the effective integration of data from basic research and clinical practice.

The evaluation of biomedical terminologies for completeness and accuracy remains largely an open research question. In this paper, we propose to compare two large biomedical ontologies developed for different purposes: the NCI Thesaurus (NCIt), used for the annotation of cancer research data, and SNOMED CT, the largest clinical terminology used in electronic patient records. We take advantage of the fact that both ontologies were developed using Description Logic-based systems. Although most classes are not defined with a set of necessary and sufficient conditions, the set of relations in which a given concept is involved still provides a formal definition for this concept, which can be used to compare it to other concepts. We also take advantage of the fact that both ontologies are represented in the Unified Medical Language System (UMLS), which asserts the equivalence between concepts across biomedical ontologies. Finally, we exploit Semantic Web technologies, such as the Resource Description Framework (RDF), to carry out the comparison between these two ontologies.

The objective of this study is to compare the formal definitions of SNOMED CT and NCIt concepts, using Semantic Web technologies. The assumption underlying this study is that two concepts, one from SNOMED CT and one from NCIt, when identified as equivalent in the UMLS, should have similar formal definitions. In other words, our hypothesis is that equivalent concepts from SNOMED CT and NCIt should have related concepts that are also equivalent. To our knowledge, this is the first study to compare biomedical ontologies on a large scale using RDF.

BACKGROUND

The general framework of this study is that of quality assurance in biomedical terminologies and ontologies, which is known to be a difficult task [1]. Several approaches to auditing terminologies have been proposed, including semantic methods [2], structural methods [3] and linguistic and formal ontological approaches [4]. Methods based on description logics have also been proposed, but have generally been restricted to subsets of large medical ontologies [5]. Various methods have been applied to SNOMED CT [3, 4] and to the NCIt [6]. In contrast to these approaches, we propose to evaluate SNOMED CT and the NCIt simultaneously and against each other. In other words, we want to cross-validate the definitions or assertions provided in one ontology for a given entity with the definitions or assertions provided in the other ontology for the same entity.
The Semantic Web provides a common framework that enables the integration, sharing and reuse of data from multiple sources. Recent research in Semantic Web technologies has delivered promising results to enable information integration across heterogeneous knowledge sources, particularly in the biomedical domain [7]. Semantic Web technologies are a collection of formalisms, languages and tools created to support the Semantic Web. Among them, the Resource Description Framework (RDF) is a W3C-recommended framework for representing data in a common format that captures the logical structure of the data [8]. The RDF representational model uses a single schema in contrast to multiple heterogeneous schemas or Document Type Definitions (DTD) used to represent data in XML by different sources. In conjunction with a single Uniform Resource Identifier (URI), all data represented in RDF form a single knowledge repository that may be queried as one knowledge resource. An RDF repository consists of a set of assertions or triples. Each triple comprises three entities, namely subject, predicate and object. A collection of triples forms a graph and can be stored in a specialized database called a triple store.

MATERIALS

SNOMED CT

SNOMED CT is a concept system and an associated terminology for healthcare [9]. It is managed by the International Health Terminology Standards Development Organisation (IHTSDO), a not-for-profit international standards body with nine member countries. Although its development is based on the Description Logic system KRSS, SNOMED CT is provided as a set of relational tables corresponding to an “inferred view”, i.e., the set of non-redundant defining relations for each concept. The July 2007 international release contains 310,311 active elements (309,175 concepts and 1,136 relationships, of which only 61 are actually used to relate concepts) and 1,218,983 relations (pairs of semantically-related concepts). The source files for SNOMED CT (sct_concepts and sct_relationships) were downloaded from the UMLS Knowledge Source Server (http://umlsks.nlm.nih.gov/).

NCI Thesaurus

The National Cancer Institute Thesaurus (NCIt) is a “terminology based on current science that helps individuals and software applications connect and organize the results of cancer research” [10]. The NCIt is produced by the National Cancer Institute, and is a key element of the cancer common ontologic representation environment (caCORE) [11]. The NCIt uses the description logic flavor of the Web Ontology Language (OWL-DL) for its representation [12]. Version 07.05e of the NCIt contains 58,869 active classes, 123 associative relationships and 124,775 relations (subsumption and equivalence relations, as well as restrictions in the OWL file). The OWL file for the NCIt was downloaded from the caCORE FTP site (ftp://ftp1.nci.nih.gov/pub/cacore/), under EVS.

Unified Medical Language System

The Unified Medical Language System (UMLS) is a terminology integration system developed at the U.S. National Library of Medicine [13]. The UMLS Metathesaurus is a repository of integrated biomedical terms drawn from 143 biomedical vocabularies and ontologies. Terms referring to the same entity in several vocabularies are clustered together and given the same concept unique identifier (CUI). Both SNOMED CT (July 31, 2007) and NCIt (07.05e) are integrated in version 2007AC of the Metathesaurus, which provides a convenient way of identifying equivalences between terms from these two ontologies. The UMLS is available for download from the UMLS Knowledge Source Server (http://umlsks.nlm.nih.gov/); a free license is required.

METHODS

The method developed for comparing concepts from SNOMED CT and NCIt can be summarized as follows. The formal definition of concepts is extracted from SNOMED CT and NCIt and converted to RDF triples. Equivalence relations between SNOMED CT and NCIt concepts are extracted from the UMLS. All triples are loaded into a triple store. Additional triples are generated from inference rules applied to the original knowledge base. The triple store is then queried to compare the representation of concepts in SNOMED CT and NCIt.

Acquiring RDF triples

For each concept and relationship from SNOMED CT and NCIt, we extract the following information: original identifier, preferred name, source (SNOMED CT or NCIt), type (concept or relationship). RDF triples are created to represent this information, in which the subject is the concept itself. The predicates corresponding to the properties listed above are hasID, hasName, hasSource and hasType, respectively. The object of these triples is a literal corresponding to, for example, the concept name for the predicate hasName.

Triples are also created for representing the relations of each concept to other concepts from the same source. The relationship indicated in the source is used as predicate for these triples, whose objects are concepts. Similarly, triples are created for representing relations among relationships (e.g., subPropertyOf). Finally, we create triples to represent the mapping of concepts to the UMLS Metathesaurus. For each concept from SNOMED CT and NCIt, we create one triple with the predicate hasCUI and the corresponding UMLS CUI as object literal.

SNOMED CT. The fields ‘CONCEPTID’ and ‘FULLYSPECIFIEDNAME’ from the table sct_concepts were used to instantiate the properties hasID and hasName, respectively. All nodes were assigned the value ‘concept’ for the property hasType, except for the elements of the table sct_concepts actually corresponding to relationships, namely, Linkage concept (linkage concept) and its descendants, to which the value ‘relationship’ was assigned. All nodes were assigned the value ‘SNOMEDCT’ for the property hasSource.

NCI Thesaurus. The elements ‘code’ and ‘Preferred_Name’ from the ‘’ sections of the OWL file were used to instantiate the properties hasID and hasName, respectively. All nodes were assigned the value ‘concept’ for the property hasType. Analogously, information extracted from the ‘’ sections of the OWL file was used to create the corresponding triples for properties (i.e., predicates). These nodes were assigned the value ‘relationship’ for the property hasType. All nodes were assigned the value ‘NCI’ for the property hasSource.

UMLS Metathesaurus. The table MRCONSO.RRF from the UMLS distribution was used for acquiring the mapping between terms from SNOMED CT and the UMLS concepts, as well as between terms from the NCIt and the UMLS concepts. We used the source abbreviation (SAB) to identify strings contributed by SNOMED CT (SAB = ‘SNOMEDCT’) or NCIt (SAB = ‘NCI’). We extracted the concept identifier in the source (SCUI) and UMLS concept unique identifier (CUI) and created triples of the form (concept, hasCUI, CUI) for each pair (SCUI, CUI).

Creating the triple store

The triples generated from SNOMED CT, NCIt and the UMLS were represented in N-triple format and loaded into the open source triple store Mulgara™ (http://mulgara.org/) in a Linux environment. Mulgara automatically indexes the triples, as well as the subject, predicate and object elements of each triple.

Inference rules

Inference rules are typically added to a triple store in order to infer new RDF statements (i.e., triples) from existing RDF statements. Mulgara provides a series of rules, which implement RDF Schema (RDFS) entailment, including rules for the transitivity of the relationships rdfs:subClassOf and rdfs:subPropertyOf. We found the set of rules for RDFS impractical to use on this triple store and ended up not using it. (The lack of generalized transitive closure in the triple store was compensated for by graph traversal functions in the queries.)

In practice, the only rule we created and applied to the store makes a concept from SNOMED CT equivalent to a concept from NCIt when both concepts are mapped to the same UMLS concept (i.e., share the same UMLS CUI). This relation was implemented by creating an owl:sameAs relationship between the two concepts, bidirectionally.

[Figure 1: diagram showing the SNOMED CT concept S0 with its relata S1 to S5 (relations sr1 to sr5) and the NCI Thesaurus concept N0 with its relata N1 to N4 (relations nr1 to nr4). Legend: relationship between 2 concepts; equivalent concepts according to the UMLS; shared relata of S and N.]

Figure 1. Graph formed by the related concepts of one pair of equivalent concepts (S0, N0)

Querying the triple store

A set of queries was developed to explore the relata of those concepts that are equivalent between SNOMED CT and NCIt according to the UMLS. More specifically, these queries explore the set of relata of the SNOMED CT concept and that of the NCIt concept, and select from the two sets the relata identified as equivalent in the UMLS. For example, as illustrated in Figure 1, the concepts S0 from SNOMED CT and N0 from NCIt are equivalent according to the UMLS. Among the relata of S0 (S1 to S5) and N0 (N1 to N4), the pairs {S1, N1} and {S5, N3} denote equivalent concepts and constitute the set of shared relata of {S0, N0}.

Each relation between two concepts (e.g., (S0, sr4, S4)) is represented as a triple in the RDF store and the set of all relations forms a graph. Comparing the set of relata of two concepts can thus be expressed as a set of constraints on the graph. For example, {S1, N1} are shared relata of {S0, N0}, because there is a path between S0 and N0, constituted of any link from S0 to S1, any link from N0 to N1, and a “UMLS equivalence” link between S1 and N1.
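As a minimal sketch of the ideas in this section, and not the iTQL implementation used in the study, the equivalence rule and the shared-relata comparison can be mimicked in plain Python over an in-memory list of triples. The toy graph follows Figure 1; the CUI values, the pairing of relations with nodes (other than (S0, sr4, S4)), the short isa chain and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Toy relation triples mirroring Figure 1. Only (S0, sr4, S4) is stated in
# the text; the other relation-to-node pairings are assumed for illustration.
triples = [("S0", f"sr{i}", f"S{i}") for i in range(1, 6)]
triples += [("N0", f"nr{i}", f"N{i}") for i in range(1, 5)]
# A small isa chain, added here only to exercise the recursive walk.
triples += [("S1", "isa", "S6"), ("S6", "isa", "S7")]

# hasCUI mappings; the CUI values are made-up placeholders.
has_cui = {"S0": "CUI_A", "N0": "CUI_A", "S1": "CUI_B", "N1": "CUI_B",
           "S5": "CUI_C", "N3": "CUI_C"}


def source_of(concept):
    """Hypothetical convention: S* nodes are SNOMED CT, N* nodes are NCIt."""
    return "SNOMEDCT" if concept.startswith("S") else "NCI"


def same_as_pairs(cui_of):
    """Equivalence rule: two concepts from different sources that share a
    CUI are linked in both directions (the role played by owl:sameAs)."""
    by_cui = defaultdict(list)
    for concept, cui in cui_of.items():
        by_cui[cui].append(concept)
    return {(a, b) for members in by_cui.values()
            for a in members for b in members
            if source_of(a) != source_of(b)}


def relata(concept, predicates=None):
    """Direct relata of a concept, optionally restricted to some predicates."""
    return {o for s, p, o in triples
            if s == concept and (predicates is None or p in predicates)}


def walk(concept, predicate):
    """All nodes reachable by following one predicate recursively."""
    seen, stack = set(), [concept]
    while stack:
        for nxt in relata(stack.pop(), {predicate}):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def shared_relata(s0, n0, equiv):
    """Pairs (s, n) of relata of s0 and n0 that are themselves equivalent."""
    return sorted((s, n) for s in relata(s0) for n in relata(n0)
                  if (s, n) in equiv)


equiv = same_as_pairs(has_cui)
print(shared_relata("S0", "N0", equiv))   # [('S1', 'N1'), ('S5', 'N3')]
print(sorted(walk("S1", "isa")))          # ['S6', 'S7']
```

The `walk` helper stands in for the graph traversal functions of the query language: it computes reachability on demand instead of relying on a precomputed transitive closure.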
The set of relata is not necessarily limited to direct relata. Some relations can be traversed recursively in order to explore, for example, the set of common ancestors (as opposed to common direct subclasses). Depending on the constraints put on the graph, various kinds of relationships can be explored, together or independently.

One of the major query languages for RDF stores is SPARQL. Mulgara currently provides no support for SPARQL. Instead, it provides iTQL™ (Interactive Tucana Query Language™), which is functionally equivalent to SPARQL for most purposes.

Comparing the shared relata of concepts

In order to compare the formal definitions of a concept S0 from SNOMED CT and N0 from NCIt, we prepared queries to explore the following sets of shared relata: all shared relata (including through associative relations), shared superclasses, shared wholes (of which the entity is a part), shared subclasses and shared parts. More precisely, these kinds of relations were first explored directly to extract the set of relata in direct relation to the original concepts, and then indirectly, allowing the recursive traversal of isa and part_of relationships. Finally, in order to account for the inheritance of properties from a superclass to its subclasses, we also explored the concepts in associative relation to any of the superclasses of the original concepts.

In practice, starting from the list of pairs of equivalent concepts, we generated one query per pair for each type of relationship to be explored. The relata in common were recorded for each pair of equivalent concepts for each type of relationship explored. Figure 2 shows a typical query used to explore (recursively) the common superclasses of two concepts. Figure 3 displays the output of this query, showing the 7 ancestors in common.

<pre>
select $n_sub $n_rel $n_obj $s_sub $s_rel $s_obj
from
where
(
# ---------- NCIT side ----------
walk( $n_obj
and $n_sub_tmp $n_obj)
and $n_rel
and $n_sub
)
and
(
# ---------- SNCT side ----------
walk( $s_obj
and $s_sub_tmp $s_obj)
and $s_rel
and $s_sub
)
and $n_obj $s_obj
in
;
</pre>

Figure 2. iTQL query used to explore the common superclasses of the concepts C2986 from NCIt and 46635009 from SNOMED CT

<pre>
[ ncit:C2986, rdfs:subClassOf, ncit:C2991, snct:46635009, snct:116680003, snct:64572001 ]
[ ncit:C2986, rdfs:subClassOf, ncit:C3009, snct:46635009, snct:116680003, snct:362969004 ]
[ ncit:C2986, rdfs:subClassOf, ncit:C2985, snct:46635009, snct:116680003, snct:73211009 ]
[ ncit:C2986, rdfs:subClassOf, ncit:C27067, snct:46635009, snct:116680003, snct:17346000 ]
[ ncit:C2986, rdfs:subClassOf, ncit:C53655, snct:46635009, snct:116680003, snct:126877002 ]
[ ncit:C2986, rdfs:subClassOf, ncit:C2990, snct:46635009, snct:116680003, snct:53619000 ]
[ ncit:C2986, rdfs:subClassOf, ncit:C26842, snct:46635009, snct:116680003, snct:3855007 ]
</pre>

Figure 3. Results of the query in Figure 2 (aliases are used in lieu of the full URIs)

Data analysis

We analyzed the lists of shared relata resulting from the queries from a quantitative perspective, in order to examine the distribution of the number of common relata for the various kinds of relationships under investigation.

RESULTS

Triple store

A total of 3,194,215 triples were created, 2,770,477 for SNOMED CT and 423,738 for NCIt. It took about 20 minutes to load these N-triples into Mulgara, including the creation of indexes.

The rule asserting the equivalence of SNOMED CT and NCIt concepts when they share the same UMLS CUI generated 40,738 additional triples (representing the owl:sameAs relations bidirectionally). It took about 5 minutes to apply this rule to the triple store.

Queries were executed in batches, one batch for each set of equivalent concepts for a given kind of relationship. Executing a batch of queries took anywhere from several minutes (for direct relations) to several hours (when relations are allowed to be traversed recursively).

Overlap between SNOMED CT and NCIt concepts

Of the 309,175 SNOMED CT concepts, 19,506 (6.3%) mapped to the same UMLS concept as some NCIt concept. Analogously, 14,054 (23.9%) of the 58,869 NCIt concepts mapped to the same UMLS concept as some SNOMED CT concept. A total of 20,369 pairs of SNOMED CT and NCIt concepts were identified in which the two concepts are deemed equivalent based on their mapping to the UMLS.

Quantitative results

The distribution of the number of relata for several types of relationships investigated is summarized in Table 1. The first column (N) shows the total number of pairs of concepts for which both concepts have at least one related concept for this relation. This number is used as the denominator for computing the percentage of pairs of equivalent concepts having a given number of related concepts in common. The
minimum, maximum and median number of shared relata are presented in the last three columns. For example, the row “Dir. Superclass” corresponds to the shared direct parent classes (traversing isa in SNOMED CT and subClassOf in NCIt). N = 20,360 indicates that almost all concepts have at least one ancestor. 18.4% of the pairs of equivalent concepts studied share a parent class and only 1.3% share two. Over 80% of the pairs do not share any direct parents. The row “Ind. Superclass” corresponds to the shared ancestors (traversing isa or subClassOf recursively). Only 25% of the pairs of equivalent concepts studied do not have any ancestors in common. The largest number of ancestors in common is 22.

Details about shared relata for other kinds of relationships are provided in the other rows of Table 1, including direct parent and child classes for the taxonomic relation (super/subclass) and for the meronomic relation (whole/part). The identification of indirect relata involves the recursive traversal of taxonomic and meronomic relations and combination of subClassOf and associative relations.

EXTENDED EXAMPLE

In order to illustrate our approach to comparing ontologies, we explore how Type 1 diabetes mellitus is represented in SNOMED CT and NCIt. As shown in Figure 4, this concept has many relata both in SNOMED CT and in NCIt, of which a large number are shared, including 7 shared ancestors (e.g., Disorder of pancreas) and 4 shared concepts in associative relation (e.g., Gastrointestinal System). Dotted lines represent indirect isa relations through concepts that are not shown. The equivalence between concepts in SNOMED CT and NCIt assessed through the UMLS is shown with grey links. Of note, two distinct concepts in one ontology can be equivalent to one concept in the other (e.g., Endocrine Pancreas and Islet of Langerhans in NCIt vs. Endocrine pancreatic structure in SNOMED CT).

DISCUSSION

SNOMED CT and NCIt

Overall, the two ontologies under investigation in this study were found to have a relatively small proportion of relata in common, including when the properties (e.g., associative relations) are explored in the ancestors to simulate the inheritance of properties along isa hierarchies. The highest proportion of shared relata is for the superclasses traversed recursively (75% of the concepts share at least one superclass). Slightly more than half of the concepts studied share at least one associative relation (direct relation or inherited from some ancestor).

Further research is needed to distinguish among primitive concepts in both ontologies (e.g., Aneurismal bone cyst), concepts for which a relatively rich description is provided, but only in one ontology (e.g., the description provided for many cancers in NCIt is typically richer than in SNOMED CT), and concepts defined in both ontologies, but with minimal overlap in their relata. We did not complete the comparison of shared descendants, but, even in the absence of a rich description, a large proportion of shared descendants can be a good indicator of consistency between ontologies (e.g., Sulfonamide agents share 18 descendants).

Semantic Web technologies

We found RDF to be suitable for comparing terminological ontologies, especially when the two ontologies are large and are not both available in OWL. While OWL classifiers are useful for consistency checking purposes, they tend to be limited in the number of classes they can handle. Moreover, the queries presented in this study arguably allow more flexibility than OWL DL classifiers.

The triple store approach also offers clear advantages over relational databases, as SQL provides no support for performing transitive closures (i.e., for performing join operations recursively). While ad hoc programs (or stored procedures) embedding SQL queries can be written against the database, we showed that simple queries against the RDF store were sufficient to carry out this study. Because it supports the seamless traversal of complex graphs (recursive traversal of one relationship and traversal of selected combinations of relationships), RDF is an effective approach to comparing terminologies.

The comparison of large ontologies remains nonetheless difficult. The inference engine of Mulgara could not apply the set of rules defined for RDFS, including the transitivity of subClassOf, to large, heavily hierarchical structures. However, the graph traversal functions supported by the query language partially compensated for the absence of precomputed transitive closures.

Limitations and future work

This approach essentially provides a quantitative comparison between two ontologies and is insufficient for fine-grained comparisons. Although we did not study whether pairs of related concepts in both ontologies were linked by similar relations, the information could be easily extracted from the triple store. We also would like to test the structural consistency of the combined ontologies (e.g., by testing the presence of cycles in isa relations in the RDF store containing both SNOMED CT and NCIt). The advantage of using the UMLS perspective on concept equivalence outweighs the potential bias it introduces with its “concept view”.

Acknowledgements

This research was supported by the Intramural Research Program of the National Institutes of Health (NIH), National Library of Medicine (NLM). Our thanks go to Ramez Ghazzaoui, who helped create the triple store, and Lee Peters, who processed SNOMED CT.

References

1. Rogers JE. Quality assurance of medical ontologies. Methods Inf Med 2006;45(3):267-74
2. Cimino JJ. Auditing the Unified Medical Language System with semantic methods. J Am Med Inform Assoc 1998;5(1):41-51
3. Wang Y, Halper M, Min H, Perl Y, Chen Y, Spackman KA. Structural methodologies for auditing SNOMED. J Biomed Inform 2007;40(5):561-81
4. Ceusters W, Smith B, Kumar A, Dhaen C. Ontology-based error detection in SNOMED-CT. Medinfo 2004;11(Pt 1):482-6
5. Cornet R, Abu-Hanna A. Auditing description-logic-based medical terminological systems by detecting equivalent concept definitions. Int J Med Inform 2007
6. Ceusters W, Smith B, Goldberg L. A terminological and ontological analysis of the NCI Thesaurus. Methods Inf Med 2005;44(4):498-507
7. Ruttenberg A, Clark T, Bug W, Samwald M, Bodenreider O, Chen H, et al. Advancing translational research with the Semantic Web. BMC Bioinformatics 2007;8 Suppl 3:S2
8. RDF: http://www.w3.org/RDF/
9. SNOMED CT: http://www.ihtsdo.org/
10. de Coronado S, Haber MW, Sioutos N, Tuttle MS, Wright LW. NCI Thesaurus: using science-based terminology to integrate cancer research results. Medinfo 2004;11(Pt 1):33-7
11. Phillips J, Chilukuri R, Fragoso G, Warzel D, Covitz PA. The caCORE Software Development Kit: streamlining construction of interoperable biomedical information services. BMC Med Inform Decis Mak 2006;6:2
12. Golbeck J, Fragoso G, Hartel F, Hendler J, Oberthaler J, Parsia B. The National Cancer Institute's Thesaurus and Ontology. Web Semantics: Science, Services and Agents on the World Wide Web 2003;1(1):75-80
13. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res 2004;32(Database issue):D267-70
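{| class="wikitable"
|+ Table 1. Distribution of the number of related concepts shared by pairs of equivalent concepts (N) for various kinds of relationships (top: direct relations, bottom: indirect relations, including recursive traversal and combination of subClassOf and associative relations)
! rowspan="2" | Scope !! rowspan="2" | Relationship !! rowspan="2" | N !! colspan="7" | Number of related concepts !! rowspan="2" | min !! rowspan="2" | max !! rowspan="2" | median
|-
! 0 !! 1 !! 2 !! 3 !! 4 !! 5 !! >5
|-
| Dir. || Any || 20,363 || 66.8% || 21.1% || 5.9% || 2.9% || 1.3% || 0.7% || 1.3% || 0 || 47 || 0
|-
| Dir. || Superclass || 20,360 || 80.3% || 18.4% || 1.3% || 0.0% || 0.0% || 0.0% || 0.0% || 0 || 4 || 0
|-
| Dir. || Whole || 1,004 || 96.2% || 3.8% || 0.0% || 0.0% || 0.0% || 0.0% || 0.0% || 0 || 1 || 0
|-
| Dir. || Subclass || 3,699 || 48.9% || 21.9% || 15.2% || 6.4% || 2.8% || 1.8% || 2.9% || 0 || 19 || 1
|-
| Dir. || Part || 76 || 57.9% || 34.2% || 7.9% || 0.0% || 0.0% || 0.0% || 0.0% || 0 || 2 || 0
|-
| Ind. || Superclass || 20,360 || 25.0% || 28.5% || 18.7% || 11.1% || 5.5% || 3.6% || 7.7% || 0 || 22 || 1
|-
| Ind. || Whole || 1,004 || 93.3% || 6.1% || 0.6% || 0.0% || 0.0% || 0.0% || 0.0% || 0 || 2 || 0
|-
| Ind. || Associative || 6,548 || 46.3% || 18.6% || 11.3% || 10.6% || 6.8% || 2.4% || 4.1% || 0 || 11 || 1
|}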
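For illustration, a row of Table 1 could be derived from the per-pair counts of shared relata along the following lines. This is a sketch under assumptions, not the analysis code used in the study: the function name `summarize`, the `cap` parameter and the sample counts are all hypothetical.

```python
from collections import Counter
from statistics import median


def summarize(shared_counts, cap=5):
    """Percentage of pairs sharing 0..cap (and more than cap) relata,
    plus min, max and median, with N = number of pairs as denominator."""
    n = len(shared_counts)
    # Counts above the cap are pooled into a single ">cap" bucket.
    buckets = Counter(min(c, cap + 1) for c in shared_counts)
    labels = [str(k) for k in range(cap + 1)] + [f">{cap}"]
    pct = {label: round(100.0 * buckets[k] / n, 1)
           for k, label in enumerate(labels)}
    return {"N": n, "pct": pct, "min": min(shared_counts),
            "max": max(shared_counts), "median": median(shared_counts)}


# Made-up counts of shared relata for eight hypothetical concept pairs.
row = summarize([0, 0, 0, 0, 1, 1, 2, 7])
print(row)
```

Here N is the number of pairs entering the computation, which matches the use of N as the denominator for the percentages in the table.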
[Figure 4: diagram with the NCI Thesaurus on the left and SNOMED CT on the right, centered on Insulin Dependent Diabetes Mellitus (NCIt C2986) and Diabetes mellitus type 1 (SNOMED CT 46635009). Legend: isa relationship; associative relationship; equivalent concepts (UMLS).]
Figure 4. Representation of Type 1 diabetes mellitus in SNOMED CT and NCIt, showing shared relata for ancestors and