=Paper= {{Paper |id=Vol-2137/paper_34.pdf |storemode=property |title=Scrutinizing the Relationships Between SNOMED CT Concepts and Semantic Tags |pdfUrl=https://ceur-ws.org/Vol-2137/paper_34.pdf |volume=Vol-2137 |authors=Jonathan Bona,Werner Ceusters |dblpUrl=https://dblp.org/rec/conf/icbo/BonaC17 }} ==Scrutinizing the Relationships Between SNOMED CT Concepts and Semantic Tags== https://ceur-ws.org/Vol-2137/paper_34.pdf
Bona and Ceusters




Scrutinizing the relationships between SNOMED CT concepts and semantic tags
Jonathan Bona1,* and Werner Ceusters2
1 Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, USA
2 Department of Biomedical Informatics, University at Buffalo, Buffalo, NY



* To whom correspondence should be addressed: jonathanbona@gmail.com

ABSTRACT
The fully specified name of a concept in SNOMED CT is formed by a term to which in the typical case is added a semantic tag (ST). An ST is meant to disambiguate homonymous terms and indicate where that concept fits into SNOMED’s massive concept hierarchy. We have developed a method to determine whether or not a concept’s tag correctly identifies its place in the hierarchy, and applied this method to an analysis of all active concepts in every SNOMED CT release from January 2003 to January 2017. Our results show that there are concepts in every release whose tags do not match their placement in the hierarchy. These tag/hierarchy mismatches appear to be errors. The number of such errors is increasing in recent versions.

1    INTRODUCTION
SNOMED CT is a large reference terminology for the clinical domain made up of 300,000+ active concepts with machine-readable logical definitions that can be used for logical inference (IHTSDO, 2015). SNOMED concepts are organized into a hierarchy of ‘Is-a’ relations. The top concept, 138875005 | SNOMED CT Concept (SNOMED RT+CTV3), directly subsumes 19 high level concepts. This includes first order concepts such as 404684003 | Clinical finding (finding), and 123037004 | Body structure (body structure), which serve as the root of subhierarchies of concepts about entities directly relevant to and within the domain of healthcare. It includes also relations used amongst concepts in SNOMED CT as well as second order concepts that describe the structure of SNOMED CT rather than the structure of what the first-order concepts of SNOMED CT are about. Every SNOMED CT concept comes with descriptions, one of which is selected as the Fully Specified Name (FSN) and which typically ends in a semantic tag (ST) that disambiguates it from other concepts that may have similar names (IHTSDO, 2015, p41). The ST also serves to indicate where the concept fits into the SNOMED CT concept hierarchy (IHTSDO, 2017). For example, the concepts [35566002 | Hematoma (morphologic abnormality)] and [385494008 | Hematoma (disorder)] have the STs ‘morphologic abnormality’ and ‘disorder’ attached to the name they have in common: hematoma. In the hierarchy, these concepts are ultimately subsumed by the highest-level concepts for morphologic abnormalities and disorders respectively. Because STs are substrings added to names inside FSNs and are not represented separately as part of SNOMED CT’s formal model, it is not easy to determine whether a tag on a concept should be taken to mean that the concept is necessarily part of the same sub-hierarchy as others with that tag. A concept’s ST would strictly identify its place within the hierarchy if each tag had a single, high-level corresponding concept that used it, and every concept using the tag was below that high-level concept in the hierarchy. For instance: in the clinical finding hierarchy the highest finding concept, [404684003 | Clinical finding (finding)], subsumes all other findings.

   The exact relationship between SNOMED CT’s STs and concepts has thus far not been widely researched. In (Ceusters & Bona, 2016) we explored how the STs of concepts changed over time. We found in total 285 patterns according to which SNOMED CT concepts underwent changes in the STs assigned to them; a change from no ST at all to an ST (43 patterns) also counted as a change. There were no patterns with more than 3 changes over time. Changes in STs were found to happen for a number of reasons. One is a change in SNOMED CT’s concept model, for instance when distinctions are made that didn’t exist in earlier versions, or different interpretations were introduced (e.g. the product / substance distinction). Such changes have a global impact on large parts of the ontology. Another reason is that concepts were in one way or another erroneous and had to be corrected. While doing these analyses, we were nevertheless hampered by the fact that the SNOMED CT documentation available from the IHTSDO webserver provides insufficient information on what the precise set of STs the SNOMED CT editors are working with might be. The information that an ST is whatever appears at the end of an FSN between brackets (IHTSDO, 2015, p41) turned out not to be reliable. Historically, FSNs didn’t have an ST at all; this was apparently introduced later, as witnessed by the many changes in descriptions to that end. It was found that parsing anything that terminates an FSN between brackets leads to many false positives in older concepts, thus requiring manual inspection for disambiguation. The work presented here examines the January
31, 2017 International Release of SNOMED CT to investigate the extent to which SNOMED CT’s use of STs is systematic and consistent with its placement of concepts that use those STs within the concept hierarchy. Research hypotheses driving this work are:

    (1) All STs are related to the concept system through a one-to-one correspondence between the ST and some high-level concept. Every concept that uses a particular ST t should be subsumed by that ST’s ‘corresponding concept’ Ct, where Ct is the highest level concept that uses t. This hypothesis is motivated by the apparent change in terminology from ‘semantic tag’ in (IHTSDO, 2015) to ‘hierarchy tag’ in (IHTSDO, 2017).
    (2) We consider a concept to be ‘mismatched’ if it has the ST t but is not subsumed by the corresponding concept Ct.
    (3) Where such mismatches exist, they are due to errors in the concept’s placement in the hierarchy or in its ST, and should be corrected in future releases.

  This paper reports on techniques we have developed to detect mismatched concepts, categorize them, and extract patterns to understand how they change over time as new versions of SNOMED CT are released.

2     METHODS
We have developed computational procedures (1) to identify the concept that corresponds to an ST and (2) to facilitate answering questions about subsumption that involve considering all SNOMED CT concepts in each release. These are described in detail below.

2.1     Identifying tag corresponding concepts
In order to determine whether a concept C is mismatched it is necessary to know which concept is the corresponding concept for C’s ST. There still does not appear to be an official published mapping that lists the ST / concept correspondences for SNOMED CT. In many cases this correspondence may seem obvious to a human observer since for some tags there is a single high-level concept that uses the tag and whose name is the same as the tag. For example, one direct sub-concept of the top SNOMED CT Concept is 71388002 | Procedure (procedure). This concept has the ST ‘procedure’ and its name in the FSN is the word ‘Procedure’.

   In other cases, the correspondence is less obvious. For instance, no direct sub-concept of the top concept is tagged ‘morphologic abnormality’, nor is there any concept whose name is exactly ‘Morphologic abnormality’. The concept 118956008 | Body structure, altered from its original anatomical structure (morphologic abnormality) is a child of 123037004 | Body structure (body structure) and appears to be the highest concept (i.e. closest to the top) tagged with ‘morphologic abnormality’.

   We therefore define the corresponding concept for any ST t as: the highest concept in the hierarchy that is tagged with t. Note that this definition does not require tags to keep the same corresponding concept across releases.

   Based on this we determine the corresponding concept Ct for each ST in a SNOMED release by:

    (1) Calculating the whole number depth for each concept C as the length of the shortest Is-a path from the top concept to C.
    (2) For each ST t, select from the set of concepts tagged with t the concept with the least depth, Xt.
    (3) Let Ct = Xt if none of Xt’s ancestors is tagged with t. Otherwise let Ct be the ancestor of Xt tagged with t that has the least depth.

   Step 3 is necessary to handle special cases (Fig. 1) that arise from SNOMED CT’s use of multiple inheritance: its Is-a hierarchy forms a directed acyclic graph with a single root node (SNOMED CT Concept) that has no edges coming into it (i.e. is not subsumed by any other concept). Such special cases occur whenever there is a concept with some ST t that is closest to the top as compared to all other concepts with ST t, and at the same time is also subsumed by another concept with ST t that has a longer shortest path to the top concept. Such patterns were found in some releases, making it possible for the concept that sits highest in the hierarchy for an ST (by shortest-path depth) to be subsumed by concepts that use the same ST and lie deeper in the hierarchy.

[Figure 1: hierarchy fragment showing the concepts 123037004 | Body structure (body structure); 442083009 | Anatomical or acquired body structure (body structure); 91723000 | Anatomical structure (body structure); 91832008 | Anatomical organizational pattern (body structure); 4421005 | Cell structure (cell structure); 67185001 | Subcellular structure (cell structure); 11874005 | Distinctive arrangement of cytoplasmic filaments (cell structure).]
Fig. 1. Effect of concept multiple inheritance on ST hierarchy.

   The output of this process is a mapping of STs to corresponding concepts for each release. This mapping is fairly stable across releases, though there are some changes, which we discuss more in the results section below.
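
   The three-step selection of a corresponding concept can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation used for the analysis: the hierarchy is assumed to be available as a hypothetical `parents` dictionary (concept to its direct Is-a parents) and the tags as a hypothetical `tags` dictionary; ties in depth are broken by concept name; and in step 3 only ancestors carrying the same tag are considered. The toy hierarchy loosely imitates the kind of special case shown in Fig. 1 and is not SNOMED CT data.

```python
from collections import deque

def depths(parents, root):
    """Step 1: shortest Is-a path length from the root to each concept (BFS)."""
    children = {}
    for child, direct_parents in parents.items():
        for p in direct_parents:
            children.setdefault(p, set()).add(child)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for c in children.get(node, ()):
            if c not in depth:
                depth[c] = depth[node] + 1
                queue.append(c)
    return depth

def ancestors(parents, concept):
    """All strict ancestors of a concept (transitive closure over Is-a)."""
    seen, stack = set(), list(parents.get(concept, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen

def corresponding_concepts(parents, tags, root):
    """Map each tag t to its corresponding concept Ct."""
    depth = depths(parents, root)
    by_depth = lambda c: (depth[c], c)  # tie-break by name: an assumption
    result = {}
    for t in set(tags.values()):
        tagged = [c for c, tag in tags.items() if tag == t and c in depth]
        if not tagged:
            continue
        x_t = min(tagged, key=by_depth)                      # step 2
        tagged_anc = [a for a in ancestors(parents, x_t)     # step 3
                      if tags.get(a) == t]
        result[t] = min(tagged_anc, key=by_depth) if tagged_anc else x_t
    return result

# Toy hierarchy (hypothetical): 'filaments' is shallower than 'cell'
# through its second parent 'body', yet is also subsumed by 'cell'.
parents = {
    "root": [],
    "body": ["root"],
    "anat": ["body"],
    "cell": ["anat"],
    "filaments": ["cell", "body"],
}
tags = {
    "root": "top",
    "body": "body structure",
    "anat": "body structure",
    "cell": "cell structure",
    "filaments": "cell structure",
}
mapping = corresponding_concepts(parents, tags, "root")
```

   In this toy example step 2 alone would pick `filaments` for the tag ‘cell structure’; step 3 replaces it with its tagged ancestor `cell`.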





2.2    Identifying mismatched concepts                              cepts among their subsumers. These were organized into a
Once a corresponding concept has been identified for each           table with concepts as rows and SNOMED CT release dates as
tag in each release, it is possible to find mismatched concepts     columns, with each cell indicating the concept’s category for
by looking at each concept in turn to see whether it is             that release. We also took into account that SNOMED
subsumed by the corresponding concept for its ST. We                CT concepts can be either active or inactive in a release and
developed computational procedures to do this.                      that a concept that is active in one release may be deactivated
   In order to make use of the built-in subsumption reasoning       in the next one, for instance if the concept was deemed by
provided by standard semantic web tools, we constructed an          SNOMED CT’s editors to be no longer accurate or useful.
RDF/OWL model that represents SNOMED CT’s concept                   Less commonly, a concept that is inactive at one release may
hierarchy (300,000+ concepts connected by the is-a relation)        be (re)activated at the next. We consider concepts to be not
and STs. Each concept is represented as an OWL class with           active in releases that precede their addition to SNOMED CT.
separate annotations for its FSN and ST. Each of SNOMED’s             The categories into which concepts were classified were
Is-a relations between concepts has a corresponding                 constructed by building up a three-character code ‘_ _ _’
rdfs:subClassOf assertion in this representation. We built one such where each character is a flag indicating whether a certain
OWL file for each SNOMED CT release from January 2003               condition holds of the concept in that release. If a concept is
to January 2017. The identifiers (URIs) for each concept use        inactive or did not yet exist at a release, then that concept was
a namespace that indicates the release version, e.g.                marked with the three-character empty code ‘ ‘ for that re-
 is an identifier for the         lease. The following construction principles were used:
concept with concept id 64572001 in the January 31 2017 re-              The first character is ‘Y’ if the concept is subsumed by
lease. These files were loaded into a single repository in a              its ST’s corresponding concept in this release (i.e. if it is
triple store database (Bishop et al., 2011) configured for                NOT mismatched in the release), and ‘N’ otherwise.
RDFS+ inference that, upon loading, pre-computed sub-
                                                                         The second character is ‘Y’ if the concept has any ances-
sumption for each hierarchy, resulting in a total of 185 mil-
                                                                          tor concept that is NOT mismatched. It is ‘N’ if every
lion triples. This facilitates very fast retrieval of subsumption
                                                                          ancestor of this concept is mismatched.
information using simple SPARQL queries, and allows us to
instantly answer questions such as: given a release R, a tag t,          The third character is ‘Y’ if the concept has any ancestor
and a concept C, which concepts, if any, are tagged with t in             concept that IS mismatched. It is ‘N’ if no ancestor of
R, but not subsumed by C in R? As an example, the following               this concept is mismatched.
query retrieves the concept URI, label, and ST for every con-          Combinatorically, this would allow us to code for nine dif-
cept that is not subsumed by 64572001 | Disease (disorder)          ferent situations including the inactive concepts. However,
even though it uses the corresponding tag.                          given the meanings assigned to these codes, some combina-
                                                                    tions are impossible. Ideally, every active concept in
  PREFIX corr: 
  PREFIX tagged: <http://ex.com/r20170131#tagged>
                                                                    SNOMED would be in the ‘YYN’ category, indicating that
  PREFIX :                                the concept is properly matched to its ST’s corresponding
  SELECT ?conc ?l ?tag                                              concept, as are all of the concepts above it. Possible codes for
  WHERE {                                                           mismatched concepts are 'NYY' and 'NYN' while for non-
    ?conc rdfs:label ?l .
    ?conc tagged: ?target_tag .                                     mismatched concepts 'YYN' and 'YYY'. The latter indicates
    corr: tagged: ?target_tag .                                     a concept that itself is not mismatched, but it is subsumed by
    ?conc tagged: ?tag .                                            at least one mismatched concept.
    FILTER NOT EXISTS {?conc rdfs:subClassOf corr: }
  }
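
  The local mismatch test and the three-character category codes of Section 2.3 can also be sketched outside the triple store. The Python below is a minimal illustration under stated assumptions, not the procedure actually used: `parents`, `tags`, and `corresponding` are hypothetical dictionaries for a single release, a concept that is itself the corresponding concept of its tag is treated as subsumed by it, and inactive concepts are simply left out. The toy data is invented and only imitates a test kit concept tagged ‘physical object’ but placed under a substance concept.

```python
def ancestors(parents, concept):
    """All strict ancestors of a concept (transitive closure over Is-a)."""
    seen, stack = set(), list(parents.get(concept, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen

def is_mismatched(concept, parents, tags, corresponding):
    """True if the concept is not subsumed by its tag's corresponding concept."""
    target = corresponding[tags[concept]]
    # Assumption: subsumption is reflexive, so the target matches itself.
    return concept != target and target not in ancestors(parents, concept)

def category_codes(parents, tags, corresponding):
    """Return (set of mismatched concepts, concept -> three-character code)."""
    mismatched = {c for c in tags
                  if is_mismatched(c, parents, tags, corresponding)}
    codes = {}
    for c in tags:
        anc = ancestors(parents, c)
        first = "N" if c in mismatched else "Y"
        second = "Y" if any(a not in mismatched for a in anc) else "N"
        third = "Y" if any(a in mismatched for a in anc) else "N"
        codes[c] = first + second + third
    return mismatched, codes

# Toy release (hypothetical names, not SNOMED CT identifiers).
parents = {
    "root": [],
    "substance": ["root"],
    "object": ["root"],
    "kit": ["substance"],        # tagged 'physical object', placed under substance
    "sickle_kit": ["kit"],
    "scalpel": ["object"],
}
tags = {
    "root": "top",
    "substance": "substance",
    "object": "physical object",
    "kit": "physical object",
    "sickle_kit": "substance",
    "scalpel": "physical object",
}
corresponding = {"top": "root", "substance": "substance",
                 "physical object": "object"}

mismatched, codes = category_codes(parents, tags, corresponding)
```

  Here `kit` receives the code ‘NYN’, its child `sickle_kit` receives ‘YYY’ (matched, but below a mismatched concept), and `scalpel` receives ‘YYN’.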
  Two flavors of mismatches were then looked for: (a) ‘local’       3     RESULTS
mismatching as defined in assumption (2) above which oc-            3.1      Corresponding concept mappings
curs within the scope of a specific release, and (b) ‘global’
                                                                    We used the corresponding concept discovery procedure
mismatching in which the reference is the most recent release
                                                                    presented above to construct a table with [ST → concept] pairs
investigated. The group of globally mismatched concepts in-
                                                                    and inspected this table manually to assess whether the map-
cludes thus those concepts which have in at least one version
                                                                    pings made sense. In the majority of cases, the ST turned out
V a semantic tag which is different from the one it has in the
                                                                    to be identical to the name of the corresponding concept
last version, whether or not it is locally mismatched in V.
                                                                    modulo capitalization and spacing. Exceptions were: [SNOMED
2.3    Characterizing mismatched concepts                           RT+CTV3 → SNOMED CT Concept], [metadata →
We then group locally mismatched concepts into categories           SNOMED CT Model Component], [Environment / location →
based on the presence or absence of other mismatched con-           Environment or geographical location], [Staging








                           Table 1: Per-tag counts of globally mismatched concepts for the January release of each year


                       0301 0307 0401 0407 0501 0507 0601 0607 0701 0707 0801 0807 0901 0907  1001
        disorder           0    0    0    0    0    0    0    0    0   44    8   19    0    0      0
        finding            4    1    0    5    0    0    0    0    0    0    0    0    0    0      0
   observable entity       0    0    0    2    0    0    0    0    0    0    0    0    0    0      0
        product            0    0    0    0    0    0    0    0    0    0    0    0    1    1      1
    regime/therapy         0    0    0    0    0    0    0    0    0   37  259  123    0    0      0
       substance           1    0    0    0    0    0    0    0    0    0    0    0    0    0      0
                       1007 1101 1107 1201 1207 1301 1307 1401 1407 1501 1507 1601 1607 1701 Total
        disorder           0    0    0    0    2    4   10   26   32   44   24   74   78   83   188
        finding            0    0    0    0    0    0    0    0    0    0    0    0    0    0    10
   observable entity       0    0    0    0    0    0    0    0    0    0    0    0    0    0      2
        product            1    1    1    1    1    1    1    1    1    1    1    1    1    1      1
    regime/therapy         0    0    0    0    0    0    0    0    0    0    3    3    3    4   263
       substance           0    0    0    0    0    0    0    0    0    0    0    0    0    1      2
                                     Table 2: Per-tag counts of locally mismatched concepts per release.


scaleStaging and scales], [situationSituation with ex-                 release and examined this table for changes. We found that
plicit context], [assessment scaleAssessment scales], [re-              the majority of tag to corresponding concept pairings are sta-
gime/therapyRegimes and therapies], [cellEntire cell],                 ble over all releases. A few are absent initially but appeared
[morphologic abnormalityBody structure, altered from its                when their tag was added to SNOMED, e.g.415229000 | Ra-
original anatomical structure], [geographic locationGeo-                cial group (racial group) appears for the first time in Janu-
graphical and/or political region of the world], [prod-                  ary 2005.A few are present initially but disappeared when
uctPharmaceutical / biologic product], and [disorderDis-               they were removed from SNOMED, e.g. 304813002 | Ad-
ease]. These seem to be plausible mappings. We then con-                 ministrative values (administrative concept), which was
structed a table with all tag corresponding concepts in every            removed as of the July 2010 release.
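
  Examining the per-release mapping table for changes amounts to comparing consecutive releases tag by tag. The sketch below is a minimal Python illustration, assuming a hypothetical `mappings` dictionary keyed by release date (YYYYMMDD strings, which sort chronologically) whose values map each tag to its corresponding concept id. The small table is a hand-made excerpt: the ‘finding’, ‘racial group’, and ‘administrative concept’ rows follow the changes reported in this section, while the ‘procedure’ row is merely assumed to be stable for the sake of the example.

```python
def mapping_changes(mappings):
    """For each tag, list (release, event) pairs between consecutive releases.

    Events are 'appeared', 'disappeared', or 'switched'; an empty list
    means the tag kept the same corresponding concept throughout.
    """
    releases = sorted(mappings)
    all_tags = set()
    for release in releases:
        all_tags.update(mappings[release])
    changes = {}
    for tag in all_tags:
        events = []
        for prev, cur in zip(releases, releases[1:]):
            before = mappings[prev].get(tag)
            after = mappings[cur].get(tag)
            if before is None and after is not None:
                events.append((cur, "appeared"))
            elif before is not None and after is None:
                events.append((cur, "disappeared"))
            elif before != after:
                events.append((cur, "switched"))
        changes[tag] = events
    return changes

# Illustrative excerpt only; not the full table built for the paper.
base = {"procedure": "71388002", "administrative concept": "304813002"}
mappings = {
    "20030731": {**base, "finding": "246188002"},
    "20040131": {**base, "finding": "404684003"},
    "20050131": {**base, "finding": "404684003", "racial group": "415229000"},
    "20100131": {**base, "finding": "404684003", "racial group": "415229000"},
    "20100731": {"procedure": "71388002", "finding": "404684003",
                 "racial group": "415229000"},
}
changes = mapping_changes(mappings)
```

  A tag with an empty event list corresponds to a stable pairing; any other tag is a candidate for the kind of manual inspection described above.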






  Some corresponding concepts had minor edits made to their FSNs, but we do not count this as a change in corresponding concept. Finally, one tag switched its corresponding concept from one release to the next: the tag ‘finding’ initially had as its corresponding concept 246188002 | Finding (finding) but this concept was deactivated in the January 2004 release and the ‘finding’ corresponding concept changed to 404684003 | Clinical finding (finding).

3.2   Mismatched concepts
After identifying all mismatched concepts for every ST in every release, we organized counts of mismatched concepts into two tables, one for global mismatching (Table 1) and one for local mismatching (Table 2), with one row per ST and one column per release (Table 1, for readability, contains only the counts for the January versions of each release). The number of globally mismatched concepts per release is generally decreasing over time, and has gone from 14,814 (5% of active concepts) in the January 2003 release to 89 (0.027%) in the January 2017 release. The number of global mismatches dropped dramatically from the July 2005 release (14,715 mismatches, 4.83% of active concepts) to the next release in January 2006 (1,522 mismatches, 0.5%). This improvement is likely attributable to large changes in the hierarchy that involved changes in semantic tags for three hierarchies: ‘disorder’, ‘event’, and ‘finding’. This reorganization is documented in the SNOMED CT Editorial Guide’s section on Changes and historical notes: ‘In January 2006, a number of concepts from the | Clinical finding | hierarchy were moved to the Event hierarchy’ (IHTSDO, 2015, p294).
  In the January 2017 release there are only four tags with mismatched concepts for a total of 89 mismatches: 83 are tagged ‘disorder’, four are tagged ‘regime/therapy’, one is tagged ‘product’, and one is tagged ‘substance’.

3.3   Mismatched disorders
Table 3 provides more detail about the categorization of the 188 locally mismatched ‘disorder’ concepts by release. This table was constructed by collecting all concepts that at least once in their lifetime were locally mismatched while having the semantic tag ‘disorder’.
  The column marked ‘NE’ has for each release counts of the number of ‘disorder’ concepts that appear and are mismatched in some later SNOMED release, but that did not yet exist at the release for that row.
  The ‘Inact.’ column counts how many were active concepts in an earlier release but were inactive at the row release.
  The column ‘N | I’ is a sum of the previous two columns.
  The ‘NYN’, ‘NYY’, ‘YYN’, and ‘YYY’ columns count the number of concepts that fall into each of those categories. In every row, ‘#Err.’ equals NYN + NYY, ‘Total’ equals #Err. + YYN + YYY, and ‘%Err.’ is #Err. as a percentage of Total. In the January 2017 release, the 83 mismatched concepts fall into two of our categories: ‘NYN’ (69) and ‘NYY’ (14).

  Release    NE  Inact.  N|I  NYN  NYY  YYN  YYY  #Err.  %Err.  Total
  20030131   78       0   78    0    0  110    0      0      0    110
  20030731   78       0   78    0    0  110    0      0      0    110
  20040131   75       0   75    0    0  113    0      0      0    113
  20040731   75       0   75    0    0  113    0      0      0    113
  20050131   75       0   75    0    0  113    0      0      0    113
  20050731   75       0   75    0    0  113    0      0      0    113
  20060131   75       0   75    0    0  113    0      0      0    113
  20060731   75       0   75    0    0  113    0      0      0    113
  20070131   73       0   73    0    0  115    0      0      0    115
  20070731   66       0   66   18   26   78    0     44  36.07    122
  20080131   61       0   61    6    2  119    0      8    6.3    127
  20080731   60       0   60    1   18  109    0     19  14.84    128
  20090131   60       0   60    0    0  128    0      0      0    128
  20090731   60       0   60    0    0  128    0      0      0    128
  20100131   60      15   75    0    0  113    0      0      0    113
  20100731   59      15   74    0    0  114    0      0      0    114
  20110131   59      15   74    0    0  114    0      0      0    114
  20110731   59      15   74    0    0  114    0      0      0    114
  20120131   59      15   74    0    0  114    0      0      0    114
  20120731   58      15   73    2    0  113    0      2   1.74    115
  20130131   56      15   71    4    0  113    0      4   3.42    117
  20130731   50      15   65   10    0  113    0     10   8.13    123
  20140131   40      15   55   17    9  107    0     26  19.55    133
  20140731   35      15   50   22   10  106    0     32  23.19    138
  20150131   21      15   36   34   10  108    0     44  28.95    152
  20150731   12      16   28   24    0  136    0     24     15    160
  20160131    6      16   22   65    9   92    0     74  44.58    166
  20160731    5      16   21   67   11   89    0     78  46.71    167
  20170131    0      16   16   69   14   89    0     83  48.26    172
             Table 3. Locally mismatched ‘disorder’ concepts

4     DISCUSSION
Our hypothesis that SNOMED CT intends its STs to have a one-to-one correspondence between tags and certain high-level concepts is supported by:

    (1) the very existence of identifiable tag corresponding concepts (a single ‘highest’ concept for each tag that is close to the top concept and that in each case subsumes the vast majority of concepts that use the tag);
    (2) the generally very low occurrence of mismatched concepts.

  Errors, however, remain present and are sometimes even newly introduced; as our analysis of the locally mismatched disorder concepts shows (Table 2), this happens increasingly in more recent versions. A particularly illustrative example of this in the January 2017 release is the concept 109186003 | Sickle cell test kit (substance), which turned out to be newly mismatched, as there were no ‘substance’ concepts mismatched globally from 2009 until this release, and not even a locally mismatched ‘substance’ concept ever before. It is mismatched because it is not subsumed by the ‘substance’ tag’s corresponding concept 105590001 | Substance (substance). Indeed, the sickle cell test kit concept is directly






subsumed by 385387009 | Test kit (physical object), which                                  level concepts. The occurrence of mismatches between the
has 29 other children that all have the words ‘test kit’ in their                          semantic tags of lower-level concepts and their placement in
FSN and are correctly tagged with ‘physical object’ (e.g.                                  another hierarchy than where expected according to the
1109190001 | Virus test kit (physical object)).                                            semantic tag is a sign that the SNOMED CT authoring tool is
  There are a number of ways for a mismatched concept to                                   not equipped with a formal mechanism to keep the hierarchy
appear in a release. These include the addition of a new con-                              consistent with the semantic tags. It is our recommendation
cept, re-activation of an old concept, and changes in the con-                             that such mechanism would be implemented and the method
cept’s subsumption hierarchy. In the sickle cell test kit case                             developed here might be a good starting point in addition to
changes in the hierarchy are responsible: in 2016 and earlier,                             other mechanisms for quality control that have been devel-
the concept 385387009 | Test kit (physical object) was itself                              oped by third parties (Geller, Ochs, Perl, & Xu, 2012; Ochs
mismatched, being subsumed by 105590001 | Substance                                        et al., 2015).
(substance) and not by the ‘physical object’ tag’s corre-
sponding concept, 260787004 | Physical object (physical                                    ACKNOWLEDGEMENTS
object). The test kit concept’s children were all tagged ‘sub-                             This work was supported in part by Clinical and Translational
stance’.                                                                                   Science Award NIH 1 UL1 TR001412-01 from the National
  In 2017 the test kit concept was (correctly) moved to the                                Institutes of Health. The content of this paper is solely the
physical object hierarchy, and it went from being mismatched                               responsibility of the authors and does not necessarily repre-
to being not mismatched. 29 of its children had their FSNs                                 sent the official views of the NIDCR, the NLM or the Na-
changed to use the tag ‘physical object’. Though this move                                 tional Institutes of Health.
resolved one mismatch, another appeared: the sickle cell test
kit concept became mismatched as a result, as shown in the                                 REFERENCES
two hierarchy excerpts in Fig.4. Most of the child concepts
                                                                                           Bishop, B., Kiryakov, A., Ognyanoff, D., Peikov, I., Tashev, Z., &
of 385387009 | Test kit (physical object) are omitted here in                                 Velkov, R. (2011). OWLIM: A family of scalable semantic
the interest of space, as are child concepts of all the other con-                            repositories. Semant. web, 2(1), 33-42.
cepts that appear here.                                                                    Ceusters W. (2009) Applying Evolutionary Terminology Auditing
                                                                                              to the Gene Ontology. Journal of Biomedical Informatics
                      2016                                       2017                         42:518–529.
                  138875005                                   138875005                    Ceusters, W., & Bona, J. P. (2016). Analyzing SNOMED CT's
                 SNOMED CT                                   SNOMED CT                        Historical Data: Pitfalls and Possibilities. AMIA Annu Symp
                   Concept                                     Concept
                                                                                              Proc, 2016, 361-370.
        105590001             260787004            105590001             260787004         Geller, J., Ochs, C., Perl, Y., & Xu, J. (2012). New abstraction
         Substance          Physical object         Substance          Physical object        networks and a new visualization tool in support of auditing the
        (substance)        (physical object)       (substance)        (physical object)
                                                                                              SNOMED CT content. AMIA Annu Symp Proc, 2012, 237-246.
                  385387009                                    385387009                   IHTSDO. (2015). International Health Terminology Standards
                   Test kit                                     Test kit
                                                                                              Development Organization - SNOMED CT® Technical
               (physical object)                            (physical object)
                                        +28                                          +28
                                                                                              Implementation Guide - January 2015 International Release (US
       109186003              109190001            109186003              109190001           English).
    Sickle cell test kit     Virus test kit    Sickle cell test kit      Virus test kit
       (substance)           (substance)          (substance)          (physical object)
                                                                                           IHTSDO. (2017). SNOMED CT Editorial Guide: 7.3.4. Hierarchy
                                                                                              tag. Retrieved from https://confluence.ihtsdotools.org/display/
                                                                                              DOCEG/7.3.4+Hierarchy+tag
           Fig.4. Test kit concept changes 2016 - 2017
                                                                                           Ochs, C., Geller, J., Perl, Y., Chen, Y., Xu, J., Min, H., . . . Wei, Z.
                                                                                              (2015). Scalable quality assurance for large SNOMED CT
5       CONCLUSION                                                                            hierarchies using subject-based subtaxonomies. J Am Med
                                                                                              Inform Assoc, 22(3), 507-518. doi:10.1136/amiajnl-2014-
We have successfully demonstrated that it is possible to im-                                  003151
plement an algorithm that maps semantic tags to correspond-
ing SNOMED CT concepts. We applied this mapping in an
analysis of all active concepts across SNOMED CT releases,
assessing the extent to which the tags as used reflect the
placement within the hierarchy of the concepts that use them
both locally, and, in the spirit of Evolutionary Terminology
Auditing (Ceusters 2009) with respect to the last version
which functions as a gold standard. The results support our
hypothesis that SNOMED CT indeed intends its semantic
tags to have a one-to-one correspondence with certain high-
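
As an illustration of the local mismatch check discussed for the test kit example, the following is a minimal sketch and not the authors' implementation. It assumes the is-a hierarchy is available as a plain child-to-parents dictionary; the names `TAG_CONCEPT`, `semantic_tag`, `ancestors`, and `is_mismatched` are hypothetical. The concept identifiers, tags, and the two hierarchy excerpts are those given in the text and in Fig. 4.

```python
import re

# Tag-to-concept pairs stated in the text; the dict itself is an assumed structure.
TAG_CONCEPT = {"substance": 105590001, "physical object": 260787004}


def semantic_tag(fsn):
    """Return the trailing parenthesised tag of an FSN, or None if there is none."""
    match = re.search(r"\(([^()]+)\)\s*$", fsn)
    return match.group(1) if match else None


def ancestors(concept_id, parents):
    """Collect all transitive is-a ancestors of concept_id."""
    seen = set()
    stack = list(parents.get(concept_id, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def is_mismatched(concept_id, fsn, parents):
    """True if the tag's corresponding concept does not subsume concept_id."""
    tag_concept = TAG_CONCEPT.get(semantic_tag(fsn))
    if tag_concept is None or tag_concept == concept_id:
        return False  # unknown tag, or the tag's own top-level concept
    return tag_concept not in ancestors(concept_id, parents)


# Hierarchy excerpts from Fig. 4 (child -> list of parents).
PARENTS_2016 = {
    105590001: [138875005],
    260787004: [138875005],
    385387009: [105590001],   # Test kit under Substance in 2016
    109186003: [385387009],
    109190001: [385387009],
}
PARENTS_2017 = dict(PARENTS_2016)
PARENTS_2017[385387009] = [260787004]  # Test kit moved under Physical object

FSN_2016 = {
    385387009: "Test kit (physical object)",
    109186003: "Sickle cell test kit (substance)",
    109190001: "Virus test kit (substance)",
}
FSN_2017 = dict(FSN_2016)
FSN_2017[109190001] = "Virus test kit (physical object)"

for year, fsns, parents in (("2016", FSN_2016, PARENTS_2016),
                            ("2017", FSN_2017, PARENTS_2017)):
    for concept_id, fsn in fsns.items():
        print(year, concept_id, fsn, is_mismatched(concept_id, fsn, parents))
```

On these excerpts the check flags the test kit concept in 2016 and the sickle cell test kit concept in 2017, which matches the account given in the text. A check over a full release would also need to handle inactive concepts and tags with no corresponding concept, which this sketch does not address.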


