Representing SNOMED CT Concept Evolutions using Process Profiles
                        Werner CEUSTERS (a, 1) and Jonathan P. BONA (a)
       (a) Division of Biomedical Ontology, Department of Biomedical Informatics, Jacobs
              School of Medicine and Biomedical Sciences, University at Buffalo


             Abstract. SNOMED CT is a very large biomedical terminology supported by a
             concept-based ontology. In recent years it has been distributed under the new
             release format ‘RF2’. RF2 provides a more consistent and coherent mechanism for
             keeping track of changes over versions, even to the extent that – in theory at least –
             any release will contain enough information to allow reconstruction of all previous
             versions. In this paper, using the January 2016 release of SNOMED CT, we
             explore various ways to transform change-assertions in RF2 into a more uniform
             representation with the goal of assessing how faithful these changes are with
             respect to biomedical reality. Key elements in our approach are (1) recent
             proposals for the Information Artifact Ontology that provide a realism-based
             perspective on what it means for a representation to be about something, and (2)
             the expectation that the theory of what we call ‘process profiles’ can be applied not
             merely to quantitative information artifacts but also to other sorts of symbolic
             representations of processes.

             Keywords. SNOMED CT, ontological realism, changes in ontologies


1. Introduction

There are many differing views on what it means to conduct research under the
term ‘ontology’, on what ontologies as representational artifacts exactly are, on what
the precise role of ontologies in information systems is, on what they should or should
not be used for, and on what qualities or capabilities they should have [1]. Our view is
that an ontology should be a faithful representation of the part of reality that it covers
[2, 3]: looking through the c(/g)lasses of an ontology and how they are organized
therein, should give us exactly the same view as if we were looking directly at the
structure of the corresponding part of reality. This would hold for both T-box and
A-box assertions, as well as throughout time. Imagine on one side of a room an aquarium
with 12 fish, three of each of four types, and some plants of various types, rocks, etc., and
on the other side a holographic simulation of that aquarium and its relevant
environment powered by a faithful ontology (including an A-box). If the simulation is
synchronized from the start with the exact configuration of the aquarium at that time,
though without access to the aquarium and its contents themselves, then any change in the
aquarium would happen in exactly the same way in the simulation. If we were to run
the simulation faster, then we would see exactly what is going to happen in the
aquarium at a future time. To keep the simulation faithful, the maintenance contract for
the aquarium and its contents should go hand in hand with a maintenance contract for
the ontology. If, for instance, fish of another type were to be added, then the ontology
would need to be updated accordingly. These updates should be such that, by
inspecting the ontology directly, we could find out exactly what happened in reality.
This also means that the formalism, language and data structures, i.e. the entire
representational machinery in and through which the ontology is expressed, must allow
us to detect which changes in the ontology correspond to changes in reality, and which
are purely internal to the ontology or the simulation. If the ontologist who maintained the
ontology for this simulation were replaced by one who is color-blind and who therefore
changed the ontology so as to write out in words the names of typical fish colors on the
avatars in the simulation, then we should not be forced to believe that the goldfish in
our aquarium suddenly have the word ‘orange’ written all over their bodies.

       1
        Corresponding author. New York State Center of Excellence in Bioinformatics & Life Sciences. 701
Ellicott Street, suite B2-160, Buffalo NY – 14203, USA. Email: ceusters@buffalo.edu
     It is this line of thinking that formed the basis of Evolutionary Terminology
Auditing, a framework designed to measure quality improvements in ontologies over
time using reality as benchmark by taking into account changes in reality itself,
changes in our scientific understanding thereof, and pure editorial changes such as
corrections of mistakes or changes in representation that are not inspired by changes in
reality [4]. In [5, 6] this framework was applied to 18 versions of the Systematic
Nomenclature of Medicine / Clinical Terms (SNOMED CT) [7] with the conclusion
that changes to concepts over those versions do not necessarily correspond to
improvements in quality, and that many changes are due to idiosyncrasies in the
underlying ontology rather than to changes in the domain or in our scientific
understanding. In [8], the method was found to have predictive power over future
quality improvements in the Gene Ontology. It was also applied to the Basic Formal
Ontology (BFO) [3], which led to a number of improvements to the framework itself [9].
     In these past efforts we looked at consecutive versions of an ontology from the
perspective of reality, the goal being to assess quality improvements of the ontology in
terms of corresponding changes in reality. Here we look instead at mechanisms that an
ontology can offer to let us see changes in reality in a reliable way by examining the
changes in the ontology. We use as foundations the Basic Formal Ontology (BFO) [3]
and recent proposals for the Information Artifact Ontology (IAO) [10] that provide a
realism-based perspective for what it means for a representation to be about something.
SNOMED CT is an ideal candidate for such an analytical exploration, as its distribution in
the last few years includes a new release format known as ‘RF2’, which is characterized
by a more elaborate and – as we will demonstrate, unfortunately not yet fully –
coherent and consistent representation of changes in its content, to the extent that each
newly released version includes all previous versions rolled up inside itself. Our
exploration forms the basis for a long-term research objective: to determine whether the
totality of assertions about changes in SNOMED CT, rather than about external reality,
constitutes in and of itself a valuable resource for identifying patterns that allow the
detection of as yet undiscovered mistakes in assertions about external reality.


2. SNOMED CT as a concept-based ontology

SNOMED CT – the name used to be an acronym for Systematic Nomenclature of
Medicine / Clinical Terms but is now considered a mere brand name of a new product
that grew out of this nomenclature – is developed by the International Health
Terminology Standards Development Organization (IHTSDO) and is claimed,
probably rightly, to be the largest healthcare terminology currently available [7]. The
International Edition released on January 31, 2016 is supported by an ontology
consisting of 319,446 active concepts which are connected by a total of 962,497 active
relationships and described by 1,097,028 active descriptions which link 999,639 terms
to these concepts. The relationships reported here are those generated by IHTSDO’s
EL++ description logic classifier on the basis of 655,312 active so-called stated
relationships which have been directly edited by authors or editors prior to running the
classifier on the logic definitions [11, p108].
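As an illustration of how counts of active components can be obtained from a release, the following minimal Python sketch tallies active and inactive rows in an RF2 Snapshot file, in which each component identifier occurs only once. It assumes a tab-separated file with a header row that includes an `active` column holding `1` or `0`; the helper name `count_active` is hypothetical, and the second sample row uses made-up values.

```python
import csv
import io
from typing import Iterable


def count_active(lines: Iterable[str]) -> tuple[int, int]:
    """Return (active, inactive) row counts for an RF2 Snapshot file.

    Assumes tab-separated rows with a header containing an 'active'
    column whose values are '1' (active) or '0' (inactive).
    """
    # QUOTE_NONE: terms may contain quote characters that must not be
    # interpreted as CSV quoting.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    active = inactive = 0
    for row in reader:
        if row["active"] == "1":
            active += 1
        else:
            inactive += 1
    return active, inactive


sample = io.StringIO(
    "id\teffectiveTime\tactive\tmoduleId\tdefinitionStatusId\n"
    # Latest (snapshot) row for concept 301381004, as in Table 2.
    "301381004\t20080131\t0\t900000000000207008\t900000000000074008\n"
    # Hypothetical row with made-up values.
    "100000001\t20160131\t1\t900000000000207008\t900000000000074008\n"
)
print(count_active(sample))  # (1, 1)
```

The same function can be pointed at a concept, description or relationship Snapshot file, since only the `active` column is consulted.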
     In addition to active components – ‘component’ being the umbrella term used by
IHTSDO for concept or relationship or description – SNOMED CT also contains
inactive components which were active in one or more prior versions but at some point
have been inactivated for one reason or another. Indeed, SNOMED CT is regularly
updated [12], not only to correct mistakes, but also to reflect changes in biomedical
science. Concepts are classified under several hierarchies. Most top-level classes correspond
to types of entities whose instances clinicians encounter during their
work (body parts, organisms, diseases, substances, procedures, etc.), while other top-level
classes correspond to types instantiated by descriptive elements of the SNOMED CT
knowledge representation itself, for example classes denoted by terms such as ‘inactive
concept’, ‘navigational concept’, and ‘core metadata concept’ [13]. Although the
number of classes of this sort was originally – and is still – rather small, it is increasing
as a result of the move from Release Format 1 (RF1) to Release Format 2 (RF2). The
latter was introduced in 2012 to implement a more robust and consistent representation
of versions including an added hierarchy to represent metadata about the structure of
SNOMED CT itself [11, p127; 14].
     At the heart of SNOMED CT is the notion of ‘concept’ which in the SNOMED CT
documentation is defined as ‘a clinical idea to which a unique concept identifier has
been assigned’ [11, p38]. What is represented by a specific concept cannot be
determined on the basis of the identifier, but ‘the meaning of a concept can be
determined from relationships to other concepts and from associated descriptions that
include human readable terms’ [11, p87]. Descriptions provide for each concept a
Fully Specified Name (FSN): ‘Each concept has at least one Fully Specified Name
(FSN) intended to provide an unambiguous way to name a concept. The purpose of the
FSN is to uniquely describe a concept and clarify its meaning’ [11, p40]. Furthermore:
‘Each FSN term ends with a “semantic tag” in parentheses. The semantic tag indicates
the semantic category to which the concept belongs (e.g. clinical finding, disorder,
procedure, organism, person, etc.). The “semantic tag” helps to disambiguate different
concepts which may be referred to by the same commonly used word or phrase’ [11,
p41]. For example, it is the semantic tag ‘morphologic abnormality’ in the FSN
‘Hematoma (morphologic abnormality)’ that disambiguates the concept to which this
FSN is assigned from a second concept with FSN ‘Hematoma (disorder)’. The former
is intended to be used for what ‘a pathologist sees at the tissue level’, while the latter
‘represents the clinical diagnosis that a clinician makes when they decide that a person
has a “hematoma”’ [11, p41].
     SNOMED CT’s authors have noted – and have to a certain extent started to act
upon, though not completely satisfactorily – the confusions around what ‘concept’
might denote [15]. Despite their definition of ‘concept’ as a clinical idea, the term is
also stated to be a homonym for ‘concept identifier’ as well as for ‘the real-world
referent(s) of the concept identifier, that is, the class of entities in reality that the
concept identifier represents’ [11, p127]. One consequence is that there are doubts
about the sort of ontological commitments that are made by SNOMED CT authors and
editors [16]. Another consequence is that SNOMED CT contains many ambiguities and
competing interpretations of, for instance, pathological conditions and disorders [17].
     Another consequence of this ambiguity, the one we address specifically in this
paper, is that every occurrence of the word ‘concept’ in the SNOMED
literature – and indeed, in the literature about concept-based ontologies in general – must
be disambiguated in terms of whether it is used to denote something which is outside or
inside the ontology. Tumors, procedures and other entities clinicians come in contact
with while at work are outside SNOMED CT. Examples of something inside the
SNOMED CT representation are the SNOMED CT concept identifier ‘313029009’ and
the corresponding FSN ‘Brachytherapy – action (qualifier value)’, both of which are
supposed to denote the method that a procedure must involve in order to be of the sort
denoted both by ‘384692006’ and by the term ‘Brachytherapy procedure’.
     This ambiguity arises not only in the documentation but also in SNOMED CT
itself! We can safely assume that the relationship (T1), between a procedure and a
qualifier value, extracted from the SNOMED CT relationships file and rendered in
human-readable form by using FSNs, is, as SNOMED CT puts it, about ‘a class of
entities in reality’, thus about something outside SNOMED CT. More concretely: the
term ‘Intracavitary brachytherapy (procedure)’ is inside SNOMED CT, but what
this term denotes, and of which a specific brachytherapy procedure carried out on a
specific patient is an instance (see section 3), is on the outside.

         ‘Intracavitary brachytherapy (procedure)’                                  (T1)
            – ‘Method (attribute)’
                     – ‘Brachytherapy – action (qualifier value)’
         ‘Actions by modality (qualifier value)’                                    (T2)
            – ‘Is a (attribute)’
                     – ‘Action (qualifier value)’
         ‘Brachytherapy – action (qualifier value)’                                 (T3)
             – ‘Is a (attribute)’
                    – ‘Actions by modality (qualifier value)’

     We can, however, no less safely assume that the triples (T2) and (T3) are to be
interpreted as statements about how SNOMED CT classifies certain actions, perhaps in
order to allow for easier browsing when SNOMED CT is used in some application as
an interface terminology. These are thus statements about something inside SNOMED
CT, rather than statements that ‘actions by modality’, on the outside, are in and of
themselves a special kind of action of which brachytherapy actions are an example.
These distinctions are important if we want to quantify reliably how much of external
reality is represented in SNOMED CT and how SNOMED CT is qualitatively improving
as a representation using reality as benchmark. For example, although (T2) and (T3)
together use three concepts – (1) ‘actions by modality’, (2) ‘action’, and
(3) ‘brachytherapy’ – only two of them, (2) and (3), correspond to an entity in reality.

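To make this counting concrete, one could store the triples (T1) to (T3) as plain tuples of FSNs and keep, separately, a hand-made annotation recording which concepts are judged to have a referent outside SNOMED CT. The Python sketch below is only illustrative: the set `HAS_EXTERNAL_REFERENT` is a hypothetical annotation that simply encodes the judgment made in the text; SNOMED CT itself provides no such marker.

```python
# The triples (T1)-(T3), written as (source, attribute, target) FSN tuples.
TRIPLES = {
    "T1": ("Intracavitary brachytherapy (procedure)", "Method (attribute)",
           "Brachytherapy – action (qualifier value)"),
    "T2": ("Actions by modality (qualifier value)", "Is a (attribute)",
           "Action (qualifier value)"),
    "T3": ("Brachytherapy – action (qualifier value)", "Is a (attribute)",
           "Actions by modality (qualifier value)"),
}

# Hypothetical hand-made annotation mirroring the analysis in the text:
# concepts judged to denote an entity outside SNOMED CT.
HAS_EXTERNAL_REFERENT = {
    "Intracavitary brachytherapy (procedure)",
    "Action (qualifier value)",
    "Brachytherapy – action (qualifier value)",
}


def concepts_used(triple_ids):
    """Collect the source and target concepts used by the given triples."""
    used = set()
    for tid in triple_ids:
        source, _attribute, target = TRIPLES[tid]
        used.update((source, target))
    return used


used = concepts_used(["T2", "T3"])
external = used & HAS_EXTERNAL_REFERENT
print(len(used), len(external))  # 3 2
```
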
3. SNOMED CT as an Information Content Entity

One way to address these issues is to perceive a version of SNOMED CT as an
instance of an Information Content Entity (ICE), i.e. the sort of entity which is
represented as the root of the IAO which is under development as a BFO-compatible
ontology for information artifacts [10]. Table 1 summarizes the definitions (Dn) and
elucidations (En) as they crystallized out of several proposals in the past few years [10,
18-21]. They are themselves based in part on the terms ENTITY, GENERICALLY
DEPENDENT CONTINUANT, MATERIAL ENTITY, QUALITY, FUNCTION and ROLE as well as
the notions of specific and generic dependence as defined in BFO [3]. These definitions
allow us to perceive a version of SNOMED CT as an ICE of which concretizations
exist as INFORMATION ARTIFACTS in the form of, for example, a paper print out, or the
portion of a hard drive which contains the RF2 distribution files each one of which can
be rendered as a table on a computer screen by using appropriate software.
    In this light, the PORTION OF REALITY (PoR) described by SNOMED CT includes,
from the ontological realist perspective as we perceive it [2]:
     1. universals, for instance the universal denoted by the SNOMED CT
          concept identifier ‘126838000’ which is further annotated by means of the
          description (with ID ‘126016’) stating that the term ‘neoplasm of colon’ has been an
          allowed term since the January 2002 version;
     2. relations, for instance the formal subsumption relation which in SNOMED CT
          is represented by the concept identifier ‘116680003’ and by the corresponding
          term ‘Is a (attribute)’;
     3. instances, for example the one denoted by the concept ID ‘223502009’ and
          corresponding FSN ‘Europe (geographic location)’; and,
     4. configurations, for instance the one directly referred to in the ICE concretized
          by the triple (T1), and the one indirectly represented by combining the triples
          (T3) and (T2) used as examples in section 2.

                Table 1. Core definitions and elucidations for representation and aboutness
 INFORMATION CONTENT ENTITY (ICE) =def. an ENTITY which is (1) GENERICALLY                    [18]   (D1)
 DEPENDENT on (2) some MATERIAL ENTITY and which (3) stands in a relation of
 aboutness to some PORTION OF REALITY.
 INFORMATION QUALITY ENTITY (IQE) =def. a QUALITY that is the concretization of               [19]   (D2)
 some INFORMATION CONTENT ENTITY.
 ARTIFACT =def. a MATERIAL ENTITY created or modified or selected by some agent to            [10]   (D3)
 realize a certain FUNCTION or ROLE.
 INFORMATION ARTIFACT =def. an ARTIFACT whose FUNCTION is to bear an                          [10]   (D4)
 INFORMATION QUALITY ENTITY.
 REPRESENTATION =def. a QUALITY which is_about or is intended to be about a                   [20]   (D5)
 PORTION OF REALITY.
 MENTAL QUALITY =def. a QUALITY which specifically_depends_on an ANATOMICAL                   [20]   (D6)
 STRUCTURE in the cognitive system of an ORGANISM.
 COGNITIVE REPRESENTATION =def. a REPRESENTATION which is a MENTAL QUALITY.                   [20]   (D7)
 REPRESENTATIONAL UNIT (RU) =def. a smallest constituent sub-representation,                  [21]   (D8)
 including icons, names, simple word forms, or the sorts of alphanumeric identifiers we
 might find in patient records.
 x is_about y means: x refers to or is cognitively directed towards y. Domain:                [10]   (E1)
 COGNITIVE REPRESENTATIONS; Range: PORTIONS OF REALITY. Axiom: if x is_about y
 then y exists (veridicality).
 x concretizes y at t means:                                                                  [10]   (E2)
         x is a QUALITY & y is a GENERICALLY DEPENDENT CONTINUANT
         & for some MATERIAL ENTITY z, x specifically_depends_on z at t
         & y generically_depends_on z at t
         & if y migrates from bearer z to another bearer w then a copy of x will be
         created in w.
 x is_a_representation_of y =def. x is a REPRESENTATION & x is_about y (where y is a          [10]   (D9)
 PORTION OF REALITY).

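Elucidation E2 can be mimicked in a few lines of code. The following Python sketch is a toy model under our own simplifying assumptions, not part of IAO: a `Quality` specifically depends on exactly one bearer, an `InformationContentEntity` generically depends on whichever bearers currently host a concretization of it, and `migrate` creates a copy of the concretizing quality in the new bearer, as the last clause of E2 requires. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class MaterialEntity:
    name: str


@dataclass(eq=False)
class InformationContentEntity:
    label: str
    # Qualities that concretize this ICE; the ICE has no fixed bearer.
    concretizations: list = field(default_factory=list, repr=False)

    def bearers(self):
        """Generic dependence: the ICE depends on some bearer, whichever it is."""
        return [q.bearer for q in self.concretizations]


@dataclass(eq=False)
class Quality:
    bearer: MaterialEntity  # specific dependence: exactly one bearer
    concretizes: InformationContentEntity


def concretize(ice, bearer):
    """Create a quality in `bearer` that concretizes `ice`."""
    q = Quality(bearer, ice)
    ice.concretizations.append(q)
    return q


def migrate(ice, source, target):
    """Copy the concretization found in `source` into `target` (E2, last clause)."""
    if not any(q.bearer is source for q in ice.concretizations):
        raise ValueError("the ICE is not concretized in the source bearer")
    return concretize(ice, target)


release = InformationContentEntity("SNOMED CT International Edition, January 2016")
disk = MaterialEntity("portion of a hard drive holding the RF2 files")
printout = MaterialEntity("paper print-out")
q1 = concretize(release, disk)
q2 = migrate(release, disk, printout)
print(q1 is q2, q1.concretizes is q2.concretizes)  # False True
```

The two qualities are numerically distinct, one per bearer, while both concretize one and the same ICE, which is the point of distinguishing a SNOMED CT version from its concretizations.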
     Note that the representation formalism used by SNOMED CT does not allow us to
distinguish universals from instances [13]. Configurations are formally represented
through records in, for instance, the relationships file. This includes configurations
formed by ICEs themselves such as those denoted by records in Historical Association
Reference Sets (section 4.2). Relations are implicitly represented as such by being
subsumed by the concept with FSN ‘Attribute (attribute)’ and explicitly through their
specific position in records of, for example, the relationships file in RF2.
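The implicit representation of relations can be checked mechanically: a concept is treated as a relation if it is transitively subsumed, via ‘Is a (attribute)’ links, by the concept with FSN ‘Attribute (attribute)’. The Python sketch below assumes a toy fragment of the hierarchy keyed by FSN; the two action links come from (T2) and (T3), whereas placing ‘Is a (attribute)’ and ‘Method (attribute)’ directly under the attribute root is a hypothetical simplification of the actual SNOMED CT hierarchy.

```python
from collections import deque

# Toy 'Is a' fragment: child FSN -> set of parent FSNs (partly hypothetical).
IS_A = {
    "Is a (attribute)": {"Attribute (attribute)"},
    "Method (attribute)": {"Attribute (attribute)"},
    "Brachytherapy – action (qualifier value)": {"Actions by modality (qualifier value)"},
    "Actions by modality (qualifier value)": {"Action (qualifier value)"},
}

ATTRIBUTE_ROOT = "Attribute (attribute)"


def ancestors(concept):
    """All concepts reachable upward from `concept` through 'Is a' links."""
    seen, queue = set(), deque([concept])
    while queue:
        for parent in IS_A.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def is_relation(concept):
    """True if the concept is (transitively) subsumed by the attribute root."""
    return ATTRIBUTE_ROOT in ancestors(concept)


print(is_relation("Method (attribute)"))  # True
print(is_relation("Brachytherapy – action (qualifier value)"))  # False
```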
     To avoid the confusions arising from the word ‘concept’ as used in the SNOMED
CT documentation, in this paper we will use the term ‘SNOMED CT concept’ – or
‘concept’ for short – exclusively in the ICE sense, i.e. to denote a representational
element inside the SNOMED CT representation. If this representational element
succeeds in being about something (see D9), we will denote that something by terms
such as ‘the corresponding PoR’ or ‘the corresponding universal’. The same holds for
the other SNOMED CT components, such as descriptions and relationships: these terms
will exclusively be used to denote representational elements inside SNOMED CT.


4. Changes in SNOMED CT

4.1. Additions and deactivations

The content of SNOMED CT evolves with each release. The types of changes made
include the addition and inactivation of concepts, descriptions, and relationships as well
as updates in definitions, and to a certain extent also the provision of motivations for
these changes. Once released, SNOMED CT components are persistent and their
identifiers are not reused [11, p45]. When a component becomes inactive this is
indicated by the value of the active field, a field which is present in all components.
Components continue to be distributed even when they are no longer active. This
allows a current release to be used to interpret data entered using an earlier release.
Whereas in RF1 the history mechanism was only used to annotate changes in concepts
and descriptions, RF2 annotates changes in a consistent fashion for all components,
though only for changes that occurred since the January 2002 release. Within RF2, all
changes in components are represented in the corresponding files by adding a new row,
with the same component ID, a new effective time and any necessary change in the
component values. As an example, Table 2 shows that the concept ‘301381004’ with
FSN ‘Discomforting present pain (finding)’ was set to active in release 20020131 and
to inactive in 20080131. Table 3 shows that during the lifetime of that concept, it
underwent considerable changes in its reported relationships to other concepts after full
DL classification. It must however be noted that the SNOMED CT documentation
remains silent on whether these reported changes are syntactical changes, effectual
changes or a combination thereof. Indeed, from Table 3 alone it cannot be determined
whether the relationship ‘Is a – Pain (finding)’ is truly inactivated, or whether it is still
active in the historical transitive closures, something that can be computed on the basis
of the history information available in RF2. Between 2009 and 2011 there were
typically more effectual changes (74%) than ineffectual ones (26%); within the
removals there was a high number of ineffectual changes (37%) whereas in the
additions there were on average more effectual changes (84%) than ineffectual ones
(16%) [22]. Note, however, that ‘effectual change’ in [22] is to be understood as a pure
change inside SNOMED CT from one version to another, and not as an assertion that
an effectual change corresponds to a change in reality or SNOMED CT’s authors’
knowledge thereof.
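Because RF2 records each change as a new row with the same identifier and a new effective time, the rows of Table 3 can be folded into periods of activity per relationship. The Python sketch below does this for the Table 3 data, keeping only the identifier, effective time and active columns; the interval convention (start inclusive, end exclusive, `None` for a period still open) and the function name are our own assumptions.

```python
from collections import defaultdict

# (relationship id, effective time, active) rows taken from Table 3.
ROWS = [
    ("126300024", "20020131", "1"), ("126300024", "20040131", "0"),
    ("126301023", "20020131", "1"), ("126301023", "20080131", "0"),
    ("657858027", "20020131", "1"), ("657858027", "20060131", "0"),
    ("2260209021", "20030731", "1"), ("2260209021", "20050131", "0"),
    ("2458913020", "20040131", "1"), ("2458913020", "20080131", "0"),
    ("2858465020", "20060131", "1"), ("2858465020", "20080131", "0"),
]


def activity_intervals(rows):
    """Map each component id to [start, end) periods in which it was active.

    An end of None means the component is still active in its last row.
    RF2 dates are YYYYMMDD strings, so string order equals date order.
    """
    history = defaultdict(list)
    for cid, time, active in rows:
        history[cid].append((time, active == "1"))
    intervals = {}
    for cid, changes in history.items():
        periods, start = [], None
        for time, active in sorted(changes):
            if active and start is None:
                start = time  # component becomes active
            elif not active and start is not None:
                periods.append((start, time))  # component becomes inactive
                start = None
        if start is not None:
            periods.append((start, None))
        intervals[cid] = periods
    return intervals


for cid, periods in activity_intervals(ROWS).items():
    print(cid, periods)
```

Such interval lists are a convenient starting point for the historical transitive closures mentioned above, since they state which relationships held at any given release date.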
     Table 4 demonstrates how changes in the descriptions of concepts are similarly
logged. Only one description record with the same descriptionID field is current at any
point in time. The current record is the one with the most recent Effective Time before
or equal to the point in time under consideration. If the active field is false (‘0’), then
the description is inactive at that point in time. If it is true (‘1’), then the description is
associated with the concept identified by the conceptId field (not shown in Table 4).
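The rule just described, that the current record for an identifier is the one with the most recent effective time not later than the date of interest, is essentially how a snapshot view can be derived from the full history. A minimal Python sketch applied to the rows of Table 4 (other columns omitted; the function name `active_at` is hypothetical):

```python
# (descriptionId, effective time, active) rows taken from Table 4.
DESCRIPTIONS = [
    ("410015012", "20020131", "1"), ("410015012", "20020731", "0"),
    ("666971011", "20020131", "1"), ("666971011", "20030131", "0"),
    ("1237162017", "20020731", "1"),
    ("1472277017", "20030131", "1"), ("1472277017", "20060731", "0"),
    ("1489933012", "20030131", "1"),
    ("2610401019", "20060731", "1"),
]


def active_at(rows, date):
    """Ids whose current record at `date` (YYYYMMDD) has active == '1'.

    The current record is the one with the latest effective time that is
    earlier than or equal to `date`; ids without such a record did not yet exist.
    """
    current = {}
    for cid, time, active in rows:
        if time <= date and (cid not in current or time > current[cid][0]):
            current[cid] = (time, active)
    return {cid for cid, (_, active) in current.items() if active == "1"}


print(sorted(active_at(DESCRIPTIONS, "20050131")))
# ['1237162017', '1472277017', '1489933012']
```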
     Table 4 reveals another weakness of the concept orientation adhered to by
SNOMED CT and of its consequent reliance on ‘meanings’, with all the problems that arise
therefrom [23]. The SNOMED CT documentation states that ‘only limited changes may
be made to the “term” field, as defined by editorial rules’ [11, p145]. This is consistent
with the view that ‘the meaning of a concept can be determined […] from associated
descriptions that include human readable terms’ [11, p87]. This editorial rule is also
used as an argument for not retiring a concept in cases where its
FSN undergoes only minor changes. Indeed, ‘Minor changes in the FSN are those
changes that do not alter its meaning. A change to the semantic type shown in
parentheses at the end of the FSN may sometimes be considered a minor change if it
occurs within a single top-level hierarchy (e.g. a change from a finding tag to a
disorder tag, or a change from a procedure tag to a regime/therapy tag), but a move to
a completely different top-level hierarchy is regarded as a significant change to the
Concept's meaning and is prohibited’ [11, p393]. Yet, the change shown in Table 4 from
‘finding’ to ‘context-dependent category’ (later renamed ‘situation’) is precisely a move
from one top-level hierarchy to another. Despite this change, the concept was not
deactivated! Unless this was a mistake, introduced in 2003 and detected only prior to the
release of the July 2009 version in which this concept was deactivated, this can only be
explained if we assume that the SNOMED CT editors at that time clearly realized that
whatever they change inside SNOMED CT does not have an impact on how matters are
on the outside.
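Tag changes of this kind can be detected mechanically from the FSN history. The following Python sketch assumes, in line with the documentation quoted above, that the semantic tag is the final parenthesized part of an FSN; the regular expression and function names are our own, and the rows are the FSN records of Table 4 in order of effective time.

```python
import re

# FSN records for concept 274236006, from Table 4, by effective time.
FSN_HISTORY = [
    ("20020131", "Asthenia [D] (finding)"),
    ("20030131", "[D]Asthenia (context-dependent category)"),
    ("20060731", "[D]Asthenia (situation)"),
]

# Assumption: the semantic tag is the parenthesized text at the end of the FSN.
TAG = re.compile(r"\(([^()]*)\)\s*$")


def semantic_tag(fsn):
    match = TAG.search(fsn)
    return match.group(1) if match else None


def tag_changes(history):
    """List (effective time, old tag, new tag) wherever consecutive tags differ."""
    changes = []
    for (_, previous), (time, current) in zip(history, history[1:]):
        old, new = semantic_tag(previous), semantic_tag(current)
        if old != new:
            changes.append((time, old, new))
    return changes


for change in tag_changes(FSN_HISTORY):
    print(change)
# ('20030131', 'finding', 'context-dependent category')
# ('20060731', 'context-dependent category', 'situation')
```

The sketch also flags the 2006 change, although that one merely reflects the renaming of the hierarchy; telling renamings apart from genuine moves between top-level hierarchies would require additional, hand-curated knowledge.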
     Thus what stays fixed – modulo the appearance of truly new entities such as new
drugs, mutated viruses, and, perhaps new disorder types caused by newly developed
techniques or chemicals – are the entities on the outside, the portions of reality denoted
by some SNOMED CT component on the inside. This holds, of course, also for the
massive number of changes that occur at the level of the SNOMED CT relationships:
although they clearly change ‘the meaning’ of the concept in many cases, they are still,
from a realist perspective, intended to denote the very same PoRs.

 Table 2. Updates in the SNOMED CT concept file (RF2) for concept 301381004 with FSN ‘Discomforting
                                     present pain (finding)’.
 conceptID          Effective Time         Active   ModuleID                 Definitional Status
 301381004          20020131               1        900000000000207008       900000000000074008
 301381004          20080131               0        900000000000207008       900000000000074008
Legend: Active: (1) = active, (0) = inactive.
      Table 3. Updates in the SNOMED CT relationships file (RF2) for the same concept 301381004
 RelID          Effective Time  Active       Attribute        Target
 126300024      20020131        1            Is a             Pain (finding)
 126300024      20040131        0            Is a             Pain (finding)
 126301023      20020131        1            Is a             Finding of present pain intensity (finding)
 126301023      20080131        0            Is a             Finding of present pain intensity (finding)
 657858027      20020131        1            Finding site     Structure of nervous system (body structure)
 657858027      20060131        0            Finding site     Structure of nervous system (body structure)
 2260209021     20030731        1            Interprets       Nervous system function (observable entity)
 2260209021     20050131        0            Interprets       Nervous system function (observable entity)
 2458913020     20040131        1            Is a             Discomfort (finding)
 2458913020     20080131        0            Is a             Discomfort (finding)
 2858465020     20060131        1            Finding site     Anatomical structure (body structure)
 2858465020     20080131        0            Finding site     Anatomical structure (body structure)
Legend: RelID = Relationship identifier; Active: 1=active, 0=inactive. Columns irrelevant for our purposes
here are not shown. For readability, Attribute and Target identifiers have been replaced by their
corresponding FSN – omitting ‘(attribute)’ – in the most recent version studied (January 2016).


          Table 4. Updates in the SNOMED CT descriptions file (RF2) for concept ‘274236006’
 descriptionID      Effective Time  Active    Description Type  Term
  410015012         20020131       1         Synonym        Asthenia       [D]
  410015012         20020731       0         Synonym        Asthenia       [D]
  666971011         20020131       1           FSN          Asthenia [D] (finding)
  666971011         20030131       0           FSN          Asthenia [D] (finding)
  1237162017        20020731       1         Synonym        Asthenia [D]
  1472277017        20030131       1           FSN          [D]Asthenia (context-dependent category)
  1472277017        20060731       0           FSN          [D]Asthenia (context-dependent category)
  1489933012        20030131       1         Synonym        [D]Asthenia
  2610401019        20060731       1           FSN          [D]Asthenia (situation)
Legend: Active: 1=active, 0=inactive. Columns irrelevant for our purposes here are not shown. For
readability, Description Type identifiers have been replaced by their corresponding term – omitting their
semantic tag ‘(core metadata concept)’.


4.2. Replacements

RF2 replaces the ‘history mechanism’ implemented in RF1 [5] by means of Historical
Association Reference Sets (HARS) and Component Inactivation Reference Sets
(CIRS). HARSs (Table 5) are used to indicate, for example, which deactivated
concepts are in one way or another related to other active concepts, and CIRSs (Table
6) to indicate the reasons for inactivating a component – such as errors, duplication of
another component and ambiguity of meaning [11, p506]. Records that express such
associations are called reference set members. The primary purpose of these reference
sets is to specify which (if any) of these associations should be followed in a fashion
similar to following ‘Is a (attribute)’ relations when determining whether to retrieve a
record entry previously coded with a concept that has since then been inactivated.
Whereas ‘same as’ and ‘replaced by’ associations can be followed unproblematically,
the solution for ambiguous concepts related by ‘possibly equivalent to’ associations is
less clear-cut [11, p654].
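The retrieval use case can be sketched as follows: starting from an inactive concept, ‘SAME AS’ and ‘REPLACED BY’ associations are followed until an active concept is reached, whereas ‘POSSIBLY EQUIVALENT TO’ yields a set of candidates that calls for human review. The Python sketch below is an illustration under stated assumptions only: the concept identifiers (`C1` and so on), the association records and the set of active concepts are hypothetical, and association names are taken from Table 5 rather than from the identifiers of the actual reference sets.

```python
# Hypothetical historical association records: (source, association, target).
ASSOCIATIONS = [
    ("C1", "SAME AS", "C2"),
    ("C2", "REPLACED BY", "C3"),
    ("C4", "POSSIBLY EQUIVALENT TO", "C5"),
    ("C4", "POSSIBLY EQUIVALENT TO", "C6"),
]
ACTIVE = {"C3", "C5", "C6"}  # hypothetical set of active concepts
FOLLOWABLE = {"SAME AS", "REPLACED BY"}


def resolve(concept):
    """Return ('resolved', id), ('ambiguous', candidates) or ('unresolved', id)."""
    seen = set()
    while concept not in ACTIVE:
        if concept in seen:  # guard against cycles in the association data
            return ("unresolved", concept)
        seen.add(concept)
        targets = [t for s, a, t in ASSOCIATIONS if s == concept and a in FOLLOWABLE]
        if len(targets) == 1:  # follow only an unambiguous association
            concept = targets[0]
            continue
        candidates = {t for s, a, t in ASSOCIATIONS
                      if s == concept and a == "POSSIBLY EQUIVALENT TO"}
        if candidates:
            return ("ambiguous", sorted(candidates))
        return ("unresolved", concept)
    return ("resolved", concept)


print(resolve("C1"))  # ('resolved', 'C3')
print(resolve("C4"))  # ('ambiguous', ['C5', 'C6'])
```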
      Table 5. Historical association reference set types in SNOMED CT (modified from [11, p509])
 HARS name             Use
 POSSIBLY              From an ambiguous concept to one or more active concepts that represent one of
 EQUIVALENT TO         the possible meanings of the inactive concept.
 MOVED TO              From a component to a namespace to which the component has been moved.
 REPLACED BY           From an erroneous or obsolete inactive component to a single active replacement
                       component.
 SAME AS               From a duplicate component to the active component that this component
                       duplicates.
 WAS A                 From an inactive classification concept such as ‘not otherwise specified’ to the
                       active concept that was formerly its most proximal supertype.
 ALTERNATIVE           From an inactive classification concept derived from ICD-9 Chapter XVI
                       ‘Symptoms, signs and ill-defined conditions’ to the most similar active concept.
 REFERS TO             From an inactive description which is inappropriate to the concept it is directly
                       linked to but instead should refer to the concept referenced.


         Table 6. Component inactivation set types for concepts (modified from [11, p506-507])
 CIRS value       Concept status
 Duplicate        inactive because it has the same meaning as another concept.
 Outdated         inactive because it is an outdated concept that is no longer used.
 Ambiguous        inactive because it is inherently ambiguous, either because of an incomplete FSN or
                  because it has several associated terms that are not regarded as synonymous or partially
                  synonymous.
 Erroneous        inactive because it contains an error.
 Limited          active prior to Jan 2010, inactive since then because of unstable meaning within
                  SNOMED CT.
 Moved to         inactive because moved to another namespace.
 Pending move     active but in the process of being moved to another namespace.


     Interestingly, the very same concepts can not only appear as source concept in one
HARS member and as target concept in another HARS member, but also appear in
members of distinct HARSs. This allows the computation of association networks of
concepts by randomly selecting a concept from a HARS member, recursively
collecting all reference set members in which this concept appears, and then
processing each newly found concept in the same way until no more concepts can be found.
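The procedure just described amounts to computing the connected component of the chosen concept in an undirected graph whose edges are the reference set members, regardless of which HARS they belong to. A minimal Python sketch with hypothetical members (the identifiers `A` to `F` and the member list are made up); in practice the start concept could be drawn with `random.choice` from the sources of all members.

```python
from collections import defaultdict, deque

# Hypothetical reference set members: (HARS name, source concept, target concept).
MEMBERS = [
    ("SAME AS", "A", "B"),
    ("REPLACED BY", "B", "C"),
    ("POSSIBLY EQUIVALENT TO", "D", "C"),
    ("WAS A", "E", "F"),
]


def association_network(start, members):
    """Concepts and members reachable from `start` via members of any HARS."""
    by_concept = defaultdict(list)
    for member in members:
        _, source, target = member
        by_concept[source].append(member)
        by_concept[target].append(member)
    concepts, found, queue = {start}, set(), deque([start])
    while queue:  # breadth-first traversal, ignoring association direction
        for member in by_concept[queue.popleft()]:
            found.add(member)
            for concept in member[1:]:
                if concept not in concepts:
                    concepts.add(concept)
                    queue.append(concept)
    return concepts, found


concepts, found = association_network("A", MEMBERS)
print(sorted(concepts), len(found))  # ['A', 'B', 'C', 'D'] 3
```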


5. Discussion: towards process profiles for changes in SNOMED CT components

As instances of ICE, and thus as continuants, components have a history, i.e. an occurrent
process in which they participate for the entire time of their existence. This is
comparable to the history of an organism, i.e. the process in which an organism
participates for the entire temporal period during which it exists. For organisms, there
is a process of shorter duration which can be qualified as life: the process in which the
organism participates for the entire time it is alive and which is an occurrent-part [3] of
the organism’s history. In a similar sense, a component can be perceived as being alive
or dead when it is declared active or inactive, respectively. Furthermore, depending on
the type of component, it can be alive or dead in different ways. While a concept is
active, it can be ‘fully’ alive or, when it is marked for a pending move, ‘dying’ (Table
6). Prior to 2010, it could also be alive in a ‘limited’ way.
     In [24], process profiles were
identified as something that is not numerically but qualitatively ‘the same’ in distinct
processes. An example is the ‘same’ temperature change of two rocks in our aquarium when
the water temperature changes. Each of these processes has as part an instance of a
quality process profile of exactly the same (determinate) type, i.e. ‘that part of a
process which serves as the target of selective abstraction focused on a sequence of
instances of determinate temperature qualities’ [24]. It is speculated in [24] that the
theory of process profiles can be applied not merely to quantitative information
artifacts but also to other sorts of symbolic representations of processes. This is what
we attempt here with respect to the changes that occur in SNOMED CT components,
including memberships in HARSs and CIRSs. Although SNOMED CT’s RF2 format is
more coherent than its predecessor at the syntactic level, it requires more restructuring
of the data to arrive at a uniform view of what changed in relation to a specific concept,
and from there to infer what might have happened on the side of the corresponding PoR
(if there is one).
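The sketch below illustrates the kind of restructuring meant here. For each component, it rebuilds the component's state at every release from RF2 full-release rows. It then encodes one feature as a string with one character per version, of the sort used in Table 7. The sketch assumes the tab-separated RF2 concept file layout (`id`, `effectiveTime`, `active`, `moduleId`, `definitionStatusId`). It also assumes the usual rule that a component's state in a given release is its row with the latest `effectiveTime` not after that release date. The release dates, the concept id `100` and the helper names are hypothetical.

```python
import bisect
import csv
import io
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, TextIO

Row = Dict[str, str]

PRIMITIVE = "900000000000074008"  # |Primitive| definition status
DEFINED = "900000000000073002"    # |Defined| definition status


def read_rf2(stream: TextIO) -> List[Row]:
    """Read a tab-separated RF2 file whose first line is the column header."""
    return list(csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE))


def states_per_release(
    rows: Iterable[Row], releases: List[str]
) -> Dict[str, List[Optional[Row]]]:
    """For every component id, return its row as of each release date (None if absent)."""
    history: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        history[row["id"]].append(row)
    result: Dict[str, List[Optional[Row]]] = {}
    for cid, rows_of_cid in history.items():
        rows_of_cid.sort(key=lambda r: r["effectiveTime"])  # YYYYMMDD sorts as text
        times = [r["effectiveTime"] for r in rows_of_cid]
        snapshot: List[Optional[Row]] = []
        for release in releases:
            # n = number of rows with effectiveTime <= release; the last one is current.
            n = bisect.bisect_right(times, release)
            snapshot.append(rows_of_cid[n - 1] if n else None)
        result[cid] = snapshot
    return result


def ppr(states: List[Optional[Row]], encode: Callable[[Row], str]) -> str:
    """One character per release; '_' where the component has no row yet."""
    return "".join("_" if s is None else encode(s) for s in states)


def definition_status_char(row: Row) -> str:
    return {PRIMITIVE: "P", DEFINED: "D"}.get(row["definitionStatusId"], "?")


# Usage with a hypothetical concept (core module) and hypothetical release dates.
sample = io.StringIO(
    "id\teffectiveTime\tactive\tmoduleId\tdefinitionStatusId\n"
    f"100\t20050131\t1\t900000000000207008\t{PRIMITIVE}\n"
    f"100\t20100131\t1\t900000000000207008\t{DEFINED}\n"
)
releases = ["20020131", "20050131", "20080131", "20100131", "20160131"]
states = states_per_release(read_rf2(sample), releases)
print(ppr(states["100"], definition_status_char))  # _PPDD
```

Other features, such as the `active` flag, only need a different `encode` function, which is what makes the representation (nearly) uniform.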
     Table 7 uses five concepts (C1 to C5) as examples of how to construct process
profile representations (PPRs) in a (nearly) uniform way for the various sorts of
changes that, from this perspective, the concepts underwent. Each PPR consists of 29
characters, one per version, each representing the status of some quality-like
feature that can be ascribed to the concept. The column ‘Attr.’ represents those features
at a level on a par with ‘temperature’, ‘color’, etc. For the rows with a neutral
background, the combination of what appears in the ‘Attr.’ and ‘Value’ columns
represents those features at the most determinate level that we were able to measure,
comparable to ‘37.2 centigrade temperature’. For instance, ‘FSN’ combined with ‘T-367’
means that the term ‘General symptom NOS (finding)’, the 367th term (randomly numbered)
out of 999,639 terms, was ‘measured’ as the determinate value for ‘FSN’ (since we used an
FSN-thermometer, not a Synonym-thermometer). The table shows that this quality-like
feature was found to inhere in the concepts C1 through C4. As can be determined from
the respective PPRs, the histories of C1, C2 and C3 all share an occurrent-part which
instantiates the same most-determinate process profile universal, and they do so at the
same time (starting from the 14th version). ‘A’ in this case stands for ‘active’, while
‘_’ means that at the respective time there is no instance of the quality-like feature
inhering in the concept. C4, in contrast, exhibits a different PPR for this feature, one
that starts in the 9th version. For the rows with a grey background (the Dstatus and
Reason rows), the value in ‘Value’ does not correspond to a measurement at a specific
point in time, but to a most-determinate PPR type itself. ‘DSP-05’, for instance, is
one of 34 most-determinate PPR types for the quality-like feature ‘Dstatus’
(definitional status). It is C1, C2, and C3 that exhibit an instance of this type.
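Once PPRs are available as strings, the comparisons made in the text reduce to simple string operations. The sketch below uses hypothetical function names and does three things:

- finds the version in which a feature first appears;
- groups concepts whose PPR for the same attribute and value is identical, i.e. that exhibit the same most-determinate profile at the same time;
- numbers distinct PPRs as profile types.

The sample strings only mimic the FSN example above: 29 versions, starting in the 14th version for C1 to C3 and in the 9th for C4. The generated labels are assigned in order of first appearance and are not the DSP or CIP numbers used in Table 7.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (concept, attribute, value), a hypothetical layout


def start_version(ppr: str) -> int:
    """1-based number of the first version in which the feature is present (0 if never)."""
    for version, ch in enumerate(ppr, start=1):
        if ch != "_":
            return version
    return 0


def shared_profiles(pprs: Dict[Key, str]) -> Dict[Tuple[str, str, str], List[str]]:
    """Group concepts having the identical PPR for the same attribute and value."""
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for (concept, attr, value), profile in pprs.items():
        groups[(attr, value, profile)].append(concept)
    # Only groups with more than one concept show a shared profile.
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


def label_profile_types(pprs: List[str], prefix: str) -> Dict[str, str]:
    """Give each distinct PPR a type label such as 'DSP-01', in order of first appearance."""
    labels: Dict[str, str] = {}
    for profile in pprs:
        if profile not in labels:
            labels[profile] = f"{prefix}-{len(labels) + 1:02d}"
    return labels


# Usage: strings mimicking the FSN example in the text (29 versions).
late, early = "_" * 13 + "A" * 16, "_" * 8 + "A" * 21
fsn = {
    ("C1", "FSN", "T-367"): late,
    ("C2", "FSN", "T-367"): late,
    ("C3", "FSN", "T-367"): late,
    ("C4", "FSN", "T-367"): early,
}
for (attr, value, profile), concepts in shared_profiles(fsn).items():
    print(attr, value, concepts, "from version", start_version(profile))
# FSN T-367 ['C1', 'C2', 'C3'] from version 14
print(start_version(early))  # 9
print(label_profile_types(["PPPD", "PPPP", "PPPD"], "DSP"))  # {'PPPD': 'DSP-01', 'PPPP': 'DSP-02'}
```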


6. Related work

Computer scientists and logicians have developed a number of theoretical
approaches to deal with logical changes in description-logic-based ontologies. For
instance, [25] proposes a model-theoretic semantics for ontology versioning, based on
first-order logic, that can be applied to ontologies expressed in RDF and OWL.
[26] reports on the development of a Multi-version Ontology REasoner (MORE) based
on using temporal logics to perform reasoning across multiple versions of ontologies.
MORE was tested on small ontologies in two different domains. In [27], a change
detection approach for OWL based on a logical change definition language and
temporal logic is proposed. [28] presents a tool for tracking and visualizing differences
between two versions of an ontology. [29] describes an interactive tool for visualizing
and exploring ontology changes that offers both overview and concept-based analyses.
     In [30], the rate of change in SNOMED CT was characterized and quantified from
2002 to 2005. Most changes were found to occur among relationships, in particular
among subsumption relationships, and the author concluded that implementers must
‘carefully examine mechanisms for handling this degree of change’. [31] examined
changes in SNOMED CT over three years, as recorded in the Component History and
Concept Model, focusing on the subset of concepts in the NLM CORE Problem List. It
identified four types of changes, present in over 40% of the target concepts over the
studied timespan, that are likely to impact health recordkeeping. In [32], an approach
is presented to identify idiosyncrasies such as relation reversals (a particularly dramatic
type of structural change) in the evolution of SNOMED CT, finding 48 such reversals
since 2009. [33] demonstrates how changes between two SNOMED CT versions
affected a majority of concepts used in a legacy mapped interface terminology,
including unexpected effects of structural changes in SNOMED CT, and argues for a
consideration of impact on such implementations as part of terminology development.
Motivated by [33], [12] presents indicators that can be computed to assess whether an
upgrade from one version to the next would be worth the effort.


  Table 7. Uniform representation of changes in SNOMED CT components using process quality profiles.
 S    FSN      Attr.     Value    Value label                    Process Profile Representation (PPR)
 C1   GS NOS   Dstatus   DSP-05                                  DDDDDDDDDDDDDDDDDDDDDDDDDDDDD
 C1   GS NOS   Reason    CIP-15                                  DDDDDDDDDDDDDDDDLLLLLLLLLLLLL
 C1   GS NOS   FSN       T-367    GS NOS (finding)               _____________AAAAAAAAAAAAAAAA
 C1   GS NOS   Same-as   C4       GS NOS (finding)               _AAAAAAAAAAAAAAAAAAAAAAAAAAAA
 C1   GS NOS   Was-a     C5       GS (finding)                   ________________AAAAAAAAAAAAA
 C2   GS NOS   Dstatus   DSP-05                                  DDDDDDDDDDDDDDDDDDDDDDDDDDDDD
 C2   GS NOS   Reason    CIP-15                                  DDDDDDDDDDDDDDDDLLLLLLLLLLLLL
 C2   GS NOS   FSN       T-367    GS NOS (finding)               _____________AAAAAAAAAAAAAAAA
 C2   GS NOS   Same-as   C4       GS NOS (finding)               _AAAAAAAAAAAAAAAAAAAAAAAAAAAA
 C2   GS NOS   Was-a     C5       GS (finding)                   ________________AAAAAAAAAAAAA
 C3   GS NOS   Dstatus   DSP-05                                  DDDDDDDDDDDDDDDDDDDDDDDDDDDDD
 C3   GS NOS   Reason    CIP-15                                  DDDDDDDDDDDDDDDDLLLLLLLLLLLLL
 C3   GS NOS   FSN       T-367    GS NOS (finding)               _____________AAAAAAAAAAAAAAAA
 C3   GS NOS   Same-as   C4       GS NOS (finding)               _AAAAAAAAAAAAAAAAAAAAAAAAAAAA
 C3   GS NOS   Was-a     C5       GS (finding)                   ________________AAAAAAAAAAAAA
 C4   GS NOS   Dstatus   DSP-20                                  PPPPPPPPPPPPPPPPDDDDDDDDDDDDD
 C4   GS NOS   Reason    CIP-18                                  LLLLLLLLLLLLLLLLLLLLLLLLLLLLL
 C4   GS NOS   FSN       T-367    GS NOS (finding)               ________AAAAAAAAAAAAAAAAAAAAA
 C4   GS NOS   FSN       T-258    GS NOS (cont-dep. category)    AAAAAAAADDDDDDDDDDDDDDDDDDDDD
 C4   GS NOS   Is a      C5       GS (finding)                   AAAAAAAAAAAAAAAADDDDDDDDDDDDD
 C4   GS NOS   Same-as   C1       GS NOS (finding)               ________________AAAAAAAAAAAAA
 C4   GS NOS   Same-as   C2       GS NOS (finding)               ________________AAAAAAAAAAAAA
 C4   GS NOS   Same-as   C3       GS NOS (finding)               ________________AAAAAAAAAAAAA
 C4   GS NOS   Was-a     C5       GS (finding)                   ________________AAAAAAAAAAAAA
 C5   GS       Dstatus   DSP-03                                  PPPPPPPPPPPPPPPPPPPPPPPPPPPPP
 C5   GS       FSN       T-368    GS (finding)                   ________AAAAAAAAAAAAAAAAAAAAA
 C5   GS       FSN       T-277    GSs (cont-dep. category)       AAAAAAAADDDDDDDDDDDDDDDDDDDDD
Legend: ‘S’=source concept. Concept identifiers were abbreviated for space reasons: C1=139169008,
C2=139174000, C3=161914002, C4=161919007, C5=267022002. FSNs of concepts are abbreviated to ‘GS’
for ‘General symptom’. Dstatus=concept definition status. DSP=definition status profile, CIP=concept
inactivation profile. T=term. Individual characters in a PPR are abbreviations of SNOMED CT properties:
‘A’=active, ‘L’=limited value, ‘P’=primitive, ‘D’=defined, ‘_’=no value present.

7. Conclusion

Many efforts have been made to measure the amount and type of changes occurring
between SNOMED CT versions. To the best of our knowledge, a method based on the
representation of process profiles has not been attempted so far. The results of our
exploration are promising, although more work towards further harmonization is
required. In any case, when in 2011 we asked whether, with RF2, SNOMED CT’s future
is bright [14], we were not able to answer the question. Now we believe we can: when
complemented with an approach such as the one proposed here, it is! We strongly
recommend that any ontology be distributed using such an improved RF2 format, or a
semantically equivalent one along the lines described here, since without such
mechanisms data annotated in terms of previous versions lose value dramatically.


Acknowledgments

This work was supported in part by Clinical and Translational Science Award NIH 1
UL1 TR001412-01 from the National Institutes of Health, by grant R21LM009824
from the National Library of Medicine (NLM), and by grant 1R01DE021917-01A1
from the National Institute of Dental and Craniofacial Research (NIDCR). The content
of this paper is solely the responsibility of the authors and does not necessarily
represent the official views of the NIDCR, the NLM or the National Institutes of Health.


References

[1] F. Fonseca, “The Double Role of Ontologies in Information Science Research,” Journal of the
     American Society for Information Science and Technology, vol. 58, no. 6, pp. 786-793, 2007.
[2] B. Smith, and W. Ceusters, “Ontological realism: A methodology for coordinated evolution of scientific
     ontologies,” Applied Ontology, vol. 5, no. 3-4, pp. 139-188, 2010.
[3] R. Arp, B. Smith, and A. D. Spear, "Building ontologies with basic formal ontology," The MIT Press,
     2015.
[4] W. Ceusters, and B. Smith, "A Realism-Based Approach to the Evolution of Biomedical Ontologies,"
     Biomedical and Health Informatics: Proceedings of the 2006 AMIA Annual Symposium, pp. 121-125,
     Washington DC: American Medical Informatics Association, 2006.
[5] W. Ceusters, “Applying Evolutionary Terminology Auditing to SNOMED CT,” AMIA Annu Symp Proc,
     vol. 2010, pp. 96-100, 2010.
[6] W. Ceusters, K. A. Spackman, and B. Smith, "Would SNOMED CT benefit from Realism-Based
     Ontology Evolution?," in J. M. Teich, J. Suermondt, and G. Hripcsak (eds.), American Medical
     Informatics Association 2007 Annual Symposium Proceedings, Biomedical and Health Informatics:
     From Foundations to Applications to Policy, Chicago IL, 2007, pp. 105-109.
[7] K. Donnelly, "SNOMED CT: The Advanced Terminology and Coding System for eHealth," Studies in
     Health Technology and Informatics - Medical and Care Compunetics 3, vol. 121, L. Bos, L. Roa,
     K. Yogesan et al., eds., pp. 279-290, Amsterdam: IOS Press, 2006.
[8] W. Ceusters, “Applying Evolutionary Terminology Auditing to the Gene Ontology,” Journal of
     Biomedical Informatics; Special Issue of the Journal of Biomedical Informatics on Auditing of
     Terminologies, vol. 42, no. 3, pp. 518-529, 2009.
[9] S. Seppälä, B. Smith, and W. Ceusters, "Applying the Realism-Based Ontology-Versioning Method for
     Tracking Changes in the Basic Formal Ontology," Formal Ontology in Information Systems, Frontiers
     in Artificial Intelligence and Applications, P. Garbacz and O. Kutz, eds., pp. 227-240, 2014.
[10] B. Smith, and W. Ceusters, “Aboutness: Towards Foundations for the Information Artifact Ontology,”
     in International Conference on Biomedical Ontology, Lisbon, Portugal, 2015, pp. 47-51.
[11] IHTSDO, "International Health Terminology Standards Development Organization - SNOMED CT®
     Technical Implementation Guide - January 2015 International Release (US English)," 2015, p. 757.
[12] W. Ceusters, “SNOMED CT revisions and coded data repositories: when to upgrade?,” AMIA Annu
     Symp Proc, vol. 2011, pp. 197-206, 2011.
[13] S. Schulz, B. Suntisrivaraporn, and F. Baader, “SNOMED CT's problem list: ontologists' and logicians'
     therapy suggestions,” Stud Health Technol Inform, vol. 129, no. Pt 1, pp. 802-6, 2007.
[14] W. Ceusters, “SNOMED CT's RF2: Is the future bright?,” Stud Health Technol Inform, vol. 169, pp.
     829-33, 2011.
[15] B. Smith, "Beyond concepts: ontology as reality representation," Proceedings of the third international
     conference on formal ontology in information systems (FOIS 2004), pp. 73-84, Amsterdam: IOS Press,
     2004.
[16] S. Schulz, and R. Cornet, "SNOMED CT's Ontological Commitment," ICBO: International Conference
     on Biomedical Ontology, B. Smith, ed., pp. 55-58, Buffalo NY: National Center for Ontological
     Research, 2009.
[17] S. Schulz, A. Rector, J. M. Rodrigues et al., “Competing interpretations of disorder codes in SNOMED
     CT and ICD,” AMIA Annu Symp Proc, vol. 2012, pp. 819-27, 2012.
[18] W. Ceusters, “An information artifact ontology perspective on data collections and associated
     representational artifacts,” Stud Health Technol Inform, vol. 180, pp. 68-72, 2012.
[19] B. Smith, T. Malyuta, R. Rudnicki et al., "IAO-Intel: An Ontology of Information Artifacts in the
     Intelligence Domain," CEUR Workshop Proceedings, pp. 33-40.
[20] W. Ceusters, and B. Smith, “Foundations for a realist ontology of mental disease,” Journal of
     Biomedical Semantics, vol. 1, no. 10, pp. 1-23, 2010.
[21] B. Smith, W. Kusnierczyk, D. Schober et al., "Towards a Reference Terminology for Ontology
     Research and Development in the Biomedical Domain," KR-MED 2006, Biomedical Ontology in
     Action, Baltimore MD, USA, 2006.
[22] R. S. Gonçalves, B. Parsia, and U. Sattler, “Facilitating the analysis of ontology differences,” in Joint
     workshop on knowledge evolution and ontology dynamics (EvoDyn) 2011, pp. 20-35.
[23] B. Smith, "Against Fantology," Experience and Analysis, M. E. Reicher and J. C. Marek, eds., pp. 153-
     170, Wien, 2005.
[24] B. Smith, “Classifying Processes: An Essay in Applied Ontology,” Ratio (Oxf), vol. 25, no. 4, pp. 463-
     488, Dec 1, 2012.
[25] J. Heflin, and Z. Pan, "A model theoretic semantics for ontology versioning," Lecture Notes in
     Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
     Bioinformatics), 2004, pp. 62-76.
[26] Z. Huang, and H. Stuckenschmidt, "Reasoning with multi-version ontologies: A temporal logic
     approach," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial
     Intelligence and Lecture Notes in Bioinformatics), 2005, pp. 398-412.
[27] P. Plessers, O. De Troyer, and S. Casteleyn, “Understanding ontology evolution: A change detection
     approach,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 5, no. 1,
     pp. 39-49, 2007.
[28] N. F. Noy, S. Kunnatur, M. Klein et al., "Tracking Changes during Ontology Evolution," Lecture Notes
     in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
     Bioinformatics), 2004, pp. 259-273.
[29] M. Hartung, T. Kirsten, A. Gross et al., “OnEX: Exploring changes in life science ontologies,” BMC
     Bioinformatics, vol. 10, pp. 250, 2009.
[30] K. A. Spackman, “Rates of Change in a Large Clinical Terminology: Three Years Experience with
     SNOMED Clinical Terms,” AMIA Annual Symposium Proceedings, vol. 2005, pp. 714-718, 2005.
[31] D. Lee, R. Cornet, and F. Lau, “Implications of SNOMED CT versioning,” International Journal of
     Medical Informatics, vol. 80, no. 6, pp. 442-453, 2011.
[32] S. Tao, L. Cui, W. Zhu et al., “Mining Relation Reversals in the Evolution of SNOMED CT Using
     MapReduce,” AMIA Summits on Translational Science Proceedings, vol. 2015, pp. 46-50, 2015.
[33] G. Wade, and S. T. Rosenbloom, “The impact of SNOMED CT revisions on a mapped interface
     terminology: Terminology development and implementation issues,” Journal of Biomedical Informatics,
     vol. 42, no. 3, pp. 490-493, 2009.