=Paper= {{Paper |id=Vol-2137/paper_12.pdf |storemode=property |title=Scrutinizing the Axiomatic Basis of SNOMED CT: How Confused is it by the Ambiguous Terminology Paradigm? |pdfUrl=https://ceur-ws.org/Vol-2137/paper_12.pdf |volume=Vol-2137 |authors=Jean Marie Rodrigues,Stefan Schulz,Alan L. Rector |dblpUrl=https://dblp.org/rec/conf/icbo/Rodrigues0R17 }} ==Scrutinizing the Axiomatic Basis of SNOMED CT: How Confused is it by the Ambiguous Terminology Paradigm?== https://ceur-ws.org/Vol-2137/paper_12.pdf
           Scrutinizing the axiomatic basis of SNOMED CT:
      How confused is it by the ambiguous terminology paradigm?
Jean-Marie Rodrigues1,2*, Stefan Schulz3 and Alan Rector4
1 INSERM LIMICS UPMC UP 13 Paris, France
2 University of Saint Etienne, CHU, Department of Public Health and Medical Informatics, Saint Etienne, France
3 Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria
4 University of Manchester, UK

* To whom correspondence should be addressed: rodrigue@univ-st-etienne.fr

ABSTRACT
SNOMED CT, the world’s largest clinical terminology, introduces itself as “a terminological resource which consists of codes representing meanings expressed as terms, with interrelationships between the codes to provide enhanced representation of the meanings.” On the one hand, concepts are linked to lexical entities (terms), including Fully Specified Names, Preferred Terms, and Synonyms. On the other hand, SNOMED CT concepts are described and defined by expressions following a formalism called Compositional Grammar (CG), according to which SNOMED CT might be considered a formal ontology. We investigate whether the ambiguity in the terms, which are formulated according to lexical and linguistic principles, hampers the quality of the formal concept model using DL semantics, and we propose a more autonomous development process for formal concept definitions.

1    INTRODUCTION
SNOMED CT [1], a clinical terminology standard with about 300,000 representational units, is presented as a terminological resource linked to description logics expressions [1]. We can therefore consider SNOMED CT as both
• A terminology – as constituted by concepts (entities of lexical meaning) and related terms of different types (Fully Specified Names, Preferred Terms, and Synonyms, obeying several naming conventions).
• A formal ontology constituted by classes, individuals and formal relations expressed as axioms in “Compositional Grammar” equivalent to EL++/OWL-EL – what SNOMED call the “concept model”. As such, the consistency of the SNOMED CT concept model can be checked by description logics reasoners.
It is critical that the concepts referred to by linguistic expressions used in electronic health records are accurately aligned with the underlying axiomatic representation of those concepts. Recent works on the harmonization between a subset of SNOMED CT and a pre-final version of ICD-11 have highlighted significant modelling issues. In more than one third of cases, the SNOMED CT axiomatic expressions did not align well with the intuitive meaning derived from their Fully Specified Names or synonyms, when lexically mapped to ICD-11 classes [2].
This paper investigates the hypothesis that, in the process of building and maintaining SNOMED CT, the correctness of the axiomatic expressions is affected when SNOMED CT curators are led preferentially by language.
We first analyse the external inconsistencies between the axiomatic descriptions and definitions of SNOMED CT concepts on the one hand and the corresponding ICD-11 classes on the other. Thereafter, we investigate inconsistencies within SNOMED CT and their relation to ambiguities in typical clinical interface terms. As a conclusion, we recommend that the axiomatic underpinning of SNOMED CT should be developed autonomously from the lexical entities/terms, and that the linkage of terms for concepts to the axiomatic descriptions of those concepts be done after the axiomatic model of the concepts is consolidated.

2   MATERIAL AND METHODS
SNOMED CT’s representational units, called concepts, are linked to clinical terms (so-called “descriptions”) in several languages. Terms are of several types, including Fully Specified Names (FSNs), Preferred Terms (PTs), and Synonyms. SNOMED CT concepts are also formally described by expressions following a language called Compositional Grammar (CG) [3], which can be interpreted according to description logic (DL) semantics. In the following example, Fracture of tibia is fully defined as being equivalent to Injury of tibia and Fracture of lower leg, with Associated morphology Fracture and Finding site Bone structure of tibia. Its rendering in CG and the Description Logics Manchester Syntax is shown below (class symbols are set in Italics and relation symbols are in Bold):

    31978002 |Fracture of tibia (disorder)|
    === 428881005 |Injury of tibia (disorder)| +
        414292006 |Fracture of lower leg (disorder)| :
    { 363698007 |Finding site (attribute)| =
          12611008 |Bone structure of tibia (body structure)|,
      116676008 |Associated morphology (attribute)| =
          72704001 |Fracture (morphologic abnormality)| }

    ‘Fracture of tibia’ equivalentTo
        ‘Injury of tibia (disorder)’ and
        ‘Fracture of lower leg (disorder)’ and
        RoleGroup some
            ((‘Finding site (attribute)’ some
              ‘Bone structure of tibia (body structure)’) and
             (‘Associated morphology (attribute)’ some
              ‘Fracture (morphologic abnormality)’))

Table 1. SNOMED CT definitions in Compositional Grammar (above) and OWL Manchester Syntax (below)

CG supports logic-based compositional expressions in order to maximise the coverage of utterances in clinical records, without requiring the terminology to meet users’ demands by continuous creation of new concepts. The latter is known as pre-coordination. An example for a pre-coordinated concept is “right hand”, which has the code 78791008 |Structure of right hand (body structure)|. In contrast, there is no code for “right thumb”, but its meaning is expressible by post-coordination, viz. by the CG expression 76505004 |Thumb structure (body structure)|: 272741003 |Laterality (attribute)| = 24028007 |Right (qualifier value)|, corresponding to the OWL expression: ‘Thumb structure (body structure)’ and ‘Laterality (attribute)’ some ‘Right (qualifier value)’.

ICD – the International Classification of Diseases and Related Health Problems – is promoted by WHO as “the standard diagnostic tool for causes of death, epidemiology, health management and clinical purposes”. However, it is particularly focused on the analysis of the health of population groups, and is used to monitor the incidence and prevalence of diseases and other health problems. The ongoing 11th revision (ICD-11), named ICD-11-MMS (Mortality and Morbidity Statistics), is planned to be finalized in 2018. ICD has recently been characterized as an “aggregation terminology” [2]. This terminology genre typically contains rules that enforce the principle of single hierarchies and disjoint classes. Partitioning ICD-11 into non-overlapping chapters requires exclusion rules at all hierarchical levels. E.g., the chapter “circulatory system” excludes infections, neoplasms, endocrine and congenital diseases called “developmental”, which have their own chapters. Making ICD exhaustive requires residual classes (“other specified”, “other unspecified”), indicated by codes ending in “Y” or “Z”. These residuals have no meaning outside the ICD hierarchy.
The current study is limited to 428 classes from ICD-11, as displayed by the WHO browser [5], covering the circulatory system, and 522 classes covering the digestive system. We exclude ICD-11 residuals because they are meaningless outside ICD. The resulting totals are 206 in the circulatory chapter and 250 in the digestive chapter (see Table 4).

In a first step, we compared the Compositional Grammar (CG) expressions of lexically mapped ICD-11 classes and SNOMED CT concepts using the WHO and IHTSDO/SNOMED Browsers [4][5]. As explained in [6], the lexical map is based on ICD-11 class names and SNOMED CT FSNs or synonyms. In a second step, we checked whether the CG expressions of SNOMED CT concepts lexically mapped to a single ICD-11 class constituted a fully equivalent representation of the ICD-11 class.
The details are developed below and summarized in Figure 1 and Table 2.
We introduce the following symbols for the mapping types: M (refined by M1 and M2), A (refined by A1 and A2), P and Z. We consider the mapping of a SNOMED CT concept SCi, described by terms STi{1…n}, to an ICD class ICi, described by a name ITi.

Lexical map
The following rules apply for the lexical maps:
• If there is a full lexical map between the ICD-11 class name ITi and one SNOMED CT description STi{1…n}, considered as pre-coordinated in SNOMED CT, it is classified as M (for lexical Map) type.
• If there is no lexical map between any ITi and STik, but mapping can be achieved to the post-coordination of two or more descriptions STi{1…n} of SCk, it is classified as A (for Addition map) type.
• If only a part of ITi of ICi can be lexically mapped to any STik, it is classified as P (for Partial) type.
• Finally, if not even a partial lexical mapping between any ITi of ICi and STik is possible, it is classified as Z (for Zero) type.

Match of meaning
Subsequently, the defining and constraining axioms of the CG expressions of one or more SCi were analysed to check whether they correspond to the totality of the textual definition and to the hierarchy inheritance of ICi. The following cases are distinguished:
• M (lexical map) type:
    1. If this expression fully represents the meaning of ICi, a complete match of meaning is assumed and the classification is refined to M1.
    2. If this expression does not fully represent the meaning of ICi, a new expression is produced according to CG and the classification is refined to M2.
• A (addition map) type:
    1. If these expressions fully represent the meaning of ICi, a complete match of meaning is assumed and the classification is refined to A1.
    2. If these expressions do not fully represent the meaning of ICi, a new expression is produced according to CG and the classification is refined to A2.
• P type:
    For ICi it is then necessary to create a logical representation based on one existing CG expression plus an extended de novo CG expression.
• Z type:
    For this ICi it is necessary to create a logical expression in accordance with SNOMED CT CG.

In the following, only M and A types will be analysed.
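The map and meaning match types defined in this section amount to a small decision table. The following is a minimal sketch of that table in Python; the names `LexicalMap` and `map_type` are hypothetical and not part of the study, in which both the lexical map and the meaning match were judged manually.

```python
from enum import Enum


class LexicalMap(Enum):
    """Outcome of the lexical mapping step (hypothetical names)."""
    FULL_SINGLE = "full map to one pre-coordinated description"
    FULL_COMPOSED = "full map to a post-coordination of descriptions"
    PARTIAL = "only part of the ICD-11 class name maps"
    NONE = "no lexical map at all"


def map_type(lexical, meaning_fully_matched=None):
    """Return the map and meaning match type: M1, M2, A1, A2, P or Z."""
    if lexical is LexicalMap.PARTIAL:
        return "P"
    if lexical is LexicalMap.NONE:
        return "Z"
    if meaning_fully_matched is None:
        # M and A are only refined once the meaning match has been judged.
        raise ValueError("M and A types need a meaning match judgement")
    letter = "M" if lexical is LexicalMap.FULL_SINGLE else "A"
    return letter + ("1" if meaning_fully_matched else "2")


print(map_type(LexicalMap.FULL_SINGLE, True))     # M1
print(map_type(LexicalMap.FULL_COMPOSED, False))  # A2
print(map_type(LexicalMap.NONE))                  # Z
```

The sketch only encodes the order of the rules: the lexical step decides between M, A, P and Z, and the meaning match refines M and A.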





Fig. 1. ICD-11 SNOMED CT semantic alignment principle

Lexical map and meaning match   Action                           Compositional grammar

Lexical map and full            Take the representation          The existing pre-coordinated
meaning match (M1)              expression of the SNOMED CT      inferred expression of the
                                concept                          SNOMED CT concept

Lexical map and no full         Take the representation          Modify the existing pre-coordinated
meaning match (M2)              expression of the SNOMED CT      inferred expression of the
                                concept                          SNOMED CT concept

Post-coordinated lexical map    Take the representation of two   Post-coordination of two or more
possible and full meaning       or more pre-coordinated          pre-coordinated existing inferred
match (A1)                      existing representations of      expressions of SNOMED CT concepts
                                SNOMED CT concepts

Post-coordinated lexical map    Take the representation of two   Post-coordination and modification
possible but no full meaning    or more pre-coordinated          of two or more pre-coordinated
match (A2)                      existing representations of      existing inferred expressions of
                                SNOMED CT concepts               SNOMED CT concepts

Partial lexical map (P)         Take the representation of one   One pre-coordinated existing
                                pre-coordinated existing         inferred expression of a SNOMED CT
                                representation of a SNOMED CT    concept plus an extended de novo
                                concept                          CG expression

No lexical map (Z)              Create a logical CG expression   A new logical CG expression

Table 2. The lexical map types and meaning matches between the ICD-11 MMS classes and SNOMED CT formal expressions

We did not consider the current pre-final version of ICD-11 as a gold standard. Therefore, the total or partial omission of a SNOMED CT concept that seemed necessary to ICD-11 was not considered an issue, and these cases were omitted. Neither did we assess the clinical consistency of ICD-11’s textual definitions. We assessed only how well the existing CG expression(s) represented the ICD-11 class textual definitions when the ICD-11 class names had been lexically mapped to SNOMED CT terms or to minimally adapted SNOMED CT concept terms. We conformed to the assumptions, rules, and standards of the SNOMED CT concept model whenever we had to extend the representation (types M2 and A2). Two knowledge engineering master students did the work, one each for the circulatory and digestive chapters. The same senior ICD-11 and SNOMED CT expert supervised both.

Map and meaning match types                 ICD11 Circ.   Rate   ICD11 Digestive   Rate
                                            count         (%)    count             (%)
M1                                          209           51     251               53
M2                                          123           30     125               26
A1                                           17            4      23                5
A2                                           15            3      25                5
P                                            44           11      45                9
Z                                             4            1       9                2
Total (M + A + P + Z) “complete chapter”    412           68     478               66
Other and unspecified number of codes       197           32     250               34
Total number of codes                       609          100     728              100

Table 3. Numbers of codes in the Circulatory chapter and Digestive chapter, from ICD 11 MMS 2017 to SNOMED CT 31 January 2017 release by map and meaning match types

3        RESULTS
Table 3 provides an overview of the results. The two most frequent lexical map types are M (M1 plus M2), for a full lexical map with a pre-coordinated SNOMED CT concept, and A (A1 plus A2), for a full lexical map with more than one post-coordinated SNOMED CT concept: 78 % for the circulatory chapter and 89% for the digestive chapter. The most frequent type is M1 for both. The least frequent type is Z, for no possible lexical map, for the circulatory chapter (1%) and for the digestive chapter (2%). These differences can be explained by inter-rater differences (the work was done by two different knowledge engineering master students supervised by the same senior terminology expert) or by quality differences between these two chapters either in WHO ICD-11 or in SNOMED CT or in both.

Map and meaning   ICD11 Circ.    ICD11 Circ.          ICD11 Digestive   ICD11 Digestive
match types       system total   system primitives    system total      system primitives
M1                209            44 (21%)             251               58 (23%)
M2                123            112 (91%)            125               105 (84%)
A1                 17            6 (35%)               23               11 (47%)
A2                 15            8 (53%)               25               13 (52%)

Table 4. Primitive SNOMED CT concepts by map and meaning match types
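The primitive rates in Table 4 follow directly from the counts. A minimal sketch for recomputing them is given here; the counts are copied from Table 4, while the names `TABLE_4` and `primitive_rate` are hypothetical. Depending on the rounding convention, a recomputed value may differ by one point from the printed percentage.

```python
# Counts copied from Table 4: (total concepts, primitive concepts).
TABLE_4 = {
    "circulatory": {"M1": (209, 44), "M2": (123, 112), "A1": (17, 6), "A2": (15, 8)},
    "digestive": {"M1": (251, 58), "M2": (125, 105), "A1": (23, 11), "A2": (25, 13)},
}


def primitive_rate(total, primitives):
    """Share of primitive concepts among the mapped concepts, in percent."""
    return 100.0 * primitives / total


for chapter, rows in TABLE_4.items():
    for match_type, (total, primitives) in rows.items():
        rate = primitive_rate(total, primitives)
        print(f"{chapter:12s} {match_type}: {primitives:3d}/{total:3d} = {rate:5.1f}%")
```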



To address the quality of the formal descriptions of SNOMED CT, it is interesting to compare the rate of primitive SNOMED CT concepts in the different map and meaning match types, as shown in Table 4. The types with full map and meaning match (M1 and A1) have a lower rate of SNOMED CT primitive concepts (from 21% to 47%), and the types with no full match (M2 and A2) have a higher rate of SNOMED CT primitive concepts (from 52% to 91%).
Nevertheless, the rate of primitive concepts in the full map and meaning match types (M1 and A1) is high, considering that the lexical map was complete between the ICD-11 class name and the SNOMED CT FSN or synonym. Conversely, when the meaning match is incomplete, we would have expected a rate closer to 100%, which is nearly true for M2 but less so for A2.
It is necessary to go further by taking some examples of mismatches regarding primitive and fully defined SNOMED CT concepts.
As an example for the type M1, the ICD-11 class DA 40.4 Perforation of esophagus is defined by: “Perforation of esophagus is a penetration or hole of the wall of the esophagus, resulting in luminal contents in esophagus flowing into the mediastinum and/or thoracic cavity”. The full lexical map is with the fully defined SNOMED CT concept 23387001, Perforation of esophagus (disorder), which is equivalent to the following pre-coordinated SNOMED CT inferred expression:

    RoleGroup some
      ((‘Finding site (attribute)’ some
        ‘Esophageal structure (body structure)’) and
       (‘Associated morphology (attribute)’ some
        ‘Perforation (morphologic abnormality)’))

As an example for the type M2, the ICD-11 class BB67.3 Macro re-entrant atrial tachycardia is defined as “An atrial arrhythmia in which there is intra-atrial re-entry or circus movement around a fixed or functional central obstacle. The central obstacle may consist normal (e.g. valves) or abnormal (e.g., scar) structures. Conduction to the ventricles is not necessary for the tachycardia to continue. All that is required is an organised atrial rhythm with a rate typically between 250 and 350 bpm, including tachycardia using a variety of re-entry circuits that often occupy large areas of the atrium (‘‘macro-re-entrant’’). Here the arrhythmia involves the cavo-tricuspid isthmus”.
The full lexical map is with the SNOMED CT concept 233893007 Re-entrant atrial tachycardia (disorder), a primitive concept with the following pre-coordinated SNOMED CT inferred expression:

    RoleGroup some
      ((‘Finding site (attribute)’ some
        ‘Cardiac conducting system structure (body structure)’) and
       (‘Clinical course (attribute)’ some
        ‘Sudden onset AND/OR short duration (qualif. value)’) and
       (‘Has definitional manifestation (attribute)’ some
        ‘Tachycardia (finding)’))

This representation lacks the localization of the arrhythmia at the atrium, although the formalism allows it to be represented as follows. The modification to the original expression is the value of Finding site.

    RoleGroup some
      ((‘Finding site (attribute)’ some
        ‘Preferential interatrial pathway (body structure)’) and
       (‘Clinical course (attribute)’ some
        ‘Sudden onset AND/OR short duration (qualif. value)’) and
       (‘Has definitional manifestation (attribute)’ some
        ‘Tachycardia (finding)’))

An example for the type A1 is the ICD-11 class BA04.3 Secondary hypertension associated with renal tubular disorders. This ICD-11 class has no definition in the most recent version (Jan 2017). A full lexical map can be done with the SNOMED CT concept 31992008, Secondary hypertension (disorder), a primitive concept, together with 95568003, Renal tubular disorder (disorder), a fully defined one, using the following post-coordinated SNOMED CT inferred expression, which introduces the aetiology using the relation Due to:

    ‘Has definitional manifestation (attribute)’ some
        ‘Finding of increased blood pressure (finding)’ and
    RoleGroup some
        (‘Finding site (attribute)’ some
         ‘Systemic circulatory system structure (body structure)’) and
    ‘Due to (attribute)’ some ‘Renal tubular disorder (disorder)’

As an example for the type A2, let us analyse the ICD-11 class DB02.31 Ig-E mediated allergic enteritis of small intestine, defined as “Immediate type (IgE-mediated) enteric hypersensitivity due to exposure to an allergen in individuals previously sensitized. The symptoms are acute abdominal pain and diarrhoea and can be combined to other symptoms in cases of anaphylaxis”. A full lexical map is possible with the fully defined SNOMED CT concepts 22231002 Allergic enteritis (disorder) and 422076005 Immunoglobulin E-mediated allergic disorder (disorder), constructing the following expression (addition underlined):

    ‘Pathological process (attribute)’ equivalentTo
        ‘Allergic process (qualifier value)’ and
        RoleGroup some
            ((‘Associated morphology (attribute)’ some
              ‘Inflammation (morphologic abnormality)’) and
             (‘Finding site (attribute)’ some
              ‘Intestinal structure (body structure)’)) and
        ‘Due to (attribute)’ some
            ‘Type 1 hypersensitivity response (disorder)’ and
        ‘Causative agent (attribute)’ some
            ‘Immunoglobulin E (substance)’
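The M2 example earlier in this section differs from the original SNOMED CT expression in a single attribute value. The following is a minimal sketch of how such a modification can be made explicit, assuming a role group is held as a list of (attribute, value) pairs. The helper names `restriction`, `role_group` and `changed_values` are hypothetical, and the comparison is purely textual, not description logic reasoning.

```python
def restriction(attribute, value):
    """Render one existential restriction in Manchester-like notation."""
    return f"('{attribute}' some '{value}')"


def role_group(pairs):
    """Render a role group from a list of (attribute, value) pairs."""
    return "RoleGroup some (" + " and ".join(restriction(a, v) for a, v in pairs) + ")"


def changed_values(original, modified):
    """Map each attribute whose value differs to its (old, new) values."""
    before = dict(original)
    return {a: (before.get(a), v) for a, v in modified if before.get(a) != v}


# Role group of 233893007 Re-entrant atrial tachycardia (disorder), as quoted above.
ORIGINAL = [
    ("Finding site (attribute)", "Cardiac conducting system structure (body structure)"),
    ("Clinical course (attribute)", "Sudden onset AND/OR short duration (qualif. value)"),
    ("Has definitional manifestation (attribute)", "Tachycardia (finding)"),
]
# Modified role group proposed for the ICD-11 class BB67.3.
MODIFIED = [
    ("Finding site (attribute)", "Preferential interatrial pathway (body structure)"),
] + ORIGINAL[1:]

print(role_group(MODIFIED))
print(changed_values(ORIGINAL, MODIFIED))
```

Holding the role group as data rather than as text makes the difference between the two expressions easy to list, which is what the underlining in the printed examples conveys.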






4     DISCUSSION                                                   4.2      Misalignment between SNOMED CT concept
The study makes the attempt to propose semantically pre-                    FSN and full definitions
cise mappings between two independent representation               The ICD-11 class DA52.51 Allergic gastritis due to IgE-
artefacts (ICD-11 and SNOMED CT), based on OWL-DL,                 mediated hypersensitivity can be fully represented by the
using the axioms in the SNOMED Compositional Grammar               SNOMED CT concepts 1824008 Allergic gastritis (disor-
“concept model” (and the OWL-EL equivalent derived from it),       der) and 422076005 Immunoglobulin E-mediated allergic
which are intended to define what is universally true in a do-     disorder (disorder), both of which are fully defined. The
main [7-8].                                                        role of Immunoglobulin E is not represented in the present
The findings are summarised in Table 3: 138 (123 M2 plus           version.
15 A2) out of 364 SNOMED CT concepts (38%) in the
circulatory chapter and 150 (125 M2 plus 25 A2) out of 424         4.3      Inconsistencies across SNOMED CT concept
SNOMED CT concepts (35%) in the digestive chapter from                      definitions
the Clinical finding hierarchy that were lexically mapped to       It is interesting to try to understand why there are so many
ICD-11 classes show modelling issues resulting in misa-            issues: let us take the example of hypertension. In clinical
lignments between the meaning of the ICD-11 MMS classes            settings, for most healthcare professionals who use “hyperten-
(as given by their name, hierarchic context and text defini-       sion” in their daily patient monitoring practice, this means
tion) and formal axioms that characterise SNOMED CT                exclusively systemic arterial hypertension, which is a fre-
concepts. We equally found misalignments within                    quent disease. However, the SNOMED CT concept
SNOMED CT, i.e. between Fully Specified Names and                  59621000 Essential hypertension (disorder) is described by
formal axioms. As shown in Table 4, in most of the cases           the expression:
this is related to the high number of primitives, i.e. not fully
defined SNOMED CT concepts, but also to some fully                       Has definitional manifestation (attribute) some
defined concepts.                                                         Finding of increased blood pressure (finding) and
                                                                         RoleGroup some (‘Finding site (attribute)’ some
4.1    Misalignment between SNOMED CT concept                               ‘Systemic circulatory system structure (body structure)’)
       FSN and primitive representation                            On the other hand, the SNOMED CT concept 11399002, Pulmonary
There were higher rates of primitives in lexical and meaning       hypertensive arterial disease (disorder) is described with
match types M2 vs M1, viz. 91% vs 21% in the Circulatory           RoleGroup some (‘Finding site (attribute)’ some
chapter and 84% vs 23% in the Digestive chapter; and in A2            ‘Pulmonary artery structure (body structure)’)
vs A1, 53% vs 35% in the Circulatory chapter and 52% vs            Both are primitive concepts, and since 24184005 Finding
47% in the Digestive chapter.                                      of increased blood pressure (finding) is clinically under-
                                                                   stood as a finding that applies only to systemic arterial hy-
What is challenging is that the OWL axioms allow a fully
                                                                   pertension, it cannot be applied to Pulmonary hypertensive
defined representation. For example, Essential hypertension
                                                                   arterial disease.
(ICD-11 class BA 00), lexically matched to the SNOMED
                                                                   On the other hand, the CG formalism would allow the fol-
CT concept 59621000 Essential hypertension (disorder) is
                                                                   lowing representations:
the most frequent arterial disease. SNOMED CT does not
represent the lack of secondary cause, which is the meaning         ‘Pulmonary hypertensive arterial disease (disorder)’
of “essential” or “idiopathic”. SNOMED CT CG provides                 subclassOf
                                                                      RoleGroup some (‘Finding site (attribute)’ some
the possibility to represent the lack of secondary cause by
                                                                           ‘Pulmonary artery structure (body structure)’) and
adding the following expression:                                      ‘Has interpretation (attribute)’ some
           ‘Pathological process (attribute)’ some                       ‘Abnormally high (qualifier value)’ and
              ‘spontaneous (qualifier value)’                         ‘Interprets (attribute)’ some
                                                                         ‘Blood pressure (observable entity)’
Apart from some other cases of SNOMED CT concepts
with the wording “of unknown etiology” there are numer-             ‘Essential hypertension (disorder)’
ous cases of “real” qualifying adjectives that are not reflect-       subclassOf
ed in the definition, such as 85598007, Constrictive peri-            RoleGroup some (‘Finding site (attribute)’ some
carditis (disorder) with no representation of “constrictive”,          ‘Systemic circulatory system structure (body structure)’) and
373945007 Pericardial effusion (disorder) with no repre-              ‘Has interpretation (attribute)’ some
sentation of “effusion” and 706882009 Hypertensive crisis                ‘Abnormally high (qualifier value)’ and
                                                                      ‘Interprets (attribute)’ some
(disorder) with no representation of “crisis”.                           ‘Blood pressure (observable entity)’ and
                                                                       ‘Pathological process (attribute)’ some
                                                                          ‘Spontaneous (origin) (qualifier value)’
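
As an illustration only, the two representations proposed above can be
compared as plain sets of attribute-value pairs. The Python sketch below is
not SNOMED CT tooling and performs no description logic reasoning: the
constant and function names are hypothetical, semantic tags are dropped, and
role groups are flattened, which is a simplifying assumption. It merely makes
explicit which pairs the two proposed definitions share and which pairs
distinguish them.

```python
# Illustrative only: a flattened, hypothetical encoding of the two
# representations proposed in the text. Role groups are ignored and no
# reasoner is involved; the names below are assumptions made for this sketch.

PULMONARY_HYPERTENSIVE_ARTERIAL_DISEASE = frozenset({
    ("Finding site", "Pulmonary artery structure"),
    ("Has interpretation", "Abnormally high"),
    ("Interprets", "Blood pressure"),
})

ESSENTIAL_HYPERTENSION = frozenset({
    ("Finding site", "Systemic circulatory system structure"),
    ("Has interpretation", "Abnormally high"),
    ("Interprets", "Blood pressure"),
    ("Pathological process", "Spontaneous (origin)"),
})


def shared(a, b):
    """Return the attribute-value pairs present in both definitions."""
    return a & b


def distinguishing(a, b):
    """Return the attribute-value pairs present in a but absent from b."""
    return a - b


if __name__ == "__main__":
    print("Shared by both definitions:")
    for attribute, value in sorted(
        shared(ESSENTIAL_HYPERTENSION, PULMONARY_HYPERTENSIVE_ARTERIAL_DISEASE)
    ):
        print(f"  {attribute} = {value}")

    print("Specific to Essential hypertension:")
    for attribute, value in sorted(
        distinguishing(
            ESSENTIAL_HYPERTENSION, PULMONARY_HYPERTENSIVE_ARTERIAL_DISEASE
        )
    ):
        print(f"  {attribute} = {value}")
```

Under these assumptions, both definitions share the interpretation of blood
pressure as abnormally high, while the finding site and the spontaneous
pathological process carry the distinction that the current primitive
descriptions leave implicit.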



If the clinical vocabulary (interface terminology) and the logic-based
descriptions were defined independently, this would reduce the problem.
However, there would still be issues where the full meaning of the natural
language expression would not be captured in the formal logical expression.

The difference between flexible human language and machine-required logic is
apparent in the SNOMED CT Editorial Guide [1]. What is an inappropriate
synonym when a synonym is defined by SNOMED as “a term other than the FSN
that is an acceptable way to express the meaning of a SNOMED CT concept in a
particular language”? This synonym is anchored to an FSN which shall be
aligned on the FSN concept model instance. An inappropriate synonym must
therefore be “an acceptable (or unacceptable) way to express the meaning of a
SNOMED CT concept” and aligned or not aligned on the FSN concept model
instance.

The scale of this issue is shown by the 24,782 terms shared between pairs of
active concepts, either in one hierarchy or across hierarchies. In the
Clinical findings disorder hierarchy there are 1394 instances of duplicate
terms (around 3%). Across hierarchies, most of the duplicate terms are
between Product and Substance, e.g. 53009005 Analgesic (product) and
373265006 Analgesic (substance). Such definitions (a drug name replaced by
the name of the active ingredient) are acceptable for interface terminologies
but inappropriate for ontological standards. This therefore suggests a
principled reworking of the relations between FSN, concept model instances
and synonyms.

Another example is related to negation, as in Non traumatic tear of meniscus.
The formal SNOMED CT expression is based on the SNOMED CT Compositional
Grammar (equivalent to OWL-EL and EL++ without disjointness), which does not
support any form of negation. Here the question arises whether the negative
expression should be restricted to a common interface term feature or
represented in CG. Such an interface term, in our example, could point to a
fully specified name Degenerative tear of meniscus. On a logical basis,
however, as there are developmental, inflammatory, or other non-traumatic
non-degenerative tears, it does not appear correct to equate non-traumatic
and degenerative cartilage tears. The issue is that even if negation is
understandable at the clinical interface terminology level, it cannot be
represented with the SNOMED formalism. The logical alternative is to point
the negated concept at the alternative concepts: developmental, degenerative,
etc.

This is the basis of the solution we recommend for representing the clinical
names of such concepts or classes. For example, it is possible to represent
the closely related notion “tears of meniscus excluding traumatic tears” as a
query on the representation (codes) for “tears of meniscus”, which is an
axiomatized expression, minus the representations (codes) for “traumatic
tears of meniscus”, as recommended in [8].

5    CONCLUSION

To answer the main question of this paper, viz. whether the logic-based
expressions in SNOMED CT are blurred by a primarily language-driven modelling
approach, we can state the following points as a route to an answer:

SNOMED CT currently integrates two aspects, a reference clinical terminology
and a formal ontology.

It is necessary to distinguish clearly the part of the SNOMED CT natural
language definition to be used as the basis of a formal representation in the
Compositional Grammar/Description Logic from the part used for the management
of the clinical interface vocabularies used by clinicians in electronic
health records. Clinical language is characterised by lexical ambiguities due
to brevity and assumed context. The words used by clinicians often hide
widely understood conventions that, if taken literally, give rise to
incorrect formal representations.

Given the conflict between clinical usage and formal representation, errors
in the axiomatized formal content arise easily. External validation of the
axiomatic content in SNOMED CT is critical to reach validated DL-based (or
any other logic-based) models of medical knowledge and concept descriptions.
The harmonization of SNOMED CT with ICD-11 provides one example of such an
external validation.

REFERENCES

1.   SNOMED CT® Editorial Guide, January 2017 International Release (US
     English), chapter 2.1. snomed.org/eg. Last access 15 May 2017.
2.   Rodrigues JM, Robinson D, Della Mea V, Campbell J, Rector A, Schulz S,
     Brear H, Üstün B, Spackman K, Chute CG, Millar J, Solbrig H, Brand
     Persson K. Semantic Alignment between ICD-11 and SNOMED CT. Studies in
     Health Technology and Informatics. 2015;216:790-794.
3.   SNOMED CT Compositional Grammar. Version 2.3.1, May 2015.
     http://snomed.org/scg. Last access 15 May 2017.
4.   IHTSDO Browser. http://browser.ihtsdotools.org/. Last access 30 May 2017.
5.   WHO Browser. http://who.int/classifications/icd11/browse/f/en. Last
     access 08 May 2017.
6.   Mamou M, Rector AL, Schulz S, Campbell J, Solbrig H, Rodrigues.
     Representing ICD-11 JLMMS Using IHTSDO Representation Formalisms.
     Studies in Health Technology and Informatics. 2016;228:431-435.
7.   Reiter R. On closed world data bases. In H. Gallaire and J. Minker,
     editors, Logic and Data Bases, Plenum, New York. 1978, 55-76.
8.   Schulz S, Rodrigues JM, Rector AL, Chute CG. Interface Terminologies,
     Reference Terminologies and Aggregation Terminologies: A Strategy for
     Better Integration. Studies in Health Technology and Informatics. 2017:
     accepted for publication.
9.   de Matos, P., Alcántara, R., Dekker, A., Ennis, M., Hastings, J., Haug,
     K., Spiteri, I., Turner, S., and Steinbeck, C. (2010). Chemical Entities
     of Biological Interest: an update. Nucl. Acids Res., 38, D249–D254.



