=Paper= {{Paper |id=Vol-3603/Paper4 |storemode=property |title=Towards Principles of Ontology-Based Annotation of Clinical Narratives |pdfUrl=https://ceur-ws.org/Vol-3603/Paper4.pdf |volume=Vol-3603 |authors=Stefan Schulz,Warren Del-Pinto,Lifeng Han,Markus Kreuzthaler,Sareh Aghaei Dinani,Goran Nenadic |dblpUrl=https://dblp.org/rec/conf/icbo/0001DHKDN23 }} ==Towards Principles of Ontology-Based Annotation of Clinical Narratives== https://ceur-ws.org/Vol-3603/Paper4.pdf
                                Towards principles of ontology-based annotation of
                                clinical narratives
                                Stefan Schulz1,2,∗ , Warren Del-Pinto3 , Lifeng Han3 , Markus Kreuzthaler1 ,
                                Sareh Aghaei1 and Goran Nenadic3
                                1 Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria
                                2 Averbis GmbH, Freiburg, Germany
                                3 Department of Computer Science, University of Manchester, UK



                                                                      Abstract
                                                                      Despite the increasing availability of ontology-based semantic resources for biomedical content
                                                                      representation, large amounts of clinical data are in narrative form only. Therefore, many clinical
                                                                      information management tasks require unlocking this information using natural language processing (NLP).
                                                                      Clinical corpora annotated by humans are crucial resources. On the one hand, they are needed to train
                                                                      and domain-fine-tune language models with the goal to transform information from unstructured free
                                                                      text into an interoperable form. On the other hand, manually annotated corpora are indispensable for
                                                                      assessing the results of information extraction using NLP. Annotation quality is crucial. Therefore, detailed
                                                                      annotation guidelines are needed to define the form that extracted information should take, to prevent
                                                                      human annotators from making erratic annotation decisions and to guarantee a good inter-annotator
                                                                      agreement. Our hypothesis is that, to this end, human annotations (and subsequently machine annotations
                                                                      learned from human annotations) should (i) be based on ontological principles, and (ii) be consistent
                                                                      with existing clinical documentation standards. With the experience of several annotation projects, we
                                                                      highlight the need for sophisticated guidelines. We formulate a set of abstract principles on which such
                                                                      guidelines should be based, followed by examples of how to keep them, on the one hand, user-friendly and
                                                                      consistent, and on the other hand compatible with the international semantic standards SNOMED CT and
                                                                      FHIR, including their areas of overlap. We sketch how the resulting annotations can be represented in a
                                                                      knowledge graph, a state-of-the-art semantic representation paradigm, which can be enriched by additional
                                                                      content on A-Box and T-Box levels and to which symbolic and neural reasoning tasks can be applied.


                                                                      Keywords
                                                                      Formal Ontologies, Clinical Information Models, Natural Language Processing, Text Annotation Guide-
                                                                      lines, Electronic Health Records




                                Proceedings of the International Conference on Biomedical Ontologies 2023, August 28th-September 1st, 2023,
                                Brasilia, Brazil
                                ∗ Corresponding author

                                Email: stefan.schulz@medunigraz.at (S. Schulz); warren.del-pinto@manchester.ac.uk (W. Del-Pinto);
                                lifeng.han@manchester.ac.uk (L. Han); markus.kreuzthaler@medunigraz.at (M. Kreuzthaler);
                                sareh.aghaei-dinani@medunigraz.at (S. Aghaei); gnenadic@manchester.ac.uk (G. Nenadic)
                                ORCID: 0000-0001-7222-3287 (S. Schulz); 0000-0003-3307-9432 (W. Del-Pinto); 0000-0002-3221-2185 (L. Han);
                                0000-0001-9824-9004 (M. Kreuzthaler); 0000-0002-0511-095X (S. Aghaei); 0000-0003-0795-5363 (G. Nenadic)
                                                                    © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                 CEUR
                                 Workshop
                                 Proceedings
                                               http://ceur-ws.org
                                               ISSN 1613-0073
                                                                    CEUR Workshop Proceedings (CEUR-WS.org)




1. Introduction
A seamless and effective flow of clinical information is vital for high-quality healthcare and health
management. Thus, information must be stored in a way that supports effective communication,
search, and analysis. However, most content of electronic health records (EHRs) consists of
unstructured text in documents and free-text database fields. As a result, crucial data that could
provide key insights into populations and individuals remain inaccessible without additional processing.
   In contrast, the good news regarding EHR interoperability is the increasing support by elabo-
rated semantic standards, viz. terminologies, ontologies (e.g., SNOMED CT, LOINC) [1] and
information models (e.g., FHIR [2]), which enjoy increasing international adoption. Interoper-
able clinical data representations would ideally bridge syntactically different but semantically
equivalent expressions in different degrees of structure, from data collected in standardised forms
to clinical narratives, across disciplines, jurisdictions, and natural languages.
   To achieve this goal for narrative data, text passages must be linked to identifiers from a
controlled vocabulary, a process referred to as semantic annotation or tagging. Such vocabularies
are typically rooted in semantic resources such as those mentioned above. An additional step
is the assertion of links between tags, known as relation annotation. To do this manually is
resource-intensive and difficult in practice; human annotators need to be trained and monitored,
and diverging annotations must be reconciled by subsequent adjudication.
   Nevertheless, annotated text corpora are indispensable as a “fuel” for training, domain-fine-
tuning, and evaluation of natural language processing (NLP) models. Fig. 1 shows an example of
a manual annotation of a clinical text passage.
   Annotation guidelines play a vital role in this process. They target consistency and unifor-
mity by providing clear instructions, reducing variability, and ensuring that annotations are
standardised across annotators and projects. The inter-annotator agreement serves as a measure
of the effectiveness and quality of guided annotation processes. Annotation guidelines provide
instructions on, and examples of, handling ambiguous cases, addressing recurring challenges,
and resolving annotation discrepancies.
   An ideal annotation should produce, with the same input text, the same target representation cre-
ated by different trained annotators. The same should be achieved based on different paraphrases
of the same input or with its translation into a different language. Even where annotations differ,
semantic equivalence should be stated based on logical reasoning. Although these desiderata will
probably never be completely fulfilled, they justify more effort spent on principled annotation
guidelines.
   Several guidelines for annotating clinical narratives have been proposed. Examples include the
Clinical E-Science Framework (CLEF) [3, 4] which focuses on clinical information extraction
tasks, annotation guidelines for the identification of personal health information such as patient
names [5] and a range of guidelines that are very specific in terms of language and task, e.g.
lung diseases in Japanese [6], family history in Norwegian [7], diseases in Spanish [8], disorders,
findings, drugs and body structures in Swedish and Chinese [9, 10]. Guidelines may also specify
how to annotate temporal references [11, 12], negation statements [13], and other contextual
markers.
   Although the recognition of diagnostic statements has been a preferred goal in clinical
annotation tasks, the need for a broader coverage of annotation guidelines has been recognised
by Luo et al. [14], who extended the scope of annotation to “medical problems, treatments and
tests”. These examples show that previous works on annotating clinical text have often produced
guidelines that were specific in scope, constrained to a particular semantic type or designed for a
very use-case-specific data set.

Figure 1: Semantic annotation example of a passage from a discharge summary

   Although the argument that specific use cases necessitate specific annotation strategies is
valid, we postulate a need for more general principles that provide a systematic approach to
clinical text processing. Such an approach should be based on ontological principles [15, 16]
and international semantic standards [17]: a prerequisite to ensuring that data from different
sources can be meaningfully compared. Once such principles have been formulated, task-specific
guidelines could be instantiated without unnecessary repetition of previous efforts.
   In this work, we propose such general annotation principles. Regarding the annotation vo-
cabulary, we focus on SNOMED CT [18], an ontology-based clinical terminology system, and
FHIR [2], a set of information templates for recurring clinical documentation tasks. Both are
considered leading international semantic standards for healthcare data. We separate ontological
aspects – the description of classes of biomedical entities and their properties, which is the
domain of SNOMED CT – from contextual information about individuals, which are represented
as instances of FHIR resources. This addresses A. Rector’s claim to distinguish “models of
meaning” (ontologies) and “models of use” (information models) [19]. Applied to clinical text
annotation, this means that FHIR resources provide semantically explicit templates to capture
instance-level information on epistemic, temporal, and provenance aspects belonging to a given
patient and his/her documentation context. In contrast, SNOMED CT codes represent types of
clinical entities1 , whose instances are referred to by an appropriate field in the FHIR template. It
is well known that SNOMED CT also supports epistemic and temporal aspects in its Situation in
specific context hierarchy, whereas FHIR resources include references to HL7 value sets, mostly
for roles and qualities, which compete with SNOMED CT Qualifier value concepts. This overlap
constitutes a complicating factor of clinical information management in general and clinical text
annotations in particular. Efforts to handle this overlap in a principled way2 have remained unfinished.




1 Also known as concepts or T-Box entities, modelled as OWL classes.
2 TermInfo Project.




2. Methods
We employed a qualitative and heuristic approach to develop annotation guidelines for standards-
aware representation of clinical documents. The goal was to bootstrap annotation rules from
sample narratives, inspect the results, discuss disagreements between annotators, identify recur-
ring patterns and structures, and consolidate a final guideline after a series of iterations. The
authors have gained experience in numerous annotation activities. These include the use of
SNOMED CT for the annotation of clinical summaries in Brazilian Portuguese [20] and for
clinical text snippets in five European languages [21], and the application of a set of manually
designed annotation rules to TAC20173 data to supervise the extraction of adverse drug reactions
and related entities. SNOMED CT has also been used to provide the semantics for normalising
clinical event mentions in diagnostic statements extracted from hospital data in the UK.
   The problems of representing context using SNOMED CT and FHIR [22] led the authors to
pilot annotation tasks using the German corpus GRASCCO4 and a clinical corpus derived from
UK hospital data by using FHIR, including HL7 value sets, as a framework in which SNOMED
CT is used for the ontological information proper. Soon, it became necessary to formulate rules
to manage the overlap between both systems. In addition, it became clear that the handling of the
complex structure of FHIR was difficult for the human annotators.
   Guideline-based text annotations are also at the centre of the ongoing projects AIDAVA5
(Horizon Europe), the German annotation initiative GemTeX6 , as well as the UK projects
JIGSAW7 and HIPS8 . The latter projects required the formulation of annotation guidelines
to create diagnosis representations in SNOMED CT, using the Situation in specific context
hierarchy, which proposes precoordinated content and post-coordination patterns for factuality
(e.g. “suspected asthma”), temporality (e.g. “history of major depression”) and family history
(“mother died from breast cancer”).
   While FHIR is compatible with multiple terminologies, the use of SNOMED CT was motivated
in part by its widespread adoption, e.g., by the requirement that all systems of the UK National
Health Service (NHS) must use SNOMED CT as a core terminology9 . Additionally, given the
required annotation tasks, the broad coverage of SNOMED CT hierarchies, which include clinical
findings (disorders), pharmaceutical products, procedures, and others, was deemed beneficial.
   The continuous crafting and revision of annotation guidelines, supported by the outcomes of
annotator training sessions, resulted in the creation of a set of annotation principles as formulated
in the next section, followed by a case study that serves as an instantiation of these principles,
centring on the rooting of annotations into the standards SNOMED CT and FHIR.




3 Adverse Drug Reaction Extraction from Drug Labels
4 Graz Synthetic Clinical Text Corpus
5 AI-powered Data Curation & Publishing Virtual Assistant
6 German Medical Text Corpus
7 Assembling the Data Jigsaw
8 Healthcare Impact Partnership: Integrating hospital outpatient letters into the healthcare data space
9 NHS Digital, SCCI0034




3. Results
3.1. Proposed Annotation Principles
Different use cases require different annotation guidelines. For example, secondary uses of clinical
data such as epidemiological research have different requirements than data representation for
providing direct patient care or data support for billing purposes. Therefore, we first propose a set
of general principles from which more specific annotation guidelines for given applications can
be derived. This first set of annotation principles is independent of the annotation vocabularies,
i.e. the ontologies and information models (or subsets thereof), that are used.

    • Annotation is limited to the semantic aspects of narratives. It comprises the assignment of
      codes for unary predicates (types, but also individuals) from a domain ontology, together
      with literals such as decimal numbers to spans of text. These annotations are additionally
      connected by binary predicates (relations). Both unary and binary predicates are rooted in
      a semantic reference such as a domain ontology or a clinical information model.
    • The endpoint is a canonical form of representing clinical narrative information as a primary
      knowledge graph (KG), with subject-predicate-object triples describing individual patients
      and related clinical entities. This primary KG should then be transferable, by applying
      supporting rules and resources, into a knowledge graph that follows the structures of the
      underlying standards and is committed to Applied Ontology principles. Such a KG makes
      a clear T-Box/A-Box distinction, i.e. between nodes that point to entity types as given by
      an ontology and those that point to individual entities that instantiate the types provided
      by the annotation. The difference is relevant, because, e.g., “asthma” should point to an
      individual entity (particular) in an affirmative context (“The patient has asthma”), but to
      an entity type (concept) in a negative context (“no signs of asthma”). Another ontological
      distinction is the one between information entities [23] (instances of FHIR resources) and
      the clinical entities they are about.
    • This complexity needs to be hidden from annotators. Predicate types should be restricted
      to what is necessary. E.g., the relation between “tumour” and “right” is ontologically very
      different from the one between “tumour” and “suspected”. Nevertheless, a literal annotation
      (by using predicates such as “laterality” and “verificationStatus” in the same way) would
      suffice for the annotators, provided that annotation post-processing is sufficiently specified
      by transformation rules.
    • Scope and granularity are determined by the underlying annotation vocabulary. For re-
      stricted annotation tasks, subsets of the maximum vocabulary are provided to the annotators.
      For example, a targeted cancer annotation task might not require fine-grained reference
      to unrelated diseases like heart disorders or injuries. This could motivate the pruning of
      sub-hierarchies, e.g., underneath Coronary disease or Fracture of bone.
    • Annotations are descriptive and not interpretative: annotators annotate only what they read,
      without complicating their task by seeking individual interpretations. For example, the
      annotations “fever” after “hip replacement” are only linked with a predicate for causality
      if there is a causality statement in the text. Two exceptions to this rule are highlighted:
      (i) word sense disambiguation, as long as meaning can be derived from the context; and
      (ii) co-reference, as long as the antecedent to which an anaphoric reference points back is
      identifiable.
   • The granularity of annotation spans is not given by a named entity recognition step prior to
     annotation, which would yield entity types, such as Disorder and Body part as annotations
     of the spans “fracture of skull” or “left ventricle”. Instead, the spans are determined by
     the underlying annotation vocabulary. The principle of longest match is followed and
     pre-coordinated expressions are preferred if they correspond to a contiguous span.
     Otherwise, e.g. in “the skull exhibited the sign of a fracture”, shorter text spans (“skull”,
     “fracture”) are annotated and linked afterwards. If the representation of the meaning of
     a span requires more than one code, a conjunction (logical AND) is preferred over a
     construction that additionally requires binary predicates.
   • The annotation vocabulary is divided into core content and supporting content.
     The former characterises the central focus of the given annotation task, as defined by
     the intended use case, while the latter provides additional, mostly refining information
     to the core. For example, for the identification of diagnosis statements, core content
     would be given by the concepts of the Clinical Finding hierarchy of SNOMED CT, such
     as diseases. Supporting concepts would be those under Body Structure, which specify a
     particular Clinical finding concept, as well as those that capture factuality, such as Probably
     present.
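
The longest-match principle stated in the list above can be illustrated with a minimal Python sketch. The toy vocabulary, its labels, and the function name `longest_match` are assumptions made for illustration only; a real task would draw on the permitted subset of the annotation vocabulary and on proper tokenisation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy vocabulary: lower-cased token tuples -> concept label.
# The labels stand in for real codes, which are deliberately not given here.
VOCAB: Dict[Tuple[str, ...], str] = {
    ("fracture", "of", "skull"): "Fracture of skull",
    ("fracture",): "Fracture",
    ("skull",): "Skull structure",
    ("left", "ventricle"): "Left ventricle",
}


def longest_match(tokens: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) token spans, preferring the longest contiguous match."""
    max_len = max(len(key) for key in VOCAB)
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in VOCAB:
                spans.append((i, i + n, VOCAB[key]))
                i += n
                break
        else:
            i += 1  # no vocabulary entry starts at this token
    return spans
```

For the contiguous phrase "fracture of skull" the sketch yields one pre-coordinated span, whereas for "the skull exhibited the sign of a fracture" it yields two shorter spans that would be linked by a relation annotation afterwards.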

   The following annotation principles explicitly refer to the use of SNOMED CT and HL7 FHIR
as annotation vocabularies.

   • “Core” hierarchies as introduced above are Clinical finding, Event, Observable entity,
     Pharmaceutical / biologic product, Procedure, Specimen in SNOMED CT. They have a
     high proportion of fully defined concepts, expressed by OWL ‘equivalentTo’ axioms, in
     which concepts from “non-core” hierarchies such as Substance, Organism, Body structure
     and Physical object are referred to by existentially quantified links. Ambiguous annotations
     are addressed by preference rules, e.g. to prefer C1 over C2 , if C1 belongs to a core hierarchy.
     For example, SNOMED CT offers different codes for “Sarcoma”, viz. Sarcoma (disorder)
     and Sarcoma (morphological abnormality). The former is preferred for annotation of
     clinical texts because it is fully defined and axiomatically implies the latter.
   • The hierarchy Situation in specific context – although it would formally correspond to a core
     hierarchy – is not used for annotation because FHIR has been shown to be more granular,
     actively maintained, and frequently used to represent the context of clinical statements.
   • A set of binary predicates with their own namespace “anno:” was introduced for close-
     to-user relation annotation. These binary predicates were grounded in (i) SNOMED CT
     object properties or chains thereof, (ii) relational chains of FHIR elements, or (iii) both. For
     example, the predicate site between a SNOMED CT clinical finding and a body structure,
     is mapped to the linkage concept (relation) ‘finding site’ as well as the concatenation of
     the inverse of the FHIR element Condition.code with Condition.bodySite, cf. Table 2.
   • SNOMED CT mappings to HL7 value sets are proposed, e.g. hl7:Recurrence to
     sct:Recurrent or hl7:Confirmed to ‘sct:Confirmed present’ (see Table 2).
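
The preference rule for ambiguous annotations in the list above could be sketched in Python as follows. The class `Candidate` and the function `prefer_core` are hypothetical names, and the hierarchy of each candidate is assumed to be known from a terminology lookup that is not shown here.

```python
from dataclasses import dataclass
from typing import List

# Core hierarchies as listed in the principles above.
CORE_HIERARCHIES = {
    "Clinical finding", "Event", "Observable entity",
    "Pharmaceutical / biologic product", "Procedure", "Specimen",
}


@dataclass(frozen=True)
class Candidate:
    label: str      # human-readable concept name (no real code given)
    hierarchy: str  # top-level hierarchy the concept belongs to


def prefer_core(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only core-hierarchy candidates if any exist; otherwise keep all."""
    core = [c for c in candidates if c.hierarchy in CORE_HIERARCHIES]
    return core if core else list(candidates)


sarcoma_options = [
    Candidate("Sarcoma (morphological abnormality)", "Body structure"),
    Candidate("Sarcoma (disorder)", "Clinical finding"),
]
preferred = prefer_core(sarcoma_options)
```

In this sketch `preferred` contains only the disorder concept, which mirrors the Sarcoma example; when no candidate belongs to a core hierarchy, all candidates are returned so that other preference rules can still apply.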




  Guidelines designed for a specific annotation task can be viewed as an instantiation of the
general principles outlined above. It is then necessary to define sets of permitted codes for both
the core and supporting concepts to be drawn from, for example, SNOMED CT reference sets.
To summarise, the steps in utilising such a guideline to annotate clinical text are as follows:

    • Identify a core concept mentioned explicitly in the document.
    • Assign to the identified phrase a suitable concept from the set of possible core concepts.
    • If present, identify phrases corresponding to supporting concepts that refine the core
      concept.
    • Assign to each identified supporting concept a suitable SNOMED CT concept, drawn from
      the set of possible supporting concepts.
    • For each of the supporting concepts, link them directly to the core concept and identify the
      type of relation.

  Given a clinical document, the above steps can be repeated until all of the relevant clinical
mentions have been captured. The final annotation shall be a semantic representation of the
explicit meaning of the original clinical text.
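
The annotation steps above can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Span`, `Annotation`, and `make_span` are hypothetical, character offsets are used for spans, and concept labels stand in for actual SNOMED CT codes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    start: int    # character offset, inclusive
    end: int      # character offset, exclusive
    concept: str  # concept label standing in for a real code


def make_span(text: str, phrase: str, concept: str) -> Span:
    """Locate the first occurrence of `phrase` in `text` and tag it with a concept."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return Span(start, start + len(phrase), concept)


@dataclass
class Annotation:
    core: Span  # steps 1 and 2: the core mention and its concept
    supporting: List[Tuple[str, Span]] = field(default_factory=list)

    def link(self, predicate: str, span: Span) -> None:
        """Steps 3 to 5: attach a supporting concept to the core via a typed relation."""
        self.supporting.append((predicate, span))

    def triples(self) -> List[Tuple[str, str, str]]:
        """Flatten the annotation into (core, predicate, supporting) triples."""
        return [(self.core.concept, pred, span.concept) for pred, span in self.supporting]


text = "osteoarthritis of the spine"
annotation = Annotation(core=make_span(text, "osteoarthritis", "Osteoarthritis"))
annotation.link("anno:site", make_span(text, "spine", "Joint structure of spine"))
```

Calling `annotation.triples()` then gives a single triple that links the core concept to its body site, which is the kind of close-to-user output that later transformation rules would map to the underlying standards.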

3.2. Examples
Here, we provide examples of applying these general principles to produce task-specific templates
for annotating clinical text.

3.2.1. Example from JIGSAW/HIPS
The JIGSAW and HIPS projects include the task of extracting diagnosis information from both
semi-structured lists and free-text narratives in outpatient letters for secondary use purposes
such as specifying patient cohorts for epidemiological studies. It was decided to keep the
annotation task uniform across both projects to ensure consistent capture of relevant information.
The combination of SNOMED CT with FHIR naturally resulted from the NHS use of SNOMED CT as
its preferred terminology, and the need to represent and communicate instance-level information
about patients. Following the principles outlined above, core and supporting concepts were
identified. All concepts in the SNOMED CT Clinical finding hierarchy were taken as the source
of core concepts. The supporting concepts were primarily taken from the elements of the FHIR
Condition resource10 , with additional relations from the SNOMED CT concept model including
Associated morphology and Causative agent. In order to keep the annotation task consistent and
simple, the initial assignment of codes to diagnostic statements in the text was done entirely
using concepts from the SNOMED CT Clinical finding hierarchy. The linking of supporting
concepts to their corresponding core concepts follows the strategy outlined in the previous section:
a set of binary predicates was specified for relation annotation, to avoid the need for annotators
to be familiar with all SNOMED CT “linkage concepts”11 and FHIR elements. In this context,
given the narrative phrase “osteoarthritis of the spine”, the FHIR Condition resource specifies that
a Clinical finding code should be provided for the condition being diagnosed and, if necessary, a
body part can also be provided in the form of a SNOMED CT Body structure code. Therefore,
“osteoarthritis” is the core mention, which is annotated with the SNOMED CT Clinical finding
concept Osteoarthritis. Meanwhile, “spine” acts as supporting information. It is annotated using the
Body structure concept Joint structure of spine. As a result, it is necessary to perform post-processing
both to map the provided SNOMED CT codes to appropriate FHIR values and to map the relations
to either SNOMED CT object properties or FHIR elements where applicable.

10 FHIR Condition
11 Corresponding to OWL object and datatype properties.

   For FHIR elements with value sets that are specified as SNOMED CT codes by default, such
as Condition.bodySite, no mapping was required. For those that specify an alternative FHIR
value set, such as Condition.verificationStatus and Condition.clinicalStatus, mappings were
specified between appropriate SNOMED CT concepts and the FHIR values. These mappings
were based upon discussion with clinicians regarding which information is needed for their use
case, such as the need to specify uncertainty of diagnoses via SNOMED CT concepts such as
Probably present, and how these should be interpreted in FHIR, for example Provisional.

3.2.2. Example from AIDAVA
Regarding the annotation of composed expressions, AIDAVA puts more emphasis on the use
of pre-coordinated SNOMED CT content. For the contiguous span “osteoarthritis of the spine”
preference would be given to the single concept Spondylosis, whereas a passage such as “os-
teoarthritis of knees, hips, spine” would require a post-coordinated expression such as exemplified
in 3.2.1. The important point here is that, owing to the formal axioms in SNOMED CT, equivalence
can be stated between pre-coordinated content and post-coordinated expressions.
   AIDAVA also required mappings between SNOMED CT and FHIR as shown in Table 1. To
support queries on either SNOMED CT or FHIR, more attention was paid
to the interoperability of predications, i.e. the use of binary predicates and their anchoring in
both standards. In general, predications are straightforward at a text level but complex at an
ontology-based representation level. The phenomenon that representations may compete, e.g.
SNOMED CT only vs. SNOMED CT in FHIR contexts, had to be addressed, cf. Table 2. Figure
2 demonstrates the generation of an ontology-based knowledge graph, as intended by AIDAVA,
consisting of SNOMED CT concepts, instances thereof, and literals like dates and identifiers,
emerging from guideline-driven annotations of clinical narratives. It shows how, according to the
annotation predicates chosen, different FHIR resources, viz. Condition and FamilyMemberHistory
are instantiated. It also shows how the choice of the predicate anno:verificationStatus related to
sct:Suspected leads to a reference to the concept ‘sct:Neoplasm of breast’, whereas in the annotation
without modifier (which assumes that the diagnosis is confirmed), the same FHIR relation points
to an individual referent that instantiates ‘sct:Neoplasm of breast’. Other details are only hinted
at, such as the class Information content entity from the Information Artifact Ontology12 , as well
as the inferred relation ‘occurs in’ from the Basic Formal Ontology (BFO)[24].
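
The distinction described in this paragraph, between pointing to a concept and pointing to an individual that instantiates it, can be sketched with plain Python tuples. The predicate strings, the blank-node naming scheme, and the function `condition_triples` are illustrative assumptions and do not reproduce the exact graph structure used in AIDAVA or the official FHIR RDF serialisation.

```python
import itertools
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
_ids = itertools.count(1)  # source of fresh node identifiers


def condition_triples(concept: str, verification: Optional[str] = None) -> List[Triple]:
    """Build subject-predicate-object triples for one annotated diagnosis.

    Without a modifier the diagnosis is assumed to be confirmed, so the
    Condition node refers to an individual that instantiates the concept (A-Box).
    With a modifier such as sct:Suspected it refers to the concept itself (T-Box).
    """
    condition = f"_:condition{next(_ids)}"
    triples: List[Triple] = [(condition, "rdf:type", "fhir:Condition")]
    if verification is None:
        referent = f"_:entity{next(_ids)}"
        triples.append((referent, "rdf:type", concept))
        triples.append((condition, "fhir:Condition.code", referent))
    else:
        triples.append((condition, "fhir:Condition.code", concept))
        triples.append((condition, "anno:verificationStatus", verification))
    return triples


suspected = condition_triples("sct:Neoplasm of breast", "sct:Suspected")
confirmed = condition_triples("sct:Neoplasm of breast")
```

In `suspected` the code relation targets the concept directly, while in `confirmed` it targets a freshly created individual typed with that concept. A full implementation would use an RDF library and the predicate mappings of the underlying standards.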




12 Information Artifact Ontology (IAO)




Figure 2: From text to ontology-based knowledge graphs. The text level shows three text snippets
(quoted) belonging to clinical texts about one patient. The human annotations (grey background)
use SNOMED CT and “anno:” as annotation vocabularies. The lower, most complex level shows
the instantiations of FHIR resources with references to SNOMED concepts, instances thereof, and
literals. Nodes with random IDs represent individuals, linked to the concept whose code appears in the
annotation. An example of an inferred predication is shown by a dashed arrow.



Table 1
Examples for mappings between HL7 FHIR values and SNOMED CT concepts
                HL7 FHIR        SNOMED CT (from the Qualifier value hierarchy)
                Unconfirmed     Suspected OR Probably not present
                Provisional     Probably present OR Suspected
                Confirmed       Confirmed present
                Refuted         Known absent
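
The mappings of Table 1 can be written down as a lookup structure, shown here as a minimal Python sketch. The dictionary content is taken from the table; the names `FHIR_TO_SCT` and `fhir_candidates` are assumptions. Because the table uses OR, one SNOMED CT qualifier may correspond to more than one FHIR value, so the reverse lookup returns a list of candidates.

```python
from typing import Dict, FrozenSet, List

# FHIR verification status value -> SNOMED CT Qualifier value labels (Table 1).
FHIR_TO_SCT: Dict[str, FrozenSet[str]] = {
    "Unconfirmed": frozenset({"Suspected", "Probably not present"}),
    "Provisional": frozenset({"Probably present", "Suspected"}),
    "Confirmed": frozenset({"Confirmed present"}),
    "Refuted": frozenset({"Known absent"}),
}


def fhir_candidates(sct_qualifier: str) -> List[str]:
    """Return all FHIR values whose mapping includes the given qualifier, sorted by name."""
    return sorted(value for value, qualifiers in FHIR_TO_SCT.items()
                  if sct_qualifier in qualifiers)
```

With this sketch, "Known absent" resolves to a single FHIR value, whereas "Suspected" resolves to two candidates, which signals that a task-specific guideline or a reviewer has to decide between them.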


4. Discussion
Past clinical annotation projects were often based on UMLS CUIs, which are freely accessible but
often semantically shallow concept identifiers, or on in-house annotation languages, which restricts
their openness, as in [10]. In other cases, annotations were limited to high-level entity types,
such as Disorder and Body part, with a focus on relations. Many different ontologies were used,
among which the Human Phenotype Ontology (HPO) and the Disease Ontology (DO) should be
emphasised.
   We focus on SNOMED CT because of its growing international acceptance as a standard
for all health record content, its scope and granularity, and particularly its logical underpinning,
which facilitates the bridging between pre-coordinated and post-coordinated expressions.




    However, our approach is also subject to some reservations. It has been argued that SNOMED CT is
little used in routine practice, particularly in continental Europe, that current licenses exclude
important jurisdictions, and that translations are still missing. We reply that the status quo in
clinical terminologies, with national ICD versions, national procedure classifications and drug
catalogues, does not offer a convincing interoperability perspective without SNOMED CT. Furthermore,
SNOMED CT is fully available to the research community, so the fact that its productive use
requires a licence should not be an obstacle.
    One specific limitation is the still unresolved management of the overlap between SNOMED
CT, FHIR, and related value sets. A continuation of the Terminfo work in the light of FHIR
would be desirable. Another limitation is that numerous SNOMED CT concepts lack formal and
textual definitions, which poses challenges to annotators, particularly with texts in languages for
which no official translation exists.
    We are convinced that, at a time when large language models are rapidly gaining ground, and under
the hypothesis that machine understanding of clinical language is a realistic goal, semantic standards
do not become obsolete. On the contrary, large language model technology has to be leveraged
to generate canonical, standardised representations. Such representations, serving as a gold standard
for clinical content representation, need to be elaborated and refined. We understand the proposed
annotation principles as a step in this direction.


5. Conclusion and Future Work
Clinical annotation guidelines are crucial for preparing structured data, for representing content
for human reading and searching, for enhancing supervised learning, and for benchmarking language
models against gold-standard data sets. Clinical annotation tasks are further complicated by the
diversity of the texts and by the broad range of knowledge they cover across multiple dimensions,
such as diseases, signs, symptoms, findings, procedures, as well as temporality, factuality, and other
contexts. Existing annotation guidelines have mostly been motivated by shared-task organisers aiming
to solve NLP challenges such as entity recognition and relation extraction, which has tended to lead
to shallow policies. To address clinical annotation with an ontology-driven methodology and to take
advantage of the rich content of SNOMED CT and FHIR, we proposed a set of annotation principles and
designed the mapping of annotations to SNOMED CT and FHIR via entity and relation linking, as well
as expression normalisation. Fully implemented, our annotation principles yield a broad and in-depth
semantic representation of the original clinical text. We recommend representing it as knowledge
graphs, which can then be enriched by additional content at the A-Box and T-Box levels and to which
both symbolic and neural reasoning tasks can be applied. For practical applications, we suggest that
users carry out the core annotation first and choose the depth of annotation according to their needs.


Table 2
Example binary predicates for relation annotations with their translation into SNOMED CT and FHIR
syntax. “INV” = inverse, “||” = concatenation. The relation paths marked with [a] refer to the
concatenation of SNOMED CT relations, those marked with [b] to roughly equivalent FHIR elements
  anno:                Domain                  Target path                                                              Range
  site                 ‘sct:Clinical finding’  [a] ‘sct:Finding site’                                                   ‘sct:Body structure’
                                               [b] INV(fhir:Condition.code) || fhir:Condition.body
  site                 ‘sct:Procedure’         [a] ‘sct:Procedure - Direct’                                             ‘sct:Body structure’
                                               [b] INV(fhir:Procedure.code) || fhir:Procedure.body
  inFamily             ‘sct:Clinical finding’  [a] INV(‘sct:Associated finding’) || ‘sct:Subject relationship context’  ‘sct:Person’
                                               [b] INV(fhir:FamilyMemberHistory.condition) || fhir:FamilyMemberHistory.relationship
  verification status  ‘sct:Clinical finding’  [a] INV(‘sct:Associated finding’) || ‘sct:Finding context’               ‘Qualifier value’ (cf. Tab. 1)
                                               [b] INV(fhir:Condition.code) || fhir:Condition.verificationStatus
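
The relation paths of Table 2 can be read as sequences of steps over a knowledge graph, where each step follows one relation either forwards or, for “INV”, backwards. The following sketch illustrates this reading for the FHIR variant [b] of the site predicate on clinical findings. The `Step` type, the `follow` function and the `ex:` node identifiers are hypothetical and serve only as an illustration, not as the implementation used in this work.

```python
from typing import NamedTuple

class Step(NamedTuple):
    predicate: str
    inverse: bool = False  # True corresponds to INV(...) in Table 2

# anno:site for 'sct:Clinical finding', path [b]:
# INV(fhir:Condition.code) || fhir:Condition.body
SITE_FHIR = [Step("fhir:Condition.code", inverse=True), Step("fhir:Condition.body")]

def follow(path, start, triples):
    """Follow a concatenated relation path from a start node over (s, p, o) triples."""
    frontier = {start}
    for step in path:
        if step.inverse:
            frontier = {s for (s, p, o) in triples
                        if p == step.predicate and o in frontier}
        else:
            frontier = {o for (s, p, o) in triples
                        if p == step.predicate and s in frontier}
    return frontier

# Toy graph with made-up individuals: one Condition resource, one finding, one body structure.
triples = {
    ("ex:cond1", "fhir:Condition.code", "ex:finding1"),
    ("ex:cond1", "fhir:Condition.body", "ex:breast1"),
}

sites = follow(SITE_FHIR, "ex:finding1", triples)
```

Starting from the finding, the inverse step reaches the Condition resource that codes it, and the second step reaches the body structure, which is what the single annotation predicate site expresses directly.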
   We acknowledge that clinical free-text annotation is a huge and challenging task, and that fully
implementing our guideline principles may take a long time. Nevertheless, there is a clear need to
unify clinical annotation tasks in order to facilitate clinical research across different sectors
and languages, as well as the training of current large NLP models.
   In future work, we will prepare more annotation examples with a full set of instructions. We
also plan to evaluate our principles from the NLP perspective and to conduct case studies that
measure inter-rater agreement [25] among trained workers.


6. Acknowledgments
The work was partially supported by Grant 101057062 “AIDAVA” (funder: the European Commission,
HORIZON-HLTH-2021), Grant “Assembling the Data Jigsaw: Powering Robust Research
on the Causes, Determinants and Outcomes of MSK Disease” (The project has been funded by
the Nuffield Foundation, but the views expressed are those of the authors and not necessarily the
Foundation. Visit www.nuffieldfoundation.org) and Grant EP/V047949/1 “Integrating hospital
outpatient letters into the healthcare data space” (funder: UKRI/EPSRC).


References
 [1] O. Bodenreider, R. Cornet, D. J. Vreeman, Recent developments in clinical terminologies:
     SNOMED CT, LOINC, and RxNorm, Yearb Med Inform 27 (2018) 129–139.
 [2] D. Bender, K. Sartipi, HL7 FHIR: An agile and RESTful approach to healthcare information
     exchange, in: Proc 26th IEEE Symp on Computer-based Med Syst, 2013, pp. 326–331.
 [3] A. Roberts, R. Gaizauskas, M. Hepple, et al., The CLEF corpus: semantic annotation of
     clinical text, in: AMIA Annu Symp Proc, 2007, pp. 625–9.
 [4] A. Roberts, R. Gaizauskas, M. Hepple, et al., Building a semantically annotated corpus of
     clinical texts, J Biomed Inform 42 (2009) 950–966.
 [5] B. R. South, D. Mowery, Y. Suo, et al., Evaluating the effects of machine pre-annotation
     and an interactive annotation interface on manual de-identification of clinical text, J Biomed
     Inform 50 (2014) 162–172.




 [6] S. Yada, A. Joh, R. Tanaka, et al., Towards a versatile medical-annotation guideline feasible
     without heavy medical knowledge: starting from critical lung diseases, in: Proc of the 12th
     LREC, 2020, pp. 4565–4572.
 [7] T. Rama, P. Brekke, Ø. Nytrø, L. Øvrelid, Iterative development of family history annotation
     guidelines using a synthetic corpus of clinical text, in: Proc of the 9th Intl Workshop on
     Health Text Mining and Information Analysis, ACL, 2018, pp. 111–121.
 [8] A. Miranda-Escalada, L. Gascó, S. Lima-López, et al., Overview of DisTEMIST at BioASQ:
     Automatic detection and normalization of diseases from clinical texts: results, methods,
     evaluation and multilingual resources 13390 (2022).
 [9] M. Skeppstedt, M. Kvist, G. H. Nilsson, et al., Automatic recognition of disorders, findings,
     pharmaceuticals and body structures from clinical text, J Biomed Inform 49 (2014) 148–158.
[10] E. Zhu, Q. Sheng, H. Yang, et al., A unified framework of medical information annotation
     and extraction for Chinese clinical text, Artif Intell Med (2023) 102573.
[11] W. F. Styler IV, S. Bethard, S. Finan, et al., Temporal annotation in the clinical domain,
     Transactions of the ACL 2 (2014) 143–154.
[12] D. L. Mowery, H. Harkema, W. Chapman, Temporal annotation of clinical text, in: Proc of
     the Workshop on Current Trends in Biomedical NLP, 2008, pp. 106–107.
[13] M. Marimon, J. Vivaldi, N. Bel Rafecas, Annotation of negation in the IULA Spanish
     corpus, SemBEaR – Computational Semantics Beyond Events and Roles (2017) 43–52.
[14] Y.-F. Luo, W. Sun, A. Rumshisky, MCN: a comprehensive corpus for medical concept
     normalization, J Biomed Inform 92 (2019) 103132.
[15] N. Guarino, Formal ontology, conceptual analysis and knowledge representation, Int Journal
     of Human-computer Studies 43 (1995) 625–640.
[16] B. Smith, M. Ashburner, C. Rosse, et al., The OBO foundry: coordinated evolution of
     ontologies to support biomedical data integration, Nature biotech 25 (2007) 1251–1255.
[17] S. Schulz, R. Stegwee, C. Chronaki, Standards in healthcare data, in: K. et al. (Ed.),
     Fundamentals of Clinical Data Science, Springer, Cham(CH), 2019.
[18] J. Millar, The need for a global language–SNOMED CT introduction, Stud Health Technol
     Inform 225 (2016) 683–685.
[19] A. L. Rector, R. Qamar, T. Marley, Binding ontologies and coding systems to electronic
     health records and messages, Applied Ontology 4 (2009) 51–69.
[20] E. J. Pacheco, MorphoMap: Mapeamento automático de narrativas clínicas para uma
     terminologia médica, PhD dissertation. UTFPR, Brazil, 2009.
[21] J. A. Miñarro-Giménez, R. Cornet, M.-C. Jaulent, et al., Quantitative analysis of manual
     annotation of clinical text samples, Int J Med Inform 123 (2019) 37–48.
[22] M. Ayaz, M. F. Pasha, M. Y. Alzahrani, R. Budiarto, D. Stiawan, The Fast Health
     Interoperability Resources (FHIR) standard, JMIR Med Inform 9 (2021) e21929.
[23] E. M. Sanfilippo, Ontologies for information entities: State of the art and open challenges,
     Appl Ontology 16 (2021) 111–135.
[24] J. N. Otte, J. Beverley, A. Ruttenberg, BFO: Basic Formal Ontology, Appl Ontology 17
     (2022) 17–43.
[25] S. Gladkoff, L. Han, G. Nenadic, Student’s t-distribution: On measuring the inter-rater
     reliability when the observations are scarce, in: Proc of the RANLP, 2023.



