FAIRness of openEHR Archetypes and Templates

Caroline Bönisch*,1 [0000-0001-7169-6090], Anneka Sargeant*,1 [0000-0003-3289-948X],
Antje Wulff2 [0000-0002-2550-2627], Marcel Parciak1 [0000-0002-6950-929X],
Christian R Bauer1 [0000-0003-2613-419X], Ulrich Sax1 [0000-0002-8188-3495]

1 Department of Medical Informatics, University Medical Center Goettingen, Robert-Koch-Str. 40, 37075 Goettingen, Germany
2 Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany

                        Abstract. Background: The FAIR Data Publishing Group designed 15 principles
                        to quantify the FAIRness of scientific data. By using the FAIR Principles it is
                        possible to make scientific data findable, accessible, interoperable and reusable.
                        This paper checks the FAIRness of openEHR archetypes and templates as for-
                        malisms to preserve semantic interoperability in electronic health records. Objec-
                        tives: Within the semantic framework of the HiGHmed project, the aim is to ex-
                        change harmonized data between various institutions and make them available
                        for research, by modelling archetypes and templates within openEHR. To ensure
                        interoperability across various locations, archetypes and templates have been ex-
                        amined in this paper with regard to the FAIR principles (Findable, Accessible,
                        Interoperable and Re-Useable). Methods: Analysis of the archetypes developed
                        in HiGHmed and stored in the HiGHmed Clinical Knowledge Manager to determine
                        the degree of fulfillment of FAIRness. Results: All fifteen FAIR Principles
                        are fulfilled or partially fulfilled. The openEHR approach and the Clinical
                        Knowledge Manager as a collaborative library are compatible with the FAIR
                        Principles and are well suited for the exchange of research data.

                        Keywords: openEHR, FAIR, Principles, HiGHmed, Archetypes, Clinical
                        Knowledge Manager, Modelling


               1        Introduction

                   The collection and processing of health data is a prerequisite for medical care and
               patient management. Health data is often recorded electronically in a variety of appli-
               cation systems (e.g. radiological information system or laboratory information system)
               in different formats (Bauer, et al., 2016). In clinics, heterogeneity is intensified not only
               by divergent departmental and personal documentation approaches, but also by the fact
               that technical and clinical parameters of similar examination devices are not described
               in the same way by different manufacturers (Krefting, et al., 2010). The exchange and
               comparison of otherwise equivalent data is often hindered, for example by the lack of a
               standardized format, which leads to redundancies and inconsistencies and thus has a
               significant impact on data quality.


                   *Both authors contributed equally to this manuscript


Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


   The HiGHmed project, funded by the Federal Ministry of Education and Research
(BMBF), aims to establish a shared information governance framework connecting local
medical data integration centers. The primary goals are the shared use of heterogeneous
data from different clinical departments and the reuse of collected data for research
purposes and clinical care. Exemplary use cases established in the HiGHmed project
demonstrate the feasibility of the planned governance framework. These medically
driven use cases are located in the areas of cardiology, oncology and infection control
and pursue different objectives. For example, the infection control use case is developing
an early detection system to counter outbreaks of multidrug-resistant germs.
   To achieve the different use case objectives, the data has to be introduced and managed
in a semantic framework that allows separating “the knowledge and information
levels in information systems” (Beale, 2002). Information means specific characteristics
of an entity. This includes information that can be assigned to a dedicated patient (for
example, John Doe with a blood pressure of 120 over 80). Knowledge, the interpretation
of information, denotes statements that apply to all entities of a class and do not depend
on a particular patient. Archetypes, as described in openEHR, represent the knowledge
level: “The term archetype is used to denote knowledge level models which define valid
information structures” (Beale, 2002). Templates contain a context-specific set of
archetypes, in which the archetypes used can be further constrained to meet specific
requirements. Templates are often used to represent medical reports or findings.
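
The separation of the knowledge level from the information level can be illustrated with a minimal Python sketch. It is not openEHR tooling: the classes `Range` and `ArchetypeSketch`, the item names and the numeric limits are hypothetical and chosen only for illustration. The archetype-like object holds constraints that apply to every patient, the dictionary holds the values of one patient, and `constrain` mimics how a template narrows an archetype further.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Allowed numeric interval for one data item (illustrative values only)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass(frozen=True)
class ArchetypeSketch:
    """Knowledge level: constraints that hold for every patient."""
    concept: str
    constraints: dict

    def validate(self, record: dict) -> list:
        """Return a list of problems; an empty list means the record is valid."""
        problems = []
        for item, allowed in self.constraints.items():
            if item not in record:
                problems.append(f"missing item: {item}")
            elif not allowed.contains(record[item]):
                problems.append(f"value out of range: {item}={record[item]}")
        return problems

    def constrain(self, item: str, allowed: Range) -> "ArchetypeSketch":
        """Template-like step: return a copy with one item constrained further."""
        narrowed = dict(self.constraints)
        narrowed[item] = allowed
        return ArchetypeSketch(self.concept, narrowed)


# Knowledge level: hypothetical limits, not taken from a real archetype.
blood_pressure = ArchetypeSketch(
    concept="blood pressure",
    constraints={"systolic": Range(0, 1000), "diastolic": Range(0, 1000)},
)

# Information level: the values recorded for one patient (John Doe, 120/80).
john_doe = {"systolic": 120, "diastolic": 80}

print(blood_pressure.validate(john_doe))  # prints []
```

The same record can be valid against the archetype and invalid against a stricter template derived from it, which is the sense in which templates constrain archetypes for a specific context.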
   HiGHmed follows the openEHR approach to reach semantic interoperability
(Haarbrandt, 2018). Clinical concepts, terminologies and services are combined and
converted into a machine-readable form that is still easily understandable by humans.


2      Objectives

   We aim to evaluate the FAIRness of our openEHR approach (archetypes and templates)
to create semantically interoperable data models within HiGHmed. The evaluation
of medical data is not part of this work.


3      Methods

    In the HiGHmed project, archetypes and templates are created to enable sharing of
harmonized data across various institutions for three different use cases (cardiology,
oncology, infection control). As described in Wulff et al. (Wulff, 2019), 79 archetypes
had been identified in the context of HiGHmed. Most of these archetypes were already
available at the public instance of the Clinical Knowledge Manager (CKM)
(https://www.openehr.org/ckm/) and could be used off-the-shelf with minor changes to
fit the HiGHmed requirements. The CKM in general serves as a collaborative tool to
support the modelling process and acts as a repository of archetypes and templates. The
identified archetypes are constrained and serve as building blocks for templates.
Currently, 12 templates are part of the HiGHmed modelling process. The off-the-shelf
archetypes were already FAIR compliant and have not been modified within the
HiGHmed project. Some archetypes, such as “ethnic background”, could not be adopted
from the international CKM and were created according to the requirements of the
clinicians involved.
   All templates were newly created in the course of the project with the FAIR principles
taken into account. The role of the HiGHmed project is not only to re-use archetypes and
templates but also to create FAIR compliant archetypes and templates, specific to the
HiGHmed use cases, where needed.
   In 2016, the FAIR Data Publishing Group designed fifteen principles to quantify levels of
FAIRness, such as principle F2, “data are described with rich metadata”.
To check the developed HiGHmed archetypes and templates for their FAIRness, the
FAIR Principles of Wilkinson et al. (Wilkinson, et al., 2016) are taken into account.
The FAIR Principles define further characteristics for the terms Findable, Accessible,
Interoperable and Re-usable. To make data findable, they must be assigned a globally
unique, persistent identifier and be described with rich metadata. Data have to be
registered or indexed in a searchable resource, and the metadata need to include the
identifier of the data they describe.
   In order to be accessible, data sets shall be retrievable by their unique identifier using
a standardized communication protocol that is open, free and universally implementable.
Furthermore, the protocol has to allow an authentication and authorization procedure
where necessary. Metadata should remain accessible even when the data are no
longer available. To be interoperable, data should use a formal, accessible,
shared and broadly applicable language for knowledge representation and use FAIR
compliant vocabularies. Qualified references between data and other (meta)data need to
be included (Wilkinson, et al., 2016).
   To be re-usable, data must have a plurality of accurate and relevant attributes and
need to be released with a clear and accessible usage licence. Moreover, data have to
be associated with their provenance and meet the domain-relevant community stand-
ards.
   Within the scope of this examination, we assessed the compliance of the archetypes and
templates in the HiGHmed CKM with the fifteen FAIR Principles. They were analysed
with regard to the general characteristics of an archetype or template, such as header
information, concept name or reference information.


4      Results

   The following sections are divided into the four categories of the FAIR principles
(Findable, Accessible, Interoperable and Re-usable). In the category “Findable”, four
out of four principles could be met in the HiGHmed CKM. In the category
“Accessible”, however, only two of the four principles are fulfilled and two principles
are partially fulfilled in the HiGHmed project. Furthermore, three out of three
principles could be met in the category “Interoperable” and four out of four principles in
the category “Re-usable”.
   The subsequent paragraphs describe which criteria have contributed to fulfil the par-
ticular FAIR principles.


Findable

   Principle F1 is achieved by the attribution entries Build Uid and Major Version ID.
The Build Uid is unique to the corresponding instance of the archetype and is initially
set during the creation of the archetype. It changes whenever the archetype is uploaded,
checked out or committed. Based on these two IDs, archetypes and templates can be
kept persistent at any time.
   Principle F2 is closely connected with principle R1. To fulfil principle F2, archetypes
and templates provide multiple mandatory attributes. In addition to the Build Uid
and Major Version ID, the attributes contain information on the archetype ID, the assigned
licence, the original author, the current custodian, and other contributors or translators.
All these aspects are also available in the XML representation of the archetype. Within
the HiGHmed project, the tool Archetype Editor from Ocean Informatics was used to
create archetypes. During creation, a unique ID is assigned to every archetype.
This ID is checked every time an archetype is uploaded to any
instance of the CKM, so that the ID of an archetype is ensured to be the same across
all CKMs. It is therefore possible to describe which object is referenced by means of
an ID.
   Archetypes and templates include the identifier of the data they describe (Archetype ID
and Template ID). Therefore, it is possible for machines to identify an archetype or a
template without additional support (principle F3).
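
How a machine can interpret such an identifier may be sketched as follows. The sketch assumes the commonly used ADL 1.4 style form `originator-rm_name-rm_entity.concept.vN`, for example `openEHR-EHR-OBSERVATION.blood_pressure.v1`; the regular expression, the class `ArchetypeId` and the function `parse_archetype_id` are illustrative and not part of the CKM or of any openEHR library. Identifiers with a full semantic version (as in ADL 2) are deliberately not covered.

```python
import re
from typing import NamedTuple, Optional


class ArchetypeId(NamedTuple):
    originator: str
    rm_name: str
    rm_entity: str
    concept: str
    major_version: int


# Assumed pattern for ADL 1.4 style identifiers (illustrative, not normative).
_ARCHETYPE_ID = re.compile(
    r"^(?P<originator>[A-Za-z]\w*)"
    r"-(?P<rm_name>[A-Za-z]\w*)"
    r"-(?P<rm_entity>[A-Za-z]\w*)"
    r"\.(?P<concept>[A-Za-z][\w-]*)"
    r"\.v(?P<major_version>\d+)$"
)


def parse_archetype_id(text: str) -> Optional[ArchetypeId]:
    """Split an archetype ID into its parts, or return None if it does not match."""
    match = _ARCHETYPE_ID.match(text)
    if match is None:
        return None
    parts = match.groupdict()
    return ArchetypeId(
        originator=parts["originator"],
        rm_name=parts["rm_name"],
        rm_entity=parts["rm_entity"],
        concept=parts["concept"],
        major_version=int(parts["major_version"]),
    )


parsed = parse_archetype_id("openEHR-EHR-OBSERVATION.blood_pressure.v1")
print(parsed)
```

Because the reference model class, the concept and the major version are all encoded in the identifier itself, software can recognise what an archetype describes from the identifier alone.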
   Archetypes and templates are indexed with several keywords to make them searchable
across the HiGHmed CKM (principle F4). We therefore consider F4 as fulfilled.


Accessible

   The HiGHmed clinical knowledge governance framework, as described in (Wulff,
et al., 2018), defines requirements to make archetypes and templates accessible within
the HiGHmed project. Hence, it is used as the starting point to investigate the accessibility
of archetypes and templates. The archetypes and templates are retrievable via the CKM
REST API, but information about the server to which the request has to be sent is not included
in the XML representation of either archetypes or templates. Principle A1 can therefore be
considered only partially fulfilled.
   The openly documented CKM REST API (https://ckm.openehr.org/ckm/rest-doc/)
defines mechanisms to list a selection of archetypes or to update an archetype. It is also
possible to get a file set for all archetypes used in a template (principle A1.1).
   In addition, the CKM uses a role and rights management that supports authorization
and authentication (principle A1.2). To create and review archetypes and templates,
researchers and data stewards need appropriate roles so that specific user-dependent
rights can be assigned. As part of the role and rights management of the CKM, a
distinction can be made between system-wide roles, subdomain roles and project roles.
System-wide roles can create, change and delete ontology classes as well as release
sets. In addition, complete classification schemes can be created or deleted. The
translation of classification schemes is also part of the system-wide roles. At subdomain
level, the user can only exercise the above rights for one project (e.g. HiGHmed-specific).
At project level (e.g. use case-specific), there are roles such as Editor or Reviewer.
These have their rights only for the corresponding project and cannot create or
change archetypes or templates in any other project.
   The long-term archiving of data and documents faces several challenges. It must be
ensured that data can be retrieved over a long period of time and that data cannot be
changed unnoticed. When an archetype or template is completely and irrevocably
deleted in the CKM, all dependencies such as review rounds, comments, discussions,
change requests and history are removed. However, deletion is only intended if
the archetype or template is no longer needed. It is preferred to set the status of an
archetype to “rejected” or “deprecated”. The status “deprecated” is used when an
archetype has already been published, whereas the status “rejected” is applied
to archetypes still in development. Both statuses allow archetypes to remain accessible
under the tab “checked-out resources” within the CKM. It should be noted that the
metadata are not kept separate from, and thus not independent of, the actual data.
Therefore, we consider principle A2 as partially fulfilled.
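
The retirement rule described above can be written down as a small decision function. This is only a sketch of the rule as stated in the text; the names `Status`, `retire` and `metadata_retrievable` are hypothetical and do not correspond to the CKM implementation.

```python
from enum import Enum


class Status(Enum):
    """Outcomes named in the text for an archetype that is no longer needed."""
    DEPRECATED = "deprecated"  # archetype had already been published
    REJECTED = "rejected"      # archetype was still in development
    DELETED = "deleted"        # irrevocable removal, including the history


def retire(already_published: bool) -> Status:
    """Choose the preferred status instead of deleting the archetype."""
    return Status.DEPRECATED if already_published else Status.REJECTED


def metadata_retrievable(status: Status) -> bool:
    """Metadata stays retrievable unless the archetype was deleted."""
    return status is not Status.DELETED


for published in (True, False):
    status = retire(published)
    print(published, status.value, metadata_retrievable(status))
```

Only the deletion path loses the metadata, which is why A2 is rated as partially rather than completely fulfilled.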


Interoperable

   OpenEHR makes use of the Archetype Definition Language (ADL) to represent
knowledge in a formal, accessible, shared and broadly applicable way (principle I1). ADL is
established and documented as an open, domain-relevant standard in the openEHR
environment.1 In addition to the representation in ADL, archetypes and templates can also
be represented in XML format.
   Furthermore, terminologies such as LOINC can be integrated for each data item of
an archetype to be able to exchange data more easily and be interoperable according to
the FAIR Principles (principle I2).
   Within archetypes, cluster slots are defined in which further archetypes can be
referenced and thereby nested. Furthermore, templates mainly contain references to
various archetypes. Templates map context-specific documents in which archetypes are
referenced as representations of clinical information. The relationships between
individual archetypes within a template can be queried using the CKM REST API. The
service queries the ID of the template together with the archetypes that are required for
the template (principle I3).
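
Resolving which archetypes a template requires amounts to following references transitively. The following sketch works on an in-memory dictionary instead of the CKM REST API; the template name `example_vital_signs_template`, the reference graph and the function `required_archetypes` are assumptions made for illustration and do not reflect the actual content of any CKM instance.

```python
from typing import Dict, List, Set

# Illustrative reference graph: each key lists the archetypes it references
# directly (template content or slots). Not retrieved from a CKM instance.
REFERENCES: Dict[str, List[str]] = {
    "example_vital_signs_template": ["openEHR-EHR-COMPOSITION.encounter.v1"],
    "openEHR-EHR-COMPOSITION.encounter.v1": [
        "openEHR-EHR-OBSERVATION.blood_pressure.v1",
    ],
    "openEHR-EHR-OBSERVATION.blood_pressure.v1": [
        "openEHR-EHR-CLUSTER.device.v1",
    ],
    "openEHR-EHR-CLUSTER.device.v1": [],
}


def required_archetypes(root: str, references: Dict[str, List[str]]) -> List[str]:
    """Collect every archetype reachable from `root` via direct or nested references."""
    seen: Set[str] = set()
    stack = list(references.get(root, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cyclic references
        seen.add(current)
        stack.extend(references.get(current, []))
    return sorted(seen)


for archetype_id in required_archetypes("example_vital_signs_template", REFERENCES):
    print(archetype_id)
```

The result is the complete set of archetypes needed to interpret the template, including those that are only reachable through nested slots.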
   Complying with all three interoperability principles, openEHR archetypes and tem-
plates provide a FAIR basis for information exchange.


1 https://specifications.openehr.org/releases/AM/latest/ADL2.html


Re-Usable

   Every archetype or template in every CKM instance is licensed under the Creative
Commons Attribution-ShareAlike 3.0 License or higher, so that the (meta)data are re-
leased with a clear and accessible data usage license (R1.1).
   To keep data reusable, a detailed history of editing (creation, modification or deletion)
is of high importance. The time, initiator, content and logged processes are recorded.
This audit trail is stored in the CKM as the history of archetypes and templates
and can be viewed without user authentication. Information such as current status, date
of last change and modifier can be displayed. The history also includes all previous
versions of archetypes and templates. Additionally, earlier versions can be compared
to avoid inconsistencies. Data and workflow provenance within openEHR is
stored in elements from the openEHR reference model. The feeder audit class and
feeder audit details class describe the semantic content of an audit trail, for example,
the source system, feeder system, and other audit information transferred. Statements
about when, by whom and where which information was provided can be referenced to
an archetype (principle R1.2).
   All developed HiGHmed archetypes and templates are fully compliant with the
openEHR approach, which serves as a relevant standard for the medical
community. The semantic framework defines specifications regarding the structuring
of the data to store them in a standardized way. Specifications and APIs of the
openEHR community can be viewed via the openEHR specifications2 and form the
basis of every development in openEHR. In addition, the HiGHmed project has its own
community guidelines regarding the translation of international archetypes. This
ensures compliance with community standards within the project and within the openEHR
community (principle R1.3).


2 https://specifications.openehr.org/

    Table 1. Overview of the FAIR Principles and their equivalent in openEHR archetypes and templates.

FAIR Principles                     Equivalent in openEHR                           Degree of compliance

F1. (meta)data are assigned a       The Build UID and Major Version ID act as a     Fulfilled
globally unique and eternally       globally unique and persistent identifier.
persistent identifier.

F2. data are described with rich    The Attribution tab of archetypes and           Fulfilled
metadata (defined by R1 below)      templates provides information about
                                    additional metadata such as authors,
                                    contributors and licenses.

F3. Metadata clearly and            The archetype UID and template ID are           Fulfilled
explicitly include the identifier   contained as an attribution in archetypes
of the data                         and templates.

F4. (meta)data are registered or    Archetypes and templates are indexed via        Fulfilled
indexed in a searchable resource    keywords in the Clinical Knowledge Manager.

A1. (meta)data are retrievable by   The archetypes and templates are retrievable    Partially fulfilled
their identifier using a            via the CKM REST API, but the information of
standardized communications         the server, which has to be requested, is not
protocol                            included.

A1.1. the protocol is open, free    Archetypes and templates can be derived via     Fulfilled
and universally implementable       the openly available CKM REST API.

A1.2. the protocol allows for an    The HiGHmed CKM is access controlled through    Fulfilled
authentication and authorization    a username and password.
procedure, where necessary

A2. metadata are accessible, even   By setting the status of an archetype to        Partially fulfilled
when the data are no longer         DEPRECATED or REJECTED metadata is still
available                           retrievable, but not if an archetype is
                                    deleted.

I1. (meta)data use a formal,        ADL and XML are used as formal syntax           Fulfilled
accessible, shared and broadly      languages.
applicable language for knowledge
representation

I2. (meta)data use vocabularies     The terminologies that are used include         Fulfilled
that follow FAIR principles         SNOMED-CT and LOINC.

I3. (meta)data include qualified    Nesting of archetypes due to Slots and within   Fulfilled
references to other (meta)data      templates. Reference to FHIR resources and
                                    IHE concepts can be set.

R1. (meta)data are richly           All attributes are displayed under the          Fulfilled
described with a plurality of       attribution tab of the archetype in the
accurate and relevant attributes    HiGHmed Clinical Knowledge Manager.

R1.1. (meta)data are released       The license used is stored under Attribution    Fulfilled
with a clear and accessible data    tab of the Archetype.
usage license

R1.2. (meta)data are associated     Data Provenance is supported by Audit-Trailing  Fulfilled
with their provenance               and Use/Misuse information. Workflow
                                    Provenance is managed via feeder audit
                                    classes, which contain information about the
                                    workflow process.

R1.3. (meta)data meet               Clinical and technical reviewers check whether  Fulfilled
domain-relevant community           archetypes and templates comply with the
standards                           Community Standards.
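
The degrees of compliance in Table 1 can be tallied per category with a few lines of Python. The dictionary below transcribes the last column of the table; the names `TABLE_1`, `CATEGORIES` and `summarise` are illustrative helpers and not part of the evaluation method itself.

```python
from collections import Counter, defaultdict

# Degree of compliance per principle, transcribed from Table 1.
TABLE_1 = {
    "F1": "Fulfilled",
    "F2": "Fulfilled",
    "F3": "Fulfilled",
    "F4": "Fulfilled",
    "A1": "Partially fulfilled",
    "A1.1": "Fulfilled",
    "A1.2": "Fulfilled",
    "A2": "Partially fulfilled",
    "I1": "Fulfilled",
    "I2": "Fulfilled",
    "I3": "Fulfilled",
    "R1": "Fulfilled",
    "R1.1": "Fulfilled",
    "R1.2": "Fulfilled",
    "R1.3": "Fulfilled",
}

# The first letter of a principle determines its category.
CATEGORIES = {
    "F": "Findable",
    "A": "Accessible",
    "I": "Interoperable",
    "R": "Re-usable",
}


def summarise(table):
    """Count the degrees of compliance for each FAIR category."""
    summary = defaultdict(Counter)
    for principle, degree in table.items():
        summary[CATEGORIES[principle[0]]][degree] += 1
    return dict(summary)


summary = summarise(TABLE_1)
for category, counts in summary.items():
    print(category, dict(counts))
```

The tally reproduces the figures reported at the beginning of this section: thirteen principles are fulfilled and the two partially fulfilled principles both belong to the category “Accessible”.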


5      Discussion

The FAIR Principles are designed to make research data findable, accessible, interop-
erable and reusable for the public. As a part of the HiGHmed project, future research
data will be collected and stored by using openEHR archetypes and templates. There-
fore, the CKM is used as a library of clinical knowledge artefacts (archetypes and tem-
plates).
In the context of this work the archetypes and templates have been explored with regard
to the FAIR Principles. As a result, all 15 principles can be regarded as fulfilled or
partially fulfilled. Archetypes and templates have extraordinary strengths, especially in
the area of findability and interoperability. By using a clear ID management and mean-
ingful keywords, the required archetypes and templates can easily be found. Data mod-
els are subject to a formal syntax and are enriched with terminologies and metadata to
make them exchangeable across various institutions.
  Archetypes are closely linked to their metadata. When an archetype is irrevocably
deleted, the corresponding metadata will also be removed. Therefore, it is necessary to
set archetypes and templates as deprecated or rejected within their lifecycles so that the
metadata is still available. This procedure is part of the community guidelines in
openEHR. However, it would also be possible to set up a Git repository, in which all
archetypes are additionally stored. The HiGHmed CKM would always push the arche-
types into the repository during an update. The use of a Git repository could thus be-
come part of the Clinical Knowledge Governance Framework (Wulff, et al., 2018).
Furthermore, there are additional approaches to extend the data and workflow provenance,
which can be incorporated into the Clinical Knowledge Governance Framework.
As described in (Parciak, et al., 2018), W3C PROV can be used to improve the provenance
process by including further provenance information for every interaction.
   The FAIR Principles are not specified as a mandatory set of rules, but they can be
used to provide sustainable data management. However, due to the high influence of the
FAIR Principles in the scientific community, the FAIRmetrics.org working group has
developed a framework to objectively test digital objects for FAIRness. For this purpose,
fourteen FAIR Metrics were created, based on the FAIR Principles. The FAIR Metrics
Framework is currently in a test phase and was therefore not part of this paper. Initiatives
such as GO-FAIR3 evaluate and discuss the use of these metrics to test FAIRness
(Wilkinson, et al., 2018). After the successful test phase of the FAIR Metrics, all archetypes
and templates created within the HiGHmed project will be tested using the FAIR Metrics
Framework. It can be assumed that applying the FAIR Metrics Framework will yield a
finer gradation of the degree of compliance.
   In conclusion, it can be stated that openEHR supports the FAIRification process for
medical data, as described by the GO-FAIR initiative. Using the semantic framework,
medical data becomes linkable and semantically enriched. FAIR archetypes and templates
have an effect on the collected medical data. They offer the possibility to store data in
a structured way and link information on data provenance. This makes medical data
easy to find for future research questions as well as available for subsequent use.
   From the beginning, the data management of the HiGHmed project aimed to
establish an infrastructure with findable, accessible, interoperable and reusable data in
a distributed environment. The FAIR compliance of the archetypes and templates used
ensures that all clinical models are correct, understandable, available and complete.
They serve as the target schema for all data integration pipelines and are therefore the
basis for semantic interoperability.
   Based on the results of this work, we will investigate whether it is possible to per-
form an automated examination of medical data for FAIRness on the basis of the fea-
tured FAIR archetypes.


Acknowledgment

This work was supported by the German Federal Ministry of Education and Research
(BMBF) within the framework of the research and funding concepts of the Medical
Informatics Initiative (01ZZ1802B/HiGHmed).


References

   Bauer, C. R. et al., 2016. Architecture of a Biomedical Informatics Research Data
Management Pipeline. In: Volume 228: Exploring Complexity in Health: An
Interdisciplinary Systems Approach. s.l.:s.n., pp. 262-266.
   Beale, T., 2002. Archetypes: Constraint-based Domain Models for Future-proof
Information Systems. OOPSLA workshop on behavioural semantics, p. 18.
   O'Brien, E. et al., 2001. Blood pressure measuring devices: recommendations of the
European Society of Hypertension. BMJ, pp. 531-536.
   Haarbrandt, B. et al., 2018. HiGHmed – An Open Platform Approach to Enhance
Care and Research across Institutional Boundaries. Methods of Information in
Medicine, July, pp. e66-e81.


3 https://www.go-fair.org/


   Krefting, D., Loose, H. & Penzel, T., 2010. Employment of a Healthgrid for
Evaluation and Development of Polysomnographic Biosignal Processing Methods.
Conference proceedings : ... Annual International Conference of the IEEE Engineering
in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society.
Conference.
   Parciak, M. et al., 2018. PROV@TOS. a Java Wrapper To Capture Provenance for
Talend Open Studio Jobs.
   Wilkinson, M. D. et al., 2016. The FAIR Guiding Principles for scientific data
management and stewardship. Scientific Data, p. 4.
   Wilkinson, M. D. et al., 2018. Evaluating FAIR-Compliance Through an Objective,
Automated, Community-Governed Framework, s.l.: s.n.
   Wulff, A., 2019. A Report on Archetype Modelling in a Nationwide Data
Infrastructure Project. In: Studies in health technology and informatics (258). s.l.:IOS
Press, pp. 146-150.
   Wulff, A., Haarbrandt, B. & Marschollek, M., 2018. Clinical Knowledge
Governance Framework for Nationwide Data Infrastructure Projects. s.l.:s.n.