<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Clinical Ontology for Biomedical Research</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Núria Queralt-Rosinach</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>César H. Bernabé</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qinqin Long</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rajaram Kaliyaperumal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Roos</string-name>
          <email>m.roos@lumc.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Leiden University Medical Center</institution>
          ,
          <addr-line>Einthovenweg 20, 2333 ZC Leiden</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The application of ontologies to solve biomedical problems in hospitals is increasing. At the Leiden University Medical Centre (LUMC), physicians, clinical researchers, data managers and FAIR specialists are addressing the question of how to manage research data in the hospital to enable efficient research for patient care and treatment. We hypothesized that data integration can be improved with a machine-readable ontology, i.e. by improving the Interoperability aspect of FAIR. In this paper, we describe the development and evaluation of the LUMC Clinical Ontology for biomedical research in academic hospitals.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontologies</kwd>
        <kwd>FAIR</kwd>
        <kwd>Patient Data</kwd>
        <kwd>Biomedical Research</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation</title>
      <p>We hypothesized that data integration in the hospital can be improved with a machine-readable
ontology, i.e. by improving the Interoperability aspect of FAIR. We aimed at
developing an application ontology to integrate different data in the hospital. Our approach
follows knowledge-engineering best practices such as modularization, the reuse of
common biomedical ontology terms, design patterns and data models to be as FAIR as possible.
In this paper, we describe the development and evaluation of the LUMC Clinical Ontology
(LCO) for biomedical research in academic hospitals.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Method</title>
      <sec id="sec-2-1">
        <title>2.1. Data Sources</title>
        <p>A workflow to create synthetic datasets of cytokine laboratory measurements was developed
to simulate patient health data and to avoid complications with the General Data
Protection Regulation (GDPR) early in development. Regulators and scientists are still
adjusting to the European GDPR, a recent data privacy and security regulation that is possibly
the most rigorous in the world. Although it was drafted and passed by the European Union, it
imposes obligations on organizations anywhere, as long as they target or collect data related
to people in the EU. The synthesized dataset contains basic information on the clinical
observations, lab measurements and biosamples used per patient and time point. This kind of data
comes from the information systems used to manage health data within the LUMC.</p>
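        <p>As an illustration only, the generation of such a synthetic dataset can be sketched in a few lines of Python. This is not the workflow used in the project: the cytokine panel, the value ranges, the identifier scheme and the column names are all assumptions chosen for the example, and values are drawn uniformly at random rather than from any clinically realistic distribution.</p>
        <preformat><![CDATA[
```python
import csv
import io
import random

# Hypothetical cytokine panel with assumed value ranges in pg/mL (illustrative only).
CYTOKINES = {"IL-6": (0.5, 200.0), "IL-10": (0.5, 50.0), "TNF-alpha": (1.0, 80.0)}


def make_synthetic_rows(n_patients, n_timepoints, seed=0):
    """Return one row per patient, time point and cytokine."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    rows = []
    for p in range(1, n_patients + 1):
        patient_id = f"SYN-{p:04d}"  # synthetic identifier, not a real patient
        for t in range(1, n_timepoints + 1):
            biosample_id = f"{patient_id}-BS{t}"  # one biosample per time point
            for name, (low, high) in CYTOKINES.items():
                rows.append({
                    "patient_id": patient_id,
                    "timepoint": t,
                    "biosample_id": biosample_id,
                    "cytokine": name,
                    "value": round(rng.uniform(low, high), 2),
                    "unit": "pg/mL",
                })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = make_synthetic_rows(n_patients=2, n_timepoints=3)
print(to_csv(rows).splitlines()[0])
# prints: patient_id,timepoint,biosample_id,cytokine,value,unit
```
]]></preformat>
        <p>Because no real patient record is read at any point, a dataset produced this way can be shared during early modelling without touching personal data.</p>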
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Ontology Development</title>
        <p>
          We developed an application ontology for data integration and management of FAIR research
data in the hospital. It is represented in OWL 2, a W3C-recommended Semantic Web standard,
to improve interoperability of patient data with the biomedical Semantic Web. We built
the ontology using Protégé 5.5.0 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which is a free, open-source editor and framework for
developing and maintaining ontologies.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Evaluation with Competency Questions</title>
        <p>Following medical doctors’ research questions, we defined a set of Competency Questions (CQs)
to evaluate the ontology. These questions ranged from simple retrieval of metadata to more
sophisticated queries to analyse correlations in data.</p>
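        <p>To make the two ends of this range concrete, the following Python sketch answers one retrieval question and one correlation question over a handful of invented records. It is an assumption-laden illustration, not the evaluation itself: the records, the function names and the choice of IL-6 and a numeric severity score are hypothetical, and plain Python tuples stand in for data expressed in terms of the ontology.</p>
        <preformat><![CDATA[
```python
from math import sqrt

# Hypothetical synthetic observations:
# (patient, time point, IL-6 in pg/mL, severity score).
OBSERVATIONS = [
    ("SYN-0001", 1, 10.0, 1),
    ("SYN-0001", 2, 40.0, 2),
    ("SYN-0002", 1, 20.0, 1),
    ("SYN-0002", 2, 90.0, 4),
    ("SYN-0003", 1, 60.0, 3),
]


def cq_measurements_for(patient_id):
    """Retrieval CQ: which IL-6 measurements exist for a given patient?"""
    return [(t, il6) for pid, t, il6, _ in OBSERVATIONS if pid == patient_id]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def cq_il6_vs_severity():
    """Analytical CQ: how strongly does IL-6 correlate with the severity score?"""
    il6 = [row[2] for row in OBSERVATIONS]
    score = [row[3] for row in OBSERVATIONS]
    return pearson(il6, score)


print(cq_measurements_for("SYN-0001"))  # prints [(1, 10.0), (2, 40.0)]
print(round(cq_il6_vs_severity(), 2))
```
]]></preformat>
        <p>The point of the sketch is the shape of the questions: the first only filters records, while the second needs values from two different kinds of data to be linked to the same patient and time point.</p>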
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. Ontology Design</title>
        <p>
          We developed an OWL ontology following the EJP RD core semantic model [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], which is used
to represent common data elements in rare disease patient registries [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The model is based
on a design pattern for measurements resulting from some process in the Semanticscience
Integrated Ontology (SIO) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. It allows describing different types of measurements using a
common structure. SIO provides a simple, upper-middle level, integrated structure of types
and relations for rich descriptions of objects, processes and their attributes [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. It also provides
design patterns to help in defining a structure for common data types, and it is used in biomedical
Semantic Web projects such as DisGeNET [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We designed the ontology as modules, one for
each semantic type: clinical observations, score calculations, lab measurements, and biosamples,
and we created a semantic model for each module. We also modelled the LUMC
disease severity score, which was developed by clinical researchers in the LUMC to facilitate the
study of correlations for prediction. Each model shares the same fundamental structure, reusing
the EJP RD core semantic model. We represented the ontology using the OWL Semantic Web
standard and used Protégé to build it. The ontology is online<sup>1</sup> and publicly available for
reuse on GitHub<sup>2</sup>.
        </p>
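        <p>The idea of one shared structure across modules can be sketched as follows. This is a simplified reading of the measurement pattern, written under several assumptions: roles are omitted, the namespaces are hypothetical, and SIO relations are spelled by label here, whereas the real SIO terms use numeric identifiers. The models in [3] and [5] remain the authoritative description.</p>
        <preformat><![CDATA[
```python
# Hypothetical namespaces for illustration; real SIO terms use numeric identifiers.
EX = "http://example.org/lco-sketch/"
SIO = "http://example.org/sio-by-label/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def measurement_triples(module, patient, process, attribute, value):
    """Build the shared pattern: a person has an attribute, and a process
    produces an output that refers to that attribute and carries the value."""
    person = f"{EX}patient/{patient}"
    attr = f"{EX}{module}/attribute/{patient}-{attribute}"
    proc = f"{EX}{module}/process/{patient}-{process}"
    out = f"{EX}{module}/output/{patient}-{attribute}"
    return [
        (person, SIO + "has-attribute", attr),
        (attr, RDF_TYPE, EX + attribute),
        (proc, RDF_TYPE, EX + process),
        (proc, SIO + "has-output", out),
        (out, SIO + "refers-to", attr),
        (out, SIO + "has-value", value),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples; non-string objects become xsd:double literals."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if isinstance(o, str) else f'"{o}"^^<{XSD_DOUBLE}>'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


lab = measurement_triples("lab", "SYN-0001", "il6-assay", "il6-concentration", 42.5)
score = measurement_triples("score", "SYN-0001", "severity-scoring", "severity-score", 3.0)

# Both modules yield the same sequence of predicates, i.e. the same structure.
print([p for _, p, _ in lab] == [p for _, p, _ in score])  # prints True
print(to_ntriples(lab))
```
]]></preformat>
        <p>Only the types and values differ between the lab measurement and the score calculation, so a query written against the pattern can, in principle, be reused across modules.</p>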
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Evaluation</title>
        <p>
          We used the LUMC disease severity score phenotypes to answer medical doctors’ questions
such as ‘what are the clinical parameters that can predict the disease course of a patient?’. We
created a set of CQs to evaluate the ontology. Our evaluation results demonstrate that the
ontology enables answering sophisticated questions. CQs and results are accessible at [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>
        Healthcare suffers from a data silo problem. In a typical healthcare-delivery organization such
as a hospital, several different information systems are used to manage different types of data:
EHR (electronic health record), RIS (radiology information system), LIS (laboratory
information system) and HIE (health information exchange), to mention just a few. This creates
heterogeneity in clinical data structures (syntactic and semantic) and in the tools
and software systems used. We created an OWL ontology to facilitate meaningful integration of
different patient data within a hospital as well as to improve interoperability of clinical data for
queries across external Linked Open Data and clinical datasets from other hospitals on
the Web. Beyond the scope of this paper is our work on applying ontologies (DCAT2 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and
extensions thereof) to also make the metadata of clinical data containers machine readable,
thereby addressing the F, A, and R principles of FAIR.
      </p>
      <p>
        The ontology is designed to integrate four semantic types: clinical observations, score
calculations, lab measurements and patient biosamples. It is represented in OWL to enable
interoperability of clinical data, data sharing and reasoning. We used knowledge-engineering
good practices such as reuse of ontological terms, design patterns and modularization guidelines
to increase interoperability with biomedical Linked Data. The ontology is publicly accessible
at [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and FAIRsharing<sup>3</sup>.
      </p>
      <p>
        The main challenge was to access clinical data for modelling the ontology. To preserve
the privacy of patients in the hospital, GDPR enforces compliance with a rigorous regulatory
framework. However, the intention of the GDPR is not to close off data, but to regulate access
and make access rules transparent. Therefore, we recommend setting up a FAIR data governance
policy for clinical data as soon as possible. FAIR facilitates a policy for making data as open
as possible and as closed as necessary [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. A machine-readable representation of access rules
can be included with the metadata that describes the data resource, for example via a DCAT2-based
FAIR Data Point<sup>4</sup>. With this ontology we demonstrated how to increase interoperability of
clinical data, i.e. the “I” in the FAIR principles, independent of the imposed access rules. Other
ontologies that are applied in similar contexts already exist in the clinical domain, such as
LOINC<sup>5</sup>, SNOMED CT<sup>6</sup> and ICD<sup>7</sup>. They are used to represent observations, medical terms
and diagnoses in clinical information models and constitute robust artifacts for data acquisition
and retrieval of data associated with these terms. The problem is that they are ontology-like
terminologies, i.e. they are term-centered and represent linguistic entities rather than semantic types.
LCO is based on logically defined representations and reuses the same
biomedical design pattern to represent heterogeneous health data, a pattern that can be applied in diverse
contexts from clinical measurements in hospitals to elements in patient registries. With LCO, we aim to
support semantic interoperability, algorithmic reasoning and computable concepts for analysis
in multicentric clinical and biomedical research projects. Next, we envision creating FAIR data
‘in terms of’ the ontology and building knowledge-based applications in the hospital for analysis
and hypothesis generation.
      </p>
      <fn-group>
        <fn id="fn1">
          <label>1</label>
          <p>https://lumc-biosemantics.github.io/beat-covid/docs/LUMC-Clinical-Ontology.html</p>
        </fn>
        <fn id="fn2">
          <label>2</label>
          <p>https://github.com/LUMC-BioSemantics/beat-covid/tree/master/fair-data-model/lumc-clinical-ontology</p>
        </fn>
        <fn id="fn3">
          <label>3</label>
          <p>https://fairsharing.org/bsg-s001616/</p>
        </fn>
      </fn-group>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>N. Queralt-Rosinach, R. Kaliyaperumal, C. Bernabé, Q. Long, and M. Roos are supported by
funding from the European Union’s Horizon 2020 research and innovation program under the
EJP RD COFUND-EJP N° 825575. We would also like to thank the EJP RD, the GO FAIR
VODAN, and the ZonMW Health Holland under the Trusted World of Corona, for supporting
the research on FAIR data that was reused here. We would like to acknowledge that work in
the BEAT-COVID project was partly funded by the Wake Up To Corona crowdfunding initiated
by the Leiden University Fund (LUF).</p>
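      <fn-group>
        <fn id="fn4">
          <label>4</label>
          <p>https://github.com/FAIRDataTeam/FAIRDataPoint-Spec</p>
        </fn>
        <fn id="fn5">
          <label>5</label>
          <p>https://loinc.org/</p>
        </fn>
        <fn id="fn6">
          <label>6</label>
          <p>https://www.snomed.org/snomed-ct/five-step-briefing</p>
        </fn>
        <fn id="fn7">
          <label>7</label>
          <p>https://www.who.int/classifications/classification-of-diseases</p>
        </fn>
      </fn-group>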
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          , G. Appleton,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B. da Silva</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          , et al.,
          <article-title>The FAIR guiding principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Musen</surname>
          </string-name>
          , et al,
          <article-title>The protégé project: A look back and a look forward</article-title>
          ,
          <source>AI Matters. Association of Computing Machinery Specific Interest Group in Artificial Intelligence</source>
          <volume>1</volume>
          (
          <year>2015</year>
          ). doi:10.1145/2557001.2757003.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <article-title>EJP RD SIO core model graph</article-title>
          ,
          <year>2021</year>
          . URL: https://github.com/ejp-rd-vp/CDE-semantic-model/wiki/Core-model-SIO.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kaliyaperumal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Alarcón Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Benis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cornet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. dos Santos</given-names>
            <surname>Vieira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Bernabé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jacobsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M. A.</given-names>
            <surname>Le Cornec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Godoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Queralt-Rosinach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Schultze Kool</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Swertz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>van Damme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. J.</given-names>
            <surname>van der Velde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>van Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roos</surname>
          </string-name>
          ,
          <article-title>Semantic modelling of common data elements for rare disease registries, and a prototype workflow for their deployment over registry data</article-title>
          ,
          <source>medRxiv</source>
          (
          <year>2021</year>
          ). URL: https://www.medrxiv.org/content/early/2021/07/30/2021.07.27.21261169. doi:10.1101/2021.07.27.21261169.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <article-title>SIO DP measurements homepage</article-title>
          ,
          <year>2014</year>
          . URL: https://github.com/MaastrichtU-IDS/semanticscience/wiki/DP-Measurements.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          , et al,
          <article-title>The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery</article-title>
          ,
          <source>Journal of Biomedical Semantics</source>
          <volume>5</volume>
          (
          <year>2014</year>
          ). doi:10.1186/2041-1480-5-14.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Piñero</surname>
          </string-name>
          , et al,
          <article-title>The DisGeNET knowledge platform for disease genomics: 2019 update</article-title>
          ,
          <source>Nucleic Acids Research</source>
          <volume>48</volume>
          (
          <year>2020</year>
          )
          <fpage>D845</fpage>
          -
          <lpage>D855</lpage>
          . doi:10.1093/nar/gkz1021.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <article-title>The LCO evaluation homepage</article-title>
          ,
          <year>2021</year>
          . URL: https://github.com/LUMC-BioSemantics/beat-covid/tree/master/fair-data-model/lumc-clinical-ontology/competency-questions.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] W3C,
          <source>DCAT2 W3C Homepage</source>
          ,
          <year>2020</year>
          . URL: https://www.w3.org/TR/vocab-dcat-2/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <article-title>The LUMC Clinical Ontology (LCO) owl file</article-title>
          ,
          <year>2021</year>
          . URL: https://github.com/LUMC-BioSemantics/beat-covid/blob/master/fair-data-model/lumc-clinical-ontology/owl/lco.owl.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Landi</surname>
          </string-name>
          , et al,
          <article-title>The “A” of FAIR - as open as possible, as closed as necessary</article-title>
          ,
          <source>Data Intelligence</source>
          <volume>2</volume>
          (
          <year>2020</year>
          )
          <fpage>47</fpage>
          -
          <lpage>55</lpage>
          . doi:10.1162/dint_a_00027.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>