<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>OMOP-CDM mapping to RDF/OWL: Attempting to bridge the OHDSI ecosystem and the Semantic Web world</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Achilleas Chytas</string-name>
          <email>achytas@certh.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nick Bassiliades</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pantelis Natsiavas</string-name>
          <email>pnatsiavas@certh.gr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aristotle University of Thessaloniki</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Centre for Research and Technology Hellas</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Informatics</institution>
          ,
          <addr-line>Thessaloniki 541 24, Thessaloniki</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Utilizing Real-World Data (RWD) for secondary use is still an open issue. Initiatives such as Observational Health Data Sciences and Informatics (OHDSI) aim to tackle it by introducing a common data model, the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), into which data providers can opt to convert their data. Although OMOP-CDM supports data interoperability and maintains a degree of interlinked terminologies/vocabularies, it does not exploit the benefits of the Semantic Web technical paradigm. This paper presents an effort to convert OMOP-CDM to RDF format to further enhance its linked data capabilities.</p>
      </abstract>
      <kwd-group>
        <kwd>OMOP-CDM</kwd>
        <kwd>ETL</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>Real-World Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>
        R2RML, a W3C Recommendation, is a language for expressing customized mappings from relational
databases to RDF datasets [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. R2RML mappings are themselves RDF graphs, written in Turtle syntax, and can be used to map the
relational OMOP-CDM data tables to the relevant RDF/OWL concepts.
      </p>
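      <p>As a rough illustration of how such mappings could be generated, the Python sketch below turns the description of one OMOP-CDM table into two Turtle documents: OWL declarations (a class for the table and one property per column) and an R2RML TriplesMap that creates one individual per row from the primary key. It is not the authors' mapping. The function name table_to_turtle, the example.org base IRI, the omop: prefix and the per-column tags ("object" for foreign keys, "data" for typed literals, "annotation" for the rest) are hypothetical choices made for this example; only standard R2RML terms (rr:logicalTable, rr:subjectMap, rr:template, rr:column, rr:datatype, rr:termType) are used.</p>
      <preformat preformat-type="code"><![CDATA[
```python
"""Hypothetical sketch: derive OWL declarations and an R2RML TriplesMap
(both as Turtle text) for one OMOP-CDM table. The example.org IRIs and
the omop: prefix are placeholders, not a published OMOP ontology."""

PREFIXES = (
    "@prefix rr: <http://www.w3.org/ns/r2rml#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix omop: <http://example.org/omop#> .\n\n"
)
BASE = "http://example.org/omop/"  # hypothetical IRI base for individuals
OWL_KIND = {
    "object": "owl:ObjectProperty",
    "data": "owl:DatatypeProperty",
    "annotation": "owl:AnnotationProperty",
}


def table_to_turtle(table, pk, cls, columns):
    """Return (ontology, mapping) Turtle strings for `table`.

    `columns` maps a column name to (kind, arg): kind "object" means a
    foreign key and arg is the referenced table; "data" means a typed
    literal and arg is an XSD datatype; "annotation" ignores arg.
    """
    onto = [f"omop:{cls} a owl:Class ."]
    rml = [
        f'<#{cls}Map> rr:logicalTable [ rr:tableName "{table}" ] ;',
        # One individual per row, identified by the primary key column.
        f'    rr:subjectMap [ rr:template "{BASE}{table}/{{{pk}}}" ;'
        f" rr:class omop:{cls} ]",
    ]
    for col, (kind, arg) in columns.items():
        onto.append(f"omop:{col} a {OWL_KIND[kind]} .")
        if kind == "object":
            # Foreign key: point at the IRI minted for the referenced row.
            obj = f'rr:template "{BASE}{arg}/{{{col}}}" ; rr:termType rr:IRI'
        elif kind == "data":
            obj = f'rr:column "{col}" ; rr:datatype {arg}'
        else:
            # Annotation: keep the raw value as a plain literal.
            obj = f'rr:column "{col}"'
        rml[-1] += " ;"
        rml.append(f"    rr:predicateObjectMap [ rr:predicate omop:{col} ;"
                   f" rr:objectMap [ {obj} ] ]")
    rml[-1] += " ."
    return (PREFIXES + "\n".join(onto) + "\n",
            PREFIXES + "\n".join(rml) + "\n")


if __name__ == "__main__":
    ontology, mapping = table_to_turtle(
        "condition_occurrence", "condition_occurrence_id", "ConditionOccurrence",
        {
            "person_id": ("object", "person"),
            "condition_start_date": ("data", "xsd:date"),
            "condition_source_value": ("annotation", None),
        },
    )
    print(ontology)
    print(mapping)
```
]]></preformat>
      <p>Running the script prints both documents for the condition_occurrence table. Foreign keys are written as IRI templates so that each condition occurrence links to the individual minted for the referenced person row; the generated mapping would still need to be executed by an R2RML processor and checked against the actual OMOP-CDM schema.</p>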
      <p>
        MIMIC-IV (Medical Information Mart for Intensive Care IV) is a large relational database, available
upon request, that contains anonymized health data for over 40,000 Intensive Care Unit (ICU) patients [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and is commonly used for exploring research questions and testing healthcare (HC) algorithms.
This dataset has been converted to the OMOP-CDM format [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and the converted version was used as the testbed dataset for the data-model conversion
pipeline described here.
      </p>
      <p>In general, each OMOP-CDM data table is mapped to a separate OWL class, while each table
column corresponds to one of three kinds of OWL property:</p>
      <list list-type="order">
        <list-item>
          <p>Object properties: foreign keys in the source tables are mapped to object properties, which
use a URI to link to a different individual.</p>
        </list-item>
        <list-item>
          <p>Data properties: most of the numerical, string, date, and similar fields in the source are
mapped to data properties of the respective domain class.</p>
        </list-item>
        <list-item>
          <p>Annotation properties: fields that do not fall into the previous categories; these usually
contain information such as the source vocabulary from which a term was derived (e.g., ATC or MedDRA).</p>
        </list-item>
      </list>
      <p>For validation, a set of query scripts was created to compare the source data (MIMIC-IV data in
relational OMOP-CDM format) with the target data (MIMIC-IV data in OWL/RDF format).</p>
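      <p>One simple check such validation scripts could include is a count comparison, sketched below in Python. This is an assumption-laden illustration, not the authors' scripts: it assumes the relational source is reachable through Python's sqlite3 module, that the RDF output is available as N-Triples text, and that each table maps to one class IRI listed in a hypothetical CLASS_FOR dictionary (the IRIs are placeholders). It reports every table whose row count differs from the number of distinct individuals typed with the corresponding class; a fuller check would also compare column values.</p>
      <preformat preformat-type="code"><![CDATA[
```python
"""Hypothetical sketch of a count-based validation query script.

Assumptions (not from the paper): the relational source is reachable via
sqlite3, the RDF output is available as N-Triples text, and each source
table maps to exactly one OWL class IRI listed in CLASS_FOR.
"""
import sqlite3

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
# Hypothetical table -> class mapping; IRIs are placeholders.
CLASS_FOR = {"person": "<http://example.org/omop#Person>"}


def source_counts(conn, tables):
    """Row count per source table (table names come from a trusted list)."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in tables}


def target_counts(ntriples, class_for):
    """Distinct subjects typed with each mapped class in N-Triples text."""
    table_of = {iri: t for t, iri in class_for.items()}
    subjects = {t: set() for t in class_for}
    for line in ntriples.splitlines():
        parts = line.split()
        # An rdf:type statement with an IRI object has exactly 4 tokens.
        if len(parts) == 4 and parts[1] == RDF_TYPE and parts[2] in table_of:
            subjects[table_of[parts[2]]].add(parts[0])
    return {t: len(s) for t, s in subjects.items()}


def compare(conn, ntriples, class_for):
    """Return {table: (source_rows, target_individuals)} for mismatches."""
    src = source_counts(conn, class_for)
    tgt = target_counts(ntriples, class_for)
    return {t: (src[t], tgt[t]) for t in class_for if src[t] != tgt[t]}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO person VALUES (?)", [(1,), (2,)])
    # Only one of the two source rows made it into the RDF output.
    nt = ("<http://example.org/omop/person/1> " + RDF_TYPE
          + " <http://example.org/omop#Person> .\n")
    print(compare(conn, nt, CLASS_FOR))  # {'person': (2, 1)}
```
]]></preformat>
      <p>An empty result means every mapped table produced as many typed individuals as it has rows; because only counts are compared, the check catches dropped or duplicated rows but not wrongly mapped values.</p>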
    </sec>
    <sec id="sec-3">
      <title>3. Discussion</title>
      <p>Ontologies are indispensable in HC because they promote interoperability, support clinical and
policy decision-making, and advance medical research. As the HC sector, in both applied practice and
research, continues to evolve and embrace digital transformation, adopting semantic technologies is vital
for unlocking the full potential of the collected RWD, which can lead to direct improvements in patient
outcomes and in the overall efficiency of HC systems.</p>
      <p>A seamless transformation of OMOP-CDM into a semantically enriched format means that all data
sources already converted to OMOP-CDM can, in turn, be easily converted into a format that benefits from
the capabilities of semantic knowledge modelling. One such capability, ease of integration with other,
diverse data sources such as genetic profiling, signalling pathways, and drug biochemistry, could lead to
the identification of latent relationships and patterns, elevating the use of RWD to a higher level.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] OHDSI. <source>The Book of OHDSI: Observational Health Data Sciences and Informatics</source>. OHDSI; <year>2019</year>. Available: https://books.google.gr/books?id=JxpnzQEACAAJ
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Boyce</surname>
            <given-names>RD</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voss</surname>
            <given-names>EA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huser</surname>
            <given-names>V</given-names>
          </string-name>
          , et al.
          <article-title>Large-scale adverse effects related to treatment evidence standardization (LAERTES): an open scalable system for linking pharmacovigilance evidence sources with clinical data</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          .
          <year>2017</year>
          ;
          <volume>8</volume>
          (
          <issue>1</issue>
          ):
          <fpage>11</fpage>
          . doi: <pub-id pub-id-type="doi">10.1186/s13326-017-0115-3</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><surname>Banda</surname> <given-names>JM</given-names></string-name>.
          <article-title>Fully connecting the Observational Health Data Science and Informatics (OHDSI) initiative with the world of linked open data</article-title>.
          <source>Genomics Inform</source>. <year>2019</year> Jun;<volume>17</volume>(<issue>2</issue>):<fpage>e13</fpage>. doi: <pub-id pub-id-type="doi">10.5808/GI.2019.17.2.e13</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><surname>Das</surname> <given-names>S</given-names></string-name>, <string-name><surname>Sundara</surname> <given-names>S</given-names></string-name>, <string-name><surname>Cyganiak</surname> <given-names>R</given-names></string-name>.
          <article-title>R2RML: RDB to RDF Mapping Language</article-title>. <source>W3C Recommendation</source>. World Wide Web Consortium; <year>2012</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><surname>Goldberger</surname> <given-names>A</given-names></string-name>, <string-name><surname>Amaral</surname> <given-names>L</given-names></string-name>, <string-name><surname>Glass</surname> <given-names>L</given-names></string-name>, <string-name><surname>Hausdorff</surname> <given-names>J</given-names></string-name>, <string-name><surname>Ivanov</surname> <given-names>PC</given-names></string-name>, <string-name><surname>Mark</surname> <given-names>R</given-names></string-name>, et al.
          <article-title>PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals</article-title>.
          <source>Circulation</source>. <year>2000</year>;<volume>101</volume>(<issue>23</issue>):<fpage>e215</fpage>-<lpage>e220</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><surname>Kallfelz</surname> <given-names>M</given-names></string-name>, <string-name><surname>Tsvetkova</surname> <given-names>A</given-names></string-name>, <string-name><surname>Pollard</surname> <given-names>T</given-names></string-name>, <string-name><surname>Kwong</surname> <given-names>M</given-names></string-name>, <string-name><surname>Lipori</surname> <given-names>G</given-names></string-name>, <string-name><surname>Huser</surname> <given-names>V</given-names></string-name>, <string-name><surname>Osborn</surname> <given-names>J</given-names></string-name>, <string-name><surname>Hao</surname> <given-names>S</given-names></string-name>, <string-name><surname>Williams</surname> <given-names>A</given-names></string-name>.
          <article-title>MIMIC-IV demo data in the OMOP Common Data Model (version 0.9)</article-title>.
          <source>PhysioNet</source>; <year>2021</year>. doi: <pub-id pub-id-type="doi">10.13026/p1f5-7x35</pub-id>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>