=Paper= {{Paper |id=Vol-3890/paper-32 |storemode=property |title=OMOP-CDM mapping to RDF/OWL: Attempting to bridge the OHDSI ecosystem and the Semantic Web world |pdfUrl=https://ceur-ws.org/Vol-3890/paper-32.pdf |volume=Vol-3890 }} ==OMOP-CDM mapping to RDF/OWL: Attempting to bridge the OHDSI ecosystem and the Semantic Web world== https://ceur-ws.org/Vol-3890/paper-32.pdf
                         OMOP-CDM mapping to RDF/OWL: Attempting to bridge the
                         OHDSI ecosystem and the Semantic Web world
                         Achilleas Chytas 1,2, Nick Bassiliades 2 and Pantelis Natsiavas 1
                         1 Centre for Research and Technology Hellas | Institute of Applied Biosciences, 6th km Charilaou-Thermi 570 01,
                           Thessaloniki, Greece
                         2 Aristotle University of Thessaloniki | School of Informatics, Thessaloniki 541 24, Thessaloniki, Greece

                                         Abstract
                                         Utilizing Real-World Data (RWD) for secondary use is still an open issue. Initiatives like
                                         OHDSI aim to tackle it by introducing a common data model (OMOP-CDM) to which data
                                         providers can opt to convert their data. While OMOP-CDM supports data interoperability and
                                         maintains a set of interlinked terminologies/vocabularies, it does not utilize the benefits of
                                         the Semantic Web technical paradigm. This paper presents an effort to convert OMOP-CDM
                                         to RDF format to further enhance its linked data capabilities.

                                         Keywords
                                         OMOP-CDM, ETL, Semantic Web, Real-World Data

                         1. Introduction
                             OMOP-CDM (common data model) was introduced and is maintained by OHDSI to support
                         federated observational studies [1]. It is used as a common reference to harmonize data from
                         heterogeneous real-world healthcare data (RWD) sources, including electronic health records (EHRs),
                         administrative/insurance claims, etc. A CDM can facilitate large-scale analyses and the use of
                         distributed data without the need to share data, as healthcare (HC) data sharing is a legally, ethically,
                         and technically complex process. OMOP-CDM comprises patient data (e.g., demographics, diagnoses,
                         laboratory results, vital signs) as well as interlinked vocabularies/terminologies, such as SNOMED-CT,
                         WHO-ATC, and RxNorm, to ensure consistency and interoperability across different data sources.
                             Numerous international initiatives support the OHDSI distributed data network built upon
                         OMOP-CDM; for instance, EHDEN has been funding the conversion to OMOP-CDM of 187 data sources
                         across Europe. Notably, OMOP-CDM is the main reference data model for the European Medicines Agency
                         DARWIN infrastructure and has been used for many observational studies, including cohort studies,
                         comparative effectiveness studies, etc., across large datasets containing potentially millions of records.
                             Technically, OMOP-CDM is developed as a plain relational database model. It relies heavily on
                         multiple hierarchical, interconnected vocabularies and aims to support data interoperability, but it does
                         not exploit the Semantic Web paradigm at all. The Semantic Web stack could provide a common language
                         and standardized representation to support federated analysis of HC data, and ontologies and RDF-based
                         Knowledge Graphs (KGs) have been used to support HC data interoperability; nevertheless, the
                         OMOP-CDM data model remains distant from the Semantic Web paradigm.
                             There have been attempts to use RDF-based knowledge structures to support activities related to the
                         OHDSI ecosystem, e.g., LAERTES [2], a knowledge base using RDF, or an effort to map the OMOP-CDM
                         vocabularies to RDF [3]. However, to the best of the authors’ knowledge, there is no actively
                         maintained full mapping of OMOP-CDM to RDF. This work presents an attempt to map OMOP-CDM
                         to the RDF/OWL realm to bridge the gap between the world of OMOP-CDM and the Semantic Web
                         ecosystem.


                         15th International SWAT4HCLS Conference, February 26-29, 2024, Leiden, The Netherlands
                         EMAIL: achytas@certh.gr (A. 1); nbassili@csd.auth.gr (A. 2); pnatsiavas@certh.gr (A. 3)
                         ORCID: 0000-0001-8486-011X (A. 1); 0000-0001-6035-1038 (A. 2); 0000-0002-4061-9815 (A. 3)
                                      ©️ 2024 Copyright for this paper by its authors.
                                      Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                      CEUR Workshop Proceedings (CEUR-WS.org)


CEUR Workshop Proceedings, ceur-ws.org, ISSN 1613-0073
2. Methodology
    R2RML is a language for expressing customized mappings from relational databases to RDF
datasets [4]. R2RML mappings are themselves RDF graphs written in Turtle syntax and can be used to
map the relational OMOP-CDM data tables to the relevant RDF/OWL concepts.
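
    To make the shape of such a mapping concrete, the following is a minimal sketch, not the mapping
used in this work. It generates one R2RML TriplesMap in Turtle for a single OMOP-CDM table. The rr: terms
are those defined by the R2RML recommendation; the helper name triples_map, the example.org namespace,
the omop: prefix and the class naming scheme are assumptions made purely for illustration.

```python
# Hypothetical namespace; the paper does not state the IRIs it uses.
BASE = "http://example.org/omop/"
PREFIXES = (
    "@prefix rr: <http://www.w3.org/ns/r2rml#> .\n"
    f"@prefix omop: <{BASE}> .\n"
)


def triples_map(table, key, data_columns, foreign_keys):
    """Return an R2RML TriplesMap (Turtle text) for one relational table.

    foreign_keys maps a foreign-key column to the table it references.
    """
    # Assumed naming scheme: visit_occurrence -> VisitOccurrence
    cls = table.title().replace("_", "")
    parts = [
        f"<#{cls}Map> a rr:TriplesMap",
        f'  rr:logicalTable [ rr:tableName "{table}" ]',
        # One individual per row, typed with the class derived from the table.
        f'  rr:subjectMap [ rr:template "{BASE}{table}/{{{key}}}" ;'
        f" rr:class omop:{cls} ]",
    ]
    for col in data_columns:
        # rr:column yields a literal taken from the column value.
        parts.append(
            f"  rr:predicateObjectMap [ rr:predicate omop:{col} ;"
            f' rr:objectMap [ rr:column "{col}" ] ]'
        )
    for col, target in foreign_keys.items():
        # rr:template yields an IRI, linking to the referenced individual.
        parts.append(
            f"  rr:predicateObjectMap [ rr:predicate omop:{col} ;"
            f' rr:objectMap [ rr:template "{BASE}{target}/{{{col}}}" ] ]'
        )
    return " ;\n".join(parts) + " .\n"


ttl = triples_map(
    "person",
    "person_id",
    data_columns=["year_of_birth"],
    foreign_keys={"gender_concept_id": "concept"},
)
print(PREFIXES + "\n" + ttl)
```

    The printed Turtle could be handed to any R2RML processor; which processor was used in this work
is not stated, so none is assumed here.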
    MIMIC-IV (Medical Information Mart for Intensive Care IV) is a large relational database, available
upon request, that contains anonymized health data for over 40,000 Intensive Care Unit (ICU)
patients [5] and is commonly used for exploring research questions and testing HC algorithms. This
dataset has been converted to OMOP-CDM format [6] and was used as the testbed dataset for the
described data model conversion pipeline.
    In general, each OMOP-CDM data table is mapped to a separate OWL class, while each table
column is mapped to an OWL property of one of three kinds:
        1. Object Properties: foreign keys from the initial source are mapped as object properties,
            using a URI to link to a different individual
        2. Data Properties: the majority of the numerical, string, date, etc. fields from the initial source
            are mapped as data properties of the respective domain
        3. Annotation Properties: fields that did not fall into the previous categories and usually contain
            information such as the source vocabulary from which a term is derived, e.g. ATC or MedDRA
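
    The three property kinds above can be illustrated with a small sketch over a toy table. This is
not the authors' pipeline: the example.org namespace, the PERSON configuration, the assignment of
gender_source_value to the annotation category and the single inserted row are all assumptions chosen
for illustration, and literals are kept as plain Python values rather than typed RDF literals.

```python
import sqlite3

BASE = "http://example.org/omop/"  # hypothetical namespace, not from the paper
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical per-table configuration: the role assigned to each column.
PERSON = {
    "table": "person",
    "key": "person_id",
    "object": {"gender_concept_id": "concept"},  # foreign key -> referenced table
    "data": ["year_of_birth"],
    "annotation": ["gender_source_value"],
}


def class_iri(table):
    # Assumed naming scheme: visit_occurrence -> .../VisitOccurrence
    return BASE + table.title().replace("_", "")


def declarations(cfg):
    """Ontology-level triples: one class per table, one typed property per column."""
    out = [(class_iri(cfg["table"]), RDF_TYPE, OWL + "Class")]
    out += [(BASE + c, RDF_TYPE, OWL + "ObjectProperty") for c in cfg["object"]]
    out += [(BASE + c, RDF_TYPE, OWL + "DatatypeProperty") for c in cfg["data"]]
    out += [(BASE + c, RDF_TYPE, OWL + "AnnotationProperty") for c in cfg["annotation"]]
    return out


def row_to_triples(cfg, row):
    """Instance-level triples for one row; NULL columns produce no triple."""
    subject = f"{BASE}{cfg['table']}/{row[cfg['key']]}"
    triples = [(subject, RDF_TYPE, class_iri(cfg["table"]))]
    for col, target in cfg["object"].items():
        if row[col] is not None:
            # Foreign key: the object is the IRI of the referenced individual.
            triples.append((subject, BASE + col, f"{BASE}{target}/{row[col]}"))
    for col in cfg["data"] + cfg["annotation"]:
        if row[col] is not None:
            # Plain value: the object is a literal.
            triples.append((subject, BASE + col, row[col]))
    return triples


# Toy source table with one made-up row.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, gender_concept_id INTEGER,"
    " year_of_birth INTEGER, gender_source_value TEXT)"
)
conn.execute("INSERT INTO person VALUES (1, 8507, 1970, 'M')")

triples = declarations(PERSON)
for row in conn.execute("SELECT * FROM person"):
    triples += row_to_triples(PERSON, row)

for t in triples:
    print(t)
```

    Note that the distinction between data and annotation properties lives in the property
declarations; at the instance level both kinds appear as a literal-valued triple.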
    Regarding validation, a set of querying scripts was created to compare the source data (MIMIC-IV
data in relational OMOP-CDM format) with the target data (MIMIC-IV data in OWL/RDF format).
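
    The validation scripts themselves are not described in detail, so the following is only a sketch of
one plausible source-versus-target check, under stated assumptions. It compares the row count with the
number of typed individuals and verifies that every non-NULL literal column value has a matching triple.
The function name validate, the example.org namespace, the toy table and the hand-written target triples
are hypothetical; a real check would query the RDF store (e.g. via SPARQL) instead of a Python list.

```python
import sqlite3

BASE = "http://example.org/omop/"  # hypothetical namespace, not from the paper
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def validate(conn, table, key, columns, triples):
    """Return a list of discrepancies between a source table and target triples.

    Only literal-valued columns are checked in this sketch; foreign keys would
    need their value rewritten into the referenced IRI before comparison.
    """
    problems = []
    graph = set(triples)
    cls = BASE + table.title().replace("_", "")
    rows = conn.execute(
        f"SELECT {key}, {', '.join(columns)} FROM {table}"
    ).fetchall()

    # Check 1: one typed individual per source row.
    typed = sum(1 for s, p, o in graph if p == RDF_TYPE and o == cls)
    if typed != len(rows):
        problems.append(f"{table}: {len(rows)} rows but {typed} individuals")

    # Check 2: every non-NULL cell is present as a triple.
    for row in rows:
        subject = f"{BASE}{table}/{row[0]}"
        for col, value in zip(columns, row[1:]):
            if value is not None and (subject, BASE + col, value) not in graph:
                problems.append(f"{subject}: missing {col}={value!r}")
    return problems


# Toy source data (made up for the example).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, 1970), (2, 1985)])

# Toy target data; year_of_birth for person 2 is deliberately left out.
target = [
    (BASE + "person/1", RDF_TYPE, BASE + "Person"),
    (BASE + "person/1", BASE + "year_of_birth", 1970),
    (BASE + "person/2", RDF_TYPE, BASE + "Person"),
]

issues = validate(conn, "person", "person_id", ["year_of_birth"], target)
print(issues)
```

    An empty list means that, for the checked columns, nothing present in the source is missing
from the target; it does not detect extra triples in the target.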

3. Discussion
   Semantic-based ontologies are indispensable in HC for their role in promoting interoperability,
supporting clinical and policy decision-making, and advancing medical research. As the HC industry,
both applied and research, continues to evolve and embrace digital transformation, the adoption of
semantic technologies is vital for unlocking the full potential of the collected RWD, which can lead to
direct improvements in patient outcomes and enhance the overall efficiency of HC systems.
   A seamless transformation of OMOP-CDM to a semantically enriched format means that all data
sources available in OMOP-CDM can be easily converted to a format that benefits from the capabilities
of semantic knowledge modelling, such as ease of integration with other diverse data sources
(e.g., genetic profiling, signalling pathways, drug biochemistry). Such integration could lead to the
identification of latent relationships and patterns, elevating the usage of RWD to a higher level.

4. References
[1] OHDSI, The Book of OHDSI: Observational Health Data Sciences and Informatics. OHDSI, 2019.
    [Online]. Available: https://books.google.gr/books?id=JxpnzQEACAAJ
[2] Boyce RD, Voss EA, Huser V, et al. Large-scale adverse effects related to treatment evidence
    standardization (LAERTES): an open scalable system for linking pharmacovigilance evidence
    sources with clinical data. Journal of Biomedical Semantics. 2017;8(1):11.
    doi:10.1186/s13326-017-0115-3
[3] J. M. Banda, “Fully connecting the Observational Health Data Science and Informatics (OHDSI)
    initiative with the world of linked open data,” Genomics Inform, vol. 17, no. 2, p. e13, Jun. 2019,
    doi: 10.5808/GI.2019.17.2.e13.
[4] Das, S., Sundara, S., & Cyganiak, R. (2012). R2RML: RDB to RDF Mapping Language. W3C
    Recommendation. World Wide Web Consortium, 9.
[5] Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E.
    (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for
    complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
[6] Kallfelz, M., Tsvetkova, A., Pollard, T., Kwong, M., Lipori, G., Huser, V., Osborn, J., Hao, S., &
    Williams, A. (2021). MIMIC-IV demo data in the OMOP Common Data Model (version 0.9).
    PhysioNet. https://doi.org/10.13026/p1f5-7x35