<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">OMOP-CDM mapping to RDF/OWL: Attempting to bridge the OHDSI ecosystem and the Semantic Web world</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Achilleas</forename><surname>Chytas</surname></persName>
							<email>achytas@certh.gr</email>
							<affiliation key="aff0">
								<orgName type="department">Centre for Research and Technology Hellas, Institute of Applied Biosciences</orgName>
								<address>
									<addrLine>6th km Charilaou-Thermi 570 01</addrLine>
									<settlement>Thessaloniki</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">School of Informatics</orgName>
								<orgName type="institution">Aristotle University of Thessaloniki</orgName>
								<address>
									<addrLine>Thessaloniki 541 24</addrLine>
									<settlement>Thessaloniki</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nick</forename><surname>Bassiliades</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">School of Informatics</orgName>
								<orgName type="institution">Aristotle University of Thessaloniki</orgName>
								<address>
									<addrLine>Thessaloniki 541 24</addrLine>
									<settlement>Thessaloniki</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pantelis</forename><surname>Natsiavas</surname></persName>
							<email>pnatsiavas@certh.gr</email>
							<affiliation key="aff0">
								<orgName type="department">Centre for Research and Technology Hellas, Institute of Applied Biosciences</orgName>
								<address>
									<addrLine>6th km Charilaou-Thermi 570 01</addrLine>
									<settlement>Thessaloniki</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">International SWAT4HCLS Conference</orgName>
								<address>
									<addrLine>February 26-29</addrLine>
									<postCode>2024</postCode>
									<settlement>Leiden</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">OMOP-CDM mapping to RDF/OWL: Attempting to bridge the OHDSI ecosystem and the Semantic Web world</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">590B9CFE0F2710A48A36CC136739D3F1</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:40+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>OMOP-CDM</term>
					<term>ETL</term>
					<term>Semantic Web</term>
					<term>Real-World Data</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Utilizing Real-World Data (RWD) for secondary use is still an open issue. Initiatives like OHDSI aim to tackle it by introducing a common data model (OMOP-CDM) to which data providers can opt to convert their data. While OMOP-CDM supports data interoperability and maintains a set of interlinked terminologies/vocabularies, it does not utilize the benefits of the Semantic Web technical paradigm. This paper presents an effort to convert OMOP-CDM to RDF format to further enhance its linked data capabilities.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The OMOP common data model (OMOP-CDM) was introduced and is maintained by OHDSI to support federated observational studies <ref type="bibr" target="#b0">[1]</ref>. It is used as a common reference to harmonize data from heterogeneous real-world data (RWD) sources in healthcare, including electronic health records (EHRs) and administrative/insurance claims. A CDM facilitates large-scale analyses over distributed data without the need to share the data itself, which matters because healthcare (HC) data sharing is a legally, ethically, and technically complex process. OMOP-CDM covers patient data (e.g., demographics, diagnoses, laboratory results, and vital signs) as well as interlinked vocabularies/terminologies, such as SNOMED-CT, WHO-ATC, and RxNorm, to ensure consistency and interoperability across different data sources.</p><p>Numerous international initiatives support the OHDSI distributed data network built upon OMOP-CDM; for example, EHDEN has been funding the conversion of 187 data sources across Europe to OMOP-CDM. Notably, OMOP-CDM is the main reference data model for the European Medicines Agency DARWIN infrastructure and has been used for many observational studies, including cohort studies and comparative effectiveness studies, across large datasets containing potentially millions of records. Technically, OMOP-CDM is developed as a plain relational database model. It relies heavily on multiple hierarchical, interconnected vocabularies and aims to support data interoperability, but it does not exploit the Semantic Web paradigm. The Semantic Web stack could provide a common language and a standardized representation to support federated analysis of HC data, and ontologies and RDF-based Knowledge Graphs (KGs) have been used to support HC data interoperability; nevertheless, the OMOP-CDM data model remains distant from the Semantic Web paradigm.</p><p>There have been attempts to use RDF-based knowledge structures to support activities related to the OHDSI ecosystem, e.g., LAERTES <ref type="bibr" target="#b1">[2]</ref>, a knowledge base using RDF, and an effort to map the OMOP-CDM vocabularies to RDF <ref type="bibr" target="#b2">[3]</ref>. However, to the best of the authors' knowledge, there is no actively maintained full mapping of OMOP-CDM to RDF. This work presents an attempt to map OMOP-CDM to the RDF/OWL realm to bridge the gap between the world of OMOP-CDM and the Semantic Web ecosystem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methodology</head><p>R2RML is a language for expressing customized mappings from relational databases to RDF datasets <ref type="bibr" target="#b3">[4]</ref>. R2RML mappings are themselves RDF graphs written in Turtle syntax, and they can be used to map the relational OMOP-CDM data tables to the relevant RDF/OWL concepts.</p><p>MIMIC-IV (Medical Information Mart for Intensive Care IV) is a large relational database, available upon request, that contains anonymized health data for over 40,000 Intensive Care Unit (ICU) patients <ref type="bibr" target="#b4">[5]</ref> and is commonly used for exploring research questions and testing HC algorithms. This dataset has been converted to OMOP-CDM format <ref type="bibr" target="#b5">[6]</ref> and was used as the testbed dataset for the described data model conversion pipeline.</p><p>In general, each OMOP-CDM data table is mapped to a separate OWL class, while each table column is mapped to an OWL property of one of three kinds:</p><p>1. Object properties: foreign keys from the initial source are mapped as object properties, using a URI to link to a different individual.</p><p>2. Data properties: the majority of the numerical, string, date, and similar fields from the initial source are mapped as data properties of the respective domain.</p><p>3. Annotation properties: fields that do not fall into the previous categories and usually contain information such as the source vocabulary from which a term is derived (e.g., ATC or MedDRA).</p><p>Regarding validation, a set of querying scripts was created to compare the source data (MIMIC-IV data in relational OMOP-CDM format) with the target data (MIMIC-IV data in OWL/RDF format).</p></div>
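<div xmlns="http://www.tei-c.org/ns/1.0"><p>The table-to-class and column-to-property scheme described above, together with a count-based validation check, can be illustrated with a minimal sketch. It is not the R2RML mapping or the validation scripts used in this work: it uses Python's standard sqlite3 module as a stand-in for an R2RML processor, a two-table toy subset of OMOP-CDM with made-up rows, and a hypothetical base IRI (http://example.org/omop/). The names MAPPING, table_to_triples and validate_counts are illustrative assumptions, and annotation properties are left out for brevity.</p><p>
```python
import sqlite3

# Hypothetical base IRI; the namespace used by the actual mapping is not stated.
BASE = "http://example.org/omop/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed toy subset of two OMOP-CDM tables with made-up rows.
SCHEMA = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    condition_start_date TEXT
);
INSERT INTO person VALUES (1, 1950), (2, 1987);
INSERT INTO condition_occurrence VALUES (10, 1, '2020-03-01'), (11, 2, '2021-07-15');
"""

# Per table: primary key, foreign keys (column -> target table), literal columns.
MAPPING = {
    "person": {"pk": "person_id", "fks": {}, "literals": ["year_of_birth"]},
    "condition_occurrence": {
        "pk": "condition_occurrence_id",
        "fks": {"person_id": "person"},
        "literals": ["condition_start_date"],
    },
}


def table_to_triples(conn, table):
    """Turn each row into (subject, predicate, object, kind) tuples."""
    spec = MAPPING[table]
    columns = [spec["pk"], *spec["fks"], *spec["literals"]]
    triples = []
    for row in conn.execute(f"SELECT {', '.join(columns)} FROM {table}"):
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[spec['pk']]}"
        # The table becomes the class of the individual.
        triples.append((subject, RDF_TYPE, f"{BASE}{table}", "iri"))
        # Foreign keys become object properties pointing at another individual.
        for column, target in spec["fks"].items():
            if record[column] is not None:
                obj = f"{BASE}{target}/{record[column]}"
                triples.append((subject, f"{BASE}{table}#{column}", obj, "iri"))
        # Remaining mapped columns become data properties holding literal values.
        for column in spec["literals"]:
            if record[column] is not None:
                triples.append((subject, f"{BASE}{table}#{column}", record[column], "literal"))
    return triples


def validate_counts(conn, triples):
    """Compare source row counts with the number of typed individuals per class."""
    report = {}
    for table in MAPPING:
        source = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        target = sum(1 for s, p, o, kind in triples
                     if p == RDF_TYPE and o == f"{BASE}{table}")
        report[table] = (source, target)
    return report


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
triples = [t for table in MAPPING for t in table_to_triples(conn, table)]
report = validate_counts(conn, triples)
print(report)
```
</p><p>In this sketch, a table passes the check when the two numbers in its report entry are equal. A fuller validation would presumably also compare property values, not only the number of individuals.</p></div>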
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Discussion</head><p>Semantic-based ontologies are indispensable in HC for their role in promoting interoperability, supporting clinical and policy decision-making, and advancing medical research. As the HC industry, both applied and research, continues to evolve and embrace digital transformation, the adoption of semantic technologies is vital for unlocking the full potential of the collected RWD, which can lead to direct improvements in patient outcomes and enhance the overall efficiency of HC systems.</p><p>A seamless transformation of OMOP-CDM to a semantically enriched format means that all OMOP-CDM data sources can be easily converted to a format that benefits from the capabilities of semantic knowledge modelling, such as ease of integration with other diverse data sources (e.g., genetic profiling, signalling pathways, and drug biochemistry). Such integration could lead to the identification of latent relationships and patterns, elevating the usage of RWD to a higher level.</p></div>		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><surname>OHDSI</surname></persName>
		</author>
		<ptr target="https://books.google.gr/books?id=JxpnzQEACAAJ" />
		<title level="m">The Book of OHDSI: Observational Health Data Sciences and Informatics</title>
				<imprint>
			<publisher>OHDSI</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Large-scale adverse effects related to treatment evidence standardization (LAERTES): an open scalable system for linking pharmacovigilance evidence sources with clinical data</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">D</forename><surname>Boyce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Voss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Huser</surname></persName>
		</author>
		<idno type="DOI">10.1186/s13326-017-0115-3</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Biomedical Semantics</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">11</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Fully connecting the Observational Health Data Science and Informatics (OHDSI) initiative with the world of linked open data</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Banda</surname></persName>
		</author>
		<idno type="DOI">10.5808/GI.2019.17.2.e13</idno>
	</analytic>
	<monogr>
		<title level="j">Genomics Inform</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">e13</biblScope>
			<date type="published" when="2019-06">Jun. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">R2RML: RDB to RDF Mapping Language. W3C Recommendation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sundara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cyganiak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">World Wide Web Consortium</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">9</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals</title>
		<author>
			<persName><forename type="first">A</forename><surname>Goldberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Amaral</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Glass</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hausdorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">C</forename><surname>Ivanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">E</forename><surname>Stanley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Circulation</title>
		<imprint>
			<biblScope unit="volume">101</biblScope>
			<biblScope unit="issue">23</biblScope>
			<biblScope unit="page" from="e215" to="e220" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Kallfelz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tsvetkova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Pollard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kwong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lipori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Huser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Osborn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Williams</surname></persName>
		</author>
		<idno type="DOI">10.13026/p1f5-7x35</idno>
		<ptr target="https://doi.org/10.13026/p1f5-7x35" />
		<title level="m">MIMIC-IV demo data in the OMOP Common Data Model</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note>version 0. PhysioNet</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
