=Paper=
{{Paper
|id=Vol-3805/ICBO-2022_paper_1549
|storemode=property
|title=OMOP-2-OPMI: Ontologization of OMOP CDM Using OPMI to Support Clinical Data interoperability and analysis
|pdfUrl=https://ceur-ws.org/Vol-3805/ICBO-2022_paper_1549.pdf
|volume=Vol-3805
|authors=Long Tran,Yongqun He
|dblpUrl=https://dblp.org/rec/conf/icbo/TranH22
}}
==OMOP-2-OPMI: Ontologization of OMOP CDM Using OPMI to Support Clinical Data interoperability and analysis==
OMOP-2-OPMI: Ontologization of OMOP CDM using OPMI to
support clinical data interoperability and analysis
Long Tran 1 and Yongqun He 1
1 University of Michigan, Ann Arbor, MI, USA
Abstract
The OMOP Common Data Model (CDM) has been widely used as an open community data
standard in observational data integration and analysis. However, it still has its drawbacks
including weak semantics and interoperability with other CDMs. In this study, we report our
ontologization of the OMOP CDM elements and the semantic relations among the elements
using the Ontology of Precision Medicine and Investigation (OPMI). A total of 165 terms from
15 OMOP CDM tables have been mapped to OPMI, with 46 terms newly generated with OPMI
namespace and the other terms reported from OBO reference ontologies. An Omop2Opmi.owl
file was also generated by extracting the OMOP CDM related terms and relations from OPMI.
Three categories of use cases are reported: ontology-level OMOP CDM element
standardization and data integration, adverse event (AE) modeling, and COVID-19 clinical data
studies. Following the Ontology of Adverse Events (OAE) definition, we developed a
generalizable OMOP-AE model that transforms the OMOP data to systematically define,
identify, and analyze specific adverse events following some medical interventions that include
Drug/Device Exposure and Procedure Occurrence in OMOP. Overall, OMOP-2-OPMI
complements and empowers OMOP CDM for enhanced clinical data standardization, sharing,
interoperability, and analysis.
Keywords
OMOP, Common Data Model, ontology, OPMI, adverse events, COVID-19.
ICBO 2022, September 25-28, 2022, Ann Arbor, USA
EMAILs: longtr@umich.edu (A. 1); yongqunh@med.umich.edu (A. 2)
ORCID: 0000-0002-5735-7540 (A. 1); 0000-0001-9189-9661 (A. 2)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
1. Introduction
The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is an open community data standard that aims to allow for systematic analysis of disparate observational databases [1]. With the CDM, the data contained in those databases can be transformed into a common format with a common representation. OMOP CDM has been widely used to support the standardization of various electronic medical records (EMR) and administrative claims within and outside the United States. Billions of patient records have been standardized using OMOP CDM. Recently, OMOP CDM has become an established data model used by the National COVID Cohort Collaborative (N3C, https://ncats.nih.gov/n3c). As of May 2022, the N3C data enclave has stored the records of 14 million persons, including over 5 million COVID+ cases. Based on the N3C data use design, the COVID-19 clinical data warehouse data dictionary used in N3C is based on OMOP CDM, and the other data formats need to be aligned with the OMOP CDM in order to be entered and used in the N3C data enclave. Therefore, the OMOP CDM has clearly played a significant role in the data standardization and integration.
Still the OMOP CDM has its own drawbacks [2, 3]. One drawback is its weak semantics in that OMOP CDM does not provide robust semantic relations among CDM elements. Basically, the OMOP CDM provides the schema structure of a
standardized relational database that includes over 10 tables, which has an inherent weakness in terms of representing the relations among terms from different tables. As a result, the layout of OMOP and how it is set up to document patients’ conditions could lead to ambiguities, inaccurate representations and erroneous counting [2]. Another drawback is that OMOP CDM does not inherently provide systematic interoperability with other CDMs such as National Patient-Centered Clinical Research Network (PCORnet) [4] and Clinical Data Interchange Standards Consortium (CDISC) [5]. In the N3C data integration, the COVID-19 data formulated with other CDMs are required to be harmonized based on OMOP CDM version 5.3 [6]. This harmonization is conducted separately, which makes robust interoperability and scalability difficult to achieve.
Ontology can be a solution to the above drawbacks [3, 7]. In the 2018 OHDSI Symposium, we proposed a strategy of ontological representation of the OMOP CDM using the OBO framework [3]. In addition to the core OMOP CDM model, the OMOP system also includes many standardized clinical terminologies that can be used under the OMOP CDM framework to collaboratively support observational data standardization and integration. In the 2020 OHDSI Symposium, Callahan et al. reported their development of OMOP2OBO, a health system-wide program for the integration and alignment between OMOP’s standardized clinical terminologies and eight OBO biomedical ontologies spanning diseases, phenotypes, anatomical entities, cell types, organisms, chemicals, metabolites, hormones, vaccines, and proteins [7]. As of the end of May 2022, the OMOP2OBO mapping program has collected 92,367 OMOP Conditions, 8,615 Drug Exposure ingredients, and 3,827 Measurements (10,673 measurement test results) terms [8]. OMOP2OBO allows its users to construct their own sets of omop2obo mappings.
Among >100 ontologies in the Open Biomedical Ontology (OBO) library, the Ontology of Precision Medicine and Investigation (OPMI) is an ontology in the domain of precision medicine and investigation [9, 10]. Following the OBO ontology principles (e.g., openness and collaboration), OPMI reuses many terms of existing reference ontologies and includes many of its own terms in the field of clinical and translational precision medicine, supporting non-redundant and interoperable ontology development [11]. OPMI has been developed and used to support the Kidney Precision Medicine Project [9, 10]. We have been using the OPMI to model and represent the core OMOP CDM elements and relations among the elements [3].
This manuscript reports our usage and extension of the OPMI to ontologize the OMOP CDM elements and the relations among these elements, and how such OMOP-2-OPMI ontologization supports systematic clinical data interoperability, sharing, and integration.
2. Methods
2.1. OMOP CDM resource used in the study
The OMOP CDM version 5.4 was used in our OPMI mapping. First, we obtained terms and their annotations from the OMOP CDM version 5.4 resource [12]. The Athena software program (https://athena.ohdsi.org/) is the tool used to search OMOP CDM terms and related terms from the OMOP-associated terminologies.
2.2. OMOP-2-OPMI development strategy
The OPMI ontology is used as the default ontology platform for the ontology mapping and new term generation of the OMOP CDM elements and semantic relations among the elements. In general, the eXtensible Ontology Development (XOD) strategy [13], including the methods of ontology term reuse, semantic alignment, ontology design pattern, and community extensibility, was used for the OPMI mapping. Specifically, all those OMOP CDM element terms were first searched in Ontobee [14]. For those terms existing in reference OBO ontologies that map to the OMOP CDM elements, Ontofox [15] was used to import those terms to OPMI (if the import has not been done before). For those OMOP elements that cannot be mapped to any OBO reference ontology, we generated new terms and defined them with OPMI namespace based on specific ontology design patterns. The OPMI ontology editing was performed using Protege-OWL editor [16], and the ontology reasoning was conducted using the HermiT reasoner [17]. All the terms are aligned under the upper-level Basic Formal Ontology (BFO) [18]. Meanwhile, we have discussed our project design in different scenarios, and community feedback and
comments were obtained to adjust our definitions and design.
2.3. Download and license
The OMOP-2-OPMI GitHub web page is: https://github.com/OPMI/OMOP-2-OPMI. The source code of the Omop2Opmi.owl file is openly available at this GitHub website for downloading. The OWL file is generated primarily by extracting the OMOP CDM-related terms and associated relations from the OPMI using Ontofox [15]. Considering the usage of OPMI as the platform for the OMOP CDM mapping, the OMOP-2-OPMI source page is designated as a repository under the general OPMI organization in GitHub.
Meanwhile, the OMOP-2-OPMI repository has also stored related data files including our cleanup spreadsheets of the mapping details available at: https://github.com/OPMI/OMOP-2-OPMI/tree/main/docs.
2.4. Use case studies
Three use cases are developed and discussed in this study. Specifically, the first use case is about the OMOP data standardization and inference. The second use case is the development of an adverse event model based on the OMOP CDM logic and available data formats. The third use case is the usage of OMOP-2-OPMI to study N3C COVID-19 related clinical data.
3. Results
3.1. General OMOP CDM ontologization architecture
Figure 1 represents the hierarchical structure of the OMOP-2-OPMI, which is the ontologization of the OMOP CDM using the OPMI as the ontology platform. Specifically, all the terms are aligned under the Basic Formal Ontology (BFO) [18], an ISO-approved upper level ontology [19]. BFO includes two branches: continuants and occurrents. Continuants cover time-independent entities including material entities, quality, realizable entities such as disposition, and information content entities. Occurrents are time-dependent entities including temporal region and processes. All the OMOP CDM elements can be categorized under these two categories (Figure 1). BFO has been used by over 300 ontologies. The alignment with BFO allows us to integrate our ontology with the large number of other ontologies, supporting data interoperability.
Figure 1: OMOP-2-OPMI top level hierarchical structure and representative terms. Ontology names
are highlighted with different colors. Ontology-mapped OMOP terms are also provided.
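The two-branch BFO alignment described above can be illustrated with a minimal Python sketch. The parent links below are a small, hand-written subset based on terms named in this paper; they are not read from the OPMI OWL file, the function name `bfo_branch` is hypothetical, and the placement of 'visit temporal region' directly under 'temporal region' is an assumption made for illustration.

```python
# Simplified child -> parent links; an illustrative subset, not the OPMI hierarchy itself.
PARENT = {
    "continuant": "entity",
    "occurrent": "entity",
    "material entity": "continuant",
    "quality": "continuant",
    "realizable entity": "continuant",
    "information content entity": "continuant",
    "process": "occurrent",
    "temporal region": "occurrent",
    # OMOP-2-OPMI terms mentioned in the paper
    "status": "realizable entity",
    "medical condition status": "status",
    "centrally registered identifier": "information content entity",
    "person ID": "centrally registered identifier",
    "medical occurrence": "process",
    "visit occurrence": "medical occurrence",
    "visit temporal region": "temporal region",  # assumed placement
}


def bfo_branch(term):
    """Walk up the parent links until a top-level BFO branch is reached."""
    current = term
    while current in PARENT:
        if current in ("continuant", "occurrent"):
            return current
        current = PARENT[current]
    raise KeyError(f"{term!r} is not connected to a BFO branch")


for name in ("person ID", "medical condition status", "visit occurrence"):
    print(name, "->", bfo_branch(name))
```

A lookup of this kind only mirrors the idea that every mapped OMOP CDM element resolves to either a continuant or an occurrent; an actual check would query the ontology with a reasoner.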
Figure 2 is a simplified high level OMOP-2-OPMI ontology design pattern (ODP) that covers the major elements in 11 OMOP tables. Specifically, the person (usually here it refers to patient in OMOP) is centric to the ODP. The person participates in five medical occurrences (i.e., visit/condition/procedure occurrences, and drug/device exposure) and the observation process, which are all under BFO:process (Figure 1). The observation happens during a specific observation period. The person is also the target of measurement. A specimen derives from some organ or tissue of the person. The person has
different phenotypes, and death is a specific phenotype (Figure 2).
Figure 2: General ontology design pattern that links CDM elements from 11 OMOP tables. Note one box covers five OMOP occurrence/exposure tables. Mapped ontology terms are also labeled.
3.2. OMOP-2-OPMI statistics
A total of 165 terms from 15 OMOP CDM tables have been mapped to OPMI, with 46 terms newly generated with OPMI namespace and the other terms reported from OBO reference ontologies. In addition to the 11 tables listed in Figure 2, the other four tables are Care Site, Payer Plan Period, Episode, and Location, which are not included in Figure 2 to simplify that figure. Table 1 lists ontology mapped CDM element terms from 10 representative OMOP tables.
Our current mapping primarily covers the clinical data tables and health system data tables. We have not yet included the Metadata Tables, the Vocabulary Tables, the Standardized Derived Tables except for Episode, or the Cost table, which belongs to the Health Economics Data Tables category. These missing tables do not directly involve clinical investigation, which is our current focus. Also as shown in Table 2, many terms are not mapped to ontology. Most of these missing terms are various “source value” or source concept ID terms. Throughout OMOP CDM, there are similar terms representing various source concepts and source values. In the OMOP structure, a source concept set organizes terms into groups called source value sets. A value set (e.g., ‘procedure_source_value’) is a set of codes whose context and usage are defined by one or more code systems from which the clinical data came. However, the organization of value sets is not often ontology-based. In most cases, we have decided not to incorporate terms for “source concept” and “source value” sets until we figure out a place for these terms to make sense ontologically within OPMI. In our ontologization, we have also included specific source value terms as seen in Table 1 and detailed later in the manuscript.
Table 1. CDM terms from 10 representative OMOP tables mapped to OPMI
{| class="wikitable"
! Selected OMOP tables !! Mapped OMOP terms !! Mapped Ontology Term Examples
|-
| PERSON || 13/19* || person ID (OPMI_0000470), gender (PATO_0001894), year of birth (OPMI_0000473), race (NCIT_C17049)
|-
| PROVIDER || 9/13 || care provider (OPMI_0000163), National Provider Identifier (OPMI_0000503), DEA identifier (OPMI_0000504)
|-
| SPECIMEN || 6/15 || specimen ID (OBI_0001616), date of specimen collection (OBIB_0000714), anatomical structure (UBERON_0000061)
|-
| VISIT OCCURRENCE || 26/17 || visit occurrence (OPMI_0000482), visit start date (OPMI_0000487), preceding visit occurrence (OPMI_0000492)
|-
| PROCEDURE OCCURRENCE || 13/16 || procedure (NCIT_C25218), procedure start date (OPMI_0000508), procedure end date (OPMI_0000510)
|-
| DRUG EXPOSURE || 18/23 || drug exposure (OPMI_0000572), drug product (DRON_00000005), drug exposure start time (OPMI_0000565)
|-
| CONDITION OCCURRENCE || 38/16 || condition occurrence (OPMI_0000527), medical condition status (OPMI_0000533), admission diagnosis status (OPMI_0000542)
|-
| DEVICE EXPOSURE || 7/15 || device exposure (OPMI_0000554), device (OBI_0000968), device exposure start date (OPMI_0000562)
|-
| MEASUREMENT || 11/20 || clinical measurement identifier (OPMI_0000582), measurement time (OPMI_0000579), measurement unit label (IAO_0000003)
|-
| OBSERVATION PERIOD || 5/6 || observation period start date (OPMI_0000577), observation period end date (OPMI_0000578)
|}
Note: *13/19 represents that 13 out of 19 OMOP CDM terms in the specific category have been mapped to terms in the OPMI ontology. The unmapped terms are primarily those terms related to “source value”. More terms in the visit/condition occurrences are mapped because some specific source value terms are ontologized.
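As a quick arithmetic check on Table 1, the following Python sketch flags the tables whose mapped-term count exceeds the number of original OMOP CDM terms. The counts are copied from Table 1; the names `TABLE_1` and `surplus_tables` are illustrative and are not part of OMOP-2-OPMI.

```python
# (mapped terms, OMOP CDM terms) per table, copied from Table 1.
TABLE_1 = {
    "PERSON": (13, 19),
    "PROVIDER": (9, 13),
    "SPECIMEN": (6, 15),
    "VISIT OCCURRENCE": (26, 17),
    "PROCEDURE OCCURRENCE": (13, 16),
    "DRUG EXPOSURE": (18, 23),
    "CONDITION OCCURRENCE": (38, 16),
    "DEVICE EXPOSURE": (7, 15),
    "MEASUREMENT": (11, 20),
    "OBSERVATION PERIOD": (5, 6),
}


def surplus_tables(counts):
    """Return {table: mapped - total} for tables with more mapped terms than CDM terms."""
    return {
        name: mapped - total
        for name, (mapped, total) in counts.items()
        if mapped > total
    }


print(surplus_tables(TABLE_1))
# {'VISIT OCCURRENCE': 9, 'CONDITION OCCURRENCE': 22}
```

Only the Visit Occurrence and Condition Occurrence rows show a surplus, which is consistent with the note under Table 1.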
In addition to source values or source concept IDs, there are also many terms in OMOP CDM not yet ontologized. The reasons for such incompleteness include the lack of necessity of many terms, and the complexity of many other terms in terms of ontology modeling. We will continue this work later, ideally by involving more collaboration and discussion with the ontology and clinical informatics communities.
Table 2. Ontology mapping of OMOP CDM terms by element types
{| class="wikitable"
! OMOP types !! OMOP terms !! Mapped !! Percent mapped
|-
| _id || 23 || 19 || 82.61%
|-
| _date || 34 || 27 || 79.41%
|-
| _concept_id || 41 || 29 || 70.73%
|-
| _concept_name || 30 || 16 || 53.33%
|-
| _source_concept_id || 17 || 1 || 5.88%
|-
| _source_value || 34 || 1 || 2.94%
|-
| Total || 179 || 93 || 51.96%
|}
Next we will focus on a few major ontology modeling topics to show how we model and ontologize the OMOP CDM elements.
3.3. Ontologization of OMOP medical occurrences
By examining the OMOP CDM elements, we found that five OMOP tables can be categorized under an ontology class called ‘medical occurrence’, which is defined as a process event that a patient experiences over a period of time (Figure 3). These five OMOP tables are: ‘condition occurrence’, ‘device exposure’, ‘drug exposure’, ‘procedure occurrence’, and ‘visit occurrence’ (Figure 3).
Figure 3: Modeling of 5 medical occurrence categories and 11 specific visit occurrences.
In two of the five OMOP tables, Visit Occurrence and Condition Occurrence, in addition to mapping the elements in original tables (Table 1), we also added some terms from the supporting OMOP vocabularies for developing a complete semantic model. In the case of Visit Occurrence, the extra terms are due to the ontologization of 11 types of visit occurrences (e.g., ‘emergency room visit’, ‘home visit’) that are originally not defined in OMOP’s CDM model and instead are from the supporting OMOP vocabularies identified on the Athena program. We have ontologized such terms under ‘visit occurrence’ (OPMI) (Figure 3). These terms represent the overarching types of encounters between a person and the healthcare system, which are adopted in most healthcare systems worldwide.
In the case of Condition Occurrence, the extra 22 terms come from the incorporation of medical condition statuses (e.g., ‘admission diagnosis’, ‘cause of death’, and ‘confirmed diagnosis’), which were defined by OMOP and searchable in Athena. In OMOP, a medical condition status denotes the stages of a patient’s diagnosis, not the actual state of the disease by itself. OPMI represents these medical condition statuses in two
strategies. First, OPMI includes a term called ‘medical condition status’ under the ‘status’ term, which is a subclass of BFO:‘realizable entity’. In this classification, a medical diagnosis status, such as admission diagnosis, represents a patient diagnosis status such as the status of diagnosis at the time when the patient is admitted to the hospital.
We have also adopted the OGMS:diagnosis classification and defined various diagnosis types under the OGMS:diagnosis (Figure 4). According to the Ontology for General Medical Science (OGMS), diagnosis (OGMS_0000073) is a subclass of clinical data item and represents the conclusion of a diagnostic process. Based on the OMOP classification, OPMI has defined different categories of diagnosis, including ‘admission diagnosis’, ‘primary diagnosis’, ‘secondary diagnosis’, and ‘death diagnosis’, etc. (Figure 4). These specific diagnosis types are commonly used in the clinical setting. The classification of these diagnosis types facilitates the clinical data annotations.
Figure 4: Modeling of different medical diagnosis under the OGMS:diagnosis, which is a subclass of clinical data item.
As OPMI separates diagnosis clinical data type vs the diagnosis medical condition status, we can define different diagnoses and diagnosis statuses. For example, ‘discharge diagnosis status’, ‘referral diagnosis status’, and ‘admission diagnosis status’ are realizable entities, and ‘discharge diagnosis’, ‘referral diagnosis’, and ‘admission diagnosis’ are data items. The main benefit of separate representation of status and data is the semantic separation and clarity. The medical condition status represents the current status of the patient at a specific stage. For example, ‘admission diagnosis status’ represents the status at which a person is diagnosed at the admission stage. On the other hand, as the data item, the ‘admission diagnosis’ indicates the conclusion or outcome of the diagnosis process at the stage of patient admission. A diagnosis conclusion made at the admission or discharge stage may be the same or different.
Meanwhile, the diagnosis clinical data type vs the diagnosis medical condition status are closely related. In OPMI, we propose to generate a relation term called ‘has status content’, which represents a relation between a status and an information content entity where the status has its content information defined by the information content entity. For example, we can define an axiom that links a diagnosis status to a diagnosis data item:
‘admission diagnosis status’: ‘has status content’ some ‘admission diagnosis’
However, such duplicated representation may not be needed. It is possible to just define ‘admission diagnosis status’ and remove the term ‘admission diagnosis’. We will examine more use cases and discuss with the ontology and medical informatics communities in this regard.
3.4. Ontologization of temporal date/time in OMOP
To ontologically represent various entities denoting time that can be found throughout OMOP, we have mapped 24 temporal terms from 6 tables. The OMOP tables that have temporal terms ontologized are Visit Occurrence, Device Exposure, Drug Exposure, Procedure Occurrence, Condition Occurrence, and Person. For all tables but Person, the entities are ontologized with temporal terms for -start date, -start datetime, -end date, and -end datetime. Meanwhile, temporal terms related to the Person table are instead ontologized with more familiar terms which are ‘birth datetime’, ‘day of birth’, ‘month of birth’, and ‘year of birth’. All temporal terms are grouped under a higher level term for a better organizational purpose (e.g., ‘visit start date/datetime’, ‘end date/datetime’ are all grouped under ‘visit temporal region’) (Figure 5).
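The count of 24 temporal terms can be reproduced with a minimal Python sketch: four start/end terms for each of the five occurrence and exposure tables, plus the four birth-related terms for Person. The generated labels are hypothetical approximations written for illustration; only ‘visit temporal region’ and the four Person terms are named in the text, and the actual OPMI labels may differ.

```python
# Five tables that each receive start/end date and datetime terms.
OCCURRENCE_TABLES = ["visit", "device exposure", "drug exposure", "procedure", "condition"]
SUFFIXES = ["start date", "start datetime", "end date", "end datetime"]
# Person uses birth-related terms instead of start/end terms.
PERSON_TERMS = ["birth datetime", "day of birth", "month of birth", "year of birth"]


def temporal_terms():
    """Group illustrative temporal term labels under one parent term per table."""
    groups = {
        f"{table} temporal region": [f"{table} {suffix}" for suffix in SUFFIXES]
        for table in OCCURRENCE_TABLES
    }
    # Hypothetical grouping label for the Person terms.
    groups["person temporal region"] = list(PERSON_TERMS)
    return groups


groups = temporal_terms()
total = sum(len(terms) for terms in groups.values())
print(total)  # 24
```

The total of 5 x 4 + 4 = 24 matches the number of temporal terms reported for the 6 tables.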
Figure 5: OPMI modeling of date and time used in OMOP CDM.
3.5. Ontologization of entity identifiers in OMOP CDM
In OMOP, fields with the suffix “_id” usually denote identifiers, which function as primary keys in their respective OMOP tables along with other supporting entities (e.g., person_id in Person table). These identifiers can also be used as foreign keys to connect other related OMOP tables (e.g., person_id to connect Provider and Care Site tables).
OPMI has ontologized OMOP CDM related identifiers under the class of ‘centrally registered identifier’, a subclass under ‘information content entity’. Example identifiers defined include ‘person ID’, ‘care site identifier’, ‘clinical measurement identifier’, ‘DEA identifier’ and ‘National Provider Identifier’. These identifiers identify assets belonging to different but centrally registered local databases.
3.6. Ontologization of provenance records in OMOP
In OMOP, most entities from various tables have their own “type_concept” terms, which indicate the provenance, that is, the source from which the record comes. For instance, drug exposure entries could come from either a prescription list or patient self-reports, the provenance of which can differ from a patient’s measurement records.
In OPMI, type_concepts are mapped as various terms under ‘provenance of record’, a class under ‘information content entity’. So far, we have generated 12 terms for the provenance of records for 12 corresponding entities of OMOP CDM tables. The provenance of records is dedicated for each corresponding OMOP entity since the sources of the entries can vary across different fields.
Meanwhile, OPMI also defines most of the records for the OMOP provenance purposes under ‘electronic health record’, such as ‘electronic medical visit record,’ ‘electronic death record,’ ‘electronic device record,’ etc. (Figure 6). The users can choose the usage of these electronic health records as the sources of the data collected to the OMOP database. Note that not all the provenance records are electronic health records (EHR). For example, in addition to the record from an EHR system, the measurement record might also come from an insurance claim, registry, or other sources.
Figure 6: OPMI modeling of different records used as data provenance in OMOP CDM.
Next, we will focus on the description of three use cases of the OMOP-2-OPMI approach.
3.7. Use case 1: Ontology-level data standardization
The first use case is rooted in the nature of ontology. As an open access ontology following the OBO ontology development principles, OMOP-2-OPMI provides the standard representation and definitions of the OMOP CDM mapped terms and the axioms among these terms. The OMOP-2-OPMI ontology terms can be used to support standardized clinical data representation and annotation. The semantic relations among the OMOP CDM terms and their associated other terms provide solid semantic
associations, which addresses the OMOP CDM drawback of weak semantics.
The ontologized terms are also interoperable. For example, the Coronavirus Infectious Disease Ontology (CIDO), a biomedical ontology in the domain of coronavirus diseases [20], has imported the OMOP-2-OPMI ontology contents. The contents of OMOP-2-OPMI fit seamlessly with the other CIDO contents, providing another demonstration of the ontology-supported knowledge and data interoperability, sharing, and integration. It is also possible to use some ontology terms for mapping to the other CDMs such as PCORnet [4] and CDISC [5], which will be explored in the future.
Such interoperable ontology representation also supports data and knowledge inferencing. This is also rooted in the nature of ontology. The following two other use cases provide such demonstrations.
3.8. Use case 2: Adverse event modeling and analysis
Another use case of the OMOP CDM ontologization is the modeling of adverse events (AEs) post medical intervention. The OMOP CDM does not include AE per se. However, by specific modeling, we find that the OMOP CDM data can be processed to support specific AE identification and analysis.
Figure 7 is a general OMOP-AE ontology design pattern, which follows the AE definition by the Ontology of Adverse Events (OAE) [21]. According to the OAE, an adverse event (AE) is a pathological bodily process that occurs following some medical intervention [21]. In order to model AEs with OMOP data, we need to identify the medical intervention vs. adverse events to be mapped in OMOP. Of the five medical occurrence types defined in OMOP, only three are considered as medical interventions: Drug Exposure, Device Exposure, and Procedure Occurrence (e.g., surgical procedure). Vaccination can be considered as a special drug exposure.
Note that the visit occurrence and condition occurrence are regarded as natural occurrence events without medical intervention. Based on the AE definition, contracting a natural infection is not an AE since the patient does not receive an adverse outcome after a medical intervention. However, the condition occurrence may include conditions of different phenotypes that are the outcomes of specific adverse events (Figure 7).
Figure 7: General OMOP-AE model based on OMOP-2-OPMI. The red boxes represent OMOP tables and their mapped ontology terms. The black boxes are added ontology representation to fill up the gaps for adverse event modeling. *, OMOP uses SNOMED CT concepts for disease or symptom representation. These can be mapped to Human Phenotype Ontology (HP) terms.
Our original OPMI conference proceeding paper presented a use case study of identifying and analyzing the acute kidney injury (AKI) AE following heart surgery [9]. Using OHDSI data provided by the IQVIA PharMetrics Plus database, our OHDSI cohort study identified a total of 15,548 patients that fulfilled our predefined model of AKI AE following heart surgery. Specific patterns were identified. For example, 72% of the identified patients were male and 28% were female patients. Over 78% of these AE cases occurred in patients aged greater than 55 years old. Many phenotypes, such as coronary arteriosclerosis, kidney disease, pain, dyspnea, hyperlipidemia, and Type II diabetes, were found in these patients as well [9].
Our OMOP AE model is a very general model in that it can be used to study specific adverse event profiles following various medical interventions including different drug/medicine exposure and procedure occurrence. We are currently applying such a strategy to design a pattern for identifying and analyzing the vaccine and drug AEs in COVID-19 patients using the N3C data. Note that if a patient contracted COVID-19 in a natural environment, the patient has a condition, which is not an adverse event (because an AE is always associated with medical intervention). However, the occurrence of new
phenotypes after medical treatment on these COVID-19 patients is considered an AE.
3.9. Use case 3: COVID-19 clinical data standardization, modeling, and analysis
In addition to the import of the OMOP-2-OPMI to CIDO and the study of COVID-19 associated AE modeling and analysis as described above, we are also applying the OMOP-2-OPMI for more COVID-19 clinical data modeling and analysis. Two data resources for our OMOP-2-OPMI based studies are the literature reports and N3C clinical data.
One specific use case is the study of the relation between the COVID-19 infection and the increased risk for kidney diseases. For example, acute kidney injury (AKI) is a significant complication of COVID-19. The incidence of AKI in hospitalized patients varies from 0.5% to 75%. The mortality rate for patients with kidney disease is also significantly higher than that of the general infected population. However, the big variation of AKI incidence in COVID-19 patients appears to depend on many factors such as race, region, and disease severity. The N3C cohort data is being used to detect, compare, and analyze the occurrences of kidney disease following COVID-19 infection. The OMOP-2-OPMI model, together with the OMOP2OBO, can be used to support data modeling, integration, and analysis. The integrated data can also be further used for machine learning tool development for kidney disease prediction following COVID-19 infection. We have registered for an N3C program to perform related research.
Another use case in this category is the application of OMOP-2-OPMI and CIDO for secondary literature data analysis and knowledge representation. There have been a large number of COVID-19 studies reported in the literature, many of which involve the usage of the OMOP CDM model. For example, one study examined the association between immune dysfunction and COVID-19 breakthrough infection after SARS-CoV-2 vaccination in the US using N3C data [22]. The N3C data and the results out of the data analysis can both be modeled, annotated, and represented using ontology including our OMOP-2-OPMI and CIDO.
The above two studies are currently ongoing and we expect to have more specific results available in the near future.
4. Discussion
This manuscript has made two main contributions. First, we report our systematic survey and ontologization of the OMOP CDM elements using the OPMI ontology. The Omop2Opmi.owl file is the OWL file that includes only the OMOP CDM-related ontology terms, their directly associated terms (e.g., their parent terms), and the semantic relations between these terms that are presented as ontology axioms. Second, we presented three categories of use cases of our OMOP CDM ontologization, including ontology-level OMOP CDM element standardization and inferencing, adverse event modeling and analysis, and COVID-19 clinical data studies. Overall, our systematic ontologization of the OMOP CDM complements and empowers the OMOP CDM system, providing a new way of supporting systematic clinical data interoperability, sharing, and integration.
A similar and related system is OMOP2OBO, a systematic mapping tool that maps OMOP related terms to OBO ontologies [7]. The terms mapped in OMOP2OBO cover 8 OBO ontologies, including Cell Ontology (CL), ChEBI chemical entity ontology, Human Phenotype Ontology (HP), MONDO disease Ontology, NCBI Taxonomy Ontology (NCBITaxon), Protein Ontology (PR), Uberon anatomy ontology, and Vaccine Ontology (VO). While OMOP2OBO includes the mapping of over 100,000 terms in the OMOP terminology system, it does not cover the OMOP CDM elements in the over 10 basic OMOP tables. Instead, OMOP-2-OPMI focuses on the core OMOP CDM level mapping and representation. In addition to ontology term mapping, since many high level terms in OMOP CDM are not yet represented in OBO ontologies, we have taken extensive effort to generate many new terms in OPMI. We have also generated ontological relations among these OMOP CDM elements using the OPMI ontology platform. Overall, OMOP2OBO and OMOP-2-OPMI are complementary in that they map and integrate OMOP data from different aspects.
There are still many issues to consider in our ontologization. For example, we presented two types of methods for representing medical condition statuses and two types of methods of
9
representing provenance records in our work. Since most medical condition statuses are different types of diagnosis, such status representations can be defined under “status”, which is defined as a BFO:‘realizable entity’, or under OGMS:diagnosis, which is essentially a type of clinical data item. Similarly, provenance records can be represented under provenance itself or under electronic health record. The ICBO-2022 conference will provide us a platform to discuss the pros and cons of the different representation styles.
Several use cases are introduced in this article. We demonstrated the development of a new adverse event model based on the OMOP CDM data structure. Such an OMOP-AE model can be used to support various specific AE studies, including the modeling of adverse event cases following COVID-19 vaccination (or drug administration) using N3C data. In addition to the AKI AE study following heart surgery [9], we are currently applying the OMOP-AE model to more COVID-19 related AE studies. Furthermore, we can develop new models that apply OMOP CDM to study other topics, such as long COVID and the effects of different variables on disease outcomes.
One future project is to map the CDM terms from other systems, including PCORnet [4] and CDISC [5], to the OPMI ontology using the same OMOP-2-OPMI development strategy. These CDMs overlap. For example, the OMOP and PCORnet CDMs are organized similarly, as evidenced by overlapping tables such as Demographic, Procedures, and Condition [23]. When all these CDM elements and relations are mapped to the same OPMI structure, we can integrate data stored under different CDMs, leading to compatible and interoperable clinical and observational data standardization and integration. A recent study reports the development of an ETL tool for converting the PCORnet CDM into the OMOP CDM to facilitate COVID-19 data integration [24]. It is possible to apply our ontology approach to enhance such an ETL tool.

5. Acknowledgements
We acknowledge the Kidney Precision Medicine Project (KPMP), supported by NIH-NIDDK grant 1U2CDK114886, and a COVID-19 research grant from the Michigan Medicine–Peking University Health Sciences Center Joint Institute for Clinical and Translational Research (U072807). We appreciate the discussion and comments from the ontology and OMOP communities, including Dr. Asiyah Yu Lin and Dr. Andrew Williams.

6. References
[1] E. A. Voss, R. Makadia, A. Matcho, Q. Ma, C. Knoll, M. Schuemie, et al., "Feasibility and utility of applications of the common data model to multiple, disparate observational health databases," J Am Med Inform Assoc, vol. 22, pp. 553-64, May 2015.
[2] W. Ceusters and J. Blaisure, "A Realism-Based View on Counts in OMOP's Common Data Model," Stud Health Technol Inform, vol. 237, pp. 55-62, 2017.
[3] Y. He, E. Ong, and J. Zheng, "Ontological representation of OMOP CDM using the OBO framework," presented at the 2018 OHDSI Symposium, Bethesda North Marriott, Bethesda, MD, 2018.
[4] F. S. Collins, K. L. Hudson, J. P. Briggs, and M. S. Lauer, "PCORnet: turning a dream into reality," J Am Med Inform Assoc, vol. 21, pp. 576-7, Jul-Aug 2014.
[5] S. Hume, J. Aerts, S. Sarnikar, and V. Huser, "Current applications and future directions for the CDISC Operational Data Model standard: A methodological review," J Biomed Inform, vol. 60, pp. 352-62, Apr 2016.
[6] COVID-19 Clinical Data Warehouse Data Dictionary Based on OMOP Common Data Model Specifications Version 5.3. Available: https://ncats.nih.gov/files/OMOP_CDM_COVID.pdf
[7] T. J. Callahan, J. M. Wyrwa, N. A. Vasilevsky, and P. N. Robinson, "OMOP2OBO: Semantic Integration of Standardized Clinical Terminologies to Power Translational Digital Medicine Across Health Systems," in 2020 OHDSI Symposium, Virtual meeting, 2022.
[8] T. J. Callahan. (2022). OMOP2OBO. Available: https://github.com/callahantiff/OMOP2OBO
[9] Y. He, E. Ong, J. Schaub, F. Dowd, J. F. O'Toole, A. Siapos, et al., "OPMI: the
Ontology of Precision Medicine and Investigation and its support for clinical data and metadata representation and analysis," in The 10th International Conference on Biomedical Ontology (ICBO-2019), July 30 - August 2, Buffalo, NY, USA, 2019, pp. 1-10.
[10] E. Ong, L. L. Wang, J. Schaub, J. F. O'Toole, B. Steck, A. Z. Rosenberg, et al., "Modelling kidney disease using ontology: insights from the Kidney Precision Medicine Project," Nat Rev Nephrol, vol. 16, pp. 686-696, Nov 2020.
[11] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, et al., "The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration," Nat Biotechnol, vol. 25, pp. 1251-5, Nov 2007.
[12] OMOP CDM version 5.4. Available: http://ohdsi.github.io/CommonDataModel/cdm54.html
[13] Y. He, Z. Xiang, J. Zheng, Y. Lin, J. A. Overton, and E. Ong, "The eXtensible ontology development (XOD) principles and tool implementation to support ontology interoperability," J Biomed Semantics, vol. 9, p. 3, Jan 12 2018.
[14] E. Ong, Z. Xiang, B. Zhao, Y. Liu, Y. Lin, J. Zheng, et al., "Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration," Nucleic Acids Res, vol. 45, pp. D347-D352, Jan 04 2017.
[15] Z. Xiang, M. Courtot, R. R. Brinkman, A. Ruttenberg, and Y. He, "OntoFox: web-based support for ontology reuse," BMC Res Notes, vol. 3:175, pp. 1-12, 2010.
[16] M. A. Musen, "The Protégé project: A look back and a look forward," AI Matters, Association of Computing Machinery Specific Interest Group in Artificial Intelligence, vol. 1, DOI: 10.1145/2557001.25757003, 2015.
[17] Hermit OWL reasoner. Available: http://hermit-reasoner.com/
[18] R. Arp, B. Smith, and A. D. Spear, Building Ontologies with Basic Formal Ontology. MIT Press: Cambridge, MA, USA, 2015.
[19] ISO/IEC 21838-2:2021. Information technology — Top-level ontologies (TLO) — Part 2: Basic Formal Ontology (BFO). Available: https://www.iso.org/standard/74572.html
[20] Y. He, H. Yu, E. Ong, Y. Wang, Y. Liu, A. Huffman, et al., "CIDO, a community-based ontology for coronavirus disease knowledge and data integration, sharing, and analysis," Sci Data, vol. 7, p. 181, Jun 12 2020.
[21] Y. He, S. Sarntivijai, Y. Lin, Z. Xiang, A. Guo, S. Zhang, et al., "OAE: The Ontology of Adverse Events," J Biomed Semantics, vol. 5, p. 29, 2014.
[22] J. Sun, Q. Zheng, V. Madhira, A. L. Olex, A. J. Anzalone, A. Vinson, et al., "Association Between Immune Dysfunction and COVID-19 Breakthrough Infection After SARS-CoV-2 Vaccination in the US," JAMA Intern Med, vol. 182, pp. 153-162, Feb 1 2022.
[23] PCORnet Common Data Model (CDM) Specification, Version 6.0. Available: https://pcornet.org/wp-content/uploads/2022/01/PCORnet-Common-Data-Model-v60-2020_10_221.pdf
[24] Y. Yu, N. Zong, A. Wen, S. Liu, D. J. Stone, D. Knaack, et al., "Developing an ETL tool for converting the PCORnet CDM into the OMOP CDM to facilitate the COVID-19 data integration," J Biomed Inform, vol. 127, p. 104002, Mar 2022.