=Paper= {{Paper |id=Vol-3805/ICBO-2022_paper_1549 |storemode=property |title=OMOP-2-OPMI: Ontologization of OMOP CDM Using OPMI to Support Clinical Data interoperability and analysis |pdfUrl=https://ceur-ws.org/Vol-3805/ICBO-2022_paper_1549.pdf |volume=Vol-3805 |authors=Long Tran,Yongqun He |dblpUrl=https://dblp.org/rec/conf/icbo/TranH22 }} ==OMOP-2-OPMI: Ontologization of OMOP CDM Using OPMI to Support Clinical Data interoperability and analysis== https://ceur-ws.org/Vol-3805/ICBO-2022_paper_1549.pdf
                         OMOP-2-OPMI: Ontologization of OMOP CDM using OPMI to
                         support clinical data interoperability and analysis
                         Long Tran 1, and Yongqun He 1
                         1
                                University of Michigan, Ann Arbor, MI, USA

                                              Abstract
                                              The OMOP Common Data Model (CDM) has been widely used as an open community data
                                              standard in observational data integration and analysis. However, it still has its drawbacks
                                              including weak semantics and interoperability with other CDMs. In this study, we report our
                                              ontologization of the OMOP CDM elements and the semantic relations among the elements
                                              using the Ontology of Precision Medicine and Investigation (OPMI). A total of 165 terms from
                                              15 OMOP CDM tables have been mapped to OPMI, with 46 terms newly generated with OPMI
                                              namespace and the other terms imported from OBO reference ontologies. An Omop2Opmi.owl
                                              file was also generated by extracting the OMOP CDM related terms and relations from OPMI.
                                              Three categories of use cases are reported: ontology-level OMOP CDM element
                                              standardization and data integration, adverse event (AE) modeling, and COVID-19 clinical data
                                              studies. Following the Ontology of Adverse Events (OAE) definition, we developed a
                                              generalizable OMOP-AE model that transforms the OMOP data to systematically define,
                                              identify, and analyze specific adverse events following some medical interventions that include
                                              Drug/Device Exposure and Procedure Occurrence in OMOP. Overall, OMOP-2-OPMI
                                              complements and empowers OMOP CDM for enhanced clinical data standardization, sharing,
                                              interoperability, and analysis.

                                              Keywords
                                              OMOP, Common Data Model, ontology, OPMI, adverse events, COVID-19.

                         1. Introduction

    The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is an open
community data standard that aims to allow for systematic analysis of disparate observational
databases [1]. With the CDM, the data contained in those databases can be transformed into a
common format with a common representation. OMOP CDM has been widely used to support the
standardization of various electronic medical records (EMR) and administrative claims within and
outside the United States. Billions of patient records have been standardized using OMOP CDM.
Recently, OMOP CDM has become an established data model used by the National COVID Cohort
Collaborative (N3C, https://ncats.nih.gov/n3c). As of May 2022, the N3C data enclave has stored
the records of 14 million persons, including over 5 million COVID+ cases. Based on the N3C data
use design, the COVID-19 clinical data warehouse data dictionary used in N3C is based on OMOP
CDM, and the other data formats need to be aligned with the OMOP CDM in order to be entered and
used in the N3C data enclave. Therefore, the OMOP CDM has clearly played a significant role in
the data standardization and integration.
    Still the OMOP CDM has its own drawbacks [2, 3]. One drawback is its weak semantics in that
OMOP CDM does not provide robust semantic relations among CDM elements. Basically, the OMOP CDM
provides the schema structure of a

                         ICBO 2022, September 25-28, 2022, Ann Arbor, USA
                         EMAILs: longtr@umich.edu (A.1 ); yongqunh@med.umich.edu
                         (A. 2). ORCID: 0000-0002-5735-7540 (A. 1);
                         0000-0001-9189-9661 (A. 2)
                                           2022 Copyright for this paper by its authors. Use permitted under Creative
                                         Commons License Attribution 4.0 International (CC BY 4.0).

                                         CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
standardized relational database that includes over 10 tables, which has an inherent weakness in
terms of representing the relations among terms from different tables. As a result, the layout of
OMOP and how it is set up to document patients’ conditions could lead to ambiguities, inaccurate
representations and erroneous counting [2]. Another drawback is that OMOP CDM does not inherently
provide systematic interoperability with other CDMs such as National Patient-Centered Clinical
Research Network (PCORnet) [4] and Clinical Data Interchange Standards Consortium (CDISC) [5]. In
the N3C data integration, the COVID-19 data formulated with other CDMs are required to be
harmonized based on OMOP CDM version 5.3 [6], a process that is conducted separately and makes
robust interoperability and scalability difficult to achieve.
    Ontology can be a solution to the above drawbacks [3, 7]. In the 2018 OHDSI Symposium, we
proposed a strategy of ontological representation of the OMOP CDM using the OBO framework [3]. In
addition to the core OMOP CDM model, the OMOP system also includes many standardized clinical
terminologies that can be used under the OMOP CDM framework to collaboratively support
observational data standardization and integration. In the 2020 OHDSI Symposium, Callahan et al.
reported their development of OMOP2OBO, a health system-wide program for the integration and
alignment between OMOP’s standardized clinical terminologies and eight OBO biomedical ontologies
spanning diseases, phenotypes, anatomical entities, cell types, organisms, chemicals, metabolites,
hormones, vaccines, and proteins [7]. As of the end of May 2022, the OMOP2OBO mapping program has
collected 92,367 OMOP Conditions, 8,615 Drug Exposure ingredients, and 3,827 Measurements (10,673
measurement test results) terms [8]. OMOP2OBO allows its users to construct their own sets of
omop2obo mappings.
    Among >100 ontologies in the Open Biomedical Ontology (OBO) library, the Ontology of Precision
Medicine and Investigation (OPMI) is an ontology in the domain of precision medicine and
investigation [9, 10]. Following the OBO ontology principles (e.g., openness and collaboration),
OPMI reuses many terms of existing reference ontologies and includes many of its own terms in the
field of clinical and translational precision medicine, supporting non-redundant and interoperable
ontology development [11]. OPMI has been developed and used to support the Kidney Precision
Medicine Project [9, 10]. We have been using the OPMI to model and represent the core OMOP CDM
elements and relations among the elements [3].
    This manuscript reports our usage and extension of the OPMI to ontologize the OMOP CDM
elements and the relations among these elements, and how such OMOP-2-OPMI ontologization supports
systematic clinical data interoperability, sharing, and integration.

2. Methods

    2.1. OMOP CDM resource used in the study

    OMOP CDM version 5.4 was used in our OPMI mapping. First, we obtained terms and their
annotations from the OMOP CDM version 5.4 resource [12]. The Athena software program
(https://athena.ohdsi.org/) is the tool used to search OMOP CDM terms and related terms from
OMOP-associated terminologies.

    2.2. OMOP-2-OPMI development strategy

    The OPMI ontology is used as the default ontology platform for the ontology mapping and new
term generation of the OMOP CDM elements and semantic relations among the elements. In general,
the eXtensible Ontology Development (XOD) strategy [13], including the methods of ontology term
reuse, semantic alignment, ontology design pattern, and community extensibility, was used for the
OPMI mapping. Specifically, all those OMOP CDM element terms were first searched in Ontobee [14].
For those terms existing in reference OBO ontologies that map to the OMOP CDM elements, Ontofox
[15] was used to import those terms to OPMI (if the import had not been done before). For those
OMOP elements that cannot be mapped to any OBO reference ontology, we generated new terms and
defined them with OPMI namespace based on specific ontology design patterns. The OPMI ontology
editing was performed using Protege-OWL editor [16], and the ontology reasoning was conducted
using the HermiT reasoner [17]. All the terms are aligned under the upper-level Basic Formal
Ontology (BFO) [18]. Meanwhile, we have discussed our project design in different scenarios, and
community feedback and

comments were obtained to adjust our definitions and design.

    2.3. Download and license

    The OMOP-2-OPMI GitHub web page is: https://github.com/OPMI/OMOP-2-OPMI. The source code of
the Omop2Opmi.owl file is openly available at this GitHub website for downloading. The OWL file is
generated primarily by extracting the OMOP CDM-related terms and associated relations from the
OPMI using Ontofox [15]. Considering the usage of OPMI as the platform for the OMOP CDM mapping,
the OMOP-2-OPMI source page is designated as a repository under the general OPMI organization in
GitHub.
    Meanwhile, the OMOP-2-OPMI repository has also stored related data files including our cleanup
spreadsheets of the mapping details available at:
https://github.com/OPMI/OMOP-2-OPMI/tree/main/docs.

    2.4. Use case studies

    Three use cases are developed and discussed in this study. Specifically, the first use case is
about the OMOP data standardization and inference. The second use case is the development of an
adverse event model based on the OMOP CDM logic and available data formats. The third use case is
the usage of OMOP-2-OPMI to study N3C COVID-19 related clinical data.

3. Results

    3.1. General OMOP CDM ontologization architecture

    Figure 1 represents the hierarchical structure of the OMOP-2-OPMI, which is the ontologization
of the OMOP CDM using the OPMI as the ontology platform. Specifically, all the terms are aligned
under the Basic Formal Ontology (BFO) [18], an ISO-approved upper level ontology [19]. BFO
includes two branches: continuants and occurrents. Continuants cover time-independent entities
including material entities, quality, realizable entities such as disposition, and information
content entities. Occurrents are time-dependent entities including temporal region and processes.
All the OMOP CDM elements can be categorized under these two categories (Figure 1). BFO has been
used by over 300 ontologies. The alignment with BFO allows us to integrate our ontology with the
large number of other ontologies, supporting data interoperability.
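
    The two-branch BFO split described in Section 3.1 can be illustrated with a small lookup.
The sketch below is only an illustration under stated assumptions: the BFO sub-categories are
those named in the text, the sample term placements follow statements made elsewhere in this
paper but skip intermediate classes, the placement of 'visit start date' under temporal region
is an assumption, and the names BFO_BRANCH, TERM_CATEGORY, and bfo_branch are hypothetical. It
is not an extract of the OPMI OWL file.

```python
# BFO top-level branches for the sub-categories named in Section 3.1.
BFO_BRANCH = {
    "material entity": "continuant",
    "quality": "continuant",
    "realizable entity": "continuant",
    "information content entity": "continuant",
    "temporal region": "occurrent",
    "process": "occurrent",
}

# Illustrative placement of a few OMOP-2-OPMI terms. Intermediate classes are
# skipped, and the 'visit start date' entry is an assumption for illustration.
TERM_CATEGORY = {
    "visit occurrence": "process",
    "drug exposure": "process",
    "medical condition status": "realizable entity",
    "person ID": "information content entity",
    "visit start date": "temporal region",
}


def bfo_branch(term: str) -> str:
    """Return 'continuant' or 'occurrent' for a known term label."""
    return BFO_BRANCH[TERM_CATEGORY[term]]


for term, category in TERM_CATEGORY.items():
    print(f"{term}: {category} ({bfo_branch(term)})")
```

    In the ontology itself this grouping is carried by the subclass hierarchy under BFO, so a
reasoner, rather than a hand-written table, would derive the branch of each term.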




 Figure 1: OMOP-2-OPMI top level hierarchical structure and representative terms. Ontology names
       are highlighted with different colors. Ontology-mapped OMOP terms are also provided.

    Figure 2 is a simplified high level OMOP-2-OPMI ontology design pattern (ODP) that covers the
major elements in 11 OMOP tables. Specifically, the person (usually here it refers to patient in
OMOP) is centric to the ODP. The person participates in five medical occurrences (i.e.,
visit/condition/procedure occurrences, and drug/device exposure) and the observation process,
which are all under BFO:process (Figure 1). The observation happens during a specific observation
period. The person is also the target of measurement. A specimen derives from some organ or tissue
of the person. The person has

different phenotypes, and death is a specific phenotype (Figure 2).

Figure 2: General ontology design pattern that links CDM elements from 11 OMOP tables. Note one
box covers five OMOP occurrence/exposure tables. Mapped ontology terms are also labeled.

    3.2. OMOP-2-OPMI statistics

    A total of 165 terms from 15 OMOP CDM tables have been mapped to OPMI, with 46 terms newly
generated with OPMI namespace and the other terms imported from OBO reference ontologies. In
addition to the 11 tables listed in Figure 2, the other four tables are Care Site, Payer Plan
Period, Episode, and Location, which are not included in Figure 2 to simplify that figure. Table 1
lists ontology mapped CDM element terms from 10 representative OMOP tables.
    Our current mapping primarily covers those clinical data tables and health system data tables.
We have not yet included the Metadata Tables, Vocabulary Tables, Standardized Derived Tables
except for Episode, and the Cost table which belongs in the Health Economics Data Tables category.
These missing tables do not directly involve clinical investigation, which is our current focus.
Also as shown in Table 2, many terms are not mapped to ontology. Most of these missing terms are
various “source value” or source concept ID terms. Throughout OMOP CDM, there are similar terms
representing various source concepts and source values. In the OMOP structure, a source concept
set organizes terms into groups called source value sets. A value set (e.g.,
‘procedure_source_value’) is a set of codes whose context and usage are defined by one or more
code systems from which the clinical data came. However, the organization of value sets is not
often ontology-based. In most cases, we have decided to not incorporate terms for “source concept”
and “source value” sets until we figure out a place for these terms to make sense ontologically
within OPMI. In our ontologization, we have also included specific source value terms as seen in
Table 1 and detailed later in the manuscript.

Table 1. CDM terms from 10 representative OMOP tables mapped to OPMI

Selected OMOP table | Mapped OMOP terms | Mapped ontology term examples
PERSON | 13/19* | person ID (OPMI_0000470), gender (PATO_0001894), year of birth (OPMI_0000473), race (NCIT_C17049)
PROVIDER | 9/13 | care provider (OPMI_0000163), National Provider Identifier (OPMI_0000503), DEA identifier (OPMI_0000504)
SPECIMEN | 6/15 | specimen ID (OBI_0001616), date of specimen collection (OBIB_0000714), anatomical structure (UBERON_0000061)
VISIT OCCURRENCE | 26/17 | visit occurrence (OPMI_0000482), visit start date (OPMI_0000487), preceding visit occurrence (OPMI_0000492)
PROCEDURE OCCURRENCE | 13/16 | procedure (NCIT_C25218), procedure start date (OPMI_0000508), procedure end date (OPMI_0000510)
DRUG EXPOSURE | 18/23 | drug exposure (OPMI_0000572), drug product (DRON_00000005), drug exposure start time (OPMI_0000565)
CONDITION OCCURRENCE | 38/16 | condition occurrence (OPMI_0000527), medical condition status (OPMI_0000533), admission diagnosis status (OPMI_0000542)
DEVICE EXPOSURE | 7/15 | device exposure (OPMI_0000554), device (OBI_0000968), device exposure start date (OPMI_0000562)
MEASUREMENT | 11/20 | clinical measurement identifier (OPMI_0000582), measurement time (OPMI_0000579), measurement unit label (IAO_0000003)
OBSERVATION PERIOD | 5/6 | observation period start date (OPMI_0000577), observation period end date (OPMI_0000578)

Note: *13/19 represents that 13 out of 19 OMOP CDM terms in the specific category have been mapped to terms
in the OPMI ontology. The unmapped terms are primarily those terms related to “source value”. More terms in
the visit/condition occurrences are mapped because some specific source value terms are ontologized.

   In addition to source values or source concept IDs, there are also many terms in OMOP CDM
not yet ontologized. The reasons for such incompleteness include the lack of necessity of
many terms, and the complexity of many other terms in terms of ontology modeling. We will
continue this work later, ideally by involving more collaboration and discussion with the
ontology and clinical informatics communities.
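
   The mapping rates reported in Table 2 follow directly from its counts. As a minimal sketch,
the Python below recomputes each percentage and the totals from the (OMOP terms, OMOP mapped)
pairs given in that table; the names TABLE2 and percent_mapped are hypothetical and used only
for illustration.

```python
# (OMOP terms, OMOP mapped) per element type, copied from Table 2.
TABLE2 = {
    "_id": (23, 19),
    "_date": (34, 27),
    "_concept_id": (41, 29),
    "_concept_name": (30, 16),
    "_source_concept_id": (17, 1),
    "_source_value": (34, 1),
}


def percent_mapped(total: int, mapped: int) -> str:
    """Format the share of mapped terms as a percentage with two decimals."""
    return f"{100 * mapped / total:.2f}%"


rows = {suffix: percent_mapped(t, m) for suffix, (t, m) in TABLE2.items()}
total_terms = sum(t for t, _ in TABLE2.values())
total_mapped = sum(m for _, m in TABLE2.values())

for suffix, pct in rows.items():
    print(f"{suffix:<20}{pct:>8}")
print(f"{'Total':<20}{percent_mapped(total_terms, total_mapped):>8}")
```

   The near-zero rates for the two source-related suffixes are what pull the overall rate down
to roughly half, which is consistent with the decision described above to defer most “source
concept” and “source value” terms.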

Table 2. Ontology mapping of OMOP CDM terms by element types

types | OMOP terms | OMOP mapped | percent mapped
_id | 23 | 19 | 82.61%
_date | 34 | 27 | 79.41%
_concept_id | 41 | 29 | 70.73%
_concept_name | 30 | 16 | 53.33%
_source_concept_id | 17 | 1 | 5.88%
_source_value | 34 | 1 | 2.94%
Total | 179 | 93 | 51.96%

    Next we will focus on a few major ontology modeling topics to show how we model and ontologize
the OMOP CDM elements.

    3.3. Ontologization of OMOP medical occurrences

    By examining the OMOP CDM elements, we found that five OMOP tables can be categorized under an
ontology class called ‘medical occurrence’, which is defined as a process event that a patient
experiences over a period of time (Figure 3). These five OMOP tables are: ‘condition occurrence’,
‘device exposure’, ‘drug exposure’, ‘procedure occurrence’, and ‘visit occurrence’ (Figure 3).

Figure 3: Modeling of 5 medical occurrence categories and 11 specific visit occurrences.

    In two of the five OMOP tables, Visit Occurrence and Condition Occurrence, in addition to
mapping the elements in original tables (Table 1), we also added some terms from the supporting
OMOP vocabularies for developing a complete semantic model. In the case of Visit Occurrence, the
extra terms are due to the ontologization of 11 types of visit occurrences (e.g., ‘emergency room
visit’, ‘home visit’) that are originally not defined in OMOP’s CDM model and instead are from the
supporting OMOP vocabularies identified on the Athena program. We have ontologized such terms
under ‘visit occurrence’ (OPMI) (Figure 3). These terms represent the overarching types of
encounters between a person and the healthcare system, which are adopted in most healthcare
systems worldwide.
    In the case of Condition Occurrence, the extra 22 terms come from the incorporation of medical
condition statuses (e.g., ‘admission diagnosis’, ‘cause of death’, and ‘confirmed diagnosis’),
which were defined by OMOP and searchable in Athena. In OMOP, a medical condition status denotes
the stages of a patient’s diagnosis, not the actual state of the disease by itself. OPMI
represents these medical condition statuses in two

strategies. First, OPMI includes a term called ‘medical condition status’ under the ‘status’ term,
which is a subclass of BFO:‘realizable entity’. In this classification, a medical diagnosis
status, such as admission diagnosis, represents a patient diagnosis status such as the status of
diagnosis at the time when the patient is admitted to the hospital.
    We have also adopted the OGMS:diagnosis classification and defined various diagnosis types
under the OGMS:diagnosis (Figure 4). According to the Ontology for General Medical Science (OGMS),
diagnosis (OGMS_0000073) is a subclass of clinical data item and represents the conclusion of a
diagnostic process. Based on the OMOP classification, OPMI has defined different categories of
diagnosis, including ‘admission diagnosis’, ‘primary diagnosis’, ‘secondary diagnosis’, and ‘death
diagnosis’, etc. (Figure 4). These specific diagnosis types are commonly used in the clinical
setting. The classification of these diagnosis types facilitates the clinical data annotations.

Figure 4: Modeling of different medical diagnosis under the OGMS:diagnosis, which is a subclass of
clinical data item.

    As OPMI separates diagnosis clinical data type vs the diagnosis medical condition status, we
can define different diagnoses and diagnosis statuses. For example, ‘discharge diagnosis status’,
‘referral diagnosis status’, and ‘admission diagnosis status’ are realizable entities, and
‘discharge diagnosis’, ‘referral diagnosis’, and ‘admission diagnosis’ are data items. The main
benefit of separate representation of status and data is the semantic separation and clarity. The
medical condition status represents the current status of the patient at a specific stage. For
example, ‘admission diagnosis status’ represents the status at which a person is diagnosed at the
admission stage. On the other hand, as the data item, the ‘admission diagnosis’ indicates the
conclusion or outcome of the diagnosis process at the stage of patient admission. A diagnosis
conclusion made at the admission or discharge stage may be the same or different.
    Meanwhile, the diagnosis clinical data type and the diagnosis medical condition status are
closely related. In OPMI, we propose to generate a relation term called ‘has status content’,
which represents a relation between a status and an information content entity where the status
has its content information defined by the information content entity. For example, we can define
an axiom that links a diagnosis status to a diagnosis data item:
        ‘admission diagnosis status’: ‘has status content’ some ‘admission diagnosis’
    However, such duplicated representation may not be needed. It is possible to just define
‘admission diagnosis status’ and remove the term ‘admission diagnosis’. We will examine more use
cases and discuss with the ontology and medical informatics communities in this regard.

    3.4. Ontologization of temporal date/time in OMOP

    To ontologically represent various entities denoting time that can be found throughout OMOP,
we have mapped 24 temporal terms from 6 tables. The OMOP tables that have temporal terms
ontologized are Visit Occurrence, Device Exposure, Drug Exposure, Procedure Occurrence, Condition
Occurrence, and Person. For all tables but Person, the entities are ontologized with temporal
terms for -start date, -start datetime, -end date, and -end datetime. Meanwhile, temporal terms
related to the Person table are instead ontologized with more familiar terms which are ‘birth
datetime’, ‘day of birth’, ‘month of birth’, and ‘year of birth’. All temporal terms are grouped
under a higher level term for a better organizational purpose (e.g., ‘visit start date/datetime’,
‘end date/datetime’ are all grouped under ‘visit temporal region’) (Figure 5).
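
    The count of 24 temporal terms in Section 3.4 can be reproduced by enumerating the pattern
it describes: four start/end terms for each of the five occurrence and exposure tables, plus four
birth-related terms for Person. The sketch below is illustrative only. The generated labels
follow the naming pattern of ‘visit start date’ and ‘visit temporal region’, but labels other
than those quoted in this paper (including the parent label used for Person) are assumptions and
may differ from the actual OPMI labels.

```python
# Tables whose entries carry start/end dates and datetimes (Section 3.4).
OCCURRENCE_PREFIXES = [
    "visit", "device exposure", "drug exposure", "procedure", "condition",
]
SUFFIXES = ["start date", "start datetime", "end date", "end datetime"]
PERSON_TERMS = ["birth datetime", "day of birth", "month of birth", "year of birth"]


def temporal_term_groups() -> dict:
    """Group temporal term labels under one parent label per OMOP table."""
    groups = {
        f"{prefix} temporal region": [f"{prefix} {suffix}" for suffix in SUFFIXES]
        for prefix in OCCURRENCE_PREFIXES
    }
    # Hypothetical parent label for the Person terms; the paper does not name it.
    groups["person birth temporal region"] = list(PERSON_TERMS)
    return groups


groups = temporal_term_groups()
total_terms = sum(len(children) for children in groups.values())
print(len(groups), "tables,", total_terms, "temporal terms")
```

    Generating labels from a pattern is only a convenience for checking coverage; each term
still needs its own identifier and definition in the ontology.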

Figure 5: OPMI modeling of date and time used in OMOP CDM.

3.5. Ontologization of entity identifiers in OMOP CDM

In OMOP, fields with the suffix “_id” usually denote identifiers, which function as primary keys in their respective OMOP tables along with other supporting entities (e.g., person_id in Person table). These identifiers can also be used as foreign keys to connect other related OMOP tables (e.g., person_id to connect Provider and Care Site tables).

OPMI has ontologized OMOP CDM related identifiers under the class of ‘centrally registered identifier’, a subclass under ‘information content entity’. Example identifiers defined include ‘person ID’, ‘care site identifier’, ‘clinical measurement identifier’, ‘DEA identifier’ and ‘National Provider Identifier’. These identifiers identify assets belonging to different but centrally registered local databases.

3.6. Ontologization of provenance records in OMOP

In OMOP, most entities from various tables have their own “type_concept” terms, which indicate the provenance of a record, i.e., the source from which the record comes. For instance, drug exposure entries could come from either a prescription list or patient self-reports, and their provenance can differ from that of a patient’s measurement records.

In OPMI, type_concepts are mapped as various terms under ‘provenance of record’, a class under ‘information content entity’. So far, we have generated 12 terms for the provenance of records for 12 corresponding entities of OMOP CDM tables. A provenance-of-record term is defined separately for each corresponding OMOP entity since the sources of the entries can vary across different fields.

Meanwhile, OPMI also defines most of the records for the OMOP provenance purposes under ‘electronic health record’, such as ‘electronic medical visit record,’ ‘electronic death record,’ ‘electronic device record,’ etc. (Figure 6). Users can choose these electronic health records as the sources of the data collected into the OMOP database. Note that not all the provenance records are electronic health records (EHR). For example, in addition to the record from an EHR system, the measurement record might also come from an insurance claim, registry, or other sources.

Figure 6: OPMI modeling of different records used as data provenance in OMOP CDM.

Next, we will focus on the description of three use cases of the OMOP-2-OPMI approach.

3.7. Use case 1: Ontology-level data standardization

The first use case is rooted in the nature of ontology. As an open access ontology following the OBO ontology development principles, OMOP-2-OPMI provides the standard representation and definitions of the OMOP CDM mapped terms and the axioms among these terms. The OMOP-2-OPMI ontology terms can be used to support standardized clinical data representation and annotation. The semantic relations among the OMOP CDM terms and their associated other terms provide solid semantic
associations, which addresses the OMOP CDM drawback of weak semantics.

The ontologized terms are also interoperable. For example, the Coronavirus Infectious Disease Ontology (CIDO), a biomedical ontology in the domain of coronavirus diseases [20], has imported the OMOP-2-OPMI ontology contents. The contents of OMOP-2-OPMI fit seamlessly with the other CIDO contents, providing another demonstration of the ontology-supported knowledge and data interoperability, sharing, and integration. It is also possible to use the same ontology terms for mapping to the other CDMs such as PCORnet [4] and CDISC [5], which will be explored in the future.

Such interoperable ontology representation also supports data and knowledge inferencing. This is also rooted in the nature of ontology. The following two other use cases provide such demonstrations.

3.8. Use case 2: Adverse event modeling and analysis

Another use case of the OMOP CDM ontologization is the modeling of adverse events (AEs) post medical intervention. The OMOP CDM does not include AE per se. However, with specific modeling, we find that the OMOP CDM data can be processed to support specific AE identification and analysis.

Figure 7 is a general OMOP-AE ontology design pattern, which follows the AE definition by the Ontology of Adverse Events (OAE) [21]. According to the OAE, an adverse event (AE) is a pathological bodily process that occurs following some medical intervention [21]. In order to model AEs with OMOP data, we need to identify the medical intervention vs. adverse events to be mapped in OMOP. Of the five medical occurrence types defined in OMOP, only three are considered medical interventions: Drug Exposure, Device Exposure, and Procedure Occurrence (e.g., surgical procedure). Vaccination can be considered as a special drug exposure.

Note that the visit occurrence and condition occurrence are regarded as natural occurrence events without medical intervention. Based on the AE definition, contracting a natural infection is not an AE since the patient does not receive an adverse outcome after a medical intervention. However, the condition occurrence may include conditions of different phenotypes that are the outcomes of specific adverse events (Figure 7).

Figure 7: General OMOP-AE model based on OMOP-2-OPMI. The red boxes represent OMOP tables and their mapped ontology terms. The black boxes are added ontology representation to fill up the gaps for adverse event modeling. *, OMOP uses SNOMED-CT concepts for disease or symptom representation. These can be mapped to Human Phenotype Ontology (HP) terms.

Our original OPMI conference proceeding paper presented a use case study of identifying and analyzing the acute kidney injury (AKI) AE following heart surgery [9]. Using OHDSI data provided by the IQVIA PharMetrics Plus database, our OHDSI cohort study identified a total of 15,548 patients that fulfilled our predefined model of AKI AE following heart surgery. Specific patterns were identified. For example, 72% of the identified patients were male and 28% were female. Over 78% of these AE cases occurred in patients older than 55 years. Many phenotypes, such as coronary arteriosclerosis, kidney disease, pain, dyspnea, hyperlipidemia, and Type II diabetes, were found in these patients as well [9].

Our OMOP AE model is a very general model in that it can be used to study specific adverse event profiles following various medical interventions including different drug/medicine exposure and procedure occurrence. We are currently applying such a strategy to design a pattern for identifying and analyzing the vaccine and drug AEs in COVID-19 patients using the N3C data. Note that if a patient contracted COVID-19 in a natural environment, the patient has a condition, which is not an adverse event (because an AE is always associated with medical intervention). However, the occurrence of new
phenotypes after medical treatment on these COVID-19 patients are considered AEs.

3.9. Use case 3: COVID-19 clinical data standardization, modeling, and analysis

In addition to the import of the OMOP-2-OPMI to CIDO and the study of COVID-19 associated AE modeling and analysis as described above, we are also applying the OMOP-2-OPMI for more COVID-19 clinical data modeling and analysis. Two data resources for our OMOP-2-OPMI based studies are the literature reports and N3C clinical data.

One specific use case is the study of the relation between the COVID-19 infection and the increased risk for kidney diseases. For example, acute kidney injury (AKI) is a significant complication of COVID-19. The incidence of AKI in hospitalized patients varies from 0.5% to 75%. The mortality rate for patients with kidney disease is also significantly higher than that of the general infected population. However, the large variation of AKI incidence in COVID-19 patients appears to depend on many factors such as race, region, and disease severity. The N3C cohort data is being used to detect, compare, and analyze the occurrences of kidney disease following COVID-19 infection. The OMOP-2-OPMI model, together with the OMOP2OBO, can be used to support data modeling, integration, and analysis. The integrated data can also be further used for machine learning tool development for kidney disease prediction following COVID-19 infection. We have registered for an N3C program to perform related research.

Another use case in this category is the application of OMOP-2-OPMI and CIDO for secondary literature data analysis and knowledge representation. A large number of COVID-19 studies have been reported in the literature, many of which involve the use of the OMOP CDM. For example, one study examined the association between immune dysfunction and COVID-19 breakthrough infection after SARS-CoV-2 vaccination in the US using N3C data [22]. The N3C data and the results of the data analysis can both be modeled, annotated, and represented using ontology including our OMOP-2-OPMI and CIDO.

The above two studies are currently ongoing and we expect to have more specific results available in the near future.

4. Discussion

This manuscript has made two main contributions. First, we report our systematic survey and ontologization of the OMOP CDM elements using the OPMI ontology. The Omop2Opmi.owl file is the OWL file that includes only the OMOP CDM-related ontology terms, their directly associated terms (e.g., their parent terms), and the semantic relations between these terms that are presented as ontology axioms. Second, we presented three categories of use cases of our OMOP CDM ontologization, including ontology-level OMOP CDM element standardization and inferencing, adverse event modeling and analysis, and COVID-19 clinical data studies. Overall, our systematic ontologization of the OMOP CDM complements and empowers the OMOP CDM system, providing a new way of supporting systematic clinical data interoperability, sharing, and integration.

A similar and related system is OMOP2OBO, a systematic mapping tool that maps OMOP related terms to OBO ontologies [7]. The terms mapped in OMOP2OBO cover 8 OBO ontologies, including Cell Ontology (CL), ChEBI chemical entity ontology, Human Phenotype Ontology (HP), MONDO disease Ontology, NCBI Taxonomy Ontology (NCBITaxon), Protein Ontology (PR), Uberon anatomy ontology, and Vaccine Ontology (VO). While OMOP2OBO includes the mapping of over 100,000 terms in the OMOP terminology system, it does not cover the OMOP CDM elements in the over 10 basic OMOP tables. Instead, OMOP-2-OPMI focuses on the core OMOP CDM level mapping and representation. In addition to ontology term mapping, since many high level terms in OMOP CDM are not yet represented in OBO ontologies, we have taken extensive effort to generate many new terms in OPMI. We have also generated ontological relations among these OMOP CDM elements using the OPMI ontology platform. Overall, OMOP2OBO and OMOP-2-OPMI are complementary in that they map and integrate OMOP data from different aspects.

There are still many issues to consider in our ontologization. For example, we presented two types of methods for representing medical condition statuses and two types of methods of
representing provenance records in our work. Since most medical condition statuses are different types of diagnosis, such status representations can be defined under “status”, which is defined as a BFO:‘realizable entity’, or under OGMS:diagnosis, which is basically a type of clinical data item. Similarly, provenance records can be represented under provenance itself or under electronic health record. The ICBO-2022 conference will provide us a platform to discuss the pros and cons of different representation styles.

Several use cases are introduced in this article. We demonstrated the development of a new OMOP-based adverse event model based on the OMOP CDM data structure. Such an OMOP-AE model can be used to support various specific AE studies, including the modeling of adverse event cases post COVID-19 vaccination (or drug administration) using N3C data. In addition to the AKI AE study following heart surgery [9], we are currently applying the OMOP AE model for more COVID-19 related AE studies. Furthermore, we can develop new models to apply OMOP CDM to study other topics such as long COVID and the effects of different variables on the disease outcomes.

One future project is to map the CDM terms from other systems, including PCORnet [4] and CDISC [5], to the OPMI ontology using the same OMOP-2-OPMI development strategy. These different CDMs overlap. For example, there are similarities between the organizations of OMOP and PCORnet CDMs, evidenced by the overlaps of certain tables such as Demographic, Procedures, or Condition [23]. When all these CDM elements and relations are mapped to the same OPMI structure, we can integrate all the data using different CDMs, leading to compatible and interoperable clinical and observational data standardization and integration. A recent study reports the development of an ETL tool for converting the PCORnet CDM into OMOP CDM to facilitate the COVID-19 data integration [24]. It is possible to apply our ontology approach to enhance such an ETL tool.

5. Acknowledgements

We acknowledge the Kidney Precision Medicine Project (KPMP) supported by NIH-NIDDK grant: 1U2CDK114886, and a COVID-19 research grant from the Michigan Medicine–Peking University Health Sciences Center Joint Institute for Clinical and Translational Research (U072807). We appreciate the discussion and comments from the ontology and OMOP societies including Dr. Asiyah Yu Lin and Dr. Andrew Williams.

6. References

[1] E. A. Voss, R. Makadia, A. Matcho, Q. Ma, C. Knoll, M. Schuemie, et al., "Feasibility and utility of applications of the common data model to multiple, disparate observational health databases," J Am Med Inform Assoc, vol. 22, pp. 553-64, May 2015.
[2] W. Ceusters and J. Blaisure, "A Realism-Based View on Counts in OMOP's Common Data Model," Stud Health Technol Inform, vol. 237, pp. 55-62, 2017.
[3] Y. He, E. Ong, and J. Zheng, "Ontological representation of OMOP CDM using the OBO framework," presented at the 2018 OHDSI Symposium, Bethesda North Marriott, Bethesda, MD, 2018.
[4] F. S. Collins, K. L. Hudson, J. P. Briggs, and M. S. Lauer, "PCORnet: turning a dream into reality," J Am Med Inform Assoc, vol. 21, pp. 576-7, Jul-Aug 2014.
[5] S. Hume, J. Aerts, S. Sarnikar, and V. Huser, "Current applications and future directions for the CDISC Operational Data Model standard: A methodological review," J Biomed Inform, vol. 60, pp. 352-62, Apr 2016.
[6] COVID-19 Clinical Data Warehouse Data Dictionary Based on OMOP Common Data Model Specifications Version 5.3. Available: https://ncats.nih.gov/files/OMOP_CDM_COVID.pdf
[7] T. J. Callahan, J. M. Wyrwa, N. A. Vasilevsky, and P. N. Robinson, "OMOP2OBO: Semantic Integration of Standardized Clinical Terminologies to Power Translational Digital Medicine Across Health Systems," in 2020 OHDSI Symposium, Virtual meeting, 2022.
[8] T. J. Callahan. (2022). OMOP2OBO. Available: https://github.com/callahantiff/OMOP2OBO
[9] Y. He, E. Ong, J. Schaub, F. Dowd, J. F. O'Toole, A. Siapos, et al., "OPMI: the
Ontology of Precision Medicine and Investigation and its support for clinical data and metadata representation and analysis," in The 10th International Conference on Biomedical Ontology (ICBO-2019), July 30 - August 2, Buffalo, NY, USA, 2019, pp. 1-10.
[10] E. Ong, L. L. Wang, J. Schaub, J. F. O'Toole, B. Steck, A. Z. Rosenberg, et al., "Modelling kidney disease using ontology: insights from the Kidney Precision Medicine Project," Nat Rev Nephrol, vol. 16, pp. 686-696, Nov 2020.
[11] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, et al., "The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration," Nat Biotechnol, vol. 25, pp. 1251-5, Nov 2007.
[12] OMOP CDM version 5.4. Available: http://ohdsi.github.io/CommonDataModel/cdm54.html
[13] Y. He, Z. Xiang, J. Zheng, Y. Lin, J. A. Overton, and E. Ong, "The eXtensible ontology development (XOD) principles and tool implementation to support ontology interoperability," J Biomed Semantics, vol. 9, p. 3, Jan 12 2018.
[14] E. Ong, Z. Xiang, B. Zhao, Y. Liu, Y. Lin, J. Zheng, et al., "Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration," Nucleic Acids Res, vol. 45, pp. D347-D352, Jan 04 2017.
[15] Z. Xiang, M. Courtot, R. R. Brinkman, A. Ruttenberg, and Y. He, "OntoFox: web-based support for ontology reuse," BMC Res Notes, vol. 3:175, pp. 1-12, 2010.
[16] M. A. Musen, "The Protégé project: A look back and a look forward," AI Matters, Association of Computing Machinery Specific Interest Group in Artificial Intelligence, vol. 1, DOI: 10.1145/2557001.25757003, 2015.
[17] Hermit OWL reasoner. Available: http://hermit-reasoner.com/
[18] R. Arp, B. Smith, and A. D. Spear, Building Ontologies with Basic Formal Ontology. MIT Press: Cambridge, MA, USA, 2015.
[19] ISO/IEC 21838-2:2021. Information technology — Top-level ontologies (TLO) — Part 2: Basic Formal Ontology (BFO). Available: https://www.iso.org/standard/74572.html
[20] Y. He, H. Yu, E. Ong, Y. Wang, Y. Liu, A. Huffman, et al., "CIDO, a community-based ontology for coronavirus disease knowledge and data integration, sharing, and analysis," Sci Data, vol. 7, p. 181, Jun 12 2020.
[21] Y. He, S. Sarntivijai, Y. Lin, Z. Xiang, A. Guo, S. Zhang, et al., "OAE: The Ontology of Adverse Events," J Biomed Semantics, vol. 5, p. 29, 2014.
[22] J. Sun, Q. Zheng, V. Madhira, A. L. Olex, A. J. Anzalone, A. Vinson, et al., "Association Between Immune Dysfunction and COVID-19 Breakthrough Infection After SARS-CoV-2 Vaccination in the US," JAMA Intern Med, vol. 182, pp. 153-162, Feb 1 2022.
[23] PCORnet Common Data Model (CDM) Specification, Version 6.0. Available: https://pcornet.org/wp-content/uploads/2022/01/PCORnet-Common-Data-Model-v60-2020_10_221.pdf
[24] Y. Yu, N. Zong, A. Wen, S. Liu, D. J. Stone, D. Knaack, et al., "Developing an ETL tool for converting the PCORnet CDM into the OMOP CDM to facilitate the COVID-19 data integration," J Biomed Inform, vol. 127, p. 104002, Mar 2022.