=Paper= {{Paper |id=Vol-2931/ICBO_2019_paper_34 |storemode=property |title=OPMI: the Ontology of Precision Medicine and Investigation and its Support for Clinical Data and Metadata Representation and Analysis |pdfUrl=https://ceur-ws.org/Vol-2931/ICBO_2019_paper_34.pdf |volume=Vol-2931 |authors=Yongqun He,Edison Ong,Jennifer Schaub,Frederick Dowd,John F. O'Toole,Anastasios Siapos,Christian Reich,Sarah Seager,Ling Wan,Hong Yu,Jie Zheng,Christian Stoeckert,Xiaolin Yang,Sheng Yang,Becky Steck,Christopher Park,Laura Barisoni,Matthias Kretzler,Jonathan Himmelfarb,Ravi Iyengar1,Sean D. Mooney |dblpUrl=https://dblp.org/rec/conf/icbo/HeOSDOSRS000SYY19 }} ==OPMI: the Ontology of Precision Medicine and Investigation and its Support for Clinical Data and Metadata Representation and Analysis == https://ceur-ws.org/Vol-2931/ICBO_2019_paper_34.pdf
        OPMI: the Ontology of Precision Medicine and
       Investigation and its support for clinical data and
             metadata representation and analysis
 Yongqun He1, Edison Ong1, Jennifer Schaub1, Frederick Dowd2, John F. O’Toole3, Anastasios Siapos4, Christian
Reich4, Sarah Seager4, Ling Wan1,5, Hong Yu6, Jie Zheng7, Christian Stoeckert7, Xiaolin Yang8, Sheng Yang8, Becky
   Steck1, Christopher Park2, Laura Barisoni9, Matthias Kretzler1, Jonathan Himmelfarb2, Ravi Iyengar10, Sean D.
                          Mooney2, for the Kidney Precision Medicine Project Consortium
 1 University of Michigan Medical School, Ann Arbor, MI 48109, USA; 2 University of Washington, Seattle, WA 98195, USA; 3 Cleveland Clinic, Cleveland, OH, USA; 4 IQVIA, Brighton, UK; 5 OntoWise, Nanjing, Jiangsu, China; 6 Department of Pulmonary and Critical Care Medicine, Guizhou Provincial People’s Hospital, Guiyang, Guizhou 550002, China; 7 University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; 8 Institute of Basic Medical Science, Chinese Academy of Medical Sciences, Beijing, China; 9 Duke University, NC, USA; 10 Icahn School of Medicine at Mount Sinai, NY 10029, USA.

    Abstract: Consortia conducting precision medicine studies face a major challenge of integrating big data including clinical and biomedical data. In this study, we report our development of the community-driven Ontology of Precision Medicine and Investigation (OPMI) and its applications in clinical data and metadata representation. OPMI has been used to represent the common data model (CDM) of the Observational Health Data Sciences and Informatics (or OHDSI) program. It has also been used to represent approximately 30 case report forms defined by the NIH-supported Kidney Precision Medicine Project (KPMP). Our case studies showed that OPMI is able to semantically and precisely represent the OHDSI CDM, various KPMP clinical forms, and their associated data and metadata. Such ontological representations support standardized data representation, sharing, recording, integration, and advanced analysis.

    Keywords: Common data model; kidney; case report form.

                       I. INTRODUCTION
    Precision medicine is an emerging medical approach for disease prevention and treatment that takes into account individual variability in genes, environment, and lifestyle. An example of a study in precision medicine is the Kidney Precision Medicine Project (KPMP; http://kpmp.org), a large NIH/NIDDK-funded consortium project with the aim of understanding and treating human kidney diseases. With a focus on human studies, the KPMP project covers clinical recruitment, clinical study, biopsy, pathology, molecular data and Omics data analysis. With the large amounts of data generated, we will identify how to systematically collect, represent, integrate, analyze, and make use of the big data with the help of ontologies.

    Precision medicine faces the challenge of big data. Big data refers to data characterized by the 5 Vs: volume, veracity, velocity, variety, and value [1], which require specific technology and analytical methods for their transformation into meaningful knowledge.

    In precision medicine, basic research results, such as Omics study results, are affected by many clinical factors. Clinical factors (e.g., biological sex and age) are generally poorly recorded and studied. Before investigators can deeply and accurately analyze precision medicine data, the clinical data need to be captured and modeled systematically and robustly. For example, to achieve this goal, KPMP investigators created over 30 case report forms (CRFs), which are being used across many institutes. These clinical forms cover over 2000 questions and hundreds of clinical factors. Each of the clinical factors may affect the phenotype or omics analysis outcomes.

    To support clinical data collection and analysis, there exist many common data models (CDMs), including the CDMs of the OHDSI Observational Medical Outcomes Partnership (OMOP) [2], the Patient-Centered Outcomes Research Network (PCORnet) [3], the healthcare management organizations’ research network (HMORN) virtual data warehouse [4], and the Study Data Tabulation Model (SDTM) of the Clinical Data Interchange Standards Consortium (CDISC) [5]. One issue is that these CDMs are often not interoperable at the semantic level. We hypothesized that an ontological representation of the OMOP CDM (and other CDMs) would better semantically represent and standardize the data formatted based on the CDM and support better data analysis. As an example, the OMOP CDM is a relational database model that supports interoperable analyses of disparate observational databases [2]. The OMOP CDM has been widely adopted to support the accommodation of observational medical data from disparate data sources. However, the terms in the OMOP CDM lack strong semantic relations. For example, the “Condition” in the OMOP CDM could be a natural disease or an adverse event following a surgery or drug administration. The usage of ontology makes it possible to better differentiate the two types of conditions and support better data representation and analysis.

    A formal biomedical ontology is a human-comprehensible and computer-interpretable set of terms and relations that represent entities in a specific domain and their relationships to each other. The Open Biological/Biomedical Ontology (OBO) community [6] has developed over 150 biomedical ontologies




Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
that support alignment with each other. Most current OBO ontologies cover basic research domains. Our proposed Ontology of Precision Medicine and Investigation (OPMI), which has recently been included in the OBO library ontology list, focuses on the representation of entities and relations in the domain of precision medicine and its investigation.

    In this study, we report the OPMI development strategy and results with a focus on its support for clinical studies. OPMI has been used to ontologize OMOP CDM and CRFs and to further support the KPMP precision medicine study.

                        II. METHODS

A. OPMI ontology development methods
    OPMI is developed as a community-based open source biomedical ontology by following the OBO Foundry ontology development principles such as openness and collaboration [6]. The eXtensive Ontology Development (XOD) strategy [7] was applied for the ontology development. Specifically, OPMI reuses many terms and relations from existing ontologies, including the Ontology of General Medical Science (OGMS) [8], Ontology for Biomedical Investigations (OBI) [9, 10], Human Phenotype Ontology (HP) [11], Uberon multi-species anatomy ontology (UBERON) [12], Ontology of Adverse Events (OAE) [13], and Informed Consent Ontology (ICO) [14]. The tool Ontofox (http://ontofox.hegroup.org) [15] was used to extract and reuse terms from these existing ontologies.

    OPMI-specific terms were assigned new identifiers using the prefix “OPMI_” followed by auto-generated seven-digit numbers. The Protégé OWL editor (http://protege.stanford.edu/) was used for the OPMI visualization and manual term editing. The HermiT reasoner (http://hermit-reasoner.com/) inside the Protégé OWL editor was applied for ontology consistency checking and inferencing.

B. OPMI representation and analysis of OHDSI CDM
    We used OPMI to ontologically model the OMOP CDM used in the OHDSI program. As the underlying data standard of OHDSI, the OMOP CDM allows for interoperable analyses of disparate observational databases. To demonstrate the usage of OPMI to study OMOP CDM, we used data extracted from the IQVIA PharMetrics Plus database (https://www.iqvia.com), which had already been converted into the OMOP CDM format. In this study, kidney disease data were extracted from the database based on the OPMI data model. Supported by this model, we developed an algorithm to identify the concept IDs that covered the correct conditions of interest. Once identified, we extracted the patients who initially did not have acute kidney injury (AKI), then were treated with heart surgery, and were diagnosed with AKI within 14 days after the surgery. The SNOMED concept term "Acute renal failure syndrome" and 62 other associated concept terms were used. The conditions within 30 days before the heart surgery were extracted and mapped to the Human Phenotype Ontology (HP) [11]. To better analyze the subset of related HP terms, the tool Ontofox [15] was used to extract these HP terms and their associated upper level terms, and the Protégé OWL editor tool [16] was used to display the structure.

C. OPMI representation of KPMP case report forms and their contents using CRF-Question-Entity model
    The KPMP CRFs were extracted, modeled, and analyzed using the OPMI platform. The CRFs and the contents defined in CRFs were represented using a newly designed “CRF-Question-Entity” model. Based on this model, OPMI generates specific ontology terms to represent various CRFs in the ontology. Each CRF usually includes many textual questions, e.g., “Are you aged less than 18 years old?” OPMI represents such textual questions and also identifies the entities in reality (e.g., age and its value of less than 18 years old) that are referred to by the questions. Many of these entity terms are imported from existing ontologies. All the labels, synonyms and definitions of the CRFs and CRF-related terms were carefully evaluated by the KPMP community and domain experts in the field.

D. OPMI format, source code, license, and deposition
    Formatted in the W3C standard Web Ontology Language (OWL2), the OPMI source code is open and freely available at GitHub: https://github.com/OPMI/opmi. OPMI uses the open Creative Commons CC-BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).
    The OPMI ontology is deposited in several well recognized ontology repositories, including the Ontobee [17] website: http://www.ontobee.org/ontology/OPMI, NCBO BioPortal website: https://bioportal.bioontology.org/ontologies/OPMI, as well as OLS: https://www.ebi.ac.uk/ols/ontologies/opmi.

E. OPMI query and analysis
    To demonstrate the usage of OPMI, we developed SPARQL scripts to query OPMI using Ontobee’s SPARQL query endpoint (http://www.ontobee.org/sparql), and DL (description logic) queries using the Protégé OWL editor.

                        III. RESULTS

A. OPMI design and top level structure
    Fig. 1 illustrates selected key OPMI terms and top level hierarchical structure. OPMI adopts the Basic Formal Ontology (BFO) [18, 19] as its upper level ontology. The BFO:continuant branch represents entities (e.g., ‘material entity’) which endure through time. The BFO:occurrent branch represents entities that are temporal (e.g., temporal region) and which occur over time (e.g., ‘process’). As the default upper level ontology in the OBO ontology community, BFO has been adopted by many ontologies. The alignment with the BFO structure makes OPMI interoperable with a large number of other ontologies, including those OBO ontologies.
    OPMI imports and semantically links terms from many existing biomedical ontologies, such as OGMS [8], OBI [9, 10], HP [11], UBERON [12], and ICO [14] (Fig. 1). There are many reasons to choose these ontologies. First, the importing and reusing of these reliable precision medicine-related ontology terms avoids the reinvention of the wheel and also
provides a good starting point for OPMI development. Second, all these ontologies are reliable OBO library ontologies (http://obofoundry.org/) and can all be aligned with the same upper level ontology BFO. Such alignments allow the interoperability among these reused terms with the same semantic relations. The semantic alignments and interoperability also make it efficient to build up OPMI. It is noted that, because the OBO Foundry aims to form a non-redundant set of ontologies to cover different biological and biomedical areas, the terms imported from the other ontologies are designed to be unique and do not overlap with terms from other OBO library ontologies.
    OPMI also includes many OPMI-specific precision medicine-related terms such as ‘precision medicine investigation’. The newly added OPMI terms also include those CRF terms, textual questions used in CRFs, the question-related entities in reality, clinical metadata terms related to precision medicine studies, and terms related to clinical and health-related CDMs.
    The most important reason why OPMI focuses on ontologization of CRFs and CRF questions is that CRF development is critical to clinical studies and a lot of questions are frequently reused. But it is time consuming to build up new CRFs from the ground up, and it is difficult to compare the questions and results from different CRFs. To make CRF design and usage more efficient, it would be important to standardize CRF components. Textual questions are the key components of CRFs. The same questions (e.g., age and biological sex questions) may appear in different CRFs. Therefore, the standardization of the questions becomes essential to the whole CRF standardization process. Meanwhile, the same textual question may be expressed in different ways. From a scientific research standpoint, we should focus more on what each question is really about in reality, i.e., the entities or metadata types behind each question, rather than how a question is expressed. Accordingly, we developed the CRF-Question-Entity strategy with the aim to standardize CRF questions, entities (or metadata types) and answers under these questions, leading to the standardization and efficient analysis of different CRFs. While the KPMP project will learn a lot from the ontologization of KPMP CRFs and their contents, many of the benefits will go to future CRF studies that do not have to go over the CRF generation from scratch as KPMP has done.

[Figure 1: tree diagram rooted at entity (BFO), with the branches continuant (BFO) and occurrent (BFO). Terms shown: information content entity (IAO), document (IAO), textual entity (IAO), case report form (OPMI), informed consent form (ICO), textual question (OPMI), screening and patient tracking CRF (OPMI), eligibility assessment form (OPMI), age question (OPMI), realizable entity (BFO), disposition (BFO), disease (OGMS), kidney disease (MONDO), quality (BFO), phenotypic abnormality (HP), fever (HP), material entity (BFO), specimen (OBI), temporal region (BFO), process (BFO), planned process (OBI), specimen collection process (OBI), collecting specimen from organism (OBI), biopsy (OPMI), assay (OBI), medical intervention (OAE), medical procedure (OAE), surgery (OPMI), bodily process (OGMS), pathological bodily process (OGMS), adverse event (OAE), kidney adverse event (OAE).]

  Fig. 1. OPMI top level hierarchical structure and representative terms. All terms are aligned together under the BFO structure.

B. OPMI ontology design pattern to support OMOP CDM
    Figure 2 represents the overall layout of the OPMI ontological representation of OMOP CDM. The OPMI ontology unambiguously represents the clinical terms defined in OMOP CDM and the relations among these terms. Established on a realism-based view [20], OPMI treats ‘visit occurrence’ as a process and ‘visit detail’ as an information content entity. Many other processes, including ‘procedure occurrence’ and ‘device exposure’ but not necessarily ‘drug exposure’, are ‘part of’ the visit occurrence process. OPMI separates ‘condition occurrence’ into different scenarios including disease course, symptom phenotype, and drug/surgery adverse events. To support specimen-focused precision medicine investigations, OPMI also includes additional terms such as ‘specimen collection’ and ‘specimen assay’, which are linked to OMOP elements (e.g. specimen and measurement).

    The OPMI model clearly shows the differences between natural disease courses and adverse events. A disease course is a pathological bodily process that produces specific signs or symptoms at a specific location of a patient. An adverse event is a pathological bodily process that occurs after a medical intervention such as a drug exposure or a surgery procedure [13]. According to the FDA standards, it is not necessary to have a causal relation between the medical intervention and the adverse event outcome. However, the main aim of adverse event study is to identify potential causal relations. To identify whether a surgery adverse event occurs, we need to ensure that an abnormal medical condition occurs after a surgery instead of before it. Such a strategy was then used in our kidney adverse event use case study as described below.

    In an OMOP CDM-based database schema, foreign keys are used to link different tables. In OPMI, the relations among these entities are more clearly represented using well-defined relations that are commonly used among OBO ontologies. New relations are also generated (Figure 2).

    Note that such a class level ontology design pattern (Figure 2) can also be used to represent instance level data, which can be stored in an RDF triple store and subject to SPARQL queries and analyses.
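To make the contrast between foreign keys and explicit relations concrete, here is a minimal Python sketch. It assumes a few invented rows shaped like the OMOP `procedure_occurrence` table and turns each foreign-key link into a subject-relation-object triple using the ‘part of’ relation named in this section. The `ex:` prefix, the row values, and the helper names `rows_to_triples` and `match` are hypothetical illustrations, not OPMI identifiers or part of any OPMI tooling.

```python
# Hypothetical rows shaped like the OMOP procedure_occurrence table.
# The values are invented for illustration only.
ROWS = [
    {"procedure_occurrence_id": 501, "visit_occurrence_id": 90},
    {"procedure_occurrence_id": 502, "visit_occurrence_id": 90},
    {"procedure_occurrence_id": 503, "visit_occurrence_id": 91},
]


def rows_to_triples(rows):
    """Turn each foreign-key link into an explicit (subject, relation, object) triple."""
    triples = []
    for row in rows:
        procedure = f"ex:procedure_occurrence/{row['procedure_occurrence_id']}"
        visit = f"ex:visit_occurrence/{row['visit_occurrence_id']}"
        # The foreign key only says "these rows are linked";
        # the triple names the relation and its direction.
        triples.append((procedure, "ex:part_of", visit))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


TRIPLES = rows_to_triples(ROWS)

# Which procedure occurrences are part of visit 90?
PROCEDURES_IN_VISIT_90 = [
    s for s, _, _ in match(TRIPLES, p="ex:part_of", o="ex:visit_occurrence/90")
]
```

In a real setting the triples would be loaded into an RDF triple store and queried with SPARQL; the in-memory `match` helper only mimics a single triple pattern.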




Fig. 2. OPMI ontological representation of OMOP CDM elements and their relations. The terms highlighted in red boxes are table names in OMOP CDM that are also represented as OPMI ontology terms. The terms in black boxes represent ontology terms in OPMI that add value to the OMOP CDM. The lines with text in the middle represent the relations (i.e., object properties) between different terms. The OMOP model uses relational database primary keys and foreign keys to make links between different CDM elements. In contrast, OPMI uses ontology relations to more explicitly represent the linkages between terms. Such ontology relations have the advantage of logically defining the relation meanings and directions with input and output. ICE: information content entity.

C. OHDSI kidney data analysis using OPMI strategy
    An important precision medicine application is related to the precision medical intervention to reduce the occurrence of various adverse events, especially severe adverse events. It is possible that the occurrences of these adverse events are due to various genetic, health or environmental conditions. If we can identify important conditions that correlate with the adverse events, we can then design rational tests to reduce the threats of adverse events and support public health.

    In this study, we hypothesize that ontology-based semantic modeling, together with the usage of ontologies, including the Human Phenotype Ontology (HP) and the Ontology of Adverse Events (OAE), could help clarify different conditions in OMOP CDM-compatible databases, and better understand the contributions of different factors to the presence of specific adverse events. In the area of kidney adverse event research, surgery and drug-induced kidney injury is a common, well recognized, and important public health problem. For example, heart surgeries are often followed by AKI adverse events [21]. The incidence of AKI among patients after cardiac surgery can be up to 30-50% [21, 22]. Many risk factors are associated with AKI after cardiac surgery, for example, advanced age, female gender, hypertension, hyperlipemia, diabetes, surgery types, etc. [21, 23]. Therefore, the study of this highly prevalent and prognostically important AKI adverse event after heart surgery is much needed for public health. The knowledge learned from this study may also later help the study of drug-associated kidney adverse events.

    Based on the Fig. 2 OPMI modeling, we developed an algorithm to differentiate surgery adverse events from natural diseases. Specifically, our algorithm identifies and treats the heart surgery time as the index time. To be qualified as an AKI adverse event following heart surgery, the patient should not have AKI during a period before the index time, and should have AKI during a short period after the index time. We then used ontologies to represent the phenotypes, heart surgeries, and adverse events systematically, with the aim to identify insightful patterns.
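The index-time rule for qualifying an AKI adverse event can be sketched in a few lines of Python. This is an illustrative sketch, not the algorithm used in the study: the 14-day follow-up window comes from the Methods, but the length of the look-back window is not stated for this check, so `lookback_days` is left as a required parameter. The function name, the exclusion of same-day AKI records from both windows, and the example dates are assumptions.

```python
from datetime import date, timedelta


def is_surgery_associated_aki(surgery_date, aki_dates, lookback_days, followup_days=14):
    """Apply the index-time rule to one patient.

    surgery_date  -- heart surgery date, used as the index time
    aki_dates     -- dates of the patient's AKI condition records
    lookback_days -- length of the AKI-free window before the index time
                     (assumption: the source does not give this value)
    followup_days -- window after the index time; 14 days per the Methods
    """
    window_start = surgery_date - timedelta(days=lookback_days)
    window_end = surgery_date + timedelta(days=followup_days)

    # Assumption: a record on the surgery day itself counts as neither
    # "before" nor "after" the index time.
    had_aki_before = any(window_start <= d < surgery_date for d in aki_dates)
    has_aki_after = any(surgery_date < d <= window_end for d in aki_dates)

    return (not had_aki_before) and has_aki_after


# Invented example dates for illustration.
INDEX = date(2019, 3, 10)
NEW_AKI = is_surgery_associated_aki(INDEX, [date(2019, 3, 15)], lookback_days=30)
PRIOR_AKI = is_surgery_associated_aki(
    INDEX, [date(2019, 3, 1), date(2019, 3, 15)], lookback_days=30
)
LATE_AKI = is_surgery_associated_aki(INDEX, [date(2019, 4, 10)], lookback_days=30)
```

Here `NEW_AKI` qualifies, `PRIOR_AKI` does not because AKI was already present before the surgery, and `LATE_AKI` does not because the AKI record falls outside the follow-up window.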
                                                                   D. OPMI representation of KPMP case report forms
                                                                       Figure 4 shows a representative list of KPMP
                                                                   CRFs. In total, KPMP includes approximately 30 CRFs used in
                                                                   different stages of clinical study. These stages cover the
                                                                   screening and patient tracking, enrollment, pre-biopsy, biopsy,
                                                                   post-biopsy, and pathology test, etc. Overall, these CRFs cover
                                                                   over 2,800 questions. Each question is about some specific
                                                                   entities related to the clinical study. Note that for the US Food
                                                                   and Drug Administration (FDA), a case report form often
                                                                   means the cases of adverse events. However, in clinical trials
                                                                   or clinical studies, a case report form means any form that
                                                                   is related to a clinical study, which has broader coverage.
                                                                       Our OPMI strategy of representing these CRFs can be
                                                                   summarized as “CRF-Question-Entity” (Fig. 5A). In this
                                                                   strategy, each CRF includes one or more questions, and each
                                                                   question is about some entity or entities, and different entities
                                                                   are connected using semantic relations in ontology. The
                                                                   questions in the strategy are essential since they link CRF and
                                                                   entities. While CRFs for a particular project may be very
                                                                   specific and cannot be reused, the questions are often similar
                                                                   among projects and can be reused. It is also noted that the same
Fig. 3. Identification of conditions associated with the heart     question may be expressed in different words, for example, the
surgery and the following AKI adverse event using OHDSI            questions “Are you aged less than 18 years old” and “Are you
data and OPMI ontology modeling. The condition terms are           aged 18 years or younger?” are essentially the same question.
represented using HPO.                                             Once we model the entity or content behind the question, we
                                                                   do not need to worry about different expression formats.
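The “CRF-Question-Entity” strategy can be illustrated with a small Python data model. This is a sketch under stated assumptions, not OPMI's OWL representation: the class names, the `shared_entities` helper, and the second form (`other_form`) are hypothetical, while the two question wordings and the eligibility assessment form are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Entity:
    """An entity in reality that a question is about (e.g. age)."""
    label: str


@dataclass
class Question:
    """A textual question together with the entities it is about."""
    text: str
    about: Tuple[Entity, ...]


@dataclass
class CRF:
    """A case report form: a named collection of questions."""
    name: str
    questions: List[Question] = field(default_factory=list)


def shared_entities(crf_a: CRF, crf_b: CRF) -> Set[Entity]:
    """Entities asked about in both forms, regardless of question wording."""
    entities_a = {e for q in crf_a.questions for e in q.about}
    entities_b = {e for q in crf_b.questions for e in q.about}
    return entities_a & entities_b


AGE = Entity("age")

eligibility_form = CRF(
    "eligibility assessment form",
    [Question("Are you aged less than 18 years old?", (AGE,))],
)

# Hypothetical second form that words the age question differently.
other_form = CRF(
    "other study form (hypothetical)",
    [Question("Are you aged 18 years or younger?", (AGE,))],
)

COMMON = shared_entities(eligibility_form, other_form)
```

Because both wordings point at the same `Entity`, the two forms can be compared on what they ask about rather than on how the question text is phrased.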
    We used OHDSI data provided by the IQVIA PharMetrics
                                                                       Fig. 5B provides an example on how the “CRF-Question-
Plus database. Our OHDSI cohort study identified a total of
                                                                   Entity” can be used. This example illustrates the KPMP
15,548 patients that fulfilled our selection criteria. These
                                                                   eligibility assessment form, which includes different questions.
patients were categorized as having a heart surgery-associated
                                                                   We defined two specific types of questions: exclusion question
AKI adverse event.
                                                                   and inclusion question. An exclusion question is a question
    Our demographic study of the cohort data showed that           where a positive answer to the question would lead to the
among the 15,548 identified patients, 72% are male and             exclusion of the participant candidate from the specific clinical
28% are female patients. Patients older than 55 years              study. For example, if a person is aged 17 years, he or she will
accounted for 78.5% of the AKI adverse event cases.                answer Yes to a “Whether age less than 18 years” question.
The high incidence in the advanced age group is consistent with    Such questions are explicitly asked in the CRFs to meet IRB and
a previous report [21]. In contrast to previous reports of         legal requirements, and they are frequently asked in other
higher risk in female patients [21, 24], our study showed a        clinical studies besides KPMP. These questions are also often
much higher incidence (18:7) in male patients than in female       time-anchored in multiple CRFs at different stages of the
patients. The underlying reasons deserve further investigation.    studies. Even though these questions may not necessarily be
                                                                   important to the scientific interests, they are important in the
    The conditions during 30 days before the heart surgery         context of precision medicine studies to enroll participants. In
associated with AKI adverse events are represented and             this example, the age can be calculated from the date of birth
classified using the Human Phenotype Ontology (HPO) (Fig.          recorded in the database or retrieved from other questions.
3). The largest group of phenotype conditions is the               However, the definition of the concepts in the ontology enables
abnormality of the cardiovascular system. Many of these            us to raise questions from different angles and with additional
conditions might be reasons for heart surgery, and some of         information. Since this is an exclusion question that defines an
them might be more likely to be causally linked with the AKI       exclusion criterion, the person’s positive answer will indicate
occurrence. For example, our study found that 8,433 patients       that he or she is ineligible for the KPMP study. This specific
(54%) had coronary arteriosclerosis. The identified patients       question is about the entity term ‘age less than 18 years’, and
were also associated with other phenotypes including kidney        then we can logically define this term as a subclass of ‘age’,
disease, pain, dyspnea, hyperlipidemia, and Type 2 diabetes        which is a physical quality by itself. Furthermore, we can
(Fig. 3). Our cohort includes 7,546 patients with hypertensive     define this specific age quality with a specific measured value:
disorder, 4,684 with kidney disease, 5,121 with hyperlipidemia,
4,561 with Type 2 diabetes, and 4,523 with dyspnea.                                ‘quality measured by year’ max 17
    Specific surgery types were also identified. For example,         Such a logical definition can be parsed and understood by
our cohort study found that many patients underwent different      computers. Therefore, our strategy successfully defines the
types of valvular procedures, which were previously found to       question, what the question is about, and how the question is
be associated with a higher risk [23].                             used in the eligibility assessment CRF.
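
The exclusion-question logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OPMI's OWL encoding: the class names `AgeRestriction` and `ExclusionQuestion` and the helper `is_excluded` are hypothetical, and the restriction ‘quality measured by year’ max 17 is read here as "age in whole years is at most 17".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeRestriction:
    """Entity behind the question: an 'age' quality with a maximum value in years."""
    label: str
    max_years: int

    def holds_for(self, age_years: int) -> bool:
        # 'quality measured by year' max 17  ->  age in whole years <= 17
        return age_years <= self.max_years


@dataclass(frozen=True)
class ExclusionQuestion:
    """A CRF question whose positive answer excludes the candidate (hypothetical model)."""
    text: str
    is_about: AgeRestriction

    def answer(self, age_years: int) -> bool:
        # The answer is derived from the entity the question is about,
        # not from the wording of the question.
        return self.is_about.holds_for(age_years)


def is_excluded(question: ExclusionQuestion, age_years: int) -> bool:
    """A positive answer to an exclusion question means the candidate is ineligible."""
    return question.answer(age_years)


under_18 = AgeRestriction(label="age less than 18 years", max_years=17)
question = ExclusionQuestion(text="Whether age less than 18 years", is_about=under_18)

print(is_excluded(question, 17))  # True: a 17-year-old answers Yes and is excluded
print(is_excluded(question, 18))  # False: an 18-year-old answers No
```

Because the eligibility decision depends only on the entity, rewording the question text would not change the outcome in this sketch.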
    One use of such a strategy is the interoperability of CRFs and CRF questions. For example, a new
European precision medicine project could quickly assemble its CRFs from the questions defined in OPMI.
Its specific questions can differ, and the ways its questions are expressed can differ. However, as long
as the questions can be mapped to the OPMI question IDs, OPMI will be able to provide the underlying
entities and their relations. This can help support CRF and clinical data standardization, sharing, and
cross-institute data analysis.
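
The mapping idea in the paragraph above can be illustrated with a small Python sketch. All identifiers here are assumptions made for illustration: `Q_AGE_UNDER_18` is a hypothetical placeholder rather than a real OPMI question ID, and the two project dictionaries and their question wordings are invented.

```python
# Shared layer: question ID -> underlying entity (hypothetical placeholder ID).
QUESTION_ENTITY = {
    "Q_AGE_UNDER_18": "age less than 18 years",
}

# Each project keeps its own wording but maps it to the shared question ID.
PROJECT_A = {"Are you aged less than 18 years old?": "Q_AGE_UNDER_18"}
PROJECT_B = {"Is the candidate younger than 18 years?": "Q_AGE_UNDER_18"}


def entity_for(project_mapping, question_text):
    """Resolve a project-specific question to the shared underlying entity."""
    question_id = project_mapping.get(question_text)
    if question_id is None:
        return None  # the question has not been mapped yet
    return QUESTION_ENTITY.get(question_id)


entity_a = entity_for(PROJECT_A, "Are you aged less than 18 years old?")
entity_b = entity_for(PROJECT_B, "Is the candidate younger than 18 years?")

# Differently worded questions resolve to the same entity.
print(entity_a == entity_b)  # True
```

In this sketch, data from the two projects become comparable as soon as both wordings point to the same ID; an unmapped question simply resolves to `None`.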




                                          Fig. 4. CRFs developed in the KPMP project.




Fig. 5. OPMI design pattern of representing CRFs. (A) General “CRF-Question-Entity” design pattern; (B) Example of eligibility
 assessment CRF. This form includes many questions such as “Whether age less than 18 years old”, which is about the age quality
                that has a measured value of less than 18 years old. All these are logically represented in OPMI.

E. OPMI representation of clinical metadata
    The follow-up Omics and pathology studies in KPMP will identify many genes that are up- or
down-regulated under different conditions. The clinical variables form a large pool of conditions that
can influence the analysis of these follow-up data. The conditions are essentially reflected by the
“entity” part of the “CRF-Question-Entity” strategy described above. In addition, these clinical
variables can be represented as metadata, i.e., “data about data”, which summarize the clinical variable
types to be studied in KPMP and other studies. These ontologically represented clinical variables will
later be useful in systematic Omics data analysis by providing possible explanations for some
statistically identified Omics results.
    Table 1 provides a set of representative metadata types derived from the entities referred to by the
KPMP CRF questions, which are defined in the ~30 KPMP CRFs.

 TABLE I.        REPRESENTATIVE KPMP CLINICAL METADATA TYPES
    Metadata types                    Metadata examples
    Quality and measurements          Measurement protocol details (e.g., arm and stand/sit/lay position in
                                      blood pressure measurement)
    Health conditions                 Comorbidities, pregnancy, adverse events
    Medical interventions             drug medication, prior surgeries, transplantation, dialysis, biopsy
    Substances exposed to             Additional prescription drugs, recreational drugs, cigarettes, and alcohol
    Socioeconomic factors             employment status, race, ethnicity, education level, income, health
                                      insurance status
    Environmental                     county, state, country, hospital, primary care location
    Biosample                         collection time, processing time, transportation tracking, biopsy location,
                                      storage location, storage time
    Patient reported outcomes         patient experience, quality of life, pain, anxiety, complication,
                                      Likert scale
    Patient study status tracking     pass or fail screening, whether informed consent signed, is active in
                                      study? is live?
    Electronic health record (EHR)    source of EHR, record availability, processing/harmonization method

F. OPMI statistics
    The latest release of OPMI contains a total of 2,958 terms, including 2,701 classes, 124 object
properties, 2 data properties, and 118 annotation properties. Among these terms, 340 terms have the OPMI_
namespace, and the other terms were imported from over 30 existing ontologies. The full ontology
statistics of OPMI can be found on the Ontobee ontology statistics website at:
http://www.ontobee.org/ontostat/OPMI.

G. OPMI-based data query and analysis
    The OPMI ontology is being developed with many applications in mind. Here we demonstrate the use of
OPMI in querying two important questions.
    The first example uses SPARQL to query which questions are exclusion questions in the KPMP
eligibility assessment form and what entities these questions are about (Fig. 6A). With only a few lines,
this query identified those exclusion questions and the entities to which the questions refer.
    Based on the exclusion question setting and participant candidates’ answers, we can identify which
candidates are ineligible. We generated a use case demonstration to illustrate such an application
(Fig. 6B). In our sandbox study, there are 3 candidates who provided different answers to a list of
eligibility questions. These candidates and their answers can be represented as instances of OPMI
classes. A DL (description logic) query can be developed to query the data. Let us assume the 3 clinical
study participant candidates came from 2 different recruitment sites (e.g., UT Southwestern and Yale
University). Since we used the same ontology and terminology, we can query across different sites. As
shown in Fig. 6B, we could identify that two of the participants answered yes to the ‘Whether age less
than 18 years’ question. Based on the exclusion rule, any candidate who answers yes to this question is
not qualified to participate in the KPMP project.
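
The two queries described above can be mimicked with a toy in-memory example in Python. This is a sketch under stated assumptions and does not reproduce the SPARQL or DL queries of Fig. 6: the triples, the relation names (`has part`, `is a`, `is about`), the consent question, the site names, and the candidate answers are all invented for illustration.

```python
# Toy knowledge base of (subject, predicate, object) triples; all labels are hypothetical.
FORM = "eligibility assessment form"
TRIPLES = {
    (FORM, "has part", "Q: age less than 18 years"),
    (FORM, "has part", "Q: able to give consent"),
    ("Q: age less than 18 years", "is a", "exclusion question"),
    ("Q: able to give consent", "is a", "inclusion question"),
    ("Q: age less than 18 years", "is about", "age less than 18 years"),
    ("Q: able to give consent", "is about", "capacity to consent"),
}

# Invented answers: (candidate, recruitment site, question, answer).
ANSWERS = [
    ("candidate 1", "site A", "Q: age less than 18 years", "yes"),
    ("candidate 2", "site A", "Q: age less than 18 years", "no"),
    ("candidate 3", "site B", "Q: age less than 18 years", "yes"),
]


def exclusion_questions(triples, form):
    """Return sorted (question, entity) pairs for the exclusion questions on a form."""
    results = []
    for subject, predicate, question in triples:
        if subject != form or predicate != "has part":
            continue
        if (question, "is a", "exclusion question") not in triples:
            continue
        for s, p, entity in triples:
            if s == question and p == "is about":
                results.append((question, entity))
    return sorted(results)


def ineligible_candidates(answers, triples, form):
    """Candidates, from any site, who answered yes to at least one exclusion question."""
    excluding = {question for question, _entity in exclusion_questions(triples, form)}
    return sorted({cand for cand, _site, question, ans in answers
                   if question in excluding and ans == "yes"})


print(exclusion_questions(TRIPLES, FORM))
print(ineligible_candidates(ANSWERS, TRIPLES, FORM))  # ['candidate 1', 'candidate 3']
```

Because both sites record answers against the same question labels, one function call covers candidates from either site, which is the point of using a shared ontology and terminology.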
Fig. 6. OPMI query examples. (A) SPARQL query of exclusion questions and the entities that the questions are about as defined
    in KPMP eligibility assessment form. Ontobee SPARQL (http://www.ontobee.org/sparql) was used for this query. (B) DL
  (description logic) query of which candidates are ineligible based on an exclusion question. This sandbox example includes three patients,
      each of whom provided some answers to CRF questions. The DL query was conducted using the Protégé OWL editor.


                       IV. DISCUSSION
    To support challenging precision medicine studies, we can greatly benefit from ontologies to
represent, standardize, share, and integrate various clinical and biomedical big data. Similar to other
big data domains, the big data in precision medicine have features of high volume, high variety, high
velocity, and high veracity. As an open source ontology in the domain of precision medicine, OPMI is a
timely community-based effort to systematically represent various precision medicine-related entities and
how these entities are related. Our use case studies demonstrate that OPMI, together with other existing
OBO ontologies, is able to support OHDSI CDM and OHDSI data analysis, as well as KPMP CRF and associated
content representation and analysis, leading to valuable clinical and scientific insights.
    The ontology representation of different common data models (CDMs) may provide a feasible way to
semantically integrate the different CDM systems. CDMs, like the OMOP CDM, provide a robust platform to
standardize data from different databases and clinical studies. The OMOP relational database CDM is easy
for humans to interpret. The relations between elements in different tables can be linked and queried
through relational database primary keys and foreign keys. However, the CDM relations are indirect
(through foreign keys instead of direct linkages), and the representation is difficult for machines to
interpret without human intervention. Meanwhile, the CDM is overall a high-level design and may not
handle the deep granularity that an ontology can. Our OPMI modeling (Fig. 2) shows that the CDM elements
and their relations can be logically represented using ontology. The OHDSI-based kidney adverse event
data analysis (Fig. 3) further demonstrated that the ontological modeling and application can support
practical research studies. In this case, OMOP Condition cannot differentiate adverse events that are a
consequence of a medical intervention (e.g., surgery or drug treatment) from the symptoms or abnormal
phenotypes of an ongoing disease. However, based on the adverse event definition, we can design a method
to perform such a differentiation at the ontology level, i.e., an adverse event is an abnormal condition
that occurs after a medical intervention. In our study, we only considered AKI adverse events that did
not occur within 30 days before heart surgery but did occur after the heart surgery. The representation
and analysis of the conditions before heart surgery using the Human Phenotype Ontology (HPO) (Fig. 3)
allowed us to have a clear idea of how the patients’ information (e.g., age and symptoms) and heart
surgery are associated with the AKI adverse event. However, even though the ontology can help better
represent and interpret the adverse event definitions, the ontology by itself does not directly handle
large volumes of big data well, which is a strength of OMOP. Therefore, our ontology representation can
be used as a complementary method to support OMOP data analysis. Furthermore, the logic generated by
ontology can be used to support CDM description and harmonize the integration of data from different CDM
systems. While the current study focuses on OHDSI OMOP CDM, we plan to study other CDMs and test how OPMI
can be used to harmonize different CDMs at a semantic ontology representation level.
    The follow-up KPMP study provides a more systematic and integrated use case to study kidney disease
precision medicine. Over 20 universities and institutes will participate in the KPMP, recruiting
individuals with various forms of acute kidney injury (AKI) and chronic kidney disease (CKD). Each
participant will be biopsied, and the kidney tissue samples will be divided for assays including RNA-seq,
proteomics, metabolomics, pathology, and histological studies. To better analyze the basic assay data, we
will need to fully capture the clinical data types and all instance data from each patient given
different conditions. With this information, we can then analyze whether an Omics finding is related to a
clinical variable (such as age or biological sex).
    Our CRF-Question-Entity strategy is a new way to capture the CRF contents and their associated
entities. CRFs are commonly used, and it is time consuming to generate them. Once generated and used for
a specific study, they are archived but not reused for similar studies. To support efficient CRF
generation and reuse, our ontology-based strategy systematically records CRFs, their associated
questions, and the question-referred entities. Although specific CRFs may not be reused, the questions
often reappear in different forms. Although many questions are expressed differently, they are designed
to capture the same concepts. Through modeling and representation of the underlying concepts, we are able
to semantically define questions, which then further helps define the CRFs. We believe that such a
strategy can help automate the process of digitalizing and processing CRFs, supporting clinical research.
    To the best of our knowledge, such a CRF-Question-Entity strategy is first proposed and implemented
in this study. This strategy was inspired by our own previous ontology representation and analysis of 12
informed consent forms from pharmacies and local governments [25]. The representation of those forms
allowed us to compare different questions in different forms. However, that study did not emphasize the
representation of the concepts in reality that the questions are designed to determine. Abidi et al.
presented a framework to semi-automatically extract medical entities from referral letters, classify the
unstructured referral letters according to their semantic types based on SNOMED-CT [26], and transcribe
CRFs based on the extracted information from the referral letters. Such a strategy does not result in
ontology representation of CRFs. However, the semi-automatic extraction of medical entities from text is
a valuable way to improve the speed of ontology development. Lin et al. presented a multi-technique
approach to facilitate electronic
CRF (eCRF) design by adopting common data element standards and ontology-based knowledgebase [27]. It is
likely that our OPMI CRF-Question-Entity representation will indeed support eCRF development. OPMI will
be able to provide a pool of questions for eCRF designers to choose and use. Once a set of questions is
defined, our system will allow users to automatically identify the concepts in reality behind these
questions and the semantic relations between the entities.
    We presented the OPMI and its CRF-Question-Entity strategy at the Seventh Clinical and Translational
Science Ontology Workshop, Orlando, Florida, on February 20, 2019. This workshop had the theme of
“Ontology for Precision Medicine: From Genomes to Public Health”. Our presentation and another one-hour
discussion on this topic on the next day were well received. While there were efforts to record CRF
questions and answers, our strategy of ontological modeling of the underlying semantic meanings of CRF
questions was generally considered novel. Constructive and insightful comments were also received, for
example, how to properly represent the reality of ‘unknown answer to question’. These comments are being
carefully considered in our OPMI development.
    OPMI is a community effort. Its initial development came from the development of the Ontology of
Respiratory Disease Investigation (ORDI), which ontologically represented many clinical terms frequently
used in respiratory disease studies [28]. Respiratory diseases are among the leading causes of death
worldwide. It remains a challenge to standardize, integrate, and analyze high volume and heterogeneous
respiratory disease investigation data for deep mechanism understanding and rational treatment design.
One study surveyed hundreds of residents from the urban and suburban communities associated with various
variables and different respiratory diseases [28].
    Another use case is the application of OPMI to support the National Physique and Health Database in
China (http://cnphd.bmicc.cn/chs/en/), which was initiated in 2001, and is being maintained by the
Biologic Medicine Information Center of China (BMICC, http://www.bmicc.org), Institute of Basic Medical
Sciences (IBMS), Chinese Academy of Medical Sciences, Beijing, China. The database contains the physical
and health data of over 160,000 Chinese individuals from different locations, genders, and ages. Over 200
parameters, related to morphology, function and physical capacity of an individual body, were identified
and used in the database. In addition, more data will be collected and added to this database in the
future. OPMI is being applied to standardize and analyze the data in the database and make the data more
accessible and useful to others.
    The ClinEpiDB project, launched in February 2018, is an open-access online resource enabling
investigators to maximize the utility and reach of their clinical epidemiology data and to make optimal
use of the data released by others (https://clinepidb.org). With a focus on diarrheal and infectious
disease epidemiology, ClinEpiDB datasets involve many clinical epidemiology-related questions from CRFs.
Representing these requires many clinical terms that overlap with the coverage of OPMI, which represents
one area of potential collaboration. It will also be interesting to compare the commonalities and
differences between the CRFs in ClinEpiDB and KPMP, and provide template CRFs for other clinical
projects.
    In addition, OPMI is also being explored to support many other community-based precision medicine
projects, including the representation of clinical trial terms as seen in ClinicalTrials.gov, a database
of clinical studies conducted around the world (https://clinicaltrials.gov/). The ClinicalTrials.gov
database defines many clinical trial related terms (https://prsinfo.clinicaltrials.gov/definitions.html).
We are currently collaborating with researchers at the US National Institutes of Health (NIH) to model
and represent these terms in OPMI.

                          ACKNOWLEDGMENT
    This KPMP project is supported by the NIH National Institute of Diabetes and Digestive and Kidney
Diseases (NIDDK) U2C Project #: 1U2CDK114886-01. We appreciate Dr. Deborah Hoshizaki’s discussion and
support during the ontology development and applications. We also appreciate the discussion and feedback
provided by the attendees (including Matthias Brochhausen, Peter Elkin, William Hogan, etc.) in the
Seventh Clinical and Translational Science Ontology Workshop.

                   ADDRESS FOR CORRESPONDENCE
    Please contact YH from the University of Michigan, Ann Arbor, MI, USA. Email address:
yongqunh@med.umich.edu. Telephone: +1-734-615-8231.

                            REFERENCES
[1]    R. Higdon, W. Haynes, L. Stanberry, E. Stewart, G. Yandl, C. Howard, et al., "Unraveling the
       Complexities of Life Sciences Data," Big Data, vol. 1, pp. 42-50, Mar 2013.
[2]    G. Hripcsak, J. D. Duke, N. H. Shah, C. G. Reich, V. Huser, M. J. Schuemie, et al., "Observational
       Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers," Stud
       Health Technol Inform, vol. 216, pp. 574-8, 2015.
[3]    F. S. Collins, K. L. Hudson, J. P. Briggs, and M. S. Lauer, "PCORnet: turning a dream into
       reality," J Am Med Inform Assoc, vol. 21, pp. 576-7, Jul-Aug 2014.
[4]    T. R. Ross, D. Ng, J. S. Brown, R. Pardee, M. C. Hornbrook, G. Hart, et al., "The HMO Research
       Network Virtual Data Warehouse: A Public Data Model to Support Collaboration," EGEMS (Wash DC),
       vol. 2, p. 1049, 2014.
[5]    T. Souza, R. Kush, and J. P. Evans, "Global clinical data interchange standards are here!," Drug
       Discov Today, vol. 12, pp. 174-81, Feb 2007.
[6]    B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, et al., "The OBO Foundry:
       coordinated evolution of ontologies to support biomedical data integration," Nat Biotechnol,
       vol. 25, pp. 1251-5, Nov 2007.
[7]    Y. He, Z. Xiang, J. Zheng, Y. Lin, J. A. Overton, and E. Ong, "The eXtensible ontology development
       (XOD) principles and tool implementation to support ontology interoperability," J Biomed
       Semantics, vol. 9, p. 3, Jan 12 2018.
[8]    The Ontology for General Medical Science (OGMS). Available: https://github.com/OGMS/ogms
[9]    A. Bandrowski, R. Brinkman, M. Brochhausen, M. H. Brush, B. Bug, M. C. Chibucos, et al., "The
       Ontology for Biomedical Investigations," PLoS One, vol. 11, p. e0154556, 2016.
[10]   R. R. Brinkman, M. Courtot, D. Derom, J. M. Fostel, Y. He, P. Lord, et al., "Modeling biomedical
       experimental processes with OBI," J Biomed Semantics, vol. 1 Suppl 1, p. S7, 2010.
[11]   T. Groza, S. Kohler, D. Moldenhauer, N. Vasilevsky, G. Baynam, T. Zemojtel, et al., "The Human
       Phenotype Ontology: Semantic Unification of Common and Rare Disease," Am J Hum Genet, vol. 97,
       pp. 111-24, Jul 2 2015.
[12]   C. J. Mungall, C. Torniai, G. V. Gkoutos, S. E. Lewis, and M. A. Haendel, "Uberon, an integrative
       multi-species anatomy ontology," Genome Biol, vol. 13, p. R5, 2012.
[13]   Y. He, S. Sarntivijai, Y. Lin, Z. Xiang, A. Guo, S. Zhang, et al., "OAE: The Ontology of Adverse
       Events," J Biomed Semantics, vol. 5, p. 29, 2014.
[14]   Y. Lin, M. R. Harris, F. J. Manion, E. Eisenhauer, B. Zhao, W. Shi, et al., "Development of a
       BFO-based Informed Consent Ontology (ICO)," in The 5th International Conference on Biomedical
       Ontologies (ICBO), Houston, Texas, USA, October 8-9, 2014.
[15]   Z. Xiang, M. Courtot, R. R. Brinkman, A. Ruttenberg, and Y. He, "OntoFox: web-based support for
       ontology reuse," BMC Res Notes, vol. 3:175, pp. 1-12, 2010.
[16]   D. L. Rubin, N. F. Noy, and M. A. Musen, "Protege: a tool for managing and using terminology in
       radiology applications," J Digit Imaging, vol. 20 Suppl 1, pp. 34-46, Nov 2007.
[17]   E. Ong, Z. Xiang, B. Zhao, Y. Liu, Y. Lin, J. Zheng, et al., "Ontobee: A linked ontology data
       server to support ontology term dereferencing, linkage, query and integration," Nucleic Acids
       Res, vol. 45, pp. D347-D352, Jan 04 2017.
[18]   P. Grenon and B. Smith, "SNAP and SPAN: Towards Dynamic Spatial Ontology," Spatial Cognition and
       Computation, vol. 4, pp. 69-103, 2004.
[19]   R. Arp, B. Smith, and A. D. Spear, Building Ontologies Using Basic Formal Ontology. MIT Press:
       Cambridge, MA, USA, 2015.
[20]   W. Ceusters and J. Blaisure, "A realism-based view on counts in OMOP's common data model," 2017,
       pp. 1-8. DOI: 10.3233/978-1-61499-761-0-55.
[21]   J. B. O'Neal, A. D. Shaw, and F. T. Billings IV, "Acute kidney injury following cardiac surgery:
       current understanding and future directions," Crit Care, vol. 20, p. 187, Jul 4 2016.
[22]   M. G. Lagny, F. Jouret, J. N. Koch, F. Blaffart, A. F. Donneau, A. Albert, et al., "Incidence and
       outcomes of acute kidney injury after cardiac surgery using either criteria of the RIFLE
       classification," BMC Nephrol, vol. 16, p. 76, May 30 2015.
[23]   M. H. Rosner and M. D. Okusa, "Acute kidney injury associated with cardiac surgery," Clin J Am
       Soc Nephrol, vol. 1, pp. 19-32, Jan 2006.
[24]   K. A. Ramos and C. B. Dias, "Acute Kidney Injury after Cardiac Surgery in Patients Without
       Chronic Kidney Disease," Braz J Cardiovasc Surg, vol. 33, pp. 454-461, Sep-Oct 2018.
[25]   Y. Lin, J. Zheng, and Y. He, "VICO: Ontology-based representation and integrative analysis of
       vaccination informed consent forms," J Biomed Semantics, vol. 7, p. 20, 2016.
[26]   S. H. Brown, P. L. Elkin, B. A. Bauer, D. Wahner-Roedler, C. S. Husser, Z. Temesgen, et al.,
       "SNOMED CT: utility for a general medical evaluation template," AMIA Annu Symp Proc, pp. 101-5,
       2006.
[27]   C. H. Lin, N. Y. Wu, and D. M. Liou, "A multi-technique approach to bridge electronic case report
       form design and data standard adoption," J Biomed Inform, vol. 53, pp. 49-57, Feb 2015.
[28]   H. Yu, J. Zheng, H. Wang, E. Ong, X. Ye, Z. Zhang, et al., "ORDI: An integrative community-driven
       ontology to support standardized representation and data analysis for respiratory disease
       investigations," in The 11th International Biocuration Conference (BioCuration-2018), Shanghai,
       China, April 8-11, 2018.