=Paper=
{{Paper
|id=Vol-2931/ICBO_2019_paper_34
|storemode=property
|title=OPMI: the Ontology of Precision Medicine and Investigation and its Support for Clinical Data and Metadata Representation and Analysis
|pdfUrl=https://ceur-ws.org/Vol-2931/ICBO_2019_paper_34.pdf
|volume=Vol-2931
|authors=Yongqun He,Edison Ong,Jennifer Schaub,Frederick Dowd,John F. O'Toole,Anastasios Siapos,Christian Reich,Sarah Seager,Ling Wan,Hong Yu,Jie Zheng,Christian Stoeckert,Xiaolin Yang,Sheng Yang,Becky Steck,Christopher Park,Laura Barisoni,Matthias Kretzler,Jonathan Himmelfarb,Ravi Iyengar,Sean D. Mooney
|dblpUrl=https://dblp.org/rec/conf/icbo/HeOSDOSRS000SYY19
}}
==OPMI: the Ontology of Precision Medicine and Investigation and its Support for Clinical Data and Metadata Representation and Analysis ==
Yongqun He1, Edison Ong1, Jennifer Schaub1, Frederick Dowd2, John F. O’Toole3, Anastasios Siapos4, Christian
Reich4, Sarah Seager4, Ling Wan1,5, Hong Yu6, Jie Zheng7, Christian Stoeckert7, Xiaolin Yang8, Sheng Yang8, Becky
Steck1, Christopher Park2, Laura Barisoni9, Matthias Kretzler1, Jonathan Himmelfarb2, Ravi Iyengar10, Sean D.
Mooney2, for the Kidney Precision Medicine Project Consortium
1 University of Michigan Medical School, Ann Arbor, MI 48109, USA; 2 University of Washington, Seattle, WA 98195, USA; 3 Cleveland Clinic, Cleveland, OH, USA; 4 IQVIA, Brighton, UK; 5 OntoWise, Nanjing, Jiangsu, China; 6 Department of Pulmonary and Critical Care Medicine, Guizhou Provincial People’s Hospital, Guiyang, Guizhou 550002, China; 7 University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; 8 Institute of Basic Medical Science, Chinese Academy of Medical Sciences, Beijing, China; 9 Duke University, NC, USA; 10 Icahn School of Medicine at Mount Sinai, NY 10029, USA.
Abstract—Consortia conducting precision medicine studies face a major challenge of integrating big data, including clinical and biomedical data. In this study, we report our development of the community-driven Ontology of Precision Medicine and Investigation (OPMI) and its applications in clinical data and metadata representation. OPMI has been used to represent the common data model (CDM) of the Observational Health Data Sciences and Informatics (OHDSI) program. It has also been used to represent approximately 30 case report forms defined by the NIH-supported Kidney Precision Medicine Project (KPMP). Our case studies showed that OPMI is able to semantically and precisely represent the OHDSI CDM, various KPMP clinical forms, and their associated data and metadata. Such ontological representations support standardized data representation, sharing, recording, integration, and advanced analysis.

Keywords—Common data model; kidney; case report form.

I. INTRODUCTION

Precision medicine is an emerging medical approach for disease prevention and treatment that takes into account individual variability in genes, environment, and lifestyle. An example of a study in precision medicine is the Kidney Precision Medicine Project (KPMP; http://kpmp.org), a large NIH/NIDDK-funded consortium project with the aim of understanding and treating human kidney diseases. With a focus on human studies, the KPMP project covers clinical recruitment, clinical study, biopsy, pathology, molecular data, and Omics data analysis. With the large amounts of data generated, we will identify how to systematically collect, represent, integrate, analyze, and make use of the big data with the help of ontologies.

Precision medicine faces the challenge of big data. Big data are characterized by the 5 Vs: volume, veracity, velocity, variety, and value [1], and require specific technology and analytical methods for their transformation into meaningful knowledge.

In precision medicine, basic research results, such as Omics study results, are affected by many clinical factors. Clinical factors (e.g., biological sex and age) are generally poorly recorded and studied. Before investigators can deeply and accurately analyze precision medicine data, the clinical data need to be captured and modeled systematically and robustly. For example, to achieve this goal, KPMP investigators created over 30 case report forms (CRFs), which are being used across many institutes. These clinical forms cover over 2,000 questions and hundreds of clinical factors. Each of the clinical factors may affect the phenotype or Omics analysis outcomes.

To support clinical data collection and analysis, many common data models (CDMs) exist, including the CDMs of the OHDSI Observational Medical Outcomes Partnership (OMOP) [2], the Patient-Centered Outcomes Research Network (PCORnet) [3], the Health Maintenance Organization Research Network (HMORN) virtual data warehouse [4], and the Study Data Tabulation Model (SDTM) of the Clinical Data Interchange Standards Consortium (CDISC) [5]. One issue is that these CDMs are often not interoperable at the semantic level. We hypothesized that an ontological representation of the OMOP CDM (and other CDMs) would better semantically represent and standardize the data formatted based on the CDM and support better data analysis. As an example, the OMOP CDM is a relational database model that supports interoperable analyses of disparate observational databases [2]. The OMOP CDM has been widely adopted to accommodate observational medical data from disparate data sources. However, the terms in the OMOP CDM lack strong semantic relations. For example, a “Condition” in the OMOP CDM could be a natural disease or an adverse event following a surgery or drug administration. The usage of ontology makes it possible to better differentiate the two types of conditions and support better data representation and analysis.

A formal biomedical ontology is a human-comprehensible and computer-interpretable set of terms and relations that represent entities in a specific domain and their relationships to each other. The Open Biological/Biomedical Ontology (OBO) community [6] has developed over 150 biomedical ontologies
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
that support alignment with each other. Most current OBO ontologies cover basic research domains. Our proposed Ontology of Precision Medicine and Investigation (OPMI), which focuses on the representation of entities and relations in the domain of precision medicine and its investigation, has recently been included in the OBO library ontology list.

In this study, we report the OPMI development strategy and results, with a focus on its support for clinical studies. OPMI has been used to ontologize the OMOP CDM and CRFs and to further support the KPMP precision medicine study.

II. METHODS

A. OPMI ontology development methods

OPMI is developed as a community-based open source biomedical ontology by following the OBO Foundry ontology development principles, such as openness and collaboration [6]. The eXtensive Ontology Development (XOD) strategy [7] was applied for the ontology development. Specifically, OPMI reuses many terms and relations from existing ontologies, including the Ontology of General Medical Science (OGMS) [8], Ontology for Biomedical Investigations (OBI) [9, 10], Human Phenotype Ontology (HP) [11], Uberon multi-species anatomy ontology (UBERON) [12], Ontology of Adverse Events (OAE) [13], and Informed Consent Ontology (ICO) [14]. The tool Ontofox (http://ontofox.hegroup.org) [15] was used to extract and reuse terms from these existing ontologies.

OPMI-specific terms were assigned new identifiers using the prefix “OPMI_” followed by auto-generated seven-digit numbers. The Protégé OWL editor (http://protege.stanford.edu/) was used for OPMI visualization and manual term editing. The HermiT reasoner (http://hermit-reasoner.com/) inside the Protégé OWL editor was applied for ontology consistency checking and inferencing.

B. OPMI representation and analysis of OHDSI CDM

We used OPMI to ontologically model the OMOP CDM used in the OHDSI program. As the underlying data standard of OHDSI, the OMOP CDM allows for interoperable analyses of disparate observational databases. To demonstrate the usage of OPMI to study the OMOP CDM, we used data extracted from the IQVIA PharMetrics Plus database (https://www.iqvia.com), which had already been converted into the OMOP CDM format. In this study, kidney disease data were extracted from the database based on the OPMI data model. Supported by this model, we developed an algorithm to identify the concept IDs that covered the conditions of interest. Once these were identified, we extracted the patients who initially did not have acute kidney injury (AKI), were then treated with heart surgery, and were diagnosed with AKI within 14 days after the surgery. The SNOMED concept term “Acute renal failure syndrome” and 62 other associated concept terms were used. The conditions within 30 days before the heart surgery were extracted and mapped to the Human Phenotype Ontology (HP) [11]. To better analyze the subset of related HP terms, the tool Ontofox [15] was used to extract these HP terms and their associated upper level terms, and the Protégé OWL editor tool [16] was used to display the structure.

C. OPMI representation of KPMP case report forms and their contents using CRF-Question-Entity model

The KPMP CRFs were extracted, modeled, and analyzed using the OPMI platform. The CRFs and the contents defined in CRFs were represented using a newly designed “CRF-Question-Entity” model. Based on this model, OPMI generates specific ontology terms to represent various CRFs in the ontology. Each CRF usually includes many textual questions, e.g., “Are you aged less than 18 years old?” OPMI represents such textual questions and also identifies the entities in reality (e.g., age and its value of less than 18 years old) that are referred to by the questions. Many of these entity terms are imported from existing ontologies. All the labels, synonyms, and definitions of the CRFs and CRF-related terms were carefully evaluated by the KPMP community and domain experts in the field.

D. OPMI format, source code, license, and deposition

Formatted in the W3C standard Web Ontology Language (OWL2), the OPMI source code is open and freely available at GitHub: https://github.com/OPMI/opmi. OPMI uses the open Creative Commons CC-BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).

The OPMI ontology is deposited in several well-recognized ontology repositories, including the Ontobee [17] website: http://www.ontobee.org/ontology/OPMI, the NCBO BioPortal website: https://bioportal.bioontology.org/ontologies/OPMI, and OLS: https://www.ebi.ac.uk/ols/ontologies/opmi.

E. OPMI query and analysis

To demonstrate the usage of OPMI, we developed SPARQL scripts to query OPMI using Ontobee’s SPARQL query endpoint (http://www.ontobee.org/sparql), and DL (description logic) queries using the Protégé OWL editor.

III. RESULTS

A. OPMI design and top level structure

Fig. 1 illustrates selected key OPMI terms and the top level hierarchical structure. OPMI adopts the Basic Formal Ontology (BFO) [18, 19] as its upper level ontology. The BFO:continuant branch represents entities (e.g., ‘material entity’) that endure through time. The BFO:occurrent branch represents entities that are temporal (e.g., ‘temporal region’) or that occur over time (e.g., ‘process’). As the default upper level ontology in the OBO ontology community, BFO has been adopted by many ontologies. The alignment with the BFO structure makes OPMI interoperable with a large number of other ontologies, including the OBO ontologies.

OPMI imports and semantically links terms from many existing biomedical ontologies, such as OGMS [8], OBI [9, 10], HP [11], UBERON [12], and ICO [14] (Fig. 1). There are many reasons to choose these ontologies. First, the importing and reusing of these reliable precision medicine-related ontology terms avoids the reinvention of the wheel and also
provides a good starting point for OPMI development. Second, all these ontologies are reliable OBO library ontologies (http://obofoundry.org/) and can all be aligned with the same upper level ontology, BFO. Such alignments allow interoperability among these reused terms with the same semantic relations. The semantic alignments and interoperability also make it efficient to build up OPMI. Because the OBO Foundry aims to form a non-redundant set of ontologies to cover different biological and biomedical areas, the terms imported from these ontologies are designed to be unique and do not overlap with terms from other OBO library ontologies.

OPMI also includes many OPMI-specific precision medicine-related terms such as ‘precision medicine investigation’. The newly added OPMI terms also include CRF terms, textual questions used in CRFs, the question-related entities in reality, clinical metadata terms related to precision medicine studies, and terms related to clinical and health-related CDMs.

The most important reason why OPMI focuses on the ontologization of CRFs and CRF questions is that CRF development is critical to clinical studies and many questions are frequently reused. However, it is time consuming to build new CRFs from the ground up, and it is difficult to compare the questions and results from different CRFs. To make CRF design and usage more efficient, it is important to standardize CRF components. Textual questions are the key components of CRFs. The same questions (e.g., age and biological sex questions) may appear in different CRFs. Therefore, the standardization of the questions becomes essential to the whole CRF standardization process. Meanwhile, the same textual question may be expressed in different ways. From a scientific research standpoint, we should focus more on what each question is really about in reality, i.e., the entities or metadata types behind each question, rather than on how a question is expressed. Accordingly, we developed the CRF-Question-Entity strategy with the aim of standardizing CRF questions, the entities (or metadata types) behind them, and the answers to these questions, leading to the standardization and efficient analysis of different CRFs. While the KPMP project will learn a lot from the ontologization of KPMP CRFs and their contents, many of the benefits will go to future CRF studies that will not have to go through CRF generation from scratch as KPMP has done.
Fig. 1. OPMI top level hierarchical structure and representative terms. All terms are aligned together under the BFO structure. [Figure not reproduced. Terms shown: entity, continuant, occurrent, realizable entity, disposition, quality, material entity, process, temporal region (BFO); information content entity, document, textual entity (IAO); informed consent form (ICO); case report form, textual question, screening and patient tracking CRF, eligibility assessment form, age question, biopsy, surgery (OPMI); phenotypic abnormality, fever (HP); disease, bodily process, pathological bodily process (OGMS); kidney disease (MONDO); specimen, planned process, specimen collection process, collecting specimen from organism, assay (OBI); medical intervention, medical procedure, adverse event, kidney adverse event (OAE).]
B. OPMI ontology design pattern to support OMOP CDM

Figure 2 represents the overall layout of the OPMI ontological representation of the OMOP CDM. The OPMI ontology unambiguously represents the clinical terms defined in the OMOP CDM and the relations among these terms. Established on a realism-based view [20], OPMI treats ‘visit occurrence’ as a process and ‘visit detail’ as an information content entity. Many other processes, including ‘procedure occurrence’ and ‘device exposure’ but not necessarily ‘drug exposure’, are ‘part of’ the visit occurrence process. OPMI separates ‘condition occurrence’ into different scenarios, including disease course, symptom phenotype, and drug/surgery adverse events. To support specimen-focused precision medicine investigations, OPMI also includes additional terms such as ‘specimen collection’ and ‘specimen assay’, which are linked to OMOP elements (e.g., specimen and measurement).

The OPMI model clearly shows the differences between natural disease courses and adverse events. A disease course is a pathological bodily process that produces specific signs or symptoms at a specific location of a patient. An adverse event is a pathological bodily process that occurs after a medical intervention such as a drug exposure or a surgical procedure [13]. According to the FDA standards, it is not necessary to have a causal relation between the medical intervention and the adverse event outcome. However, the main aim of adverse event study is to identify potential causal relations. To identify whether a surgery adverse event occurs, we need to ensure that an abnormal medical condition occurs after a surgery instead of before it. Such a strategy was then used in our kidney adverse event use case study as described below.

In an OMOP CDM-based database schema, foreign keys are used to link different tables. In OPMI, the relations among these entities are more clearly represented using well-defined relations that are commonly used among OBO ontologies. New relations are also generated (Figure 2).

Note that such a class level ontology design pattern (Figure 2) can also be used to represent instance level data, which can be stored in an RDF triple store and subjected to SPARQL queries and analyses.
Fig. 2. OPMI ontological representation of OMOP CDM elements and their relations. The terms highlighted in red boxes are table names in the OMOP CDM that are also represented as OPMI ontology terms. The terms in black boxes represent ontology terms in OPMI that add value to the OMOP CDM. The lines with text in the middle represent the relations (i.e., object properties) between different terms. The OMOP model uses relational database primary keys and foreign keys to make links between different CDM elements. In contrast, OPMI uses ontology relations to more explicitly represent the linkages between terms. Such ontology relations have the advantage of logically defining the relation meanings and directions with input and output. ICE: information content entity.
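The contrast between foreign-key links and explicit ontology relations can be illustrated with a minimal Python sketch. It is not the OPMI implementation: the rows are invented, the row layout only loosely follows OMOP table conventions, and the relation name 'has participant' and the `rows_to_triples` and `match` helpers are hypothetical names introduced for illustration. Only the 'part of' relation between a procedure occurrence and a visit occurrence comes from the text above.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Invented rows shaped like an OMOP procedure_occurrence table (assumption).
procedure_rows: List[Dict[str, int]] = [
    {"procedure_occurrence_id": 501, "person_id": 1, "visit_occurrence_id": 10},
    {"procedure_occurrence_id": 502, "person_id": 2, "visit_occurrence_id": 11},
]


def rows_to_triples(rows: List[Dict[str, int]]) -> List[Triple]:
    """Turn each implicit foreign-key link into an explicit, named relation."""
    triples: List[Triple] = []
    for row in rows:
        procedure = f"procedure_occurrence/{row['procedure_occurrence_id']}"
        visit = f"visit_occurrence/{row['visit_occurrence_id']}"
        person = f"person/{row['person_id']}"
        triples.append((procedure, "rdf:type", "procedure occurrence"))
        # 'part of' is the relation named in the text for procedure and visit.
        triples.append((procedure, "part of", visit))
        # 'has participant' is an illustrative relation name (assumption).
        triples.append((procedure, "has participant", person))
    return triples


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


triples = rows_to_triples(procedure_rows)
parts_of_visit_10 = match(triples, predicate="part of", obj="visit_occurrence/10")
print(parts_of_visit_10)
```

In a real setting the triples would be loaded into an RDF triple store and queried with SPARQL; the in-memory pattern match here only mimics a single triple pattern.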
C. OHDSI kidney data analysis using OPMI strategy

An important precision medicine application is related to precise medical intervention to reduce the occurrence of various adverse events, especially severe adverse events. It is possible that the occurrences of these adverse events are due to various genetic, health, or environmental conditions. If we can identify important conditions that are correlated with the adverse events, we can then design rational tests to reduce the threats of adverse events and support public health.

In this study, we hypothesize that ontology-based semantic modeling, together with the usage of ontologies, including the Human Phenotype Ontology (HP) and the Ontology of Adverse Events (OAE), could help clarify different conditions in an OMOP CDM-compatible database and better explain the contributions of different factors to the presence of specific adverse events. In the area of kidney adverse event research, surgery- and drug-induced kidney injury is a common, well-recognized, and important public health problem. For example, heart surgeries are often followed by AKI adverse events [21]. The incidence of AKI among patients after cardiac surgery can be up to 30-50% [21, 22]. Many risk factors are associated with AKI after cardiac surgery, for example, advanced age, female gender, hypertension, hyperlipidemia, diabetes, and surgery types [21, 23]. Therefore, the study of this highly prevalent and prognostically important AKI adverse event after heart surgery is much needed for public health. The knowledge learned from this study may also later help the study of drug-associated kidney adverse events.

Based on the OPMI modeling in Fig. 2, we developed an algorithm to differentiate surgery adverse events from natural diseases. Specifically, our algorithm identifies and treats the heart surgery time as the index time. To qualify as an AKI adverse event following heart surgery, the patient should not have AKI during a period before the index time and should have AKI during a short period after the index time. We then used ontologies to represent the phenotypes, heart surgeries, and adverse events systematically, with the aim of identifying insightful patterns.
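The index-time rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `is_surgery_associated_aki` is hypothetical, the 14-day follow-up window comes from the Methods section, the 30-day look-back window is an assumption (the text only says "a period before the index time"), and a diagnosis recorded on the surgery date itself is counted as neither before nor after.

```python
from datetime import date, timedelta
from typing import Iterable


def is_surgery_associated_aki(
    surgery_date: date,
    aki_dates: Iterable[date],
    lookback_days: int = 30,   # assumed look-back window (not stated in the text)
    followup_days: int = 14,   # follow-up window taken from the Methods section
) -> bool:
    """Apply the index-time rule: no AKI before surgery, AKI shortly after."""
    aki_dates = list(aki_dates)
    window_start = surgery_date - timedelta(days=lookback_days)
    window_end = surgery_date + timedelta(days=followup_days)
    # Any AKI diagnosis inside the look-back window disqualifies the patient.
    aki_before = any(window_start <= d < surgery_date for d in aki_dates)
    # At least one AKI diagnosis must fall strictly after the index date
    # and no later than the end of the follow-up window.
    aki_after = any(surgery_date < d <= window_end for d in aki_dates)
    return (not aki_before) and aki_after


# Invented example: surgery on 10 Jan 2019, AKI diagnosed five days later.
index_date = date(2019, 1, 10)
print(is_surgery_associated_aki(index_date, [date(2019, 1, 15)]))
```

A patient with AKI recorded both five days before and five days after the index date would be rejected by this rule, since the condition then cannot be attributed to the period following surgery.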
Fig. 3. Identification of conditions associated with the heart surgery and the following AKI adverse event using OHDSI data and OPMI ontology modeling. The condition terms are represented using HPO.

We used OHDSI data provided by the IQVIA PharMetrics Plus database. Our OHDSI cohort study identified a total of 15,548 patients that fulfilled our selection criteria. These patients were categorized as having a heart surgery-associated AKI adverse event.

Our demographic study of the cohort data showed that among all the identified 15,548 patients, 72% are male and 28% are female. Patients older than 55 years accounted for 78.5% of the AKI adverse event cases. The high incidence in the advanced age group is consistent with the previous report [21]. Different from the previous reports of higher risk in female patients [21, 24], our study showed a much higher number of cases (18:7) in male patients than in female patients. The underlying reasons deserve further investigation.

The conditions during the 30 days before the heart surgery associated with AKI adverse events are represented and classified using the Human Phenotype Ontology (HPO) (Fig. 3). The largest group of phenotype conditions is the abnormality of the cardiovascular system. Many of these conditions might be reasons for heart surgery, and some of them might have a higher chance of being causally linked with the AKI occurrence. For example, our study found that 8,433 patients (54%) had coronary arteriosclerosis. The identified patients were also associated with other phenotypes, including kidney disease, pain, dyspnea, hyperlipidemia, and type 2 diabetes (Fig. 3). Our cohort includes 7,546 patients with hypertensive disorder, 4,684 with kidney disease, 5,121 with hyperlipidemia, 4,561 with type 2 diabetes, and 4,523 with dyspnea.

Specific surgery types were also identified. For example, our cohort study found that many patients underwent different types of valvular procedures, which were previously found to be associated with a higher risk [23].

D. OPMI representation of KPMP case report forms

Fig. 4 shows a representative list of KPMP CRFs. In total, KPMP includes approximately 30 CRFs used in different stages of clinical study. These stages cover screening and patient tracking, enrollment, pre-biopsy, biopsy, post-biopsy, and pathology tests. Overall, these CRFs cover over 2,800 questions. Each question is about some specific entities related to the clinical study. Note that for the US Food and Drug Administration (FDA), a case report form often refers to a report of adverse event cases. However, in clinical trials or clinical studies, a case report form means any form that is related to the clinical study, which is a broader coverage.

Our OPMI strategy of representing these CRFs can be summarized as “CRF-Question-Entity” (Fig. 5A). In this strategy, each CRF includes one or more questions, each question is about some entity or entities, and different entities are connected using semantic relations in the ontology. The questions in the strategy are essential since they link CRFs and entities. While CRFs for a particular project may be very specific and cannot be reused, the questions are often similar among projects and can be reused. It is also noted that the same question may be expressed in different words. For example, the questions “Are you aged less than 18 years old?” and “Are you aged 17 years or younger?” are essentially the same question. Once we model the entity or content behind the question, we do not need to worry about different expression formats.

Fig. 5B provides an example of how the “CRF-Question-Entity” strategy can be used. This example illustrates the KPMP eligibility assessment form, which includes different questions. We defined two specific types of questions: exclusion question and inclusion question. An exclusion question is a question where a positive answer would lead to the exclusion of the participant candidate from the specific clinical study. For example, if a person is aged 17 years, he or she will answer Yes to a “Whether age less than 18 years” question. Such questions are explicitly asked in the CRFs to meet IRB and legal requirements, and they are frequently asked in clinical studies other than KPMP. They are also often time-anchored in multiple CRFs at different stages of a study. Even though these questions may not necessarily be important to the scientific interests, they are important in the context of precision medicine studies for enrolling participants. In this example, the age can be calculated from the date of birth recorded in the database or retrieved from other questions. However, the definition of the concepts in the ontology enables us to raise questions from different angles and with additional information. Since this is an exclusion question that defines an exclusion criterion, the person’s positive answer will indicate that he or she is ineligible for the KPMP study. This specific question is about the entity term ‘age less than 18 years’, and we can logically define this term as a subclass of ‘age’, which is a physical quality by itself. Furthermore, we can define this specific age quality with a specific measured value:

‘quality measured by year’ max 17

Such a logical definition can be parsed and understood by computers. Therefore, our strategy successfully defines the question, what the question is about, and how the question is used in the eligibility assessment CRF.

One use of such a strategy is the interoperability of CRFs and CRF questions. For example, a new European precision medicine project may quickly assemble its CRFs using the questions defined in OPMI. Its specific questions can differ, and its ways of expressing the questions can differ. However, as long as the questions can be mapped to the OPMI question IDs, OPMI will be able to provide the underlying entities and their relations. This approach can help support CRF and clinical data standardization, sharing, and cross-institute data analysis.
Fig. 4. CRFs developed in the KPMP project.
Fig. 5. OPMI design pattern of representing CRFs. (A) General “CRF-Question-Entity” design pattern; (B) Example of eligibility assessment CRF. This form includes many questions such as “Whether age less than 18 years old”, which is about the age quality that has a measured value of less than 18 years old. All these are logically represented in OPMI.
E. OPMI representation of clinical metadata

The follow-up Omics and pathology studies in KPMP will identify many genes that are up- or down-regulated under different conditions. The clinical variables become a big pool of conditions that would influence the follow-up data analysis. The conditions are essentially reflected by the “entity” part laid out in the “CRF-Question-Entity” strategy as described above. In addition, these clinical variables can be represented as metadata, i.e., “data about data”, which sum up the clinical variable types to be studied in KPMP and other studies. These ontologically represented clinical variables will later be useful in systematic Omics data analysis by providing possible reasons for some statistically identified Omics data analysis results.
Table 1 provides a set of representative metadata types that are derived from the entities referred to by the KPMP CRF questions, which are defined in the ~30 KPMP CRFs.

{| class="wikitable"
|+ TABLE I. REPRESENTATIVE KPMP CLINICAL METADATA TYPES
! Metadata types !! Metadata examples
|-
| Quality and measurements || Measurement protocol details (e.g., arm and stand/sit/lay position in blood pressure measurement)
|-
| Health conditions || Comorbidities, pregnancy, adverse events
|-
| Medical interventions || Drug medication, prior surgeries, transplantation, dialysis, biopsy
|-
| Substances exposed to || Additional prescription drugs, recreational drugs, cigarettes, and alcohol
|-
| Socioeconomic factors || Employment status, race, ethnicity, education level, income, health insurance status
|-
| Environmental || County, state, country, hospital, primary care location
|-
| Biosample || Collection time, processing time, transportation tracking, biopsy location, storage location, storage time
|-
| Patient reported outcomes || Patient experience, quality of life, pain, anxiety, complication, Likert scale
|-
| Patient study status tracking || Pass or fail screening, whether informed consent signed, is active in study? is live?
|-
| Electronic health record (EHR) || Source of EHR, record availability, processing/harmonization method
|}

F. OPMI statistics

The latest release of OPMI contains a total of 2,958 terms, including 2,701 classes, 124 object properties, 2 data properties, and 118 annotation properties. Among these terms, 340 terms have the OPMI_ namespace, and the other terms were imported from over 30 existing ontologies. The full ontology statistics of OPMI can be found on the Ontobee ontology statistics website at: http://www.ontobee.org/ontostat/OPMI.

G. OPMI-based data query and analysis

The OPMI ontology is being developed with many applications in mind. Here we demonstrate the usage of the OPMI information to answer two important questions.

The first example uses SPARQL to query which questions are exclusion questions in the KPMP eligibility assessment form and which entities these questions are about (Fig. 6A). With only a few lines, this query easily identified those exclusion questions and the entities to which the questions refer.

Based on the exclusion question setting and participant candidates’ answers, we can identify which candidates are ineligible. We generated a use case demonstration to illustrate such an application (Fig. 6B). In our sandbox study, there are 3 candidates who provided different answers to a list of eligibility questions. These candidates and their provided answers can be represented as instances of OPMI classes. A DL (description logic) query can be developed to query the data. Let us assume the 3 clinical study participant candidates came from 2 different recruitment sites (e.g., UT Southwestern and Yale University). Since we used the same ontology and terminology, we can query across different sites.

As shown in Fig. 6B, we could identify that two of the participants answered yes to the ‘Whether age less than 18 years’ question. Based on the exclusion rule, such a candidate is not qualified for participating in the KPMP project.
Fig. 6. OPMI query examples. (A) SPARQL query of exclusion questions and the entities that the questions are about, as defined in the KPMP eligibility assessment form. Ontobee SPARQL (http://www.ontobee.org/sparql) was used for this query. (B) DL (description logic) query of who is ineligible based on an exclusion question. This sandbox example includes three patients, each of whom provided some answers to CRF questions. The DL query was conducted using the Protégé OWL editor.
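The logic of the sandbox eligibility query can be approximated in plain Python, purely as an illustration. This is not the DL query from Fig. 6B: the candidate records and their answers are invented, the function name `ineligible_candidates` is hypothetical, and the two site names are the examples mentioned in the text. The sketch assumes that answers from both sites are keyed by the same shared entity label, which is what makes the cross-site filter possible.

```python
from typing import Dict, List, Set

# Invented candidate records; answers are keyed by the shared entity label.
candidates: List[Dict] = [
    {"id": "candidate-1", "site": "UT Southwestern",
     "answers": {"age less than 18 years": "yes"}},
    {"id": "candidate-2", "site": "UT Southwestern",
     "answers": {"age less than 18 years": "no"}},
    {"id": "candidate-3", "site": "Yale University",
     "answers": {"age less than 18 years": "yes"}},
]

# Entities that exclusion questions are about (only one in this sketch).
exclusion_entities: Set[str] = {"age less than 18 years"}


def ineligible_candidates(records: List[Dict], exclusions: Set[str]) -> List[str]:
    """Return IDs of candidates who answered yes to any exclusion question."""
    return sorted(
        record["id"]
        for record in records
        if any(record["answers"].get(entity) == "yes" for entity in exclusions)
    )


ineligible = ineligible_candidates(candidates, exclusion_entities)
print(ineligible)
```

Because the filter looks only at the entity label and not at the recruitment site or the question wording, the same call works unchanged for records pooled from both sites.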
IV. DISCUSSION
To support challenging precision medicine studies, we can greatly benefit from ontologies to represent, standardize, share, and integrate various clinical and biomedical big data. Similar to other big data domains, the big data in precision medicine have features of high volume, high variety, high velocity, and high veracity. As an open source ontology in the domain of precision medicine, OPMI is a timely community-based effort to systematically represent various precision medicine-related entities and how these entities are related. Our use case studies demonstrate that OPMI, together with other existing OBO ontologies, is able to support OHDSI CDM and OHDSI data analysis, as well as KPMP CRF and associated content representation and analysis, leading to valuable clinical and scientific insights.
The ontology representation of different common data models (CDMs) may provide a feasible way to semantically integrate the different CDM systems. CDMs such as the OMOP CDM provide a robust platform to standardize data from different databases and clinical studies. The OMOP relational database CDM is easy for humans to interpret. The relations between elements in different tables can be linked and queried through relational database primary keys and foreign keys. However, the CDM relations are indirect (through foreign keys instead of direct linkages), and the representation is difficult for machines to interpret without human intervention. Meanwhile, the CDM is overall a high-level design and may not handle the deep granularity that an ontology can. Our OPMI modeling (Fig. 2) shows that the CDM elements and their relations can be logically represented using ontology. The OHDSI-based kidney adverse event data analysis (Fig. 3) further demonstrated that the ontological modeling and application can support practical research studies. In this case, OMOP Condition cannot differentiate adverse events that are a consequence of a medical intervention (e.g., surgery or drug treatment) from the symptoms or abnormal phenotypes of an ongoing disease. However, based on the adverse event definition, i.e., that an adverse event is an abnormal condition that occurs after a medical intervention, we can design a method to perform such a differentiation at the ontology level. In our study, we only considered AKI adverse events that did not occur within 30 days before heart surgery but did occur after the heart surgery. The representation and analysis of the conditions before heart surgery using the Human Phenotype Ontology (HPO) (Fig. 3) allowed us to have a clear idea of how the patients’ information (e.g., age and symptoms) and heart surgery are associated with the AKI adverse event. However, even though the ontology can help better represent and interpret the adverse event definitions, the ontology by itself does not directly handle large volumes of big data well, which is something OMOP is good at. Therefore, our ontology representation can be used as a complementary method to support OMOP data analysis. Furthermore, the logic generated by ontology can be used to support CDM description and harmonize the integration of data from different CDM systems. While the current study focuses on the OHDSI OMOP CDM, we plan to study other CDMs and test how OPMI can be used to harmonize different CDMs at a semantic ontology representation level.
The follow-up KPMP study provides a more systematic and integrated use case to study kidney disease precision medicine. Over 20 universities and institutes will participate in the KPMP, recruiting individuals with various forms of acute kidney injury (AKI) and chronic kidney disease (CKD). Each participant will be biopsied, and the kidney tissue samples will be divided for assays including RNA-seq, proteomics, metabolomics, pathology, and histological studies. To better analyze the basic assay data, we will need to fully capture the clinical data types and all instance data from each patient given different conditions. With this information, we can then analyze whether an Omics finding is related to a clinical variable (such as age or biological sex).
Our CRF-Question-Entity strategy is a new way to capture the CRF contents and their associated entities. CRFs are commonly used, but they are time consuming to generate. Once generated and used for a specific study, they are archived but not reused for similar studies. To support efficient CRF generation and reuse, our ontology-based strategy systematically records CRFs, their associated questions, and the question-referred entities. Although specific CRFs may not be reused, the questions often reappear in different forms. Although many questions are expressed differently, they are designed to capture the same concepts. Through modeling and representation of the underlying concepts, we are able to semantically define questions, which then further helps define the CRFs. We believe that such a strategy can help automate the process of digitalizing and processing CRFs, supporting clinical research.
To the best of our knowledge, such a CRF-Question-Entity strategy is first proposed and implemented in this study. This strategy was inspired by our own previous ontology representation and analysis of 12 informed consent forms from pharmacies and local governments [25]. The representation of those forms allowed us to compare different questions in different forms. However, that study did not emphasize the representation of the concepts in reality that the questions are designed to determine. Abidi et al. presented a framework to semi-automatically extract medical entities from referral letters, classify the unstructured referral letters according to their semantic types based on SNOMED-CT [26], and transcribe CRFs based on the information extracted from the referral letters. Such a strategy does not result in an ontology representation of CRFs. However, the semi-automatic extraction of medical entities from text is a valuable way to improve the speed of ontology development. Lin et al. presented a multi-technique approach to facilitate electronic
CRF (eCRF) design by adopting common data element standards and an ontology-based knowledgebase [27]. It is likely that our OPMI CRF-Question-Entity representation will support eCRF development. OPMI will be able to provide a pool of questions for eCRF designers to choose from and use. Once a set of questions is defined, our system will allow users to automatically identify the concepts in reality behind these questions and the semantic relations between the entities.
We presented OPMI and its CRF-Question-Entity strategy at the Seventh Clinical and Translational Science Ontology Workshop, Orlando, Florida, on February 20, 2019. This workshop had the theme of “Ontology for Precision Medicine: From Genomes to Public Health”. Our presentation and another one-hour discussion on this topic on the next day were well received. While there have been efforts to record CRF questions and answers, our strategy of ontological modeling of the underlying semantic meanings of CRF questions was generally considered novel. Constructive and insightful comments were also received, for example, on how to properly represent the reality of ‘unknown answer to question’. These comments are being carefully considered in our OPMI development.
OPMI is a community effort. Its initial development came from the development of the Ontology of Respiratory Disease Investigation (ORDI), which ontologically represented many clinical terms frequently used in respiratory disease studies [28]. Respiratory diseases are among the leading causes of death worldwide. It remains a challenge to standardize, integrate, and analyze high-volume and heterogeneous respiratory disease investigation data for deep mechanism understanding and rational treatment design. One study surveyed hundreds of residents from urban and suburban communities, covering various variables and different respiratory diseases [28].
Another use case is the application of OPMI to support the National Physique and Health Database in China (http://cnphd.bmicc.cn/chs/en/), which was initiated in 2001 and is being maintained by the Biologic Medicine Information Center of China (BMICC, http://www.bmicc.org), Institute of Basic Medical Sciences (IBMS), Chinese Academy of Medical Sciences, Beijing, China. The database contains the physical and health data of over 160,000 Chinese individuals from different locations, genders, and ages. Over 200 parameters, related to morphology, function, and physical capacity of an individual body, were identified and used in the database. In addition, more data will be collected and added to this database in the future. OPMI is being applied to standardize and analyze the data in the database and make the data more accessible and useful to others.
The ClinEpiDB project, launched in February 2018, is an open-access online resource enabling investigators to maximize the utility and reach of their clinical epidemiology data and to make optimal use of the data released by others (https://clinepidb.org). With a focus on diarrheal and infectious disease epidemiology, ClinEpiDB datasets involve many clinical epidemiology-related questions from CRFs. Representing these requires many clinical terms that overlap with the coverage of OPMI, which represents one area of potential collaboration. It will also be interesting to compare the commonalities and differences between the CRFs in ClinEpiDB and KPMP, and provide template CRFs for other clinical projects.
In addition, OPMI is also being explored to support many other community-based precision medicine projects, including the representation of clinical trial terms as seen in ClinicalTrials.gov, a database of clinical studies conducted around the world (https://clinicaltrials.gov/). The ClinicalTrials.gov database defines many clinical trial related terms (https://prsinfo.clinicaltrials.gov/definitions.html). We are currently collaborating with researchers at the US National Institutes of Health (NIH) to model and represent these terms in OPMI.
ACKNOWLEDGMENT
This KPMP project is supported by the NIH National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) U2C Project #: 1U2CDK114886-01. We appreciate Dr. Deborah Hoshizaki’s discussion and support during the ontology development and applications. We also appreciate the discussion and feedback provided by the attendees (including Matthias Brochhausen, Peter Elkin, William Hogan, etc.) of the Seventh Clinical and Translational Science Ontology Workshop.
ADDRESS FOR CORRESPONDENCE
Please contact YH from the University of Michigan, Ann Arbor, MI, USA. Email address: yongqunh@med.umich.edu. Telephone: +1-734-615-8231.
REFERENCES
[1] R. Higdon, W. Haynes, L. Stanberry, E. Stewart, G. Yandl, C. Howard, et al., "Unraveling the Complexities of Life Sciences Data," Big Data, vol. 1, pp. 42-50, Mar 2013.
[2] G. Hripcsak, J. D. Duke, N. H. Shah, C. G. Reich, V. Huser, M. J. Schuemie, et al., "Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers," Stud Health Technol Inform, vol. 216, pp. 574-8, 2015.
[3] F. S. Collins, K. L. Hudson, J. P. Briggs, and M. S. Lauer, "PCORnet: turning a dream into reality," J Am Med Inform Assoc, vol. 21, pp. 576-7, Jul-Aug 2014.
[4] T. R. Ross, D. Ng, J. S. Brown, R. Pardee, M. C. Hornbrook, G. Hart, et al., "The HMO Research Network Virtual Data Warehouse: A Public Data Model to Support Collaboration," EGEMS (Wash DC), vol. 2, p. 1049, 2014.
[5] T. Souza, R. Kush, and J. P. Evans, "Global clinical data interchange standards are here!," Drug Discov Today, vol. 12, pp. 174-81, Feb 2007.
[6] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, et al., "The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration," Nat Biotechnol, vol. 25, pp. 1251-5, Nov 2007.
[7] Y. He, Z. Xiang, J. Zheng, Y. Lin, J. A. Overton, and E. Ong, "The eXtensible ontology development (XOD) principles and tool implementation to support ontology interoperability," J Biomed Semantics, vol. 9, p. 3, Jan 12 2018.
[8] The Ontology for General Medical Science (OGMS). Available: https://github.com/OGMS/ogms
[9] A. Bandrowski, R. Brinkman, M. Brochhausen, M. H. Brush, B. Bug, M. C. Chibucos, et al., "The Ontology for Biomedical Investigations," PLoS One, vol. 11, p. e0154556, 2016.
[10] R. R. Brinkman, M. Courtot, D. Derom, J. M. Fostel, Y. He, P. Lord, et al., "Modeling biomedical experimental processes with OBI," J Biomed Semantics, vol. 1 Suppl 1, p. S7, 2010.
[11] T. Groza, S. Kohler, D. Moldenhauer, N. Vasilevsky, G. Baynam, T. Zemojtel, et al., "The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease," Am J Hum Genet, vol. 97, pp. 111-24, Jul 2 2015.
[12] C. J. Mungall, C. Torniai, G. V. Gkoutos, S. E. Lewis, and M. A. Haendel, "Uberon, an integrative multi-species anatomy ontology," Genome Biol, vol. 13, p. R5, 2012.
[13] Y. He, S. Sarntivijai, Y. Lin, Z. Xiang, A. Guo, S. Zhang, et al., "OAE: The Ontology of Adverse Events," J Biomed Semantics, vol. 5, p. 29, 2014.
[14] Y. Lin, M. R. Harris, F. J. Manion, E. Eisenhauer, B. Zhao, W. Shi, et al., "Development of a BFO-based Informed Consent Ontology (ICO)," in The 5th International Conference on Biomedical Ontologies (ICBO), Houston, Texas, USA, October 8-9, 2014.
[15] Z. Xiang, M. Courtot, R. R. Brinkman, A. Ruttenberg, and Y. He, "OntoFox: web-based support for ontology reuse," BMC Res Notes, vol. 3:175, pp. 1-12, 2010.
[16] D. L. Rubin, N. F. Noy, and M. A. Musen, "Protege: a tool for managing and using terminology in radiology applications," J Digit Imaging, vol. 20 Suppl 1, pp. 34-46, Nov 2007.
[17] E. Ong, Z. Xiang, B. Zhao, Y. Liu, Y. Lin, J. Zheng, et al., "Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration," Nucleic Acids Res, vol. 45, pp. D347-D352, Jan 04 2017.
[18] P. Grenon and B. Smith, "SNAP and SPAN: Towards Dynamic Spatial Ontology," Spatial Cognition and Computation, vol. 4, pp. 69-103, 2004.
[19] R. Arp, B. Smith, and A. D. Spear, Building Ontologies Using Basic Formal Ontology. MIT Press: Cambridge, MA, USA, 2015.
[20] W. Ceusters and J. Blaisure, "A realism-based view on counts in OMOP's common data model," 2017, pp. 1-8. DOI: 10.3233/978-1-61499-761-0-55.
[21] J. B. O'Neal, A. D. Shaw, and F. T. t. Billings, "Acute kidney injury following cardiac surgery: current understanding and future directions," Crit Care, vol. 20, p. 187, Jul 4 2016.
[22] M. G. Lagny, F. Jouret, J. N. Koch, F. Blaffart, A. F. Donneau, A. Albert, et al., "Incidence and outcomes of acute kidney injury after cardiac surgery using either criteria of the RIFLE classification," BMC Nephrol, vol. 16, p. 76, May 30 2015.
[23] M. H. Rosner and M. D. Okusa, "Acute kidney injury associated with cardiac surgery," Clin J Am Soc Nephrol, vol. 1, pp. 19-32, Jan 2006.
[24] K. A. Ramos and C. B. Dias, "Acute Kidney Injury after Cardiac Surgery in Patients Without Chronic Kidney Disease," Braz J Cardiovasc Surg, vol. 33, pp. 454-461, Sep-Oct 2018.
[25] Y. Lin, J. Zheng, and Y. He, "VICO: Ontology-based representation and integrative analysis of vaccination informed consent forms," J Biomed Semantics, vol. 7, p. 20, 2016.
[26] S. H. Brown, P. L. Elkin, B. A. Bauer, D. Wahner-Roedler, C. S. Husser, Z. Temesgen, et al., "SNOMED CT: utility for a general medical evaluation template," AMIA Annu Symp Proc, pp. 101-5, 2006.
[27] C. H. Lin, N. Y. Wu, and D. M. Liou, "A multi-technique approach to bridge electronic case report form design and data standard adoption," J Biomed Inform, vol. 53, pp. 49-57, Feb 2015.
[28] H. Yu, J. Zheng, H. Wang, E. Ong, X. Ye, Z. Zhang, et al., "ORDI: An integrative community-driven ontology to support standardized representation and data analysis for respiratory disease investigations," in The 11th International Biocuration Conference (BioCuration-2018), Shanghai, China, April 8-11, 2018.