<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transforming and Unifying Research with Biomedical Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>The Penn TURBO project</string-name>
          <email>stoeckrt@pennmedicine.upenn.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian J. Stoeckert Jr.</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>David Birtwell, Heather Williams Penn Medicine BioBank, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania Philadelphia</institution>
          ,
          <addr-line>PA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Genetics, Institute for Biomedical Informatics Perelman School of Medicine, University of Pennsylvania Philadelphia</institution>
          ,
          <addr-line>PA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Hayden Freedman, Mark A. Miller Institute for Biomedical Informatics Perelman School of Medicine, University of Pennsylvania Philadelphia</institution>
          ,
          <addr-line>PA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>TURBO is supported by the Institute for Biomedical Informatics and the Institute for Translational Medicine And Therapeutics at the University of Pennsylvania Perelman School of Medicine. ICBO 2018</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>7</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>The Penn TURBO (Transforming and Unifying Research with Biomedical Ontologies) project aims to accelerate finding and connecting key information from clinical records for research through semantic associations to the processes that generated the clinical data. Major challenges to using clinical data for research are integrating data from different sources, which may contain multiple references to the same entity (e.g., person, health care encounter), and handling incomplete or conflicting information (e.g., gender, BMI). There is also a need to track the provenance of the information used when deciding what the actual phenotype of a person is. We take a realism-based ontology approach to address these problems through transformation and instantiation of clinical data with an OBO Foundry-based application ontology in a semantic graph database. We have developed an application stack and used it on a whole exome sequencing cohort of 11,237 patients, capturing key demographics, diagnosis codes, and prescribed medications. The anticipated payoff is the ability to use the inferencing provided by the semantics to classify and search for instances of people and specimens with desired characteristics.</p>
      </abstract>
      <kwd-group>
        <kwd>realism-based ontology</kwd>
        <kwd>OBO Foundry</kwd>
        <kwd>referent tracking</kwd>
        <kwd>clinical data</kwd>
        <kwd>diagnosis codes</kwd>
        <kwd>prescriptions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        The goal of the TURBO project is to transform and unify
research data with biomedical ontologies. Typically data are
obtained in tabular form often from relational databases. The
column headers and row values are often idiosyncratic and even
when based on a standard may be malformed, incomplete, and
contradictory. Dependencies and deep relations between the
headers (data variables) and values are rarely explicit.
Transforming the data into a semantic graph instantiating a
realism-based ontology allows us to state what is known about
people and what has happened to them, what information is
available about them, and what conclusions can be drawn based
on that information. Clinical data often come from multiple
sources (e.g., EPIC, REDCap). Instantiation of data from
different sources in the same realism-based ontology [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] allows
us to unify the data. Part of the unification comes through
referent tracking [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], associating information for the same
person, quality, or event with a unique identifier for that referent
regardless of where and when the information was obtained.
      </p>
      <p>
        The Open Biomedical Ontologies Foundry [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] provides
through its library of ontologies the ability to create a biomedical
ontology that is realism-based. We created the TURBO
ontology as an application ontology based on these ontologies
drawing from the Ontology for Biomedical Investigations (OBI)
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and the Ontology for Biobanking (OBIB) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] in particular.
By application ontology, we mean that we are primarily reusing
terms (classes, instances, and relations) from existing ontologies
and creating terms only as needed to move the project forward.
Terms that potentially have broader usage are submitted to
existing ontologies.
      </p>
      <p>
        An application stack called Drivetrain was developed to
perform part of the transformation, unify the data, track
referents, and generate conclusions as RDF statements about
people and their qualities. Currently the Karma tool [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is used
to transform tabular data into initial RDF triples for Drivetrain
to use. Ontology modeling is also used to capture provenance
of data and conclusions drawn based on the data. After running
the Drivetrain stack, the reasoning capabilities of the semantic
graph database can be used to classify and aid search for
instances of people and specimens with desired characteristics.
For example, people can be identified who have been prescribed
a particular class of drugs (‘statins’). We intend to create
phenotypic profiles in the form of equivalence axioms that will
be used to infer which people or specimens match those profiles.
      </p>
    </sec>
    <sec id="sec-2">
      <title>II. METHODS</title>
      <sec id="sec-2-1">
        <title>A. Technologies used in TURBO</title>
        <p>
          Ontotext GraphDB (version 8.4.1) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] is the semantic graph
database used. Scala (version 2.11) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is used for
programmatic interaction with the database, leveraging the
RDF4J (version 2.2.2) library [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. UUIDs are generated using
the randomUUID() method found in the java.util.UUID
package [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. LIBSVM was used through the svm() function
from the R package e1071 [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>
          The TURBO ontology was generated following the
approach described in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Terms were selected from OBIB
using Ontodog [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and additional terms were imported using
the OntoFox tool [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. New terms were added using Protégé
[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>B. TURBO content</title>
        <p>Data on a whole exome sequencing cohort of 11,237
participants (‘biobank consenters’) have been used to populate
a GraphDB database. The data include information on gender
identity, date of birth, and body mass index (BMI, calculated
from height and weight) collected during 14,450 biobank
encounters and 98,585 health care encounters. In addition,
181,420 diagnosis codes and 136,249 medications were
obtained during health care encounters. The data were obtained
from relational tables provided by the Penn Medicine Biobank
from two sources: a data warehouse and REDCap.</p>
        <p>
          In addition to RDF triples generated from the data,
individual ontologies and terminologies were also loaded into
the GraphDB database. The ontologies included the TURBO
application ontology, RDF representations of ICD9 and ICD10
codes obtained from the NCBO Bioportal [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], all portions of
the Drug Ontology [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] except NDC annotation, the “lite”
component of ChEBI [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], and the Monarch Disease Ontology
(MonDO) [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>C. Generation of RDF triples to load into the TURBO GraphDB database</title>
        <p>The Karma application (version 2.1) was used to generate
RDF triples from the tabular data for loading into the GraphDB
database. Karma models were based on the TURBO ontology.</p>
      </sec>
      <sec id="sec-2-5">
        <title>D. TURBO code and documentation</title>
        <p>The code base for the Drivetrain component is available on
GitHub, together with documentation of the full TURBO stack and
a description of the ontology modeling:
https://pennturbo.github.io/Turbo-Documentation/</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>III. RESULTS</title>
      <p>
        A technology stack has been developed for the TURBO
project that implements a pipeline to transform tabular data into
semantic triples, stored in a Resource Description Framework
(RDF) triple store, using terms from the TURBO Ontology
(https://raw.githubusercontent.com/PennTURBO/TurboOntology/master/ontologies/turbo_merged.owl). The TURBO
ontology at time of writing consists of 727 terms (415 classes,
41 individuals, 271 properties). These are primarily drawn from
25 ontologies with 161 new terms created for TURBO (69
classes, 19 individuals, 73 properties). URIs and all labels of
terms instantiated in the current TURBO semantic repository are
listed at the bottom of
https://pennturbo.github.io/Turbo-Documentation/turbo-ontology.html (along with a discussion
and an example of an instantiated triple higher on the page).
Terms in the TURBO ontology focus on patients and their
qualities along with information collected about them, instances of ‘health care
encounter’ (http://purl.obolibrary.org/obo/OGMS_0000097)
and their outputs (diagnoses, measurements), and biobank
encounters and their outputs. The new terms mainly cover
shortcut relations used in the Karma mapping and terms for
managing UUIDs during referent tracking. At the Penn
Medicine Biobank, data are collected when participants give
consent, at which time they have been assigned an ID but have
not yet donated a specimen. To capture this case, a ‘biobank
consenter’ term has been created, defined as a participant in a
biobank consenting process (Figure 1). Work to incorporate the essence
of this term is in progress with ICO [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and OBIB developers.
      </p>
      <p>The Karma tool was used to map relational data to ontology
terms, with the mappings saved in an extended version of the R2RML language.
The mappings were then used to publish the data as RDF triples.
The initial RDF triples make use of shortcut relation properties
to simplify the manual mapping. The essence of TURBO
shortcut relations is to allow a minimal number of classes to be
instantiated – frequently just one. For example, an input table
nominally about health care encounters may include height,
weight and body mass index (BMI) values. Those data items are
not values of the encounters, but rather values of properties
borne by the people who participated in the encounters. The
shortcut relation “shortcut health care encounter to BMI”
eliminates the need to instantiate a class that represents the
encounter participants and instead says that there is some path
from the encounter to the BMI value. The Drivetrain application
(described next) contains all of the logic necessary to expand the
shortcut into a semantically complete description of reality.</p>
      <p>The Drivetrain application was built to load and process the
RDF triples with the following steps:</p>
      <sec id="sec-3-1">
        <title>A. Shortcut RDF Triples and TURBO ontology loaded to an Ontotext GraphDB repository</title>
        <p>During the data import step, the input data are written to an
isolated section of the graph. The triples are not expected to have
globally unique identifiers and so must be sectioned off from all
other data in the triple store.</p>
      </sec>
      <sec id="sec-3-3">
        <title>B. EXPAND Queries create fully ontologized model from shortcut triples</title>
        <p>The shortcut expansion phase takes all triples in the input
data that use shortcut relations and expands them to fully
ontologized forms. A single shortcut triple will likely expand to
multiple ontologized triples. In addition to expanding the triples,
the Internationalized Resource Identifiers (IRIs) in the imported
data are made unique using Universally Unique Identifiers
(UUIDs). After this phase is complete, the data in the isolated
import graph have globally unique identifiers and are fully
ontologized, though they may not yet be ready to be
incorporated into the rest of the triple store.</p>
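        <p>The expansion step can be illustrated with a minimal sketch. It is not the Drivetrain implementation, which is written in Scala and runs against GraphDB. The namespace, the class and property names (ex:Person, ex:participatesIn, ex:BMIDatum, ex:isAbout, ex:hasValue), and the shortcut name ex:shortcutEncounterToBMI are hypothetical placeholders, not TURBO ontology terms.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import uuid

BASE = "http://example.org/"  # hypothetical namespace, not a TURBO IRI


def new_iri():
    """Mint an IRI that embeds a random UUID so that it is globally unique."""
    return BASE + uuid.uuid4().hex


def expand_bmi_shortcut(triple):
    """Expand one (encounter, shortcut, value) triple into several triples.

    The shortcut only asserts that some path leads from the encounter to a
    BMI value. The expansion makes that path explicit by creating the
    participant and the BMI datum as nodes of their own.
    """
    encounter, predicate, value = triple
    if predicate != "ex:shortcutEncounterToBMI":
        raise ValueError("not a BMI shortcut triple")
    person, bmi = new_iri(), new_iri()
    return [
        (person, "rdf:type", "ex:Person"),
        (person, "ex:participatesIn", encounter),
        (bmi, "rdf:type", "ex:BMIDatum"),
        (bmi, "ex:isAbout", person),
        (bmi, "ex:hasValue", value),
    ]


expanded = expand_bmi_shortcut(
    ("ex:encounter1", "ex:shortcutEncounterToBMI", 24.3)
)
```
]]></preformat>
        <p>In this sketch a single shortcut triple becomes five triples, and the two new nodes receive UUID-based IRIs that cannot collide with identifiers from other imports.</p>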
        <p>Data integrity rules are applied to all triples in the isolated
import graph to assure that the data meet the minimum level of
integrity required by the Drivetrain application. Several
conditions must be met to pass, including checks that all classes
and properties present in the incoming data must also be present
in the TURBO ontology, all denoted registries must be
represented in the ontology, and all dates must be parseable,
reasonable, and be typed as dates. If all integrity checks have
passed, then the data are ready to be connected to the rest of the
graph.</p>
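        <p>A few of these checks can be sketched as follows. This is an illustration under stated assumptions rather than the Drivetrain rule set: the property name ex:hasDate, the ISO date format, and the plausibility window are hypothetical, and the registry check is omitted.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from datetime import date


def check_integrity(triples, ontology_terms,
                    earliest=date(1900, 1, 1), latest=None):
    """Return a list of problems; an empty list means the data pass."""
    latest = latest or date.today()
    problems = []
    for subject, predicate, obj in triples:
        # Every property must be known to the ontology.
        if predicate not in ontology_terms:
            problems.append(f"unknown property: {predicate}")
        # Every class used in a type assertion must be known as well.
        if predicate == "rdf:type" and obj not in ontology_terms:
            problems.append(f"unknown class: {obj}")
        # Dates must parse and fall inside a plausible window.
        if predicate == "ex:hasDate":
            try:
                parsed = date.fromisoformat(obj)
            except (TypeError, ValueError):
                problems.append(f"unparseable date: {obj!r}")
                continue
            if not earliest <= parsed <= latest:
                problems.append(f"implausible date: {obj}")
    return problems


terms = {"rdf:type", "ex:hasDate", "ex:Encounter"}
good = [("e1", "rdf:type", "ex:Encounter"),
        ("e1", "ex:hasDate", "2017-05-04")]
bad = [("e2", "rdf:type", "ex:Unknown"),
       ("e2", "ex:hasDate", "04/05/2017"),
       ("e2", "ex:hasDate", "1800-01-01")]
cutoff = date(2018, 12, 31)
good_problems = check_integrity(good, terms, latest=cutoff)
bad_problems = check_integrity(bad, terms, latest=cutoff)
```
]]></preformat>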
      </sec>
      <sec id="sec-3-4">
        <title>C. Scala-based REFERENT TRACKER combines duplicate entities</title>
        <p>During the Referent Tracking phase, all instantiated
IRI-bearing terms that refer to one and the same thing
in reality are replaced with a single Instance Unique Identifier
(IUI), which is implemented by Drivetrain as an IRI that
contains a Universally Unique Identifier (UUID) value.
After this phase is complete, the RDF data are
normalized such that every entity in reality can be identified by a
single unique identifier that is independent of, yet connected to, the
source relational data (Figure 2).</p>
        <p>Since our data come from many sources, the
same ‘biobank consenter’ may appear in multiple data
sources, each of which may contain different or contradictory
information. The goal of the Referent Tracker is to apply
custom rules to determine when two consenters must be
combined into one. Likewise, the same encounter may
appear in multiple data sources. A simple rule is to combine two entities when the
identifier and identifier source (central registry ID symbol and
registry) associated with them are the same.</p>
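        <p>The simple rule lends itself to a short sketch. The record layout, the source row names, and the example.org namespace are assumptions made for illustration; Drivetrain itself applies its rules in Scala against the graph.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import uuid


def track_referents(records):
    """Assign one UUID-based IRI per real-world entity.

    `records` is an iterable of (source_iri, identifier, registry) tuples.
    Records that share both identifier and registry are taken to denote the
    same entity and therefore receive the same IRI.
    """
    iri_for_key = {}
    assignment = {}
    for source_iri, identifier, registry in records:
        key = (identifier, registry)
        if key not in iri_for_key:
            iri_for_key[key] = "http://example.org/" + str(uuid.uuid4())
        assignment[source_iri] = iri_for_key[key]
    return assignment


records = [
    ("warehouse:row17", "12345", "ex:registryA"),
    ("redcap:row3", "12345", "ex:registryA"),
    ("redcap:row4", "12345", "ex:registryB"),  # same symbol, other registry
]
tracked = track_referents(records)
```
]]></preformat>
        <p>Here the first two records collapse onto one identifier, while the third stays separate because its registry differs. The mapping from source rows to the shared identifier is kept, which preserves the link back to the source data.</p>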
      </sec>
      <sec id="sec-3-5">
        <title>D. Scala-based ENTITY LINKER links Health care and Biobank Encounters to Biobank Consenters</title>
        <p>Entity Linking is a generic term used here to mean the
process of attaching consenters to their encounters based on
data provided by a relational Join table. This process is
necessary because consenters and their encounters may be
received in separate files. Drivetrain can make matches by
comparing the literal values of encounter symbols and
consenter symbols, and the values of the respective registries.</p>
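        <p>The matching described above amounts to a lookup keyed on symbol and registry. The following sketch assumes a hypothetical four-column join row and dictionary-based lookups; it is meant to show the idea, not the Drivetrain code.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def link_encounters(consenters, encounters, join_rows):
    """Return (consenter_iri, encounter_iri) pairs licensed by the join table.

    `consenters` and `encounters` map (symbol, registry) to an IRI. Each join
    row holds (consenter_symbol, consenter_registry, encounter_symbol,
    encounter_registry). Rows that mention an unknown consenter or encounter
    are skipped rather than guessed at.
    """
    links = []
    for c_symbol, c_registry, e_symbol, e_registry in join_rows:
        consenter = consenters.get((c_symbol, c_registry))
        encounter = encounters.get((e_symbol, e_registry))
        if consenter is not None and encounter is not None:
            links.append((consenter, encounter))
    return links


consenters = {("12345", "ex:registryA"): "ex:consenter/1"}
encounters = {("enc-9", "ex:registryE"): "ex:encounter/9"}
join_rows = [
    ("12345", "ex:registryA", "enc-9", "ex:registryE"),
    ("99999", "ex:registryA", "enc-9", "ex:registryE"),  # unknown consenter
]
links = link_encounters(consenters, encounters, join_rows)
```
]]></preformat>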
      </sec>
      <sec id="sec-3-7">
        <title>E. Scala-based CONCLUSIONATOR creates inferences about Dates of Birth, Biological Sex, and BMI</title>
        <p>During the conclusionating phase, rules are applied to the
data to generate statements about a person or event. Currently
this is done to resolve potentially conflicting data to single
conclusions, which can be used for querying. The
potentially conflicting data derived from the sources remain in
the graph and can still be queried. In the future, conclusionating will also be used to
combine data of different types (e.g., diagnosis code,
medication, lab test result) to make a single statement (e.g., a
person is diabetic). To facilitate easy querying, the conclusions,
which are RDF triples, are placed in a separate named graph.
After this phase is complete, there will be a named graph of
conclusions, which contains simplified non-conflicting
statements. Conclusionating is applied to generate statements
about the consenter’s biological sex, date of birth, and BMI at
the date of each biobank encounter. The rules used for drawing
conclusions are currently very simple, but the system is
envisioned to handle more complex rules and be able to draw
on a library of different rules in the future.</p>
        <p>BMI can be calculated from a person’s height and weight
(weight in kilograms divided by the square of height in meters), which can be measured
during a health care encounter or recorded on a case report form
at study recruitment during a biobank encounter (when a
person becomes a ‘biobank consenter’). It is useful to know the
BMI of biobank consenters at their date of recruitment.</p>
        <p>It is not guaranteed that the source data required to calculate
BMI at date of biobank encounter will be both available and of
sufficient quality. It may be that height and weight
measurements were recorded at the health care encounter, the
biobank encounter, neither, or both. Further, the data may have
been recorded improperly, which would result in a calculated
BMI that is outside the acceptable range.</p>
        <p>The following rules are currently applied to account for
these situations. For each date of recruitment for each person:</p>
        <list list-type="bullet">
          <list-item>
            <p>If there are in-range height and weight measurements
recorded in the health care encounter on the date of
recruitment, compute the BMI and conclude that it is the
person’s BMI at the given date of recruitment.</p>
          </list-item>
          <list-item>
            <p>If the BMI cannot be computed from the health care
encounter, but there are valid height and weight
measurements recorded on the case report form filled out as
part of the study recruitment process, compute the BMI
from the case report form data and conclude that it is the
person’s BMI at the given date of recruitment.</p>
          </list-item>
          <list-item>
            <p>If neither the health care encounter nor the study
recruitment encounter yields a BMI conclusion, then record
that the BMI for the given date of recruitment is inconclusive.</p>
          </list-item>
        </list>
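        <p>These rules form an ordered fallback, which the sketch below makes explicit. The plausibility limits and the units (centimeters and kilograms) are assumptions for illustration; the text does not state the ranges that Drivetrain uses.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def in_range(height_cm, weight_kg):
    # Hypothetical plausibility limits; the real ranges are not given.
    return 100 <= height_cm <= 250 and 20 <= weight_kg <= 400


def bmi(height_cm, weight_kg):
    """Body mass index: weight in kilograms over height in meters squared."""
    height_m = height_cm / 100
    return weight_kg / (height_m * height_m)


def conclude_bmi(health_care, case_report):
    """Apply the rules in order for one date of recruitment.

    Each argument is a (height_cm, weight_kg) pair, or None when that source
    has no measurements. Returns (bmi, source), or (None, "inconclusive")
    when neither source yields usable measurements.
    """
    candidates = (
        ("health care encounter", health_care),
        ("case report form", case_report),
    )
    for source, measurement in candidates:
        if measurement is not None and in_range(*measurement):
            return round(bmi(*measurement), 1), source
    return None, "inconclusive"


from_encounter = conclude_bmi((180, 81), (175, 80))
from_form = conclude_bmi(None, (170, 65))
unresolved = conclude_bmi((18, 81), None)  # height entered incorrectly
```
]]></preformat>
        <p>Returning the source alongside the value mirrors the emphasis on provenance: a conclusion records which measurements it was drawn from.</p>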
      </sec>
      <sec id="sec-3-8">
        <title>F. Diagnosis Data are mapped by cross-referencing ICD9/ICD10 hierarchies and MonDO ontologies</title>
        <p>
          Diagnosis codes come to TURBO in the form of ICD9 and
ICD10 codes [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. In order to enable searches broader than a
single code value, we load RDF versions of ICD9 and ICD10
downloaded from the NCBO Bioportal, which provide
subClassOf relations. We also load MonDO (an aggregation of
disease ontologies including the Human Disease Ontology
[
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]), which includes database cross references for ICD codes.
We use these cross references to create mentions between
diagnosis codes and diseases, thereby enabling disease-based
searches.
        </p>
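        <p>How the two kinds of relation combine in a disease-based search can be sketched with plain dictionaries. The disease identifier and the cross-reference shown are illustrative and are not copied from MonDO; in TURBO the equivalent lookup runs as a query over the graph.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def codes_for_disease(disease, xrefs, parent_of):
    """Return codes cross-referenced by `disease` plus all narrower codes.

    `xrefs` maps a disease to the set of codes it cross-references, and
    `parent_of` maps each code to its direct parent (subClassOf).
    """
    targets = xrefs.get(disease, set())
    matched = set()
    for code in parent_of.keys() | targets:
        current = code
        # Walk up the hierarchy until a cross-referenced code is found.
        while current is not None:
            if current in targets:
                matched.add(code)
                break
            current = parent_of.get(current)
    return matched


parent_of = {
    "I21.3": "I21",
    "I21.9": "I21",
    "I21": "I20-I25",
    "I10": "I10-I16",
}
xrefs = {"ex:myocardial_infarction": {"I21"}}  # illustrative cross-reference
codes = codes_for_disease("ex:myocardial_infarction", xrefs, parent_of)
```
]]></preformat>
        <p>A search for the disease then reaches encounters coded with the cross-referenced code or with any code beneath it in the hierarchy.</p>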
      </sec>
      <sec id="sec-3-10">
        <title>G. Medication Order Name Data are mapped to ontologies using Solr indexed text search and a Support Vector Machine (SVM)</title>
        <p>
          Medication orders are provided primarily as free text, often
including dosage and route of administration information.
Associating these orders to terms in ChEBI (Chemical Entities
of Biological Interest) would enable searches based on the
parent classes of active ingredients and their roles. To
accomplish this, the orders are computationally mapped to
terms from the Drug Ontology (DRON), which provides
cross-references to ChEBI. About 30% of the distinct medications
prescribed to our WES cohort also came with RxNorm
identifiers [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] that could be directly associated to DRON and
ChEBI via direct cross references. The RxNorm associations
were then used as a training set for machine learning (LIBSVM)
applied to the string-matching output of Apache Solr
[
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. For the WES cohort, we were able to map 86.1% of
distinct medications (sensitivity = 0.98; specificity = 0.95)
covering 88% of the total medications prescribed (excluding
non-drug prescriptions).
        </p>
      </sec>
      <sec id="sec-3-12">
        <title>H. Performance</title>
        <p>The complete Drivetrain stack was run on a Linux application
server with 8 GB RAM and 2 processors and a GraphDB
database server with 64 GB RAM and 4 processors.</p>
        <p>The run from loading of graph through medication mapping
(steps described in sections A through G above) took 82
minutes for the WES cohort data and supportive ontologies. It
resulted in 25,521,235 triples. About 3.6 million triples were
initially loaded and then expanded to about 12 million triples.
Additional triples resulted from referent tracking,
conclusionating, and adding diagnosis and medication terms
and associations.</p>
        <p>Searches for diagnosis classes take approximately a second.
For example, a search for all participants in a health care
encounter which resulted in a diagnosis that mentions
‘myocardial infarction’ will return those assigned an ICD10 code
of I21.3 (acute myocardial infarction).</p>
        <p>Searches for medications also take on the order of seconds.
A search for all participants prescribed a ‘statin’ returned all
appropriate statins and no inappropriate ones, based on drug
name matches and their active ingredients, with one important
exception. Crestor contains rosuvastatin but is not identified as
a statin. This is because rosuvastatin, while present in both
DRON and ChEBI, has a different IRI in each. We are able to address
this issue locally by using equivalence statements between the
two (we are also following up with DRON to resolve this issue).</p>
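        <p>The effect of a local equivalence statement can be shown with a small sketch. The IRIs are hypothetical stand-ins for the DRON and ChEBI terms, and the dictionary lookup stands in for what the triple store does with an equivalence assertion.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def canonical(iri, equivalent_to):
    """Follow equivalence links until an IRI with no further link is reached."""
    seen = set()
    while iri in equivalent_to and iri not in seen:
        seen.add(iri)
        iri = equivalent_to[iri]
    return iri


def orders_in_class(orders, ingredient_of, class_members, equivalent_to):
    """Return the orders whose active ingredient falls in `class_members`."""
    members = {canonical(m, equivalent_to) for m in class_members}
    return [
        order for order in orders
        if canonical(ingredient_of[order], equivalent_to) in members
    ]


# Hypothetical IRIs standing in for the DRON and ChEBI terms.
equivalent_to = {"ex:dron/rosuvastatin": "ex:chebi/rosuvastatin"}
ingredient_of = {
    "Crestor": "ex:dron/rosuvastatin",
    "Lipitor": "ex:chebi/atorvastatin",
    "Aspirin": "ex:chebi/aspirin",
}
statins = {"ex:chebi/rosuvastatin", "ex:chebi/atorvastatin"}
orders = list(ingredient_of)

without_fix = orders_in_class(orders, ingredient_of, statins, {})
with_fix = orders_in_class(orders, ingredient_of, statins, equivalent_to)
```
]]></preformat>
        <p>Without the equivalence, Crestor is missed because its ingredient carries a different IRI; with it, both statin orders are returned.</p>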
      </sec>
    </sec>
    <sec id="sec-4">
      <title>IV. DISCUSSION</title>
      <p>The TURBO project is currently in active development as a
demonstration project for the Penn Institute for Biomedical
Informatics. We have a stable application stack, Drivetrain, which,
combined with the Karma tool, enabled us to transform, load,
referent track, and draw conclusions about a real dataset of
interest, a WES cohort of 11,237 participants. Unlike traditional
data warehousing, the TURBO system performs integration
through rules applied during referent tracking and
conclusionating. The processes used to determine when entities
are the same (people, encounters) in referent tracking or make
statements about a person (e.g., BMI) in conclusionating are
modeled in the ontology and stored in the graph for provenance.
Thus, Drivetrain provides an ontology-supported knowledge
layer along with the loaded data.</p>
      <p>User stories, common requests by researchers searching
clinical data, are driving TURBO development. Competency
questions based on these user stories are then used to evaluate
the system. Examples include identification of people of
specified age, biological sex, and BMI. Such searches are already possible, as is
finding those who have been prescribed a particular class of
drugs and assigned a diagnosis code linked to a particular class
of disease. We are currently working on adding genotype data
resulting from exome sequencing. Future additions will include
laboratory tests.</p>
      <p>Scalability of the system remains to be determined. We plan
to expand both the number of participants and type of data
instantiated in the semantic graph database. At 25 million triples,
our current graph database has room to grow. We run Drivetrain
with reasoning off but can then load into a graph database with
RDFS+ or OWL-Horst reasoning turned on. For the current
datasets this takes less than an hour. We are also exploring
loading shortcut triples generated by alternative methods to
Karma that are less manual.</p>
      <p>
        Our efforts at medication mapping have used standard tools
with good success but we would like to improve coverage as
much as possible. Some prescriptions are not medications at all
(e.g., wheelchairs, saline solutions), and we can generate
lists to recognize these. We will explore use of other
terminologies (e.g., NDFRT [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]) that may provide routes
through active ingredients and equivalence matches to entries in
ChEBI. Once we have a ChEBI IRI linked to a prescription it
then can be searched based on the structure or role of the active
ingredient.
      </p>
      <p>
        The TURBO project represents a new direction in applying
ontologies to clinical data. Most efforts do not explicitly involve
realism-based ontologies or if they do use them it is in the form
of associations and not instantiations. However, there are related
projects instantiating OBO and realism-based ontologies. These
include ones by William Duncan (Roswell Park) [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], by
Amanda Hicks and William Hogan (U. Florida) [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], and by
Bjoern Peters (La Jolla Institute for Immunology) [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], although
they do not do referent tracking or conclusionating as in TURBO.
This growing number of independent efforts raises the exciting
potential of linking such systems together.
      </p>
      <p>Ultimately, we intend for the TURBO project to provide a
Phenotype Storefront that users can query to find participants
and specimens of interest. The current plan is to just return the
number of hits as results and require IRB approval for accessing
identifiable data. We also want to learn from searches made by
investigators in order to generate defined classes of participants
and specimens. For example, equivalence axioms for someone
who has had a particular disease course could include an
appropriate diagnosis code but also a relevant prescription and
laboratory test result. Inferencing applications of this nature will
bring to bear the power of ontologies to provide what can’t be
done by traditional relational systems.</p>
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGMENT</title>
      <p>All the authors have been approved under IRB protocol 813913
from the University of Pennsylvania to work with the described
patient data. We thank Werner Ceusters and William Hogan for
their advice and feedback on implementation of referent
tracking. We also thank Jason Moore, Scott Damrauer, Michael
Feldman, Peter Gabriel, John Holmes, and Daniel Rader for
their support and guidance as the TURBO governance board.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Ceusters</surname>
          </string-name>
          , “
          <article-title>Ontological realism: A methodology for coordinated evolution of scientific ontologies</article-title>
          ,”
          <source>Appl Ontol</source>
          .
          <year>2010</year>
          Nov 15;
          <volume>5</volume>
          (
          <issue>3-4</issue>
          ):
          <fpage>139</fpage>
          -
          <lpage>188</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Ceusters</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          , “
          <article-title>Strategies for referent tracking in electronic health records</article-title>
          ,”
          <source>J Biomed Inform</source>
          .
          <year>2006</year>
          Jun;
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>362</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          , et al., “
          <article-title>The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration</article-title>
          ,”
          <source>Nat Biotechnol</source>
          ,
          <year>2007</year>
          .
          <volume>25</volume>
          (
          <issue>11</issue>
          ): p.
          <fpage>1251</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandrowski</surname>
          </string-name>
          , et al.,
          “
          <article-title>The Ontology for Biomedical Investigations</article-title>
          ,”
          <source>PLoS One</source>
          ,
          <year>2016</year>
          .
          <volume>11</volume>
          (
          <issue>4</issue>
          ): p.
          <fpage>e0154556</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Brochhausen</surname>
          </string-name>
          , et al., “
          <article-title>OBIB-a novel ontology for biobanking</article-title>
          ,”
          <source>J Biomed Semantics</source>
          ,
          <year>2016</year>
          .
          <volume>7</volume>
          : p.
          <fpage>23</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          , et al., “
          <article-title>Semi-Automatically Mapping Structured Sources into the Semantic Web</article-title>
          ,”
          <source>ESWC'2012</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <source>Ontotext GraphDB</source>
          . https://ontotext.com/products/graphdb/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <source>The Scala Programming Language</source>
          . https://www.scala-lang.org/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <source>Eclipse RDF4J</source>
          . http://rdf4j.org/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <source>Class UUID</source>
          . https://docs.oracle.com/javase/8/docs/api/java/util/UUID.html
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] R
          <source>e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien</source>
          . https://cran.r-project.org/web/packages/e1071/index.html
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Manduchi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Stoeckert</surname>
          </string-name>
          , “
          <article-title>Development of an application ontology for beta cell genomics based on the ontology for biomedical investigations</article-title>
          ,”
          <source>CEUR Workshop Proceedings</source>
          ,
          <volume>1060</volume>
          ,
          <fpage>62</fpage>
          -
          <lpage>67</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Stoeckert</surname>
            <suffix>Jr.</suffix>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          , “
          <article-title>Ontodog: a web-based ontology community view generation tool</article-title>
          ,”
          <source>Bioinformatics</source>
          .
          <year>2014</year>
          May 1;
          <volume>30</volume>
          (
          <issue>9</issue>
          ):
          <fpage>1340</fpage>
          -
          <lpage>2</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Courtot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Brinkman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ruttenberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          , “
          <article-title>OntoFox: web-based support for ontology reuse</article-title>
          ,”
          <source>BMC Res Notes</source>
          .
          <year>2010</year>
          Jun 22;
          <volume>3</volume>
          :
          <fpage>175</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Musen</surname>
          </string-name>
          , “
          <article-title>The Protégé project: A look back and a look forward</article-title>
          ,”
          <source>AI Matters</source>
          , Association for Computing Machinery Special Interest Group on Artificial Intelligence,
          <volume>1</volume>
          (
          <issue>4</issue>
          ), June
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P. L.</given-names>
            <surname>Whetzel</surname>
          </string-name>
          , et al., “
          <article-title>BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications</article-title>
          ,”
          <source>Nucleic Acids Res</source>
          .
          <year>2011</year>
          Jul;
          <volume>39</volume>
          (
          <issue>Web Server issue</issue>
          ):
          <fpage>W541</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>W. R.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , et al., “
          <article-title>Therapeutic indications and other use-case-driven updates in the drug ontology: anti-malarials, anti-hypertensives, opioid analgesics, and a large term request</article-title>
          ,”
          <source>J Biomed Semantics</source>
          .
          <year>2017</year>
          Mar 3;
          <volume>8</volume>
          (
          <issue>1</issue>
          ):
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hastings</surname>
          </string-name>
          , et al., “
          <article-title>ChEBI in 2016: Improved services and an expanding collection of metabolites</article-title>
          ,”
          <source>Nucleic Acids Res</source>
          .
          <year>2016</year>
          Jan 4;
          <volume>44</volume>
          (
          <issue>D1</issue>
          ):
          <fpage>D1214</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <source>Monarch Disease Ontology</source>
          . http://obofoundry.org/ontology/mondo.html
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <source>Informed Consent Ontology (ICO)</source>
          . https://github.com/ICO-ontology/ICO
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] World Health Organization International Classification of Diseases. http://www.who.int/classifications/icd/en/</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>W. A.</given-names>
            <surname>Kibbe</surname>
          </string-name>
          , et al., “
          <article-title>Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data</article-title>
          ,”
          <source>Nucleic Acids Res</source>
          .
          <year>2015</year>
          Jan;
          <volume>43</volume>
          (Database issue):
          <fpage>D1071</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] RxNorm. https://www.nlm.nih.gov/research/umls/rxnorm/</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <source>Apache Solr</source>
          . http://lucene.apache.org/solr/
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <source>NDFRT (National Drug File - Reference Terminology) - Synopsis</source>
          . https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/NDFRT/
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>T.</given-names>
            <surname>Thyvalikakath</surname>
          </string-name>
          , et al., “
          <article-title>National Dental PBRN. Restorative/Endodontic Procedures Performed in National Dental PBRN Practices</article-title>
          ,”
          <source>J Dent Res</source>
          <volume>97</volume>
          (
          <issue>Spec Iss</issue>
          ):
          <fpage>2859794</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] PCORowl. https://zenodo.org/record/1241209#.WvoBFsgh2L4</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>R.</given-names>
            <surname>Vita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Overton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Greenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sette</surname>
          </string-name>
          , OBI consortium, and
          <string-name>
            <given-names>B.</given-names>
            <surname>Peters</surname>
          </string-name>
          , “
          <article-title>Query enhancement through the practical application of ontology: the IEDB and OBI</article-title>
          ,”
          <source>Journal of Biomedical Semantics</source>
          ,
          <year>2013</year>
          ;
          <volume>4</volume>
          (
          <issue>Suppl 1</issue>
          ):
          <fpage>S6</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>