=Paper= {{Paper |id=Vol-3073/paper13 |storemode=property |title=Analysis of Biomarker Data Towards Development of a Molecular Biomarker Ontology |pdfUrl=https://ceur-ws.org/Vol-3073/paper13.pdf |volume=Vol-3073 |authors=Daniel Lyman,Darren Natale,Lynn Schriml,Kriston Anton,Daniel C. Crichton,Raja Mazumder |dblpUrl=https://dblp.org/rec/conf/icbo/LymanNSACM21 }} ==Analysis of Biomarker Data Towards Development of a Molecular Biomarker Ontology== https://ceur-ws.org/Vol-3073/paper13.pdf
Analysis of Biomarker Data Towards Development of a
Molecular Biomarker Ontology
Daniel Lyman1, Darren Natale2, Lynn Schriml3, Kriston Anton4, Daniel C. Crichton5, Raja
Mazumder1
1
  The George Washington University, 2121 I St NW, Washington, DC, USA
2
  Georgetown University Medical Center, 37th and O Street, N.W., Washington, DC, USA
3
  University of Maryland School of Medicine, 655 W Baltimore St S, Baltimore, MD, USA
4
  University of North Carolina, Chapel Hill, NC, USA
5
  NASA Jet Propulsion Laboratory, 4800 Oak Grove Dr, Pasadena, CA, USA


                                  Abstract
                                  Molecular biomarkers comprise fundamental elements of biomedical inquiry. No ontology has
                                  been developed to organize this knowledge across disciplines and to link diseases, processes,
                                  or technologies, which would be a substantial asset for research and healthcare. A sustained
                                  effort is under way to construct such an ontology for greater precision in representation.
                                  Observed overlaps of biomarkers for COVID-19, diabetes, and cancers underscore the
                                  potential of novel ontology-based explorations. A biomarker ontology can harmonize data
                                  across varied diseases and technologies, tie together NIH programs, guide future data
                                  collection, support machine learning, and foster research in ethical use of biomarkers.

                                  Keywords 1
                                  Ontology development, molecular biomarkers, data integration, data linkage, disease

1. Introduction

     Acquisition and use of knowledge from biomedical research reduce illness and enhance human
health. Investigations of living systems routinely assess data on objects used as indicators (i.e.,
biomarkers) of biological processes, at the molecular, cellular, or physiologic level. The FDA-NIH
Biomarker Working Group (BEST Resource [1]) defines biomarkers as “a defined characteristic that is
measured as an indicator of normal biological processes, pathogenic processes, or biological responses
to an exposure or intervention, including therapeutic interventions”. As indicators of biological
processes, biomarkers comprise a growing focus of biomedical research: as crucial factors in biological
inquiries, essential elements of precision medicine, critical components of pipeline screens or clinical
trials, and vital ingredients of investment decision making in the development of therapeutics.
     Evidence indicates that impaired cellular components contribute to the onset or progression of many
disorders; hundreds of genes and proteins, for example, are implicated in oncogenic pathway
deregulations. Advances in omics technologies have driven the use of molecular biomarkers as principal
instruments of current inquiry and improvements in medical treatment. Molecular biomarkers gauge
and illuminate alterations of specific genomic, proteomic, glycomic, lipidomic, or metabolomic
components that underlie key functions. Biomarkers, therefore, comprise a fundamental nexus of
biomedical inquiry and great interest exists across medical disciplines in further biomarker discoveries.
     Accordingly, numerous databases collect biomarker content; e.g., OncoMX, EDRN (Early
Detection Research Network), the Alzbiomarker DB, MarkerDB, and others. Additional databases
collect closely related content. However, these resources are not harmonized with a common standard.

International Conference on Biomedical Ontologies 2021, September 16–18, 2021, Bozen-Bolzano, Italy
EMAIL:         danlyman@email.gwu.edu;      dan5@georgetown.edu;      lschriml@som.umaryland.edu;   kristen_anton@med.unc.edu;
daniel.j.crichton@jpl.nasa.gov; mazumder@gwu.edu
ORCID: 0000-0001-9333-1455; 0000-0001-5809-9523; 0000-0001-8910-9851; 0000-0002-5487-7719; 0000-0001-8823-9945
                               © 2021 Copyright for this paper by its authors.
                               Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Wor
    Pr
       ks
        hop
     oceedi
          ngs
                ht
                I
                 tp:
                   //
                    ceur
                       -
                SSN1613-
                        ws
                         .or
                       0073
                           g

                               CEUR Workshop Proceedings (CEUR-WS.org)
Despite their central investigative importance (Figure 1), no harmonized organization (vocabulary or
ontology) of biomarker knowledge has been developed as a cross-cutting infrastructure or unifying
instrument across diseases, processes, components, or technologies. As a result, significant challenges
hinder generation and translation of integrated biomarker observations into beneficial clinical
applications. A logic-based data structure that facilitates pan-biomarker investigation would be a
significant asset. Formal models have enhanced analysis of large datasets; identified driver genes and
determinants; supported patient staging; identified treatment options; and predicted responses to therapy
and survival. However, a recent search of resources and literature identified only narrowly-scoped
biomarker ontologies; e.g., Imaging Biomarker Ontology [2], Food-Biomarker Ontology [3], and
Coronavirus Infectious Disease Ontology [4]. Other identified ontologies are inactive or not publicly
available and the Ontology for General Medical Science [5], describing clinical encounters, has been
extended with definition and classification of disjoint biomarker types (material, quality, process), but
does not include processes of biomarker measurements.




Figure 1: Molecular biomarkers may comprise assorted interrelated biological objects (circle) that are
also related to other biomedical objects, properties, or processes (column) in particular ways that can
be specifically described and represented in a biomarker ontology.

2. Results and Discussion

    Through work on biomarker discovery, data integration, and ontology development [6-9], we see an
urgent need to harmonize key biomarker knowledge, organized on OBO Foundry principles [10] and
linked with related models, for cross-disciplinary investigations and exploration of novel hypotheses.
A sustained effort is, therefore, under way to evaluate biomarker knowledge towards identifying and
defining domain scope, classes, attributes, and relations. The goal is to harmonize terminology,
structured in a machine-readable framework, and axiomatically connected to related elements: e.g.,
gene, protein, disease, phenotype, cell, anatomy, variants. The general workflow of the project is shown
in Figure 2.
    Following literature reviews, searches of public resources, and discussions with associated parties,
the scope of the initial model has focused on 1) molecular disease biomarkers; 2) primary use cases; 3)
examination of source materials; 4) ontological evaluation of biomarker objects; and 5) evaluation of
associations across related elements. Examination of data models and resource contents has provided
an outline of the subject landscape, domain elements, and connections with related data. Detailed
inspection of source materials has refined our view of biomarker and related knowledge: metadata,
biomarker instances and measures, definitions, and labels, as well as classification and relations.
Collection of essential biomarker data types, terms, classes, and relations from databases, resources,
and use cases provides elements to design a foundational representation and informs the import of data
from use case resources. Use cases provide practical substance to devise representation and organization
of data in an initial model that addresses known biomarkers (single and panels) and data types, is able
to evolve to a material ontology, and can accommodate future data, data types, and concepts.
    Rapid publication of studies in 2020, proposing new COVID-19 biomarkers for therapy, evaluation,
risk assessment. etc., served as one case that highlighted the need for biomarker harmonization and
integration. It further provided an opportunity to test a data model to capture, describe, and
accommodate key data as new biomarkers were identified. Crowdsourced curation of COVID-19
studies extracted 188 biomarkers (gathered to date) that facilitated configuration of a model comprised
of core attributes: Assessed Biomarker Entity, Biomarker Measurement, Cross Reference to a standard
resource (e.g., UniProt accession [11]), BEST Biomarker Type [1], Specimen Type (with Uberon ID),
Disease Name (with DOID), LOINC Code, Literature Evidence (declarative source text), and Notes
(further information). Each annotated biomarker entry is assigned a unique alphanumeric ID and labels
for Assessed Biomarker Entity and the Biomarker Measurement are standardized.




Figure 2: Biomarker data is collected from assorted resources, organized, and stored in a basic,
extensible data model. Following ontological analysis and import of related knowledge, stored
biomarker data is reformatted into a machine-readable ontology for efficient and flexible data query,
sharing, and analysis across biomarker and related domain knowledge for generation of novel
hypotheses, exploration, and discovery.

   Analysis of the biomarker data found increased or decreased activity of specific proteins,
metabolites, lipids, glycoconjugates, carbohydrates, electrolytes, and circulating blood cells [12].
Examination of biomarkers indicated COVID-19 responses to viral challenge, immune activation and
regulation (innate and adaptive), and effects in cell growth, coagulation, vascular remodeling,
homeostasis, cell adhesion, and metabolism. Thirty of the biomarkers are also biomarkers of diabetes
and showed similar increased or decreased biologic activities in important host functions. Eighty-eight
biomarkers are also biomarkers of twenty-three different cancer types and involve cell growth and
proliferation, homeostasis, metabolic components, cell adhesion, coagulation, and immune activation
and regulation. These overlaps underscore the need to harmonize biomarker knowledge across diseases,
resources, and technologies and suggest potential ontology-based discovery of significant biomarkers
through exploration of key pathways, cross-associations among diseases, and novel avenues of
investigation.
   We are also curating biomarkers for 15 broad cancer types (373 biomarkers gathered to date), which
show altered activity of proteins, genes, metabolites, lipids, carbohydrates, and cells. Several entities
are biomarkers for more than one cancer and seven biomarkers of diabetes are also biomarkers of
cancers, involving primarily immune activation and regulation. The cancer biomarker data is
supplemented with mappings of single (1600) and panel gene/protein biomarker data from EDRN and
FDA. Curation of N-linked glycans in hepatocellular carcinoma facilitates further cross-cutting
analyses, establishing a unique environment for glycan panel data linked with genomic and proteomic
data, which may differentiate N-glycosylation profiles in cancer cohorts. All data collected to support
evolution of a molecular biomarker ontology are freely available for download and analysis at
https://data.oncomx.org/.
    Analysis of biomarker data collected from diverse sources supports acquisition and detection of
shared ontological attributes that express properties of components in the knowledge space; e.g.,
harmonization of terms, expressions, and semantics; identification, definition, and organization of
classes and relations. Ontological analysis and representation of biomarker data began with, and rests
upon, the BEST Resource definition of biomarkers [1], which includes features described in other
definitions. Rigorous examination of the definition indicates (unstated, though implied) that each
instance of a measured indicator object is compared to an established standard measure of that object,
to assess an instance of a biological process (or response). The definition further implies that
measurement of the indicator object (the observed result compared to a standard), rather than the
indicator object per se, constitutes the biomarker. To illustrate the ground semantics, an increased level
of a protein (rather than the protein) is a biomarker; a sequence variant of a gene (rather than the wild
type gene) is a biomarker; altered structure of a glycan (rather than the typical glycan) is a biomarker;
and so on.
    However, conventional designations about biomarkers in databases, publications, and data models
customarily name the standard referent substance (e.g., ‘IL-6’ or ‘BRCA1’) as an easily recognized
label (understood as proxy) for the measured object, instead of precisely naming the actual measured
object. Ontological implementation of this common name conflation (biomarker and referent),
however, can produce logical errors. Our approach explicitly expresses in the model the precise
intended semantics in a biomarker name/label (e.g., ‘increased IL-6 expression’ or ‘BRCA1 variant
xyz’). Representation of knowledge in the emerging ontology, assembled from diverse resources,
therefore, includes a distinction between “Assessed Biomarker Entity” and “Biomarker”. An “Assessed
Biomarker Entity” (e.g., ‘IL-6’) in the ontology designates the standard referent object, while named
“Biomarker” objects (e.g., ‘increased IL-6 expression’) designate measured indicator objects.
    Analysis of biomarker content from diverse sources has collected assorted examples of biomarker
types, identified diverse relations to facilitate semantic reasoning, created annotations, and
distinguished links to entities in related models. Biomarker terms will be hierarchically organized into
classes, subclasses, and instances accordingly. Ontological analysis indicates that biomarkers may be
classified along several possible axes: for example, 1) intended clinical use of a biomarker with respect
to some disease/medical condition or medical product/environmental agent; 2) specimen source of a
biomarker; 3) type of assessed entity of a biomarker. We have chosen assessed entity as an asserted
classification axis, since each biomarker can appear in only one sub-branch of this axis and we have
begun to examine custom relations with ranges set to ensure proper reasoning; such as
indicated_by_increased_level_of for some biomarkers X logically defined as elevated X level (and
converse) or with indicated_by_increased_level_of for some biomarkers defined as computed ratios,
for example. We have also examined the use of relations to classify biomarkers by reasoning in non-
asserted axes and successfully tested that reasoning performs correctly (including also a ‘fake’ example
that catches errors in logic). Relations will be further reviewed with respect to the Relations Ontology
[13]. Analysis of molecular biomarker source materials, data models, and resource contents has also
prompted closer ontological investigation of Biomarker relations to Phenotypes and Risk Factors.
    Evolution of the model will be informed by data collected from biomarker databases, use case data
models, and peer-reviewed publications, including novel data types. Additional data types will also be
obtained from biomarkers undergoing clinical trials or those approved by the FDA. Additional use cases
are of interest. Related data on genes, proteins, phenotypes, cells, and anatomy will be identified in
reference ontologies, comprising critical resources for cross-disciplinary interoperability and
integration and providing key components for harmonization, alignment, and relations. To enhance the
core model, we will import, integrate, and connect (via axioms) data of selected reference ontologies
for domain coverage, data sharing, and exploration of knowledge in relevant resources. Annotations
with imported data will reveal connections between biomarkers associated with specific diseases and
phenotypes; with underlying protein and gene actors; and with affected cells, tissues, and organs.
3. Conclusion

   Ontological modeling will provide a standardized terminology, structured representation of
molecular biomarker data and knowledge, consistent rich annotations of biomarker objects in machine-
readable language, and logically inferable knowledge for data science approaches to discovery. Explicit
assertions of common properties will link biomarker elements with knowledge in related models,
datasets, and nodes (e.g., protein, disease, phenotype, anatomy, and more), facilitating data integration
and interoperability across critical independent data resources through the lens of biomarker
associations. Structured representation of molecular biomarker knowledge will promote efficient, FAIR
[14], and flexible data query, acquisition, sharing, and analysis. Unification of data across the biomarker
domain and with related knowledge enables generation of novel hypotheses, exploration, and discovery.

4. Acknowledgements
   Work was supported in part by National Cancer Institute (NCI) (U01CA0215010) to RM and DC.

5. References

[1] FDA-NIH Biomarker Working Group, BEST (Biomarkers, EndpointS, and other Tools) Resource,
     2016. URL: https://www.ncbi.nlm.nih.gov/books/NBK326791.
[2] E. Amdouni, B. Gibaud, Imaging Biomarker Ontology (IBO): A Biomedical Ontology to Annotate
     and Share Imaging Biomarker Data, J. Data Semantics. 7 (2018) 223. doi:10.1007/s13740-018-
     0093-3.
[3] P. Castellano-Escuder, R. González-Domínguez, D. S. Wishart, C. Andrés-Lacueva, A. Sánchez-
     Pla, FOBI: an ontology to represent food intake data and associate it with metabolomic data,
     Database, 2020 (2020) baaa033. doi:10.1093/databa/baaa033.
[4] Y. He, H. Yu, E. Ong, et al., CIDO, a community-based ontology for coronavirus disease
     knowledge and data integration, sharing, and analysis, Sci. Data. 7 (2020) 181.
     doi:10.1038/s41597-020-0523-6.
[5] W. Ceusters, B. Smith, Biomarkers in the ontology for general medical science, Stud. Health
     Technol. Inform. 210 (2015) 155-9. doi:10.3233/978-1-61499-512-8-155.
[6] D. J. Crichton, A. Altinok, C. I. Amos, et al., Cancer Biomarkers and Big Data: A Planetary
     Science Approach, Cancer Cell. 38 (2020) 757-760. doi:10.1016/j.ccell.2020.09.006.
[7] H. M. Dingerdissen, F. Bastian, K. Vijay-Shanker, et al., OncoMX: A Knowledgebase for
     Exploring Cancer Biomarkers in the Context of Related Cancer and Healthy Data, JCO. Clin.
     Cancer Inform. 4 (2020) 210-220. doi:10.1200/CCI.19.00117.
[8] D. A. Natale, C. N. Arighi, J. A. Blake, et al., Protein Ontology (PRO): enhancing and scaling up
     the representation of protein entities, Nucleic Acids Res. 45 (2017) D339-D346.
     doi:10.1093/nar/gkw1075.
[9] L. M. Schriml, E. Mitraka, J. Munro, et al., Human Disease Ontology 2018 update: classification,
     content and workflow expansion, Nucleic Acids Res. 47 (2019) D955-D962.
     doi:10.1093/nar/gky1032.
[10] B. Smith, M. Ashburner, C. Rosse, et al., The OBO Foundry: coordinated evolution of ontologies
     to support biomedical data integration, Nat. Biotechnol. 25 (2007) 1251-5. doi:10.1038/nbt1346.
[11] UniProt consortium, UniProt: a worldwide hub of protein knowledge, Nucleic Acids Res. 47
     (2019) D506-D515. doi:10.1093/nar/gky1049.
[12] N. Gogate, D. Lyman, A. Bell, et al., COVID-19 biomarkers and their overlap with comorbidities
     in a disease biomarker data model, Brief Bioinform. (2021) bbab191. doi:10.1093/bib/bbab191.
[13] B. Smith, W. Ceusters, B. Klagges, et al., Relations in biomedical ontologies, Genome Biol. 6
     (2005) R46. doi:10.1186/gb-2005-6-5-r46.
[14] M. D. Wilkinson, M. Dumontier, I. J. J. Aalbersberg, et al., The FAIR Guiding Principles for
     scientific data management and stewardship, Sci. Data. 3 (2016) 160018.
     doi:10.1038/sdata.2016.18.