A Semantic Model Leveraging Pattern-based Ontology Terms to Bridge Environmental Exposures and Health Outcomes

Lauren E. Chan1, Nicole A. Vasilevsky2, Anne Thessen1,2, Nicolas Matentzoglu3, William D. Duncan4, Christopher J. Mungall4, and Melissa A. Haendel1,2

1 Oregon State University, Corvallis, OR, 97331, USA
2 University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
3 Semanticly Ltd, London, UK
4 Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA

Abstract

Chemicals are a critical aspect of modern agriculture and residues of these chemicals are commonly consumed by humans. Consumption, inhalation, or topical exposure to agricultural chemicals can pose a risk for human health through a variety of mechanisms. Similarly, exposures to radiation, nutrient consumption, and many other environmental entities can impact health and thus a wide array of research has been pursued to better understand the mechanisms and impacts of environmental exposures. While extensive exposure research has been conducted and the data stored in environmental health databases, the ability to computationally assess these findings in the larger context of biomedical research to inform our knowledge for improved human health is still challenging. We developed an integrative exposure-disease model based on the Exposure Ontology (ExO) upper level ontology and established four Dead Simple OWL Design Patterns (DOSDP) for Mondo Disease Ontology. These patterns offer coordination of exposure event and exposure stimulus terms with disease terms, utilizing content from Open Biological Ontologies. Our model and pattern set can leverage logical axioms from integrated ontologies including the Food Ontology and the Environmental Conditions, Treatments, and Exposures Ontology (ECTO) for greater data and knowledge enrichment. Development of exposure event component terms and related logical axioms can facilitate the standardization needed for exposure modeling.
Exposure content and our model can be utilized for the development of integrative knowledge graphs of exposure health data. Additionally, this model serves as a resource to aid the integration of common exposure data sources such as self-reported survey tools. Future work is needed to incorporate essential exposure data components into a comprehensive model, such as estimated or known exposure values, temporality of exposures, and biologically active exposure dosages that incur toxic effects.

Keywords: Ontology, knowledge graph, semantic model, environmental exposure, disease

International Conference on Biomedical Ontologies 2021, September 16–18, 2021, Bozen-Bolzano, Italy
EMAIL: chanl@oregonstate.edu
ORCID: 0000-0002-7463-6306
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

1. Introduction

For decades, chemicals such as fertilizers, pesticides, herbicides, and insecticides have been used as an essential component of modern agriculture [1]. While the use of these agricultural chemicals is beneficial for promoting crop growth and controlling pests and diseases, they may also pose concerns to human health. The safety of agricultural chemicals, whether ingested as residues on food or inhaled or absorbed by people applying them to crops, continues to be a concern and research priority for toxicologists [2,3]. In addition to agricultural chemicals, humans experience hundreds if not thousands of environmental exposures daily (e.g., sun exposure, air pollution, beauty products), each of which may pose health risks to the individual. In turn, environmental exposure characterization and documentation are essential to determining mechanisms of disease onset, understanding clinical sequelae, and recommending mitigating care strategies.
Data from evaluations of model organisms, non-experimental exposures, and human exposures are maintained within environmental health databases. Unfortunately, limited computational standards are available for environmental health data [4,5]. This hinders the integration of environmental health findings to inform policy, health risk, and medical care [4,5]. Ontologies offer a unique opportunity to represent real-life and experimental exposures facing crops, model organisms, and humans. Additionally, ontologies can support integration and connection of heterogeneous research findings and modeled knowledge to facilitate inference and inform future research [6].

Previously, we developed the Environmental Conditions, Treatments, and Exposures Ontology (ECTO) to address these use cases. ECTO’s terms represent a variety of stimuli and environmental conditions, including experimental and non-experimental exposures to humans, plants, and animals [7,8]. The Exposure Ontology (ExO) [9] is an upper ontology that models the relationships among ‘exposure event’, ‘exposure stimulus’, ‘exposure receptor’, and ‘exposure outcome’. This foundation can be used to encode ‘exposure event’ terms that reference stimuli, mediums, and routes. ECTO classes build on ExO and are defined at varying levels of granularity, allowing generalized querying or encoding of exposure to specific chemicals and other entities.

In this paper, we expand prior work to establish an exposure-disease model that will enable inference regarding human exposures and their correlation with health concerns or disease states. Our model relies upon ontology term logical axioms and supports population of knowledge bases for mechanistic inquiry, including exposure events, genes, diseases, and pathways. We utilize exposure to agricultural chemicals as our primary example and describe four development patterns that are used to populate necessary classes in the model.

2. Semantic Modeling Goals

To facilitate encoding of environmental exposures and their impact on health, our adaptable exposure-disease model was developed to include exposures, food products, crop plants, mechanism of action, phenotypes, and disease. This model was the outcome of multiple workshops and community coordination, which included ExO, ECTO, and Mondo developers [5]. Within this proposed model, we have identified prospective ontologies from which to derive interoperable terms and relations including ECTO, Chemical Entities of Biological Interest (ChEBI) [10], Gene Ontology (GO) [11,12], National Center for Biotechnology Information Taxonomy (NCBI Taxon) [13,14], Food Ontology (FoodOn) [15,16], Human Phenotype Ontology (HPO) [17,18], Mondo Disease Ontology (Mondo) [19], and the OBO Relations Ontology (RO) [20,21].

Figure 1 depicts the three-part progression of our model including the upper ExO ontology (Figure 1A), our adaptable exposure-disease model (Figure 1B), and an application of the model using an instance-level example of chlorpyrifos residue ingestion on an apple (Figure 1C). Chlorpyrifos, an organophosphorus insecticide, is a common agricultural chemical used in the production of produce and other crops within the US and beyond [22]. Chlorpyrifos has faced criticism previously for its potential impact on the human nervous system, and particularly for the risks it may pose to children’s neurological development [23]. Based on reported literature of chlorpyrifos mechanisms, our model can be used to identify exposure sources, mechanisms, and associations with presenting disease and phenotypes.

As seen in Figure 1, ExO describes an ‘exposure stimulus’ as ‘an agent, stimulus, activity, or event that causes stress or tension on an organism and interacts with an exposure receptor during an exposure event’. An ‘exposure receptor’ is defined as ‘an entity (e.g., a human, human population, or a human organ) that interacts with an exposure stimulus during an exposure event’.
An ‘exposure outcome’ is defined as ‘entity that results from the interaction between an exposure receptor and an exposure stimulus during an exposure event’ and represents the negative or positive outcomes of having been exposed to the stimulus. Importantly, it is the axioms encoded within the model that connect exposure information to other knowledge, which in turn allows inference of a candidate stimulus or of a predicted outcome in response to one.

Figure 1. Defining and populating the exposure semantic framework. All figure panels contain consistent model variables: the exposure event in green, the entity stimulating the exposure in blue, the organism or entity being exposed in yellow, and the resulting outcomes in purple. Within Figure 1C, an instance-level exposure is included in a red panel to display the integration of data.

Figure 1A. ExO upper ontology: ExO includes the central ‘exposure event’ as well as associated ‘exposure stimulus’, ‘exposure receptor’, and ‘exposure outcome’ elements. Granular exposures (e.g., exposure to chlorpyrifos) are modeled in ECTO, but leverage the upper ExO ontology. In ECTO, each element can be annotated with associated metadata.

Figure 1B. Exposure-disease model: Utilizing ECTO ‘exposure event’ classes, our model can include a variety of exposure stimuli, mediums, routes, and outcomes due to the inherent ExO upper-level schema. Solid edges represent direct relationships that can be modeled as part of an exposure event; dashed edges represent inferred relationships derived from the known direct relationships. This exposure-disease model offers a precomposed template onto which documented relationships from the literature can be mapped to support computational assessment of environmental health research findings.

Figure 1C. Chlorpyrifos exposure instance example: The adaptable exposure-disease model can be used to coordinate instance-level data with ontology knowledge, resulting in a translatable schema for environmental exposures. This example provides a multilayer exposure process for an individual who ingests an apple after it is exposed to chlorpyrifos, coordinated with the known phenotype and disease presentation in the individual. Documented relationships are shown with solid lines and inferred relationships with dashed lines.

By documenting not only the food items that are the mediums for the exposure to chlorpyrifos, but also the mechanism of action, known phenotypes, and disease states, our example schema of chlorpyrifos exposure offers access points from which further information can be inferred. For example, if another chemical served as an acetylcholinesterase inhibitor within humans, then by including that chemical exposure and its known regulatory activity, one could infer that the second chemical exposure may also be related to cognitive disorders, or that the chemical’s composition may be similar to that of chlorpyrifos.

3. Exposure Model Axioms

To support the exposure-disease model described above, we utilize ontology relationship axioms and structures. Some examples include logical axioms in ECTO and FoodOn. ECTO terms are developed as pre-composed classes. Exposure terms are inherently coordinated with the relevant ontology term for the chemical, environmental stimulus, or condition the ECTO term label refers to. Each pre-composed ECTO class includes a reference to another ontology term. For example, the equivalence axiom for the ECTO term ‘exposure to fertilizer’ (ECTO:9000091) includes a reference to the ChEBI term ‘fertilizer’ (CHEBI:33287). This logic provides the ‘has exposure stimulus’ relationship and aligns the exposure term with the detailed content of the referenced ChEBI term.
Class: ‘exposure to fertilizer’
Equivalence axiom: 'exposure event' and 'has exposure stimulus' some fertilizer

Existing logic from FoodOn is also included in our model. Within FoodOn, the source ontology for food terminology, foods produced directly from a crop include a logical axiom that references the source plant. For example, the FoodOn term ‘orange (whole)’ (FOODON:03315106) has the logical axiom shown below, which references the plant term ‘hesperidium fruit’ (PO:0030109) and the taxon ‘Citrus sinensis’ (NCBITaxon:2711).

Class: orange (whole)
Logical axiom: 'hesperidium fruit' and 'derives from' some 'Citrus sinensis'

While these relationships and components of our model are already represented in ontology structures, the critical relationship between exposure and disease outcomes was not yet well defined.

4. Modeling Exposures as Disease Influencers

To facilitate the integration and modeling of environmental exposures and human disease, we have developed and implemented four Mondo Disease Ontology patterns for disease terms with a known exposure basis. Creation of these patterns established logical axioms within Mondo disease terms that coordinate with environmental exposure ECTO terms. This Mondo-ECTO term relationship can then be directly implemented into our exposure-disease model.

Within Mondo, as well as other ontologies, Dead Simple OWL Design Patterns (DOSDP) are frequently used to develop new terms with logical axioms in a consistent and easily maintained manner [24]. Mondo is a significant resource for mapping disease knowledge across many disease information sources (e.g., MeSH, ICD, and OMIM). We chose Mondo as the target of our modeling because its existing logic was relatively easy to extend and because it supports alignment of many disparate resources.
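The templating idea behind a DOSDP can be illustrated with a small sketch. The Python below is not the dosdp-tools implementation and does not parse the YAML pattern files; it is a simplified stand-in, with the hypothetical names `Pattern`, `quote`, and `fill`, that only shows how an ordered list of variables fills the `%s` placeholders of a text template. The template and the filled example are the ones reported for the Mondo exposure pattern, written with plain (unescaped) single quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Simplified stand-in for a DOSDP: a template plus ordered variable names."""
    name: str
    text: str         # Manchester-style template with %s placeholders
    variables: tuple  # variable names, in the order they fill the placeholders


def quote(label: str) -> str:
    """Wrap multi-word labels in single quotes; leave single words bare."""
    return f"'{label}'" if " " in label else label


def fill(pattern: Pattern, bindings: dict) -> str:
    """Substitute the bound labels into the template, in declared order."""
    missing = [v for v in pattern.variables if v not in bindings]
    if missing:
        raise KeyError(f"unbound variables: {missing}")
    if pattern.text.count("%s") != len(pattern.variables):
        raise ValueError("placeholder count does not match declared variables")
    return pattern.text % tuple(quote(bindings[v]) for v in pattern.variables)


# Template taken from the paper's pattern; the Pattern object itself is illustrative.
realized = Pattern(
    name="realized_in_response_to_environmental_exposure",
    text="%s and ('realized in response to' some %s)",
    variables=("disease", "exposure"),
)

axiom = fill(realized, {
    "disease": "radiodermatitis",
    "exposure": "exposure to electromagnetic radiation",
})
print(axiom)
```

Keeping the variable order explicit, rather than relying on dictionary order, mirrors the way a pattern declares which term fills the first and which the second placeholder.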
The disease ‘radiodermatitis’ (MONDO:0043771) conforms to the Mondo ‘realized_in_response_to_environmental_exposure’ design pattern (https://github.com/monarch-initiative/mondo/blob/master/src/patterns/dosdp-patterns/realized_in_response_to_environmental_exposure.yaml). This pattern uses the relation ‘realized in response to’ to link diseases to the exposures (represented by ECTO classes) causing the disease. The logical axiom utilized for this pattern is:

'%s and (''realized in response to'' some %s)'
Vars:
• Disease
• Exposure

Within this logical axiom template, each ‘%s’ marks a variable (vars) field. For each var, a term is required to complete the axiom statement. In this instance, the vars are ‘Disease’ and ‘Exposure’. These variable terms are drawn from Mondo and ECTO and fill the first and second fields, respectively. For example, the logical axiom for ‘radiodermatitis’ is represented as:

radiodermatitis and ('realized in response to' some 'exposure to electromagnetic radiation')

For the variety of diseases that may be caused by or initiated via an environmental exposure or external entity, we have created multiple DOSDPs for Mondo that support general and specific exposure-based disease terms. Their content and applications are described in Table 1. At this time, over 390 terms have been implemented using these patterns, with 46 terms including logical axioms referencing 17 unique ECTO exposure terms.

Table 1. Exposure-related Mondo patterns. All exposure patterns can be found on the Mondo GitHub page (https://github.com/monarch-initiative/mondo/tree/master/src/patterns/dosdp-patterns).

| Pattern name | Included | Not included | Logical axiom | Vars | Example disease |
|---|---|---|---|---|---|
| Poisoning.yaml | Diseases caused by exposure to a chemical or mixture that meets the threshold to cause poisoning or intoxication. | Diseases that include exposure to a chemical or mixture but that do not reach the threshold of poisoning or intoxication. | '''poisoning'' and ''realized in response to stimulus'' some %s' | stimulus | colchicine poisoning (MONDO:0017859) |
| Substance_abuse.yaml | Behavioral diseases that include the abuse of a chemical substance. | Diseases that do not include a behavioral substance abuse component. | '''substance abuse'' and ''realized in response to stimulus'' some %s' | stimulus | amphetamine abuse (MONDO:0003969) |
| Realized_in_response_to_environmental_exposure.yaml | Disease states that are directly realized due to exposure to an environmental condition, chemical, or mixture. Includes reference terms from Mondo and ECTO. | Diseases that are not a direct result of an environmental exposure. Diseases caused by an infectious agent. | '%s and (''realized in response to'' some %s)' | disease, exposure | alcoholic cardiomyopathy (MONDO:0006643) |
| Infectious_disease_by_agent.yaml | Diseases caused directly by an infectious agent. | Diseases not caused by exposure to an infectious agent (organism, virus, viroid etc.). | '''infectious disease'' and ''disease has infectious agent'' some %s' | agent | Toxoplasmosis (MONDO:0005989) |

5. Future Directions

Building models for exposure risk and disease causality has been challenging due to the heterogeneity and lack of interoperability across agricultural, toxicological, and clinical data [25,26]. The model outlined here is a preliminary foundation for how exposure-influenced diseases can be described in a computable fashion. The four patterns presented here can be used to establish exposure-based disease classes for the Mondo Disease Ontology, and similar approaches could likely be translated into other ontologies.
Our modeling structure can be used for chemical, nutrient, and other environmental exposures and their impact on phenotypes, disease, and gene function. This modular approach supports adaptation of exposure sources and types while also allowing multiple different exposures to be integrated for a comprehensive mapping of exposures to outcomes. Beyond the proposed model, further work is needed to include variables for comprehensive exposure-to-health modeling, such as estimated or known exposure values (e.g., residual agricultural chemicals consumed in an average diet), estimated or known temporality of exposures, and biologically active exposure dosages for toxic effects.

We plan to utilize this semantic framework for integrating a wide range of dietary and other exposures for predictive analytics, inference of causality, and to inform mitigation of exposures. The goal is to integrate clinical data and biomarkers of exposure with data collected via self-reported surveys, which are commonly used for dietary data collection and as estimation tools for personal environmental exposures. Additionally, harmonization of this model with other existing resources for describing adverse outcome pathways and ecotoxicology, such as those presented by Myklebust et al. [27], would offer substantial data integration for inference development.

Using this semantic framework, we will be able to populate a knowledge graph that leverages content found in numerous biomedical ontologies alongside instance-level data from surveys, clinical data, and more. Future efforts will focus on improving the accuracy with which exposure events can be documented to include temporality, dosage, and resulting environmental and health outcomes. In turn, these efforts are intended to support methods for risk estimation of disease and phenotype outcomes given predicted or known environmental exposures.
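As a rough illustration of the kind of inference such a knowledge graph is meant to support, the sketch below stores a few subject-predicate-object triples in memory and proposes candidate exposure-disease links when two exposure stimuli share a mechanism, following the acetylcholinesterase inhibitor example described for Figure 1C. Everything here is an assumption made for illustration: the labels and predicate strings are simplified, ‘chemical X’ is a hypothetical second chemical, and `candidate_associations` is not part of any existing tool or of the authors' implementation.

```python
from collections import defaultdict

# (subject, predicate, object) triples. Labels and predicates are illustrative
# simplifications, not identifiers from ECTO, ChEBI, GO, or Mondo.
TRIPLES = {
    ("exposure to chlorpyrifos", "has exposure stimulus", "chlorpyrifos"),
    ("chlorpyrifos", "inhibits", "acetylcholinesterase activity"),
    ("exposure to chlorpyrifos", "associated with", "cognitive disorder"),
    # Hypothetical second chemical, invented for this sketch.
    ("exposure to chemical X", "has exposure stimulus", "chemical X"),
    ("chemical X", "inhibits", "acetylcholinesterase activity"),
}


def build_index(triples):
    """Map (subject, predicate) to the set of objects."""
    idx = defaultdict(set)
    for s, p, o in triples:
        idx[(s, p)].add(o)
    return idx


def candidate_associations(triples):
    """Propose exposure-disease links for exposures whose stimulus shares a
    mechanism with a stimulus that is already linked to a disease."""
    idx = build_index(triples)
    exposures = {s for s, p, _ in triples if p == "has exposure stimulus"}

    def mechanisms(exposure):
        return {
            m
            for stimulus in idx[(exposure, "has exposure stimulus")]
            for m in idx[(stimulus, "inhibits")]
        }

    inferred = set()
    for known in exposures:
        diseases = idx[(known, "associated with")]
        if not diseases:
            continue
        for other in exposures - {known}:
            if not (mechanisms(known) & mechanisms(other)):
                continue
            for disease in diseases:
                if disease not in idx[(other, "associated with")]:
                    inferred.add((other, "candidate association with", disease))
    return inferred


for triple in sorted(candidate_associations(TRIPLES)):
    print(triple)
```

The result is deliberately labeled a candidate association: a shared mechanism suggests a hypothesis to examine, not an established causal link, and a fuller model would also need the dosage and temporality information discussed above.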
The ECTO repository: https://github.com/EnvironmentOntology/environmental-exposure-ontology
The Exposure-Disease wiki: https://github.com/EnvironmentOntology/environmental-exposure-ontology/wiki/Exposure-disease-model

6. References

[1] W. C. Rhoades, The History and Use of Agricultural Chemicals. Fla Entomol 46.4 (1963): 275–277.
[2] G. M. Calvert, W. A. Alarcon, A. Chelminski, M. S. Crowley, R. Barrett, A. Correa, et al., Case report: three farmworkers who gave birth to infants with birth defects closely grouped in time and place-Florida and North Carolina, 2004-2005. Environ Health Perspect. 115 (2007): 787–791.
[3] J. de Cock, D. Heederik, F. Hoek, J. Boleij, H. Kromhout, Urinary excretion of tetrahydrophtalimide in fruit growers with dermal exposure to captan. Am J Ind Med. 28 (1995): 245–256.
[4] R. R. Boyles, A. E. Thessen, A. Waldrop, Ontology-based data integration for advancing toxicological knowledge. Current Opinion in Toxicology. 16 (2019): 67–74. https://doi.org/10.1016/j.cotox.2019.05.005
[5] A. E. Thessen, C. J. Grondin, R. D. Kulkarni, S. Brander, L. Truong, N. A. Vasilevsky, et al., Community Approaches for Integrating Environmental Exposures into Human Models of Disease. Environ Health Perspect. 128 (2020): 125002.
[6] R. Hoehndorf, M. Dumontier, G. V. Gkoutos, Evaluation of research in biomedical ontologies. Brief Bioinform. 14 (2013): 696–712.
[7] Environmental Conditions, Treatments, and Exposures Ontology (ECTO), 2020. URL: http://www.obofoundry.org/ontology/ecto.html.
[8] GitHub Repository - Environmental Conditions, Treatments, and Exposures Ontology (ECTO), 2020. URL: https://github.com/EnvironmentOntology/environmental-exposure-ontology.
[9] C. J. Mattingly, T. E. McKone, M. A. Callahan, J. A. Blake, E. A. C. Hubal, Providing the missing link: the exposure science ontology ExO. Environ Sci Technol. 46 (2012): 3046–3053.
[10] J. Hastings, G. Owen, A. Dekker, M. Ennis, N. Kale, V. Muthukrishnan, et al., ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res. 44 (2016): D1214–D1219.
[11] Gene Ontology Consortium, The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 49 (2021): D325–D334.
[12] Gene Ontology Resource, 2020. URL: http://geneontology.org/.
[13] S. Federhen, The NCBI Taxonomy database. Nucleic Acids Res. 40 (2012): D136–43.
[14] NCBI – Taxonomy, 2021. URL: https://www.ncbi.nlm.nih.gov/taxonomy.
[15] FoodOn: A farm to fork ontology, 2020. URL: https://foodon.org/.
[16] D. M. Dooley, E. J. Griffiths, G. S. Gosal, P. L. Buttigieg, R. Hoehndorf, M. C. Lange, et al., FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration. npj Science of Food. 2 (2018): 1–10.
[17] Human Phenotype Ontology, 2020. URL: https://hpo.jax.org/app/.
[18] S. Köhler, M. Gargano, N. Matentzoglu, L. C. Carmody, D. Lewis-Smith, N. A. Vasilevsky, et al., The Human Phenotype Ontology in 2021. Nucleic Acids Res. 49 (2021): D1207–D1217.
[19] Mondo Disease Ontology, 2021. URL: http://mondo.monarchinitiative.org.
[20] OBO Relation Ontology, 2021. URL: https://oborel.github.io/.
[21] G. D. A. Guardia, R. Z. N. Vêncio, C. R. G. de Farias, A UML profile for the OBO relation ontology. BMC Genomics. 13 (2012): Suppl 5: S3.
[22] Chlorpyrifos Facts, 2021. URL: https://www.panna.org/resources/chlorpyrifos-facts.
[23] R. D. Burke, S. W. Todd, E. Lumsden, R. J. Mullins, J. Mamczarz, W. P. Fawcett, et al., Developmental neurotoxicity of the organophosphorus insecticide chlorpyrifos: from clinical findings to preclinical models and potential mechanisms. J Neurochem. 142 (2017) Suppl 2: 162–177.
[24] D. Osumi-Sutherland, M. Courtot, J. P. Balhoff, C. Mungall, Dead simple OWL design patterns. J Biomed Semantics. 8 (2017): 18.
[25] L. Chan, N. Vasilevsky, A. Thessen, J. McMurry, M. Haendel, The landscape of nutri-informatics: a review of current resources and challenges for integrative nutrition research. Database. (2021). doi:10.1093/database/baab003
[26] T. Hartung, R. E. FitzGerald, P. Jennings, G. R. Mirams, M. C. Peitsch, A. Rostami-Hodjegan, et al., Systems Toxicology: Real World Applications and Opportunities. Chem Res Toxicol. 30 (2017): 870–882.
[27] E. B. Myklebust, E. Jimenez-Ruiz, J. Chen, R. Wolf, Tera: the toxicological effect and risk assessment knowledge graph. arXiv preprint arXiv. (2019). URL: https://arxiv.org/abs/1908.10128