Cardiovascular Health and Physical Activity: A Model for Health Promotion and Decision Support Ontologies

Vimala Ponna
Department of Neurobiology, Physiology, and Behavior
University of California at Davis
Davis, CA
vmponna@ucdavis.edu

Aaron Baer
Department of Food Science and Technology
University of California at Davis
Davis, CA
ambaer@ucdavis.edu

Matthew Lange, PhD
Department of Food Science and Technology
University of California at Davis
Davis, CA
mclange@ucdavis.edu

Abstract— Current cardiovascular disease decision support systems (DSS) rely primarily on ontologies that characterize and quantify disease, recommending appropriate pharmacotherapy (PT) and/or surgical interventions (SI). PubMed and Google Scholar searches reveal no specific ontologies or literature related to DSS for recommending physical activity (PA) and diet interventions (DI) for cardiovascular health and fitness (CVHF) improvement. This dearth of CVHF-PA/DI structured knowledge repositories has resulted in a scarcity of user-friendly tools for scientifically validated information retrieval about CVHF improvement. Advancement of health science depends on timely development and implementation of health (rather than disease) ontologies. We developed a time-efficient workflow for constructing and maintaining structured knowledge repositories capable of providing informational underpinnings for CVHF-PA/DI ontologies and DSS that support health promotion, including precise, personalized exercise prescription. This workflow creates conceptual lattices about effects of varied PA on CVHF. These conceptual maps lay the foundation for accelerated creation of health-focused ontologies, which ultimately equip DSS with CVHF knowledge related to PA and DI.

INTRODUCTION

Current healthcare ontologies and DSS rely primarily on knowledge relevant to disease risk assessment and treatment and are focused almost entirely on assessing PT and SI. Analogous ontologies and DSS for advancing consumer health via PA do not yet exist. Successful implementations of healthcare ontologies and DSS for recommending specific PT and SI for cardiovascular diseases are built upon databases from clinical trials and patient records, combined with highly curated, hierarchical vocabularies of diseases, diagnoses, PT, and SI. PubMed and Google Scholar searches reveal no scientific literature about healthcare ontologies and consumer DSS for CVHF using analogous systems related to knowledge about PA and DI for health improvement. Medicine today relies heavily on modeling disease, rather than modeling health. Part of the problem is the dearth of queryable, curated, and structured knowledge repositories dedicated to CVHF relative to specific DI and PAs. The immense reserves of information in the scientific literature often require time-consuming data mining, which inhibits timely advancement of health and lifestyle science. User-friendly tools for information retrieval from scientific literature such as research articles, clinical studies, and published texts have yet to be pioneered.

We developed a straightforward, time-efficient structured knowledge repository and scientific workflow capable of providing the foundation for accelerated creation of health-focused ontologies. This semi-automated workflow enables conversion of textual annotations from scientific literature into semantic triples (knowledge propositions). A semantically enabled backend repository stores triples combined from many sources. Knowledge gleaned from multiple, sometimes-conflicting sources is merged, bringing triples from many sources of literature into one conceptual map with visualization of new and unexpected relationships in the form of a conceptual lattice. With the help of Protégé, these conceptual lattices convert to health-focused ontologies, which equip DSS with knowledge regarding PA and DI.

Ultimately, health-focused ontologies and DSS provide patients, physicians, and researchers easy access to knowledge on health trajectories, health improvement, and individual health outcomes. By employing this semi-automated workflow and enabling concept lattice to ontology conversion, we have created an express tool for health-focused data extraction. With this system, modern medicine can embrace the idea of health promotion, rather than disease risk assessment.

DESIGN AND METHODS

We employed open source and commercial off-the-shelf technologies including Zotero [1], Excel [2], MySQL [3], Python [4], Cmap [5], and Protégé [6] as part of the semi-automated workflow for easy data mining and concept lattice extraction from literature. This workflow begins in Zotero’s PDF viewer where human annotation takes place to highlight and note semantic triples of interest in an article as illustrated in Fig. 1. Next, the “extract annotations” tool in Zotero is used to create a .txt file, shown in Fig. 2, of the annotations made. The information in this .txt file is then transferred to Excel where a macro parses the annotations into four columns as represented by Fig. 3. The .csv file created in Excel is then imported into a table in MySQL and further parsed into a three-column table shown in Fig. 4. The table in MySQL is exported as a .txt file and imported as “Propositions to text” in Cmap, creating a concept map, part of which can be seen in Fig. 5. Finally, the concept maps obtained from such articles can be exported as .cxl files, reformatted to .owl files, and imported into Protégé for ontology creation.

As an example, we utilized this semi-automated workflow to extract information from “Potential adverse cardiovascular effects from excessive endurance exercise” by O’Keefe et al. and create a conceptual lattice about the effects of PAs with varied types, intensities, durations, and frequencies on CVHF [7]. A total of 177 unique concepts, 49 linking phrases, and 156 propositions were compiled from the article. These concepts are linked to concepts in other maps created from ontologies, for example the Foundational Model of Anatomy ontology [8].

Fig. 1. Article annotations in Zotero’s PDF viewer.

Fig. 5. Two concepts from article in Cmap.

Sustainability plans for the ontology will be developed once we receive initial feedback from the community about paths forward for integration with related ontologies. We have not yet tested this initial ontology.

CONCLUSIONS AND FURTHER RESEARCH

We have created a prototype platform for semi-automated concept lattice generation from data mining that is easy to use, integrates information, and creates a visualization of a knowledge network. It supports health professionals in preventing health problems before they start, bringing an enormous change to the medical industry. Immediate implications of this workflow are the creation of a health-focused ontology for individuals who engage in vigorous exercise and their physicians, who may use it as a teaching tool. The health-focused ontology built on PA can be combined with other health-related ontologies covering PA, DI, and other health improvement methods, as part of a multi-ontology framework to accelerate the development of health promotion [9]. Correlational relationships discovered from integration of multiple ontologies will provide foundations for more research on health promotion.

Further automation of this semi-automated workflow will make health-focused ontology creation even faster and easier to use.
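
The parsing and database steps from Design and Methods are natural candidates for this kind of automation. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical annotation convention in which each triple is written as "concept | linking phrase | concept", uses SQLite from the Python standard library as a stand-in for MySQL, and assumes that Cmap accepts one tab-delimited proposition per line. The table, column, and function names are illustrative only.

```python
import sqlite3

# Hypothetical annotation convention (an assumption, not the paper's format):
# each annotation line reads "concept | linking phrase | concept".
SAMPLE_ANNOTATIONS = """\
excessive endurance exercise | may induce | cardiac remodeling
Excessive endurance exercise | may induce | cardiac remodeling
moderate exercise | improves | cardiovascular health
this line is not a triple
"""


def parse_triples(text, sep="|"):
    """Return (concept, linking phrase, concept) tuples from annotation text."""
    triples = []
    for line in text.splitlines():
        # Collapse whitespace and lowercase each field (crude normalization).
        parts = [" ".join(p.split()).lower() for p in line.split(sep)]
        # Keep only lines with exactly three non-empty fields.
        if len(parts) == 3 and all(parts):
            triples.append(tuple(parts))
    return triples


def store_triples(conn, triples):
    """Store triples in a three-column table, skipping exact duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS proposition ("
        "concept_a TEXT, linking_phrase TEXT, concept_b TEXT, "
        "UNIQUE (concept_a, linking_phrase, concept_b))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO proposition VALUES (?, ?, ?)", triples
    )
    conn.commit()


def summarize(conn):
    """Count unique concepts, linking phrases, and propositions."""
    concepts = conn.execute(
        "SELECT COUNT(*) FROM (SELECT concept_a FROM proposition "
        "UNION SELECT concept_b FROM proposition)"
    ).fetchone()[0]
    links = conn.execute(
        "SELECT COUNT(DISTINCT linking_phrase) FROM proposition"
    ).fetchone()[0]
    props = conn.execute("SELECT COUNT(*) FROM proposition").fetchone()[0]
    return {
        "concepts": concepts,
        "linking_phrases": links,
        "propositions": props,
    }


def export_propositions(conn):
    """Render one tab-delimited proposition per line (assumed Cmap input)."""
    rows = conn.execute(
        "SELECT concept_a, linking_phrase, concept_b "
        "FROM proposition ORDER BY rowid"
    )
    return "\n".join("\t".join(row) for row in rows)


conn = sqlite3.connect(":memory:")
store_triples(conn, parse_triples(SAMPLE_ANNOTATIONS))
print(summarize(conn))
print(export_propositions(conn))
```

Lowercasing and whitespace collapsing are deliberately crude normalization choices. They merge trivially different spellings of the same concept but do nothing about tense or ambiguous wording, which would still need manual review.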
Part of this automation process will employ development of add-on functions within Zotero, eliminating the use of Excel and extracting concepts directly into the database. Additional steps would include crowd-sourcing information by enabling this tool to communicate through web services into cross-disciplinary conceptual lattices. The goal is to develop an environment where, with minimal oversight, one can move from textual annotations into map creation easily. Ultimately, this will lay the foundation for building a large repository of structured knowledge related to PA and provide a model for mapping other human behaviors to individual health outcomes.

However, in working with this prototype semi-automated workflow, errors involving imprecise language and varying tense highlight the need for detailed inspection and refinement of annotations. These errors emphasize areas of ambiguous jargon used in health, which need to be explicitly characterized. Such manual inspections take considerable time and underscore the need for semi-automated concept/linking phrase suggestion mechanisms. Despite its errors, this prototype semi-automated workflow serves as a solution to the pressing need for a fast, accessible, and comprehensible system for improving current knowledge and information about health promotion in medicine.

Fig. 2. Extracted annotations as .txt file using tool within Zotero.

Fig. 3. Extracted annotations parsed to four-column table in Excel.

Fig. 4. Extracted annotations parsed to three-column table in MySQL.

REFERENCES

[1] Roy Rosenzweig Center for History and New Media. Zotero. Retrieved from https://www.zotero.org
[2] Microsoft Office (2016). Microsoft Excel. Retrieved from https://www.microsoftstore.com/store/msusa/en_US/pdp/productID.323021400?s_kwcid=AL!4249!3!105984118253!e!!g!!excel&WT.mc_id=pointitsem+Google+Adwords+5+-+Excel+2016&invsrc=search&ef_id=UsDsQwAAAWWySTEQ:20160701200844:s
[3] Oracle Corporation and/or its affiliates (2016). MySQL. Retrieved from https://www.mysql.com
[4] Python Software Foundation (2016). Python. Retrieved from https://www.python.org
[5] Florida Institute for Human & Machine Cognition (2014). Cmap. Retrieved from http://cmap.ihmc.us
[6] Stanford Center for Biomedical Informatics Research (2016). Protégé. Retrieved from http://protege.stanford.edu
[7] J. O’Keefe et al., “Potential adverse cardiovascular effects from excessive endurance exercise.” Mayo Clinic Proceedings 87.6, pp. 587–595, 2012.
[8] C. Rosse, J. Mejino, “A reference ontology for biomedical informatics: the foundational model of anatomy.” Journal of Biomedical Informatics 36.6, pp. 478–500, 2003.
[9] M. Lange, D. Lemay, J. German, “A multi-ontology framework to guide agriculture and food towards diet and health.” Journal of the Science of Food and Agriculture 87, pp. 1427–1434, 2007.