FoodOn: A Semantic Ontology Approach for Mapping Foodborne Disease Metadata

Dalia A. Alghamdi, Damion M. Dooley, Gurinder Gosal, Emma J. Griffiths, Fiona S.L. Brinkman and William W.L. Hsiao

1 BC Centre for Disease Control, 655 W 12th Ave, Vancouver, BC V5Z 4R4, Canada
2 University of British Columbia, 2329 West Mall, Vancouver, BC V6T 1Z4, Canada
3 Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada

* To whom correspondence should be addressed: Dalia.alghamdi@bccdc.ca

ABSTRACT

The FoodOn Food Ontology contains standardized terms and a facet-based classification scheme for describing food products, processing and environments. Mapping of foodborne pathogen isolate source information (descriptors of the contaminated materials and locations) to the FoodOn standard can facilitate data sharing and integration between multi-jurisdictional health and regulatory agencies that use disparate software platforms and data dictionaries. Faster and more efficient sharing of information is critical for tracking and controlling outbreaks of foodborne disease at local, national and international levels. This work describes mapping procedures that organizations and software developers can use to improve interoperability between foodborne pathogen surveillance and outbreak management systems.

INTRODUCTION

Globalization of food networks increases opportunities for the spread of foodborne pathogens beyond borders and jurisdictions, with major impacts on global health and economies (Altekruse & Swerdlow, 1996; World Health Organization, 2008). Whole genome sequencing (WGS) provides the highest resolution evidence for identifying, typing and matching foodborne pathogen isolates from different sources. WGS results must be combined with source information to be meaningfully interpreted for regulatory and health interventions, outbreak investigation, and risk assessment. Isolate metadata (the source of a pathogen) is critical for determining mode of disease transmission, sources of exposure and risk, susceptible populations, geographical distribution and more. Public health and regulatory agencies not only use different analytical platforms to track and resolve outbreaks, but also use different data dictionaries and free text descriptions to describe isolates and exposures.

The most important factor in reducing the number of preventable cases of disease is the timeliness of investigations and responses. Timeliness suffers from the time-consuming re-coding and manual curation required to translate non-standardized information between systems and agencies. To address the interoperability problem, it is important to relate similar concepts or relations from one agency, information management system or jurisdiction to another. Mapping terms using an ontology is a powerful solution for standardizing and integrating heterogeneous data.

FoodOn (http://foodon.org) is an ontology resource that aims to model the food domain, which includes knowledge about food and food-related human activities, such as agriculture, medicine, food safety inspection, shopping patterns and sustainable development (Griffiths et al., 2016). Mapping of foodborne pathogen isolate metadata to FoodOn can provide a means for standardizing, translating, and communicating this critical contextual information between health agencies and platforms in a timely fashion. Here we describe a semi-automated method derived from mapping metadata from the widely used online microbial MLST typing platform Enterobase, which can be broadly applied to other use cases.

FOODON DESIGN PRINCIPLES

Although there are several existing indexing systems directly or indirectly related to food and foodborne illness, including those maintained by Health Canada, the US Department of Agriculture, and the UN's Food and Agriculture Organization, they have been built for different purposes, and differences in their architecture hinder interoperability. To provide a more comprehensive view of food safety, data from these various sources must be integrated. In a concerted effort to solve this semantic interoperability problem, the OBOFoundry.org family of ontologies was established in 2007 to provide a comprehensive set of vocabularies in the biomedical domain. FoodOn, built largely on a longstanding American and European facet-based food indexing system called LanguaL (http://langual.org), provides a list of over 2,000 plant and animal food ingredient terms, as well as a supplemental list of over 9,000 indexed food products. Facets include fields for describing food processing, cooking and preservation, as well as source ingredient anatomy, taxonomy, geography and cultural heritage. The aim of FoodOn is to develop an international standard for describing properties of food related to agriculture, animal husbandry, collection, distribution, preservation, culinary use, consumption and food safety. FoodOn was accepted into the OBOFoundry in 2017.

FOODON MAPPING AND DATA HARMONIZATION

Microbial Multilocus Sequence Typing (MLST) is a technique used to classify and identify pathogenic strains for outbreak investigation and surveillance of contamination. Enterobase is a widely used online platform enabling MLST analysis of enteric pathogens such as E. coli, Salmonella, Shigella, Yersinia and Moraxella. Enterobase contains more than 100,000 genomes, along with their source metadata, encompassing food, anatomical and clinical, as well as environmental domains.

Based on curation and mapping of Enterobase isolate metadata, we have developed a semi-automated ontology mapping system that will enable mapping of food safety metadata to FoodOn food products and processing environments according to the following steps:

(1) Syntactic analysis, where each categorical term in a single free text entry is separated.
(2) Semantic mapping of each categorical term according to one of the following rules: (a) mapping to a similar concept; (b) mapping to similar ancestors; (c) mapping to similar relations; (d) combining several matching techniques.
(3) Structural mapping, where the items are mapped to a corresponding subclass in the reference hierarchy.

Non-interactive ontology matching tools can be evaluated using recall and precision (Euzenat & Shvaiko, 2007). Both measures compare the expected results with the results of the evaluated system. Precision is the ratio of correctly matched terms to the total number of matched terms, whereas recall is the ratio of correctly matched terms to the total number of terms expected to be matched. Precision therefore reflects the correctness of the evaluated system, and recall reflects its completeness (Euzenat, 2007). Here, we will evaluate our mapping accuracy by randomly sub-sampling 500 records from Enterobase and assessing the suitability of the assigned terms using recall and precision. We will further manually review all the terms that cannot be mapped to an ontological term and add those terms to appropriate ontologies. The goal of our exercise is to minimize manual intervention when annotating food-related sample sources using FoodOn.

Integrating genomic profiling of foodborne pathogens with a food descriptor framework will help reduce barriers for knowledge exchange among research communities, government risk managers and health providers.

ACKNOWLEDGEMENTS

We thank Mark Achtman, Nabil-Fareed Alikhan, Mark Pallen, Martin Sergeant and Zhemin Zhou for sharing Enterobase.

REFERENCES

Altekruse, S. and Swerdlow, D. (1996). The changing epidemiology of foodborne disease. Am J Med Sci, 311: 23-29.

World Health Organization (2008). Foodborne disease outbreaks: guidelines for investigation and control. WHO Press, Geneva.

Griffiths, E., Dooley, D., Buttigieg, P.L., Hoehndorf, R., Brinkman, F., and Hsiao, W. (2016). FoodOn: A global farm-to-fork food ontology. ICBO Conference. Corvallis, OR, USA.

Euzenat, J. and Shvaiko, P. (2007). Ontology Matching. Springer.

Euzenat, J. (2007). Semantic precision and recall for ontology alignment evaluation. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI'07), Rajeev Sangal, Harish Mehta, and R. K. Bagga (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 348-353.
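
The three mapping steps listed under FOODON MAPPING AND DATA HARMONIZATION can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy vocabulary, the "EX:" identifiers (placeholders, not real FoodOn IDs) and all function names are assumptions made for illustration. Semantic mapping is reduced here to an exact label or synonym match followed by a single-word fallback, which is a deliberate simplification of rules (a) to (d).

```python
import re

# Hypothetical toy vocabulary; the "EX:" IDs are placeholders, not FoodOn identifiers.
TOY_ONTOLOGY = {
    "EX:0001": {"label": "food product", "parent": None, "synonyms": []},
    "EX:0002": {"label": "poultry", "parent": "EX:0001", "synonyms": ["fowl"]},
    "EX:0003": {"label": "chicken", "parent": "EX:0002", "synonyms": ["broiler"]},
    "EX:0004": {"label": "dairy", "parent": "EX:0001", "synonyms": []},
    "EX:0005": {"label": "cheese", "parent": "EX:0004", "synonyms": []},
}


def split_terms(entry):
    """Step 1 (syntactic analysis): separate the terms in one free-text entry."""
    parts = re.split(r"[;,/|]", entry.lower())
    return [p.strip() for p in parts if p.strip()]


def build_index(ontology):
    """Index every label and synonym by its lower-cased text."""
    index = {}
    for term_id, record in ontology.items():
        for name in [record["label"], *record["synonyms"]]:
            index[name.lower()] = term_id
    return index


def map_term(term, index):
    """Step 2 (semantic mapping), simplified: whole-string match first,
    then a match on any single word of the term."""
    if term in index:
        return index[term]
    for word in term.split():
        if word in index:
            return index[word]
    return None  # left for manual review


def ancestors(term_id, ontology):
    """Step 3 (structural mapping): path from the matched class up to the root."""
    path = []
    while term_id is not None:
        path.append(ontology[term_id]["label"])
        term_id = ontology[term_id]["parent"]
    return path


def map_entry(entry, ontology=TOY_ONTOLOGY):
    """Run the three steps on one free-text source description."""
    index = build_index(ontology)
    result = {}
    for term in split_terms(entry):
        term_id = map_term(term, index)
        path = ancestors(term_id, ontology) if term_id else []
        result[term] = (term_id, path)
    return result


mapped = map_entry("Chicken breast; cheese, river water")
for term, (term_id, path) in mapped.items():
    print(term, term_id, " > ".join(reversed(path)))
```

In this sketch, terms that return no identifier (here "river water") correspond to the terms the text sets aside for manual review.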
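
The evaluation described under FOODON MAPPING AND DATA HARMONIZATION (random sub-sampling followed by precision and recall) can be sketched as below. This is an illustration under stated assumptions, not the authors' evaluation code: the function names, the fixed seed, the dictionary layout and the example terms with "EX:" identifiers are all hypothetical. An unmapped term is assumed to be simply absent from the predicted mapping.

```python
import random


def sample_records(records, n=500, seed=0):
    """Draw a reproducible random sub-sample of at most n records."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))


def precision_recall(predicted, expected):
    """Compare system output against a curated reference mapping.

    predicted, expected: dicts from a source term to an ontology ID.
    precision = correct matches / all matches produced by the system
    recall    = correct matches / all matches expected by the curators
    """
    correct = sum(
        1 for term, term_id in predicted.items() if expected.get(term) == term_id
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical curated mapping (expected) and system output (predicted).
expected = {
    "chicken breast": "EX:0003",
    "cheese": "EX:0005",
    "raw milk": "EX:0004",
    "turkey": "EX:0002",
}
predicted = {
    "chicken breast": "EX:0003",
    "cheese": "EX:0005",
    "raw milk": "EX:0005",  # wrong class
    # "turkey" was not mapped at all
}

p, r = precision_recall(predicted, expected)
print(f"precision={p:.2f} recall={r:.2f}")
```

With these made-up inputs, two of the three produced matches are correct (precision 2/3) and two of the four expected matches are found (recall 1/2), which shows how a wrong match lowers precision while a missing match lowers only recall.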