Mapping WordNet to the Basic Formal Ontology using the KYOTO ontology

Selja Seppälä [1,∗]
[1] Department of Philosophy, University at Buffalo, USA
[∗] To whom correspondence should be addressed: seljamar@buffalo.edu

Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

1 INTRODUCTION

Ontologies are often used in combination with natural language processing (NLP) tools to carry out ontology-related text manipulation tasks, such as automatic annotation of biomedical texts with ontology terms. These tasks involve categorizing relevant terms from texts under the appropriate categories. This requires coupling ontologies with lexical resources. Several projects have realized these kinds of mappings with upper-level ontologies that are extended by domain-specific ontologies (Gangemi et al., 2010; Laparra et al., 2012; Niles and Pease, 2003; Pease and Fellbaum, 2010). However, no such resource is available for the Basic Formal Ontology (BFO), which is widely used in the biomedical domain. [fn 1]

We describe and evaluate a semi-automatic method for mapping the large lexical network WordNet 3.0 (WN) to BFO 2.0 exploiting an existing mapping between WN and the KYOTO ontology, which includes an upper-level ontology similar to BFO. Our hypothesis is that a large portion of WN, primarily nouns and verbs, can be semi-automatically mapped to BFO 2.0 types by means of simple mapping rules exploiting another ontology already linked to WN.

2 ONTOLOGICAL AND LEXICAL RESOURCES

The Basic Formal Ontology (BFO) is a domain-neutral upper-level ontology (Smith et al., 2012). It represents the types of things that exist in the world and relations between them. BFO serves as an integration hub for mid-level and domain-specific ontologies, such as the Ontology for Biomedical Investigations (OBI) and the Cell Line Ontology (CLO), which thus become interoperable (Smith and Ceusters, 2010). BFO is subdivided into CONTINUANTS (e.g., OBJECTS and FUNCTIONS) and OCCURRENTS (e.g., PROCESSES and EVENTS). Continuants can be either independent (e.g., physical OBJECTS like persons and hearts) or dependent (e.g., the ROLE of a person as a physician and the FUNCTION of a heart to pump blood). The most recent version, BFO 2.0, represents 35 types to which previous versions (BFO 1.0 and BFO 1.1) have been mapped in Seppälä et al. (2014).

WordNet 3.0 is a large lexical network linking over 117000 sets of synonymous English words (synsets) by means of semantic relations; it is widely used in NLP tasks (Fellbaum, 1998). Noun and verb synsets are linked via the hypernym relation. [fn 2] WN 3.0 distinguishes between types and instances, meaning named entities. It also links a subset of synsets to topic domains (e.g., ‘medicine’) and semantic labels (e.g., the ‘noun.artifact’ lexicographer file contains “nouns denoting man-made objects” [fn 3]).

The KYOTO ontology (hereafter KYOTO) is part of a project aimed at representing domain-specific terms in a computer-tractable axiomatized formalism to allow machines to reason over texts in natural language (Vossen et al., 2010). It links WordNets of different languages to ontology classes, on the basis of a mapping of the English WN to KYOTO. The approximately 2000 classes of KYOTO are subdivided into three layers: (1) The top-most layer is based on the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE-Lite-Plus, version 3.9.7) and OntoWordNet (Gangemi et al., 2003). DOLCE shares a number of relevant characteristics with BFO: domain neutrality; bi-partition into ‘endurants’ (CONTINUANTS) and ‘perdurants’ (OCCURRENTS); strict hierarchical is_a taxonomy; distinction between independent and dependent entities. (2) The second layer is composed of noun and verb synsets constituting a set of Base Concepts (BCs). (3) The third layer contains domain-specific classes (e.g. from the environmental domain).

3 MAPPING METHOD

Our semi-automatic mapping method involves three main steps:

1. Manually creating mappings:
   • from KYOTO to BFO on the basis of existing mappings of DOLCE to BFO 1.0 and BFO 1.1 (Grenon, 2003; Khan and Keet, 2013; Seyed, 2009; Temal et al., 2010), ignoring the axiomatization incompatibilities;
   • from BFO 1.0 and BFO 1.1 to BFO 2.0 on the basis of work in Seppälä et al. (2014);
   • from WN semantic labels to BFO 2.0.
2. Manually creating mapping rules using the above mappings and extending them with more specific rules from other KYOTO types.
3. Implementing the 33 resulting mapping rules in a Python pipeline using the natural language toolkit for Python that integrates WN 3.0 (NLTK 3.0). [fn 4]

The rules are of the form: ‘KYOTO/WN > BFO 2.0’, for example:

   ‘#non-agentive-social-object > disposition’
   ‘accomplishment > process’
   ‘noun.act > process’

The implementation first lists all KYOTO types that subsume a WN synset using the WN-KYOTO mapping data files. [fn 5] For example, the synset immunity.n.02 is linked to:

   ‘Kyoto#condition__status-eng-3.0-13920835-n’,
   ‘Kyoto#state-eng-3.0-00024720-n’,
   ‘ExtendedDnS.owl#situation’,
   ‘ExtendedDnS.owl#non-agentive-social-object’,
   ‘ExtendedDnS.owl#social-object’,
   ‘DOLCE-Lite.owl#non-physical-object’,
   ‘DOLCE-Lite.owl#non-physical-endurant’,
   ‘DOLCE-Lite.owl#endurant’,
   ‘DOLCE-Lite.owl#spatio-temporal-particular’,
   ‘DOLCE-Lite.owl#particular’

Second, the mapping rules are applied starting from the more specific ones (BFO leaf nodes): the program tests if a given string (e.g., ‘#non-agentive-social-object’) matches a string in the types list; if the strings match, the program assigns to that synset the corresponding BFO 2.0 type (e.g., ‘disposition’). Thus, the synset immunity.n.02 is categorized as referring to a subtype of the BFO type DISPOSITION.

4 EVALUATION AND RESULTS

We manually evaluated the method on the 106 synsets in KYOTO marked with a ‘medicine’ topic domain. 72% of the assigned BFO types were correct (63% of the synsets were assigned the expected BFO type; 8% a superclass). As hypothesized, all the correctly categorized synsets were nominal and verbal. 27% of the assigned BFO types were incorrect (mostly adjectives). One synset was not matched by any rule.

5 DISCUSSION

WN is too large to be manually mapped to BFO. Using the properties of the hypernym hierarchy, we could have approached the problem by mapping the top levels of WN to the relevant BFO types, and propagating the mapped BFO types downwards. However, WN’s organization fails to comply with basic ontological principles (Gangemi et al., 2010). Moreover, that method would only cover nouns and verbs, while KYOTO also includes adjectives.

Mapping DOLCE to BFO is not trivial: their categories do not align in every case and are in some cases governed by different axioms. The former is meant to capture our use of language and conceptualization of the world; the latter is a realist ontology and excludes from its scope unicorns and other putative non-real entities. However, these differences will not matter for our purposes here.

Mapping WN to BFO is not trivial: WN represents linguistic usage; BFO, entities in the world. WN thus includes synsets that, in BFO terms, do not refer (at all or to a BFO type, e.g. positive.a.04). 10 synsets in the evaluation set posed categorization issues.

Our solutions to these issues are: (1) to extend the coverage of the rules by adding other types included in KYOTO and WN’s semantic labels; (2) to ignore the axiomatizations. Indeed, this work is neither aimed at mapping DOLCE to BFO, nor at axiomatizing WN. Instead, we attempt to answer the question: to what types of entities do WN synsets refer? The resulting mappings are to be read as ‘a WN synset X refers to something that is a subtype of BFO type Y’, as in ‘the synset immunity.n.02 refers to a subtype of the BFO type DISPOSITION’ (we exclude instances for now). Even a partial mapping should be sufficient to cover a large portion of WN, leaving a smaller subset of problematic cases. An interesting challenge will be to provide BFO-compliant interpretations of unmatched WN synsets.

6 CONCLUSION AND FUTURE WORK

We presented a method to semi-automatically map WordNet 3.0 synsets to BFO 2.0 types via the KYOTO ontology. Our preliminary results are encouraging, but more work is needed to see if the method scales to the full WN. Future work will include: extending the evaluation set of medical synsets using hyponymy relations and other domain resources; carrying out more thorough evaluations, e.g., by randomly extracting samples of synsets grouped by part of speech; augmenting the mapping rules by exploiting other resources, e.g., WN-SUMO mappings and ontologies extending BFO.

ACKNOWLEDGEMENTS

Work on this paper was supported by the Swiss National Science Foundation (SNSF). Thanks also to Christopher Crowner, Barry Smith, and Alan Ruttenberg.

REFERENCES

Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.
Gangemi, A., Guarino, N., Masolo, C., and Oltramari, A. (2003). Sweetening WordNet with DOLCE. AI magazine, 24(3), 13–24.
Gangemi, A., Guarino, N., Masolo, C., and Oltramari, A. (2010). Interfacing WordNet with DOLCE: towards OntoWordNet. In C.-r. Huang, N. Calzolari, and A. Gangemi, editors, Ontology and the Lexicon: A Natural Language Processing Perspective, pages 36–52. Cambridge University Press.
Grenon, P. (2003). BFO in a Nutshell: A Bi-categorial Axiomatization of BFO and Comparison with DOLCE. IFOMIS Report 06/2003. Technical report, Institute for Formal Ontology and Medical Information Science (IFOMIS), University of Leipzig, Leipzig, Germany.
Khan, Z. C. and Keet, C. M. (2013). Addressing issues in foundational ontology mediation. In Proceedings of KEOD’13, pages 5–16, Vilamoura, Portugal. SCITEPRESS.
Laparra, E., Rigau, G., and Vossen, P. (2012). Mapping WordNet to the Kyoto ontology. In LREC, pages 2584–2589.
Niles, I. and Pease, A. (2003). Linking Lexicons and Ontologies: Mapping Wordnet to the Suggested Upper Merged Ontology. In Proceedings of the IEEE International Conference on Information and Knowledge Engineering, pages 412–416.
Pease, A. and Fellbaum, C. (2010). Formal ontology as interlingua: The SUMO and WordNet linking project and global WordNet. In C.-r. Huang, N. Calzolari, and A. Gangemi, editors, Ontology and the Lexicon: A Natural Language Processing Perspective. Cambridge University Press.
Seppälä, S., Smith, B., and Ceusters, W. (2014). Applying the Realism-Based Ontology-Versioning Method for Tracking Changes in the Basic Formal Ontology. In 8th International Conference on Formal Ontology in Information Systems (FOIS 2014), Rio de Janeiro, Brazil.
Seyed, A. P. (2009). BFO/DOLCE Primitive Relation Comparison. In Nature Precedings.
Smith, B. and Ceusters, W. (2010). Ontological Realism: A Methodology for Coordinated Evolution of Scientific Ontologies. Applied Ontology, 5, 139–188.
Smith, B., Almeida, M., Bona, J., Brochhausen, M., Ceusters, W., Courtot, M., Dipert, R., Goldfain, A., Grenon, P., Hastings, J., Hogan, W., Jacuzzo, L., Johansson, I., Mungall, C., Natale, D., Neuhaus, F., Rovetto, A. P. R., Ruttenberg, A., Ressler, M., and Schulz, S. (2012). Basic Formal Ontology 2.0: DRAFT SPECIFICATION AND USER’S GUIDE.
Temal, L., Rosier, A., Dameron, O., and Burgun, A. (2010). Mapping BFO and DOLCE. Studies In Health Technology And Informatics, 160(Pt 2), 1065–1069.
Vossen, P., Rigau, G., Agirre, E., Soroa, A., Monachini, M., and Bartolini, R. (2010). KYOTO: an open platform for mining facts. In Proceedings of the 6th Workshop on Ontologies and Lexical Resources, pages 1–10.

[fn 1] See http://ifomis.uni-saarland.de/bfo/users.
[fn 2] Adjectives and adverbs are linked by way of other semantic relations.
[fn 3] See http://wordnet.princeton.edu/man2.1/lexnames.5WN.html.
[fn 4] Natural Language Toolkit for Python (NLTK), version 3.0, http://www.nltk.org.
[fn 5] http://kyoto-project.eu/xmlgroup.iit.cnr.it/kyoto/index9c60.html?option= com contentview=articleid=429Itemid=156
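
The rule-application step described in Section 3 can be illustrated with a minimal Python sketch. This is not the pipeline used in the paper: the names `RULES`, `IMMUNITY_N_02`, and `assign_bfo_type` are hypothetical, only the three example rules quoted in Section 3 are included (not the full set of 33), and both the rule ordering and the use of substring matching are assumptions about how 'matches a string in the types list' might be realized.

```python
# Illustrative sketch only; not the author's actual pipeline.
# Assumption: rules are ordered from most to least specific.
# Only the three example rules given in the paper are listed here.
RULES = [
    ("#non-agentive-social-object", "disposition"),
    ("accomplishment", "process"),
    ("noun.act", "process"),
]

# KYOTO types listed in the paper for the synset immunity.n.02.
IMMUNITY_N_02 = [
    "Kyoto#condition__status-eng-3.0-13920835-n",
    "Kyoto#state-eng-3.0-00024720-n",
    "ExtendedDnS.owl#situation",
    "ExtendedDnS.owl#non-agentive-social-object",
    "ExtendedDnS.owl#social-object",
    "DOLCE-Lite.owl#non-physical-object",
    "DOLCE-Lite.owl#non-physical-endurant",
    "DOLCE-Lite.owl#endurant",
    "DOLCE-Lite.owl#spatio-temporal-particular",
    "DOLCE-Lite.owl#particular",
]


def assign_bfo_type(labels, rules=RULES):
    """Return the BFO 2.0 type of the first rule whose key occurs in a label.

    `labels` holds the KYOTO types (and, optionally, the WN semantic label)
    attached to one synset. Substring matching is an assumption.
    Returns None when no rule matches.
    """
    for key, bfo_type in rules:
        if any(key in label for label in labels):
            return bfo_type
    return None


print(assign_bfo_type(IMMUNITY_N_02))  # disposition
```

Returning `None` for an unmatched synset mirrors the case reported in Section 4, where one synset was not matched by any rule; a real pipeline would also need to decide how WN semantic labels such as 'noun.act' are obtained for each synset.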