Evaluation of a Semantic-Oriented Approach to Cross-Lingual Ontology Mapping

Bo Fu, Rob Brennan, Declan O’Sullivan
Knowledge and Data Engineering Group, School of Computer Science and Statistics, Trinity College Dublin, Ireland
{bofu, rob.brennan, declan.osullivan}@cs.tcd.ie

ABSTRACT
Most ontology mapping research has focused on the matching of ontologies written in the same natural language, and on developing tools and techniques that support this monolingual ontology mapping process. However, as knowledge modelling is not restricted to the usage of a single natural language, mapping systems must be able to operate upon ontologies that are labelled in diverse natural languages. This paper outlines a semantic-oriented cross-lingual ontology mapping framework that makes use of several information sources to influence the selection of ontology label translations in the process of generating high quality mapping results, and presents a high-level overview of the evaluation strategy of the proposed framework.

Keywords
Cross-Lingual Ontology Mapping; Appropriate Ontology Label Translation; Multilingual Ontologies.

1. INTRODUCTION
Benjamins et al. [1] identify multilinguality as one of the great challenges for the semantic web, and point out that one way to address this challenge is by providing assistance for the annotation of ontologies regardless of the natural languages used in them. However, to date, research in the field of ontology mapping has largely focused on the matching of ontologies labelled in the same natural language, where various monolingual ontology matching techniques have been developed, as documented by Euzenat & Shvaiko [2]. With ontologies being widely accepted as a knowledge management mechanism in multilingual organisations [3] and used in a range of applications including machine translation [4], information retrieval (IR) [5] and cross-lingual IR [6], multilinguality is increasingly evident in ontologies. One way to enable knowledge discovery, sharing and reuse across natural language barriers in ontology-based systems is by means of cross-lingual ontology mapping (CLOM).

This paper proposes the semantic-oriented cross-lingual ontology mapping (SOCOM) framework and presents a high-level overview of its evaluation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
EKAW 2010, October 11-15, 2010, Lisbon, Portugal.

2. THE SOCOM FRAMEWORK
The semantic-oriented cross-lingual ontology mapping (SOCOM) framework is designed specifically for cross-lingual mapping tasks carried out in multilingual environments. It first transforms one of the given ontologies into an equivalent of itself that is labelled in the natural language used by the other(s), and then applies existing monolingual matching techniques. The transformation of an ontology requires the translation of ontology labels from the source natural language to the target natural language, whereby the notion of appropriate ontology label translation (AOLT) is employed. An AOLT is a translation that is most likely to maximise the success of the subsequent monolingual ontology matching step. The AOLT selection process is therefore concerned with identifying the translations that will most likely enhance the matching ability of monolingual matching techniques, but not necessarily the translations that are linguistically most correct.

To achieve AOLT, several sources of information are used. Firstly, the source ontology semantics are used to indicate the context of use for the to-be-translated resource labels. Given a certain position of a node, the labels of its surrounding nodes (i.e. context) can be analysed. For example, for a class node, the labels of its super/sub/sibling-classes can illustrate its context of use. Secondly, since the source ontology is transformed so that it can be best mapped to the target ontology, the target ontology semantics can be perceived as translation selection guidelines. For example, when several candidate translations are linguistically correct for a label, its AOLT is the one that is closest to what is used in the target ontology. Thirdly, mapping intent captures the user’s motive in a CLOM scenario. For example, when working in a highly refined domain such as medicine, achieving highly precise matches would be the priority, whereas when merging knowledge repositories, gaining reasonable recall in the matches generated may be desired. With known intent, the SOCOM framework selects the most suitable translation source(s) in order to generate mappings with high precision and/or recall. Fourthly, background knowledge on the ontology domains can be drawn upon, which can be system specified or user specified. In other words, encyclopedias or users can assist the AOLT process by providing additional context of use. Fifthly, to draw on user expertise, the SOCOM framework allows a user to specify preferred translation sources and/or matching algorithms. Sixthly, mapping assessment is used as a feedback mechanism in the SOCOM framework, whereby statistics containing top-rated translation sources and/or matching techniques are collected to aid the future execution of the framework. This feedback can be implicit or explicit. Implicit
feedback is generated when the system assumes certain matches are correct and identifies the most effective tools based on that assumption. Explicit feedback is generated by the users and is more reliable. Seventhly, time constraints may limit the run time for the AOLT process. For example, when rapid execution is desired, the user can turn certain features on or off dynamically. Lastly, not all of the aforementioned resources will always be available in every CLOM scenario. Resource constraints may therefore restrict the level of sophistication of the AOLT selection process.

3. EVALUATION STRATEGY
A state of the art review was conducted first to identify current approaches to CLOM. Through this review process, a generic approach to CLOM was identified and implemented that uses off-the-shelf machine translation tools and monolingual ontology matching techniques. To investigate the effectiveness and to identify potential limitations of this generic approach to CLOM, it was evaluated in two CLOM scenarios involving ontologies written in Chinese, English and French. These ontologies contain approximately one hundred entities and are of the semantic research community and the bibliography domain. Results from these experiments showed that mappings can be missed by monolingual matching tools when entity labels are translated independently from the ontologies of interest. When the translations of ontology labels are carried out in isolation from the CLOM tasks at hand, inadequate and synonymic translations can introduce further complications to the subsequent monolingual matching step.

Based on this finding, the notion of appropriate ontology label translation arose. An initial framework prototype was implemented that makes use of the readily defined semantics of the given ontologies in a CLOM scenario. This prototype was evaluated against the generic approach in the aforementioned CLOM scenarios using the same multilingual ontologies and gold standards. Experimental results showed that the SOCOM framework generated higher quality mapping results than the generic approach due to its ability to select translations that are similar to those used by the target ontology in a specific CLOM setting.

Motivated by this initial result, a second framework prototype was then designed and implemented to draw on additional inputs (discussed in section 2) in the AOLT selection process, effectively allowing fine tuning of the system. This second prototype was evaluated against the generic approach in the same CLOM experiments involving the aforementioned multilingual ontologies. Various combinations of the AOLT influence sources were executed in a range of experimental runs of the framework, and several sets of mappings were generated. Versatility in these mapping results demonstrated the flexibility of the AOLT selection mechanism and showcased the tuning ability of the SOCOM framework.

Furthermore, as the experiments discussed above only concern ontologies of relatively small sizes, to assess the scalability of the framework, the second prototype was applied in a real-world CLOM setting involving large organisational ontologies written in English and German. These ontologies contained over 7000 entities and were generated semi-automatically using enterprise data of the technical customer support domain. More details of how these ontologies are generated can be found in [7]. Mappings were then generated using the SOCOM framework between these large multilingual ontologies in English and German. These mapping results then enabled cross-lingual document retrieval of an adaptive personalised result composition and presentation system. Bilingual users can issue queries in German and retrieve relevant as well as personalised content in English. More details of this information retrieval and composition system can be found in [8].

Lastly, in all the experiments carried out, precision, recall and f-measure scores were calculated to evaluate the quality of mappings generated. In addition, statistical analysis, namely two-tailed t-tests, was carried out on the scores generated by the SOCOM framework and the generic approach in order to validate the statistical significance of the experimental findings.

4. ACKNOWLEDGMENT
This research is partially supported by Science Foundation Ireland (Grant 07/CE/11142) as part of the Centre for Next Generation Localisation (http://www.cngl.ie) at Trinity College Dublin.

5. REFERENCES
[1] Benjamins R. V., Contreras J., Corcho O., Gomez-Perez A. 2004. Six Challenges for the Semantic Web. AIS SIGSEMIS Bulletin, Vol. 1, Iss. 1, 2004.
[2] Euzenat J., Shvaiko P. 2007. Ontology Matching. Springer-Verlag Berlin/Heidelberg.
[3] Chang C., Lu W. 2002. The Translation of Agricultural Multilingual Thesaurus. In Proceedings of the 3rd Asian Conference for Information Technology in Agriculture (Beijing, China, October 26-28, 2002), 526-528.
[4] Shi C., Wang H. 2005. Research on Ontology-Driven Chinese-English Machine Translation. In Proceedings of 2005 IEEE International Conference on Natural Language Processing & Knowledge Engineering (Wuhan, China, October 30 – November 01, 2005), 426-430. DOI=10.1109/NLPKE.2005.1598775
[5] Guan J., Deng J., Qu Y. 2005. An Ontology-Driven Information Retrieval Mechanism for Semantic Information Portals. In Proceedings of 1st International Conference on Semantic, Knowledge and Grid (Beijing, China, November 27-29, 2005). SKG. IEEE Computer Society, Washington, DC, 63. DOI=http://dx.doi.org/10.1109/SKG.2005.42
[6] Zhang L., Wu G., Xu Y., Li W., Zhong Y. 2004. Multilingual Collection Retrieving Via Ontology Alignment. In Proceedings of the 7th International Conference on Asian Digital Libraries (Shanghai, China, December 13-17, 2004), LNCS 3334, 939-957. DOI=10.1007/978-3-540-30544-6_57
[7] Şah M., Wade V. 2010. Automatic Metadata Extraction from Multilingual Enterprise Content. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (Toronto, Canada, October 26-30, 2010), to appear.
[8] Steichen B., Wade V. 2010. Adaptive Retrieval and Composition of Socio-Semantic Content for Personalised Customer Care. In Proceedings of International Workshop on Adaptation in Social and Semantic Web (Big Island of Hawaii, USA, June 21, 2010), 1-10, ISSN 1613-0073.
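
Illustrative sketch. The paper states that, among linguistically correct candidate translations, the AOLT is the one closest to what is used in the target ontology, and that generated mappings are scored by precision, recall and f-measure against gold standards. The Python below is a minimal sketch of those two ideas under stated assumptions; it is not the SOCOM implementation. The function names (similarity, select_aolt, precision_recall_f), the use of difflib string similarity as the closeness measure, and the example labels are all hypothetical choices made for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1].

    Assumption: a plain character-based measure stands in for whatever
    closeness measure a real system would use.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_aolt(candidates: list[str], target_labels: list[str]) -> str:
    """Pick the candidate translation closest to any target ontology label."""
    return max(
        candidates,
        key=lambda c: max(similarity(c, t) for t in target_labels),
    )


def precision_recall_f(generated: set, gold: set) -> tuple[float, float, float]:
    """Score a set of generated mappings against a gold-standard set."""
    correct = len(generated & gold)
    precision = correct / len(generated) if generated else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical candidate translations for one source label, and
    # hypothetical labels taken from the target ontology.
    candidates = ["writer", "author", "composer"]
    target_labels = ["Author", "Paper", "Conference"]
    print(select_aolt(candidates, target_labels))

    # Hypothetical mappings as (source entity, target entity) pairs.
    generated = {("a", "x"), ("b", "y"), ("c", "z")}
    gold = {("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")}
    print(precision_recall_f(generated, gold))
```

In this sketch only the target ontology labels influence the choice. The other influence sources described in Section 2 (source context, mapping intent, background knowledge, user preferences, feedback, time and resource constraints) would have to enter as further terms in the selection key.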