=Paper= {{Paper |id=None |storemode=property |title=Evaluation of a Semantic-oriented Approach to Cross-lingual Ontology Mapping |pdfUrl=https://ceur-ws.org/Vol-674/Paper134.pdf |volume=Vol-674 |dblpUrl=https://dblp.org/rec/conf/ekaw/FuBO10 }} ==Evaluation of a Semantic-oriented Approach to Cross-lingual Ontology Mapping== https://ceur-ws.org/Vol-674/Paper134.pdf
Evaluation of a Semantic-Oriented Approach to Cross-Lingual Ontology Mapping
Bo Fu, Rob Brennan, Declan O’Sullivan
Knowledge and Data Engineering Group, School of Computer Science and Statistics,
Trinity College Dublin, Ireland
{bofu, rob.brennan, declan.osullivan}@cs.tcd.ie

ABSTRACT
Most ontology mapping research has focused on the matching of ontologies
written in the same natural language, and on developing tools and
techniques that support this monolingual ontology mapping process.
However, as knowledge modelling is not restricted to the usage of a single
natural language, mapping systems must be able to operate upon ontologies
that are labelled in diverse natural languages. This paper outlines a
semantic-oriented cross-lingual ontology mapping framework that makes use
of several information sources to influence the selection of ontology label
translations in the process of generating high quality mapping results, and
presents a high-level overview of the evaluation strategy of the proposed
framework.

Keywords
Cross-Lingual Ontology Mapping; Appropriate Ontology Label Translation;
Multilingual Ontologies.

1. INTRODUCTION
Benjamins et al. [1] identify multilinguality as one of the great
challenges for the semantic web, and point out that one way to address this
challenge is by providing assistance for the annotation of ontologies
regardless of the natural languages used in them. However, to date,
research in the field of ontology mapping has largely focused on the
matching of ontologies labelled in the same natural language, where various
monolingual ontology matching techniques have been developed as documented
by Euzenat & Shvaiko [2]. With ontologies being widely accepted as a
knowledge management mechanism in multilingual organisations [3] and used
in a range of applications including machine translation [4], information
retrieval (IR) [5] and cross-lingual IR [6], multilinguality is
increasingly evident in ontologies. One way to enable knowledge discovery,
sharing and reuse across natural language barriers in ontology-based
systems is by means of cross-lingual ontology mapping (CLOM).

This paper proposes the semantic-oriented cross-lingual ontology mapping
(SOCOM) framework and presents a high-level overview of its evaluation.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
EKAW 2010, October 11-15, 2010, Lisbon, Portugal.

2. THE SOCOM FRAMEWORK
The semantic-oriented cross-lingual ontology mapping (SOCOM) framework is
designed specifically for cross-lingual mapping tasks carried out in
multilingual environments. It first transforms one of the given ontologies
into an equivalent of itself that is labelled in the natural language used
by the other(s), and then applies existing monolingual matching techniques.
The transformation of an ontology requires the translation of ontology
labels from the source natural language to the target natural language, for
which the notion of appropriate ontology label translation (AOLT) is
employed. An AOLT is a translation that is most likely to maximise the
success of the subsequent monolingual ontology matching step. The AOLT
selection process is therefore concerned with identifying the translations
that will most likely enhance the matching ability of monolingual matching
techniques, not necessarily the translations that are linguistically most
correct.

To achieve AOLT, several sources of information are used. Firstly, the
source ontology semantics are used to indicate the context of use for the
to-be-translated resource labels. Given the position of a node, the labels
of its surrounding nodes (i.e. its context) can be analysed. For example,
for a class node, the labels of its super-, sub- and sibling classes can
illustrate its context of use. Secondly, since the source ontology is
transformed so that it can be best mapped to the target ontology, the
target ontology semantics can be treated as translation selection
guidelines. For example, when several candidate translations are
linguistically correct for a label, its AOLT is the one that is closest to
what is used in the target ontology. Thirdly, mapping intent captures the
user’s motive in a CLOM scenario. For example, when working in a highly
refined domain such as medicine, achieving highly precise matches would be
the priority, whereas when merging knowledge repositories, gaining
reasonable recall in the matches generated may be desired. With known
intent, the SOCOM framework selects the most suitable translation source(s)
in order to generate mappings with high precision and/or recall. Fourthly,
background knowledge about the ontology domains can be drawn upon, which
can be system specified or user specified. In other words, encyclopedias or
users can assist the AOLT process by providing additional context of use.
Fifthly, to draw on user expertise, the SOCOM framework allows a user to
specify preferred translation sources and/or matching algorithms. Sixthly,
mapping assessment is used as a feedback mechanism in the SOCOM framework,
whereby statistics on top-rated translation sources and/or matching
techniques are collected to aid future executions of the framework. This
feedback can be implicit or explicit. Implicit feedback is generated when
the system assumes certain matches
are correct and identifies the most effective tools based on this
assumption. Explicit feedback is generated by the users and is more
reliable. Seventhly, time constraints may limit the run time of the AOLT
process. For example, when rapid execution is desired, the user can turn
certain features on or off dynamically. Lastly, not all of the
aforementioned resources will always be available in every CLOM scenario.
Resource constraints may therefore restrict the level of sophistication of
the AOLT selection process.

3. EVALUATION STRATEGY
A state of the art review was conducted first to identify current
approaches to CLOM. Through this review process, a generic approach to CLOM
was identified and implemented that uses off-the-shelf machine translation
tools and monolingual ontology matching techniques. To investigate its
effectiveness and to identify its potential limitations, this generic
approach was evaluated in two CLOM scenarios involving ontologies written
in Chinese, English and French. These ontologies contain approximately one
hundred entities and cover the semantic research community and the
bibliography domain. Results from these experiments showed that mappings
can be missed by monolingual matching tools when entity labels are
translated independently of the ontologies of interest. When the
translations of ontology labels are carried out in isolation from the CLOM
task at hand, inadequate and synonymic translations can introduce further
complications to the subsequent monolingual matching step.

Based on this finding, the notion of appropriate ontology label translation
arose. An initial framework prototype was implemented that makes use of the
readily defined semantics of the given ontologies in a CLOM scenario. This
prototype was evaluated against the generic approach in the aforementioned
CLOM scenarios using the same multilingual ontologies and gold standards.
Experimental results showed that the SOCOM framework generated higher
quality mapping results than the generic approach due to its ability to
select translations that are similar to those used by the target ontology
in a specific CLOM setting.

Motivated by this initial result, a second framework prototype was then
designed and implemented to draw on additional inputs (discussed in section
2) in the AOLT selection process, effectively allowing fine tuning of the
system. This second prototype was evaluated against the generic approach in
the same CLOM experiments involving the aforementioned multilingual
ontologies. Various combinations of the AOLT influence sources were
executed in a range of experimental runs of the framework, and several sets
of mappings were generated. The variety in these mapping results
demonstrated the flexibility of the AOLT selection mechanism and showcased
the tuning ability of the SOCOM framework.

Furthermore, as the experiments discussed above only concern ontologies of
relatively small sizes, the second prototype was applied in a real-world
CLOM setting involving large organisational ontologies written in English
and German in order to assess the scalability of the framework. These
ontologies contained over 7000 entities and were generated
semi-automatically using enterprise data from the technical customer
support domain. More details of how these ontologies were generated can be
found in [7]. Mappings were then generated using the SOCOM framework
between these large multilingual ontologies in English and German. These
mapping results then enabled cross-lingual document retrieval in an
adaptive personalised result composition and presentation system. Bilingual
users can issue queries in German and retrieve relevant as well as
personalised content in English. More details of this information retrieval
and composition system can be found in [8].

Lastly, in all the experiments carried out, precision, recall and f-measure
scores were calculated to evaluate the quality of the mappings generated.
In addition, statistical analysis, namely two-tailed t-tests, was carried
out on the scores generated by the SOCOM framework and the generic approach
in order to validate the statistical significance of the experimental
findings.

4. ACKNOWLEDGMENT
This research is partially supported by Science Foundation Ireland (Grant
07/CE/11142) as part of the Centre for Next Generation Localisation
(http://www.cngl.ie) at Trinity College Dublin.

5. REFERENCES
[1] Benjamins R. V., Contreras J., Corcho O., Gomez-Perez A. 2004. Six
    Challenges for the Semantic Web. AIS SIGSEMIS Bulletin, Vol. 1, Iss. 1,
    2004.
[2] Euzenat J., Shvaiko P. 2007. Ontology Matching. Springer-Verlag
    Berlin/Heidelberg.
[3] Chang C., Lu W. 2002. The Translation of Agricultural Multilingual
    Thesaurus. In Proceedings of the 3rd Asian Conference for Information
    Technology in Agriculture (Beijing, China, October 26-28, 2002),
    526-528.
[4] Shi C., Wang H. 2005. Research on Ontology-Driven Chinese-English
    Machine Translation. In Proceedings of 2005 IEEE International
    Conference on Natural Language Processing & Knowledge Engineering
    (Wuhan, China, October 30 – November 01, 2005), 426-430.
    DOI=10.1109/NLPKE.2005.1598775
[5] Guan J., Deng J., Qu Y. 2005. An Ontology-Driven Information Retrieval
    Mechanism for Semantic Information Portals. In Proceedings of 1st
    International Conference on Semantic, Knowledge and Grid (Beijing,
    China, November 27-29, 2005). SKG. IEEE Computer Society, Washington,
    DC, 63. DOI=10.1109/SKG.2005.42
[6] Zhang L., Wu G., Xu Y., Li W., Zhong Y. 2004. Multilingual Collection
    Retrieving Via Ontology Alignment. In Proceedings of the 7th
    International Conference on Asian Digital Libraries (Shanghai, China,
    December 13-17, 2004), LNCS 3334, 939-957.
    DOI=10.1007/978-3-540-30544-6_57
[7] Şah M., Wade V. 2010. Automatic Metadata Extraction from Multilingual
    Enterprise Content. In Proceedings of the 19th ACM International
    Conference on Information and Knowledge Management (Toronto, Canada,
    October 26-30, 2010), to appear.
[8] Steichen B., Wade V. 2010. Adaptive Retrieval and Composition of
    Socio-Semantic Content for Personalised Customer Care. In Proceedings
    of International Workshop on Adaptation in Social and Semantic Web (Big
    Island of Hawaii, USA, June 21, 2010), 1-10, ISSN 1613-0073.
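
As an illustration of the target-ontology guideline described in section 2 and the precision, recall and f-measure scores used in section 3, the following is a minimal Python sketch and not the SOCOM implementation. It assumes a plain string-similarity measure (difflib from the Python standard library) as a stand-in for "closest to what is used in the target ontology", a toy exact-label matcher in place of a real monolingual matching tool, and invented French labels, candidate translations and gold standard. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (an assumed stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_aolt(candidates: list[str], target_labels: list[str]) -> str:
    """Pick the candidate translation closest to any label used in the target ontology."""
    return max(candidates, key=lambda c: max(similarity(c, t) for t in target_labels))


def match_exact(translated: dict[str, str], target_labels: list[str]) -> set[tuple[str, str]]:
    """Toy monolingual matcher: pair entities whose labels are equal, ignoring case."""
    lookup = {t.lower(): t for t in target_labels}
    return {
        (source, lookup[label.lower()])
        for source, label in translated.items()
        if label.lower() in lookup
    }


def precision_recall_f(found: set, gold: set) -> tuple[float, float, float]:
    """Precision, recall and f-measure of a set of mappings against a gold standard."""
    correct = len(found & gold)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Invented example data: source labels with candidate translations, in the
# order a hypothetical machine translation tool might return them.
candidates = {
    "colloque": ["symposium", "conference"],
    "auteur": ["writer", "author"],
    "article": ["item", "paper"],
}
target_labels = ["Conference", "Author", "Paper", "Publisher"]
gold = {("colloque", "Conference"), ("auteur", "Author"), ("article", "Paper")}

# Generic approach: take the first translation, ignoring the target ontology.
generic = {source: options[0] for source, options in candidates.items()}
# AOLT-style selection: prefer the translation closest to the target labels.
aolt = {source: select_aolt(options, target_labels) for source, options in candidates.items()}

generic_scores = precision_recall_f(match_exact(generic, target_labels), gold)
aolt_scores = precision_recall_f(match_exact(aolt, target_labels), gold)
print("generic:", generic_scores)
print("aolt:   ", aolt_scores)
```

In this toy setting the translations chosen in isolation are all linguistically plausible yet yield no matches, whereas selecting against the target labels recovers every gold-standard mapping. Real runs would use an actual matching tool and would not be this clear-cut.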