Exploiting Multilinguality For Ontology Matching Purposes

Introduction

The alignment between linguistic artifacts like vocabularies, thesauri, etc., is a task that has attracted considerable attention in recent years [1] [2]. With very few exceptions, however, research in this field has primarily focused on the development of monolingual matching algorithms. As more and more artifacts, especially in the Linked Open Data realm, become available in a multilingual fashion, novel matching algorithms are required.

Indeed, in the case of a multilingual environment, there are some peculiarities that can be exploited in order to relax the classic schema matching task:

(i) multilinguality reduces the problems raised when two different concepts have the same label: the probability that two distinct concepts share the same label across several languages is very low;

(ii) multilingual artifacts provide term translations that have already been adapted to the represented domain, since the human creators of a multilingual artifact put a lot of their cultural heritage into choosing the right terms for each concept.
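The first point can be illustrated with a small sketch. The two concepts and their labels below are invented for illustration and are not taken from any of the thesauri discussed in this paper; `shared_languages` is a hypothetical helper name.

```python
def shared_languages(a, b):
    """Return the languages in which concepts a and b share at least one label."""
    return {
        lang
        for lang in a.keys() & b.keys()
        if {l.lower() for l in a[lang]} & {l.lower() for l in b[lang]}
    }


# Illustrative labels only: two different concepts that collide in English.
bank_finance = {"en": ["bank"], "it": ["banca"], "de": ["Bank"]}
bank_river = {"en": ["bank"], "it": ["sponda"], "de": ["Ufer"]}

collisions = shared_languages(bank_finance, bank_river)
print(collisions)  # {'en'}
```

A monolingual English matcher would see identical labels here, whereas the Italian and German labels keep the two concepts apart.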

In this paper, we present an approach that exploits the two aspects described above to define mappings between multilingual ontologies. The approach, which extends the one presented in [3], has been evaluated on domain-specific use cases from the agriculture and medical domains.

An Approach for the Matching of Multilingual Thesauri

The proposed approach is based on the exploitation of the labels associated with each concept defined in an ontology. Let us consider two ontologies: (i) a source ontology containing the elements that have to be mapped, and (ii) a target ontology used as reference for creating the mappings. The approach takes inspiration from Information Retrieval (IR) techniques and relies on indexes to identify candidate mappings.

The process is split into two phases: (i) in the first, we create an index containing information about the target ontology, represented in a structured way; (ii) in the second, we build queries from the information contained in the source ontology and retrieve a ranking of the candidate mappings that may be defined between the two thesauri.

In the first phase, we extract the whole set of labels from the target ontology and, after a set of preprocessing activities, transform each concept "C" of the target ontology into a structured representation containing all multilingual labels describing "C" and all multilingual labels describing the concepts belonging to the context of "C", that is, the set of concepts directly connected with "C". These labels are then stored in an index. In the second phase, the set of labels of each entity of the source ontology is extracted, and a query containing these labels is composed and run against the index built during the first phase. The system returns a ranking of n suggestions ordered by their confidence score, which is used as input for creating the mapping, either manually by domain experts or automatically by the system.
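The two phases can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the paper does not specify the preprocessing, the index engine, or the scoring function, so whitespace tokenization, an in-memory inverted index, an IDF-weighted overlap score, and the two field weights are all stand-ins chosen for brevity. The concept identifiers and labels are invented.

```python
import math
from collections import defaultdict

OWN_WEIGHT = 1.0      # assumed weight for a concept's own labels
CONTEXT_WEIGHT = 0.5  # assumed weight for labels of directly connected concepts


def tokenize(label):
    # Placeholder preprocessing: lowercase and split on whitespace.
    return label.lower().split()


def build_index(target):
    """Phase 1: index every target concept by its own and its context labels.

    target maps concept id -> {"labels": {lang: [label, ...]}, "context": [ids]}.
    Returns {(lang, token): {concept_id: weight}}.
    """
    index = defaultdict(lambda: defaultdict(float))
    for cid, concept in target.items():
        fields = [(concept["labels"], OWN_WEIGHT)]
        fields += [(target[n]["labels"], CONTEXT_WEIGHT) for n in concept["context"]]
        for labels_by_lang, weight in fields:
            for lang, labels in labels_by_lang.items():
                for label in labels:
                    for token in tokenize(label):
                        index[(lang, token)][cid] += weight
    return index


def rank(index, n_concepts, source_labels, n=5):
    """Phase 2: query the index with the labels of one source concept."""
    scores = defaultdict(float)
    for lang, labels in source_labels.items():
        for label in labels:
            for token in tokenize(label):
                postings = index.get((lang, token), {})
                if not postings:
                    continue
                # Rare terms count more than terms shared by many concepts.
                idf = math.log(1 + n_concepts / len(postings))
                for cid, weight in postings.items():
                    scores[cid] += weight * idf
    # Highest score first; ties broken by concept id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Invented toy target ontology with English and Italian labels.
target = {
    "t1": {"labels": {"en": ["maize"], "it": ["mais"]}, "context": ["t2"]},
    "t2": {"labels": {"en": ["cereals"], "it": ["cereali"]}, "context": ["t1", "t3"]},
    "t3": {"labels": {"en": ["rice"], "it": ["riso"]}, "context": ["t2"]},
}
index = build_index(target)
suggestions = rank(index, len(target), {"en": ["maize"], "it": ["mais"]})
```

In this toy setting `suggestions` ranks `t1` first, because it matches the query on its own labels in both languages, and `t2` second, because it matches only through its context. The scores are raw and unnormalised; a real system would likely map them to a bounded confidence value.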

Concluding Remarks

The approach has been evaluated on six multilingual ontologies from the agricultural and medical domains for which gold standards containing the mappings were available, and it has been compared with the previous version presented in [3].

Table 1: Comparison between the results obtained by the previous version of the system (v1) and the proposed one (v2).

Mapping Set        # of Mappings  Prec. v1  Rec. v1  F-Measure v1  Prec. v2  Rec. v2  F-Measure v2
Eurovoc → Agrovoc  1297           0.816     0.874    0.844         0.897     1.000    0.946
Agrovoc → Eurovoc  1297           0.906     0.695    0.787         0.930     0.999    0.963
Avg.                              0.861     0.785    0.821         0.914     1.000    0.955
Gemet → Agrovoc    1179           0.909     0.546    0.682         0.850     0.999    0.918
Agrovoc → Gemet    1179           0.943     0.740    0.829         0.893     0.997    0.942
Avg.                              0.926     0.643    0.759         0.872     0.998    0.931
MDR → MeSH         6061           0.776     0.807    0.791         0.903     0.912    0.907
MeSH → MDR         6061           0.716     0.789    0.751         0.843     0.888    0.865
Avg.                              0.746     0.798    0.771         0.873     0.900    0.886
MDR → SNOMED       19971          0.621     0.559    0.588         0.739     0.826    0.780
SNOMED → MDR       19971          0.556     0.519    0.537         0.871     0.459    0.601
Avg.                              0.589     0.539    0.563         0.805     0.643    0.715
MeSH → SNOMED      26634          0.690     0.660    0.675         0.741     0.814    0.776
SNOMED → MeSH      26634          0.657     0.564    0.607         0.831     0.544    0.658
Avg.                              0.674     0.612    0.642         0.786     0.679    0.729
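The F-Measure values in Table 1 are consistent with the usual harmonic mean of precision and recall. The sketch below recomputes two entries as a sanity check; the function name `f_measure` is ours, and the observation about the "Avg." rows is inferred from the numbers, not stated in the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-Measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Eurovoc → Agrovoc, v1: Prec. 0.816, Rec. 0.874
print(round(f_measure(0.816, 0.874), 3))  # 0.844

# "Avg." row for the Eurovoc/Agrovoc pair, v1: Prec. 0.861, Rec. 0.785
print(round(f_measure(0.861, 0.785), 3))  # 0.821
```

The second check suggests that the "Avg." F-Measure appears to be derived from the averaged precision and recall, since the mean of the two per-direction F-Measures (0.844 and 0.787) would round to 0.816 instead.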

References

[1] Euzenat, J., Shvaiko, P.: Ontology Matching. Springer (2007)

[2] Bellahsene, Z., Bonifati, A., Rahm, E.: Schema Matching and Mapping. Springer (2011)

[3] Dragoni, M.: Exploiting multilinguality for creating mappings between thesauri. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing (SAC 2015). ACM (2015)