=Paper=
{{Paper
|id=Vol-2536/oaei19_paper7
|storemode=property
|title=EVOCROS: Results for OAEI 2019
|pdfUrl=https://ceur-ws.org/Vol-2536/oaei19_paper7.pdf
|volume=Vol-2536
|authors=Juliana Medeiros Destro,Javier A. Vargas,Julio Cesar dos Reis,Ricardo da S. Torres
|dblpUrl=https://dblp.org/rec/conf/semweb/DestroVRT19
}}
==EVOCROS: Results for OAEI 2019==
EVOCROS: Results for OAEI 2019

Juliana Medeiros Destro1, Javier A. Vargas1, Julio Cesar dos Reis1, and Ricardo da S. Torres2

1 Institute of Computing, University of Campinas, Campinas-SP, Brazil
2 Norwegian University of Science and Technology (NTNU), Ålesund, Norway
{juliana.destro,jreis}@ic.unicamp.br, ricardo.torres@ntnu.no, jalvarm.acm@gmail.com

Abstract. This paper describes the updates in EVOCROS, a cross-lingual ontology alignment system suited to create mappings between ontologies described in different natural languages. Our tool combines syntactic and semantic similarity measures with information retrieval techniques. The semantic similarity is computed via NASARI vectors used together with BabelNet, a domain-neutral semantic network. In particular, we investigate the use of rank aggregation techniques in the cross-lingual ontology alignment task. The tool employs automatic translation to a pivot language to compute the similarity. EVOCROS was tested and obtained high-quality alignments on the MultiFarm dataset. We discuss the experimented configurations and the results achieved in OAEI 2019. This is our second participation in OAEI.

Keywords: cross-lingual matching · semantic matching · background knowledge · ranking aggregation

1 Presentation of the system

There is a growing number of ontologies described in different natural languages. Mappings among different ontologies are relevant for the integration of heterogeneous data sources, facilitating the exchange of information between systems. EVOCROS is our approach to automatic cross-lingual ontology matching. In our previous participation, in OAEI 2018, EVOCROS employed a weighted combination of syntactic and semantic similarity measures. The new version, submitted in OAEI 2019, combines syntactic and semantic similarity measures with information retrieval techniques. In this section, we describe the modifications to the system and the implemented techniques.
1.1 State, purpose, general statement

EVOCROS is a cross-lingual ontology alignment tool. The newest version of the tool leverages supervised rank aggregation techniques, exploiting labeled information (i.e., training data) and ground-truth relevance to boost the effectiveness of a new ranker. Our goal is to leverage rank aggregation in cross-lingual mapping by generating ranked lists based on distinct similarity measurements between the concepts of source and target ontologies.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1.2 Specific techniques used

The tool is developed in Python 3 and uses learning-to-rank techniques implemented in the well-known library RankLib. We model the mapping problem as an information retrieval query. Figure 1 depicts the workflow of the proposed technique. The inputs are source and target ontologies written in the Web Ontology Language (OWL). The first step is the pre-processing of the source and target input ontologies, converting them into owlready2 objects.

Fig. 1. General description of the technique. The mapping processing stage is where the top-1 entity of the final ranking is mapped to the input concept e1.

RankLib: https://sourceforge.net/p/lemur/wiki/RankLib/ (As of November 16, 2019).
owlready2: a Python 3 library to manipulate ontologies as objects.

Each entity of the source ontology is compared with all entities of the same type found in the target ontology (i.e., classes are matched to classes and properties are matched to properties).
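The per-measure comparison step just described can be sketched as follows. This is a minimal illustration, not the tool's actual code: `syntactic_sim` is a stand-in standard-library measure (EVOCROS uses Levenshtein, Jaro, WordNet, and NASARI/BabelNet similarities), and the entity labels are hypothetical.

```python
from difflib import SequenceMatcher

def syntactic_sim(a: str, b: str) -> float:
    # Stand-in syntactic measure; the tool uses Levenshtein, Jaro, etc.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def ranked_list(source_label, target_labels, sim):
    # Compare one source entity against every target entity of the same
    # type and sort candidates by decreasing similarity, yielding one
    # ranked list per similarity measure.
    scored = [(t, sim(source_label, t)) for t in target_labels]
    return sorted(scored, key=lambda p: p[1], reverse=True)

# One ranked list for a hypothetical source entity "Conference":
rank = ranked_list("Conference",
                   ["ConferenceMember", "Paper", "Conference"],
                   syntactic_sim)
```

Repeating this for each similarity measure produces the k ranked lists that are later aggregated.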
In this sense, for each entity ei in the source ontology OX, we calculate the similarity value with each entity ej in the target ontology OY (Figure 2), thus generating a ranked list {rank1, rank2, rank3, rank4} for each similarity measure used (cf. Figure 3).

Fig. 2. Concept c1 ∈ OX is compared against all concepts cn ∈ OY.

For similarity measures that rely on monolingual comparison (i.e., syntactic and WordNet), the labels of entities ei ∈ OX and ej ∈ OY are automatically translated to a pivot language by leveraging the Google Translate API at runtime. These similarity comparisons generate k ranks, each one based on a different similarity measure. Because each rank is generated independently from one measure, different similarity measures can be used or added without disrupting the technique.

The ranks are then aggregated using LambdaMART [7], because this technique had the best score for the majority of languages during the execution phase of OAEI 2019. Figure 4 shows how the set of multiple ranks is aggregated into a final rank. The top-1 result of the aggregated rank, c2 ∈ OY, is mapped to the source ontology entity c1 ∈ OX, thus generating the candidate mapping m(c1, c2) (cf. Figure 5). The mapping output follows the standard used by the Alignment API.

1.3 Link to the set of provided alignments (in align format)

Alignment results are available at https://github.com/jmdestro/evocros-results (As of November 16, 2019).

Fig. 3. Ranked lists generated by each similarity measure used.

2 Results

In this section, we describe the results obtained in the experiments conducted in OAEI 2019.

2.1 Multifarm

We consider the MultiFarm dataset [5], version released in 2015. Our experiments built cross-language ontology mappings by using English as a pivot language for the Levenshtein [4], Jaro [3], and WordNet similarity measures.
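For reference, the two string-based measures cited above can be implemented with the standard library alone. This is a minimal sketch, normalized to [0, 1], and not necessarily the implementation used by the tool:

```python
def levenshtein(s: str, t: str) -> int:
    # Classic dynamic-programming edit distance (insertions, deletions,
    # substitutions), kept to two rows of the DP table.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]

def levenshtein_sim(s: str, t: str) -> float:
    # Normalize the distance into a [0, 1] similarity.
    if not s and not t:
        return 1.0
    return 1.0 - levenshtein(s, t) / max(len(s), len(t))

def jaro(s: str, t: str) -> float:
    # Jaro similarity: matching characters within a sliding window,
    # penalized by transpositions.
    if s == t:
        return 1.0
    ls, lt = len(s), len(t)
    if ls == 0 or lt == 0:
        return 0.0
    window = max(ls, lt) // 2 - 1
    s_matched = [False] * ls
    t_matched = [False] * lt
    matches = 0
    for i, c in enumerate(s):
        for j in range(max(0, i - window), min(lt, i + window + 1)):
            if not t_matched[j] and t[j] == c:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    transpositions, j = 0, 0
    for i in range(ls):
        if s_matched[i]:
            while not t_matched[j]:
                j += 1
            if s[i] != t[j]:
                transpositions += 1
            j += 1
    transpositions //= 2
    return (matches / ls + matches / lt
            + (matches - transpositions) / matches) / 3.0
```

With English as the pivot language, these functions would be applied to the translated labels of each candidate pair.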
The semantic similarity relying on BabelNet does not require a translation, as it can retrieve the synsets used in the NASARI vectors [1] using the concepts' original language. The application of each similarity measure in our technique generated a rank.

A subset of all languages was used for training and validation. The subsets comprise 10% of queries for the training set, 15% for the validation set, and 75% for testing. These subsets were generated per language and then combined, so the algorithms were trained, validated, and tested using all languages at once. The comparable gold standard (i.e., MultiFarm manually curated mappings) was adjusted to contain only the queries related to the testing subset. In this sense, a lower number of entities was considered in the tests, because we removed the set of queries used in training and validation from the reference mappings to ensure consistency.

Table 1 presents the obtained values for precision, recall, and f-measure for each language pair tested. The precision, recall, and f-measure scores have the same value due to the nature of the experiments. Our approach generates n : n mappings, where n = |OX| = |OY|, because the ontologies are translations of each other into different natural languages, so every entity in the source ontology has a correspondence in the target ontology. Both the gold standard and the generated mappings therefore have the same size, because each query (i.e., each entity in the source ontology) generates a mapping between the query (source entity) and the top-1 result of the final aggregated rank.

Fig. 4. Rank aggregation of the ranked lists. Each rank aggregation algorithm generates a distinct final rank.

Fig. 5. Mapping generated between source entity c1 ∈ OX and top-1 entity of the final rank generated by the rank aggregation algorithm, c2 ∈ OY.

Our results are competitive when compared to the other tools participating in the evaluation.
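The per-language 10%/15%/75% split described above, followed by combining the per-language subsets, can be sketched as follows; the query identifiers and the grouping structure are hypothetical illustrations, not the tool's actual data format.

```python
import random

def split_queries(queries_by_language, seed=42):
    """Split query ids 10%/15%/75% into train/validation/test per
    language, then combine the subsets across languages, as done for
    the MultiFarm experiments."""
    rng = random.Random(seed)
    train, valid, test = [], [], []
    for lang, queries in queries_by_language.items():
        q = list(queries)
        rng.shuffle(q)  # random assignment within each language
        n_train = round(0.10 * len(q))
        n_valid = round(0.15 * len(q))
        train += q[:n_train]
        valid += q[n_train:n_train + n_valid]
        test += q[n_train + n_valid:]
    return train, valid, test

# Hypothetical query ids for two language pairs, 20 queries each:
langs = {"en-pt": [f"en-pt-q{i}" for i in range(20)],
         "fr-nl": [f"fr-nl-q{i}" for i in range(20)]}
train, valid, test = split_queries(langs)
```

Restricting the reference mappings to the ids in `test` mirrors the adjustment of the gold standard mentioned above.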
Table 1. Results achieved by different language pairs.

Language pair  Precision  Recall   F-measure
fr-nl          0.61290    0.61290  0.61290
en-pt          0.59140    0.59140  0.59140
es-nl          0.58065    0.58065  0.58065
cz-nl          0.52688    0.52688  0.52688
cn-pt          0.50538    0.50538  0.50538
es-ru          0.38710    0.38710  0.38710
cn-ru          0.32258    0.32258  0.32258
cz-ru          0.32258    0.32258  0.32258
de-ru          0.32258    0.32258  0.32258

3 General comments

In this section, we discuss our results and ways to improve the system.

3.1 Comments on the results

The tool achieved satisfactory results, with a competitive f-measure, but the execution time was exceedingly long even with local caches for the BabelNet NASARI vectors. This is due to the number of comparisons required during execution, because each concept or attribute in the source ontology is compared against all concepts and attributes of the target ontology.

3.2 Discussions on the way to improve the proposed system

This was the second evaluation of the system, and the results are encouraging. Our main goals for future work are:

Reduce execution time: the tool has a long execution time even with local caches. Our future work will explore ontology partitioning during the pre-processing stage of the matching task to reduce the number of comparisons needed, thus improving the execution time.

Bag of graphs: ontologies can be represented as graphs, thus allowing for partitioning [2] and comparison of sub-graphs. Bag-of-graphs [6] is a graph matching approach, similar to bag-of-words. It represents graphs as feature vectors, greatly simplifying the computation of graph similarity and reducing execution time. We propose as future investigation to use a simple vector-based representation for graphs and investigate it for cross-lingual ontology matching.

3.3 Comments on OAEI

Although we were not participating in it, our tool was executed on the Knowledge Graph track. There were issues during the evaluation phase, preventing the system from fully participating in both the Multifarm and KG tracks.
4 Conclusion

The newest version of EVOCROS proposed an approach considering four similarity measures to build ranks and used a supervised method of rank aggregation. This is the second participation of the system in OAEI. The evaluation with the MultiFarm dataset confirmed the quality of the mappings generated by our technique. For future work, we plan to improve our cross-lingual alignment proposal by considering different combinations of similarity measures and different ways of computing the syntactic and semantic similarities, taking into account additional stages in the pre-processing of the ontology.

Acknowledgements

This work was supported by São Paulo Research Foundation (FAPESP): grant #2017/02325-5.

References

1. Camacho-Collados, J., Pilehvar, M.T., Navigli, R.: NASARI: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artificial Intelligence 240, 36–64 (2016)
2. Hamdi, F., Safar, B., Reynaud, C., Zargayouna, H.: Alignment-based partitioning of large-scale ontologies. In: Advances in Knowledge Discovery and Management, pp. 251–269. Springer (2010)
3. Jaro, M.A.: Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. Journal of the American Statistical Association 84(406), 414–420 (1989)
4. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10, 707–710 (1966)
5. Meilicke, C., García-Castro, R., Freitas, F., van Hage, W.R., Montiel-Ponsoda, E., de Azevedo, R.R., Stuckenschmidt, H., Šváb-Zamazal, O., Svátek, V., Tamilin, A., et al.: MultiFarm: A benchmark for multilingual ontology matching. Web Semantics: Science, Services and Agents on the World Wide Web 15, 62–68 (2012)
6. Silva, F.B., de O. Werneck, R., Goldenstein, S., Tabbone, S., da S. Torres, R.: Graph-based bag-of-words for classification. Pattern Recognition 74, 266–285 (Feb 2018). https://doi.org/10.1016/j.patcog.2017.09.018, http://www.sciencedirect.com/science/article/pii/S0031320317303680
7. Wu, Q., Burges, C.J., Svore, K.M., Gao, J.: Adapting boosting for information retrieval measures. Information Retrieval 13(3), 254–270 (Jun 2010). https://doi.org/10.1007/s10791-009-9112-1