A Framework for Recommending Ontology Matching Systems based on Application Requirements

Diego Pessoa

Centro de Informática
Universidade Federal de Pernambuco (UFPE)
Recife, Brazil
derp@cin.ufpe.br

Abstract. Ontology matching is the process of generating correspondences between terms of different ontologies. Many ontology matching methods have been proposed, which makes it difficult to choose the most suitable one for a particular setting. In this paper, we propose a novel ontology matching framework that uses automatic matcher recommendation to generate alignments. The distinguishing feature of this work is the use of application requirements as a means of acquiring knowledge about a particular matching task.

Keywords: Ontology Matching, Ontology Matchers Recommendation, Knowledge Acquisition

1 Problem Statement

Ontology matching is the task of finding relationships between entities expressed in different ontologies [7]. It usually outputs an alignment, i.e., a set of correspondences between ontology terms, which is generated by using a single similarity measure or by combining several [6]. In recent years, many Ontology Matching Systems (Ontology Matchers) have been proposed, as surveyed in [15]. Each year, the Ontology Alignment Evaluation Initiative (OAEI, e.g., [1]) organizes an event in which matchers are evaluated on different test cases. The OAEI results show that matcher performance varies with the matching task. For example, in the 2016 edition, the matcher ALIN reached a high F-measure (0.74) in the Conference test case, but it was unable to provide any results in the Large-bio test case [1]. As little is known about which factors affect matcher performance, it is challenging for a user to choose the most suitable matchers for a particular matching task.
This increases the need for an automatic approach to select, combine and tune matchers.

This work introduces a framework for the automatic recommendation of ontology matchers for an application-specific matching task. Existing frameworks typically take matcher parameters or reference alignments as input, so the user is involved mostly in the later validation of correspondences. A distinguishing feature of the proposed framework is that it allows the user to define a set of application requirements, which are formalized as RDF resources. The framework aims not only to reduce the search space of a matching task (by producing ontology segments), but also to recommend the most suitable matchers for the reduced setting generated according to the requirements.

2 Relevancy

The integration of ontologies (i.e., establishing a unified view of ontologies from heterogeneous sources) has several applications (e.g., data integration, search, and analysis). Ontology matching is an essential step in this process, as sources usually employ different terms to describe the same real-world concept, even when they belong to the same domain.

Especially in large-scale matching tasks (i.e., when dealing with several ontologies that may contain many elements), it is hard to obtain good-quality alignments, because the large number of items to compare demands both more computational effort and more user validations. As several ontology matchers are available, configuring a matching task can be complicated and time-consuming for the user. There is a lack of approaches that automatically generate alignments using a set of matchers recommended for a particular matching task.
The basic idea of the proposed framework is to address this issue by allowing users to define a set of application requirements, enabling both a reduction in the number of compared terms and the delivery of a set of recommended matchers. Consequently, configuring a particular matching task becomes easier, since it no longer requires knowledge of the matchers' characteristics.

3 Related Work

Few works in the literature address the ontology matcher recommendation problem. The work in [13] identified (through questionnaires answered by domain experts) a set of features related to matchers (regarding input, output, approach, usage, cost, and documentation). For each feature, the user can define a weight; the weights are used by a multi-criteria decision method, the Analytic Hierarchy Process (AHP), to determine the suitable matchers. However, the approach considers only a fixed set of matchers, so new questionnaires would be needed to identify the features of novel approaches. As ontology matchers are constantly evolving, this effort could quickly become obsolete. Also, users may not know the matchers' peculiarities, which makes it challenging to choose the relevant ones according to their interests.

The approaches in [12, 14] consider textual and structural characteristics of the input ontologies to recommend matchers before the task is executed. However, since they do not consider any alignment results, they may dismiss matchers that would perform better in practice. The work in [2] deals with this issue by considering previous results. As it would be infeasible to run all matchers on every possible scenario, they are executed only over random ontology samples (called ontology segments). However, generating random samples may lead to uncertain evaluations, since every execution may present different results.
In [11], three recommendation strategies based on ontology segments are proposed. The first generates segment pairs based on exact matching with a set of concepts. The second considers the whole set of validated mapping suggestions, and the third considers only the segment pairs of this validated set. However, there is no assurance that good performance on parts of the ontologies leads to the same result on the whole ontologies. Furthermore, several measures can be used to define matcher performance, which can result in different recommendations depending on the chosen metric.

To the best of our knowledge, no other work addresses ontology matcher recommendation by employing application requirements as a means of acquiring knowledge about the priorities of a particular ontology matching task. The rationale for using requirements is to allow the user to define the relevant terms and the quality metrics to be considered in the matching. We intend to provide both a way to reduce the search space, through the generation of ontology segments related to terms that meet the requirements, and a way to evaluate the matchers more accurately, by considering the metrics that are most significant to the user.

4 Research Questions & Hypotheses

The following research questions (Q) and related hypotheses (H) investigate how the use of application requirements can reduce the search space of an ontology matching task and consequently improve matcher recommendation:

– Q1: Would the use of application requirements lead to better ontology segments? Will these segments reduce the search scope of a matching task without loss of quality?
– H1: Employing application requirements will make it possible to generate better ontology segments, reducing the search space of a matching task more efficiently than state-of-the-art techniques.
– Q2: How can the use of application requirements improve ontology matchers' recommendations?
Is it possible to formalize the application needs regarding an ontology matching task?
– H2: Application requirements will allow users to specify which terms (data requirements) and metrics (quality requirements) should be considered in an ontology matching task. Generating ontology segments based on the most relevant terms and using a set of preferred metrics when evaluating matchers will provide better recommendations than the state of the art. RDF resources will be used to formalize the application requirements.

5 Proposed Approach

The proposed framework aims to support users in performing matching tasks with a set of matchers suitable for a particular application. Figure 1 presents the framework components and workflow. In what follows, we introduce a brief example to illustrate the definition of application requirements and then detail the framework workflow.

Fig. 1. Proposed Ontology Matching Framework.

To start an ontology matching task, the user provides a pair of ontologies to match and a set of application requirements. These requirements are represented as RDF statements in two categories: i) Data Requirements and ii) Quality Requirements. The former are statements describing characteristics of the most relevant terms to be considered in the ontologies being matched. These statements are used to generate the ontology segments. The latter are statements that assign weights to quality metrics (e.g., precision, recall, execution time). These weights are used in the evaluation of the alignments generated by matchers, resulting in recommendation scores.

To illustrate the definition of requirements, we introduce an ontology matching scenario. Suppose an application that integrates open data to understand the motivations behind people's migration from one country to another in recent years.
Assuming that a large number of data sources may provide diverse data about countries and cities, this is a typical case in which application requirements can be used to specify the scope of a particular ontology matching setting. Table 1 shows the definition of two data requirements (DR1 and DR2), stating a preference for ontology classes that match the terms Weather and GDP, where the latter should have GDP per capita as a subclass. As Quality Requirements (QR1 and QR2), we illustrate the weights 0.7 and 0.3 for precision and execution time, respectively, assuming that the application intends to reduce the number of generated correspondences and the execution time, given the large number of sources that may be considered.

Table 1. Examples of Application Requirements as RDF statements

Subject  Predicate                                  Object
#DR1     dataRequirement:hasMatchingClass           "weather"
#DR2     dataRequirement:hasMatchingClass           "GDP"
#DR2     dataRequirement:hasMatchingSubClass        "GDP per capita"
#QR1     qualityRequirement:hasPrecisionWeight      0.7
#QR2     qualityRequirement:hasExecutionTimeWeight  0.3

5.1 Framework Workflow

The framework workflow starts by receiving a set of ontologies and application requirements from the user. Then, the following steps are performed: i) ontology segment generation; ii) related alignments finding; iii) matcher score calculation; iv) matchers execution; and v) alignment validation. To support these steps, the framework also uses a previously established knowledge base containing ontologies, matchers, alignments and validations acquired from reliable sources (e.g., OAEI).

Ontology Segment Generation. The first step is the generation of ontology segments following the data requirements provided, which results in reduced subsets of the ontologies containing only the terms most relevant to the user. Segments are generated by traversing the ontology structure in search of terms that meet the requirements.
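As a minimal sketch of this traversal, the Python fragment below applies data requirements modelled on Table 1 to a toy class hierarchy. The hierarchy, the function name generate_segment, the case-insensitive label comparison and the choice to keep every ancestor and descendant of a matching class are all assumptions made for illustration; the framework's actual segment generation may differ.

```python
from collections import defaultdict

# Data requirements modelled on Table 1, as (subject, predicate, object) triples.
DATA_REQUIREMENTS = [
    ("#DR1", "dataRequirement:hasMatchingClass", "weather"),
    ("#DR2", "dataRequirement:hasMatchingClass", "GDP"),
    ("#DR2", "dataRequirement:hasMatchingSubClass", "GDP per capita"),
]

# Hypothetical toy ontology: class label -> parent class label (None for the root).
TOY_ONTOLOGY = {
    "Thing": None,
    "Indicator": "Thing",
    "GDP": "Indicator",
    "GDP per capita": "GDP",
    "Population": "Indicator",
    "Climate": "Thing",
    "Weather": "Climate",
    "Rainfall": "Weather",
    "Person": "Thing",
}


def generate_segment(parent_of, requirements):
    """Return the set of class labels kept in the segment (illustrative only)."""
    children_of = defaultdict(set)
    for child, parent in parent_of.items():
        if parent is not None:
            children_of[parent].add(child)

    # Group the requirement statements by subject (#DR1, #DR2, ...).
    grouped = defaultdict(dict)
    for subject, predicate, obj in requirements:
        grouped[subject][predicate.split(":")[1]] = obj.lower()

    segment = set()
    for req in grouped.values():
        for cls in parent_of:
            if cls.lower() != req.get("hasMatchingClass"):
                continue
            wanted_sub = req.get("hasMatchingSubClass")
            direct_subs = {c.lower() for c in children_of[cls]}
            if wanted_sub is not None and wanted_sub not in direct_subs:
                continue  # the class matches, but the required subclass is missing
            # Assumption: keep the class, all its ancestors and all its descendants.
            segment.add(cls)
            node = parent_of[cls]
            while node is not None:
                segment.add(node)
                node = parent_of[node]
            stack = list(children_of[cls])
            while stack:
                node = stack.pop()
                segment.add(node)
                stack.extend(children_of[node])
    return segment


segment = generate_segment(TOY_ONTOLOGY, DATA_REQUIREMENTS)
print(sorted(segment))
```

In this toy hierarchy, the classes Population and Person fall outside the segment, so a matcher run on the segment would compare fewer terms than a run on the whole ontology.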
Ontology segments are automatically generated from these terms and their corresponding elements (e.g., classes, subclasses, superclasses), depending on the data requirements.

Related Alignments Finding. The second step searches the alignment database for related alignments, i.e., alignments between ontologies that share similar characteristics with the generated segments. To define this similarity, we assign values to ontologies (or segments) with respect to the following matcher types: i) syntactic; ii) lexical; iii) structural; and iv) instance-based.

Matcher Score Calculation. The third step calculates a score for each available matcher from the results of alignment evaluations, weighting each metric as defined in the quality requirements. The list of matchers ordered by score forms the matcher ranking for the current setting.

Matchers Execution. Once the ranking of recommended matchers has been established, the user can apply criteria (e.g., a minimum score threshold or a maximum number of matchers) to select matchers and then execute them through the following steps: pre-matching, matching, combination, and filtering.

Alignment Validation. In the final step, the user can provide feedback on the alignments produced by the framework by walking through the correspondences of each alignment and annotating them with positive or negative statements. This information is also stored in the Validations Database and may affect subsequent interactions.

Knowledge Bases. The knowledge bases store information about ontologies, matchers, alignments and validations, and support the steps described above. At any step of the workflow, the user can add data to the knowledge base to improve the obtained results. The Ontologies Database includes basic descriptions, such as URI and format (e.g., RDF, OWL).
The Matchers Database stores metadata about existing matchers, such as name, version, main features and service endpoint. The Alignments Database stores alignments following the Alignment API [4] as a standard and, if a gold standard is available, also a summary of metrics (e.g., precision, recall, F-measure) and matching information (e.g., correspondences found, expected correspondences and true positives) about the alignment generation. Finally, the Validations Database stores a set of statements containing negative or positive user annotations on correspondences.

6 Evaluation Plan

To evaluate the proposed framework, we perform experiments with real matchers on public datasets. The initial tests target the datasets provided by the OAEI tracks (e.g., conference, anatomy). For this, we first prepared the knowledge bases by adding metadata about reference ontologies, existing matchers (preferably OAEI participants) and alignment evaluations (when reference alignments are available). As further experiments, we also plan to test the framework in other domains, such as the integration of ontologies from open data repositories. Our hypotheses will be validated if the experiments demonstrate that the use of application requirements enables the reduction of a matching task and consequently the recommendation of the best matching systems.

7 Preliminary Results

To obtain preliminary results, we developed a prototype of the proposed framework. To initialize the Ontologies Database, we imported ontologies from the OAEI Conference and Anatomy datasets (Cmt, Conference, ConfOf, Edas, Ekaw, Iasted, Sigkdd, Human, and Mouse). To fill the Matchers Database, we considered matchers that usually participate in OAEI campaigns and have publicly available source code. To standardize access (input/output), we implemented a wrapper for each matcher.
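One way to picture such wrappers, and how their uniform output could feed the weighted scoring of Section 5.1, is sketched below in Python. Everything here is hypothetical: the names MatcherWrapper, MatchResult, StubMatcher and rank, the canned numbers, and in particular the scoring rule (a weighted sum in which execution time is rescaled so that faster runs score higher) are assumptions, since the text does not specify how the metrics are combined.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class MatchResult:
    """Uniform output of one wrapped matcher run (hypothetical structure)."""
    matcher: str
    precision: float        # in [0, 1], measured against a reference alignment
    recall: float           # in [0, 1]
    execution_time: float   # seconds


class MatcherWrapper(Protocol):
    """Hypothetical common interface hiding each matcher's own input/output."""
    name: str

    def run(self, source_uri: str, target_uri: str) -> MatchResult: ...


@dataclass
class StubMatcher:
    """Stand-in for a real wrapper; it returns canned, made-up numbers."""
    name: str
    precision: float
    recall: float
    execution_time: float

    def run(self, source_uri: str, target_uri: str) -> MatchResult:
        return MatchResult(self.name, self.precision, self.recall, self.execution_time)


def rank(results, weights):
    """Rank results by a weighted sum of metrics (assumed scoring rule)."""
    slowest = max(r.execution_time for r in results)
    scored = []
    for r in results:
        metrics = {
            "precision": r.precision,
            "recall": r.recall,
            # Assumption: rescale time to [0, 1] so that faster is better.
            "execution_time": 1.0 - r.execution_time / slowest if slowest > 0 else 1.0,
        }
        score = sum(weight * metrics[name] for name, weight in weights.items())
        scored.append((r.matcher, round(score, 3)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Quality requirement weights as in Table 1: 0.7 for precision, 0.3 for execution time.
wrappers = [
    StubMatcher("matcher_a", precision=0.9, recall=0.5, execution_time=20.0),
    StubMatcher("matcher_b", precision=0.7, recall=0.7, execution_time=5.0),
]
results = [w.run("source.owl", "target.owl") for w in wrappers]
ranking = rank(results, {"precision": 0.7, "execution_time": 0.3})
print(ranking)
```

With these made-up inputs, the faster matcher_b ranks above the more precise matcher_a, which illustrates how the weights in the quality requirements can change the recommendation.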
In the initial experiment, we considered the matchers COMA [5], YAM [3], AML [9], LogMap [10] and FCAMap [8]. Figure 2 compares the matching results obtained on the whole ontologies with those obtained on the segments generated by the prototype. In this preliminary experiment, we observed that the prototype improved the quality metrics in the majority of cases and reduced the execution time in all cases.

[Figure 2: two charts comparing COMA, AML, LogMap, FCAMap and YAM on whole ontologies and on segments (seg). Panel "Quality Metrics": precision, recall and F-measure. Panel "Execution Time (seconds)".]

Fig. 2. Preliminary results.

8 Reflections

In this work, we present a framework for matcher recommendation based on application requirements. Although the preliminary results indicate that the proposed approach is promising, we are now focused on performing further experiments to obtain more extensive results. Some design and implementation work on the framework remains, but the main structure is implemented in the initial prototype, which will support the execution of new experiments.

We expect the main contributions of the proposed framework to be: i) easier preparation of a matching task, by using requirements instead of matcher-specific parameters; and ii) the generation of better-quality alignments, by reducing the matching search space and by using the best recommended matchers to generate alignments. Another benefit of this framework would be reduced execution time, since the matching is performed not on entire ontologies, but only on the segments most relevant to the user.

Acknowledgments. I am grateful to my advisor Dr. Ana Carolina Salgado and my co-advisor Dr. Bernadette Farias Lóscio for their support and the opportunity to carry out this work.

References

1. Achichi, M., Cheatham, M., Dragisic, Z., Euzenat, J., Faria, D., Ferrara, A., Flouris, G., Fundulaki, I., Harrow, I., Ivanova, V., Jiménez-Ruiz, E., Kuss, E., Lambrix, P., Leopold, H., Li, H., Meilicke, C., Montanelli, S., Pesquita, C., Saveta, T., Shvaiko, P., Splendiani, A., Stuckenschmidt, H., Todorov, K., dos Santos, C.T., Zamazal, O.: Results of the Ontology Alignment Evaluation Initiative 2016. OM@ISWC (2016)
2. Anam, S., Kim, Y.S., Kang, B.H., Liu, Q.: Adapting a knowledge-based schema matching system for ontology mapping. In: Proceedings of the Australasian Computer Science Week Multiconference. pp. 27:1–27:10. ACSW '16, ACM, New York, NY, USA (2016), http://doi.acm.org/10.1145/2843043.2843048
3. Bellahsene, Z., Ngo, D.H.: YAM++: (not) Yet Another Matcher for Ontology Matching Task. Bases de Données Avancées p. 5 (2012)
4. David, J., Euzenat, J., Scharffe, F., Trojahn dos Santos, C.: The Alignment API 4.0. Semantic Web 2(1), 3–10 (Jan 2011)
5. Do, H.H., Rahm, E.: COMA - A System for Flexible Combination of Schema Matching Approaches. VLDB pp. 610–621 (2002)
6. Elshwimy, F.A., Algergawy, A., Sarhan, A., Sallam, E.A.: Aggregation of similarity measures in schema matching based on generalized mean. 2014 IEEE 30th International Conference on Data Engineering Workshops (ICDEW) pp. 74–79 (2014)
7. Euzenat, J., Shvaiko, P.: Ontology Matching. Springer Publishing Company, Incorporated, 2nd edn. (2013)
8. Fan, L., Xiao, T.: An automatic method for ontology mapping. In: Apolloni, B., Howlett, R.J., Jain, L.C. (eds.) KES (3). Lecture Notes in Computer Science, vol. 4694, pp. 661–669. Springer (2007)
9. Faria, D., Pesquita, C., Santos, E., Palmonari, M., Cruz, I.F., Couto, F.M.: The AgreementMakerLight Ontology Matching System. Springer Berlin Heidelberg, Berlin, Heidelberg (2013)
10. Jiménez-Ruiz, E., Grau, B.C.: LogMap: Logic-Based and Scalable Ontology Matching. In: The Semantic Web – ISWC 2011, pp. 273–288. Springer, Berlin, Heidelberg (Oct 2011)
11. Lambrix, P., Kaliyaperumal, R.: A Session-based Ontology Alignment Approach enabling User Involvement. Semantic Web Journal (2016)
12. Li, J., Tang, J., Li, Y., Luo, Q.: RiMOM: A dynamic multistrategy ontology alignment framework. IEEE Trans. on Knowl. and Data Eng. 21(8), 1218–1232 (Aug 2009), http://dx.doi.org/10.1109/TKDE.2008.202
13. Mochol, M., Jentzsch, A., Euzenat, J.: Applying an analytic method for matching approach selection. In: OM'06: Proceedings of the 1st International Conference on Ontology Matching - Volume 225. pp. 37–48. Free University of Berlin, CEUR-WS.org (Nov 2006)
14. Pirró, G., Talia, D.: UFOme: An ontology mapping system with strategy prediction capabilities. Data & Knowledge Engineering 69(5), 444–471 (May 2010)
15. Shvaiko, P., Euzenat, J.: Ontology matching: state of the art and future challenges. Knowledge and Data Engineering, IEEE Transactions on 25(1), 158–176 (2013)