A Framework to Generate Reference Sets for Ontology Matching Algorithms*

Gurpriya Bhatia [0000-0002-7511-8543], Kumar Vidhani [0000-0002-2412-6391], Mangesh Gharote [0000-0002-4942-2429], and Sachin Lodha [0000-0001-5771-4977]

54B, Tata Research Development and Design Center, Tata Consultancy Services Ltd., Hadapsar Industrial Estate, Hadapsar, Pune, Maharashtra - 411013
{gurpriya.bhatia, kumar.vidhani, mangesh.g, sachin.lodha}@tcs.com

Abstract. The performance of ontology matching algorithms is evaluated using F-measure, precision, and recall, which in turn rely on the availability of the ground truth. Typically, the ground truth generation process is manual, subjective, and time-consuming. Therefore, there is a need for a (semi-)automated approach that generates an unbiased reference set, that is, an approximation of the ground truth. We propose a framework-based solution to generate a reference set and report encouraging results for the OAEI 2019 conference dataset.

Keywords: Reference Set · Ontology Matching Algorithm Property · Ontology Matching.

1 Introduction

The performance of ontology matching algorithms is evaluated using the F-measure, precision, and recall measures. These measures in turn rely on the ground truth (gold standard) generated by a community of domain experts. Typically, ground truth creation is a manual, subjective, and time-consuming exercise. Due to its subjective nature, even the creation of a small ground truth requires many domain experts to agree on a small set of pairs (e.g., some ontology pairs of the conference dataset have fewer than 15 pairs in their ground truth).

Ground truth is required in almost every scientific discipline to validate ideas, theories, methods, etc. Therefore, many semiautomated approaches have been proposed in various domains to generate it. Euzenat et al. propose a benchmark generator framework to measure the meaningful properties of ontology matching algorithms [1].
The objective of their framework is to generate a new benchmark by supporting various alteration operations on any seed ontology. DBpediaNYD, another such effort, resulted in a machine-generated reference set (a silver standard) [5]. Jorn Hees proposed a semiautomated approach to map the Edinburgh Associative Thesaurus (EAT) to DBpedia entities [3]. Hees's approach finds candidate mappings automatically through scores assigned to them using the Wikipedia API. These mappings are then verified manually to generate the final set of mappings. Harrow et al. evaluated 11 matching systems on biomedical ontologies to assess their relative performance with respect to manually created mappings (gold standard), a set of mappings generated through consensus (silver standard or reference set), and unique mappings generated by individual participating systems [2].

Existing approaches do not address the bias that is introduced into the reference set by the particular approach used to generate it. For example, an algorithm that uses web search engines may get an unfair advantage in an evaluation that uses DBpediaNYD as the reference set [5]. Creating an unbiased reference set offers multiple advantages: i) it can be used to evaluate a newly proposed ontology matching algorithm, ii) it can be used for training purposes, and iii) it can serve as the starting point for generating the ground truth.

* Copyright © for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Framework

We propose a plug-and-play framework that exploits properties of different ontology matching algorithms to generate an unbiased reference set for the input ontology matching algorithm and a pair of ontologies. Figure 1 outlines a conceptual view of the proposed framework.
The framework enables the right set of ontology matching algorithms depending on the requirements specified by the user (domain expert, ontology matching algorithm designer, etc.). For example, if the user wants to generate a reference set for evaluating an ontology matching algorithm that exploits the distance property between concepts of the input ontologies, the framework enables those ontology matching algorithms that exploit different properties (e.g., concept equivalence through synonym sets) to avoid bias in the reference set. Further, the user may choose to compute confidence values for all or a subset of the concept pairs of the input ontologies.

[Figure 1: The input ontologies O1 and O2 are passed to the enabled plugged-in ontology matching algorithms Algo1, ..., AlgoN. Each produces a many-to-many alignment set (Align), which linear optimization converts into a one-to-one alignment set (Align_lo). For each concept pair (C1i, C2j), the Selection Function SF(cnf_i1, ..., cnf_iN) maps the N confidence values to 1 or 0, which decides whether the pair enters the Reference Set.]

Fig. 1. Conceptual View of Framework.

To generate a reference set of the desired size and quality, it is necessary to filter the alignment set with respect to threshold values on the size and on the confidence values computed by all framework algorithms. Algorithm 1 outlines the approach to select the threshold on the confidence values for an ontology pair. The selection of the threshold value τ is determined by two parameters: the cardinality of a set in one-to-one matching form (generated after applying linear optimization; |algoSet| in Algorithm 1) and ρ ∈ [0, 1], the user-defined parameter for the minimum size of the reference set, expressed as a fraction of |algoSet|.

Selection Function (SF) is one of the most important elements of the framework.
SF takes the n confidence values computed by the n chosen ontology matching algorithms for a concept pair and produces a Boolean value. Formally, SF : [0, 1]^n → {1, 0}. Different implementations of SF are possible. The current version of the framework provides two. The first implementation uses the unanimity rule: all chosen algorithms must agree on a concept pair for it to be included in the reference set. The second implementation uses the majority rule: a concept pair is included in the reference set if at least half (>= 50%) of the chosen ontology matching algorithms agree on it.

Algorithm 1 Algorithm to compute threshold value
Require: Algoset, a collection containing the one-to-one matching sets of all framework algorithms for an ontology pair; ρ, user-defined parameter
for all threshold in [0.1, .., 1.0] do
    flag = true
    for all algoSet ∈ Algoset do
        filteredSet = filterForThreshold(threshold, algoSet)
        if (|filteredSet| / |algoSet|) < ρ then
            flag = false
        end if
    end for
    if flag == true then
        setThresholdForOntoPair(threshold)
    end if
end for

Because every threshold that satisfies the size constraint overwrites the previously stored one, the loop (read in ascending order) retains the largest such threshold.

3 Experiments

We conducted experiments on the OAEI 2019 conference dataset using Python v3.7.3. We evaluated our framework using six ontology matching algorithms, two for each of three categories: deep learning (word2vec¹ and fastText²), WordNet (WuPalmer and Lin³), and character-based (nGram and MLCS⁴). For the computation of the equality relation, classes are compared with classes and properties with properties. Moreover, we first convert the output of each ontology matching algorithm, which is in many-to-many form (Align), into one-to-one matching form using linear optimization [4]. This produces the maximal matching that maximizes the overall confidence value of the alignment in one-to-one matching form (Align_lo).

¹ https://spacy.io/api/doc/
² https://fasttext.cc/docs/en/pretrained-vectors.html
³ https://www.nltk.org/howto/wordnet.html
⁴ https://pypi.org/project/strsim/

Table 1. Comparison of two implementations of Selection Function for the conference dataset. FEX_DL, FEX_WN and FEX_CHR: F-measure values (in percentage) excluding DL, WordNet and character-based approaches respectively. ρ = 0.1.

                                    SF-Unanimity rule           SF-Majority rule
Ontology Pair       Threshold (τ)   FEX_DL  FEX_WN  FEX_CHR     FEX_DL  FEX_WN  FEX_CHR
cmt-Conference      0.8             40.00   46.15   40.00       41.17   45.16   44.44
cmt-confOf          0.8             43.47   50.00   43.47       48.00   48.00   46.15
cmt-edas            0.8             63.63   60.86   57.14       61.53   64.00   59.25
cmt-ekaw            0.8             52.63   52.63   60.00       42.85   54.54   40.00
cmt-iasted          0.8             66.66   88.88   75.00       42.10   72.72   44.44
cmt-sigkdd          0.8             70.00   72.72   70.00       69.56   75.00   72.00
Conference-confOf   0.8             58.33   66.66   56.00       48.64   58.06   47.36
Conference-edas     0.8             55.17   58.06   55.17       39.13   50.00   40.00
Conference-ekaw     0.8             40.00   45.00   41.02       44.89   51.06   46.15
Conference-iasted   0.7             33.33   43.47   33.33       36.84   41.37   34.14
Conference-sigkdd   0.8             58.33   56.00   58.33       40.00   50.00   38.88
confOf-edas         0.8             58.82   64.86   60.60       63.63   66.66   60.86
confOf-ekaw         0.8             66.66   60.60   62.50       59.45   71.79   70.00
confOf-iasted       0.7             62.50   66.66   62.50       42.42   46.15   41.17
confOf-sigkdd       0.7             66.66   66.66   61.53       38.09   47.05   38.09
edas-ekaw           0.8             48.48   50.00   52.94       50.00   65.11   48.00
edas-iasted         0.7             46.15   51.61   50.00       38.59   43.47   32.70
edas-sigkdd         0.8             60.86   60.86   60.86       60.00   51.85   56.25
ekaw-iasted         0.7             52.63   60.86   50.00       31.81   42.42   30.40
ekaw-sigkdd         0.8             66.66   66.66   66.66       63.63   60.00   58.33
iasted-sigkdd       0.8             75.86   68.96   74.07       60.86   75.67   54.16

We chose two ontology matching algorithms for each category because they computed different confidence values for the same concept pair (in some cases, the difference is as high as 0.2). For ρ = 0.1, we obtain two different threshold values, 0.7 and 0.8, depending on the ontology pair, as shown in Table 1. When generating a reference set for a given category, we excluded both ontology matching algorithms of that category.
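The threshold selection of Algorithm 1 and the two SF rules can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each algorithm's one-to-one alignment is a dictionary from concept pairs to confidence values, that an algorithm "agrees" on a pair when its confidence is at least τ, and that a pair missing from an alignment has confidence 0. The names select_threshold, sf_unanimity, sf_majority, and build_reference_set are hypothetical; filter_for_threshold mirrors filterForThreshold in Algorithm 1.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Pair = Tuple[str, str]          # (concept of O1, concept of O2)
Alignment = Dict[Pair, float]   # one-to-one alignment: pair -> confidence


def filter_for_threshold(threshold: float, algo_set: Alignment) -> Alignment:
    """Keep the pairs whose confidence is at least `threshold`."""
    return {pair: conf for pair, conf in algo_set.items() if conf >= threshold}


def select_threshold(algo_sets: List[Alignment], rho: float) -> Optional[float]:
    """Algorithm 1: largest threshold in 0.1, 0.2, ..., 1.0 at which every
    algorithm retains at least a fraction `rho` of its alignment.
    Returns None if no threshold qualifies."""
    selected = None
    for step in range(1, 11):
        threshold = step / 10
        if all(
            len(filter_for_threshold(threshold, s)) >= rho * len(s)
            for s in algo_sets
        ):
            selected = threshold  # a later (larger) threshold overwrites
    return selected


def sf_unanimity(confidences: Sequence[float], tau: float) -> bool:
    """Unanimity rule: every algorithm agrees (assumed: confidence >= tau)."""
    return all(c >= tau for c in confidences)


def sf_majority(confidences: Sequence[float], tau: float) -> bool:
    """Majority rule: at least 50% of the algorithms agree."""
    return 2 * sum(c >= tau for c in confidences) >= len(confidences)


def build_reference_set(
    algo_sets: List[Alignment],
    tau: float,
    sf: Callable[[Sequence[float], float], bool],
) -> Set[Pair]:
    """Apply the selection function to every pair seen by any algorithm.
    Assumption: a pair absent from an alignment counts as confidence 0."""
    pairs = set().union(*algo_sets)  # union of the dictionaries' keys
    return {
        pair
        for pair in pairs
        if sf([s.get(pair, 0.0) for s in algo_sets], tau)
    }


# Toy alignments from two hypothetical algorithms (invented values).
algo_a = {("Paper", "Article"): 0.9, ("Author", "Writer"): 0.85, ("Review", "Chair"): 0.3}
algo_b = {("Paper", "Article"): 0.85, ("Author", "Writer"): 0.4, ("Review", "Chair"): 0.2}

tau = select_threshold([algo_a, algo_b], rho=0.3)
print(tau)
print(sorted(build_reference_set([algo_a, algo_b], tau, sf_unanimity)))
print(sorted(build_reference_set([algo_a, algo_b], tau, sf_majority)))
```

With only two algorithms, the majority rule accepts a pair supported by a single algorithm, because one vote out of two already meets the 50% bar. This makes the majority rule more permissive than the unanimity rule on the same inputs.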
Table 1 shows the F-measure values for the two implementations of SF discussed above. The table indicates that our framework can generate a good-quality reference set: the maximum F-measure is 88.88%, obtained for the cmt and iasted ontology pair under the unanimity rule with the WordNet-based algorithms excluded. From the F-measure values, we can conclude that not only does the SF selection strategy influence the quality of the reference set, but the enabled algorithms (and therefore their properties) play an important role too. This behavior is consistent and can be observed for multiple ontology pairs of the conference dataset. The results point to an important direction for generating an unbiased reference set: the right mix of ontology matching algorithms exploiting different properties, combined with the right selection strategy.

Discussion and Future work: In its current version, the proposed framework does not model inter-algorithm disagreement between ontology matching algorithms that exploit similar or different properties. Modeling inter-algorithm disagreement may further improve the quality of the generated reference set and reduce the bias in it. The framework also does not account for the impact that the approach used to generate the one-to-one matching form has on the reference set. Both research questions require further investigation.

The notion of bias accounted for by the proposed framework is based on the property exploited by a given ontology matching algorithm. Therefore, that property applies to all mappings of a reference set. The evaluation exercise of Harrow et al., in contrast, considers bias based on the similarity between two participating ontology matching systems, and it is mapping-specific [2]. If two variants of the same participating system vote for a mapping, it is counted only once.

To generate output that can be used in real-world applications, domain experts need to further validate the generated reference set. Our framework will reduce the effort required of domain experts in generating a silver standard or gold standard.
More experiments are needed to further validate the framework with respect to i) the diversity of ontology matching algorithms (e.g., hybrid ontology matching approaches combining and exploiting different properties) and ii) real-world ontologies.

References

1. Euzenat, J., Roşoiu, M.E., Trojahn, C.: Ontology matching benchmarks: generation, stability, and discriminability. Journal of web semantics 21, 30–48 (2013), https://doi.org/10.1016/j.websem.2013.05.002
2. Harrow, I., Jiménez-Ruiz, E., Splendiani, A., Romacker, M., Woollard, P., Markel, S., Alam-Faruque, Y., Koch, M., Malone, J., Waaler, A.: Matching disease and phenotype ontologies in the ontology alignment evaluation initiative. Journal of biomedical semantics 8(1), 55 (2017), https://doi.org/10.1186/s13326-017-0162-9
3. Hees, J., Bauer, R., Folz, J., Borth, D., Dengel, A.: Edinburgh associative thesaurus as rdf and dbpedia mapping. In: European Semantic Web Conference. pp. 17–20. Springer (2016), https://doi.org/10.1007/978-3-319-47602-5_4
4. Matousek, J., Gärtner, B.: Understanding and using linear programming. Springer Science & Business Media (2007)
5. Paulheim, H.: Dbpedianyd - a silver standard benchmark dataset for semantic relatedness in dbpedia. In: NLP-DBPEDIA@ISWC. Citeseer (2013)