
A Framework to Generate Reference Sets for Ontology Matching Algorithms

Gurpriy

sh Gh

hin Lo

sachin.lodhag@tcs.com, 54B, Tata Research Development and Design Center, Tata Consultancy Services Ltd., Hadapsar Industrial Estate, Hadapsar, Pune, Maharashtra - 411013

The performance of ontology matching algorithms is evaluated using F-measure, precision, and recall, which in turn rely on the availability of the ground truth. Typically, the ground truth generation process is manual, subjective, and time-consuming. Therefore, there is a need for a (semi-)automated approach that generates an unbiased reference set, that is, an approximation of the ground truth. We propose a framework-based solution to generate a reference set and report encouraging results for the OAEI 2019 conference dataset.

Keywords: Reference Set, Ontology Matching

Introduction

The performance of ontology matching algorithms is evaluated using the F-measure, precision, and recall measures. These measures in turn rely on the ground truth (gold standard) generated by a community of domain experts. Typically, ground truth creation is a manual, subjective, and time-consuming exercise. Due to its subjective nature, even the creation of a small ground truth requires many domain experts to agree on a small set of pairs (e.g., some ontology pairs of the conference dataset have fewer than 15 pairs in their ground truth).

Ground truth is required in almost every scientific discipline to validate ideas, theories, methods, etc. Therefore, many semi-automated approaches have been proposed in various domains to generate it. Euzenat et al. propose a benchmark generator framework to measure meaningful properties of ontology matching algorithms [1]. The objective of their framework is to generate a new benchmark by supporting various alteration operations on any seed ontology. DBpedia-NYD, another such effort, has resulted in a machine-generated reference set (a silver standard) [5]. Jörn Hees has proposed a semi-automated approach to map the Edinburgh Associative Thesaurus (EAT) to DBpedia entities [3]. Hees' approach finds candidate mappings automatically through scores assigned to them using the Wikipedia API. These mappings are then verified manually to generate the final set of mappings. Harrow et al. have evaluated 11 matching systems on biomedical ontologies to assess their relative performance with respect to manually created mappings (gold standard), a set of mappings generated through consensus (silver standard or reference set), and unique mappings generated by each individual participating system [2].

Copyright © for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Existing approaches do not address the bias introduced in the reference set by the particular approach used to generate it. For example, an algorithm that uses web search engines may get an unfair advantage in an evaluation that uses DBpedia-NYD as the reference set [5]. Creating an unbiased reference set offers multiple advantages: i) it can be used to evaluate a newly proposed ontology matching algorithm, ii) it can be used for training purposes, and iii) it can serve as the starting point for generating the ground truth.

Framework

We propose a plug-and-play framework that exploits properties of different ontology matching algorithms to generate an unbiased reference set for the input ontology matching algorithm and a pair of ontologies. Figure 1 outlines a conceptual view of the proposed framework. The framework enables the right set of ontology matching algorithms depending on the requirements specified by the user (domain expert, ontology matching algorithm designer, etc.). For example, if the user wants to generate a reference set for evaluating an ontology matching algorithm that exploits the distance property between concepts of the input ontologies, the framework enables those ontology matching algorithms which exploit different properties (e.g., concept equivalence through synonym sets) to avoid bias in the reference set. Further, the user may choose to compute confidence values for all or a subset of the concept pairs of the input ontologies.
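The enabling step described above can be sketched as a small registry filter. This is only an illustration under stated assumptions: the `Matcher` class, the `enable_matchers` function, the category labels, and the placeholder scorer `exact_label_match` are hypothetical names introduced here, not part of the framework's published interface. The algorithm names are the six used in the experiments.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Matcher:
    """Hypothetical description of a plugged ontology matching algorithm."""
    name: str
    category: str                        # property the algorithm exploits
    score: Callable[[str, str], float]   # confidence in [0, 1] for a concept pair


def exact_label_match(a: str, b: str) -> float:
    # Placeholder scorer (assumption): stands in for a real similarity measure.
    return 1.0 if a.lower() == b.lower() else 0.0


def enable_matchers(registry: List[Matcher], evaluated_category: str) -> List[Matcher]:
    """Enable only the algorithms whose category differs from that of the
    algorithm under evaluation, so the reference set is not biased toward it."""
    return [m for m in registry if m.category != evaluated_category]


registry = [
    Matcher("word2vec", "deep_learning", exact_label_match),
    Matcher("fastText", "deep_learning", exact_label_match),
    Matcher("WuPalmer", "wordnet", exact_label_match),
    Matcher("Lin", "wordnet", exact_label_match),
    Matcher("nGram", "character", exact_label_match),
    Matcher("MLCS", "character", exact_label_match),
]

# Reference set intended for evaluating a character-based algorithm:
enabled = enable_matchers(registry, "character")
print([m.name for m in enabled])
```

Filtering by category rather than by individual algorithm mirrors the idea that bias comes from the property an algorithm exploits, not from one specific implementation.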

To generate a reference set of the desired size and quality, it is necessary to filter the alignment set with respect to threshold values on the size and on the confidence values computed by all framework algorithms. Algorithm 1 outlines the approach to select the threshold on the confidence values for an ontology pair. The selection of the threshold value is determined by two parameters: the cardinality of a set in one-to-one matching form ($|algoSet|$ in Algorithm 1, generated after applying linear optimization) and a user-defined parameter in $[0, 1]$ that sets the minimum size of the reference set.

[Figure 1. Labels: input ontologies O1 (concepts C11, ..., C1n) and O2 (concepts C21, ..., C2m); Align Algo; Alignment set (many-to-many); Linear optimization; Alignment set (one-to-one, Alignlo); plugged ontology matching algorithms Algo1, ..., AlgoN; confidence values Cnf11, ..., CnfnN for enabled algorithms; Selection Function SF(cnfi1, ..., cnfiN); Reference Set (1/0 per concept pair).]

Fig. 1. Conceptual View of Framework.

The Selection Function (SF) is one of the most important elements of the framework. SF takes the n confidence values computed by the n chosen ontology matching algorithms for a concept pair and produces a Boolean value. Formally, $SF : [0, 1]^n \to \{1, 0\}$. Different implementations of SF are possible. In its current form, the framework provides two. The first implementation uses the unanimity rule: all chosen algorithms must agree on a concept pair for it to be included in the reference set. The second implementation uses the majority rule: a concept pair is included in the reference set if the majority of the ontology matching algorithms (>= 50%) agree on it.

Algorithm 1 Algorithm to compute threshold value

Require: Algoset, a superset containing the one-to-one matching sets of all framework algorithms for an ontology pair; the user-defined minimum-size parameter
for all threshold in [0.1, ..., 1.0] do
  flag = true
  for all algoSet ∈ Algoset do
    filteredSet = filterForThreshold(threshold, algoSet)
    if (|filteredSet| / |algoSet|) < user-defined minimum-size parameter then
      flag = false
    end if
  end for
  if flag == true then
    setThresholdForOntoPair(threshold)
  end if
end for
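Algorithm 1 and the two selection rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the name `min_fraction` stands for the user-defined minimum-size parameter, a pair is assumed to survive filtering when its confidence is greater than or equal to the threshold, an algorithm is assumed to "agree" on a pair when its confidence reaches the selected threshold, a pair missing from an algorithm's one-to-one set is assumed to have confidence 0.0, and empty alignment sets are skipped.

```python
from typing import Callable, Dict, Optional, Sequence, Set, Tuple

Pair = Tuple[str, str]
Alignment = Dict[Pair, float]  # one-to-one alignment: concept pair -> confidence


def filter_for_threshold(threshold: float, algo_set: Alignment) -> Alignment:
    # Assumption: a pair is kept when its confidence is >= threshold.
    return {pair: conf for pair, conf in algo_set.items() if conf >= threshold}


def select_threshold(algo_sets: Sequence[Alignment], min_fraction: float) -> Optional[float]:
    """Algorithm 1: keep the last threshold in 0.1, 0.2, ..., 1.0 for which every
    algorithm retains at least `min_fraction` of its one-to-one alignment."""
    chosen = None
    for step in range(1, 11):
        threshold = step / 10
        if all(
            len(filter_for_threshold(threshold, s)) / len(s) >= min_fraction
            for s in algo_sets
            if s  # assumption: skip empty sets to avoid dividing by zero
        ):
            chosen = threshold
    return chosen


def unanimity(confidences: Sequence[float], threshold: float) -> bool:
    # Unanimity rule: every enabled algorithm agrees on the pair.
    return all(c >= threshold for c in confidences)


def majority(confidences: Sequence[float], threshold: float) -> bool:
    # Majority rule: at least 50% of the enabled algorithms agree on the pair.
    votes = sum(c >= threshold for c in confidences)
    return 2 * votes >= len(confidences)


def build_reference_set(
    algo_sets: Sequence[Alignment],
    threshold: float,
    selection_function: Callable[[Sequence[float], float], bool],
) -> Set[Pair]:
    pairs = set().union(*algo_sets)  # every pair proposed by any algorithm
    return {
        p for p in pairs
        if selection_function([s.get(p, 0.0) for s in algo_sets], threshold)
    }


# Toy confidence values (invented for illustration only).
algo_sets = [
    {("Paper", "Paper"): 1.0, ("Author", "Writer"): 0.8, ("Review", "Report"): 0.9},
    {("Paper", "Paper"): 1.0, ("Author", "Writer"): 0.9, ("Review", "Report"): 0.2},
    {("Paper", "Paper"): 1.0, ("Author", "Writer"): 0.9, ("Review", "Report"): 0.3},
]
threshold = select_threshold(algo_sets, min_fraction=0.5)
print(threshold)
print(build_reference_set(algo_sets, threshold, unanimity))
print(build_reference_set(algo_sets, threshold, majority))
```

On this toy input the unanimity rule keeps only the pair all three algorithms score highly, while the majority rule additionally keeps the pair supported by two of the three.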

Experiments

We have conducted experiments on the OAEI 2019 conference dataset using Python v3.7.3. We have evaluated our framework using six different ontology matching algorithms, two each for the categories of deep learning (word2vec¹ and fastText²), WordNet (WuPalmer and Lin³), and character (nGram and MLCS⁴).

¹ https://spacy.io/api/doc/
² https://fasttext.cc/docs/en/pretrained-vectors.html
³ https://www.nltk.org/howto/wordnet.html
⁴ https://pypi.org/project/strsim/

Table 1. Ontology pairs and their threshold values.

| Ontology pair | Threshold |
|---|---|
| cmt-Conference | 0.8 |
| cmt-confOf | 0.8 |
| cmt-edas | 0.8 |
| cmt-ekaw | 0.8 |
| cmt-iasted | 0.8 |
| cmt-sigkdd | 0.8 |
| Conference-confOf | 0.8 |
| Conference-edas | 0.8 |
| Conference-ekaw | 0.8 |
| Conference-iasted | 0.7 |
| Conference-sigkdd | 0.8 |
| confOf-edas | 0.8 |
| confOf-ekaw | 0.8 |
| confOf-iasted | 0.7 |
| confOf-sigkdd | 0.7 |
| edas-ekaw | 0.8 |
| edas-iasted | 0.7 |
| edas-sigkdd | 0.8 |
| ekaw-iasted | 0.7 |
| ekaw-sigkdd | 0.8 |
| iasted-sigkdd | 0.8 |

For the computation of the equality relation, classes are compared with classes and properties with properties. Moreover, we first convert the output of each ontology matching algorithm, which is in many-to-many form (Align), into one-to-one matching form using linear optimization [4]. It produces the maximal matching that maximizes the overall confidence value of the one-to-one alignment (Alignlo). We have chosen two ontology matching algorithms for each category because they compute different confidence values for the same concept pair (in some cases, the difference is as high as 0.2). With the user-defined minimum-size parameter set to 0.1, we get two different threshold values, 0.7 and 0.8, for different ontology pairs, as shown in Table 1. We have excluded both ontology matching algorithms of the category for which we want to generate the reference set.
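The conversion from many-to-many to one-to-one form can be illustrated with a tiny exhaustive search. The paper uses linear optimization [4] for this step. The sketch below swaps in brute-force enumeration, which finds the same maximum-confidence matching on very small inputs but does not scale. The function name `one_to_one`, the matrix layout (rows for concepts of O1, columns for concepts of O2), and the requirement that there are no more rows than columns are assumptions made for the example.

```python
from itertools import permutations
from typing import List, Tuple


def one_to_one(confidence: List[List[float]]) -> List[Tuple[int, int]]:
    """Return (row, column) pairs of a one-to-one matching that maximizes the
    total confidence. Exhaustive search, suitable only for tiny matrices.
    Assumes a non-empty matrix with len(rows) <= len(columns)."""
    n_rows, n_cols = len(confidence), len(confidence[0])
    best_total, best_cols = -1.0, ()
    # Each permutation assigns a distinct column to every row.
    for cols in permutations(range(n_cols), n_rows):
        total = sum(confidence[r][c] for r, c in enumerate(cols))
        if total > best_total:
            best_total, best_cols = total, cols
    # Drop pairs with zero confidence: they carry no evidence of a match.
    return [(r, c) for r, c in enumerate(best_cols) if confidence[r][c] > 0.0]


# Toy many-to-many confidence matrix (invented values).
align = [
    [0.90, 0.80, 0.10],   # C11 against C21, C22, C23
    [0.85, 0.20, 0.30],   # C12 against C21, C22, C23
]
print(one_to_one(align))
```

A greedy choice would pair C11 with C21 first (0.90) and leave C12 with a weak partner. The optimal matching pairs C11 with C22 and C12 with C21, for a higher total confidence. For realistic ontology sizes, an assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be a more practical choice than enumeration.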

Table 1 shows the F-measure values for the two implementations of SF discussed above. From Table 1, it is clear that our framework generates a good-quality reference set (the maximum F-measure being around 88%). From the F-measure values, we can conclude that not only the SF selection strategy influences the quality of the reference set, but the enabled algorithms (and therefore their properties) play an important role too. This behavior is consistent and can be observed for multiple ontology pairs of the conference dataset. The results point to an important direction for generating an unbiased reference set: the right mix of ontology matching algorithms exploiting different properties, combined with the right selection strategy.
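For reference, the quality measures used above can be computed as in the following sketch. It assumes the generated reference set and the ground truth are both represented as sets of concept pairs. The function name `precision_recall_f` and the toy pairs are illustrative, not taken from the experiments.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def precision_recall_f(reference: Set[Pair], ground_truth: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure of a generated reference set
    against the ground truth (gold standard)."""
    true_positives = len(reference & ground_truth)
    precision = true_positives / len(reference) if reference else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy example: 2 of the 3 generated pairs appear in a 4-pair ground truth.
reference = {("Paper", "Paper"), ("Author", "Writer"), ("Chair", "Seat")}
ground_truth = {("Paper", "Paper"), ("Author", "Writer"),
                ("Review", "Report"), ("Topic", "Subject")}
p, r, f = precision_recall_f(reference, ground_truth)
print(round(p, 3), round(r, 3), round(f, 3))
```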

Discussion and Future work: In its current form, the proposed framework does not model Inter-Algorithm disagreement between ontology matching algorithms exploiting similar or different properties. Modeling Inter-Algorithm disagreement may further improve the quality of the generated reference set and reduce the bias in it. The framework also does not account for the impact on the reference set of the approach that generates the one-to-one matching form. Both research questions require further investigation.

The notion of bias accounted for by the proposed framework is based on the property exploited by a given ontology matching algorithm. Therefore, that property applies to all mappings of a reference set. The evaluation exercise of Harrow et al. considers bias based on the similarity between two participating ontology matching systems, and it is mapping-specific [2]. If two variants of the same participating system vote for a mapping, it is counted only once.

To generate output that can be used in real-world applications, domain experts need to further validate the generated reference set. Our framework will reduce the effort required of domain experts in generating a silver standard or gold standard. More experiments are needed to further validate the framework with respect to i) the diversity of ontology matching algorithms (e.g., hybrid ontology matching approaches combining and exploiting different properties) and ii) real-world ontologies.

1. Euzenat, J., Rosoiu, M.E., Trojahn, C.: Ontology matching benchmarks: generation, stability, and discriminability. Journal of web semantics 21, 30–48 (2013), https://doi.org/10.1016/j.websem.2013.05.002

2. Harrow, I., Jimenez-Ruiz, E., Splendiani, A., Romacker, M., Woollard, P., Markel, S., Alam-Faruque, Y., Koch, M., Malone, J., Waaler, A.: Matching disease and phenotype ontologies in the ontology alignment evaluation initiative. Journal of biomedical semantics 8(1), 55 (2017), https://doi.org/10.1186/s13326-017-0162-9

3. Hees, J., Bauer, R., Folz, J., Borth, D., Dengel, A.: Edinburgh associative thesaurus as rdf and dbpedia mapping. In: European Semantic Web Conference. pp. 17–20. Springer (2016), https://doi.org/10.1007/978-3-319-47602-5_4

4. Matousek, J., Gartner, B.: Understanding and using linear programming. Springer Science & Business Media (2007)

5. Paulheim, H.: Dbpedianyd - a silver standard benchmark dataset for semantic relatedness in dbpedia. In: NLP-DBPEDIA@ISWC. Citeseer (2013)