Applying edge-counting semantic similarities to Link Discovery: Scalability and Accuracy Kleanthi Georgala1,2 , Mohamed Ahmed Sherif2 , Michael Röder2 , and Axel-Cyrille Ngonga Ngomo1,2 1 Department of Computer Science, Paderborn University, Germany, 2 Department of Computer Science, University of Leipzig, Germany georgala@informatik.uni-leipzig.de {mohamed.sherif,michael.roeder,axel.ngonga}@upb.de Abstract. With the growth in number and variety of RDF datasets comes an in- creasing need for both scalable and accurate solutions to support link discovery at instance level within and across these datasets. In contrast to ontology matching, most linking frameworks rely solely on string similarities to this end. The limited use of semantic similarities when linking instances is partly due to the current literature stating that they (1) do not improve the F-measure of instance linking approaches and (2) are impractical to use because they lack time efficiency. We revisit the combination of string and semantic similarities for linking instances. Contrary to the literature, our results suggest that this combination can improve the F-measure achieved by instance linking systems when the combination of the measures is performed by a machine learning approach. To achieve this in- sight, we had to address the scalability of semantic similarities. We hence present a framework for the rapid computation of semantic similarities based on edge counting. This runtime improvement allowed us to run an evaluation of 5 bench- mark datasets. Our results suggest that combining string and semantic similarities can improve the F-measure by up to 6% absolute. 1 Introduction RDF knowledge graphs (KGs) are used in a plethora of applications [10], especially when published using the Linked Data paradigm. The provision of links3 between such KGs is of central importance for numerous tasks such as federated queries [22] and question answering [25]. Popular solutions to linking instances (often called link dis- covery, short LD in the literature, see [12] for a survey) often implement specialized measures for particular datatypes (e.g., geospatial or temporal data). In all other cases, state-of-the-art LD frameworks such as SILK [1] and LIMES [14] rely on string similar- ities and machine learning to compute links between instances in RDF KGs. While the Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons Li- cense Attribution 4.0 International (CC BY 4.0). This work has been supported by the EU H2020 project KnowGraphs (GA no. 860801) as well as the BMVI projects LIMBO (GA no. 19F2029C) and OPAL (GA no. 19F2028A). 3 The fourth principal of Linked Data, see http://www.w3.org/DesignIssues/ LinkedData 2 K. Georgala et al. use of string similarities has been shown to work well in a large number of papers (see, e.g., [12,4]), string similarities have the major drawback of not considering the seman- tics of the sequences of tokens they aim to compare. Hence, most string similarity mea- sures return low scores for pairs of strings such as (lift, elevator), (holiday, vacation), (headmaster, principal) and (aubergine, eggplant), although they often stand for the same real-world concepts. Edge-counting semantic similarities (e.g., [26,9,20]) alleviate this problem by using a dictionary to compute a semantic distance between sequence of tokens within the need for an overlap. The synonymy between aubergine and eggplant would hence lead semantic similarity to assign the pair (aubergine, eggplant) a similarity score close to 1. The use of semantic similarities has been paid little attention to in LD for at least two reasons: First, semantic similarities scale poorly and are thus impractical when used on large knowledge graphs.4 Moreover, current works (e.g., [11]) suggest that they lead to no improvement in F-measure. The goal of this paper is hence twofold: (1) we present means to accelerate the computation of four popular bounded edge-counting semantic similarities. (2) We then combine string and semantic similarities using two state-of- the-art machine learning approaches for LD. Our results refute the current state of the art and suggest that semantic similarities can help achieve better results in LD. 2 Preliminaries The formal framework underlying our preliminaries is derived from [23]. A KG K is a set of triples (s, p, o) ∈ (I ∪ B) × I × (I ∪ B ∪ L), where I is the set of all IRIs, B is the set of all RDF blank nodes and L is the set of all literals. LD frameworks aim to compute the set M = {(s, t) ∈ S × T : R(s, t)} where S and T are sets of RDF resources and R is a binary relation. Note that this setting generalizes what is often known as entity matching or deduplication [12], where the relation R must be owl:sameAs. Given that M is generally difficult to compute directly, declarative LD frameworks compute an approximation M 0 ⊆ S × T of M by executing a link specification (LS), which we define formally in the following. Let M be the set of all similarity functions. We define a similarity function m ∈ M as a function m : S × T × P 2 → [0, 1], where P is the set of all properties, where ps , pt ∈ P. We write m(s, t, ps , pt ) to signify the similarity of s and t w.r.t. their properties ps resp. pt . An atomic LS L is a pair L = ((m(ps , pt ), θ), where θ ∈ [0, 1] is a similarity threshold. A complex LS L is a tuple L = op(L1 , L2 ) where two subspecification L1 and L2 are combined using the specification operator op. Here, we consider the binary operators union (t), intersection (u) and difference (\). The edge-counting semantic similarities are based on a lexical vocabulary. We de- fine a lexical vocabulary as a directed acyclic graph (DAG) G = (V, E), where: – The set of vertices V is a set of concepts ci , were each ci stands for a set of syn- onyms. We denote |V | with nV . – E ⊆ V × V is a set of directed edges ejk = (cj , ck ). We denote |E| with nE . 4 This general finding is supported by our evaluation results presented in Section 4. Applying edge-counting semantic similarities to Link Discovery 3 – The edge ejk stands for the hypernymy relation from a parent concept cj to a child concept ck . We write cj → ck and we say that cj is a hypernym of ck . We also de- fine the hyponymy relation as a directed relation from a child concept ck to a parent concept. We write cj ← ck and we say that cj is a hyponym of ck . Hypernymy and hyponymy are transitive. – The root r is the unique node of the dictionary that has no parent concept. – A leaf concept ci is a concept node without any children concepts. – A concept is a common subsumer of c1 and c2 (denoted cs(c1 , c2 )) iff that concept is a hypernym of both c1 and c2 . – The least common subsumer (LSO) of c1 and c2 (denoted lso(c1 , c2 )) is “the most specific concept which is an ancestor of both c1 and c2 ” [26]. – We define the directed path from c1 to c2 via a common subsumer cs(c1 , c2 ) as: path(c1 , c2 ) = {c1 ← ci ← . . . ← cs(c1 , c2 ) → cj → . . . → c2 : i, j, k ∈ N, i, j, k ≤ nv }. Note that there can be multiple path(c1 , c2 ) between two con- cepts. – len(c1 , c2 ) is the length of the shortest path(c1 , c2 ) between two concepts c1 and c2 . Note that len defines a metric. Hence, it is symmetric and abides by the triangle inequality, i.e., len(c1 , c2 ) ≤ len(c1 , c3 ) + len(c2 , c3 ) for any (c1 , c2 , c3 ) ∈ V 3 . – We define depthm (ci ) as the length of the shortest path between r and ci . Analo- gously, depthM (ci ) as the maximum depth(ci ). We set D = max depthM (c). c∈V Note that the following holds: – depthm (r, ci ) = len(r, ci ) – depthm (lso(c1 , c2 )) ≤ min(depthm (c1 ), depthm (c2 )) – depthM (lso(c1 , c2 )) ≤ min(depthM (c1 ), depthM (c2 )) – (triangle inequality) |len(r, c1 ) − len(r, c2 )| ≤ len(c1 , c2 ) ⇔ |depthm (c1 ) − depthm (c2 )| ≤ len(c1 , c2 ) The Shortest Path (SP) similarity [20] of two concepts c1 and c2 is defined as the length of their shortest path in comparison to the maximum distance (2D). We use the normalized formulation of SP, i.e., 2D − len(c1 , c2 ) SP(c1 , c2 ) = . (1) 2D The Leacock and Chodorow metric (LCH) takes both the path between two con- cepts and the depth of the hierarchy into consideration [8]. We use the normalized for- mulation of LCH:  1 if c1 = c2 LCHN (c1 , c2 ) = − log  len(c1 ,c2 )  (2) 2D  log(2D) else. The normalized Wu Palmer (WP) similarity takes the path between two concepts and the depth of their LSO into consideration [26]: 2 × depthM (lso(c1 , c2 )) WP(c1 , c2 ) = (3) 2 × depthM (lso(c1 , c2 )) + N1 + N2 4 K. Georgala et al. where N1 = len(lso(c1 , c2 ), c1 ) and N2 = len(lso(c1 , c2 ), c2 ). The Li et al. metric (L I) is another take on using the path between two concepts and their LSO to define a similarity [9]: eβdepth(lso(c1 ,c2 )) − e−βdepth(lso(c1 ,c2 )) L I(c1 , c2 ) = e−αlen(c1 ,c2 ) βdepth(lso(c ,c )) (4) e 1 2 + e−βdepth(lso(c1 ,c2 )) where L I(c1 , c2 ) ∈ (0, 1). We set depth(lso(c1 , c2 )) = depthM (lso(c1 , c2 )), since the original specification does not state which depth(lso(c1 , c2 )) to use. 3 Approach Fundamentally, hECATE aims to compute the set M 0 = {(s, t) ∈ S×T : m(s, t, ps , pt ) ≥ θ}, where m is an edge-counting similarity. To achieve this goal, the approach makes use of upper bounds which can be derived from the formulation of this family of mea- sures. Take the SP similarity for example: For any two concepts c1 and c2 , SP(c1 , c2 ) ≥ θ implies len(c1 , c2 ) ≤ 2D(1 − θ). Formally, this means that we can discard all com- parisons of pairs (c1 , c2 ) with len(c1 , c2 ) > 2D(1 − θ) without compromising the computation of M 0 . Note that the computation of len(c1 , c2 ) can be carried out on- line or offline, which affects the total runtime of our approach as discussed in Section 4. As similar bounds can be derived for the other edge-counting measures, hECATE generalizes the computation of M 0 for edge-counting semantic similarities by using the following algorithm. Our approach takes (1) two sets of resources, S and T , (2) an atomic LS L = ((m(ps , pt ), θ), where m is one of the four semantic similarities de- scribed in Section 2, and (3) a lexical vocabulary structured as DAG (VDAG) as input. Our goal is to compute the mapping M 0 = [[L]] For each pair (s, t), hECATE retrieves and pre-processes the property values for ps resp. pt . The pre-processing consists of tokenizing and extracting all stop-words from the objects of the triples (s, ps , os ) and (t, pt , ot ). In order to include a pair (s, t) in M 0 , the algorithm compares each set of source tokens from os (sT okens) to each set of target tokens of ot (tT okens). The pair of objects (os , ot ) with the highest similarity which abides by the bounds we de- rive for each measure is finally used to compute the similarity between s and t, and decides whether or not this pair should be added to M 0 . To do so, for each token sT oken ∈ sT okens, we find the tT oken ∈ tT okens that is most similar. First, the algorithm checks if sT oken and tT oken have been compared before If the tokens are being compared for the first time the algorithm checks if the tokens are equal and as- signs the value of 1 to T T Sim. Otherwise, it calls the function compare(sToken, tToken, VDAG) that compares the corresponding sets of concepts obtained from the input VDAG. 5 Then, T T Sim is compared to the maximum token-to-token similar- ity and maxT T Sim is updated. The procedure continues until the highest similarity between the current sT oken and a tT oken is found or maxT T Sim is equal to 1. The algorithm aggregates the highest similarities maxT T Sim of all sT oken ∈ sT okens and calculates an average similarity. This is done for all pairs of (sT okens, tT okens) searching for the pair with the maximum similarity. If this maxSimilarity > θ the 5 Note that our algorithm handles homonyms by considering that a token can be included in more than one concept. Applying edge-counting semantic similarities to Link Discovery 5 pair (s, t) can be added to the final mapping M 0 . The key behind hECATE lies in the token comparison algorithm compare(sToken, tToken, VDAG) (Algorithms 1 and 3). For a pair of tokens (sToken, tToken), we retrieve the set of concepts they belong to in the VDAG. If both sets of concepts are not empty, we compare each source sCon with each target concept tCon and define the maximum similarity of two tokens as the highest similarity of the corresponding concept pairs. To do so, we first retrieve the set of all hypernym paths of each concept to the root of the VDAG using the getPaths(concept, VDAG) algorithm. This algorithm traverses the VDAG by utilizing the hypernym relation. It starts from the concept node and explores all paths to the root node.For SP and LCH, we additionally retrieve the maximum depth D found in the VDAG and the len(sCon, tCon) before calculating the corresponding sim- ilarity as described in Equations 1 and 2 resp. For calculating len(sCon, tCon) our algorithm relies on the set of hypernym paths of the concepts (Algorithm 2). For each pair of hypernym paths hp1 and hp2 the two concepts have, the algorithm iterates over both paths simultaneously, from top to bottom, until they do not share a common node. Then, it proceeds in calculating the length of the newly found path, as the number of concepts that the two paths do not have in common. Finally, the minimum length that has been found is returned. For WP and L I, the comparison algorithm retrieves the depth of the LSO between sCon and tCon (depth( lso(sCon, tCon)), and N1 and N2 by calling the function getLSO (hps1 , hps2 ) (Algorithm 4). This function uti- lizes the set of hypernym paths in a similar manner as the min length algorithm. For each combination of hypernym paths hp1 and hp2 of the concepts, the algorithm tra- verses them simultaneously searching for the last node they have in common. If this node is deeper than any other common node found so far or it has the same depth but the remaining paths are shorter, it is taken as new LSO. Accordingly, the remaining path lengths N1 and N2 are updated. Based on the deepest LSO and the derived val- ues for depthM (lso(sCon, tCon), N1 and N2 , we proceed in calculating the corresponding similarity as described in Equations 3 and 4 resp. Our first extension of hECATE is based on the idea of pre-computing and storing a set of values that are used often in our algorithm. For edge-counting similarities, these are the hypernym paths. Consequently, the extension hECATE-I of hECATE precom- putes all hypernym paths for all concepts included in the VDAG, using the getPaths( concept, VDAG) function. Therefore, every time the getPaths(concept, VDAG) is invoked at runtime, hECATE-I retrieves the paths from an index. Our second ex- tension of hECATE, hECATE-IF, combines hECATE-I with the idea of minimizing unnecessary comparison between concepts by filtering out pairs of source and target concepts that do not satisfy a condition for each semantic similarity. The filtering is performed inside compare(sToken, tToken, VDAG) for each pair of concepts sCon and tCon. Given a semantic similarity, if a pair of concepts satisfies the corre- sponding filtering condition, then the algorithm proceeds normally as described before. If the condition is not met the algorithm does not compute the similarity between the two concepts. For the SP similarity, two concepts will be considered for comparison, if the following holds: 2D − len(c1 , c2 ) SP(c1 , c2 ) ≥ θ ⇔ ≥θ 2D (5) ⇒ |depthm (c1 ) − depthm (c2 )| ≤ 2D(1 − θ) For the WP similarity, the following must hold: 6 K. Georgala et al. Algorithm 1: compare(sCon, tCon, Algorithm 3: compare(sCon, tCon, V DAG) for SP or LCH V DAG) for WP or L I Input: source concept sCon, target Input: source concept sCon, target concept tCon, and a vocabulary concept tCon, and a vocabulary DAG VDAG DAG VDAG Output: a similarity value Output: a similarity value 1 D ← V DAG.getM axDepth(sCon) 1 hps1 ← getP aths(sCon, V DAG) 2 hps1 ← getP aths(sCon, V DAG) 2 hps2 ← getP aths(tCon, V DAG) 3 hps2 ← getP aths(tCon, V DAG) 3 depth, N1 , N2 ← getLSO(hps1 , hps2 ) 4 minLength ← 4 Return getM inLength(hps1 , hps2 ) computeSimilarity(N1 , N2 , depth) 5 Return computeSimilarity(D, minLength) Algorithm 4: getLSO(hps1 , hps2 ) Input: two sets of hypernym paths, hps1 Algorithm 2: getM inLength(hps1 , and hps2 hps2 ) Output: depthM (lso(sCon, tCon)), N1 Input: two sets of hypernym paths, hps1 and N2 and hps2 1 dLSO ← 0, N1 ← 0, N2 ← 0 Output: len(sCon, tCon) 2 foreach hp1 ∈ hps1 do 1 size ← M AX V ALU E 3 foreach hp2 ∈ hps2 do 2 foreach hp1 ∈ hps1 do 4 l1 ← 0, l2 ← 0 3 foreach hp2 ∈ hps2 do 5 while l1 < hp1 .size() ∧ l2 < 4 l1 ← 0, l2 ← 0 hp2 .size() ∧ hp1 .get(l1 ) == 5 while l1 < hp1 .size() ∧ l2 < hp2 .get(l2 ) do hp2 .size() ∧ hp1 .get(l1 ) == 6 l1 ← l1 + 1, l2 ← l2 + 1 hp2 .get(l2 ) do 7 newSize ← 6 l1 ← l1 + 1, l2 ← l2 + 1 hp1 .size() + hp2 .size() − 2l1 7 newSize ← 8 oldSize ← N1 + N2 hp1 .size() + hp2 .size() − 2l1 9 if condition is met then 8 if newSize < size then 10 dLSO ← l1 , size ← newSize ; N1 ← hp1 .size() − l1 11 N2 ← hp2 .size() − l2 9 Return size 12 Return dLSO, N1 , N2 2depthM (lso(c1 , c2 )) W U (c1 , c2 ) ≥ θ ⇔ ≥θ 2depthM (lso(c1 , c2 )) + N1 + N2 ⇔ 2depthM (lso(c1 , c2 )) ≥ θ(N1 + N2 ) + 2θdepthM (lso(c1 , c2 )) 2depthM (lso(c1 , c2 ))(1 − θ) (6) ⇔ N1 + N2 ≤ θ 2 min(depthM (c1 ), depthM (c2 ))(1 − θ) ⇒ N1 + N2 ≤ θ Based on the triangle inequality and Section 2, Equation 6 can be written as: Applying edge-counting semantic similarities to Link Discovery 7 2 min(depthM (c1 ), depthM (c2 ))(1 − θ) len(c1 , c2 ) ≤ ⇒ θ (7) 2min(depthM (c1 ), depthM (c2 ))(1 − θ) |depthm (c1 ) − depthm (c2 )| ≤ θ For the LCH similarity, two concepts will be considered for comparison, iff: −log len(c 1 ,c2 ) 2D log(2D) − log(len(c1 , c2 )) LCH(c1 , c2 ) ≥ θ ⇔ ≥θ⇔ ≥θ⇔ log(2D) log(2D) log(len(c1 , c2 )) (8) 1− ≥ θ ⇔ log(len(c1 , c2 )) ≤ log(2D)(1 − θ) ⇔ log(2D) len(c1 , c2 ) ≤ 2log(2D)(1−θ) ⇒ |depthm (c1 ) − depthm (c2 )| ≤ 2log(2D)(1−θ) When considering the L I similarity, we make the following variable replacements for the sake of legibility: x = depthM (lso(c1 , c2 )), y = min(depthM (c1 ), depthM (c2 )) and z = len(c1 , c2 ). Then, two concepts will be considered for comparison, iff: (e2βx −1) eβx − e−βx eβx − e−βx L I(c1 , c2 ) ≥ θ ⇔ e−αz βx ≥θ⇔e αz ≤ βx eβx ⇔ eαz ≤ (e2βx ⇔ e + e−βx (e + e−βx )θ βx +1) θ e αz (e2βx − 1) (e2βy − 1) e ≤ 2βx ⇒ eαz ≤ 2βy ⇔ αz ≤ ln(e2βy − 1) − lnθ − ln(e2βy + 1) ⇔ (e + 1)θ (e + 1)θ ln(e2βy − 1) − lnθ − ln(e2βy + 1) |depthm (c1 ) − depthm (c2 )| ≤ α (9) Based on Equations 5, 7, 8 and 9, each filtering condition requires the knowledge of depthm (sCon), depthM (sCon), depthm (tCon) and depthM (tCon). Hence, we fur- ther extend the index hECATE-IF relies on by precomputing depthm (ci ) and depthM (ci ) for every concept ci . 4 Evaluation Our evaluation addresses the following three research questions: Q1 . How do our strate- gies for improving the runtime of semantic similarities compare to each other w.r.t. runtime?, Q2 . How do the different edge-counting semantic similarities compare w.r.t. runtime?, and Q3 . Can semantic similarities improve the F-measure of LD systems? We evaluate our approach against five benchmark data sets: Abt-Buy, Amazon-GP and DBLP-ACM described in [7], DailyMed-Drugbank (dubbed DM-DB) and Movies described in [16]. We use WordNet6 as a DAG. To address Q1 and Q2 , we conduct a set of experiments using the basic hECATE algorithm (dubbed hECATE-B) as a baseline as well as hECATE-I and hECATE-IF. For an easier comparison, all methods are implemented in the LD framework L IMES [14]. For hECATE-B and hECATE-I, we create one atomic LS for each semantic similarity, where m iss the name of the edge- counting similarity, θ = 0.1. We use the ’description’ as the source and target properties for Abt-Buy and Amazon-GP datasets, ’title’ for the DBLP-Scholar and 6 https://wordnet.princeton.edu/ 8 K. Georgala et al. Movies datasets and ’name’ for the DM-DB dataset. For hECATE-IF, we use the same values for m, ps and pt as before, but θ is derived from the interval [0.1, 1] with an increment step of 0.1, since the θ is given as a parameter to the filtering functions. For each dataset, we perform the aforementioned LSs against 2v instances from the source and target datasets. We start with v = 2 and increment v until all instances are covered (e.g., the maximal value of v is 9 for the Amazon-Google dataset). We define a maximum runtime for each LS of 2 hrs. Each experiment is executed 3 times and we present the average values. As explained in Section 1, the second goal of this work is to evaluate edge-counting semantic similarities in LD in terms of accuracy. Consequently, for Q3 , we use the hECATE extension with the best runtime performance based on the results of Q1 and executed a set of experiments using 2 machine learning (ML) algorithms: W OM - BAT [23] and D RAGON [19]. We choose these two approaches because (1) they achieve state-of-the-art performance while being deterministic, (2) they are open-source, mean- ing our experiments can be easily reproduced and (3) they are able to generate complex link specifications with any arbitrary number of measures. We perform a 10-fold cross validation by allowing W OMBAT and D RAGON to use only string similarities (StrSim), only semantic similarities (SmtSim) and a combination of both (StrSmtSim) as input. We use the levenshtein, cosine and qgrams similarity measures for strings implemented in L IMES [14]. For each dataset, we use all properties apart from those that corresponded to numeric values. W OMBAT is configured as presented in [23] and D RAGON is configured as presented in [19]. We use two termination criteria for W OM - BAT : Either a LS with F-measure of 1 is found or a maximal depth of refinement of 10 is reached. For the string similarities, W OMBAT produced LSs with a minimum θ value of 0.4 and for the semantic similarities, the minimum θ value is set to 0.7. D RAGON terminates either when no new nodes are found or when the height of the decision tree reached the maximum of 3. Additionally, we compare the achieved F1 scores with scores for E AGLE [15], E UCLID [13], J48 [5] reported by [19], a Multilayer Perceptron classifier reported by [24] and the Pessimistic as well as Re-weighted versions of the work presented at [6]. As expected, Figure 1 shows that hECATE-B has the highest runtimes compared to hECATE-I and hECATE-IF in all datasets, except DM-DB. This supports the claim that semantic similarities typically scale poorly. The results show that both extensions improve the runtime of all semantic similarities, making them more amenable for LD and scalable for larger datasets. Precisely, LCH’s, WP’s and SP’s runtimes improve by 71% and 57% on average when hECATE-I and hECATE-IF strategies are used resp. L I has the least improvement by 65% and 50%. Comparing the two extensions, in all datasets and for all semantic similarities, hECATE-I outperforms hECATE-IF by 30% on average. A detailed analysis of the runtimes shows that even though hECATE-IF reduces the number of comparisons between semantically different concepts and thus the comparison time, the additional runtime cost of filtering creates an overhead that re- sults in a worse total execution time than hECATE-I (Table 1). Regarding the DM-DB dataset, the only property for both source and target datasets, name, consists of only one value, which corresponds to the official name of a drug. That value can only be asso- ciated with one concept. As a result, introducing an indexing and/or filtering technique Applying edge-counting semantic similarities to Link Discovery 9 3E03 hECATE-B hECATE-I hECATE-IF 4E03 hECATE-B hECATE-I hECATE-IF 2E01 hECATE-B hECATE-I hECATE-IF Overall Runtime for S=1081, T=1092 Overall Runtime for S=1046, T=936 2E03 3E03 Overall Runtime for S=512, T=512 2E01 2E03 3E03 1E01 2E03 2E03 1E01 2E03 2E03 1E01 1E03 2E03 8E00 1E03 1E03 6E00 8E02 1E03 5E02 7E02 4E00 3E02 4E02 2E00 0E00 0E00 0E00 LCH Li Wu-Palmer Shortest Path LCH Li Wu-Palmer Shortest Path LCH Li Wu-Palmer Shortest Path (a) Abt-Buy (b) Amazon-GP (c) DM-DB 3E03 2E02 hECATE-B hECATE-I hECATE-IF hECATE-B hECATE-I hECATE-IF Overall Runtime for S=2616, T=2294 Overall Runtime for S=1017, T=1042 2E03 2E02 2E03 2E02 2E03 2E02 2E03 1E02 1E03 1E02 1E03 1E02 8E02 7E01 5E02 5E01 3E02 2E01 0E00 0E00 LCH Li Wu-Palmer Shortest Path LCH Li Wu-Palmer Shortest Path (d) DBLP-ACM (e) Movies Fig. 1: Average runtime in seconds of hECATE-B, hECATE-I and hECATE-IF on all datasets. For hECATE-IF, the standard deviation among different θ values is added. produces an unnecessary overhead. Overall, Q1 can be answered with hECATE-I being the most efficient approach. To answer Q2 we compare the runtimes of the single semantic similarities revealing that L I has the worst runtime (see Figure 1). For the Movies dataset, we notice that hECATE-I requires 100K more token comparisons for L I compared to the other sim- ilarities (Table 1). The better runtime of the other similarities is caused by a condition inside our algorithm which stops as soon as two tokens/concepts have a similarity of 1. In contrast to the other similarities, L I(c1 , c2 ) ∈ (0, 1), i.e., it can never be 1. However, based on Table 1, L I’s runtime shows a great improvement as the values of θ increase in relation to the other metrics. This justifies the fact that the runtimes for L I have the highest standard deviation, whereas SP, LCH and WP are less influenced by the differ- Table 1: Number of concept comparisons performed by hECATE-I and hECATE-IF for the Movies dataset. The numbers for hECATE-B are the same as for hECATE-I. Threshold 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 hECATE-I 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M SP hECATE-IF 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.0M 51.4M 10.3M hECATE-I 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M WP hECATE-IF 61.8M 61.8M 61.8M 61.7M 61.5M 60.3M 56.7M 44.7M 27.8M 10.3M hECATE-I 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M 61.8M LCH hECATE-IF 61.8M 61.8M 61.8M 61.4M 60.4M 56.5M 42.4M 42.4M 28.5M 28.5M hECATE-I 61.9M 61.9M 61.9M 61.9M 61.9M 61.9M 61.9M 61.9M 61.9M 61.9M LI hECATE-IF 61.9M 61.4M 59.9M 56.6M 51.5M 42.1M 28.3M 28.2M 10.0M 00.0M 10 K. Georgala et al. Table 2: Average F-measure achieved by W OMBAT, D RAGON, E UCLID, E AGLE, J48 and Multilayer Perception within a 10-fold cross validation. The semantic similarities use the hECATE-I strategy. Algorithm W OMBAT D RAGON E UCLID E AGLE J48 Perceptron Similarities StrSim SmtSim StrSmtSim StrSim SmtSim StrSmtSim StrSim StrSim StrSim StrSim Abt-Buy 0.65 0.65 0.66 0.51 0.02 0.10 0.00 0.56 0.43 0.43 Amazon-GP 0.71 0.60 0.77 0.64 0.06 0.05 0.71 0.73 0.41 0.36 DBLP-ACM 0.97 0.74 0.97 0.93 0.81 0.93 0.98 0.98 0.77 0.97 DM-DB 0.94 0.71 0.97 0.89 0.65 0.89 1.00 1.00 0.94 - Movies 1.00 0.73 1.00 0.93 0.80 0.93 0.98 0.99 0.84 - Table 3: Maximum F-measure achieved by W OMBAT, D RAGON, Pessimistic and Re- weighted using 2% of the data for training over 7 iterations [6]. The semantic similari- ties use the hECATE-I strategy. Algorithm W OMBAT D RAGON Pessimistic Re-weighted Similarities StrSim SmtSim StrSmtSim StrSim SmtSim StrSmtSim StrSim StrSim Abt-Buy 0.35 0.39 0.34 0.24 0.10 0.24 0.36 0.37 Amazon-GP 0.53 0.33 0.43 0.45 0.13 0.35 0.39 0.43 DBLP-ACM 0.91 0.55 0.91 0.90 0.66 0.90 0.93 0.95 DM-DB 0.94 0.71 0.97 0.94 0.71 0.96 - - Movies 0.97 0.33 0.97 0.96 0.33 0.96 - - ent values of θ. The answer for Q2 is that for all hECATE strategies, SP is the fastest similarity, whereas L I is the slowest. To answer Q3 , we add the 4 edge-counting measures L I, WP, SP, and LCH to the state-of-the-art algorithms W OMBAT [23] and D RAGON [19]. We evaluate their per- formance with and without string similarities using a ten-fold cross validation. Table 2 shows the results of our experiments with these machine-learning algorithms. In the 6 right most columns of Table 2, we report the F1 score of the string-based LD algorithms. While the performance of D RAGON remained the same or even worsened for 3 of the 5 datasets, adding semantic similarities to the W OMBAT algorithm improved its overall performance for 3 datasets by up to 6% F-measure absolute. As expected, this effect is most pronounced in datasets which rely on long textual descriptions such as Amazon- GP. A look into the specifications learned by W OMBAT suggests that this effect is due to the approach combining semantic and string similarities using operators such as t and learning the correct threshold for each of these measures. The improvement on the DM-DB datasets is achieved using the \ operator, not allowing semantically similar concepts to be matched together. This refutes current results (see [11] where the same similarities have been used) and suggests that the refinement operators can combine semantic and string similarities in a way that improves the F-measure. For enabling a comparison with [6], we used the same configuration setting and report the maximum F-measure in Table 3. It can be seen that WOMBAT outperforms the Pessimistic and Re-weighted methods on the majority of the datasets. Applying edge-counting semantic similarities to Link Discovery 11 5 Related Work We give a brief overview of linking approaches which use semantic similarities. An exhaustive list of frameworks can be found in [12]. Over the past few years, semantic similarities were used in ontology matching (OM) [21]. In this context, concepts in two ontologies O1 and O2 are often matched based on a third ontology, e.g., WordNet. This ontology can be viewed as a background knowledge source or a mediating ontology [2]. Frameworks such as AgreementMaker [3], Zhishi.links [18] and RuleMiner [17] utilize semantic similarities in this way to improve structural matching on the ontology level. While these enhancements have a positive effect on their instance level matching, to the best of our knowledge no instance linking tool has used semantic similarities directly and shown an improvement of the overall linking results. [11] compare the effect of a predefined set of combinations of string and semantic similarities for label comparison and suggest that semantic similarities do not improve the F-measure of the instance matching task. Our results suggest the contrary by showing that dataset-specific combi- nations of measures actually can achieve a better performance. 6 Conclusions and Future Work To study the effect of semantic similarities on LD, we presented hECATE, a generic framework for improving the runtime of edge-counting semantic similarities. Our eval- uation of the framework shows that there is still a lot of potential in improving the runtime of semantic similarities for LD. We used hECATE to evaluate the performance of string similarities in LD on five datasets. Our evaluation shows that combining se- mantic similarities with string similarities can indeed increase the F-measure achieved by LD algorithms. This result is of central importance as it goes against current assump- tions. The reason why we are indeed able to use semantic similarities for improving the F-measure of LD in some cases lies in the refinement operator employed by W OMBAT. In future works, we will investigate means that will allow improving the runtimes of semantic similarities, extend our works beyond edge-counting similarities and aim to classify datasets w.r.t. how suitable they are for semantic similarities. References 1. Bizer, C., Volz, J., Kobilarov, G., Gaedke, M.: Silk - A Link Discovery Framework for the Web of Data. In: 18th International World Wide Web Conference (April 2009) 2. Cross, V., Silwal, P., Morell, D.: Using a reference ontology with semantic similarity in ontology alignment. In: Proceedings of the 3rd ICBO (2012) 3. Cruz, I.F., Antonelli, F.P., Stroe, C.: Agreementmaker: Efficient matching for large real-world schemas and ontologies. PVLDB 2, 1586–1589 (2009) 4. Euzenat, J., Ferrara, A., Meilicke, C., Nikolov, A., Pane, J., Scharffe, F., Shvaiko, P., Stuck- enschmidt, H., Šváb-Zazamal, O., Svátek, V., et al.: Results of the ontology alignment eval- uation initiative 2010. Tech. rep., University of Trento (2011) 5. Holmes, G., Donkin, A., Witten, I.H.: Weka: a machine learning workbench. In: Proceedings of ANZIIS ’94. pp. 357–361 (Nov 1994) 12 K. Georgala et al. 6. Kejriwal, M., Miranker, D.P.: Semi-supervised instance matching using boosted classifiers. In: The Semantic Web. Latest Advances and New Domains. pp. 388–402. Springer Interna- tional Publishing (2015) 7. Köpcke, H., Thor, A., Rahm, E.: Evaluation of Entity Resolution Approaches on Real-world Match Problems. Proc. VLDB Endow. 3(1-2), 484–493 (Sep 2010) 8. Leacock, C., Chodorow, M.: Combining local context and wordnet similarity for word sense identification (01 1998) 9. Li, Y., Bandar, Z.A., McLean, D.: An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans. on Knowl. and Data Eng. 15(4), 871–882 (Jul 2003) 10. Malyshev, S., Krötzsch, M., González, L., Gonsior, J., Bielefeldt, A.: Getting the most out of wikidata: Semantic technology usage in wikipedia’s knowledge graph. In: International Semantic Web Conference. pp. 376–394 (2018) 11. McCrae, J.P., Buitelaar, P.: Linking Datasets Using Semantic Textual Similarity. CYBER- NETICS AND INFORMATION TECHNOLOGIES 18(1), 109–123 (2018) 12. Nentwig, M., Hartung, M., Ngonga Ngomo, A.C., Rahm, E.: A survey of current link dis- covery frameworks. Semantic Web pp. 1–18 (2015) 13. Ngomo, A.C.N., Lyko, K.: Unsupervised learning of link specifications: deterministic vs. non-deterministic. In: OM (2013) 14. Ngonga Ngomo, A.C.: On Link Discovery using a Hybrid Approach. Journal on Data Se- mantics 1(4), 203–217 (2012) 15. Ngonga Ngomo, A.C., Lyko, K.: Eagle: Efficient active learning of link specifications us- ing genetic programming. In: The Semantic Web: Research and Applications. pp. 149–163. Springer Berlin Heidelberg, Berlin, Heidelberg (2012) 16. Ngonga Ngomo, A.C., Lyko, K.: Unsupervised learning of link specifications: deterministic vs. non-deterministic. In: Proceedings of the Ontology Matching Workshop (2013) 17. Niu, X., Rong, S., Wang, H., Yu, Y.: An effective rule miner for instance matching in a web of data. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. pp. 1085–1094. CIKM ’12, ACM, New York, NY, USA (2012) 18. Niu, X., Rong, S., Zhang, Y., Wang, H.: Zhishi.links results for OAEI 2011. Ontology Match- ing p. 220 (2011) 19. Obraczka, D., Ngomo, A.C.N.: Dragon: Decision tree learning for link discovery. In: 19TH International Conference On Web Engineering. Springer (2019) 20. Rada, R., Mili, H., Bicknell, E., Blettner, M.: Development and application of a metric on semantic nets. IEEE Trans. Systems, Man, and Cybernetics 19, 17–30 (1989) 21. Rodrı́guez, M.A., Egenhofer, M.J.: Determining semantic similarity among entity classes from different ontologies. IEEE Trans. on Knowl. and Data Eng. 15(2), 442–456 (2003) 22. Saleem, M., Ali, M.I., Verborgh, R., Ngonga Ngomo, A.C.: Federated query processing over linked data. In: Tutorial at ISWC (2015) 23. Sherif, M., Ngonga Ngomo, A.C., Lehmann, J.: WOMBAT - A Generalization Approach for Automatic Link Discovery. In: 14th Extended Semantic Web Conference, Portorož, Slove- nia, 28th May - 1st June 2017 (2017) 24. Soru, T., Ngomo, A.C.N.: A comparison of supervised learning classifiers for link discovery. In: Proceedings of the 10th Intern. Conf. on Semantic Systems. pp. 41–44. ACM (2014) 25. Usbeck, R., Ngonga Ngomo, A.C., Haarmann, B., Krithara, A., Röder, M., Napolitano, G.: 7th open challenge on question answering over linked data (QALD-7). In: Semantic Web Evaluation Challenge. pp. 59–69. Springer International Publishing (2017) 26. Wu, Z., Palmer, M.: Verbs semantics and lexical selection. In: Proceedings of the 32Nd Annual Meeting on Association for Computational Linguistics. pp. 133–138. ACL ’94, As- sociation for Computational Linguistics, Stroudsburg, PA, USA (1994)