<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Applying edge-counting semantic similarities to Link Discovery: Scalability and Accuracy</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kleanthi Georgala</string-name>
          <email>georgala@informatik.uni-leipzig.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed Ahmed Sherif</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Röder</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel-Cyrille Ngonga Ngomo</string-name>
          <email>axel.ngongag@upb.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Paderborn University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, University of Leipzig</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>With the growth in number and variety of RDF datasets comes an increasing need for both scalable and accurate solutions to support link discovery at instance level within and across these datasets. In contrast to ontology matching, most linking frameworks rely solely on string similarities to this end. The limited use of semantic similarities when linking instances is partly due to the current literature stating that they (1) do not improve the F-measure of instance linking approaches and (2) are impractical to use because they lack time efficiency. We revisit the combination of string and semantic similarities for linking instances. Contrary to the literature, our results suggest that this combination can improve the F-measure achieved by instance linking systems when the combination of the measures is performed by a machine learning approach. To achieve this insight, we had to address the scalability of semantic similarities. We hence present a framework for the rapid computation of semantic similarities based on edge counting. This runtime improvement allowed us to run an evaluation on 5 benchmark datasets. Our results suggest that combining string and semantic similarities can improve the F-measure by up to 6% absolute.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        RDF knowledge graphs (KGs) are used in a plethora of applications [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], especially
when published using the Linked Data paradigm. The provision of links3 between such
KGs is of central importance for numerous tasks such as federated queries [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and
question answering [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Popular solutions to linking instances (often called link
discovery, short LD in the literature, see [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] for a survey) often implement specialized
measures for particular datatypes (e.g., geospatial or temporal data). In all other cases,
state-of-the-art LD frameworks such as SILK [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and LIMES [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] rely on string
similarities and machine learning to compute links between instances in RDF KGs.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0). This work has been supported by the EU
H2020 project KnowGraphs (GA no. 860801) as well as the BMVI projects LIMBO (GA no.
19F2029C) and OPAL (GA no. 19F2028A).
3 The fourth principle of Linked Data, see http://www.w3.org/DesignIssues/LinkedData
      </p>
      <p>
        While the use of string similarities has been shown to work well in a large number of papers (see,
e.g., [
        <xref ref-type="bibr" rid="ref12 ref4">12,4</xref>
        ]), string similarities have the major drawback of not considering the
semantics of the sequences of tokens they aim to compare. Hence, most string similarity
measures return low scores for pairs of strings such as (lift, elevator), (holiday,
vacation), (headmaster, principal) and (aubergine, eggplant), although
they often stand for the same real-world concepts. Edge-counting semantic similarities
(e.g., [
        <xref ref-type="bibr" rid="ref20 ref26 ref9">26,9,20</xref>
        ]) alleviate this problem by using a dictionary to compute a semantic
distance between sequence of tokens within the need for an overlap. The synonymy
between aubergine and eggplant would hence lead semantic similarity to assign
the pair (aubergine, eggplant) a similarity score close to 1.
      </p>
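      <p>As a rough illustration of this gap, consider a generic character-based similarity (a sketch using Python's difflib, which is not one of the measures implemented in LIMES): the synonym pairs above all receive low string scores.</p>
```python
from difflib import SequenceMatcher

# Generic character-based string similarity in [0, 1].
def string_sim(a, b):
    return SequenceMatcher(None, a, b).ratio()

# Synonym pairs from the text: the string scores stay low although
# each pair denotes the same real-world concept.
pairs = [("lift", "elevator"), ("holiday", "vacation"),
         ("headmaster", "principal"), ("aubergine", "eggplant")]
for a, b in pairs:
    print(a, b, round(string_sim(a, b), 2))
```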
      <p>
The use of semantic similarities has received little attention in LD for at least two
reasons: First, semantic similarities scale poorly and are thus impractical when used on
large knowledge graphs.4 Moreover, current works (e.g., [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]) suggest that they lead to
no improvement in F-measure. The goal of this paper is hence twofold: (1) we present
means to accelerate the computation of four popular bounded edge-counting semantic
similarities. (2) We then combine string and semantic similarities using two
state-of-the-art machine learning approaches for LD. Our results refute the current state of the
art and suggest that semantic similarities can help achieve better results in LD.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        The formal framework underlying our preliminaries is derived from [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. A KG K is a
set of triples (s, p, o) ∈ (I ∪ B) × I × (I ∪ B ∪ L), where I is the set of all IRIs, B is the
set of all RDF blank nodes and L is the set of all literals. LD frameworks aim to compute
the set M = {(s, t) ∈ S × T : R(s, t)}, where S and T are sets of RDF resources and
R is a binary relation. Note that this setting generalizes what is often known as entity
matching or deduplication [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], where the relation R must be owl:sameAs. Given
that M is generally difficult to compute directly, declarative LD frameworks compute
an approximation M′ ⊆ S × T of M by executing a link specification (LS), which we
define formally in the following. Let M be the set of all similarity functions. We define
a similarity function m ∈ M as a function m : S × T × P² → [0, 1], where P is the set
of all properties and ps, pt ∈ P. We write m(s, t, ps, pt) to signify the similarity of
s and t w.r.t. their properties ps resp. pt. An atomic LS L is a pair L = (m(ps, pt), θ),
where θ ∈ [0, 1] is a similarity threshold. A complex LS L is a tuple L = op(L1, L2),
where two subspecifications L1 and L2 are combined using the specification operator
op. Here, we consider the binary operators union (⊔), intersection (⊓) and difference
(\).
      </p>
      <p>
The edge-counting semantic similarities are based on a lexical vocabulary. We
define a lexical vocabulary as a directed acyclic graph (DAG) G = (V, E), where:
– The set of vertices V is a set of concepts ci, where each ci stands for a set of
synonyms. We denote |V| with nV.
– E ⊆ V × V is a set of directed edges ejk = (cj, ck). We denote |E| with nE.
– The edge ejk stands for the hypernymy relation from a parent concept cj to a child
concept ck. We write cj → ck and we say that cj is a hypernym of ck. We also
define the hyponymy relation as a directed relation from a child concept ck to a parent
concept. We write cj ← ck and we say that cj is a hyponym of ck. Hypernymy and
hyponymy are transitive.
– The root r is the unique node of the dictionary that has no parent concept.
– A leaf concept ci is a concept node without any children concepts.
– A concept is a common subsumer of c1 and c2 (denoted cs(c1, c2)) iff that concept
is a hypernym of both c1 and c2.
– The least common subsumer (LSO) of c1 and c2 (denoted lso(c1, c2)) is “the most
specific concept which is an ancestor of both c1 and c2” [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
– We define the directed path from c1 to c2 via a common subsumer cs(c1, c2) as
path(c1, c2) = {c1 ← ci ← … ← cs(c1, c2) → cj → … → c2 : i, j ∈ ℕ; i, j ≤ nV}.
Note that there can be multiple path(c1, c2) between two concepts.
– len(c1, c2) is the length of the shortest path(c1, c2) between two concepts c1 and
c2. Note that len defines a metric. Hence, it is symmetric and abides by the triangle
inequality, i.e., len(c1, c2) ≤ len(c1, c3) + len(c2, c3) for any (c1, c2, c3) ∈ V³.
– We define depthm(ci) as the length of the shortest path between r and ci.
Analogously, depthM(ci) is the length of the longest path between r and ci. We set
D = max_{c ∈ V} depthM(c).
4 This general finding is supported by our evaluation results presented in Section 4.
      </p>
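      <p>A minimal sketch of these notions in Python, assuming a hypothetical toy vocabulary (not WordNet) given as hypernym edges from parent to child; the names are ours.</p>
```python
# Toy lexical vocabulary (hypothetical concepts, for illustration only):
# hypernym edges point from parent to child.
EDGES = {
    "entity": ["object", "abstraction"],
    "object": ["artifact"],
    "artifact": ["vehicle"],
    "abstraction": ["vehicle"],  # a concept may have several hypernyms
}

def depths(root):
    """depth_m and depth_M: lengths of the shortest and longest
    root-to-concept paths, following the definitions above."""
    dmin, dmax = {root: 0}, {root: 0}
    stack = [root]
    while stack:
        c = stack.pop()
        for ch in EDGES.get(c, []):
            cand_min, cand_max = dmin[c] + 1, dmax[c] + 1
            old_min, old_max = dmin.get(ch), dmax.get(ch)
            dmin[ch] = cand_min if old_min is None else min(old_min, cand_min)
            dmax[ch] = cand_max if old_max is None else max(old_max, cand_max)
            stack.append(ch)  # re-explore; fine on a small DAG
    return dmin, dmax

dmin, dmax = depths("entity")
D = max(dmax.values())  # D = maximum depth_M over all concepts
```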
      <sec id="sec-2-1">
        <title>Note that the following holds:</title>
        <p>– depthm(ci) = len(r, ci)
– depthm(lso(c1, c2)) ≤ min(depthm(c1), depthm(c2))
– depthM(lso(c1, c2)) ≤ min(depthM(c1), depthM(c2))
– (triangle inequality) |len(r, c1) − len(r, c2)| ≤ len(c1, c2) ⟺ |depthm(c1) − depthm(c2)| ≤ len(c1, c2)</p>
        <p>
          The Shortest Path (SP) similarity [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] of two concepts c1 and c2 is defined as the
length of their shortest path in comparison to the maximum distance (2D). We use the
normalized formulation of SP, i.e.,
        </p>
        <p>SP(c1, c2) = (2D − len(c1, c2)) / (2D)    (1)</p>
        <p>
          The Leacock and Chodorow metric (LCH) takes both the path between two
concepts and the depth of the hierarchy into consideration [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. We use the normalized
formulation of LCH:
        </p>
        <p>LCHN(c1, c2) = 1 if c1 = c2, and LCHN(c1, c2) = −log(len(c1, c2) / (2D)) / log(2D) otherwise.    (2)</p>
        <p>
          The normalized Wu Palmer (WP) similarity takes the path between two concepts and
the depth of their LSO into consideration [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]:
        </p>
        <p>
WP(c1, c2) = 2·depth(lso(c1, c2)) / (N1 + N2 + 2·depth(lso(c1, c2)))    (3)
where N1 = len(lso(c1, c2), c1) and N2 = len(lso(c1, c2), c2). The Li et al. metric
(LI) is another take on using the path between two concepts and their LSO to define a
similarity [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]:
        </p>
        <p>LI(c1, c2) = e^(−len(c1, c2)) · (e^(depth(lso(c1, c2))) − e^(−depth(lso(c1, c2)))) / (e^(depth(lso(c1, c2))) + e^(−depth(lso(c1, c2))))    (4)
where LI(c1, c2) ∈ (0, 1). We set depth(lso(c1, c2)) = depthM(lso(c1, c2)), since the
original specification does not state which depth(lso(c1, c2)) to use.</p>
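      <p>A sketch of the four normalized measures as plain functions of len, the LSO depth and D, assuming the formulations above; function and parameter names are ours, and the α, β weights of the original Li et al. measure are omitted, as in the text.</p>
```python
import math

def sp(length, D):
    """Shortest Path similarity, Eq. (1): (2D - len) / (2D)."""
    return (2 * D - length) / (2 * D)

def lch(length, D):
    """Normalized Leacock-Chodorow, Eq. (2); 1 for identical concepts."""
    if length == 0:
        return 1.0
    return -math.log(length / (2 * D)) / math.log(2 * D)

def wp(n1, n2, depth_lso):
    """Wu-Palmer, Eq. (3), with N1, N2 the path lengths from the LSO."""
    return 2 * depth_lso / (n1 + n2 + 2 * depth_lso)

def li(length, depth_lso):
    """Li et al., Eq. (4): e^(-len) * tanh(depth of the LSO)."""
    return math.exp(-length) * math.tanh(depth_lso)
```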
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Approach</title>
      <p>Fundamentally, hECATE aims to compute the set M′ = {(s, t) ∈ S × T : m(s, t, ps, pt)
≥ θ}, where m is an edge-counting similarity. To achieve this goal, the approach makes
use of upper bounds which can be derived from the formulation of this family of
measures. Take the SP similarity for example: For any two concepts c1 and c2, SP(c1, c2) ≥ θ
implies len(c1, c2) ≤ 2D(1 − θ). Formally, this means that we can discard all
comparisons of pairs (c1, c2) with len(c1, c2) &gt; 2D(1 − θ) without compromising the
computation of M′. Note that the computation of len(c1, c2) can be carried out
online or offline, which affects the total runtime of our approach as discussed in Section
4. As similar bounds can be derived for the other edge-counting measures, hECATE
generalizes the computation of M′ for edge-counting semantic similarities by using the
following algorithm. Our approach takes (1) two sets of resources, S and T, (2) an
atomic LS L = (m(ps, pt), θ), where m is one of the four semantic similarities
described in Section 2, and (3) a lexical vocabulary structured as a DAG (VDAG) as input.
Our goal is to compute the mapping M′ = [[L]]. For each pair (s, t), hECATE retrieves
and pre-processes the property values for ps resp. pt. The pre-processing consists of
tokenizing and removing all stop-words from the objects of the triples (s, ps, os) and
(t, pt, ot). In order to include a pair (s, t) in M′, the algorithm compares each set of
source tokens from os (sTokens) to each set of target tokens of ot (tTokens). The
pair of objects (os, ot) with the highest similarity which abides by the bounds we
derive for each measure is finally used to compute the similarity between s and t and
decides whether or not this pair should be added to M′. To do so, for each token
sToken ∈ sTokens, we find the tToken ∈ tTokens that is most similar. First, the
algorithm checks if sToken and tToken have been compared before. If the tokens are
being compared for the first time, the algorithm checks if the tokens are equal and
assigns the value of 1 to TTSim. Otherwise, it calls the function compare(sToken,
tToken, VDAG) that compares the corresponding sets of concepts obtained from
the input VDAG.5 Then, TTSim is compared to the maximum token-to-token
similarity and maxTTSim is updated. The procedure continues until the highest similarity
between the current sToken and a tToken is found or maxTTSim is equal to 1. The
algorithm aggregates the highest similarities maxTTSim of all sToken ∈ sTokens
and calculates an average similarity. This is done for all pairs of (sTokens, tTokens),
searching for the pair with the maximum similarity. If this maxSimilarity ≥ θ, the
pair (s, t) can be added to the final mapping M′. The key behind hECATE lies in the
token comparison algorithm compare(sToken, tToken, VDAG) (Algorithms 1
and 3). For a pair of tokens (sToken, tToken), we retrieve the set of concepts
they belong to in the VDAG. If both sets of concepts are not empty, we compare each
source concept sCon with each target concept tCon and define the maximum similarity of
two tokens as the highest similarity of the corresponding concept pairs. To do so, we
first retrieve the set of all hypernym paths of each concept to the root of the VDAG using
the getPaths(concept, VDAG) algorithm. This algorithm traverses the VDAG by
utilizing the hypernym relation. It starts from the concept node and explores all paths to
the root node. For SP and LCH, we additionally retrieve the maximum depth D found
in the VDAG and len(sCon, tCon) before calculating the corresponding
similarity as described in Equations 1 and 2 resp. For calculating len(sCon, tCon),
our algorithm relies on the set of hypernym paths of the concepts (Algorithm 2). For
each pair of hypernym paths hp1 and hp2 the two concepts have, the algorithm iterates
over both paths simultaneously, from top to bottom, until they no longer share a common
node. Then, it proceeds to calculate the length of the newly found path as the number
of concepts that the two paths do not have in common. Finally, the minimum length
that has been found is returned. For WP and LI, the comparison algorithm retrieves the
depth of the LSO between sCon and tCon (depth(lso(sCon, tCon))) as well as N1
and N2 by calling the function getLSO(hps1, hps2) (Algorithm 4). This function
utilizes the set of hypernym paths in a similar manner as the min length algorithm. For
each combination of hypernym paths hp1 and hp2 of the concepts, the algorithm
traverses them simultaneously, searching for the last node they have in common. If this
node is deeper than any other common node found so far, or it has the same depth but
the remaining paths are shorter, it is taken as the new LSO. Accordingly, the remaining
path lengths N1 and N2 are updated. Based on the deepest LSO and the derived
values for depthM(lso(sCon, tCon)), N1 and N2, we proceed to calculate the
corresponding similarity as described in Equations 3 and 4 resp.
5 Note that our algorithm handles homonyms by considering that a token can be included in
more than one concept.</p>
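      <p>The path-based core of this token comparison can be sketched as follows (a simplified rendering of Algorithms 2 and 4, assuming hypernym paths are given as root-to-concept lists; the function names are ours).</p>
```python
def common_prefix(hp1, hp2):
    """Length of the shared root-to-concept prefix of two hypernym paths."""
    l = 0
    for a, b in zip(hp1, hp2):
        if a != b:
            break
        l += 1
    return l

def get_min_length(hps1, hps2):
    """len(c1, c2): size of the shortest path through a common subsumer,
    over all pairs of hypernym paths (cf. Algorithm 2)."""
    return min(len(hp1) + len(hp2) - 2 * common_prefix(hp1, hp2)
               for hp1 in hps1 for hp2 in hps2)

def get_lso(hps1, hps2):
    """Deepest common subsumer and remaining path lengths N1, N2
    (cf. Algorithm 4); prefers deeper LSOs, then shorter remainders."""
    best = None
    for hp1 in hps1:
        for hp2 in hps2:
            l = common_prefix(hp1, hp2)
            rem1, rem2 = len(hp1) - l, len(hp2) - l
            if best is None or (l, -(rem1 + rem2)) > (best[0], -(best[1] + best[2])):
                best = (l, rem1, rem2)
    return best
```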
      <p>Our first extension of hECATE is based on the idea of pre-computing and storing a
set of values that are used often in our algorithm. For edge-counting similarities, these
are the hypernym paths. Consequently, the extension hECATE-I of hECATE
precomputes all hypernym paths for all concepts included in the VDAG, using the getPaths(
concept, VDAG) function. Therefore, every time the getPaths(concept, VDAG)
is invoked at runtime, hECATE-I retrieves the paths from an index. Our second
extension of hECATE, hECATE-IF, combines hECATE-I with the idea of minimizing
unnecessary comparisons between concepts by filtering out pairs of source and target
concepts that do not satisfy a condition for each semantic similarity. The filtering is
performed inside compare(sToken, tToken, VDAG) for each pair of concepts
sCon and tCon. Given a semantic similarity, if a pair of concepts satisfies the
corresponding filtering condition, then the algorithm proceeds normally as described before.
If the condition is not met, the algorithm does not compute the similarity between the
two concepts. For the SP similarity, two concepts will be considered for comparison, if
the following holds:</p>
      <p>SP(c1, c2) ≥ θ ⟺ (2D − len(c1, c2)) / (2D) ≥ θ ⟺ len(c1, c2) ≤ 2D(1 − θ)
⟹ |depthm(c1) − depthm(c2)| ≤ 2D(1 − θ)    (5)</p>
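      <p>The resulting filter for SP is thus a constant-time test on precomputed minimum depths, e.g. (a sketch; the function name and signature are ours, and the depths would come from the hECATE-IF index):</p>
```python
def sp_filter(depth_m_c1, depth_m_c2, D, theta):
    """Necessary condition from Eq. (5): a pair can only reach
    SP(c1, c2) of at least theta if the minimum depths of the two
    concepts differ by at most 2D(1 - theta)."""
    return 2 * D * (1 - theta) >= abs(depth_m_c1 - depth_m_c2)
```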
      <sec id="sec-3-1">
        <title>For the WP similarity, the following must hold:</title>
        <sec id="sec-3-1-1">
          <title>Algorithm 1: compare(sCon; tCon;</title>
          <p>VDAG) for SP or LCH</p>
          <p>Input: source concept sCon, target
concept tCon, and a vocabulary
DAG VDAG</p>
          <p>Output: a similarity value
1 D ← VDAG.getMaxDepth(sCon)
2 hps1 ← getPaths(sCon, VDAG)
3 hps2 ← getPaths(tCon, VDAG)
4 minLength ← getMinLength(hps1, hps2)
5 Return computeSimilarity(D, minLength)</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Algorithm 2: getM inLength(hps1, hps2)</title>
          <p>Input: two sets of hypernym paths, hps1
and hps2</p>
          <p>Output: len(sCon, tCon)
1 size ← MAX_VALUE
2 foreach hp1 ∈ hps1 do
3   foreach hp2 ∈ hps2 do
4     l1 ← 0, l2 ← 0
5     while l1 &lt; hp1.size() ∧ l2 &lt; hp2.size() ∧ hp1.get(l1) == hp2.get(l2) do
6       l1 ← l1 + 1, l2 ← l2 + 1
7     newSize ← hp1.size() + hp2.size() − 2·l1
8     if newSize &lt; size then size ← newSize</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>9 Return size</title>
        <sec id="sec-3-2-1">
          <title>Algorithm 3: compare(sCon; tCon;</title>
          <p>VDAG) for WP or LI</p>
          <p>Input: source concept sCon, target
concept tCon, and a vocabulary
DAG VDAG</p>
          <p>Output: a similarity value
1 hps1 ← getPaths(sCon, VDAG)
2 hps2 ← getPaths(tCon, VDAG)
3 depth, N1, N2 ← getLSO(hps1, hps2)
4 Return computeSimilarity(N1, N2, depth)</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Algorithm 4: getLSO(hps1; hps2)</title>
          <p>Input: two sets of hypernym paths, hps1 and hps2</p>
          <p>Output: depthM(lso(sCon, tCon)), N1 and N2
1 dLSO ← 0, N1 ← 0, N2 ← 0
2 foreach hp1 ∈ hps1 do
3   foreach hp2 ∈ hps2 do
4     l1 ← 0, l2 ← 0
5     while l1 &lt; hp1.size() ∧ l2 &lt; hp2.size() ∧ hp1.get(l1) == hp2.get(l2) do
6       l1 ← l1 + 1, l2 ← l2 + 1
7     newSize ← hp1.size() + hp2.size() − 2·l1, oldSize ← N1 + N2
8     if l1 &gt; dLSO ∨ (l1 == dLSO ∧ newSize &lt; oldSize) then
9       dLSO ← l1, N1 ← hp1.size() − l1, N2 ← hp2.size() − l2
10 Return dLSO, N1, N2</p>
          <p>WP(c1, c2) ≥ θ ⟺ 2·depthM(lso(c1, c2)) / ((N1 + N2) + 2·depthM(lso(c1, c2))) ≥ θ
⟺ θ(N1 + N2) ≤ 2·depthM(lso(c1, c2))(1 − θ)    (6)</p>
          <p>Based on the triangle inequality and Section 2, Equation 6 can be written as:
θ · len(c1, c2) ≤ 2·min(depthM(c1), depthM(c2))(1 − θ)    (7)
θ · |depthm(c1) − depthm(c2)| ≤ 2·min(depthM(c1), depthM(c2))(1 − θ)    (8)
For the LCH similarity, two concepts will be considered for comparison, iff:
LCH(c1, c2) ≥ θ ⟺ −log(len(c1, c2) / (2D)) / log(2D) ≥ θ ⟺ len(c1, c2) ≤ (2D)^(1 − θ)    (9)
Based on Equations 5, 7, 8 and 9, each filtering condition requires the knowledge of
depthm(sCon), depthM(sCon), depthm(tCon) and depthM(tCon). Hence, we
further extend the index hECATE-IF relies on by precomputing depthm(ci) and depthM(ci)
for every concept ci.</p>
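      <p>For instance, the LCH condition reduces to a constant-time bound on len alone (a sketch derived from LCHN ≥ θ; the function name and signature are ours):</p>
```python
def lch_filter(length, D, theta):
    """Filtering condition for LCH (cf. Eq. 9): LCH(c1, c2) can reach
    theta only if len(c1, c2) is at most (2D) ** (1 - theta)."""
    return (2 * D) ** (1 - theta) >= length
```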
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>Our evaluation addresses the following three research questions: Q1: How do our
strategies for improving the runtime of semantic similarities compare to each other w.r.t.
runtime?, Q2: How do the different edge-counting semantic similarities compare w.r.t.
runtime?, and Q3: Can semantic similarities improve the F-measure of LD systems?</p>
      <p>
        We evaluate our approach against five benchmark data sets: Abt-Buy, Amazon-GP
and DBLP-ACM described in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], DailyMed-Drugbank (dubbed DM-DB) and Movies
described in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. We use WordNet6 as a DAG. To address Q1 and Q2, we conduct a set
of experiments using the basic hECATE algorithm (dubbed hECATE-B) as a baseline
as well as hECATE-I and hECATE-IF. For an easier comparison, all methods are
implemented in the LD framework LIMES [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. For hECATE-B and hECATE-I, we
create one atomic LS for each semantic similarity, where m is the name of the
edge-counting similarity and θ = 0.1. We use the ’description’ as the source and target
properties for the Abt-Buy and Amazon-GP datasets, ’title’ for the DBLP-ACM and
6 https://wordnet.princeton.edu/
      </p>
      <p>Movies datasets and ’name’ for the DM-DB dataset. For hECATE-IF, we use the
same values for m, ps and pt as before, but θ is derived from the interval [0.1, 1] with
an increment step of 0.1, since θ is given as a parameter to the filtering functions.
For each dataset, we execute the aforementioned LSs against 2^v instances from the
source and target datasets. We start with v = 2 and increment v until all instances are
covered (e.g., the maximal value of v is 9 for the Amazon-GP dataset). We define
a maximum runtime of 2 hrs for each LS. Each experiment is executed 3 times and we
report the average values.</p>
      <p>
        As explained in Section 1, the second goal of this work is to evaluate edge-counting
semantic similarities in LD in terms of accuracy. Consequently, for Q3, we use the
hECATE extension with the best runtime performance based on the results of Q1
and execute a set of experiments using two machine learning (ML) algorithms:
WOMBAT [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and DRAGON [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. We choose these two approaches because (1) they achieve
state-of-the-art performance while being deterministic, (2) they are open-source,
meaning our experiments can be easily reproduced and (3) they are able to generate complex
link specifications with any arbitrary number of measures. We perform a 10-fold cross
validation by allowing WOMBAT and DRAGON to use only string similarities (StrSim),
only semantic similarities (SmtSim) and a combination of both (StrSmtSim) as input.
We use the levenshtein, cosine and qgrams similarity measures for strings
implemented in LIMES [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. For each dataset, we use all properties apart from those
that corresponded to numeric values. WOMBAT is configured as presented in [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and
DRAGON is configured as presented in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. We use two termination criteria for
WOMBAT: either an LS with an F-measure of 1 is found or a maximal refinement depth of 10
is reached. For the string similarities, WOMBAT produces LSs with a minimum θ value
of 0.4 and for the semantic similarities, the minimum θ value is set to 0.7. DRAGON
terminates either when no new nodes are found or when the height of the decision
tree reaches the maximum of 3. Additionally, we compare the achieved F1 scores with
scores for EAGLE [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], EUCLID [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], J48 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] reported by [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], a Multilayer Perceptron
classifier reported by [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and the Pessimistic as well as Re-weighted versions of the
work presented at [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>As expected, Figure 1 shows that hECATE-B has the highest runtimes compared
to hECATE-I and hECATE-IF in all datasets, except DM-DB. This supports the claim
that semantic similarities typically scale poorly. The results show that both extensions
improve the runtime of all semantic similarities, making them more amenable to LD
and scalable for larger datasets. Precisely, LCH’s, WP’s and SP’s runtimes improve by
71% and 57% on average when hECATE-I and hECATE-IF strategies are used resp.
LI has the least improvement by 65% and 50%. Comparing the two extensions, in all
datasets and for all semantic similarities, hECATE-I outperforms hECATE-IF by 30%
on average. A detailed analysis of the runtimes shows that even though hECATE-IF
reduces the number of comparisons between semantically different concepts and thus
the comparison time, the additional runtime cost of filtering creates an overhead that
results in a worse total execution time than hECATE-I (Table 1). Regarding the DM-DB
dataset, the only property for both source and target datasets, name, consists of only one
value, which corresponds to the official name of a drug. That value can only be
associated with one concept. As a result, introducing an indexing and/or filtering technique
produces an unnecessary overhead.</p>
      <p>[Figure 1: Overall runtimes of hECATE-B, hECATE-I and hECATE-IF for the LCH, Li,
Wu-Palmer and Shortest Path similarities on the datasets (a) Abt-Buy, (b) Amazon-GP,
(c) DM-DB, (d) DBLP-ACM and (e) Movies.]</p>
      <p>Overall, Q1 can be answered with hECATE-I being
the most efficient approach.</p>
      <p>To answer Q2 we compare the runtimes of the single semantic similarities revealing
that LI has the worst runtime (see Figure 1). For the Movies dataset, we notice that
hECATE-I requires 100K more token comparisons for LI compared to the other
similarities (Table 1). The better runtime of the other similarities is caused by a condition
inside our algorithm which stops as soon as two tokens/concepts have a similarity of 1.
In contrast to the other similarities, LI(c1; c2) 2 (0; 1), i.e., it can never be 1. However,
based on Table 1, LI’s runtime shows a great improvement as the values of θ increase
in relation to the other metrics. This explains why the runtimes for LI have the
highest standard deviation, whereas SP, LCH and WP are less influenced by the
different values of θ. The answer to Q2 is that for all hECATE strategies, SP is the fastest
similarity, whereas LI is the slowest.</p>
      <p>
        To answer Q3, we add the 4 edge-counting measures LI, WP, SP, and LCH to the
state-of-the-art algorithms WOMBAT [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and DRAGON [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. We evaluate their
performance with and without string similarities using a ten-fold cross validation. Table 2
shows the results of our experiments with these machine learning algorithms. In the six
rightmost columns of Table 2, we report the F1 scores of the string-based LD algorithms.
While the performance of DRAGON remained the same or even worsened for 3 of the
5 datasets, adding semantic similarities to the WOMBAT algorithm improved its overall
performance for 3 datasets by up to 6% F-measure absolute. As expected, this effect is
most pronounced in datasets which rely on long textual descriptions such as
Amazon-GP. A look into the specifications learned by WOMBAT suggests that this effect is due
to the approach combining semantic and string similarities using operators such as ⊔
and learning the correct threshold for each of these measures. The improvement on the
DM-DB dataset is achieved using the \ operator, not allowing semantically similar
concepts to be matched together. This refutes current results (see [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] where the same
similarities have been used) and suggests that the refinement operators can combine
semantic and string similarities in a way that improves the F-measure. For enabling a
comparison with [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we use the same configuration settings and report the maximum
F-measure in Table 3. It can be seen that WOMBAT outperforms the Pessimistic and
Re-weighted methods on the majority of the datasets.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Related Work</title>
      <p>
        We give a brief overview of linking approaches which use semantic similarities. An
exhaustive list of frameworks can be found in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Over the past few years, semantic
similarities were used in ontology matching (OM) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. In this context, concepts in two
ontologies O1 and O2 are often matched based on a third ontology, e.g., WordNet. This
ontology can be viewed as a background knowledge source or a mediating ontology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Frameworks such as AgreementMaker [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Zhishi.links [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and RuleMiner [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] utilize
semantic similarities in this way to improve structural matching on the ontology level.
While these enhancements have a positive effect on their instance level matching, to the
best of our knowledge no instance linking tool has used semantic similarities directly
and shown an improvement of the overall linking results. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] compare the effect of a
predefined set of combinations of string and semantic similarities for label comparison
and suggest that semantic similarities do not improve the F-measure of the instance
matching task. Our results suggest the contrary by showing that dataset-specific
combinations of measures can actually achieve better performance.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>To study the effect of semantic similarities on LD, we presented hECATE, a generic
framework for improving the runtime of edge-counting semantic similarities. Our
evaluation of the framework shows that there is still a lot of potential in improving the
runtime of semantic similarities for LD. We used hECATE to evaluate the performance
of combining string and semantic similarities in LD on five datasets. Our evaluation
shows that combining semantic similarities with string similarities can indeed increase
the F-measure achieved by LD algorithms. This result is of central importance as it goes
against current assumptions. The reason why we are able to use semantic similarities to
improve the F-measure of LD in some cases lies in the refinement operator employed by
WOMBAT. In future work, we will investigate means of further improving the runtimes
of semantic similarities, extend our work beyond edge-counting similarities and aim to
classify datasets w.r.t. how suitable they are for semantic similarities.</p>
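      <p>
        To make the notion concrete: edge-counting measures such as the one by Wu and Palmer [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] score two concepts by the depth of their least common subsumer (LCS) in a taxonomy, i.e., sim(a, b) = 2 * depth(LCS(a, b)) / (depth(a) + depth(b)). The following minimal sketch, using a toy taxonomy of our own invention (not part of hECATE), illustrates the idea:
      </p>

```python
# Wu-Palmer edge-counting similarity on a toy taxonomy.
# The taxonomy and all names below are illustrative assumptions.
PARENT = {                 # child -> parent; the root maps to None
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
}

def path_to_root(node):
    """List of nodes from `node` up to the root, deepest first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(node))

def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):   # b's path is ordered deepest-first
        if node in ancestors_a:
            return node

def wu_palmer(a, b):
    """sim(a, b) = 2 * depth(LCS(a, b)) / (depth(a) + depth(b))."""
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))

print(wu_palmer("dog", "cat"))   # LCS is "animal": 2*2 / (3+3) = 0.666...
print(wu_palmer("dog", "car"))   # LCS is the root: 2*1 / (3+3) = 0.333...
```
      <p>
        In a real LD setting, the taxonomy would be WordNet or another background ontology, and a lookup of this kind would be performed for every candidate pair, which is why caching and pruning, as done in hECATE, are essential for scalability.
      </p>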
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Volz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaedke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Silk - A Link Discovery Framework for the Web of Data</article-title>
          .
          <source>In: 18th International World Wide Web Conference</source>
          (April
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cross</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silwal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Using a reference ontology with semantic similarity in ontology alignment</article-title>
          .
          <source>In: Proceedings of the 3rd ICBO</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antonelli</surname>
            ,
            <given-names>F.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stroe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          : AgreementMaker:
          <article-title>Efficient matching for large real-world schemas and ontologies</article-title>
          .
          <source>PVLDB</source>
          <volume>2</volume>
          ,
          <fpage>1586</fpage>
          -
          <lpage>1589</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferrara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikolov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pane</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scharffe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stuckenschmidt</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Šváb-Zamazal</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svátek</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , et al.:
          <article-title>Results of the ontology alignment evaluation initiative 2010</article-title>
          .
          Tech. rep., University of Trento (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Holmes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donkin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          :
          <article-title>Weka: a machine learning workbench</article-title>
          .
          <source>In: Proceedings of ANZIIS '94</source>
          . pp.
          <fpage>357</fpage>
          -
          <lpage>361</lpage>
          (
          <year>Nov 1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kejriwal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miranker</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          :
          <article-title>Semi-supervised instance matching using boosted classifiers</article-title>
          .
          <source>In: The Semantic Web. Latest Advances and New Domains</source>
          . pp.
          <fpage>388</fpage>
          -
          <lpage>402</lpage>
          . Springer International Publishing (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Köpcke, H.,
          <string-name>
            <surname>Thor</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Evaluation of Entity Resolution Approaches on Real-world Match Problems</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .
          <volume>3</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>484</fpage>
          -
          <lpage>493</lpage>
          (
          <year>Sep 2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Leacock</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chodorow</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Combining local context and WordNet similarity for word sense identification</article-title>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bandar</surname>
            ,
            <given-names>Z.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McLean</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>An approach for measuring semantic similarity between words using multiple information sources</article-title>
          .
          <source>IEEE Trans. on Knowl. and Data Eng</source>
          .
          <volume>15</volume>
          (
          <issue>4</issue>
          ),
          <fpage>871</fpage>
          -
          <lpage>882</lpage>
          (
          <year>Jul 2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Malyshev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krötzsch</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>González</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonsior</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bielefeldt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Getting the most out of wikidata: Semantic technology usage in wikipedia's knowledge graph</article-title>
          .
          <source>In: International Semantic Web Conference</source>
          . pp.
          <fpage>376</fpage>
          -
          <lpage>394</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>McCrae</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buitelaar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Linking Datasets Using Semantic Textual Similarity</article-title>
          .
          <source>Cybernetics and Information Technologies</source>
          <volume>18</volume>
          (
          <issue>1</issue>
          ),
          <fpage>109</fpage>
          -
          <lpage>123</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Nentwig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartung</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngonga Ngomo</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>A survey of current link discovery frameworks</article-title>
          .
          <source>Semantic Web</source>
          pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.C.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lyko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Unsupervised learning of link specifications: deterministic vs. non-deterministic</article-title>
          .
          <source>In: OM</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Ngonga Ngomo</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          :
          <article-title>On Link Discovery using a Hybrid Approach</article-title>
          .
          <source>Journal on Data Semantics</source>
          <volume>1</volume>
          (
          <issue>4</issue>
          ),
          <fpage>203</fpage>
          -
          <lpage>217</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ngonga Ngomo</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lyko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Eagle: Efficient active learning of link specifications using genetic programming</article-title>
          .
          <source>In: The Semantic Web: Research and Applications</source>
          . pp.
          <fpage>149</fpage>
          -
          <lpage>163</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ngonga Ngomo</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lyko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Unsupervised learning of link specifications: deterministic vs. non-deterministic</article-title>
          .
          <source>In: Proceedings of the Ontology Matching Workshop</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Niu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>An effective rule miner for instance matching in a web of data</article-title>
          .
          <source>In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management</source>
          . pp.
          <fpage>1085</fpage>
          -
          <lpage>1094</lpage>
          . CIKM '12, ACM, New York, NY, USA (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Niu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Zhishi.links results for OAEI 2011</article-title>
          .
          <source>Ontology Matching</source>
          p.
          <fpage>220</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Obraczka</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.C.N.</given-names>
          </string-name>
          :
          <article-title>Dragon: Decision tree learning for link discovery</article-title>
          .
          <source>In: 19TH International Conference On Web Engineering</source>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Rada</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mili</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bicknell</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blettner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Development and application of a metric on semantic nets</article-title>
          .
          <source>IEEE Trans. Systems, Man, and Cybernetics</source>
          <volume>19</volume>
          ,
          <fpage>17</fpage>
          -
          <lpage>30</lpage>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Rodríguez</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Egenhofer</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>Determining semantic similarity among entity classes from different ontologies</article-title>
          .
          <source>IEEE Trans. on Knowl. and Data Eng</source>
          .
          <volume>15</volume>
          (
          <issue>2</issue>
          ),
          <fpage>442</fpage>
          -
          <lpage>456</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Saleem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngonga Ngomo</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          :
          <article-title>Federated query processing over linked data</article-title>
          .
          <source>In: Tutorial at ISWC</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Sherif</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngonga Ngomo</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>WOMBAT - A Generalization Approach for Automatic Link Discovery</article-title>
          .
          <source>In: 14th Extended Semantic Web Conference</source>
          , Portorož, Slovenia, 28th May - 1st June
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Soru</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.C.N.</given-names>
          </string-name>
          :
          <article-title>A comparison of supervised learning classifiers for link discovery</article-title>
          .
          <source>In: Proceedings of the 10th Intern. Conf. on Semantic Systems</source>
          . pp.
          <fpage>41</fpage>
          -
          <lpage>44</lpage>
          . ACM (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Usbeck</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngonga Ngomo</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haarmann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krithara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Röder</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napolitano</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>7th open challenge on question answering over linked data (QALD-7)</article-title>
          .
          <source>In: Semantic Web Evaluation Challenge</source>
          . pp.
          <fpage>59</fpage>
          -
          <lpage>69</lpage>
          . Springer International Publishing (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Verbs semantics and lexical selection</article-title>
          .
          <source>In: Proceedings of the 32Nd Annual Meeting on Association for Computational Linguistics</source>
          . pp.
          <fpage>133</fpage>
          -
          <lpage>138</lpage>
          . ACL '94, Association for Computational Linguistics, Stroudsburg, PA, USA (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>