       Embedding-Assisted Entity Resolution for
                Knowledge Graphs

Daniel Obraczka[0000−0002−0366−9872] , Jonathan Schuchart[0000−0003−3507−8150] ,
                   and Erhard Rahm[0000−0002−2665−1114]

                           Leipzig University, Germany
              {obraczka,schuchart,rahm}@informatik.uni-leipzig.de



        Abstract. Entity Resolution (ER) is a central task when integrating differ-
        ent knowledge graphs, as it identifies entities referring to the same
        real-world object. A promising approach is the use of graph embeddings
        for ER in order to determine the similarity of entities based on the simi-
        larity of their graph neighborhood. Previous work has shown that the use
        of graph embeddings alone is not sufficient to achieve high ER quality.
        We therefore propose a more comprehensive ER approach for knowledge
        graphs called EAGER (Embedding-Assisted Knowledge Graph Entity
        Resolution) that flexibly utilizes both the similarity of graph embeddings
        and of attribute values within a supervised machine learning approach and
        that can perform ER for multiple entity types at the same time. Fur-
        thermore, we comprehensively evaluate our approach on 19 benchmark
        datasets with differently sized and structured knowledge graphs and use
        hypothesis tests to ensure statistical significance of our results. We also
        compare our approach with state-of-the-art ER solutions, where EAGER
        yields competitive results for shallow knowledge graphs but much better
        results for deeper knowledge graphs.

        Keywords: Knowledge Graph · Knowledge Graph Embedding · Entity
        Resolution


1     Introduction

Knowledge Graphs (KGs) store real-world facts in machine-readable form. This
is done by making statements about entities in triple form (entity, property,
value). For example, the triple (Get Out, director, Jordan Peele) tells us
that the director of the movie "Get Out" is "Jordan Peele". Such structured
information can be used for a variety of tasks such as recommender systems,
question answering and semantic search. For many KG usage forms including
question answering it is beneficial to integrate KGs from different sources. An
integral part of this integration is entity resolution (ER), where the goal is to
find entities which refer to the same real-world object.

    Copyright © 2021 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).
    Existing ER systems mostly focus on matching entities of one specific entity
type (e.g. publication, movie, customer etc.) and assume matched schemata for
this entity type. This proves challenging when trying to use these systems for ER
in KGs typically consisting of many entity types with heterogeneous attribute
(property) sets. See Figure 1 for an example showcasing the many different
challenges of this task, such as heterogeneous date representations ("1979-02-21"
vs. "21 February 1979"), differing URIs ("dbr:Jordan_Peele" vs. "wd:Q3371986"),
schemata ("dbo:birthDate" vs. "wdt:P569") and overall information contained.


Fig. 1: Subgraphs of DBpedia and Wikidata. Green dashed lines show entities
that should be matched. Some URIs are shortened for brevity.


    We observe there are entities of different types (film, director, actor) and
different attributes with heterogeneous value representations (e.g., birth date
values "1979-02-21" in DBpedia and "21 February 1979" in Wikidata for two
matching director entities). Moreover, we see that matching entities such as
the movie ”Get Out” have different URIs and differently named edges referring
to properties and related entities, e.g. rdf:type vs. wdt:P31. These aspects
make a traditional schema (property) matching as a means to simplify ER very
challenging so that entity resolution for KGs should ideally not depend on it.
Given that URIs and property names may not show any similarity it becomes
apparent that the graph structure and related entities should be utilized in the
ER process, e.g., to consider the movie label and director to match movies.
    A promising way to achieve this in a generic manner, applicable to virtually
any entity type, is the use of graph embeddings. By encoding the entities of
the KGs into a low-dimensional space such approaches alleviate the obstacles
posed by the aforementioned KG heterogeneities. Capturing the topological and
semantic relatedness of entities in a geometric embedding space enables the
use of these embeddings as inputs for machine learning (ML) algorithms. The
performance of graph embedding approaches for ER has been recently studied
by Sun et al. [28]. However, as they point out, most approaches focus on refining
the embedding process, while ER mostly consists of finding the nearest neighbors
in the embedding space. Hence, the use of graph embeddings has to be tailored
to the ER task for good effectiveness. We build on the findings of [28] and
investigate the usefulness of learned graph embeddings as input for ML classifiers
for entity resolution. While there are different settings for KG integration, such
as enhancing a given KG or KG fusion, we focus here on the simple ER setting,
i.e., finding matching entities in two data sources. The resulting match mappings
can then be used for applications such as question answering or as input for KG
fusion.
     In this paper, we propose and comprehensively evaluate the first (to our
knowledge) graph embedding supported ER system named EAGER: Embedding
Assisted Knowledge Graph Entity Resolution. It uses both knowledge graph
embeddings and attribute similarities as inputs for an ML classifier for generic
entity resolution with several entity types. EAGER utilizes different kinds of
graph embeddings, specifically the ones that performed best in [28], as well as
different ML classifiers. We comprehensively evaluate the match effectiveness of
EAGER using graph embeddings and attribute similarities either alone or in
combination on 19 datasets of varying size and structure. Some of these ER tasks
with multiple entity types from the movie domain have been newly developed
for this work. The evaluation also includes a comparison of EAGER with state-
of-the-art ER approaches, namely Magellan [16] and DeepMatcher [19]. All our
results are analyzed using hypothesis tests to ensure statistical significance of
our findings.
     We first discuss related work followed by an overview of EAGER. Section 4
describes the datasets used, including the new benchmarks from the movie do-
main. Our evaluation is presented in Section 5 and we conclude in Section 6.


2   Related Work

Entity resolution has attracted a significant amount of research, sometimes under
different names such as record linkage [8,10], link discovery [29,26] or dedupli-
cation [25]. We focus on the discussion of the most related ER approaches and
refer to surveys and books such as [11,20,6] for a more thorough overview. Tra-
ditional ER approaches rely on learning distance- or similarity-based measures
and then use a threshold or classifier to decide whether two entities are the
same. These classifiers can be unsupervised [22,23], supervised [26,14] or employ
active learning [25,21]. For example, the Magellan framework [16] provides su-
pervised ML classifiers as well as extensive guides for the entire ER process.
Recently, deep learning has seen some success in certain settings. DeepER [9]
and DeepMatcher [19] provide a variety of architectures and, among
other features such as attribute similarities, use word embeddings as inputs for
these networks. Both frameworks have shown that especially for unstructured
textual data deep learning can outperform existing frameworks.
    Collective ER approaches try to overcome the limitations of the more con-
ventional attribute-based methods. This paradigm uses the relationships be-
tween entities as additional information and in some cases even considers pre-
vious matching decisions in the neighborhood. Bhattacharya and Getoor [3]
show that using the neighborhood of potential match candidates in addition
to attribute-based similarity is especially useful for data with many ambiguous
entities. SiGMa [17] uses an iterative graph propagation algorithm relying on
relationship information as well as attribute-based similarity between graphs to
integrate large-scale knowledge bases. Pershina et al. [24] propagate similari-
ties using Personalized PageRank and are able to align industrial-size knowl-
edge graphs. Zhu et al. [32] reformulate entity resolution as a multi-type graph
summarization problem and use attribute-based similarity as well as structural
similarity, i.e. connectivity patterns in the graph.
    More recently, the use of graph embeddings has shown promise for
the integration of KGs. An overview of currently relevant approaches that solely
rely on embedding techniques can be found in [28]; some of these techniques
are used in this work and will be discussed in more detail in Section
3.3. Knowledge graph embedding (KGE) models typically aim to capture the
relationship structure of each entity in latent vector representations in order to be
used for further downstream applications. For an overview of current knowledge
graph embedding approaches we refer the reader to a recent survey from Ali et
al. [1].
    EAGER aims to combine the two generally separate ER approaches of graph
embedding techniques and traditional attribute-based methods for the integra-
tion of KGs with multiple entity types without relying on additional schema
matching or any structural assumptions about the entities. Our extensive eval-
uation for a large spectrum of KGs demonstrates the viability of the proposed
approach.


3   Overview of EAGER




                     Fig. 2: Schematic summary of EAGER
    In this section we present an overview of the EAGER approach for ER in
knowledge graphs and the specific approaches and configurations we will evalu-
ate. We start with a formal definition of the ER problem and an overview of the
EAGER workflow. Subsequently we explain how we create the input vector for
our ML classifiers and conclude with a discussion of the prediction step.

3.1   Problem statement
KGs are constructed from triples of the form (entity, property, value), where
property can be either an attribute property or a relationship and value a literal
or another entity, respectively. Therefore, a KG is a tuple KG = (E, R, A, L, T),
where E is the set of entities, A the set of attribute properties, R the set of
relationship properties, L the set of literals and T the set of triples. We
distinguish attribute triples TA and relationship triples TR, where TA ⊆ E × A × L
are triples connecting entities and literals, e.g. (dbr:Jordan_Peele, dbo:birthDate,
"1979-02-21"), and TR ⊆ E × R × E are triples connecting entities, e.g.
(dbr:Get_Out, dbo:director, dbr:Jordan_Peele), as seen in Figure 1. Our goal is
to find a mapping between the entities of two KGs. More formally, we aim to find
M = {(e1, e2) ∈ E1 × E2 | e1 ≡ e2}, where ≡ refers to the equivalence relation.
Furthermore, we assume we are provided with a subset of the mapped entities
MT ⊆ M as training data, which is also sometimes referred to as the seed alignment
in the literature.
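    As a purely illustrative sketch of these definitions (the class and variable
names below are our own and not part of EAGER's implementation), the two KG
fragments from Figure 1 and a two-pair seed alignment MT could be represented
in Python as follows:

from dataclasses import dataclass

@dataclass
class KG:
    """A knowledge graph as the tuple (E, R, A, L, T) from Section 3.1."""
    entities: set      # E
    rel_props: set     # R
    attr_props: set    # A
    literals: set      # L
    rel_triples: set   # T_R: subset of E x R x E
    attr_triples: set  # T_A: subset of E x A x L

# DBpedia-side fragment from Figure 1
kg1 = KG(
    entities={"dbr:Get_Out", "dbr:Jordan_Peele"},
    rel_props={"dbo:director"},
    attr_props={"dbo:birthDate"},
    literals={"1979-02-21"},
    rel_triples={("dbr:Get_Out", "dbo:director", "dbr:Jordan_Peele")},
    attr_triples={("dbr:Jordan_Peele", "dbo:birthDate", "1979-02-21")},
)

# Wikidata-side counterpart
kg2 = KG(
    entities={"wd:Q25136235", "wd:Q3371986"},
    rel_props={"wdt:P57"},
    attr_props={"wdt:P569"},
    literals={"21 February 1979"},
    rel_triples={("wd:Q25136235", "wdt:P57", "wd:Q3371986")},
    attr_triples={("wd:Q3371986", "wdt:P569", "21 February 1979")},
)

# Seed alignment M_T: known matching entity pairs used as training data
seed_alignment = {("dbr:Get_Out", "wd:Q25136235"),
                  ("dbr:Jordan_Peele", "wd:Q3371986")}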

3.2   Overview
The remaining chapter is dedicated to illustrate how our approach tackles en-
tity resolution in heterogeneous KGs. A schematical overview can be found in
Figure 2. Given two KGs KG 1 , KG 2 and a set of initial matches MT we create
a feature vector for each match (e1 , e2 ) ∈ MT to train a machine learning clas-
sifier. Additionally to the positive matches provided in MT we sample negative
examples by sampling random pairs (e1 , e2 ) ∈ / MT to create a balanced set of
positive and negative examples. After the training step the classifier then acts as
an oracle to answer specific alignment queries, i.e. entity pairs, in order to make
a prediction. In the following we present our approach in more detail.
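    The balanced training set described above can be sketched as follows; the
function name and sampling details (e.g. drawing uniformly at random until
enough non-matching pairs are found) are illustrative assumptions of this
sketch, not the exact EAGER implementation:

import random

def build_training_pairs(entities1, entities2, seed_alignment, seed=42):
    """Create a balanced, labeled set of positive and negative entity pairs.

    Positives (label 1) come from the seed alignment M_T; for each positive,
    one random pair (e1, e2) not in M_T is drawn as a negative example (label 0).
    """
    rng = random.Random(seed)
    positives = [(e1, e2, 1) for e1, e2 in seed_alignment]
    negatives = []
    ents1, ents2 = sorted(entities1), sorted(entities2)
    while len(negatives) < len(positives):
        pair = (rng.choice(ents1), rng.choice(ents2))
        if pair not in seed_alignment:
            negatives.append(pair + (0,))
    return positives + negatives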

3.3   Input Vector Creation
Since schemata across different KGs may differ wildly, creating a schema match-
ing before ER in heterogeneous KGs is difficult and can introduce additional
sources of error. To keep the focus on the matching process, we concatenate all
attribute values of each entity into a single string and use three similarity
measures for comparison: Levenshtein, Generalized Jaccard with an alphanumeric
tokenizer (which returns the longest strings of alphanumeric characters), and
trigrams with the Dice coefficient. The code for EAGER and our experiments can
be found at https://github.com/jonathanschuchart/eager.
The second part of the input vector consists of knowledge graph embeddings
(KGEs). Since the focus of this study is not on the creation of the embeddings
themselves, our approach can take any entity embeddings that are embedded in the
same space. As most KG embedding frameworks are not specialized for ER, we use
the OpenEA framework (https://github.com/nju-websoft/OpenEA), which was
developed by Sun et al. for their 2020 benchmark study [28]. It offers a variety of
embedding approaches and embeds the entities of both KGs into the same space.
Specifically, we chose three of the best-performing approaches from that study,
namely BootEA, MultiKE and RDGCN:

BootEA Sun et al. in 2018 [27] based their approach on the TransE model and
combined it with elaborate bootstrapping and negative sampling techniques to
improve performance. TransE aims to find an embedding function φ that min-
imizes ||φ(eh) + φ(r) − φ(et)|| for any (eh, r, et) ∈ TR. Bootstrapping is done
by additionally sampling likely matching entities (resampled every few epochs
based on the current model) in order to increase the effective seed alignment
size. Additionally, negative relationship tuples are sampled and resampled ev-
ery few epochs based on the current model in order to improve the distinction
between otherwise similar entities. Since TransE is an unsupervised model, Sun
et al. proposed a new objective function which incorporates both the original
objective function of TransE and the likelihood of two entities from different
KGs matching.
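For intuition, the snippet below evaluates the plain TransE translation score
for the relationship triple from Figure 1 using randomly initialized vectors; it
only illustrates the objective that BootEA builds on, not its full bootstrapped
and alignment-aware loss, and the dimensionality and initialization are arbitrary
choices for this example.

import numpy as np

rng = np.random.default_rng(0)
dim = 75  # embedding dimensionality, an arbitrary choice for this illustration

# φ maps entities and relations to vectors; here they are simply random
phi = {name: rng.normal(size=dim)
       for name in ("dbr:Get_Out", "dbo:director", "dbr:Jordan_Peele")}

def transe_score(head, relation, tail):
    """TransE plausibility ||φ(e_h) + φ(r) − φ(e_t)||; smaller is more plausible."""
    return np.linalg.norm(phi[head] + phi[relation] - phi[tail])

score = transe_score("dbr:Get_Out", "dbo:director", "dbr:Jordan_Peele")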

MultiKE In order to also incorporate more than just relational information,
Zhang et al. [31] proposed a flexible model which combines different views on each
entity. Here, the name attribute, relations and all remaining attributes are em-
bedded separately, using pre-trained word2vec word embeddings [18] for names
and a variation on TransE for relations. Attribute embeddings are obtained by
training a convolutional neural network taking the attribute and attribute value
as input. All three embedding vectors are then combined into a single unified
embedding space. In this approach the two knowledge graphs are handled as one
combined graph where entities from the seed alignment are treated as equal.

RDGCN In contrast to the aforementioned approaches, Wu et al. [30] proposed
a technique that constructs two interacting graphs and uses the GCN model
by Kipf and Welling with highway gates. Instead of learning embeddings for entities
and relations within one graph, RDGCN constructs a primal entity graph and
a dual relationship graph in order to alternate the optimization process between
the two. That way, the relationship representations from the dual graph are used
to optimize the entity representations from the primal graph and vice versa by
applying a graph attention mechanism. As the actual neighborhood information
of each entity is not fully exploited in this case, Wu et al. showed that feeding
the resulting entity representations into a GCN can help significantly improve
the overall embedding quality.
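To make that last step more concrete, the following is a generic sketch of a
single graph-convolution layer combined with a highway gate applied to randomly
initialized entity representations; it is not the actual RDGCN architecture
(which additionally uses the dual relation graph and graph attention), and all
shapes, weights and names are toy assumptions of this sketch.

import numpy as np

def normalize_adj(A):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def gcn_highway_layer(H, A_norm, W, W_gate, b_gate):
    """One GCN layer whose output is mixed with its input through a highway gate."""
    H_new = np.maximum(A_norm @ H @ W, 0.0)               # ReLU(Â H W)
    gate = 1.0 / (1.0 + np.exp(-(H @ W_gate + b_gate)))   # sigmoid transform gate
    return gate * H_new + (1.0 - gate) * H                # highway mixing

# toy usage: 4 entities with 8-dimensional representations
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 8))
W, W_gate, b_gate = rng.normal(size=(8, 8)), rng.normal(size=(8, 8)), np.zeros(8)
H_out = gcn_highway_layer(H, normalize_adj(A), W, W_gate, b_gate)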
  https://github.com/nju-websoft/OpenEA
3.4    Combinations
As the aim of our study is to investigate to what degree combining entity em-
beddings with attribute similarities is superior to using either one on its own, we
present three variants of our approach that differ only in the construc-
tion of their input vector: EAGERE contains solely the embeddings, EAGERA
consists exclusively of the attribute similarities, and EAGERAkE is
a concatenation of entity embeddings and attribute similarities. The respective
input vector is given to a classifier along with the seed alignment. In the evalua-
tion we achieved the best results using either a Multilayer Perceptron (MLP) [13]
or Random Forest (RF) [4], but any classifier can be used.
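    A rough sketch of how the three input variants could be assembled is given
below; the pair representation for the embedding part (concatenating both entity
embeddings) and the single trigram Dice measure standing in for the three
measures of Section 3.3 are simplifying assumptions of this sketch, not the
exact EAGER implementation.

import numpy as np

def trigram_dice(s1, s2):
    """Dice coefficient over character trigrams (stand-in for the measures of 3.3)."""
    g1 = {s1[i:i + 3] for i in range(max(len(s1) - 2, 1))}
    g2 = {s2[i:i + 3] for i in range(max(len(s2) - 2, 1))}
    return 2 * len(g1 & g2) / (len(g1) + len(g2))

def attribute_similarities(vals1, vals2, measures=(trigram_dice,)):
    """Concatenate all attribute values of each entity and apply every measure."""
    s1, s2 = " ".join(map(str, vals1)), " ".join(map(str, vals2))
    return np.array([m(s1, s2) for m in measures])

def input_vector(e1, e2, attrs, emb, variant="AkE"):
    """Build the classifier input for EAGER_A, EAGER_E or EAGER_AkE."""
    sims = attribute_similarities(attrs.get(e1, []), attrs.get(e2, []))
    embs = np.concatenate([emb[e1], emb[e2]])
    if variant == "A":
        return sims                        # attribute similarities only
    if variant == "E":
        return embs                        # entity embeddings only
    return np.concatenate([sims, embs])    # EAGER_AkE: both concatenated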

3.5    Prediction
The trained classifier is presented with alignment queries, i.e. pairs of entities
that it will have to classify as match or non-match. Choosing these pairs is a
non-trivial problem since exploring all possible pairs would lead to a quadratic
number of alignment queries relative to the KG size, which is not scalable to large
datasets. Traditionally, blocking strategies are used to reduce the number of pairs
by a linear factor. Due to the heterogeneous nature of KGs new strategies for
this problem have to be found. An alternative could be to use the embeddings to
find a number of nearest neighbors, which is a scalable solution since the triangle
inequality in metric spaces can be exploited to reduce the number of comparisons
for the neighborhood search. Finding a good solution for this problem is however
out of scope for our study and in the experiments we therefore use the test data
to create prediction pairs, sampling negative examples randomly as done in the
training step. More on our experimental setup can be found in Section 5.1.
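    As a sketch of the nearest-neighbor alternative mentioned above (which we do
not use in our experiments), the following snippet retrieves the k closest KG2
entities in the embedding space for every KG1 entity with scikit-learn; the
resulting candidate pairs could then be passed to the trained classifier as
alignment queries. The sizes and identifier names are toy assumptions.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def candidate_pairs(emb1, emb2, ids1, ids2, k=5):
    """Generate alignment queries as k-nearest-neighbor pairs in embedding space."""
    nn = NearestNeighbors(n_neighbors=k, metric="euclidean").fit(emb2)
    _, neighbor_idx = nn.kneighbors(emb1)
    return [(ids1[i], ids2[j]) for i, row in enumerate(neighbor_idx) for j in row]

# toy usage with random embeddings
rng = np.random.default_rng(0)
emb1, emb2 = rng.normal(size=(100, 75)), rng.normal(size=(120, 75))
ids1 = [f"kg1/e{i}" for i in range(100)]
ids2 = [f"kg2/e{i}" for i in range(120)]
pairs = candidate_pairs(emb1, emb2, ids1, ids2, k=5)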


4     Datasets
To evaluate our approach we use multiple datasets that can generally be put
into two categories: rich and shallow graph datasets. The rich graph datasets
were presented in [28] and consist of samples from DBpedia (D), Wikidata (W)
and Yago (Y). Given their origin in web-scale KGs they offer a wide range of
relationships as well as entity types. The linking tasks include KG samples of
different density and size, as well as cross-lingual settings (EN-DE and EN-
FR).
    To investigate how the interplay of attribute similarities and graph embed-
dings fares in settings with less dense KGs we created a new benchmark dataset
with multiple entity types. These KGs are taken from the movie domain, where
the gold standard was hand-labeled for the five entity types Person, Movie,
TvSeries, Episode, Company. The movie datasets were created from three
sources containing information about movies and TV series: IMDB
(https://www.imdb.com/), TheMovieDB (https://www.themoviedb.org/) and
TheTVDB (https://www.thetvdb.com/). We make the movie datasets publicly
available for future research at https://github.com/ScaDS/MovieGraphBenchmark.
More details on each of the datasets can be found in Table 1 and Table 2,
respectively.



                    Table 1: Shallow graph datasets statistics
          Datasets    KGs    |R|  |A|   |TR|    |TA|    |E|    |M|
          imdb-tmdb   imdb     3   13  17532   25723   5129   1978
                      tmdb     4  493  27903   24695   6056
          imdb-tvdb   imdb     3   13  17532   25723   5129   2488
                      tvdb     3  350  15455   21430   7810
          tmdb-tvdb   tmdb     4  493  27903   24695   6056   2483
                      tvdb     3  350  15455   21430   7810




          Table 2: Rich graph datasets statistics, adapted from [28]
                              V1                                   V2
 Size Datasets KGs |R|  |A|   |TR|     |TA|     |M|     |R|  |A|   |TR|     |TA|     |M|
 15K  D-W      DB  248  342   38,265   68,258   15,000  167  175   73,983   66,813   15,000
               WD  169  649   42,746  138,246           121  457   83,365  175,686
      D-Y      DB  165  257   30,291   71,716   15,000   72   90   68,063   65,100   15,000
               YG   28   35   26,638  132,114            21   20   60,970  131,151
      EN-DE    EN  215  286   47,676   83,755   15,000  169  171   84,867   81,988   15,000
               DE  131  194   50,419  156,150            96  116   92,632  186,335
      EN-FR    EN  267  308   47,334   73,121   15,000  193  189   96,318   66,899   15,000
               FR  210  404   40,864   67,167           166  221   80,112   68,779
 100K D-W      DB  413  493  293,990  451,011  100,000  318  328  616,457  467,103  100,000
               WD  261  874  251,708  687,860           239  760  588,203  878,219
      D-Y      DB  287  379  294,188  523,062  100,000  230  277  576,547  547,026  100,000
               YG   32   38  400,518  749,787            31   36  865,265  855,161
      EN-DE    EN  381  451  335,359  552,750  100,000  323  326  622,588  560,247  100,000
               DE  196  252  336,240  716,615           170  189  629,395  793,710
      EN-FR    EN  400  466  309,607  497,729  100,000  379  364  649,902  503,922  100,000
               FR  300  519  258,285  426,672           287  468  561,391  431,379




5          Evaluation

We discuss our results on the presented datasets, starting with a description
of the experiment setup, followed by the results on the shallow and rich graph
datasets, with a focus on investigating whether the use of attribute similarities
in combination with knowledge graph embeddings is beneficial for the respective
setting. Furthermore, we compare our approach with state-of-the-art frameworks.

5.1       Setup
For the evaluation we use a 5-fold cross validation with a 7-2-1 split in accordance
with [28]: For each dataset pair the set of reference entity matches is divided into
70% testing, 20% training and 10% validation. For each split we sample negative
examples to create an equal share of positive and negative examples. The entire
process is repeated 5 times to create 5 different folds.
    For the OpenEA datasets the graph embeddings were computed using the
hyperparameters given by the study of [28]. For all other datasets the *-15K
parameter sets were used. For the classifiers, mostly scikit-learn's default param-
eters were used, though the Random Forest classifier was used with 500 estimators
and the MLP used two hidden layers of size 200 and 20. Furthermore, the MLP was
trained using the Adam [15] optimizer with α = 10⁻⁵.
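    The classifier configuration described above corresponds roughly to the
following scikit-learn setup; note that we assume here that the reported α maps
to scikit-learn's alpha parameter of MLPClassifier, and that all remaining
parameters stay at their defaults as stated.

from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# Random Forest with 500 estimators, everything else at scikit-learn defaults
rf = RandomForestClassifier(n_estimators=500)

# MLP with two hidden layers of size 200 and 20, trained with Adam; we assume
# the reported alpha = 1e-5 refers to scikit-learn's `alpha` parameter
mlp = MLPClassifier(hidden_layer_sizes=(200, 20), solver="adam", alpha=1e-5)

# X_train, y_train would be the input vectors and match labels of Section 3.4
# rf.fit(X_train, y_train)
# mlp.fit(X_train, y_train)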

5.2       Results



Table 3: Averaged F-measure on test set of rich graph datasets. The best value
in a row is highlighted. For average rank the best 3 values of the compared ranks
are highlighted

                       EAGERAkE                        EAGERA            EAGERE
              BootEA      MultiKE     RDGCN                      BootEA      MultiKE     RDGCN
     Dataset  MLP   RF    MLP   RF    MLP   RF     MLP   RF     MLP   RF    MLP   RF    MLP   RF
   imdb-tmdb 0.967 0.977 0.988 0.984 0.969 0.975  0.979 0.980  0.874 0.859 0.911 0.913 0.874 0.873
   imdb-tvdb 0.938 0.960 0.973 0.967 0.940 0.953  0.965 0.960  0.821 0.786 0.873 0.844 0.807 0.792
   tmdb-tvdb 0.973 0.977 0.983 0.981 0.966 0.977  0.980 0.978  0.874 0.844 0.871 0.877 0.857 0.831
 15K
     D-W(V1) 0.775 0.668 0.881 0.858 0.805 0.842  0.827 0.828  0.764 0.678 0.853 0.871 0.718 0.707
     D-W(V2) 0.934 0.841 0.945 0.918 0.897 0.890  0.868 0.870  0.938 0.847 0.939 0.942 0.808 0.796
     D-Y(V1) 0.870 0.775 0.986 0.982 0.974 0.986  0.972 0.971  0.837 0.746 0.952 0.941 0.947 0.953
     D-Y(V2) 0.983 0.908 0.995 0.993 0.977 0.991  0.978 0.978  0.975 0.888 0.973 0.971 0.947 0.960
   EN-DE(V1) 0.923 0.852 0.986 0.984 0.966 0.976  0.947 0.945  0.891 0.798 0.957 0.950 0.937 0.955
   EN-DE(V2) 0.970 0.918 0.992 0.990 0.968 0.978  0.956 0.955  0.946 0.875 0.961 0.958 0.934 0.956
   EN-FR(V1) 0.868 0.736 0.978 0.973 0.950 0.963  0.922 0.920  0.806 0.709 0.952 0.942 0.907 0.935
   EN-FR(V2) 0.965 0.876 0.991 0.989 0.963 0.977  0.937 0.936  0.942 0.875 0.977 0.978 0.921 0.948
 100K
     D-W(V1) 0.873 0.850 0.887 0.862 0.768 0.774  0.810 0.811  0.868 0.820 0.850 0.871 0.645 0.556
     D-W(V2) 0.962 0.927 0.951 0.923 0.756 0.792  0.845 0.844  0.959 0.916 0.917 0.957 0.610 0.609
     D-Y(V1) 0.980 0.958 0.990 0.987 0.991 0.993  0.975 0.975  0.959 0.942 0.949 0.954 0.963 0.968
     D-Y(V2) 0.993 0.965 0.995 0.990 0.983 0.989  0.976 0.975  0.979 0.958 0.953 0.978 0.921 0.968
   EN-DE(V1) 0.943 0.907 0.989 0.982 0.954 0.961  0.944 0.943  0.901 0.859 0.956 0.947 0.872 0.891
   EN-DE(V2) 0.965 0.933 0.993 0.988 0.926 0.932  0.943 0.941  0.934 0.890 0.970 0.969 0.779 0.847
   EN-FR(V1) 0.925 0.867 0.981 0.969 0.947 0.938  0.920 0.919  0.866 0.819 0.948 0.943 0.866 0.894
   EN-FR(V2) 0.968 0.899 0.989 0.979 0.897 0.901  0.925 0.923  0.925 0.877 0.959 0.968 0.742 0.806

    Avg Rank 6.211 10.105 1.316 2.842 6.895 5.474 6.947 7.632 8.947 12.789 6.737 6.211 12.000 10.895



   The results for all datasets are displayed in Table 3. In the bottom row
we display the average rank of each combination of input variant, embedding
approach and classifier, which is a number between 1 and 14 (since there are 14
possible combinations), where 1 would mean this combination achieves the best
result for each dataset.
    For the movie datasets we can see that EAGERAkE with MultiKE performs
best, especially with the MLP classifier. This suggests that even for datasets with
relatively little relational information it can be beneficial to use knowledge graph
embeddings. However, their usefulness seems to depend strongly on the way these
embeddings are constructed, with MultiKE being the only one of
the three embedding approaches to explicitly incorporate attribute data. This
also becomes apparent when comparing the results of EAGERE and EAGERA,
where MultiKE performs best for EAGERE but is still clearly outperformed by
EAGERA.


Fig. 3: Averaged F-measure, Precision and Recall per Type on Movie Datasets
using EAGERAkE with RF


    Looking at the movie datasets in more detail as shown in Figure 3, we can
see that there is a difference in performance depending on the entity type. In
most cases, EAGERAkE reaches an F-measure of over 90% for all entity types
showing that the approach is generic and able to achieve good match quality
for multiple heterogeneous entity types. Still there are some differences between
the entity types. TVShows and Films generally perform worse than TVEpisodes
and Persons, with the precision for Film standing out negatively. This
is especially pronounced in the IMDB-TMDB and IMDB-TVDB datasets. This
might be attributed to different sets of attributes between those datasets, e.g.
IMDB does not contain full-length descriptions of films and TV shows whereas
TMDB and TVDB do. Interestingly, Films/TVShows with very dissimilar titles
due to different representations of non-English titles can be matched using the
KGEs. For example, the Soviet drama "Defence Counsel Sedov" has the roman-
ized title "Zashchitnik Sedov" in IMDB, while TMDB has either the translated
"Defence Counsel Sedov" or the Cyrillic "Защитник Седов". These entity pairs
are correctly matched in the EAGERAkE variant.
    Looking at the rich graph datasets it is again evident that EAGERAkE achieves
the best results. Overall it can solve the diverse match tasks, including those for
multilingual and larger KGs, very well, with F-measure values between 96% and
99% in most cases. As before, MultiKE with the MLP classifier performs best
out of all graph embedding approaches, which is due to the fact that it explicitly
takes advantage of the attribute information of each entity, as opposed to BootEA
and RDGCN.


Fig. 4: Critical distance diagram of Nemenyi test, connected groups are not sig-
nificantly different (at p = 0.05)

    Comparing the performances between the datasets we see that on the variants
with richer graph structure (V2) the results are better than on (V1) for the
respective datasets. There is also a difference when contrasting the different
sizes of the datasets. While EAGERAkE with BootEA and MultiKE generally
seem to achieve better results on the larger 100K datasets compared to their
15K counterparts, this is less true for RDGCN.
    To properly compare the performance across all approaches,
we used the statistical analysis presented by Demšar [7] and the Python pack-
age Autorank [12], which aims to simplify the use of the methods proposed by
Demšar. The performance measurements for each dataset and classifier are our
paired samples. Given that we compare more than two approaches, simply using hy-
pothesis tests for all pairs would result in a multiple testing problem, which
means the probability of accidentally reporting a significant difference would
be greatly increased. We therefore use the procedure recommended by Demšar:
First we test whether the average ranks of the algorithms are significantly different
using the Friedman test. If this is the case, we perform a Nemenyi test to compare
all classifier and input combinations.
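    A sketch of this procedure with the Autorank package is shown below; the
DataFrame layout (one column per compared variant, one row per dataset) is our
assumption of how the paired samples are arranged, the values are an excerpt of
the MLP F-measures from Table 3, and, to our understanding, autorank follows
Demšar's guidelines by running the Friedman test and the Nemenyi post-hoc test
in the non-parametric case.

import pandas as pd
from autorank import autorank, plot_stats

# one column per compared variant, one row per dataset (paired samples);
# values are F-measures taken from the MLP columns of Table 3
data = pd.DataFrame(
    {
        "A k E, MultiKE": [0.988, 0.973, 0.983, 0.881, 0.945, 0.986],
        "A":              [0.979, 0.965, 0.980, 0.827, 0.868, 0.972],
        "E, MultiKE":     [0.911, 0.873, 0.871, 0.853, 0.939, 0.952],
    },
    index=["imdb-tmdb", "imdb-tvdb", "tmdb-tvdb",
           "D-W(V1)", "D-W(V2)", "D-Y(V1)"],
)

# Autorank checks the assumptions and, for non-normal paired samples, applies
# the Friedman test followed by the Nemenyi post-hoc test (Demšar's procedure)
result = autorank(data, alpha=0.05)
plot_stats(result)  # draws a critical distance diagram like Figure 4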
    The null hypothesis of the Friedman test can be rejected (p = 6.572 × 10⁻²⁵).
A Nemenyi test is therefore performed and we present the critical distance di-
agram in Figure 4. The axis shows the average rank of the input/embedding
combination. Groups that are connected are not significantly different at the
significance level of 0.05, which is internally corrected to ensure that all results
together fulfill this. Approaches that have a higher difference in average rank
than the critical distance (CD) are significantly different.
    We can see that EAGERAkE with MultiKE significantly outperforms all
other variants. This is evidence that the combination of attribute similarities
and embeddings is preferable to using attribute similarities or embeddings on
their own for the task of entity resolution in rich knowledge graphs, even if the
embedding approaches already incorporate attribute information.
5.3       Comparison with other approaches

We compare our approach to the state-of-the-art ER frameworks Magellan [16]
and DeepMatcher [19]. Magellan is a framework that allows the use of ML
classifiers for ER. We present the best performing classifiers, XGBoost [5] and
Random Forest (RF). DeepMatcher provides several deep learning solutions for
ER; we employ the hybrid variant which uses a bidirectional recurrent neural
network with a decomposable attention-based attribute summarization module.
To avoid any decrease in performance due to blocking, we provide both frame-
works with the respective training or test entity mappings directly. Because such
a setup is not possible for the approaches discussed in [28], which mostly use
resolution strategies based on nearest neighbors, we cannot fairly compare our
approach with theirs and therefore refrain from this comparison here.



Table 4: Averaged F-measure, Precision and Recall on test set of rich graph
datasets. The best F-measure value in a row is highlighted

                  EAGER MLP           EAGER RF           DeepMatcher        Magellan XGBoost     Magellan RF
     Dataset   fm    prec  rec     fm    prec  rec     fm    prec  rec     fm    prec  rec     fm    prec  rec
   imdb-tmdb 0.990 0.987 0.993   0.986 0.979 0.994   0.983 0.970 0.996   0.996 0.999 0.994   0.997 0.998 0.997
   imdb-tvdb 0.979 0.965 0.994   0.974 0.951 0.999   0.989 0.981 0.998   0.991 0.990 0.992   0.992 0.989 0.995
   tmdb-tvdb 0.991 0.992 0.990   0.985 0.988 0.982   0.987 0.977 0.997   0.993 0.991 0.994   0.995 0.993 0.997
 15K
     D-W(V1) 0.898 0.989 0.823   0.872 0.991 0.779   0.876 0.844 0.910   0.837 0.886 0.793   0.823 0.865 0.784
     D-W(V2) 0.968 0.990 0.948   0.909 0.992 0.838   0.904 0.895 0.913   0.863 0.899 0.830   0.848 0.859 0.837
     D-Y(V1) 0.985 1.000 0.971   0.985 1.000 0.971   0.979 0.974 0.983   0.971 0.985 0.957   0.971 0.984 0.958
     D-Y(V2) 0.996 0.999 0.993   0.993 0.999 0.986   0.986 0.985 0.987   0.974 0.972 0.977   0.974 0.973 0.976
   EN-DE(V1) 0.985 0.996 0.973   0.984 0.995 0.973   0.971 0.976 0.966   0.969 0.992 0.948   0.962 0.977 0.948
   EN-DE(V2) 0.992 0.996 0.988   0.989 0.997 0.982   0.974 0.967 0.982   0.973 0.993 0.954   0.969 0.984 0.955
   EN-FR(V1) 0.980 0.995 0.965   0.973 0.994 0.952   0.956 0.959 0.953   0.953 0.983 0.924   0.952 0.979 0.926
   EN-FR(V2) 0.990 0.998 0.982   0.990 0.996 0.984   0.966 0.963 0.970   0.971 0.992 0.951   0.970 0.992 0.950
 100K
     D-W(V1) 0.873 0.996 0.777   0.864 0.990 0.767   0.926 0.907 0.945   0.815 0.907 0.741   0.812 0.896 0.742
     D-W(V2) 0.965 0.989 0.941   0.926 0.988 0.871   0.936 0.924 0.949   0.836 0.925 0.762   0.831 0.897 0.774
     D-Y(V1) 0.991 1.000 0.982   0.988 1.000 0.977   0.992 0.990 0.994   0.984 0.994 0.974   0.983 0.991 0.974
     D-Y(V2) 0.997 0.999 0.995   0.991 1.000 0.982   0.993 0.993 0.994   0.985 0.983 0.987   0.984 0.982 0.987
   EN-DE(V1) 0.990 0.997 0.982   0.982 0.997 0.968   0.972 0.971 0.972   0.968 0.990 0.946   0.967 0.988 0.946
   EN-DE(V2) 0.993 0.997 0.990   0.987 0.997 0.978   0.975 0.972 0.978   0.968 0.993 0.945   0.966 0.987 0.946
   EN-FR(V1) 0.980 0.997 0.964   0.969 0.994 0.944   0.956 0.959 0.953   0.947 0.988 0.910   0.946 0.985 0.911
   EN-FR(V2) 0.989 0.995 0.983   0.981 0.992 0.970   0.964 0.959 0.969   0.964 0.991 0.938   0.962 0.987 0.938

    Avg Rank       1.579               2.579               2.895               3.737               4.211




    We start with the comparison for the shallow datasets. Since both Magellan
and DeepMatcher expect matched schemata we align the attributes by hand
where necessary. We report F-measure (fm), Precision (prec) and Recall (rec)
averaged over the 5 folds. For the comparison with other approaches we use
EAGERAkE with MultiKE and for brevity we will refer to it simply as EAGER.
The results are shown in Table 4. All frameworks perform very well with almost
all F-measure values over 0.95.
    For all three movie datasets Magellan RF outperforms all other approaches
in terms of F-measure.
Fig. 5: Critical distance diagram of Nemenyi test for comparison of frameworks,
connected groups are not significantly different (at p = 0.05)


    For the rich graph datasets the heterogeneity of the different KGs was a prob-
lem for Magellan and DeepMatcher since they both expect perfectly matched
schemata. This was manageable for the smaller datasets, where this could be done
by hand. In order to use Magellan and DeepMatcher on the rich graph datasets
we did the same as for EAGER and concatenated all entity attributes into a
single attribute. We can see that EAGER using MLP outperforms all other
approaches except on D-W (V1) and D-Y (V1) for the 100K sizes, where Deep-
Matcher performs best. Magellan is outperformed on all datasets by EAGER
and DeepMatcher.
    The Friedman test shows a significant difference (p = 8.675 × 10⁻⁷). Looking
at the critical distance diagram in Figure 5 we can see that EAGER MLP does
not significantly outperform EAGER RF or DeepMatcher, but it is the only
approach that significantly outperforms both Magellan approaches. While there
is no significant difference between EAGER MLP and DeepMatcher, EAGER
does not depend on the provision of schema matching.


6   Conclusion & Future work

We explored the combination of knowledge graph embeddings and attribute
similarities for entity resolution in knowledge graphs with multiple entity types.
These approaches are included in a new learning-based ER system called EA-
GER. We tested our approach on a range of different datasets and showed that
using a combination of both graph embeddings and attribute similarities gener-
ally yields the best results compared to just using either one. We showed that
our approach yields competitive results that are on par with or significantly better
than state-of-the-art approaches. The approach is generic and can deal with
several entity types without prior schema matching.
    Future work will investigate blocking strategies utilizing both embeddings
and attribute information, as well as smarter attribute combination strategies
(e.g. using property matching [2]). Unsupervised and active learning in this con-
text should be explored to alleviate the difficulty of obtaining training data.
Acknowledgments. This work was supported by the German Federal Ministry
of Education and Research (BMBF, 01IS18026A-F) by funding the competence
center for Big Data and AI "ScaDS.AI Dresden/Leipzig". Some computations
have been done with resources of Leipzig University Computing Center.


References
 1. Ali, M., Berrendorf, M., Hoyt, C.T., Vermue, L., Galkin, M., Sharifzadeh, S., Fis-
    cher, A., Tresp, V., Lehmann, J.: Bringing Light Into the Dark: A Large-scale
    Evaluation of Knowledge Graph Embedding Models Under a Unified Framework.
    pp. 1–40 (2020), http://arxiv.org/abs/2006.13365
 2. Ayala, D., Hernández, I., Ruiz, D., Rahm, E.: Leapme: Learning-based property
    matching with embeddings. arXiv preprint arXiv:2010.01951 (2020)
 3. Bhattacharya, I., Getoor, L.: Collective entity resolution in relational data. IEEE
    Data Eng. Bull. 29, 4–12 (2006)
 4. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001).
    https://doi.org/10.1023/A:1010933404324
 5. Chen, T., Guestrin, C.: XGBoost: A scalable tree boosting system. In:
    Proceedings of the 22nd ACM SIGKDD International Conference on
    Knowledge Discovery and Data Mining. pp. 785–794. KDD ’16, ACM
    (2016). https://doi.org/10.1145/2939672.2939785
 6. Christen, P.: Data Matching: Concepts and Techniques for Record Linkage, Entity
    Resolution, and Duplicate Detection. Springer Publishing Company, Incorporated
    (2012)
 7. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. Journal
    of Machine Learning Research 7, 1–30 (2006)
 8. Domingos, P.: Multi-relational record linkage. In: In Proceedings of the KDD-2004
    Workshop on Multi-Relational Data Mining. pp. 31–48 (2004)
 9. Ebraheem, M., Thirumuruganathan, S., Joty, S.R., Ouzzani, M., Tang, N.: Deeper
    - deep entity resolution. CoRR abs/1710.00597 (2017), http://arxiv.org/abs/
    1710.00597
10. Elfeky, M.G., Elmagarmid, A.K., Verykios, V.S.: Tailor: A record linkage tool box.
    In: ICDE (2002)
11. Elmagarmid, A.K., Ipeirotis, P.G., Verykios, V.S.: Duplicate record detection: A
    survey. IEEE Transactions on Knowledge and Data Engineering 19, 1–16 (2007)
12. Herbold, S.: Autorank: A python package for automated ranking of
    classifiers. Journal of Open Source Software 5(48), 2173 (2020).
    https://doi.org/10.21105/joss.02173
13. Hinton, G.E.: Connectionist learning procedures. Artif. Intell. 40(1–3), 185–234
    (Sep 1989). https://doi.org/10.1016/0004-3702(89)90049-0
14. Isele, R., Bizer, C.: Learning expressive linkage rules using genetic programming.
    PVLDB 5, 1638–1649 (2012)
15. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization (2017)
16. Konda, P., Das, S., C., P.S.G., Doan, A., Ardalan, A., Ballard, J.R., Li, H., Panahi,
    F., Zhang, H., Naughton, J.F., Prasad, S., Krishnan, G., Deep, R., Raghavendra,
    V.: Magellan: Toward building entity matching management systems. Proc. VLDB
    Endow. 9(12), 1197–1208 (2016). https://doi.org/10.14778/2994509.2994535,
    http://www.vldb.org/pvldb/vol9/p1197-pkonda.pdf
17. Lacoste-Julien, S., Palla, K., Davies, A., Kasneci, G., Graepel, T., Ghahramani,
    Z.: Sigma: simple greedy matching for aligning large knowledge bases. In: KDD
    (2013)
18. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word repre-
    sentations in vector space (2013)
19. Mudgal, S., Li, H., Rekatsinas, T., Doan, A., Park, Y., Krishnan, G.,
    Deep, R., Arcaute, E., Raghavendra, V.: Deep learning for entity match-
    ing: A design space exploration. In: Proceedings of the 2018 International
    Conference on Management of Data, SIGMOD Conference 2018. pp. 19–34.
    ACM (2018). https://doi.org/10.1145/3183713.3196926
20. Nentwig, M., Hartung, M., Ngomo, A.C.N., Rahm, E.: A survey of current link
    discovery frameworks. Semantic Web 8, 419–436 (2017)
21. Ngomo, A.C.N., Lyko, K.: Eagle: Efficient active learning of link specifications
    using genetic programming. In: ESWC (2012)
22. Ngomo, A.C.N., Lyko, K.: Unsupervised learning of link specifications: determin-
    istic vs. non-deterministic. In: OM (2013)
23. Nikolov, A., d’Aquin, M., Motta, E.: Unsupervised learning of link discovery con-
    figuration. In: ESWC (2012)
24. Pershina, M., Yakout, M., Chakrabarti, K.: Holistic entity matching across knowl-
    edge graphs. 2015 IEEE International Conference on Big Data (Big Data) pp.
    1585–1590 (2015)
25. Sarawagi, S., Bhamidipaty, A.: Interactive deduplication using active learning. In:
    KDD (2002)
26. Sherif, M.A., Ngomo, A.C.N., Lehmann, J.: Wombat - a generalization approach
    for automatic link discovery. In: ESWC (2017)
27. Sun, Z., Hu, W., Zhang, Q., Qu, Y.: Bootstrapping entity alignment with knowl-
    edge graph embedding. IJCAI International Joint Conference on Artificial Intelli-
    gence 2018-July, 4396–4402 (2018). https://doi.org/10.24963/ijcai.2018/611
28. Sun, Z., Zhang, Q., Hu, W., Wang, C., Chen, M., Akrami, F., Li, C.: A bench-
    marking study of embedding-based entity alignment for knowledge graphs. Pro-
    ceedings of the VLDB Endowment 13(11), 2326–2340 (2020), http://www.vldb.
    org/pvldb/vol13/p2326-sun.pdf
29. Volz, J., Bizer, C., Gaedke, M., Kobilarov, G.: Silk - a link discovery framework
    for the web of data. In: LDOW (2009)
30. Wu, Y., Liu, X., Feng, Y., Wang, Z., Yan, R., Zhao, D.: Relation-aware
    entity alignment for heterogeneous knowledge graphs. Proceedings of the
    Twenty-Eighth International Joint Conference on Artificial Intelligence (Aug
    2019). https://doi.org/10.24963/ijcai.2019/733
31. Zhang, Q., Sun, Z., Hu, W., Chen, M., Guo, L., Qu, Y.: Multi-view knowledge
    graph embedding for entity alignment. In: IJCAI. pp. 5429–
    5435 (2019). https://doi.org/10.24963/ijcai.2019/754, http://arxiv.
    org/abs/1906.02390
32. Zhu, L., Ghasemi-Gol, M., Szekely, P.A., Galstyan, A., Knoblock, C.A.: Unsuper-
    vised entity resolution on multi-type graphs. In: ISWC (2016)