<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Daniel Obraczka</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Knowledge Graph Completion with FAMER</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>D. Obraczka</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Saeedi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>E. Rahm</string-name>
          <email>rahmg@informatik.uni-leipzig.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Leipzig</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>0000</year>
      </pub-date>
      <volume>0002</volume>
      <abstract>
        <p>We outline the use of the tool FAMER to address the schema and entity matching tasks of the DI2KG 2019 challenge. FAMER supports both static and incremental matching and clustering of entities from multiple sources. To facilitate entity matching, we first identify matching properties in the provided datasets based on the similarity of property names and instance values. This approach utilizes the given training data to derive property matches from entity matches. For entity matching, we consider multiple configurations to determine entity similarities, with the optional use of word embeddings.</p>
      </abstract>
      <kwd-group>
        <kwd>Entity Resolution</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Knowledge graphs (KGs) physically integrate numerous entities with their
properties (attributes) and relationships, as well as associated metadata about
entity types and relationship types, in a graph-like structure [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. A product KG
may thus contain a huge number of products of many types, where the
product types can also be organized in an ontological structure, e.g., to differentiate
camera-related products into different kinds of cameras (DSLR, mirrorless, ...),
camera parts (e.g., camera bodies, lenses, ...) and different kinds of camera
accessories. The KG entities and relationships are typically integrated from numerous
sources, such as other knowledge graphs, databases, web pages and documents.
Integrating such sources implies matching and fusing equivalent entities and
relationships. The initial KG may be created from a single source (e.g., a pre-existing
knowledge graph such as DBpedia or the product KG of a specific merchant) or
by a static integration of multiple sources. KG completion (or extension) refers to
the incremental addition of new entities and relationships. Adding new
entities requires solving several challenging tasks:
1. preprocessing of new datasets for data profiling (e.g., to determine the
cardinality and value ranges of properties) and data cleaning,
2. determining the entity type (classification) of new entities,
3. incremental schema matching to match and group (cluster) properties of
new entities with known properties in the KG,
4. incremental entity resolution to match and cluster new entities with already
known entities in the KG,
5. fusion of newly matching entities, and
6. addition of relationships for new entities.
      </p>
      <p>
        At the University of Leipzig, we are developing a scalable framework for the
end-to-end generation and maintenance of KGs, building on our previous work
on learning-based product matching [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and parallel entity resolution [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The
core of this framework is a new parallel tool called FAMER (FAst Multi-source
Entity Resolution) for both static and incremental matching and clustering of
entities from multiple sources [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. FAMER first determines or updates a
so-called similarity graph between entities of a certain type and then applies a
clustering approach to determine or update clusters of matching entities. These
clusters group matching entities from different sources and thus support both
the fusion of matching entities and the tracking of the original entities (which is
also helpful for a possible cluster repair).
      </p>
      <p>We address both the schema and entity matching tasks of the DI2KG 2019
challenge for KG integration of product entities about cameras. We are not
providing a full-blown schema (property) matching solution but focus on a simple
approach to support entity matching on the most frequent properties. We also
use FAMER for building a similarity graph on properties and to determine and
incrementally update clusters of matching properties.</p>
      <p>In the next section, we provide an overview of FAMER. We then describe
how we address preprocessing and schema matching (Sec. 3) and entity matching
(Sec. 4) for the DI2KG 2019 challenge. The obtained results are described in Sec. 5.</p>
    </sec>
    <sec id="sec-2">
      <title>FAMER Overview</title>
      <p>
        Figure 1 illustrates the main components of the FAMER framework for
incremental matching. As outlined in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ][
        <xref ref-type="bibr" rid="ref12">12</xref>
        ][
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the framework consists of two major
configurable phases (gray boxes) named Linking and Clustering. In the Linking
phase, a similarity graph is generated in which similar entities are linked pairwise
with each other. This phase starts with blocking on selected properties so that
only entities of the same block need to be compared with each other. In the
initial version of FAMER, pairwise matching is manually configured by
specifying a combination of several property similarities that has to exceed a minimal
similarity threshold. We have now also added support for learning-based
linking configurations, e.g., using random forest classification, which utilize training
data of matching and non-matching entity pairs. Similar to [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we also added
support for word embeddings, e.g., using FastText, to replace the values of string
(textual) properties by their embeddings for a possibly improved similarity
computation. Depending on the method used to determine potential matches, the edges
in the similarity graph carry a similarity score that indicates the match
likelihood. The second phase of FAMER uses the similarity graph to determine entity
clusters, where a cluster groups all matching entities from the different input
sources. Clustering can be based on different algorithms, including the
CLIP approach, which favors so-called strong inter-source links that connect
maximally similar entity pairs from both sides [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
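      <p>The notion of a strong link can be illustrated with a small sketch. The Python code below is a simplified reading of the description above, not FAMER's implementation and not the full CLIP algorithm: it assumes that each vertex is a hypothetical (source, entity id) tuple and treats an inter-source link as strong when it has the highest similarity among the links of both endpoints to the other endpoint's source. Ties make all tied links strong.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical vertex representation: (source, entity id).
Vertex = Tuple[str, str]
Edge = Tuple[Vertex, Vertex, float]


def strong_links(edges: List[Edge]) -> Set[Tuple[Vertex, Vertex]]:
    """Return inter-source links that are maximal for both endpoints.

    A link (u, v) is treated as strong if no link from u to v's source and
    no link from v to u's source has a higher similarity (simplified sketch).
    """
    # best[(vertex, other_source)] = highest similarity of vertex to that source
    best: Dict[Tuple[Vertex, str], float] = defaultdict(float)
    for u, v, sim in edges:
        if u[0] == v[0]:
            continue  # intra-source links are ignored here
        best[(u, v[0])] = max(best[(u, v[0])], sim)
        best[(v, u[0])] = max(best[(v, u[0])], sim)

    strong = set()
    for u, v, sim in edges:
        if u[0] != v[0] and sim == best[(u, v[0])] and sim == best[(v, u[0])]:
            strong.add((u, v))
    return strong
```
      </preformat>
      <p>Only the link classification is shown; the cluster formation that CLIP builds on top of these links is omitted.</p>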
      <p>
        FAMER is able to update its output for new entities and new sources
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], as needed for KG completion. In this case, the input is a stream of new
entities from known sources or from a new source, plus the already determined
entity clusters stored in the KG (Figure 1). Here, the Linking part focuses on
the new entities and does not re-link previous entities among each other. The output of
the linking is an updated similarity graph composed of the existing clusters, the
group of new entities and the newly created links (the light-blue group in
Figure 1). The Incremental Clustering/Fusion part integrates the group of new
entities into clusters. The updated clusters are fused in the Fusion component so
that all entities of a cluster are represented by a single entity, called the cluster representative.
      </p>
      <p>FAMER is implemented using Apache Flink so that the computation of
similarity graphs and the clustering approaches can be executed in parallel on clusters
of variable size. For the implementation of the parallel clustering schemes, we also
use Flink's Gelly library, which supports so-called vertex-centric programming
of graph algorithms, i.e., iteratively executing a user-defined program in parallel over
all vertices of a graph. The vertex functions are executed by a configurable
number of worker nodes among which the graph data is partitioned, e.g., according
to a hash partitioning on vertex ids.</p>
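      <p>To make the vertex-centric model more concrete, the sketch below simulates it sequentially in Python for a simple algorithm, connected components by minimum-label propagation. It does not use Flink or the Gelly API. The CRC32-based partitioning function and the synchronous supersteps are assumptions chosen to mirror the hash partitioning on vertex ids and the iterative execution described above.</p>
      <preformat preformat-type="code">
```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def worker_of(vertex: str, num_workers: int) -> int:
    """Deterministic hash partitioning of a vertex id onto a worker."""
    return zlib.crc32(vertex.encode("utf-8")) % num_workers


def connected_components(edges: List[Tuple[str, str]], num_workers: int = 4) -> Dict[str, str]:
    """Sequential simulation of vertex-centric min-label propagation.

    In every superstep each vertex sends its current label to its neighbours
    and then adopts the smallest of its own and the received labels; the
    iteration stops when no label changes. The label identifies the component.
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    # Group vertices by worker to mimic the partitioned graph.
    partitions: Dict[int, List[str]] = defaultdict(list)
    for vertex in neighbours:
        partitions[worker_of(vertex, num_workers)].append(vertex)

    label = {vertex: vertex for vertex in neighbours}
    changed = True
    while changed:
        # Superstep: every worker runs the vertex function on its vertices.
        inbox: Dict[str, List[str]] = defaultdict(list)
        for vertices in partitions.values():
            for vertex in vertices:
                for neighbour in neighbours[vertex]:
                    inbox[neighbour].append(label[vertex])
        # Apply all updates after the superstep (synchronous semantics).
        changed = False
        for vertex, messages in inbox.items():
            smallest = min(messages + [label[vertex]])
            if smallest != label[vertex]:
                label[vertex] = smallest
                changed = True
    return label
```
      </preformat>
      <p>In a real deployment, each worker would only process its own partition and exchange messages over the network; here the partition map merely groups the vertices.</p>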
    </sec>
    <sec id="sec-3">
      <title>Preprocessing and Schema Matching</title>
      <p>To illustrate the data quality problems in the given dataset of the DI2KG
challenge, we show in Table 1 two matching Nikon camera products from different
sources. We observe significant differences in the sets of properties and
property values. For example, the first entity has the property features, while the
second camera contains neither this property nor the corresponding value
(Slimline). This may happen even among entities of the same source. Moreover,
the same property values are represented differently in different entities. For
example, in the first camera the property camera resolution has the value 16
Megapixels, whereas it is represented as "approx resolution": "16MP" for the second camera.
Altogether, the challenge includes 24 sources with vastly heterogeneous schemas.
For example, the source "ebay" has over 2000 properties, some of which are likely
duplicates such as "maximum shutter speed" and "max shutter speed".</p>
      <p>Before we perform incremental schema matching and entity matching, we
first preprocess the input dataset to derive some statistics and
to perform data cleaning steps. In particular, we focus both entity and schema
matching on the most frequent properties, since infrequent properties are unlikely
to be present for all matching pairs of entities, so that their use is of limited value.
For example, the property energy consumption per year occurs in only one entity
in the entire dataset; it will therefore most likely not have a corresponding
property in other sources and is useless for entity resolution. For each
source, we therefore determine the k ( 10) most frequent properties.</p>
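      <p>The frequency-based property selection can be sketched as follows. This is a minimal Python illustration under assumptions: entities are hypothetical dictionaries with a "source" key and a "props" mapping, empty values do not count, and ties are broken alphabetically. The parameter k is configurable; its default of 10 only mirrors the value mentioned in the text.</p>
      <preformat preformat-type="code">
```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, List


def top_k_properties(entities: Iterable[Dict[str, Any]], k: int = 10) -> Dict[str, List[str]]:
    """Return the k most frequent property names of every source.

    Each entity is a hypothetical dict with a "source" key and a "props"
    dict that maps property names to values.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for entity in entities:
        # Count a property at most once per entity and ignore empty values.
        present = {name for name, value in entity["props"].items() if value not in (None, "")}
        counts[entity["source"]].update(present)

    result: Dict[str, List[str]] = {}
    for source, counter in counts.items():
        # Descending frequency; ties are broken alphabetically for determinism.
        ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
        result[source] = [name for name, _ in ranked[:k]]
    return result
```
      </preformat>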
      <p>We also perform data cleaning to harmonize property values and make
similarity computations more meaningful. For example, different sources use
different units for weight. Comparing values in ounces with
values in grams would lead to poor similarity values, so we
transform both into the same unit. Further data cleaning steps are performed, such
as lowercasing strings and using canonical abbreviations.</p>
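      <p>A minimal sketch of such cleaning steps is shown below. The regular expression, the unit table and the abbreviation table are hypothetical examples rather than the rules used for the challenge; only the conversion factors are standard definitions (1 oz = 28.349523125 g, 1 lb = 453.59237 g). A comma is read as a decimal separator, an assumption that would misread thousands separators.</p>
      <preformat preformat-type="code">
```python
import re
from typing import Optional

# Conversion factors to grams (standard definitions of ounce and pound).
_TO_GRAMS = {
    "g": 1.0, "gram": 1.0, "grams": 1.0, "kg": 1000.0,
    "oz": 28.349523125, "ounce": 28.349523125, "ounces": 28.349523125,
    "lb": 453.59237, "lbs": 453.59237,
}
# A number (comma or dot as decimal separator) followed by a unit word.
_WEIGHT = re.compile(r"^\s*([0-9]+(?:[.,][0-9]+)?)\s*([a-z]+)\.?\s*$")

# Hypothetical table of canonical abbreviations.
_ABBREVIATIONS = {"megapixels": "mp", "megapixel": "mp", "millimeters": "mm"}


def weight_in_grams(value: str) -> Optional[float]:
    """Parse a weight such as '7 oz' or '200g' and return grams, else None."""
    match = _WEIGHT.match(value.lower())
    if match is None or match.group(2) not in _TO_GRAMS:
        return None
    number = float(match.group(1).replace(",", "."))
    return round(number * _TO_GRAMS[match.group(2)], 1)


def clean_string(value: str) -> str:
    """Lowercase, collapse whitespace and apply canonical abbreviations."""
    return " ".join(_ABBREVIATIONS.get(token, token) for token in value.lower().split())
```
      </preformat>
      <p>With these rules, "7 oz" becomes 198.4 and "200g" becomes 200.0, so both values can be compared on the same scale.</p>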
      <p>
        Incremental schema matching. Schema matching or schema alignment
consists of determining which properties of different sources correspond to
each other. There is a plethora of approaches, e.g., instance-based
or linguistic matchers, that tackle this problem (see [
        <xref ref-type="bibr" rid="ref2 ref7 ref9">9,7,2</xref>
        ] for overviews).
FAMER currently expects to be provided with already matched properties for
entity resolution. For the DI2KG challenge, however, we first need to align the
properties before we can apply our entity resolution approach.
      </p>
      <p>Our approach makes use of the training data provided for the entity resolution
task, which includes a subset of the true matching entity pairs. While the provided
training data contains example entities for all sources, entity matches are only
available for a subset of source combinations. For example, we are given entity
matches for "canon-europe.com" and "price-hunt.com", but not for the
combination of "canon-europe.com" and "ebay.com". However, we have matches for
"price-hunt.com" and "ebay.com". We can therefore first align the properties
of "ebay.com" and "price-hunt.com" and then integrate "canon-europe.com"
into this intermediate result using the given entity matches between
"canon-europe.com" and "price-hunt.com".</p>
      <p>We therefore follow an incremental property clustering approach that starts
with the pair of sources with the most matches in the training data and considers
the remaining sources for property matching in the order of their number of matches.
For each source s, we use the training data to count the number of entities
that have s as provenance. We refer to this as the provCount of s. Assuming
that the source with the highest provCount has already been integrated into the
KG, we start property matching with the source that has the second highest
provCount and continue with the further sources in descending order of their
provCount until all sources are processed.</p>
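      <p>The ordering of sources by provCount can be sketched in a few lines of Python. The sketch assumes that the training data is given as pairs of entity ids and that a caller-supplied function maps an entity id to its source; the id format "source//number" used in the example below is an illustrative assumption.</p>
      <preformat preformat-type="code">
```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def prov_counts(matches: Iterable[Tuple[str, str]], source_of: Callable[[str], str]) -> Counter:
    """provCount(s): number of distinct training entities whose provenance is s."""
    entities = {entity for pair in matches for entity in pair}
    return Counter(source_of(entity) for entity in entities)


def integration_order(matches: Iterable[Tuple[str, str]], source_of: Callable[[str], str]) -> List[str]:
    """Sources by descending provCount (ties alphabetically).

    The first source is assumed to be in the KG already; property matching
    then proceeds with the remaining sources in the returned order.
    """
    counts = prov_counts(matches, source_of)
    return sorted(counts, key=lambda source: (-counts[source], source))
```
      </preformat>
      <p>For the training pairs ("a.com//1", "b.com//7"), ("a.com//2", "b.com//8"), ("a.com//3", "c.com//4") and ("a.com//1", "c.com//5"), the provCounts are 3, 2 and 2, so "a.com" is assumed to be in the KG and "b.com" and "c.com" follow, with the tie broken alphabetically.</p>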
      <p>Each incremental step consists of the following procedure:
1. Categorize properties by value range
2. Calculate property similarities by computing the weighted arithmetic mean
of property name similarity and aggregated property value similarity
3. Update similarity graph
4. Cluster properties.</p>
      <p>To avoid comparing apples with oranges, all properties are first categorized by
their value range. Possible categories are, for example, "string",
"number" or "boolean". In Table 1, for example, the properties optical
zoom and color clearly belong to different categories, since the former mainly has
numeric values and the latter consists of strings.</p>
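      <p>The categorization step could look like the following Python sketch. The category names follow the text, but the concrete rules are assumptions: a value counts as a number if it starts with a number optionally followed by a short unit such as "x" or "mp", the set of boolean words is hypothetical, and a property receives the majority category of its values.</p>
      <preformat preformat-type="code">
```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical rules: boolean words, and numbers optionally followed by a short unit.
_BOOLEAN_WORDS = {"yes", "no", "true", "false"}
_NUMBER = re.compile(r"^[0-9]+(?:[.,][0-9]+)?\s*[a-z%]{0,5}$")


def value_category(value: str) -> str:
    """Assign a single value to "boolean", "number" or "string"."""
    normalized = value.strip().lower()
    if normalized in _BOOLEAN_WORDS:
        return "boolean"
    if _NUMBER.match(normalized):
        return "number"
    return "string"


def property_category(values: Iterable[str]) -> str:
    """Category of a property: the majority category of its values."""
    votes = Counter(value_category(v) for v in values)
    if not votes:
        return "string"  # assumption: properties without values default to string
    return votes.most_common(1)[0][0]
```
      </preformat>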
      <p>In the next step we calculate a combined similarity between properties of a
new source and already considered properties of previous sources of the same
category. The similarity between two properties is based on the similarity of
property names and the aggregated similarity of all property values. The
property values are derived from all relevant matches for the considered sources from
the training data.</p>
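      <p>The combined property similarity of step 2 can be sketched as a weighted arithmetic mean. In this Python sketch, the string similarity (the ratio of difflib's SequenceMatcher), the equal default weights and the use of the mean as aggregation are assumptions, since the text does not specify which name and value similarity measures or weights were used. Entities are represented as hypothetical dictionaries from property names to values.</p>
      <preformat preformat-type="code">
```python
from difflib import SequenceMatcher
from statistics import mean
from typing import Dict, List, Tuple

Entity = Dict[str, str]  # hypothetical: property name -> value


def string_sim(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def property_similarity(p1: str, p2: str, matches: List[Tuple[Entity, Entity]],
                        name_weight: float = 0.5) -> float:
    """Weighted arithmetic mean of name similarity and aggregated value similarity.

    p1 belongs to the new source and p2 to an already integrated source;
    matches holds training entity pairs (new-source entity, known entity).
    """
    name_sim = string_sim(p1, p2)
    # Compare the values of p1 and p2 in every training match where both occur.
    value_sims = [string_sim(e1[p1], e2[p2]) for e1, e2 in matches if p1 in e1 and p2 in e2]
    value_sim = mean(value_sims) if value_sims else 0.0
    return name_weight * name_sim + (1 - name_weight) * value_sim
```
      </preformat>
      <p>Properties of the same name with identical values in all matches obtain a similarity of 1; when no training match contains both properties, only the name similarity contributes.</p>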
      <p>The calculated similarities are used to build and update a similarity graph
with the properties as vertices and edges weighted by their similarities. This graph
is given to FAMER's clustering module to determine new property clusters.
This is repeated until no more sources are left to integrate. The resulting
property clusters can then be used in the entity resolution step by fusing all
members of a cluster into a new property.</p>
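      <p>The following Python sketch illustrates how property clusters could be maintained across incremental steps and then used to rename properties. It is a stand-in, not FAMER's clustering: it merges clusters with a union-find structure whenever an edge reaches a hypothetical similarity threshold, which amounts to connected components over the thresholded graph. Because clusters only grow, each step can simply add the edges of the newly integrated source. Assuming edges are passed as (known property, new property, similarity), the fused property name is taken from the cluster root, i.e., from the earlier integrated property.</p>
      <preformat preformat-type="code">
```python
from typing import Dict, Iterable, Tuple

Prop = Tuple[str, str]  # hypothetical: (source, property name)


class PropertyClusters:
    """Incrementally maintained property clusters (union-find stand-in).

    An edge at or above the threshold merges the two clusters; clusters
    never split, which fits the add-only incremental steps.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.parent: Dict[Prop, Prop] = {}

    def _find(self, prop: Prop) -> Prop:
        self.parent.setdefault(prop, prop)  # unseen properties form singletons
        while self.parent[prop] != prop:
            self.parent[prop] = self.parent[self.parent[prop]]  # path halving
            prop = self.parent[prop]
        return prop

    def add_edges(self, edges: Iterable[Tuple[Prop, Prop, float]]) -> None:
        for known, new, sim in edges:
            root_known, root_new = self._find(known), self._find(new)
            if sim >= self.threshold and root_known != root_new:
                self.parent[root_new] = root_known  # keep the older root

    def canonical_name(self, prop: Prop) -> str:
        """Name of the cluster root, used as the fused property name."""
        return self._find(prop)[1]

    def harmonize(self, source: str, props: Dict[str, str]) -> Dict[str, str]:
        """Rename an entity's properties to their fused cluster names."""
        return {self.canonical_name((source, name)): value for name, value in props.items()}
```
      </preformat>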
    </sec>
    <sec id="sec-4">
      <title>Entity Resolution</title>
      <p>FAMER assumes that matching properties are known for both blocking and
pairwise linking. We therefore use the schema matching result and data cleaning
for the most frequent properties to harmonize the entities before entity
resolution. Table 2 shows the data of Table 1 after preprocessing and
property alignment. As illustrated, we consider only a subset of the properties,
and both the property names and some property values have been harmonized.</p>
      <p>FAMER provides many options to perform entity resolution for the prepared
dataset, and we aim at a comparative evaluation of several configurations. In
particular, we can apply batch-like (static) matching and clustering for all
(24) sources at once, or we can apply an incremental approach that iteratively
adds and matches one source after the other. We decided to compare a batch-like
approach, which we denote as 1step, with an alternative approach dubbed
2step, in which we first deduplicate each source independently, fuse duplicate
entities and then perform matching and clustering on the deduplicated sources.</p>
      <p>In both cases, blocking is done on the manufacturer property, which is needed
to achieve a sufficiently low runtime. Camera products lacking a
manufacturer value form a special block and are matched with all other entities.</p>
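      <p>The blocking scheme, including the special block for entities without a manufacturer, can be sketched as follows. The Python code assumes entities are given as a hypothetical mapping from entity id to manufacturer value; the key prefix length of two follows the later remark on the machine learning approach that the first two letters of the manufacturer were used, and can be changed.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, List, Optional, Tuple


def blocking_key(manufacturer: Optional[str], prefix_len: int = 2) -> Optional[str]:
    """First letters of the lowercased manufacturer, or None if it is missing."""
    if manufacturer is None or not manufacturer.strip():
        return None
    return manufacturer.strip().lower()[:prefix_len]


def candidate_pairs(manufacturers: Dict[str, Optional[str]],
                    prefix_len: int = 2) -> Iterator[Tuple[str, str]]:
    """Yield the entity pairs that have to be compared."""
    blocks: Dict[str, List[str]] = defaultdict(list)
    without_key: List[str] = []
    for entity_id, manufacturer in manufacturers.items():
        key = blocking_key(manufacturer, prefix_len)
        if key is None:
            without_key.append(entity_id)
        else:
            blocks[key].append(entity_id)

    # Regular blocks: compare entities within the same block only.
    for members in blocks.values():
        yield from combinations(members, 2)

    # Special block: entities without manufacturer are compared with each
    # other and with all remaining entities.
    yield from combinations(without_key, 2)
    with_key = [entity_id for members in blocks.values() for entity_id in members]
    for a in without_key:
        for b in with_key:
            yield (a, b)
```
      </preformat>
      <p>For example, entities with the manufacturers "Nikon" and "nikon corp" share a block, whereas an entity without a manufacturer is compared with every other entity.</p>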
      <p>The most promising linking configuration used the following weighted
similarity:</p>
      <disp-formula id="eq1">
        <tex-math>\mathit{sim}(e_1, e_2) = \omega_1 \cdot \mathit{productSim}(e_1, e_2) + \omega_2 \cdot \mathit{JaroWinkler}(e_1, e_2)</tex-math>
      </disp-formula>
      <p>where ω<sub>1</sub> and ω<sub>2</sub> are weights. The similarity productSim is 0 or 1, depending on whether
the product codes of the entities e<sub>1</sub> and e<sub>2</sub> match. The product codes are
extracted from the page title attribute. Finally, JaroWinkler is the Jaro-Winkler
similarity computed on the concatenation of all respective properties of the
entities except the page title.</p>
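      <p>The weighted similarity can be sketched in Python as follows, using the weights ω<sub>1</sub> = 0.6 and ω<sub>2</sub> = 0.4 reported in the evaluation. The Jaro and Jaro-Winkler functions follow the textbook definitions (scaling factor 0.1, common prefix of at most four characters). How product codes are extracted from the page title is not specified in the text, so the tokenizer, the list of unit-like tokens and the property key "page title" below are hypothetical.</p>
      <preformat preformat-type="code">
```python
import re
from typing import Dict, Set

# Hypothetical: tokens such as "16mp" or "5x" are measurements, not product codes.
_UNIT_TOKEN = re.compile(r"^[0-9]+(mp|mm|x|gb|g|oz)$")


def product_codes(title: str) -> Set[str]:
    """Hypothetical heuristic: alphanumeric tokens mixing letters and digits."""
    codes = set()
    for token in re.findall(r"[a-z0-9]+", title.lower()):
        has_digit = any(c.isdigit() for c in token)
        has_alpha = any(c.isalpha() for c in token)
        if len(token) >= 3 and has_digit and has_alpha and not _UNIT_TOKEN.match(token):
            codes.add(token)
    return codes


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity of two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(0, max(len1, len2) // 2 - 1)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unmatched equal character within the matching window.
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters in a different order; half of them count as transpositions.
    out_of_order, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def weighted_sim(e1: Dict[str, str], e2: Dict[str, str], w1: float = 0.6,
                 w2: float = 0.4, title_key: str = "page title") -> float:
    """w1 * productSim + w2 * JaroWinkler(all other property values)."""
    codes1 = product_codes(e1.get(title_key, ""))
    codes2 = product_codes(e2.get(title_key, ""))
    product_sim = 0.0 if codes1.isdisjoint(codes2) else 1.0
    # Concatenate the remaining property values in a fixed (sorted) key order.
    rest1 = " ".join(v for key, v in sorted(e1.items()) if key != title_key)
    rest2 = " ".join(v for key, v in sorted(e2.items()) if key != title_key)
    return w1 * product_sim + w2 * jaro_winkler(rest1, rest2)
```
      </preformat>
      <p>Since productSim is binary, the product code dominates: two entities with different codes can reach at most ω<sub>2</sub> = 0.4.</p>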
      <p>
        The third approach we submitted utilizes machine learning. We used the
provided training data as input to Magellan's [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] XGBoost [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] implementation.
As before, blocking used the first two letters of the manufacturer. Negative training examples
were created by taking the most dissimilar entities in a block. Since Magellan
only matches pairs of data sources, we ran this approach for all
data source pairs for which training data was available. The trained classifiers were
then used to classify unseen entity pairs, and the resulting classifier probabilities
were used to create a similarity graph of all sources. Finally, FAMER's clustering
module was applied to this similarity graph.
      </p>
    </sec>
    <sec id="sec-eval">
      <title>Evaluation</title>
      <p>In this section, we describe the performance of our approaches on the
schema matching and entity matching tasks of the DI2KG 2019 challenge. We present
the evaluation of our results at the time of our submission, as well as the results
obtained from the workshop organizers. Unfortunately, we could not directly use
the golden truth for a comprehensive evaluation but had to rely on the results
determined by the workshop organizers for one schema matching approach and
three entity resolution approaches.</p>
      <p>Schema Matching. For the schema matching task, the challenge organizers provided us with the
results for our approach shown in Table 3.
      </p>
      <p>Since we were only concerned with clustering the most frequent properties,
the evaluation was done with regard to two different aspects. On the one hand,
only the correct matching of attribute pairs is considered. In this regard, our
approach achieves a high precision of 0.96. On the other hand, the schema matching challenge
consisted of matching source attributes to target attributes, which can be seen as
attributes of the integrated schema. In this regard, our approach performed worse.
The main reason for this discrepancy is that we were more concerned
with correctly clustering source attributes than with finding the corresponding target
attributes, because this seemed more relevant for the subsequent entity resolution
task. Each source attribute belonging to a source attribute cluster was therefore
assumed to claim the union of all target attributes claimed by the other cluster
members, which obviously disregards some intricacies of this task.</p>
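      <p>For reference, pairwise precision, recall and F-measure over match pairs can be computed as in the following Python sketch. It assumes that matches are given as unordered pairs of hypothetical attribute or entity identifiers; the F-measure is the harmonic mean of precision and recall.</p>
      <preformat preformat-type="code">
```python
from typing import Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pairwise_scores(predicted: Iterable[Pair], gold: Iterable[Pair]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure over unordered match pairs."""
    predicted_set = {frozenset(pair) for pair in predicted}
    gold_set = {frozenset(pair) for pair in gold}
    true_positives = len(predicted_set.intersection(gold_set))
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall, f_measure(precision, recall)
```
      </preformat>
      <p>For instance, f_measure(0.99, 0.84) yields about 0.91, which agrees with the 1step F-measure on the training data in Table 4.</p>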
      <p>In general, we can see that even for the most frequent properties, schema
matching is a very difficult task due to the heterogeneity of these datasets.
Attributes might have the same name but contain different information,
and should then not be matched. For example, the attribute resolution in "www.priceme.co.nz"
contains a technical description of the resolution, while in all other datasets
this attribute contains the number of megapixels of the camera. Another problem
lies in distinguishing attributes with similar value ranges. For example, properties
whose value range consists only of numbers are very similar to each
other.</p>
      <p>The obtained results are not yet of sufficient quality, indicating that inferring
property matches from given entity matches is not as effective for the given
dataset as we had hoped. To obtain better and more complete results for all
properties, we therefore need a more general solution for property matching, e.g.,
with a more comprehensive use of instance data.</p>
      <p>Entity Matching. As described in Section 4, we submitted results of three different entity resolution
approaches. While we initially also wished to employ word embeddings in these
methods, this technique did not prove promising for the given
dataset in initial tests. The first two approaches consist of manually created configurations of
our system, while the third utilizes machine learning. For the weighted similarity,
used in the first two approaches, the best weights were determined to be ω<sub>1</sub> = 0.6
and ω<sub>2</sub> = 0.4. The results are presented in Table 4.</p>
      <table-wrap-group id="tab4">
        <label>Table 4</label>
        <caption>
          <p>Entity resolution results on (a) the training data and (b) the golden truth.</p>
        </caption>
        <table-wrap id="tab4a">
          <label>(a)</label>
          <caption>
            <p>Training Data</p>
          </caption>
          <table>
            <thead>
              <tr><th>Measure</th><th>1step</th><th>2step</th><th>ML(train)</th><th>ML(test)</th></tr>
            </thead>
            <tbody>
              <tr><td>F-measure</td><td>0.91</td><td>0.88</td><td>0.59</td><td>0.60</td></tr>
              <tr><td>Precision</td><td>0.99</td><td>0.98</td><td>0.77</td><td>0.77</td></tr>
              <tr><td>Recall</td><td>0.84</td><td>0.79</td><td>0.48</td><td>0.50</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap id="tab4b">
          <label>(b)</label>
          <caption>
            <p>Golden Truth</p>
          </caption>
          <table>
            <thead>
              <tr><th>Measure</th><th>1step</th><th>2step</th><th>ML</th></tr>
            </thead>
            <tbody>
              <tr><td>F-measure</td><td>0.64</td><td>0.56</td><td>0.002</td></tr>
              <tr><td>Precision</td><td>0.78</td><td>0.59</td><td>0.06</td></tr>
              <tr><td>Recall</td><td>0.54</td><td>0.54</td><td>0.001</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </table-wrap-group>
      <p>Before submission, we created a test dataset to avoid evaluating the machine learning approach only on the
data we trained on. To obtain this test data, the entities with the most similar
page titles in a block were regarded as true matches, and the most dissimilar
entities were regarded as non-matches. At the time of submission, we already
observed that our manually created configurations were superior to the machine
learning approach. We attribute this to the low number of training examples
(especially per source pair). The challenge organizers informed us that our first
two approaches enabled them to augment their golden truth with roughly 800
new entities that were previously not identified as matching, indicating that the
golden truth is not yet in a perfect state. We can see a huge difference
between the performance on the training data set and on the larger golden truth.
This might indicate that the training data generally contains simpler examples,
or that our methods overfit to the training data. The poor performance of the machine
learning approach on the whole golden truth cannot be explained at this point and
might be due to an error. Unfortunately, a more detailed analysis of this issue
has not yet been possible, since the golden truth is not available to us.</p>
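      <p>The heuristic used to derive the test data (and, similarly, the negative training examples) can be sketched as follows. The Python code ranks all entity pairs of one block by page title similarity and labels the n most similar pairs as matches and the n least similar pairs as non-matches. The similarity measure (the ratio of difflib's SequenceMatcher), the parameter n and the block representation as a mapping from hypothetical entity ids to page titles are assumptions.</p>
      <preformat preformat-type="code">
```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def title_sim(a: str, b: str) -> float:
    """Stand-in page title similarity (difflib ratio on lowercased titles)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pseudo_labels(block: Dict[str, str], n: int = 1) -> Tuple[List[Pair], List[Pair]]:
    """Return (matches, non_matches) for one block of entity id -> page title.

    The n most similar title pairs become matches and the n least similar
    pairs become non-matches.
    """
    scored = sorted(
        ((title_sim(block[a], block[b]), (a, b)) for a, b in combinations(sorted(block), 2)),
        reverse=True,
    )
    ranked = [pair for _, pair in scored]
    n = min(n, len(ranked) // 2)  # a pair must never receive both labels
    return ranked[:n], ranked[len(ranked) - n:]
```
      </preformat>
      <p>Such labels are only as reliable as the title similarity, which may be one reason why the test results of the machine learning approach differ so strongly from its results on the golden truth.</p>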
      <p>Our 1step method outperformed the 2step method. We assume that
deduplicating each source and fusing detected duplicates into one entity may create
false links in the second step between wrongly fused entities and entities
from other sources, which would explain the relatively low precision on the golden truth
(Table 4b). A more detailed comparison of the 1step and 2step approaches is
another topic for future study.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We have shown that the FAMER tool can solve the entity
resolution task of the challenging 2019 DI2KG dataset reasonably well. While there is still room
for improvement, our approach determined matches that helped the challenge
organizers to enhance the golden truth (which thus may be more of a silver truth).
We could also provide a reasonable solution for (simplified) property matching,
but more effort is necessary to achieve a full-fledged solution. Future work will
also investigate how to improve accuracy on sources containing duplicates, and
the integration and optimal use of machine learning approaches in our system.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is partially funded by the German Federal Ministry of Education and
Research under grant BMBF 01IS18026B. Some computations have been done
with resources of Leipzig University Computing Centre.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guestrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>XGBoost: A scalable tree boosting system</article-title>
          .
          <source>In: Proc. ACM SIGKDD Conf</source>
          . pp.
          <fpage>785</fpage>
          –
          <lpage>794</lpage>
          (
          <year>2016</year>
          ). https://doi.org/10.1145/2939672.2939785
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          : Ontology matching. Springer, Heidelberg (DE), 2nd edn. (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kolb</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thor</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Dedoop: Efficient deduplication with Hadoop</article-title>
          .
          <source>PVLDB</source>
          <volume>5</volume>
          (
          <issue>12</issue>
          ) (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Konda</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
            ,
            <given-names>P.S.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ardalan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballard</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panahi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naughton</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prasad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krishnan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deep</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavendra</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Magellan: Toward building entity matching management systems over data science stacks</article-title>
          .
          <source>PVLDB</source>
          <volume>9</volume>
          (
          <issue>13</issue>
          ),
          <fpage>1581</fpage>
          –
          <lpage>1584</lpage>
          (Sep
          <year>2016</year>
          ). https://doi.org/10.14778/3007263.3007314
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Kopcke, H.,
          <string-name>
            <surname>Thor</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Tailoring entity resolution for matching product offers</article-title>
          .
          <source>In: Proc. EDBT</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mudgal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rekatsinas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krishnan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deep</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arcaute</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavendra</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Deep learning for entity matching: A design space exploration</article-title>
          .
          <source>In: Proc. ACM SIGMOD Conf</source>
          . pp.
          <fpage>19</fpage>
          –
          <lpage>34</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Towards large-scale schema and ontology matching</article-title>
          .
          <source>In: Schema matching and mapping</source>
          , pp.
          <fpage>3</fpage>
          –
          <lpage>27</lpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>The case for holistic data integration</article-title>
          .
          <source>In: Proc. ADBIS</source>
          . pp.
          <fpage>11</fpage>
          –
          <lpage>27</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          :
          <article-title>A survey of approaches to automatic schema matching</article-title>
          .
          <source>The VLDB Journal</source>
          <volume>10</volume>
          (
          <issue>4</issue>
          ),
          <volume>334</volume>
          –350 (Dec
          <year>2001</year>
          ). https://doi.org/10.1007/s007780100057
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Saeedi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nentwig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peukert</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scalable matching and clustering of entities with FAMER</article-title>
          .
          <source>Complex Systems Informatics and Modeling Quarterly (16)</source>
          ,
          <volume>61</volume>
          –
          <fpage>83</fpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Saeedi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peukert</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Comparative evaluation of distributed clustering schemes for multi-source entity resolution</article-title>
          .
          <source>In: Proc. ADBIS</source>
          . pp.
          <volume>278</volume>
          –
          <fpage>293</fpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Saeedi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peukert</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Using link features for entity clustering in knowledge graphs</article-title>
          .
          <source>In: Proc. ESWC</source>
          . pp.
          <volume>576</volume>
          –
          <fpage>592</fpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Saeedi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peukert</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Incremental multi-source entity resolution with FAMER</article-title>
          . Submitted for publication (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>