<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Approach for Incremental Entity Resolution at the Example of Social Media Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>B. Opitz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>T. Sztyler</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. Jess</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>F. Knip</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>C. Bikar</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>B. Pfister</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Scherp</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kiel University and Leibniz Information Center for Economics</institution>
          ,
          <addr-line>Kiel</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Mannheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>When querying data providers on the web, one has no guarantee that they will reply within a given time. Some providers may not answer at all. This makes it infeasible to wait for a complete result before beginning the entity resolution. To solve this problem, we present a query-time entity resolution approach that takes the asynchronous nature of the replies from data providers into account by starting the entity resolution as soon as the first results are returned. Resolved entities are propagated from the entity resolution engine to the mobile client as early as possible. Resolution results that are produced later are sent as updates to the client and thus improve earlier results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Where can I find the nearest bakery? Is it open on Saturdays? Is there a concert in
the area or around my current location? What to do for the night without wasting
much time browsing multiple websites? Our mobile social
media explorer mobEx answers such questions. It aggregates data from different
Web sources to provide information for planning one's holiday, leisure time, or any
other amusements. However, when querying multiple data providers for
information about the same subject or location, it is quite common to find redundancy
in the overall result. For example, multiple providers may have complementary
information about the same entity or even (exact) duplicates [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1,2,3</xref>
        ]. This is
further complicated by variations between the retrieved data such as different
spellings (possibly mistakes) or missing information [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Assuming that we
retrieve information from an arbitrary number of providers in the form of records,
i.e., compositions of information about an entity, an end user will most likely
prefer a consolidated resource representing an entity over multiple resources
describing the same entity. The process of eliminating duplicates and merging
them into one resource is called entity resolution (ER). In this paper, we present
a novel query-time entity resolution approach, i.e., we carry out ER at
query time. The mobile application mobEx serves as a showcase for our approach. We
use techniques such as fuzzy matching and threading as well as precondition
heuristics that reduce the number of comparisons that have to be carried out.
The key difference to existing work is that we do not have all records readily at hand
when the resolution process is started. Thus, the resolution process receives more
resources as we go along, and results gradually become more complete.
      </p>
      <p>Showcase: Social Media Exploration with mobEx</p>
      <p>
        The mobile application mobEx sends a user request to the mobEx server. The
server queries various data providers such as DBpedia (http://dbpedia.org),
Eventful (http://eventful.com), Qype (http://qype.com), OpenPOI
(http://openpois.net), and GeoNames (http://geonames.org) for events, organizations,
persons, and places. The results the mobEx client receives can be navigated through
a facet structure as shown in Figure 1 (left). The faceted navigation has been
extended from Schneider et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The middle of Figure 1 shows a
screenshot of the app with the map view displaying the results of a query in terms of
locations and events. Finally, the details of an object such as its website and opening
hours can be viewed as shown in Figure 1 (right). The entity resolution of
the different resources retrieved from the data providers takes place entirely on
the mobEx server. A resource represents an object that is either an event,
organization, person, or place. The properties of a resource that are relevant for the entity
resolution process are outlined in Table 1. When querying data providers, all
retrieved records are mapped into such a resource. The alignment of the data
providers' schemata to the internal schema of mobEx is hard-coded. It may be
extended by automatic approaches such as [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] in the future.
      </p>
      <p>Table 1 (only partially recoverable). Columns: data type, attribute, weight (W.), represented information, example. Surviving rows: two attributes of data type String, representing the id used on the server and the type of resource.</p>
      <p>An Approach for Incremental Entity Resolution</p>
      <p>Our incremental entity resolution approach works in a highly parallel
fashion, as can be seen in Figure 2. The numbers in the following description of our
process correspond to the numbers in the figure. In the given scenario, where a
client queries our server (1), it is important that the records are processed as fast
as possible. Thus, all processes work in parallel. There are two main stages:
the querying of the records (upper half of Figure 2) and the entity
resolution (lower half of Figure 2). Both stages are divided into multiple sub-threads,
controlled by three central units, namely the Main-Thread, the Entity-Manager-Thread,
and the Entity-Resolver-Thread. These components enable the parallel execution of
all available tasks: the Main-Thread handles the querying of the records
from different data providers such as GeoNames or OpenPOI, and the
Entity-Resolver-Thread handles the entity resolution. The Entity-Manager-Thread decouples these two
stages so that they do not have to wait for each other. Thus, when the
Main-Thread receives a request (1), it starts and controls the data-provider querying
threads (2), which retrieve the desired information such as restaurants,
hospitals, and parks and parse the retrieved records into the local/target schema,
namely our resource model. When a data-provider thread delivers results, they
are passed to the Entity-Manager-Thread (3). The Entity-Manager-Thread
administers a container for the queried and processed resources. It listens to all
data-provider threads and forwards their results to the Entity-Resolver-Thread
(4). The Entity-Resolver-Thread tries to find and handle duplicates such as 'Cafe
Vienna' and 'Vienna Cafe', i.e., it actually carries out the entity resolution. All
arriving resources are compared to the already received resources by worker
threads (5). The result is returned to the Entity-Manager-Thread (6), which in
turn returns it to the Main-Thread (7). The processed resources are delivered
in reply to the request (8) as soon as they are available, i.e., we do not wait
until all records have arrived from the data providers and undergone resolution.
Therefore, earlier results may be updated in a later step of the resolution.</p>
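      <p>The decoupling of querying and resolution described above can be illustrated with a minimal sketch. It is not the mobEx implementation: the provider stubs, the queue names, and the toy resolution rule (labels match if they contain the same words in any order) are assumptions made purely for illustration.</p>
      <preformat><![CDATA[
```python
import queue
import threading


def fake_provider(name, labels):
    """Hypothetical provider stub that returns one batch of records."""
    def fetch():
        return [{"provider": name, "label": label} for label in labels]
    return fetch


def run_query(providers):
    """Query all providers in parallel; collect results as they are resolved."""
    fetched = queue.Queue()    # provider threads -> entity manager
    delivered = queue.Queue()  # resolver -> reply stream for the client

    def provider_worker(fetch):
        fetched.put(fetch())   # steps (2) and (3): deliver one batch

    def manage_and_resolve(n_providers):
        seen = {}              # normalized label -> resource (toy resolution)
        for _ in range(n_providers):
            batch = fetched.get()          # step (4): new data arrives
            for record in batch:
                # Toy rule (assumption): same words in any order = same entity.
                key = " ".join(sorted(record["label"].lower().split()))
                if key in seen:
                    seen[key]["providers"].append(record["provider"])
                    delivered.put(("update", seen[key]["label"]))
                else:
                    seen[key] = {"label": record["label"],
                                 "providers": [record["provider"]]}
                    delivered.put(("new", record["label"]))
        delivered.put(None)    # end of stream

    threads = [threading.Thread(target=provider_worker, args=(p,))
               for p in providers]
    threads.append(threading.Thread(target=manage_and_resolve,
                                    args=(len(providers),)))
    for t in threads:
        t.start()
    events = []
    while (event := delivered.get()) is not None:  # step (8): incremental reply
        events.append(event)
    for t in threads:
        t.join()
    return events
```
]]></preformat>
      <p>With two stub providers returning 'Cafe Vienna' and 'Vienna Cafe' plus one unrelated record, the sketch emits two new resources and one update. Which of the two labels arrives first depends on thread scheduling, which mirrors the nondeterminism of intermediate results discussed for the resolution process.</p>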
      <p>Figure 2 (diagram not recoverable). Labels in the figure: client sends request (1); Querying Data with three Data Provider Threads (2); store queried data; forward processed data; start entity resolution with new data (4); Main-Thread (Central Unit); Entity-Resolver-Thread (Central Unit) with worker threads (5).</p>
      <p>
Once a resource object is created it is assigned to a facet. A facet is a category
and part of a large tree. This structure is called the facet tree and represents
categories hierarchically [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The tree is static at run time and part of the resource
object management. The core of the structure is based on data from DBpedia,
i.e. it originates from the structure of Wikipedia. The facet tree consists of
four subtrees that have the root nodes event, organization, person, and place.
They originate from different integrated (social media) sources as mentioned in
Section 2. Each resource object is assigned to one of these root nodes and to at
least one child. Thus, the 'Cafe Vienna' is a 'place' and a 'Coffee Shop' (which
is a sub-facet of place). The assignment of a facet to a resource is determined by
the category information delivered by the data provider, e.g., 'Cafe'. The
mapping process uses predefined rules to determine a correct match in the facet
tree; if this fails, the process tries to find a match based on string similarity.
However, only the root node (event, organization, person, or place) assigned to
a resource is considered by the entity resolution. The hierarchical classification
of each resource object into categories is not considered because the run time of
the entity resolution would rise dramatically and speed has a high priority.
      </p>
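      <p>The two-step facet assignment (predefined rules first, string similarity as a fallback) might look roughly as follows. The facet names, the rule table, and the similarity cutoff are hypothetical; the sketch uses Python's difflib as a stand-in for whichever similarity measure is actually used.</p>
      <preformat><![CDATA[
```python
import difflib

# Hypothetical excerpt of the facet tree: facet name -> root node.
FACETS = {"Coffee Shop": "place", "Restaurant": "place", "Concert": "event"}
# Hypothetical predefined rules: provider category -> facet name.
RULES = {"cafe": "Coffee Shop", "diner": "Restaurant"}


def assign_facet(category, cutoff=0.6):
    """Return (root, facet) for a provider category, or None if nothing fits."""
    key = category.strip().lower()
    facet = RULES.get(key)                     # 1) predefined rule
    if facet is None:                          # 2) string-similarity fallback
        by_lower = {name.lower(): name for name in FACETS}
        close = difflib.get_close_matches(key, list(by_lower), n=1,
                                          cutoff=cutoff)
        if not close:
            return None
        facet = by_lower[close[0]]
    return FACETS[facet], facet
```
]]></preformat>
      <p>In this sketch, the category 'Cafe' is resolved by a rule, while a category such as 'Concerts' has no rule and is matched to the facet 'Concert' by similarity.</p>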
      <sec id="sec-1-1">
        <title>Entity Resolution Process</title>
        <p>
          As already pointed out, entity resolution takes place on resource objects. Each
attribute of such a resource contains a certain piece of information about an
entity (see examples in Table 1) and is thus more or less representative of the
actual entity. Consequently, "some attributes are more important in determining
whether a mapping should exist between two objects" [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. We account for that
by assigning weights to the attributes, where a higher weight indicates that
the feature is more distinctive or authoritative (see Table 1). The choice of weights is
guided by [
          <xref ref-type="bibr" rid="ref3 ref6 ref7">3,6,7</xref>
          ]. While none of them gives explicit numbers, they hint at
which properties are important or more distinctive. The actual numbers were
assigned based on empirical results. For example, it is legitimate to assign the URL
the greatest weight, as URLs are unique. A similar argument applies to the label
and the phone number, which are both designed to serve as identifiers for, e.g., a
person, organization, or place.
        </p>
        <p>The entity resolution process is multi-threaded. We start a new thread each
time we receive a batch of records from a provider. These records are then
mapped into resources, which in turn are resolved against each other as well as
all previously received resources that were part of the same client query. However,
this processing brings up several new challenges:
1. Duplicate comparisons have to be avoided.
2. Merge targets may disappear. Given two matching resources r1 and r2, the older
object r1 becomes the merge target. Once r2 has been merged into r1, r2 no longer
exists as a merge target. Another resource
r3 that would be merged into r2 is therefore merged into r1 instead.
3. The (intermediate) results become nondeterministic, as there is no guarantee
that resources will always be compared and merged in the same order. Let
us consider the (simplified) example of three resources r1, r2, r3 with labels
r1.label = 'Example 1', r2.label = 'Example Two', r3.label = 'Example Three',
where the indexes represent the age, i.e., r1 is the oldest resource. If we merge r1
and r2, the client will receive an intermediate result with r1's label being
'Example Two', as we keep the longer label when merging. If instead we
matched (and merged) r2 and r3 first, the intermediate result would contain
r2 with label 'Example Three'.</p>
        <p>The first two problems can be addressed by keeping track of the merge process
in a graph, more precisely a forest. We define a directed graph G = (V, E) where
each node v ∈ V represents a resource. Two resource nodes r1, r2 are connected
(i.e., the edge (r1, r2) ∈ E) if the entity resolution process recognized them as
identical and thus merged them (see Figure 3). Updates and
deletions are handled in that the client receives corrections to earlier results
and eventually updates or deletes duplicates delivered to it in an earlier
step of the resolution.</p>
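      <p>A merge forest of this kind can be sketched with a simple parent map. The class and method names below are hypothetical, and the bookkeeping of already compared pairs is one possible way to avoid duplicate comparisons, not necessarily the one used in mobEx.</p>
      <preformat><![CDATA[
```python
class MergeForest:
    """Tracks which resource was merged into which (parent = merge target)."""

    def __init__(self):
        self.parent = {}       # merged resource id -> id it was merged into
        self.compared = set()  # unordered pairs of roots already compared

    def root(self, rid):
        """Follow merge edges up to the resource that still exists."""
        while rid in self.parent:
            rid = self.parent[rid]
        return rid

    def should_compare(self, a, b):
        """False if a and b share a merge tree or were already compared."""
        ra, rb = self.root(a), self.root(b)
        pair = frozenset((ra, rb))
        if ra == rb or pair in self.compared:
            return False
        self.compared.add(pair)
        return True

    def merge(self, target, source):
        """Merge source's tree into target's tree; return the actual target."""
        rt, rs = self.root(target), self.root(source)
        if rt != rs:
            self.parent[rs] = rt
        return rt
```
]]></preformat>
      <p>After r2 has been merged into r1, a request to merge r3 into r2 is redirected to r1, because r1 is the root of r2's merge tree. Resources in the same tree are never compared again.</p>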
        <p>
          Preconditions for Comparing Resources. A further issue that has to be
addressed is that all resources should actually have the chance to be compared
with one another, unless one of the preconditions rules out a match between
them. Without parallel processing, this is easily ensured. In our threaded
approach, we ensure this by having an entity manager that receives the resources
from providers in disjoint sets and thus can start a resolution thread on each
such set against all others. When querying data providers, we may very well
receive a total of more than 1000 records in major cities. Naive entity resolution,
i.e., comparing all pairs of resources, is therefore out of the question, since that
would result in n(n-1)/2 and thus O(n²) comparisons (about half a million for n = 1000), which requires too much time
for on-the-fly matching. Instead, we cut down the number of comparisons by
applying precondition heuristics. The
precondition heuristics we use are: a) Transitivity: If two resources r2 and r3 have a common
ancestor r1 in the merge tree, we do not compare r2 with r3.
b) Type: Only resources of the same type are compared, i.e., we build buckets
and compare events with other events, but not with places, persons, or
organizations, and so on [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. c) Physical location: We calculate the distance between
resources using the Haversine formula [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and only consider pairs of resources if
their distance is less than 500 m or their postal addresses are similar (to account
for wrong coordinates from a data provider).
        </p>
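      <p>The type and location preconditions can be sketched as follows. The Haversine formula itself is standard; the dictionary layout of a resource, the address similarity measure (difflib), and its cutoff of 0.8 are assumptions for illustration only. The transitivity check is omitted here because it depends on the merge bookkeeping.</p>
      <preformat><![CDATA[
```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def may_match(a, b, max_dist_m=500, addr_cutoff=0.8):
    """Type and location preconditions for two resources (dicts, assumed layout)."""
    if a["type"] != b["type"]:
        return False                      # b) only compare within one type bucket
    if haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) < max_dist_m:
        return True                       # c) close enough by coordinates
    if not a.get("address") or not b.get("address"):
        return False                      # no address to fall back on
    ratio = SequenceMatcher(None, a["address"].lower(),
                            b["address"].lower()).ratio()
    return ratio >= addr_cutoff           # c) similar postal address
```
]]></preformat>
      <p>Only pairs for which may_match returns True would proceed to the more expensive attribute comparison. The address fallback lets two records still be compared when one provider reports wrong coordinates.</p>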
        <p>
          Matching and Merging. We only compare pairs of instances for which
the preconditions apply. When comparing instances, certain properties are more
important than others. We account for that by assigning weights to the different
properties as shown in Table 1, which are used in the scoring process. When
actually comparing two resources, we apply fuzzy matching and inexact string
matching [
          <xref ref-type="bibr" rid="ref3 ref6 ref7">3,6,7</xref>
          ]. Features that are more distinctive receive a higher weight,
while less distinctive features are weighted such that they may tip the scales if
necessary. As it is well known that exact text matching can be difficult on text
formatted in an inconsistent/heterogeneous way (e.g., [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]), our approach makes
use of string similarity metrics as well.
        </p>
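      <p>One way to combine attribute weights with inexact string matching is a weighted mean of per-attribute similarities, sketched below. The weights, the threshold, and the choice of difflib's ratio as similarity metric are all hypothetical; the actual weights of Table 1 and the exact scoring function are not reproduced here.</p>
      <preformat><![CDATA[
```python
from difflib import SequenceMatcher

# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"url": 5.0, "label": 4.0, "phone": 4.0, "address": 2.0}
THRESHOLD = 0.75


def similarity(a, b):
    """Inexact string similarity in [0, 1], ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(r1, r2):
    """Weighted mean similarity over attributes present in both resources."""
    total = 0.0
    weight_sum = 0.0
    for attr, weight in WEIGHTS.items():
        v1, v2 = r1.get(attr), r2.get(attr)
        if v1 and v2:                      # skip missing or empty attributes
            total += weight * similarity(v1, v2)
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def is_match(r1, r2):
    """Two resources are merge candidates if the score reaches the threshold."""
    return match_score(r1, r2) >= THRESHOLD
```
]]></preformat>
      <p>Attributes that are missing in either resource are skipped rather than counted as mismatches, so that incomplete records from one provider can still match complete records from another. This is a design choice of the sketch, not a statement about the original engine.</p>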
        <p>Resources are merged if the score is above a certain threshold. When merging
resources, some attributes require special treatment, while others may not need
to be touched at all. In the merging process, the older/oldest resource takes
precedence in that its properties are generally assumed to be correct
unless they are empty. Exceptions to this are the label and the description.
Here, we assume that longer text is better. A resource is considered
old(er) if it has already been merged into. Thus, the oldest resource is the root
of its merge tree. We call the oldest resource in a merging process rold or merge
target, while the other resource is rnew or the merge source. When merging, only
rold will be changed. Thus, given that we want to merge two resources r1, r2, the
merging process may or may not actually merge these two specific resources. If
r2 has already been merged into another resource, we find the root of r2's
merge tree and set it as rnew. Likewise, if r1 has already been merged into
another resource, rold is set to the root of r1's merge tree. In the situation
depicted in Figure 3, if we merged r7 into r6, it would be merged into r4 instead.
If we merged r6 into r2, we would instead merge r4 into r1.</p>
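      <p>The attribute-level merge rules stated above (the older resource wins unless its value is empty, and the longer text wins for label and description) can be sketched as follows. Representing resources as plain dictionaries and the function name merge_into are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
LONGER_WINS = ("label", "description")  # attributes where longer text is kept


def merge_into(r_old, r_new):
    """Merge r_new (merge source) into r_old (merge target); only r_old changes."""
    for attr, new_value in r_new.items():
        old_value = r_old.get(attr)
        if not old_value:
            r_old[attr] = new_value     # fill attributes that were empty
        elif (attr in LONGER_WINS and new_value
              and len(new_value) > len(old_value)):
            r_old[attr] = new_value     # keep the longer label/description
    return r_old
```
]]></preformat>
      <p>Applied to the earlier example, merging r2 with label 'Example Two' into r1 with label 'Example 1' leaves r1 with the label 'Example Two', while non-empty attributes of r1 such as a phone number stay untouched.</p>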
      </sec>
      <sec id="sec-1-2">
        <title>Experiment: Resolved Entities over Time</title>
        <p>
          Analyzing the number of resolved resources the client has received at a given point
in time is difficult due to our threaded approach. It may very well happen that a
thread that was started later than another finishes sooner. We can provide an
estimate though, which can be seen in Figure 4. It shows the average percentage
of resources that a client has received (out of all resources for their request) and
the percentage of resources that have undergone the resolution process. We ran
queries for the 5 largest cities by population in the US and Germany, respectively.
On average, we retrieved 959.2 resources per query. Almost 80% of all resources
are delivered within 5 s of the request. At the same time, around 54% of the
received resources have been fully resolved. More details on our experiments, such as
determining the true positive rates of our engine, can be found in our technical report [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        There exist various approaches to entity resolution such as [
        <xref ref-type="bibr" rid="ref10 ref11 ref6">10,11,6</xref>
        ]. The
conditions underlying our approach are very similar to those of [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Other systems for
entity resolution we might have used are, e.g., Silk [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] or LIMES [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. We did
not use Silk because it is designed to access data via the SPARQL protocol.
In addition, Silk does not provide a shared-memory model in which the entity
resolution is executed by an, in principle, arbitrary number of parallel entity resolution
threads. One problem with LIMES is that it requires all resources to be
available before starting the resolution process. Given that some data providers we
query answer only after more than 2.5 minutes, we could not use LIMES. A
more extensive discussion of related work can be found in our technical report [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>The mobEx app with the incremental entity resolution is available at Google
Play: https://play.google.com/store/apps/details?id=de.unima.mobex.client</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bhattacharya</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Getoor</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Licamele</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Query-time entity resolution</article-title>
          .
          <source>In: SIGKDD</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2006</year>
          )
          <volume>529</volume>
          –
          <fpage>534</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chaudhuri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganjam</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motwani</surname>
          </string-name>
          , R.:
          <article-title>Robust and efficient fuzzy match for online data cleaning</article-title>
          .
          <source>In: SIGMOD</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2003</year>
          )
          <volume>313</volume>
          –
          <fpage>324</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Benjelloun</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Molina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menestrina</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Whang</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Widom</surname>
          </string-name>
          , J.:
          <article-title>Swoosh: a generic approach to entity resolution</article-title>
          .
          <source>VLDB J</source>
          .
          <volume>18</volume>
          (
          <issue>1</issue>
          ) (
          <year>2009</year>
          )
          <volume>255</volume>
          –
          <fpage>276</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scherp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hunz</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A comparative user study of faceted search in large data hierarchies on mobile devices</article-title>
          .
          <source>In: MUM</source>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Taheriyan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knoblock</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szekely</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ambite</surname>
            ,
            <given-names>J.L.:</given-names>
          </string-name>
          <article-title>A graph-based approach to learn semantic descriptions of data sources</article-title>
          .
          <source>In: ISWC</source>
          . (
          <year>2013</year>
          )
          <volume>607</volume>
          –
          <fpage>623</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Michalowski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ambite</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thakkar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuchinda</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knoblock</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Retrieving and semantically integrating heterogeneous data from the web</article-title>
          .
          <source>Intelligent Systems, IEEE</source>
          <volume>19</volume>
          (
          <issue>3</issue>
          ) (
          <year>2004</year>
          )
          <volume>72</volume>
          –
          <fpage>79</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>W.W.</given-names>
          </string-name>
          :
          <article-title>Data integration using similarity joins and a word-based information representation language</article-title>
          .
          <source>TOIS</source>
          <volume>18</volume>
          (
          <issue>3</issue>
          ) (
          <year>2000</year>
          )
          <volume>288</volume>
          –
          <fpage>321</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Shumaker</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sinnott</surname>
          </string-name>
          , R.:
          <article-title>Astronomical computing: 1. computing under the open sky. 2. virtues of the haversine</article-title>
          .
          <source>Sky and telescope 68</source>
          (
          <year>1984</year>
          )
          <volume>158</volume>
          –
          <fpage>159</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Opitz</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sztyler</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jess</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knip</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bikar</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , Pfister, B.,
          <string-name>
            <surname>Scherp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An incremental approach to entity resolution</article-title>
          (<year>2013</year>) URN: urn:nbn:de:bsz:180-madoc347579, URL: https://ub-madoc.bib.uni-mannheim.de/34757.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sehgal</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Getoor</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>GeoDDupe: a novel interface for interactive entity resolution in geospatial data</article-title>
          .
          <source>In: Information Visualization</source>
          , IEEE (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Bhattacharya</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Getoor</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>A latent Dirichlet model for unsupervised entity resolution</article-title>
          .
          <source>In: Int. Conference on Data Mining</source>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Volz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaedke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
          </string-name>
          , G.:
          <article-title>Silk - a link discovery framework for the web of data</article-title>
          .
          <source>In: Linked Data on the Web</source>
          ,
          <source>Citeseer</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.C.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>LIMES: a time-efficient approach for large-scale link discovery on the web of data</article-title>
          . In: AAAI. (
          <year>2011</year>
          )
          <volume>2312</volume>
          –
          <fpage>2317</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>