<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Data Linking: Capturing and Utilising Implicit Schema-level Relations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andriy Nikolov</string-name>
          <email>a.nikolov@open.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victoria Uren</string-name>
          <email>v.uren@dcs.shef.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enrico Motta</string-name>
          <email>e.motta@open.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Sheffield</institution>
          ,
          <addr-line>Sheffield</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Knowledge Media Institute, Open University</institution>
          ,
          <addr-line>Milton Keynes</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <volume>27</volume>
      <issue>2010</issue>
      <abstract>
        <p>Schema-level heterogeneity is an obstacle to the automated discovery of coreference resolution links between individuals. Although many schema matching solutions exist, the Linked Data environment differs from the standard scenario assumed by these tools. In particular, large volumes of data are available, and repositories are connected into a graph by instance-level mappings. In this paper we describe how these features can be utilised to produce schema-level mappings which facilitate the instance coreference resolution process. Initial experiments applying this approach to public datasets have produced encouraging results.</p>
      </abstract>
      <kwd-group>
        <kwd>Data fusion</kwd>
        <kwd>coreference resolution</kwd>
        <kwd>linked data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The Web of Data is constantly growing [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and the
coreference links between data instances stored in different
repositories represent a major added value of the Linked Data
approach. These links connect individuals which refer to
the same real-world entities using different URIs. Based
on these links, it is possible to combine pieces of
information about the same real-world entity which were originally
stored in several physical locations. Because of the large
amount of data, these links cannot be generated
manually, and automatic coreference resolution tools are
used instead. However, the usage of these tools is complicated by
semantic heterogeneity between repositories: although reusing
common terminologies (e.g., FOAF<sup>1</sup> or Dublin Core<sup>2</sup>) is
encouraged [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], existing repositories often use their own schemas.
If repositories use different ontological schemas, it is not
clear which sets of individuals should be compared by the
coreference resolution tool, and which properties can be used
to measure similarity between individuals. Thus, as a
preprocessing step for generating coreference links between
individuals, it is desirable to align schema terms in an
automated way as well.
      </p>
      <p><sup>1</sup> http://xmlns.com/foaf/0.1/ <sup>2</sup> http://www.purl.org/dc/</p>
      <sec id="sec-1-1">
        <p>
          Although the schema matching task (discovering
mappings between classes and properties) is an established
research topic in both the database and the Semantic Web
communities [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], the Linked Data environment has
specific features which are not utilised by existing methods and
can be exploited to support the schema matching process.
In particular:
        </p>
        <p>It is possible to consider several interlinked datasets
in combination instead of comparing each pair in
isolation, and to use information contained in
third-party datasets as background knowledge to support
matching.</p>
        <p>Large volumes of instance data are available, which
makes it possible to learn and exploit data patterns
not represented explicitly in the ontologies.</p>
        <p>Actual relations between concepts and properties are
fuzzy and cannot be adequately captured using
description logic terms: i.e., we are dealing with relations
like "class overlap" or "relation overlap" rather than
strict equivalence and subsumption.</p>
        <p>In this paper we describe how these features can be utilised
to perform schema-level matching between Linked Data
repositories and, in turn, to facilitate instance coreference
resolution. We have implemented this approach and obtained
encouraging results in initial experiments.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>
        The problem of instance-level coreference resolution is
well recognised in the Linked Data community [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Although in some cases key property values (inverse functional properties) can
be compared [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], this is not sufficient in the general case. To
deal with the general case, methods developed in the database
community are commonly adopted, in particular,
determining equivalence based on aggregated attribute-based
similarity [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and the use of string similarity to compare property
values [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. For example, these principles are implemented
in SILK [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. However, applying such a tool to a new pair
of datasets requires significant user effort: the user has to
specify which sets of individuals from the two datasets are
potentially overlapping, which attributes should be compared,
and which similarity metrics should be used for comparison.
If afterwards one of the datasets has to be connected to
another repository which uses a different schema, the user has
to redefine these settings.
      </p>
      <p>
        To minimise this user effort, it is currently common
practice to link a newly published repository to only
one or a few "hub" repositories. DBPedia is the most
popular generic "hub" repository, while there are also several
domain-specific ones (e.g., Geonames for geographical data
and Musicbrainz for music-related information). Then, in
order to obtain complete information about a certain entity,
we need to compute the transitive closure of coreference links
and gather all URIs used to represent this entity in different
datasets. These transitive closures can be maintained
in a centralised way [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]<sup>3</sup>, and the mutual impact of atomic
mappings can be analysed [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, this approach often
leads to loss of information. For example, several datasets
connected to the same "hub" repository may mention the same
entity under different URIs. If the "hub" repository itself
does not mention this entity, then the coreference links
between these URIs cannot be established. It is also possible
that one of the intermediate coreference links is missing due
to an error of the coreference resolution tool.
      </p>
      <p>
        In order to discover such missing links, the coreference
resolution procedure has to be directly applied to the
corresponding subsets of datasets which are linked via one or
several intermediate repositories. To identify such
corresponding subsets and comparable properties, the above-listed
features of the Linked Data environment can be exploited.
Because large volumes of data and partial sets of equivalence
links are available, it is possible to apply the instance-based
ontology matching techniques [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This is implemented in
our approach.
      </p>
      <p><bold>3. USING BACKGROUND DATA FOR ONTOLOGY MATCHING</bold></p>
      <p>
        Our approach is a further extension of the work presented
in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Its main idea is to use a pre-existing set of
instance-level links for two purposes:
      </p>
      <p>Infer schema-level relations between concepts and
properties of two different repositories. For example, by
analysing the LinkedMDB repository<sup>4</sup>, which describes
movies from the IMDB database, and DBPedia<sup>5</sup>, which
describes Wikipedia entries, together with their
incoming and outgoing instance-level links, we can establish
relations between their classes movie:music_contributor
and dbpedia:Artist, and between the properties movie:actor
and dbpedia:starring. These schema-level mappings
can afterwards be utilised by an instance-level
coreference resolution tool.</p>
      <p>Infer data patterns which hold for instances of these
concepts and properties. For example, it is possible to
infer that identical movies usually have the same
release year and overlapping sets of actors. Later, these
patterns can be used to highlight the existing identity
links which violate these patterns and are likely to be
spurious.</p>
      <p><sup>3</sup> http://www.sameas.org, http://www.rkbexplorer.com <sup>4</sup> http://data.linkedmdb.org/ <sup>5</sup> http://dbpedia.org</p>
      <p>In the following subsections we describe these two parts
of our approach in more detail using illustrative examples
from actual Linked Data repositories.</p>
    </sec>
    <sec id="sec-3">
      <title>3.1 Inferring schema-level mappings</title>
      <p>In order to produce schema-level mappings between two
data repositories based on existing instance-level links, the
Linked Data environment allows two types of background
knowledge to be utilised:</p>
      <p>Data-level evidence. This includes instance coreference
links between the two repositories being analysed and
third-party repositories. These links can be aggregated
to indicate potentially overlapping sets of individuals
in two original datasets.</p>
      <p>Schema-level evidence. This includes ontological schemas
used in third-party repositories. Schema-level evidence
can be utilised when (a) one dataset uses two different
vocabularies which model the domain with different
levels of detail or (b) the same schema is reused by
several repositories. This schema-level evidence can
provide additional insights into the meaning of
concepts and properties based on their usage.</p>
      <p><bold>3.1.1 Data-level evidence</bold></p>
      <p>
        Let us consider the example shown in Fig. 1. The
LinkedMDB repository contains data about movies structured
using a special Movie ontology. Many of its individuals are
also mentioned in DBPedia under different URIs. Some of
these coreferent individuals, in particular, those belonging to
the classes movie:film and movie:actor, are explicitly linked to
their counterparts in DBPedia by automatically produced
owl:sameAs relations. However, for individuals of some
classes direct links are not available. For instance, there
are no direct links between individuals of the class
movie:music_contributor, representing composers whose music was
used in movies, and the corresponding DBPedia resources. Similarly,
there are relations of the type movie:relatedBook from movies
to related books in RDF Book Mashup<sup>6</sup>, but not to books
mentioned in DBPedia. Some of these mappings can be
obtained by computing the transitive closure for
individuals connected by coreference links via intermediate
repositories (MusicBrainz<sup>7</sup> for composers and Book Mashup for
books). Yet many links are still missed this way because
an intermediate link in the chain is missing (e.g., 32% of
movie:music_contributor instances were not connected to the
corresponding DBPedia instances). Such links can be
discovered by applying an instance coreference resolution tool
(like SILK [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] or KnoFuss [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) directly to the corresponding
subsets of LinkedMDB and DBPedia. To apply these tools,
however, it is necessary to separate these
corresponding subsets from irrelevant data, in other words, to specify
mappings between classes which are likely to contain
identical individuals.
      </p>
      <p>In this situation, we can use our schema matching
approach which includes the following steps:</p>
      <p>1. Combining identical individuals into clusters. At this
stage all identical individuals from a set of datasets are
combined into clusters based on the transitive closure of existing
owl:sameAs relations.</p>
      <p>2. Establishing relations between clusters and schema terms.
For example, if one individual in the cluster belongs to the
class dbpedia:Artist, then we say that the whole cluster
belongs to this class. The same applies to the properties of each
individual in the cluster.</p>
      <p>3. Inferring mappings between schema terms using
instance set similarity. Instead of strict owl:equivalentClass or
rdfs:subClassOf relations, we produce fuzzy #overlapsWith
relations. Formally, this relation is similar to the umbel:isAligned
property defined in the Umbel vocabulary<sup>8</sup> and
states that two classes share a subset of their individuals.
This relation has a quantitative measure (a number
between 0 and 1) which is used to distinguish between strongly
correlated classes (like dbpedia:Actor and movie:actor) and
merely non-disjoint ones (like movie:actor and
dbpedia:FootballPlayer, which share several instances such as "Vinnie
Jones"). This measure is computed as the value of the
overlap coefficient:
<disp-formula><tex-math>\mathrm{sim}(A, B) = \mathrm{overlap}(c(A), c(B)) = \frac{|c(A) \cap c(B)|}{\min(|c(A)|, |c(B)|)},</tex-math></disp-formula>
where c(A) and c(B) are the sets of instance clusters assigned to
classes A and B, respectively. The strength of a relation
between properties is computed as
<disp-formula><tex-math>\mathrm{sim}(r_1, r_2) = \frac{|c(X)|}{|c(Y)|},</tex-math></disp-formula>
where c(X) is the set of clusters which have equivalent values
for properties r1 and r2, and c(Y) is the set of all clusters which
have values for both properties r1 and r2.</p>
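      <p>The clustering and class-level steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the KnoFuss implementation: owl:sameAs links and rdf:type assertions are assumed to be already loaded as (subject, object) pairs, the union-find structure is just one possible way to compute the transitive closure, and the function names and all URIs in the toy data are hypothetical.</p>
      <preformat>
```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest: computes the transitive closure of owl:sameAs links."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Unseen URIs become singleton clusters.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b


def clusters_per_class(same_as_links, type_triples):
    """Steps 1-2: cluster identical URIs, then lift rdf:type assertions to clusters."""
    uf = UnionFind()
    for a, b in same_as_links:
        uf.union(a, b)
    classes = defaultdict(set)
    for uri, cls in type_triples:
        # A cluster belongs to a class if any of its members does.
        classes[cls].add(uf.find(uri))
    return classes


def overlap(c_a, c_b):
    """Step 3: overlap coefficient |c(A) intersect c(B)| / min(|c(A)|, |c(B)|)."""
    smaller = min(len(c_a), len(c_b))
    if smaller == 0:
        return 0.0
    return len(c_a.intersection(c_b)) / smaller


# Toy data (hypothetical URIs): composers reach DBPedia via MusicBrainz,
# except composer3, which is linked directly; composer4 has no link at all.
links = [
    ("lmdb:composer1", "mb:artist1"), ("mb:artist1", "dbp:Composer1"),
    ("lmdb:composer2", "mb:artist2"), ("mb:artist2", "dbp:Composer2"),
    ("lmdb:composer3", "dbp:Composer3"),
]
types = [(f"lmdb:composer{i}", "movie:music_contributor") for i in range(1, 5)]
types += [(f"dbp:Composer{i}", "dbpedia:Artist") for i in range(1, 4)]
types += [("dbp:Painter1", "dbpedia:Artist"), ("dbp:Painter2", "dbpedia:Artist")]

classes = clusters_per_class(links, types)
print(overlap(classes["movie:music_contributor"], classes["dbpedia:Artist"]))  # 0.75
```
      </preformat>
      <p>In the toy data the two classes share three of the four music_contributor clusters, so the overlap coefficient is 3/4 = 0.75. Because the denominator is the size of the smaller class, a small class that is fully contained in a larger one also scores 1.0.</p>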
      <p>The resulting mappings are filtered by comparing the strength
of each relation with a pre-defined threshold, and weak
mappings are removed from the resulting set. The remaining
set of mappings is passed to the coreference resolution tool
(in our case, KnoFuss), which compares instances belonging
to mapped classes and generates instance coreference
mappings.</p>
      <p><sup>6</sup> http://www4.wiwiss.fu-berlin.de/bizer/bookmashup/ <sup>7</sup> http://dbtune.org/musicbrainz/ <sup>8</sup> http://www.umbel.org/technical_documentation.html</p>
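      <p>The property-level measure sim(r1, r2) and the threshold filter can be sketched in the same way. In this sketch the values of each property are assumed to be already normalised to cluster identifiers, "equivalent values" is interpreted (as an assumption) as the two value sets sharing at least one element, and the threshold of 0.5, the weak dbpedia:director score and all identifiers are hypothetical.</p>
      <preformat>
```python
def property_similarity(values_r1, values_r2):
    """sim(r1, r2) = |c(X)| / |c(Y)| over instance clusters.

    values_r1, values_r2: {cluster id: set of cluster-normalised values} for
    properties r1 and r2, merged over all members of each cluster.
    """
    # c(Y): clusters that have values for both properties.
    both = set(values_r1).intersection(values_r2)
    if not both:
        return 0.0
    # c(X): clusters whose r1 and r2 values are equivalent; assumed here to mean
    # that the two value sets share at least one element.
    agreeing = [c for c in both if values_r1[c].intersection(values_r2[c])]
    return len(agreeing) / len(both)


def filter_mappings(scored, threshold):
    """Drop weak mappings: keep those whose strength reaches the threshold."""
    return {pair: s for pair, s in scored.items() if s >= threshold}


# Toy movie clusters m1..m4 and actor clusters a1..a5 (hypothetical ids).
movie_actor = {"m1": {"a1", "a2"}, "m2": {"a3"}, "m3": {"a4"}, "m4": {"a5"}}
dbpedia_starring = {"m1": {"a2"}, "m2": {"a3", "a1"}, "m3": {"a5"}}

sim = property_similarity(movie_actor, dbpedia_starring)
print(round(sim, 3))  # 0.667: m1, m2 agree; m3 does not; m4 lacks dbpedia:starring

scores = {("movie:actor", "dbpedia:starring"): sim,
          ("movie:actor", "dbpedia:director"): 0.1}  # hypothetical weak score
print(list(filter_mappings(scores, threshold=0.5)))
# [('movie:actor', 'dbpedia:starring')]
```
      </preformat>
      <p>Only clusters that have values for both properties enter the denominator, so missing data (m4 above) does not count against a mapping; only contradicting data does.</p>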
      <p>In the example shown in Fig. 1, the main source of
background knowledge consists of the existing instance-level coreference links
with third-party repositories (MusicBrainz and Book Mashup).
One case when schema-level evidence can be utilised is when
instances in a dataset are linked to a schema used by a
third-party repository. For example, in Fig. 2 both DBPedia and
DBLP contain individuals representing the same computer
scientists. However, only a small proportion of these
individuals is explicitly linked by owl:sameAs mappings (196
links). Applying automatic coreference resolution, which
could derive more mappings, is complicated by two issues:</p>
      <p>Datasets do not contain overlapping properties for their
individuals apart from personal names.</p>
      <p>Individuals which belong to overlapping subsets are
not distinguished from others: in DBLP all paper
authors belong to the foaf:Person class, while in
DBPedia the majority of computer scientists are assigned to
a generic class dbpedia:Person and not distinguished
from other people.</p>
      <p>Using name similarity to produce mappings between
instances is likely to produce many false positive links due
to the ambiguity of personal names. The source of the
schema-level evidence which can resolve this issue is the Yago
repository<sup>9</sup>. The Yago ontology is based on Wikipedia
categories and provides a more detailed hierarchy of classes than
the DBPedia ontology. Using the procedure described in
section 3.1.1, we can approximate the boundaries of the
DBPedia subset which overlaps with DBLP. The algorithm
returns a set of mappings between the Yago classes and
the foaf:Person class in DBLP, e.g., between foaf:Person
and yago:MicrosoftEmployees and between foaf:Person and
yago:BritishComputerScientists. Having these mappings, the
instance-level coreference resolution can be applied only to
instances of mapped classes and thus produce results with higher
accuracy.</p>
      <p><sup>9</sup> http://www.mpi-inf.mpg.de/yago-naga/yago/</p>
      <sec id="sec-3-1">
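        <p>The effect of the Yago mappings on the coreference stage can be illustrated with a simple candidate-selection sketch: DBPedia individuals are only considered if they belong to at least one class mapped to DBLP foaf:Person. The function name, the exact-label comparison (standing in for the fuzzy label similarity an actual tool would use) and all URIs, including yago:EnglishFootballers, are hypothetical.</p>
        <preformat>
```python
from itertools import product


def candidate_pairs(source, target, mapped_classes):
    """Yield (source URI, target URI, source label, target label) tuples to compare.

    source: list of (uri, label) pairs, e.g. DBLP foaf:Person individuals.
    target: list of (uri, label, classes) triples, e.g. DBPedia individuals
            with their Yago classes.
    mapped_classes: target classes mapped to the source class.
    """
    # Blocking step: drop target individuals outside every mapped class.
    relevant = [t for t in target if t[2].intersection(mapped_classes)]
    for (s_uri, s_label), (t_uri, t_label, _) in product(source, relevant):
        yield s_uri, t_uri, s_label, t_label


dblp = [("dblp:john_smith", "John Smith")]
dbpedia = [
    ("dbp:John_Smith_(computer_scientist)", "John Smith",
     {"yago:BritishComputerScientists"}),
    ("dbp:John_Smith_(footballer)", "John Smith", {"yago:EnglishFootballers"}),
]
mapped = {"yago:BritishComputerScientists", "yago:MicrosoftEmployees"}

# Exact label equality stands in for a fuzzy string similarity here.
matches = [(s, t) for s, t, s_label, t_label in candidate_pairs(dblp, dbpedia, mapped)
           if s_label == t_label]
print(matches)  # [('dblp:john_smith', 'dbp:John_Smith_(computer_scientist)')]
```
        </preformat>
        <p>Without the class restriction, both DBPedia individuals named John Smith would be matched to the DBLP author, one of them spuriously.</p>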
        <p>Another scenario where schema-level evidence can be utilised
is the case when one ontology is reused in several
repositories. Then data from all these repositories can be used
to reason about the usage patterns of the terms of this
ontology. For example, in Fig. 3 three datasets
(Gutenberg project<sup>10</sup>, RDF Book Mashup, and DBPedia) describe
books and their authors, and two of them (Book Mashup
and Gutenberg) use the Dublin Core vocabulary. There
exists a set of owl:sameAs links between books in RDF Book
Mashup and DBPedia, and a set of links between authors
in DBPedia and Gutenberg project. However, there are no
links between Book Mashup and DBPedia authors or
between Gutenberg and DBPedia books. Again, the direct classes
foaf:Person and dbpedia:Person are too generic to provide
useful input for the coreference resolution stage: comparing
all Person individuals in DBPedia and RDF Book Mashup
is likely to produce many spurious mappings between people
having the same names. However, using evidence from all three
repositories, it is possible to establish a relation between the
properties dc:creator and dbpedia:author. Based on the set
of co-referent books from DBPedia and Book Mashup, we
can infer that these properties have the same domain, and
based on the mappings between authors, we can infer that
they have the same range. Given that no other property in
DBPedia is a stronger candidate for the matching relation, we
can produce the schema-level mapping between the properties
dc:creator and dbpedia:author. After that we can establish
the missing relations from Gutenberg to DBPedia (books) and
from Book Mashup to DBPedia (authors) by comparing
individuals which are connected via these properties to already
mapped ones.</p>
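        <p>The domain and range argument for dc:creator and dbpedia:author can be sketched as follows. This is an illustrative sketch, not the authors' algorithm: it assumes a cluster_of function that maps each URI to its owl:sameAs cluster, applies the overlap coefficient from section 3.1.1 to both domain and range, and scores each candidate property by the weaker of the two overlaps. The toy URIs and the competing dbpedia:publisher candidate are hypothetical.</p>
        <preformat>
```python
def overlap(a, b):
    """Overlap coefficient of two sets (0.0 if either set is empty)."""
    smaller = min(len(a), len(b))
    return len(a.intersection(b)) / smaller if smaller else 0.0


def align_property(source_triples, candidates, cluster_of):
    """Return (best candidate, domain overlap, range overlap) for a source property.

    source_triples: (subject, object) pairs for the source property, e.g. dc:creator,
                    pooled from every repository that reuses the vocabulary.
    candidates: {property: list of (subject, object) pairs} in the target dataset.
    cluster_of: maps a URI to the id of its owl:sameAs cluster.
    """
    domain = {cluster_of(s) for s, _ in source_triples}
    rng = {cluster_of(o) for _, o in source_triples}
    scored = []
    for prop, triples in candidates.items():
        d = overlap(domain, {cluster_of(s) for s, _ in triples})
        r = overlap(rng, {cluster_of(o) for _, o in triples})
        # Both domain and range must agree, so score by the weaker overlap.
        scored.append((min(d, r), prop, d, r))
    _, best, d_best, r_best = max(scored)
    return best, d_best, r_best


# Books are linked Book Mashup to DBPedia, authors Gutenberg to DBPedia (toy URIs).
clusters = {"bm:book1": "K_book1", "dbp:Book1": "K_book1",
            "gut:author1": "K_author1", "dbp:Author1": "K_author1"}


def cluster_of(uri):
    return clusters.get(uri, uri)


dc_creator = [("bm:book1", "bm:person1"), ("gut:book2", "gut:author1")]
dbpedia_props = {
    "dbpedia:author": [("dbp:Book1", "dbp:Author2"), ("dbp:Book3", "dbp:Author1")],
    "dbpedia:publisher": [("dbp:Book1", "dbp:Publisher1")],
}
print(align_property(dc_creator, dbpedia_props, cluster_of))
# ('dbpedia:author', 0.5, 0.5)
```
        </preformat>
        <p>In the toy data, dbpedia:publisher has a perfect domain overlap but no range overlap, so it loses to dbpedia:author; the domain evidence comes from Book Mashup and the range evidence from Gutenberg, mirroring the argument in the text.</p>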
        <p>When processing schema-level evidence, it is important to
bear in mind that the same ontological terms can be used
in different repositories with different interpretations: e.g.,
in our example, the generic foaf:Person class in DBLP in fact
refers only to people related to computer science.</p>
        <p><bold>3.2 Inferring data patterns and refining the set of existing mappings</bold></p>
        <p>The existing set of owl:sameAs
mappings sometimes contains spurious mappings connecting distinct
individuals: it is hard to avoid errors when an automatic
coreference resolution tool applies fuzzy similarity
metrics to large amounts of data. If the resulting set of
mappings is large, it is not feasible to check its
correctness manually. However, by analysing the data patterns,
it is possible to select subsets of mappings which are more
likely to contain spurious mappings and highlight them. For
instance, in the example shown in Fig. 1, we established
relations between the pairs of properties {movie:actor,
dbpedia:starring} (sim = 0.98) and {movie:initial_release_date,
dbpedia:releasedDate} (sim = 0.96). In other words, most
equivalent movies have the same release date and are
related to overlapping sets of actors. Thus, we can hypothesise
that the mappings between individuals representing movies
where these patterns do not hold are more likely to be
spurious.</p>
        <p><sup>10</sup> http://www4.wiwiss.fu-berlin.de/gutendata/</p>
        <p>Currently, our algorithm can infer two kinds of patterns
corresponding to the functionality and inverse functionality
property restrictions:</p>
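        <p>Property value equivalence. This states that for a pair
of aligned properties {r1, r2}, equivalent individuals
<inline-formula><tex-math>I_1 \equiv I_2</tex-math></inline-formula> should have equivalent values:
<inline-formula><tex-math>r_1(I_1, x) \wedge r_2(I_2, y) \Rightarrow x \equiv y</tex-math></inline-formula>.</p>
        <p>Property subject equivalence. This states that for a
pair of aligned properties {r1, r2}, if the objects of
these properties are equivalent individuals <inline-formula><tex-math>I_1 \equiv I_2</tex-math></inline-formula>,
the subjects should be equivalent as well:
<inline-formula><tex-math>r_1(I_3, I_1) \wedge r_2(I_4, I_2) \Rightarrow I_3 \equiv I_4</tex-math></inline-formula>.</p>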
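        <p>The property value equivalence pattern can be turned into a simple filter over existing movie links. In this sketch, which illustrates the idea rather than reproducing the authors' procedure, each movie record is a hypothetical dictionary with a release date string and a list of actor URIs, actors are compared through an assumed cluster_of function over owl:sameAs clusters, and a link is flagged when the actor sets are disjoint or the release dates differ. The function name and all URIs are hypothetical.</p>
        <preformat>
```python
def suspicious_links(links, left, right, cluster_of):
    """Return (left URI, right URI, reasons) for links violating the value patterns.

    links: owl:sameAs pairs between movies in two datasets.
    left, right: {movie URI: {"released": date string, "actors": [actor URIs]}}.
    cluster_of: maps an actor URI to its owl:sameAs cluster id.
    """
    flagged = []
    for a, b in links:
        actors_a = {cluster_of(x) for x in left[a]["actors"]}
        actors_b = {cluster_of(x) for x in right[b]["actors"]}
        reasons = []
        # Pattern for {movie:actor, dbpedia:starring}: actor sets should overlap.
        if actors_a and actors_b and not actors_a.intersection(actors_b):
            reasons.append("no shared actors")
        # Pattern for the release-date properties: values should be equal.
        if left[a]["released"] != right[b]["released"]:
            reasons.append("release date differs")
        if reasons:
            flagged.append((a, b, reasons))
    return flagged


lmdb = {
    "lmdb:film1": {"released": "1998-03-20", "actors": ["lmdb:actor1", "lmdb:actor2"]},
    "lmdb:film2": {"released": "2001-07-13", "actors": ["lmdb:actor3"]},
}
dbp = {
    "dbp:Film_A": {"released": "1998-03-20", "actors": ["dbp:Actor_1"]},
    "dbp:Film_B": {"released": "1987-11-02", "actors": ["dbp:Actor_9"]},
}
actor_links = {"lmdb:actor1": "dbp:Actor_1", "lmdb:actor3": "dbp:Actor_3"}


def cluster_of(uri):
    # Toy clustering: a linked LinkedMDB actor is represented by its DBPedia URI.
    return actor_links.get(uri, uri)


links = [("lmdb:film1", "dbp:Film_A"), ("lmdb:film2", "dbp:Film_B")]
print(suspicious_links(links, lmdb, dbp, cluster_of))
# [('lmdb:film2', 'dbp:Film_B', ['no shared actors', 'release date differs'])]
```
        </preformat>
        <p>Keeping the individual reasons, rather than a single yes/no flag, makes it possible to treat a date mismatch alone as weaker evidence than disjoint actor sets; flagged links are candidates for inspection rather than automatic removal.</p>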
        <p>It is important to note that these data patterns can be
useful for refining existing mapping sets only if they were
not taken into account by the original instance coreference
resolution algorithm. Otherwise, they become tautological:
e.g., by analysing a set of mappings produced by
computing label similarity, we can infer that equivalent instances
usually have similar labels. Therefore, the refinement
procedure can be used in two cases: (i) where the provenance of
the original mappings is available and the algorithm which
produced them is known, or (ii) where a significant body of new
evidence is discovered, e.g., a new set of instance mappings
as a result of the process described in section 3.1.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. EXPERIMENTS</title>
      <p>In order to test our approach, we experimented with the
existing Linked Data repositories mentioned in our examples:</p>
      <p>1. Finding equivalence links between individuals
representing people in DBPedia and DBLP<sup>11</sup> (auxiliary dataset:
Yago, gold standard size 1229).</p>
      <p>2. Finding equivalence links between movie:music_contributor
individuals in LinkedMDB and the corresponding
individuals in DBPedia (auxiliary dataset: Musicbrainz, gold
standard size 942).</p>
      <p>3. Finding movie:relatedBook links between movie:film
individuals in LinkedMDB and books mentioned in
DBPedia (auxiliary dataset: RDF Book Mashup, gold
standard size 419).</p>
      <p>4. Refining existing equivalence links between movie:film
individuals in LinkedMDB and the corresponding
individuals mentioned in DBPedia by analysing related actors
and release dates (total link set size 18512).</p>
      <p>5. Finding equivalence links between books mentioned
in Gutenberg project and DBPedia (auxiliary dataset:
RDF Book Mashup, gold standard size 1201).</p>
      <p>6. Finding equivalence links between book authors
mentioned in DBPedia and RDF Book Mashup (auxiliary
dataset: Gutenberg project, gold standard size 1235).</p>
      <p><sup>11</sup> DBLP contains a substantial proportion of internal
coreference errors: e.g., several authors having the same URI
or the same person having several URIs. In our tests we
did not consider these issues: e.g., a mapping between
instances from DBLP and DBPedia was considered correct if
the DBLP instance was linked to at least one publication
which was written by the person represented by the
DBPedia instance.</p>
      <p>These tests were relatively small in scale due to the need to
construct gold standard mappings manually. In the tests we
first applied our instance-based schema matching
algorithm to the datasets to obtain schema-level relations. Then,
these relations were passed as input to our data-level
coreference resolution tool KnoFuss, which processed the datasets
to discover owl:sameAs links between instances. As a
similarity measure, we used Jaro string similarity applied to
instance labels. Test results (precision, recall and F1 measure)
are given in Table 1. Two sets of results are provided:
(i) the baseline, which involves computing the transitive closure of
already existing links<sup>12</sup>, and (ii) the combined set of existing
links and new links obtained by the algorithm after the
schema alignment. As expected, the use of
automatically produced schema alignments led to an improvement
in recall (rows 1, 2, 3, 5, 6) because initially missed links
were discovered. In case 4, precision was affected instead, because
the mappings which did not conform to the data patterns
were removed. The change in precision was small due to
the large size of the dataset (140 mappings were removed
from the set of 18512); however, the precision of the
refinement procedure itself was high: out of 140 mappings identified as
potentially incorrect, 132 (about 94%) were actually incorrect.</p>
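      <p>For reference, the Jaro similarity used for label comparison and the precision, recall and F1 measures can be sketched as below. This is a textbook formulation written for illustration, not KnoFuss code; the function names are hypothetical, and the set-based evaluation assumes links are represented as (source URI, target URI) pairs.</p>
      <preformat>
```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        start = max(0, i - window)
        end = min(len2, i + window + 1)
        for j in range(start, end):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the number of matched characters that are out of order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def precision_recall_f1(found, gold):
    """Set-based precision, recall and F1 of found links against a gold standard."""
    correct = len(found.intersection(gold))
    precision = correct / len(found) if found else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


print(round(jaro("MARTHA", "MARHTA"), 3))  # 0.944
print(round(jaro("DIXON", "DICKSONX"), 3))  # 0.767
gold = {("dblp:a", "dbp:A"), ("dblp:b", "dbp:B")}
found = {("dblp:a", "dbp:A"), ("dblp:c", "dbp:C")}
print(precision_recall_f1(found, gold))  # (0.5, 0.5, 0.5)
```
      </preformat>
      <p>The classic pair MARTHA/MARHTA has six matching characters and one transposition, giving (1 + 1 + 5/6) / 3 ≈ 0.944. The refinement precision in case 4 follows the same precision definition: 132 of the 140 flagged mappings were incorrect, i.e. 132 / 140 ≈ 0.943.</p>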
      <p>When analysing the results of the experiments, we looked
into the limiting factors which caused errors. The most
important factor was the quality of the datasets
themselves, in particular, improper use of schema entities and
incorrect data statements. For example, in the tests where
inferred data patterns were applied to filter out
incorrectly linked movie:film entities (row 4), we found that the
equivalence of release dates cannot be used as a restriction
on its own: in about 50% of the cases flagged by this pattern, the mapping was
actually correct, but the release date was given incorrectly in
one of the datasets. Incomplete information could also lead
to problems: for example, in the DBPedia dataset many
musicians were not assigned to the appropriate class
dbpedia:MusicalArtist but instead to more general
classes such as dbpedia:Artist or even dbpedia:Person. As a result,
the mapping was established between the classes
movie:music_contributor and dbpedia:Artist instead of
dbpedia:MusicalArtist. Consequently, KnoFuss had to be applied to a larger
set containing many irrelevant individuals and produced
several erroneous coreference links between movie composers
and non-musical artists. Given that occurrences of
incorrect data are inevitable in the Linked Data environment,
these issues have to be taken into account when designing
matching algorithms.</p>
      <p><sup>12</sup> As stated in section 3.1.2, for Gutenberg and Book
Mashup we did not have any baseline links available.</p>
    </sec>
    <sec id="sec-5">
      <title>5. FUTURE WORK</title>
      <p>In this paper, we described an approach which captures
schema-level relations between Linked Data repositories based
on available instance data and reuses these relations to
facilitate the generation of new coreference links. In our
experiments, we applied this approach to small subgraphs of the
Linked Data cloud. In future work, we plan to analyse the "schema
cloud", consisting of the schema vocabularies used by Linked
Data repositories, in combination with the "data cloud",
including the datasets connected by instance-level links. In
particular, it is interesting to investigate how the usage
patterns of the same vocabularies differ between repositories,
to what extent it is possible to capture relations between
their terms, and under which conditions these relations can
be utilised to support the coreference resolution process.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Linked data - the story so far</article-title>
          .
          <source>IJSWIS</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cudré-Mauroux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haghani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jost</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Aberer</surname>
          </string-name>
          , and H. de Meer.
          <article-title>idMesh: Graph-based disambiguation of linked data</article-title>
          .
          <source>In WWW</source>
          <year>2009</year>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Elmagarmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Ipeirotis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Verykios</surname>
          </string-name>
          .
          <article-title>Duplicate record detection: A survey</article-title>
          .
          <source>IEEE KDE Transactions</source>
          ,
          <volume>19</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>16</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          . Ontology matching. Springer-Verlag, Heidelberg,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I. P.</given-names>
            <surname>Fellegi</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Sunter</surname>
          </string-name>
          .
          <article-title>A theory for record linkage</article-title>
          .
          <source>JASA</source>
          ,
          <volume>64</volume>
          (
          <issue>328</issue>
          ):
          <fpage>1183</fpage>
          –
          <lpage>1210</lpage>
          ,
          <year>1969</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Harth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          .
          <article-title>Performing object consolidation on the Semantic Web data graph</article-title>
          .
          <source>In I3 Workshop</source>
          , WWW2007, Banff, Canada,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jaffri</surname>
          </string-name>
          , H. Glaser, and I. Millard.
          <article-title>Managing URI synonymity to enable consistent reference on the Semantic Web</article-title>
          . In IRSW2008 Workshop, Tenerife, Spain,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Uren</surname>
          </string-name>
          , E. Motta, and A. de Roeck.
          <article-title>Integration of semantically annotated data by the KnoFuss architecture</article-title>
          .
          <source>In EKAW</source>
          <year>2008</year>
          , Acitrezza, Italy,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Uren</surname>
          </string-name>
          , E. Motta, and A. de Roeck.
          <article-title>Overcoming schema heterogeneity between linked semantic repositories to improve coreference resolution</article-title>
          .
          <source>In ASWC</source>
          <year>2009</year>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Volz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gaedke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Kobilarov</surname>
          </string-name>
          .
          <article-title>Discovering and maintaining links on the web of data</article-title>
          .
          <source>In ISWC</source>
          <year>2009</year>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>