Data Linking: Capturing and Utilising Implicit Schema-level Relations

Andriy Nikolov (a.nikolov@open.ac.uk), Knowledge Media Institute, Open University, Milton Keynes, UK
Victoria Uren (v.uren@dcs.shef.ac.uk), Department of Computer Science, University of Sheffield, Sheffield, UK
Enrico Motta (e.motta@open.ac.uk), Knowledge Media Institute, Open University, Milton Keynes, UK


ABSTRACT

Schema-level heterogeneity represents an obstacle for automated discovery of coreference resolution links between individuals. Although there is a multitude of existing schema matching solutions, the Linked Data environment differs from the standard scenario assumed by these tools. In particular, large volumes of data are available, and repositories are connected into a graph by instance-level mappings. In this paper we describe how these features can be utilised to produce schema-level mappings which facilitate the instance coreference resolution process. Initial experiments applying this approach to public datasets have produced encouraging results.

Categories and Subject Descriptors

H.4.m [Information Systems]: Miscellaneous; D.2 [Software]: Software Engineering

Keywords

Data fusion, coreference resolution, linked data

1. INTRODUCTION

The Web of Data is constantly growing [1], and the coreference links between data instances stored in different repositories represent a major added value of the Linked Data approach. These links connect individuals which refer to the same real-world entities using different URIs. Based on these links, it is possible to combine bits of information about the same real-world entity which were originally stored in several physical locations. Because of the large amount of data, it is not possible to generate these links manually, and automatic coreference resolution tools are used. However, the usage of these tools is complicated by semantic heterogeneity between repositories: although reusing common terminologies (e.g., FOAF1 or Dublin Core2) is encouraged [1], existing repositories often use their own schemas. If repositories use different ontological schemas, it is not clear which sets of individuals should be compared by the coreference resolution tool, and which properties can be used to measure similarity between individuals. Thus, as a pre-processing step for generating coreference links between individuals, it is desirable to align schema terms in an automated way as well.

Although the schema matching task (discovering mappings between classes and properties) is an established research topic both in the database and the Semantic Web communities [4], the Linked Data environment has specific features which are not utilised by existing methods and can be exploited to support the schema matching process. In particular:

   • It is possible to consider several interlinked datasets in combination instead of comparing each pair in isolation, and to involve information contained in third-party datasets as background knowledge to support matching.

   • Large volumes of instance data are available, which makes it possible to learn and exploit data patterns not represented explicitly in the ontologies.

   • Actual relations between concepts and properties are fuzzy and cannot be adequately captured using description logic terms: i.e., we are dealing with relations like "class overlap" or "relation overlap" rather than strict equivalence and subsumption.

In this paper we describe how these features can be utilised to perform schema-level matching between Linked Data repositories and, in turn, to facilitate instance coreference resolution. We have implemented this approach and obtained encouraging results in test experiments.

1 http://xmlns.com/foaf/0.1/
2 http://www.purl.org/dc/

Copyright is held by the author/owner(s).
LDOW2010, April 27, 2010, Raleigh, USA.

2. RELATED WORK

The problem of instance-level coreference resolution is well-recognised in the Linked Data community [1]. Although in some cases key property values (inverse functional properties) can be compared [6], this is not sufficient in the general case. In order to deal with it, methods developed in the database community are commonly adopted, in particular, determining equivalence based on aggregated attribute-based similarity [5] and the use of string similarity to compare property values [3]. For example, these principles are implemented in SILK [10]; a minimal illustrative sketch of this generic approach is given at the end of this section. However, applying such a tool to a new pair of datasets requires significant user effort: the user has to specify which sets of individuals from the two datasets are potentially overlapping, which attributes should be compared, and which similarity metrics should be used for comparison. If afterwards one of the datasets has to be connected to another repository which uses a different schema, the user has to redefine these settings.
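To make the generic attribute-based approach concrete, the following minimal sketch (not taken from SILK or any of the cited systems; attribute names, weights and the threshold are illustrative) aggregates per-attribute string similarities into a single score and accepts a pair of instances if the score exceeds a threshold:

```python
from difflib import SequenceMatcher

def string_sim(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; a stand-in for Jaro or edit distance."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def aggregated_sim(inst1: dict, inst2: dict, weights: dict) -> float:
    """Weighted average of per-attribute similarities, in the spirit of [5]."""
    total = weight_sum = 0.0
    for attr, w in weights.items():
        if attr in inst1 and attr in inst2:
            total += w * string_sim(inst1[attr], inst2[attr])
            weight_sum += w
    return total / weight_sum if weight_sum else 0.0

# The attribute names, weights and threshold below are user-supplied
# configuration, which is exactly the per-dataset effort discussed above.
person_a = {"name": "Vinnie Jones", "birthYear": "1965"}
person_b = {"name": "Vincent Jones", "birthYear": "1965"}
if aggregated_sim(person_a, person_b, {"name": 0.7, "birthYear": 0.3}) > 0.8:
    print("candidate owl:sameAs link")
```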
To minimise this user effort, it is currently a common practice that a newly published repository is only linked to one or a few "hub" repositories. DBPedia is the most popular generic "hub" repository, while there are also several domain-specific ones (e.g., Geonames for geographical data and Musicbrainz for music-related information). Then, in order to obtain complete information about a certain entity, we need to compute a transitive closure of coreference links and gather all URIs used to represent this entity in different datasets. These transitive closures can be maintained in a centralised way [7]3 and the mutual impact of atomic mappings can be analysed [2]. However, this approach often leads to the loss of information. For example, it can happen that several datasets connected to the same "hub" repository mention the same entity under different URIs. If the "hub" repository itself does not mention this entity, then the coreference links between these URIs cannot be established. It is also possible that one of the intermediate coreference links is omitted due to an error of the coreference resolution tool.

In order to discover such missing links, the coreference resolution procedure has to be directly applied to the corresponding subsets of datasets which are linked via one or several intermediate repositories. To identify such corresponding subsets and comparable properties, the above-listed features of the Linked Data environment can be exploited. Because large volumes of data and partial sets of equivalence links are available, it is possible to apply instance-based ontology matching techniques [4]. This is implemented in our approach.

3 http://www.sameas.org, http://www.rkbexplorer.com

3. USING BACKGROUND DATA FOR ONTOLOGY MATCHING

Our approach is a further extension of the work presented in [9]. Its main idea is to use a pre-existing set of instance-level links for two purposes:

   • Infer schema-level relations between concepts and properties of two different repositories. For example, by analysing the LinkedMDB repository4, which describes movies from the IMDB database, and DBPedia5, which describes Wikipedia entries, together with their incoming and outgoing instance-level links, we can establish relations between their classes movie:music_contributor and dbpedia:Artist, and between the properties movie:actor and dbpedia:starring. These schema-level mappings can afterwards be utilised by an instance-level coreference resolution tool.

   • Infer data patterns which hold for instances of these concepts and properties. For example, it is possible to infer that identical movies usually have the same release year and overlapping sets of actors. Later, these patterns can be used to highlight the existing identity links which violate these patterns and are likely to be spurious.

4 http://data.linkedmdb.org/
5 http://dbpedia.org

In the following subsections we will describe these two parts of our approach in more detail using illustrative examples from actual Linked Data repositories.

3.1 Inferring schema-level mappings

In order to produce schema-level mappings between two data repositories based on existing instance-level links, the Linked Data environment allows two types of background knowledge to be utilised:

   • Data-level evidence. This includes instance coreference links between the two repositories being analysed and third-party repositories. These links can be aggregated to indicate potentially overlapping sets of individuals in the two original datasets.

   • Schema-level evidence. This includes ontological schemas used in third-party repositories. Schema-level evidence can be utilised when (a) one dataset uses two different vocabularies which model the domain with different levels of detail or (b) the same schema is reused by several repositories. This schema-level evidence can provide additional insights into the meaning of concepts and properties based on their usage.

3.1.1 Data-level evidence

Figure 1: LinkedMDB and DBPedia: exploiting instance-level coreference links with third-party datasets. Solid arrows show existing owl:sameAs (=) and movie:relatedBook links. Dashed arrows connect sets containing potentially omitted links.

Let us consider the example shown in Fig. 1. The LinkedMDB repository contains data about movies structured using a special Movie ontology. Many of its individuals are also mentioned in DBPedia under different URIs. Some of these coreferent individuals, in particular those belonging to the classes movie:film and movie:actor, are explicitly linked to their counterparts in DBPedia by automatically produced owl:sameAs relations. However, for individuals of some classes direct links are not available. For instance, there are no direct links between individuals of the class movie:music_contributor, representing composers whose music was used in movies, and the corresponding DBPedia resources. Then, there are relations of the type movie:relatedBook from movies
to related books in RDF Book Mashup6, but not to books mentioned in DBPedia. Partially, such mappings can be obtained by computing a transitive closure for individuals connected by coreference links via intermediate repositories (MusicBrainz7 for composers and Book Mashup for books). Even then, many links remain undiscovered because an intermediate link in the chain is missing (e.g., 32% of movie:music_contributor instances were not connected to corresponding DBPedia instances). Such links can be discovered by applying an instance coreference resolution tool (like SILK [10] or KnoFuss [8]) directly to the corresponding subsets of LinkedMDB and DBPedia. However, in order to apply such tools, it is necessary to separate these corresponding subsets from irrelevant data, in other words, to specify mappings between classes which are likely to contain identical individuals.

6 http://www4.wiwiss.fu-berlin.de/bizer/bookmashup/
7 http://dbtune.org/musicbrainz/

In this situation, we can use our schema matching approach, which includes the following steps:

1. Combining identical individuals into clusters. At this stage all identical individuals from a set of datasets are combined into clusters based on the transitive closure of existing owl:sameAs relations.

2. Establishing relations between clusters and schema terms. For example, if one individual in the cluster belongs to the class dbpedia:Artist, then we say that the whole cluster belongs to this class. The same applies to the properties of each individual in the cluster.

3. Inferring mappings between schema terms using instance set similarity. Instead of strict owl:equivalentClass or owl:subClassOf relations we produce fuzzy relations #overlapsWith. Formally this relation is similar to the umbel:isAligned property defined in the Umbel vocabulary8 and states that two classes share a subset of their individuals. This relation has a quantitative measure (a number between 0 and 1) which is used to distinguish between strongly correlated classes (like dbpedia:Actor and movie:actor) and merely non-disjoint ones (like movie:actor and dbpedia:FootballPlayer, which share several instances such as "Vinnie Jones"). This measure is computed as the value of the overlap coefficient:

   sim(A, B) = overlap(c(A), c(B)) = |c(A) ∩ c(B)| / min(|c(A)|, |c(B)|),

where c(A) and c(B) are the sets of instance clusters assigned to classes A and B respectively. The strength of a relation between properties is computed as

   sim(r1, r2) = |c(X)| / |c(Y)|,

where c(X) is the set of clusters which have equivalent values for properties r1 and r2, and c(Y) is the set of all clusters which have values for both properties r1 and r2.

8 http://www.umbel.org/technical_documentation.html

The resulting mappings are filtered by comparing the strength of each relation with a pre-defined threshold, and weak mappings are removed from the resulting set. The remaining set of mappings is passed to the coreference resolution tool (in our case, KnoFuss), which compares instances belonging to mapped classes and generates instance coreference mappings. A minimal code sketch of these steps is given below.
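The following sketch illustrates steps 1-3 under simplifying assumptions (it is not the KnoFuss implementation; dataset access, URIs and the threshold value are illustrative). It clusters individuals by the transitive closure of owl:sameAs links via union-find, assigns each cluster the classes of its members, and scores class pairs with the overlap coefficient defined above:

```python
from collections import defaultdict
from itertools import product

class UnionFind:
    """Step 1: transitive closure of owl:sameAs links."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def match_classes(same_as_links, types, threshold=0.5):
    """same_as_links: iterable of (uri1, uri2) owl:sameAs pairs;
    types: dict uri -> set of class URIs; the threshold is illustrative.
    Returns {(classA, classB): overlap coefficient} as #overlapsWith relations."""
    uf = UnionFind()
    for a, b in same_as_links:
        uf.union(a, b)

    # Step 2: a cluster belongs to every class of every member individual.
    cluster_classes = defaultdict(set)
    for uri, classes in types.items():
        cluster_classes[uf.find(uri)].update(classes)

    # Invert the index: class A -> c(A), the set of clusters assigned to A.
    clusters_of = defaultdict(set)
    for cluster, classes in cluster_classes.items():
        for cls in classes:
            clusters_of[cls].add(cluster)

    # Step 3: overlap(c(A), c(B)) = |c(A) & c(B)| / min(|c(A)|, |c(B)|).
    mappings = {}
    for a, b in product(list(clusters_of), list(clusters_of)):
        if a < b:  # consider each unordered class pair once
            common = clusters_of[a] & clusters_of[b]
            sim = len(common) / min(len(clusters_of[a]), len(clusters_of[b]))
            if sim >= threshold:  # drop weak mappings
                mappings[(a, b)] = sim
    return mappings

def property_sim(cluster_values, r1, r2):
    """sim(r1, r2) = |clusters where r1 and r2 values coincide| /
    |clusters with values for both|; cluster_values: cluster -> {prop: set}."""
    both = [v for v in cluster_values.values() if r1 in v and r2 in v]
    equal = sum(1 for v in both if v[r1] & v[r2])
    return equal / len(both) if both else 0.0
```

In the real setting, class pairs would be restricted to classes coming from the two repositories being matched, and value equivalence for object properties would itself be judged via the sameAs clusters rather than literal equality.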
3.1.2 Schema-level evidence

Figure 2: DBPedia and DBLP: exploiting schema-level links with third-party datasets. Solid arrows show existing owl:sameAs (=) and rdf:type links. Dashed arrows represent discovered schema relations. The system identifies the subset of dbpedia:Person instances which overlaps with DBLP foaf:Person instances as a union of classes defined in Yago.

In the example shown in Fig. 1 the main source of background knowledge is the set of existing instance-level coreference links with third-party repositories (MusicBrainz and Book Mashup). One case when schema-level evidence can be utilised is when instances in a dataset are linked to a schema used by a third-party repository. For example, in Fig. 2 both DBPedia and DBLP contain individuals representing the same computer scientists. However, only a small proportion of these individuals is explicitly linked by owl:sameAs mappings (196 links). Applying automatic coreference resolution, which could derive more mappings, is complicated by two issues:

   • The datasets do not contain overlapping properties for their individuals apart from personal names.

   • Individuals which belong to overlapping subsets are not distinguished from others: in DBLP all paper authors belong to the foaf:Person class, while in DBPedia the majority of computer scientists are assigned to the generic class dbpedia:Person and are not distinguished from other people.

Using name similarity to produce mappings between instances is likely to produce many false positive links due to the ambiguity of personal names. The source of schema-level evidence which can resolve this issue is the Yago repository9. The Yago ontology is based on Wikipedia categories and provides a more detailed hierarchy of classes than the DBPedia ontology. Using the procedure described in section 3.1.1, we can approximate the boundaries of the DBPedia subset which overlaps with DBLP. The algorithm returns a set of mappings between the Yago classes and the foaf:Person class in DBLP, e.g., between foaf:Person and yago:MicrosoftEmployees and between foaf:Person and yago:BritishComputerScientists. Having these mappings, instance-level coreference resolution can be applied only to instances of the mapped classes and produce results with higher accuracy.

9 http://www.mpi-inf.mpg.de/yago-naga/yago/
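To make this concrete: once such mappings are available, the relevant DBPedia subset can be fetched before running the coreference tool. A minimal sketch, assuming the SPARQLWrapper library, the public DBPedia SPARQL endpoint, and the dbpedia.org/class/yago/ namespace for Yago types in DBPedia (the class URIs and the overall setup are illustrative, not the actual KnoFuss workflow):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Yago classes that the matching algorithm mapped to foaf:Person in DBLP
# (illustrative URIs under the assumed dbpedia.org/class/yago/ namespace).
MAPPED_YAGO_CLASSES = [
    "http://dbpedia.org/class/yago/MicrosoftEmployees",
    "http://dbpedia.org/class/yago/BritishComputerScientists",
]

def dbpedia_subset(yago_classes):
    """Collect DBPedia individuals typed with any of the mapped Yago classes;
    only this union of subsets is then compared with DBLP foaf:Person instances."""
    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setReturnFormat(JSON)
    people = set()
    for cls in yago_classes:
        sparql.setQuery(f"""
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            SELECT ?person ?label WHERE {{
                ?person a <{cls}> ;
                        rdfs:label ?label .
                FILTER (lang(?label) = "en")
            }}""")
        for row in sparql.query().convert()["results"]["bindings"]:
            people.add((row["person"]["value"], row["label"]["value"]))
    return people
```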
Another scenario where schema-level evidence can be utilised is the case when one ontology is reused in several repositories. Then data from all these repositories can be used to reason about the usage patterns of the terms of this ontology. For example, in Fig. 3 three datasets (Gutenberg project10, RDF Book Mashup, and DBPedia) describe books and their authors, and two of them (Book Mashup and Gutenberg) use the Dublin Core vocabulary. There exists a set of owl:sameAs links between books in RDF Book Mashup and DBPedia, and a set of links between authors in DBPedia and Gutenberg project. However, there are no links between Book Mashup and DBPedia authors or between Gutenberg and DBPedia books. Again, the direct classes foaf:Person and dbpedia:Person are too generic to provide useful input for the coreference resolution stage: comparing all Person individuals in DBPedia and RDF Book Mashup is likely to produce many spurious mappings between people having the same names. But, using evidence from all three repositories, it is possible to establish a relation between the properties dc:creator and dbpedia:author. Based on the set of co-referent books from DBPedia and Book Mashup we can infer that these properties have the same domain, and based on the mappings between authors we can infer that they have the same range. Given that no property in DBPedia is a stronger candidate for the matching relation, we can produce the schema-level mapping between the properties dc:creator and dbpedia:author. After that we can establish the missing relations from Gutenberg to DBPedia (books) and from Book Mashup to DBPedia (authors) by comparing individuals which are connected via these properties to already mapped ones.

10 http://www4.wiwiss.fu-berlin.de/gutendata/

Figure 3: Gutenberg/DBPedia/Book Mashup: aligning relations dc:creator and dbpedia:author, which have strongly overlapping domains (Book Mashup/DBPedia) and ranges (Gutenberg/DBPedia).
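The domain/range reasoning just described can be sketched as follows (a simplified illustration, not the actual implementation; the triples and cluster ids are assumed inputs, with clusters taken from the sameAs closure of section 3.1). Two properties are proposed as aligned when both the clusters of their subjects and the clusters of their objects overlap strongly:

```python
def property_alignment(triples1, triples2, cluster_of, threshold=0.8):
    """triples1/triples2: iterables of (subject, property, object);
    cluster_of: uri -> cluster id from the owl:sameAs closure.
    Proposes (r1, r2) pairs whose domains and ranges overlap strongly."""
    def ends(triples):
        # property -> (set of subject clusters, set of object clusters)
        index = {}
        for s, p, o in triples:
            subs, objs = index.setdefault(p, (set(), set()))
            if s in cluster_of:
                subs.add(cluster_of[s])
            if o in cluster_of:
                objs.add(cluster_of[o])
        return index

    def overlap(x, y):
        return len(x & y) / min(len(x), len(y)) if x and y else 0.0

    idx1, idx2 = ends(triples1), ends(triples2)
    return [(r1, r2)
            for r1, (s1, o1) in idx1.items()
            for r2, (s2, o2) in idx2.items()
            if overlap(s1, s2) >= threshold and overlap(o1, o2) >= threshold]
```

For dc:creator and dbpedia:author, the subject clusters would come from the DBPedia/Book Mashup book links and the object clusters from the DBPedia/Gutenberg author links.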
When processing schema-level evidence, it is important to bear in mind that the same ontological terms can be used in different repositories with different interpretations: e.g., in our example, in DBLP the generic foaf:Person class in fact refers only to people related to computer science.

3.2 Inferring data patterns and refining the set of existing mappings

It is sometimes the case that the existing set of owl:sameAs mappings contains spurious mappings connecting distinct individuals: it is hard to avoid errors when an automatic coreference resolution tool applies fuzzy similarity metrics to process large amounts of data. If the resulting set of mappings is large, it is not feasible to check their correctness manually. However, by analysing the data patterns it is possible to select subsets of mappings which are more likely to contain spurious mappings and highlight them. For instance, in the example shown in Fig. 1, we established the relations between the pairs of properties {movie:actor; dbpedia:starring} (sim = 0.98) and {movie:initial_release_date; dbpedia:releasedDate} (sim = 0.96). In other words, most equivalent movies have the same release date and are related to overlapping sets of actors. Thus, we can hypothesise that the mappings between individuals representing movies where these patterns do not hold are more likely to be spurious.

Currently, our algorithm can infer two kinds of patterns, corresponding to the functionality and inverse functionality property restrictions (a sketch of the corresponding check is given at the end of this subsection):

   • Property value equivalence. This states that for a pair of aligned properties {r1, r2}, equivalent individuals I1 ≡ I2 should have equivalent values: if r1(I1, x) and r2(I2, y), then x ≡ y.

   • Property subject equivalence. This states that for a pair of aligned properties {r1, r2}, if the objects of these properties are equivalent individuals I1 ≡ I2, then the subjects should be equivalent as well: if r1(I3, I1) and r2(I4, I2), then I3 ≡ I4.

It is important to note that these data patterns can be useful for the refinement of existing mapping sets only if they were not taken into account by the original instance coreference resolution algorithm. Otherwise, they become tautological: e.g., by analysing a set of mappings produced by computing label similarity we can infer that equivalent instances usually have similar labels. Therefore, the refinement procedure can be used in two cases: (i) where the provenance of the original mappings is available and the algorithm which produced them is known, or (ii) where a significant body of new evidence is discovered, e.g., a new set of instance mappings produced as a result of the process described in section 3.1.
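The following sketch illustrates how the property value equivalence pattern can be checked over a set of candidate mappings (the data structures are illustrative, not the actual implementation; for object-valued properties, value equivalence would itself be judged via the sameAs clusters rather than literal equality):

```python
def flag_suspicious(mappings, values1, values2, aligned_props):
    """mappings: iterable of (uri1, uri2) candidate owl:sameAs pairs;
    values1/values2: dict uri -> {property: set of values} in each dataset;
    aligned_props: list of (r1, r2) property pairs with high sim(r1, r2).
    Returns mappings violating the property value equivalence pattern."""
    suspicious = []
    for u1, u2 in mappings:
        for r1, r2 in aligned_props:
            v1 = values1.get(u1, {}).get(r1, set())
            v2 = values2.get(u2, {}).get(r2, set())
            # Pattern check: if both sides have values, they should share one.
            if v1 and v2 and not (v1 & v2):
                suspicious.append((u1, u2, r1, r2))
    return suspicious

# Example: a film pair whose release dates disagree gets flagged for review.
values1 = {"mdb:film/42": {"movie:initial_release_date": {"1999"}}}
values2 = {"dbp:Some_Film": {"dbpedia:releasedDate": {"1997"}}}
flags = flag_suspicious([("mdb:film/42", "dbp:Some_Film")],
                        values1, values2,
                        [("movie:initial_release_date", "dbpedia:releasedDate")])
print(flags)  # highlighted as potentially spurious, not silently removed
```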
4. EXPERIMENTS

In order to test our approach, we experimented with existing Linked Data repositories mentioned in our examples:

   1. Finding equivalence links between individuals representing people in DBPedia and DBLP11 (auxiliary dataset: Yago, gold standard size 1229).

   2. Finding equivalence links between music contributor individuals in LinkedMDB and corresponding individuals in DBPedia (auxiliary dataset: Musicbrainz, gold standard size 942).

   3. Finding movie:relatedBook links between movie:film individuals in LinkedMDB and books mentioned in DBPedia (auxiliary dataset: RDF Book Mashup, gold standard size 419).
   4. Refining existing equivalence links between movie:film individuals in LinkedMDB and corresponding individuals mentioned in DBPedia by analysing related actors and release dates (total link set size 18512).

   5. Finding equivalence links between books mentioned in Gutenberg project and DBPedia (auxiliary dataset: RDF Book Mashup, gold standard size 1201).

   6. Finding equivalence links between book authors mentioned in DBPedia and RDF Book Mashup (auxiliary dataset: Gutenberg project, gold standard size 1235).

11 DBLP contains a substantial proportion of internal coreference errors: e.g., several authors having the same URI or the same person having several URIs. In our tests we did not consider these issues: e.g., a mapping between instances from DBLP and DBPedia was considered correct if the DBLP instance was linked to at least one publication which was written by the person represented by the DBPedia instance.

These tests were relatively small scale due to the need to construct gold standard mappings manually. In the tests we initially applied our instance-based schema-matching algorithm to the datasets to obtain schema-level relations. Then, these relations were passed as input to our data-level coreference resolution tool KnoFuss, which processed the datasets to discover owl:sameAs links between instances. As a similarity measure, we used Jaro string similarity applied to the label. Test results (precision, recall and F1 measure) are given in Table 1. Two sets of results are provided: (i) the baseline, which involves computing the transitive closure of already existing links12, and (ii) the combined set of existing results and new results obtained by the algorithm after the schema alignment.

12 As was said in section 3.1.2, for Gutenberg and Book Mashup we did not have any baseline links available.

Table 1: Test results

N | Dataset                       | Test     | Precision | Recall | F1
1 | DBPedia/DBLP                  | Baseline | 0.90      | 0.14   | 0.25
  |                               | Aligned  | 0.93      | 0.89   | 0.91
2 | LinkedMDB/DBPedia (composers) | Baseline | 0.99      | 0.68   | 0.81
  |                               | Aligned  | 0.98      | 0.97   | 0.98
3 | LinkedMDB/DBPedia (books)     | Baseline | 0.97      | 0.82   | 0.89
  |                               | Aligned  | 0.96      | 0.97   | 0.96
4 | LinkedMDB/DBPedia (films)     | Baseline | 0.993     | 1.0    | 0.996
  |                               | Aligned  | 1.0       | 0.999  | 1.0
5 | Gutenberg/DBPedia (books)     | Baseline | N/A       | N/A    | N/A
  |                               | Aligned  | 1.0       | 1.0    | 1.0
6 | Book Mashup/DBPedia (authors) | Baseline | N/A       | N/A    | N/A
  |                               | Aligned  | 1.0       | 1.0    | 1.0

As expected, the usage of automatically produced schema alignments led to an improvement in recall (rows 1, 2, 3, 5, 6) because initially missed links were discovered. In case 4 precision was affected because the mappings which did not conform to the data patterns were removed. The change in precision was small due to the large size of the dataset (140 mappings were removed from the set of 18512); however, the precision of the refinement procedure itself was high: out of 140 mappings identified as potentially incorrect, 132 were actually incorrect.
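For reference, the reported figures follow the standard definitions; a small self-contained sketch (the link sets are illustrative) of how such precision, recall and F1 values are computed against a manually constructed gold standard:

```python
def evaluate(predicted: set, gold: set):
    """Precision, recall and F1 of predicted links against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Links are unordered URI pairs; frozenset makes (a, b) equal to (b, a).
gold = {frozenset(p) for p in [("mdb:1", "dbp:A"), ("mdb:2", "dbp:B")]}
predicted = {frozenset(p) for p in [("mdb:1", "dbp:A"), ("mdb:3", "dbp:C")]}
print(evaluate(predicted, gold))  # (0.5, 0.5, 0.5)
```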
When analysing the results of the experiments, we looked into the limiting factors which caused errors. The most important factor involved the quality of the datasets themselves, in particular, improper use of schema entities and incorrect data statements. For example, in the tests where inferred data patterns were applied to filter out incorrectly linked movie:film entities (row 4), we found that the equivalence of release dates cannot be used as a restriction on its own: in about 50% of cases the mapping was correct, while the release date was not provided correctly in one of the datasets. Incomplete information could also lead to problems: for example, in the DBPedia dataset many musicians were not assigned to the appropriate class dbpedia:MusicalArtist but instead were assigned to the more general classes dbpedia:Artist or even dbpedia:Person. As a result, the mapping was established between the classes movie:music_contributor and dbpedia:Artist instead of dbpedia:MusicalArtist, so KnoFuss had to be applied to a larger set containing many irrelevant individuals and produced several erroneous coreference links between movie composers and non-musical artists. Given that occurrences of incorrect data are inevitable in the Linked Data environment, these issues have to be taken into account when designing matching algorithms.

5. FUTURE WORK

In this paper, we described an approach which captures schema-level relations between linked data repositories based on available instance data and reuses these relations to facilitate the generation of new coreference links. In our experiments, we applied this approach to small subgraphs of the Linked Data cloud. In future, we plan to analyse the "schema cloud", consisting of the schema vocabularies used by Linked Data repositories, in combination with the "data cloud", comprising the datasets connected by instance-level links. In particular, it is interesting to investigate how the usage patterns of the same vocabularies differ between repositories, to what extent it is possible to capture relations between their terms, and under which conditions these relations can be utilised to support the coreference resolution process.

6. REFERENCES

[1] C. Bizer, T. Heath, and T. Berners-Lee. Linked data - the story so far. IJSWIS, 5(3):1-22, 2009.
[2] P. Cudré-Mauroux, P. Haghani, M. Jost, K. Aberer, and H. de Meer. idMesh: Graph-based disambiguation of linked data. In WWW 2009, 2009.
[3] A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios. Duplicate record detection: A survey. IEEE KDE Transactions, 19(1):1-16, 2007.
[4] J. Euzenat and P. Shvaiko. Ontology matching. Springer-Verlag, Heidelberg, 2007.
[5] I. P. Fellegi and A. B. Sunter. A theory for record linkage. JASA, 64(328):1183-1210, 1969.
[6] A. Hogan, A. Harth, and S. Decker. Performing object consolidation on the Semantic Web data graph. In I3 Workshop, WWW2007, Banff, Canada, 2007.
[7] A. Jaffri, H. Glaser, and I. Millard. Managing URI synonymity to enable consistent reference on the Semantic Web. In IRSW2008 Workshop, Tenerife, Spain, 2008.
[8] A. Nikolov, V. Uren, E. Motta, and A. de Roeck. Integration of semantically annotated data by the KnoFuss architecture. In EKAW 2008, Acitrezza, Italy, 2008.
[9] A. Nikolov, V. Uren, E. Motta, and A. de Roeck. Overcoming schema heterogeneity between linked semantic repositories to improve coreference resolution. In ASWC 2009, 2009.
[10] J. Volz, C. Bizer, M. Gaedke, and G. Kobilarov. Discovering and maintaining links on the web of data. In ISWC 2009, 2009.