<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>URI Disambiguation in the Context of Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Afraz Jaffri</string-name>
          <email>a.o.jaffri@ecs.soton.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hugh Glaser</string-name>
          <email>hg@ecs.soton.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ian Millard</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Electronics and Computer Science</institution>
          ,
          <institution>University of Southampton</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Linked Data initiative has given rise to an increasing number of RDF datasets, many of which are freely accessible online. These resources often arise as a result of database exports; however, sufficient consideration may not be given to the unseen implications that arise when they are used in the wider context of the Semantic Web. This paper investigates two popular resources, DBLP and DBpedia, and discusses whether the issues regarding identity management and co-reference resolution have been suitably addressed. We find that a large percentage of authors in DBLP have been conflated, and that disambiguation pages have been incorrectly linked using owl:sameAs within DBpedia. Systems for dealing with these issues are presented, and directions are given for future research.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked Data</kwd>
        <kwd>URI</kwd>
        <kwd>Co-reference</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        As the Linking Open Data project gathers pace, more and more
repositories of knowledge are being added to the Linked Data
Cloud, covering a wide range of topics. Many datasets stem from
the focal point of the Linked Data Cloud, DBpedia [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Since
DBpedia has harvested knowledge from Wikipedia, there is the
potential to create links to any subject that is described in
Wikipedia.
      </p>
      <p>
        The datasets that have been interlinked so far have knowledge
relating to people, places, books, songs and CYC [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] concepts as
well as many others. Entities such as these are often prone to the
problems of duplication and co-reference.
      </p>
      <p>Whilst extensive linking between datasets has been widely
encouraged, there has been little analysis of the accuracy of the
links or the datasets themselves.</p>
      <p>Datasets are often converted from existing sources which can
themselves be either incomplete or inaccurate. The linking
process accentuates these inconsistencies and produces a snowball
effect as more datasets are added. If the Semantic Web is to
provide a meaningfully interconnected web of assertions and
relations, there must also be some guarantee or measure of the
correctness of the information.</p>
      <p>One of the main areas in which errors occur, both in databases
and in digital libraries (the kinds of repositories that
have been converted into RDF for use with linked data), is the
problem of co-reference. Co-reference is the problem of ensuring
that two different entities do not share the same name or identifier
and, conversely, of identifying when two identifiers refer to the same
entity. In the context of the Semantic Web, the identifiers we are
concerned with are URIs.</p>
      <p>This paper presents some analysis of datasets used to link data and
raises the question of how to manage the identity and meaning of
URIs in the Semantic Web. The next section describes some
related work in the field of co-reference and author
disambiguation, while Section 3 describes the problem of
co-reference and where it occurs in DBLP and DBpedia. Section 4
goes on to describe possible solutions to the problem that are
currently in deployment. Section 5 concludes and issues an
invitation to help to provide an infrastructure where data can be
confidently used on the Semantic Web.</p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Author Disambiguation</title>
      <p>
        The problem of co-reference arises in many
different disciplines. A brief overview of the problems and
solutions that appear in Information Science and database design
can be found in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. One of the main areas in which co-reference
becomes a major problem is in author disambiguation. There are
many authors who share the same name and distinguishing
between them is a vital part of any digital library or citation
system. Not only do authors share the same names but variation in
the spelling of names can also lead to a single author having
multiple identities. For example, the author ‘Hugh Glaser’ could
be represented with his full name or by using ‘H. Glaser’, or
‘Glaser, H.’
      </p>
      <p>
        A wide variety of methods have been employed to try to solve
the problem of author disambiguation. Some of these include
record linkage [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] used in databases, citation matching [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ],
name matching [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and name equivalence identification [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. These
methods involve some form of string matching and word sense
disambiguation.
      </p>
      <p>
        Although these methods can help in identifying names with
different spellings or written in different formats, the problem of
disambiguating authors with exactly the same name remains a
challenge. Recent attempts have taken a different
approach from the traditional string-based systems. Using the Web
as a means of author disambiguation has been highlighted as a
possible solution to the problem. Since Web pages often contain
information about people that is not included in citation
references, automated scripts can check the results of
search engine queries made on the names of authors [
        <xref ref-type="bibr" rid="ref18">19</xref>
        ]. Another
web-based approach attempts to find the publication page of an
author from his or her institution’s website and match the
publications contained in the page to citations in the repository
[
        <xref ref-type="bibr" rid="ref21">22</xref>
        ]. The accuracy of such Web-based systems ranges from 73% to
84%. These systems also rely on there being sufficient
information available on the Web about each author. This is not
always the case, especially with older publications and
publications outside the field of computer science.
      </p>
      <p>
        Another method that has been put into practice is to use an
unsupervised machine learning approach using k-way spectral
clustering that disambiguates authors in citations [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This study
focused on the DBLP dataset and chose the top ranked ambiguous
names such as ‘J. Lee’, ‘S. Lee’, ‘Y. Chen’, ‘C. Chen’, ‘J.
Anderson’ and ‘J. Smith’. The unsupervised learning technique
used co-author names, publication titles and publication venue
titles for author disambiguation. This assumes that individuals
will quite often co-author with the same people and publish in the
same venues. The results of this experiment show that an average
of around 65% of authors can be successfully disambiguated.
      </p>
      <p>
        The purpose of mentioning the ongoing work in author
disambiguation in a different domain is to highlight the
importance of a problem that is only beginning to be appreciated
on the Semantic Web. Section 3 will elaborate on this. The next
section will look at how co-reference is being managed on the
Semantic Web.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.2 Disambiguation on the Semantic Web</title>
      <p>There has been much discussion about identity and meaning on
the Semantic Web from a theoretical point of view. Such
discussions will continue as questions fundamental to the
architecture of the Semantic Web are debated. Attention is now
turning towards practical solutions of managing co-reference, or
URI identity management. Since co-reference between datasets is
essential for linked data to work properly, a perfect opportunity
arises to test some of the methods and solutions that have been
proposed.</p>
      <p>
        The various methods that have been suggested for managing
co-reference and identity on the Semantic Web range from
ontology-based approaches [
        <xref ref-type="bibr" rid="ref11 ref21">22, 11</xref>
        ] and object consolidation [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] to complete management
systems [
        <xref ref-type="bibr" rid="ref16 ref7">7, 16</xref>
        ]. The above applications have been used with
geographical data [
        <xref ref-type="bibr" rid="ref20">21</xref>
        ], wikis [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and general Semantic Web data
[
        <xref ref-type="bibr" rid="ref16 ref7">7, 16</xref>
        ].
      </p>
      <p>
        There has been valuable work done on studying the reliability and
stability of Wikipedia URIs [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] that are being used by DBpedia.
This study suggests that the meaning of a URI found in Wikipedia
stays stable approximately 93% of the time. While these results have
been documented, there has been little attempt to quantify how big
a problem co-reference on the Semantic Web actually is. In the
next section we will consider how well the meaning of URIs
from Wikipedia has carried over to DBpedia. We will
also present a study made on the DBLP bibliographic database,
which is available both as linked data and as a database.
      </p>
    </sec>
    <sec id="sec-5">
      <title>3. THE PROBLEM OF CO-REFERENCE</title>
      <p>
        Co-reference on the Semantic Web can occur in two ways: Firstly,
when a single URI identifies more than one resource and secondly
when multiple URIs identify the same resource. Both situations
occur frequently when studying linked data. For an example of the
first situation, many URIs in the DBLP dataset are used for
identifying a single author when, in fact, there are a number of
people with the same name who are being incorrectly identified as
being the same person. The second situation occurs much more
frequently as different datasets use their own URIs to identify the
same resource. People and places are entities which suffer from
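URI multiplicity. Spain, for example, has at least four URIs:
        <list list-type="simple">
          <list-item><p>http://dbpedia.org/resource/Spain</p></list-item>
          <list-item><p>http://www4.wiwiss.fu-berlin.de/factbook/resource/Spain</p></list-item>
          <list-item><p>http://sws.geonames.org/2510769</p></list-item>
          <list-item><p>http://www4.wiwiss.fu-berlin.de/eurostat/resource/countries/Espa%C3%B1a</p></list-item>
        </list>
‘Hugh Glaser’ has at least eight URIs:
        <list list-type="simple">
          <list-item><p>http://acm.rkbexplorer.com/rdf/resource-P112732</p></list-item>
          <list-item><p>http://citeseer.rkbexplorer.com/rdf/resource-CSP109020</p></list-item>
          <list-item><p>http://citeseer.rkbexplorer.com/rdf/resource-CSP109013</p></list-item>
          <list-item><p>http://citeseer.rkbexplorer.com/rdf/resource-CSP109011</p></list-item>
          <list-item><p>http://citeseer.rkbexplorer.com/rdf/resource-CSP109002</p></list-item>
          <list-item><p>http://dblp.rkbexplorer.com/rdf/resource-27de9959</p></list-item>
          <list-item><p>http://europa.eu/People/#person-0ff816fa</p></list-item>
          <list-item><p>http://resist.ecs.soton.ac.uk/wiki/User:hugh_glaser</p></list-item>
          <list-item><p>http://www.ecs.soton.ac.uk/info/#person-00021</p></list-item>
        </list>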
This is to be expected and does not present a problem in itself.
The problem occurs when these URIs are linked to other URIs via
owl:sameAs. Since URI identity can often depend on the context
in which it is used [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], there can be no guarantee that the two
URIs are in fact the same entity. The next section supports this
assertion by looking at the DBLP dataset and also the DBpedia
dataset to reveal inconsistencies in the linking and naming of
resources.
      </p>
      <p><bold>3.1 DBLP</bold></p>
      <p>
The DBLP database reportedly contains over 900 000 articles
from over 500 000 different authors in the field of computer
science and related disciplines. The database can be seen as RDF
by means of a D2R Server [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and has been converted into linked
data by adding owl:sameAs links to authors who are also in
DBpedia. Whilst providing a comprehensive repository for
scientific publications, there are a number of inconsistencies that
appear in the data. This problem is not only found in DBLP but
also in other digital repositories. Due to lack of resources there is
often not enough time available to rigorously check the input for
correctness or completeness. This has resulted in many authors
having publications incorrectly attributed to them, with some
having more titles under their name than they should and others having fewer.
This will have a major impact on the Semantic Web when such
repositories are used as data sources without any attempt to
manage the inconsistencies or ‘clean’ the data.
      </p>
      <p>To assess the quality of data stored in DBLP we looked at some of
the most common names and tried to ascertain whether the name
belonged to a single author. This was achieved by looking at the
publications attributed to each name and performing a Web search
on the publication to find out to which institution an author was
affiliated. The remaining publications were then checked in the
same way and authors who came from the same institutions were
grouped together. Because authors frequently change institutions,
a name associated with a different institution was assumed to belong
to a different author unless:</p>
      <sec id="sec-5-1">
        <title>1. The co-authors of any publication were the same.</title>
      </sec>
      <sec id="sec-5-2">
        <title>2. The publication venue was the same.</title>
      </sec>
      <sec id="sec-5-3">
        <title>3. The area of research was similar.</title>
        <p>The author’s own publication page was also used if one could be
found. This process allowed for a conservative estimate to be
made of the number of different authors who appeared under the
same name. Single author papers and papers where there was a
difference of greater than four years between their publications
were excluded as authors can change their field of research over a
period of time.</p>
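        <p>The grouping procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the study was carried out by hand, the three exceptions are read here as alternatives (any one of them suffices), research-area similarity is reduced to string equality, and the Publication record, function names and sample data are all hypothetical.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """Hypothetical record of what was looked up for each paper."""
    institution: str
    coauthors: frozenset
    venue: str
    area: str


def likely_same_author(a, b):
    """Assumed reading of the rule: same institution, or any one exception holds."""
    if a.institution == b.institution:
        return True
    shares_coauthor = not a.coauthors.isdisjoint(b.coauthors)
    return shares_coauthor or a.venue == b.venue or a.area == b.area


def group_publications(publications):
    """Greedily place each publication in the first group it matches.

    Simplification: groups are never merged after the fact.
    """
    groups = []
    for pub in publications:
        for group in groups:
            if any(likely_same_author(pub, other) for other in group):
                group.append(pub)
                break
        else:
            groups.append([pub])
    return groups


# Invented sample data for one author name.
papers = [
    Publication("Univ A", frozenset({"X"}), "Venue 1", "databases"),
    Publication("Univ B", frozenset({"X"}), "Venue 2", "networks"),
    Publication("Univ C", frozenset({"Y"}), "Venue 3", "graphics"),
]
distinct_authors = len(group_publications(papers))
print(distinct_authors)
```
]]></preformat>
        <p>Each resulting group stands for one presumed individual, so the number of groups gives a conservative count of distinct authors sharing the name.</p>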
        <p>Names were chosen in order to provide a worst case scenario for
authors not having been disambiguated. The ten most common
surnames in the UK along with a list of common forenames were
used. A total of 49 names were investigated by selecting five
forenames with the nine most common surnames, and four
forenames with the remaining surname. The DBLP dataset that
was used was from October 2006 which contains a total of 491
796 authors. Thus, the selected names were almost 0.01% of the
total population.</p>
        <p>The results showed that for 92% of names chosen there were at
least two different authors whose publications had been
incorrectly merged. The highest number of different authors was
15 for the name ‘David Smith’. The mean number of authors for
each name was 3.8 with a standard deviation of 2.6. The ten most
ambiguous author names are shown in Table 1.</p>
        <p>
          As well as several authors being considered as one, there are also
a number of cases where an author has more than one name where
initials are used instead of full names. For example, ‘C.B. Jones’,
‘Cliff B. Jones’ and ‘Cliff Jones’ are all the same author yet his
publications appear under these three different names.
        </p>
        <p>
          To estimate the number of names in the entire population that
conflate at least two separate authors, a Laplace point estimate can be
calculated, together with a 95% confidence interval obtained using the Adjusted
Wald method [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Multiplying the total number of entries in
DBLP by the Laplace point estimate (0.902) gives 443 600
names. This will not be a truly accurate estimation since common
names were chosen and not random names.
        </p>
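        <p>The figures above can be reproduced with a short calculation. The sketch below assumes that 45 of the 49 sampled names were ambiguous (inferred from the reported 92%; the exact count is not stated in the text) and uses the usual Laplace estimate (x + 1)/(n + 2) and the Adjusted Wald (Agresti-Coull) interval with z = 1.96. The function and variable names are illustrative only.</p>
        <preformat><![CDATA[
```python
import math


def laplace_estimate(successes, trials):
    """Laplace point estimate: (x + 1) / (n + 2)."""
    return (successes + 1) / (trials + 2)


def adjusted_wald_interval(successes, trials, z=1.96):
    """Adjusted Wald (Agresti-Coull) interval; z = 1.96 gives roughly 95%."""
    n_adj = trials + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


SAMPLED_NAMES = 49      # names investigated (from the text)
AMBIGUOUS_NAMES = 45    # assumption: 92% of 49, rounded
DBLP_AUTHORS = 491_796  # October 2006 snapshot (from the text)

p_hat = laplace_estimate(AMBIGUOUS_NAMES, SAMPLED_NAMES)
low, high = adjusted_wald_interval(AMBIGUOUS_NAMES, SAMPLED_NAMES)
# The text multiplies the rounded estimate (0.902) by the population size.
estimated_names = round(p_hat, 3) * DBLP_AUTHORS

print(f"Laplace estimate: {p_hat:.3f}")
print(f"95% interval: {low:.3f} to {high:.3f}")
print(f"Estimated ambiguous names: {estimated_names:,.0f}")
```
]]></preformat>
        <p>Under these assumptions the point estimate rounds to 0.902, matching the value quoted above. The interval describes only the sampled common names, not DBLP names in general.</p>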
        <p>Nevertheless, we can conclude that if a person has a common
name, the probability of their publications being merged with those of
other authors is approximately 90%. These results should be a cause for concern
to those working in the Semantic Web and especially those who
deploy linked data.</p>
        <p>
          When existing data sources are used for Semantic Web data
integration it is important to consider the consistency and
completeness of the data before assigning links to other datasets,
including links in the form of owl:sameAs.
All of the names in DBLP have their own URIs, each of which is thought
to identify one single author with that particular name. As these
results show, in most situations that is not the case.
        </p>
        <p>
          This identity problem is not just theoretical, but also has
implications for the future when more applications will be built
that reason with and use Semantic Web data. In particular,
consider the attempt that is being made in the UK to allocate
research funding and judge research excellence by citation impact
[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. One could naturally believe that a Semantic Web application
could be made that amalgamates all bibliographic data from
DBLP and other repositories and ranks people or institutions
based on their publications. If the issue of co-reference is not
taken into consideration then it is clear that not everyone will be
fairly represented.
        </p>
        <p>Now that the problem of co-reference has been highlighted in
DBLP, we move on to looking at how well the problem is handled
in DBpedia.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>3.2 DBpedia</title>
      <p>The huge amount of data that has been extracted from Wikipedia
has led to a rapid increase in the number of URIs that can be used
to identify people, places and things. At present DBpedia has
identifiers for close to two million entities. This has enabled many
other datasets to become linked with DBpedia entities through the
use of owl:sameAs giving rise to the Web of Data.</p>
      <p>Whilst providing a valuable resource for data providers and
application developers, the conversion process has not taken into
account the different needs that DBpedia has in comparison to
Wikipedia. In particular, the issues of ambiguity and co-reference
raised in this paper have not been addressed.</p>
      <p>Wikipedia deals with the issue of co-reference by having special
‘disambiguation’ pages. These pages are created when there is
more than one entry that has the same name but carries a different
meaning. Disambiguation pages are mainly intended for humans
searching on a particular topic who may need some help in
locating the page that they are looking for. These same
disambiguation pages have been carried over into DBpedia where
there is no real need for them. Instead of making entities
unambiguous, as in Wikipedia, the DBpedia URIs actually
introduce more ambiguity.</p>
      <p>Consider a person or machine wanting to use a URI for Robert
Williams, the American politician. Using the URI
http://dbpedia.org/resource/Robert_Williams reveals that
properties belonging to Sir Robert Williams of Dorset, Robbie
Williams the singer and Robert Williams the actor have all been
merged onto one page. This happens with a large number of pages
that fall into the Wikipedia category ‘Disambiguation’. DBpedia
2.0 provides a number of examples where URIs are not
sufficiently disambiguated. One example is the URI
http://dbpedia.org/resource/Nancy_Wilson: if this URI refers to
Nancy Wilson the singer, then its dbpedia:spouse property belongs to
Nancy Wilson the guitarist.</p>
      <p>
        There are, of course, other URIs which have all the properties
belonging to the correct person. The URI
http://dbpedia.org/resource/Nancy_Wilson_%28guitarist%29 is
the correct URI for the guitarist Nancy Wilson. This is
simple for a human to work out, but machines will struggle. This
is demonstrated by the fact that putting ‘Robert Williams’ or
‘Nancy Wilson’ into Sindice [
        <xref ref-type="bibr" rid="ref19">20</xref>
        ] puts the ambiguous URI at a
higher rank than the ‘real’ URIs. Therefore the disambiguation
URIs used in DBpedia only act as URI ‘noise’ and should
probably be removed.
      </p>
      <p>
        It is pleasing to note that DBpedia 3.0 has given much more
attention to the issue of disambiguation. However, whilst a new
‘disambiguates’ property has been created, rogue properties
belonging to distinct URIs still appear in URIs referring to
disambiguation pages. There are approximately 150 000 of these
URIs which can be detected with relative ease. It is hoped that
successive improvements to the method in which URIs are
disambiguated will mean that the co-reference resolution of URIs
can then be handled by external systems as described in Section 4.
      </p>
      <p>
        A second problem arises due to the strong implications prescribed
by the owl:sameAs property. By stating that one URI is
owl:sameAs another, one is stating that the two references identify
the same resource, and that each should share the properties of the
other [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Looking at the owl:sameAs links in DBpedia one can
see that URIs are made to be the same as several URIs with
different meanings. For example,
http://dbpedia.org/resource/Welsh is taken from a Wikipedia
disambiguation page for the term ‘Welsh’; in DBpedia this URI is
owl:sameAs several URIs that identify distinct concepts.
None of these links are made from the pages that actually
identify these concepts, such as
http://dbpedia.org/resource/Welsh_language. In another example
the URI http://dbpedia.org/resource/H.P._Lovecraft is
owl:sameAs the CYC URI identifying the author and the Zitgist
URI identifying the music band. Clearly the two are not the same.
When looking at the real URI for the music band in DBpedia
there is no owl:sameAs link.
      </p>
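      <p>The consequence of such links can be illustrated with a small sketch. Because owl:sameAs is symmetric and transitive, a consumer that takes the links at face value will place every linked URI in one equivalence class. The code below is a minimal union-find illustration, not a description of how any particular reasoner works; the two example.org URIs are hypothetical stand-ins for the CYC and Zitgist identifiers, which are not reproduced here.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def same_as_closure(links):
    """Group URIs into the equivalence classes implied by owl:sameAs links."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in links:
        parent[find(left)] = find(right)

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return [set(group) for group in groups.values()]


DBPEDIA = "http://dbpedia.org/resource/H.P._Lovecraft"
CYC_AUTHOR = "http://example.org/cyc/author"      # hypothetical placeholder
ZITGIST_BAND = "http://example.org/zitgist/band"  # hypothetical placeholder

links = [(DBPEDIA, CYC_AUTHOR), (DBPEDIA, ZITGIST_BAND)]
classes = same_as_closure(links)

# True when the author and the band end up in the same class.
conflated = any(CYC_AUTHOR in c and ZITGIST_BAND in c for c in classes)
print(conflated)
```
]]></preformat>
      <p>In this sketch the author and the band fall into a single class, so any property asserted of one would be inherited by the other.</p>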
      <p>These issues demonstrate the necessity of having dedicated
management systems in order to manage co-reference resolution
on the Semantic Web. The next section looks at two such systems
that are currently in production. Problems during the creation of
these systems have shown that there is a significant problem that
needs to be tackled.</p>
    </sec>
    <sec id="sec-7">
      <title>4. POSSIBLE SOLUTIONS</title>
      <p>
        There are two main initiatives that have been set up in order to
confront the issue of co-reference on the Semantic Web. Our own
ReSIST [18] project has gathered metadata from publications and
institutions and exposed them as linked data. The Okkam project
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is a relatively new project that will formally begin this year
although the initial architecture has already been conceived.
      </p>
    </sec>
    <sec id="sec-8">
      <title>4.1 Consistent Reference Services</title>
      <p>A Consistent Reference Service (CRS) sits in the Semantic Web as any other knowledge base
or database would. Each data provider maintains one or more
CRSs for their own knowledge. In the ReSIST project there are
over 15 repositories each with their own CRS.</p>
      <p>
        The CRS introduces the concept of a bundle to group together
resources that have been deemed to refer to the same concept
within a given context. Different bundles may be used to group
together URIs of the same resource in different contexts. For
example, there may be a bundle containing all of the URIs about a
person in the context of institution 1; and another bundle
containing all of the URIs about the same person in the context of
institution 2. Each CRS can use different algorithms to identify
equivalent resources. A full description of the service can be
found in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
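      <p>The bundle concept can be sketched as a simple data structure. The class below is a toy illustration only and does not reflect the actual CRS implementation described in [16]; the class name, method names and the context labels "institution-1" and "institution-2" are hypothetical. The URIs are taken from the ‘Hugh Glaser’ example in Section 3.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


class ConsistentReferenceService:
    """Toy CRS: stores bundles of URIs deemed equivalent in a given context."""

    def __init__(self):
        # context name -> list of bundles (each bundle is a set of URIs)
        self._bundles = defaultdict(list)

    def add_equivalence(self, context, uri_a, uri_b):
        """Record that two URIs co-refer within one context, merging bundles."""
        bundles = self._bundles[context]
        touched = [b for b in bundles if uri_a in b or uri_b in b]
        merged = {uri_a, uri_b}.union(*touched)
        self._bundles[context] = [b for b in bundles if b not in touched]
        self._bundles[context].append(merged)

    def equivalents(self, context, uri):
        """Return the bundle containing uri in this context (or just the URI)."""
        for bundle in self._bundles.get(context, []):
            if uri in bundle:
                return set(bundle)
        return {uri}


ACM = "http://acm.rkbexplorer.com/rdf/resource-P112732"
DBLP = "http://dblp.rkbexplorer.com/rdf/resource-27de9959"
ECS = "http://www.ecs.soton.ac.uk/info/#person-00021"

crs = ConsistentReferenceService()
crs.add_equivalence("institution-1", ACM, DBLP)
crs.add_equivalence("institution-1", DBLP, ECS)

print(sorted(crs.equivalents("institution-1", ACM)))
print(sorted(crs.equivalents("institution-2", ACM)))
```
]]></preformat>
      <p>Keeping equivalences per context means that a bundle asserted in one context does not leak into another, in contrast to a global owl:sameAs statement.</p>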
      <p>The system is being used on a live site at
http://www.rkbexplorer.com. Extending this system for use with
DBpedia and other sites would involve using the linking
algorithms for each dataset and storing the links in a CRS. Each
dataset would have one or more CRSs which would act as an
authority for their data. An application may choose to give
precedence to a CRS hosted from the same domain as the URI in
question. Taking the owl:sameAs links out of the data ensures the
knowledge is semantically correct without introducing a
significant overhead. However, if owl:sameAs links are
required then the CRS can be used for this purpose.</p>
    </sec>
    <sec id="sec-9">
      <title>4.2 Okkam</title>
      <p>
        The Okkam project has been created to enable a ‘Web of Entities’
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Whereas the CRS is a fully distributed system, the Okkam
system is centralised. The main aims are to create a naming
service for entities and a directory containing entity profiles under
the single control of one authority.
      </p>
      <p>The main service, OkkamCore, allows for the publishing,
modifying and removing of entities and assertions of identity and
a retrieval service based on a set of criteria. A prototype
application has been made and will be sequentially improved and
upgraded throughout the duration of the project. By holding
identifiers for all types of entities the project hopes to avoid the
proliferation of URIs that is currently occurring. For the purposes
of linked data it is yet to be seen what the final system will
provide. The project will be monitored with interest as it
progresses.</p>
    </sec>
    <sec id="sec-10">
      <title>5. CONCLUSION</title>
      <p>This paper has attempted to provide some motivation for finding
solutions to the co-reference problem. With the linked data
initiative in its early stages, it is important to think about the
integrity of the data being provided before errors are found in the
applications that attempt to use the data.</p>
      <p>We would stress that DBLP and Wikipedia/DBpedia are valuable
and hard-won facilities that deliver searchable resources very
effectively to their many users. The problem that is arising is that
in the context of the Semantic Web and Linked Data, different
measures of quality pertain. It is the very Network Effect that the
Linked Data community is seeking that causes the difference.
The issue has attracted significant theoretical debate, yet the only
systems attempting to solve the problem are the two mentioned in
Section 4. It would be in the interest of the whole Semantic Web
community if this issue were carefully considered as a fundamental
part of the architecture needed to make the Semantic Web gain
widespread adoption.</p>
    </sec>
    <sec id="sec-11">
      <title>6. ACKNOWLEDGEMENTS</title>
      <p>This work is supported under the ReSIST Network of Excellence
(NoE) which is sponsored by the Information Society Technology
(IST) priority of the EU Sixth Framework programme (FP6)
under contract number IST-4-026764-NOE.</p>
      <p>[18] Resilience for Survivability in IST (ReSIST) Network of Excellence. http://resist-noe.eu</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Agresti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Coull</surname>
            ,
            <given-names>B.A.</given-names>
          </string-name>
          <article-title>Approximate is better than 'Exact' for Interval Estimation of Binomial Proportions</article-title>
          .
          <source>The American Statistician</source>
          .
          <volume>52</volume>
          pp.
          <fpage>119</fpage>
          -
          <lpage>126</lpage>
          .
          <year>1998</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ives</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <article-title>DBpedia: A Nucleus for a Web of Open Data</article-title>
          .
          <source>In Proceedings of the 6th International Semantic Web Conference</source>
          (Busan, Korea,
          <year>2007</year>
          ). Springer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Bechhofer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel-Schneider</surname>
            ,
            <given-names>P.F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          <article-title>OWL Web Ontology Language Reference</article-title>
          ,
          <source>Technical Report, W3C</source>
          , http://www.w3.org/TR/owl-ref/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>D2R Server - Publishing Relational Databases on the Web as SPARQL Endpoints</article-title>
          .
          <source>In Proceedings of the 15th International World Wide Web Conference</source>
          (Edinburgh, Scotland,
          <year>2006</year>
          ). ACM
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Bilenko</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mooney</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ravikumar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Fienberg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Adaptive Name Matching in Information Integration</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          ,
          <volume>18</volume>
          (
          <issue>5</issue>
          ) pp.
          <fpage>16</fpage>
          -
          <lpage>23</lpage>
          ,
          <year>2003</year>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Booth</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>URIs and the Myth of Resource Identity</article-title>
          .
          <source>Proceedings of the Workshop on Identity, Meaning and the Web (IMW06) at International World Wide Web Conference</source>
          (Edinburgh, Scotland,
          <year>2006</year>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Bouquet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoermer</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Giacomuzzi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>OKKAM: Enabling a Web of Entities</article-title>
          .
          <source>In Proceedings of the 16th International World Wide Web Conference</source>
          (Banff, Canada). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <collab>Cycorp Inc</collab>
          . http://www.cyc.com
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Feitelson</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          <article-title>On Identifying Name Equivalences in Digital Libraries</article-title>
          .
          <source>Information Research</source>
          ,
          <volume>9</volume>
          (
          <issue>4</issue>
          ), p.
          <fpage>192</fpage>
          .
          <year>2004</year>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Fellegi</surname>
            ,
            <given-names>I.P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Sunter</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          <article-title>A Theory for Record Linkage</article-title>
          ,
          <source>Journal of the American Statistical Association</source>
          ,
          <volume>64</volume>
          (
          <issue>328</issue>
          ), pp.
          <fpage>1183</fpage>
          -
          <lpage>1210</lpage>
          ,
          December
          <year>1969</year>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Presutti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>A Grounded Ontology for Identity and Reference of Web Resources</article-title>
          .
          <source>In Proceedings of the 16th International World Wide Web Conference</source>
          (Banff, Canada). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zha</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Giles</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          <article-title>Name Disambiguation in Author Citations using a K-Way Spectral Clustering Method</article-title>
          .
          <source>In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries</source>
          (Denver). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Harnad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carr</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brody</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Oppenheim</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>Mandated online RAE CVs linked to university eprint archives: enhancing UK research impact and assessment</article-title>
          .
          <source>Ariadne</source>
          , http://www.ariadne.ac.uk/issue35/harnad/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Hepp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siorpaes</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bachlechner</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Harvesting Wiki Consensus: Using Wikipedia Entries as Vocabulary for Knowledge Management</article-title>
          .
          <source>IEEE Internet Computing</source>
          .
          <volume>11</volume>
          (
          <issue>5</issue>
          ) pp.
          <fpage>54</fpage>
          -
          <lpage>65</lpage>
          , Sep
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>A Grounded ontology for Identity and Reference of Web Resources</article-title>
          .
          <source>In Proceedings of the 16th International World Wide Web Conference</source>
          (Banff, Canada). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Jaffri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glaser</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Millard</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <article-title>URI Identity Management for Semantic Web Data Integration and Linkage</article-title>
          .
          <source>In Proceedings of the Workshop on Scalable Semantic Web Systems</source>
          (Vilamoura, Portugal,
          <year>2007</year>
          ). Springer.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nigam</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ungar</surname>
            ,
            <given-names>L.H.</given-names>
          </string-name>
          <article-title>Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching</article-title>
          .
          <source>In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . (Boston, USA
          <year>2000</year>
          ). ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>Y.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>M.-Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Search Engine Driven Author Disambiguation</article-title>
          ,
          <source>Proceedings 6th ACM/IEEE-CS Joint Conference on Digital Libraries</source>
          , pp.
          <fpage>314</fpage>
          -
          <lpage>315</lpage>
          , ACM Press, New York.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Tummarello</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delbru</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Oren</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <article-title>Sindice.com: Weaving the Open Linked Data</article-title>
          .
          <source>In Proceedings of the 6th International Semantic Web Conference</source>
          (Busan, Korea,
          <year>2007</year>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Volz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kleb</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mueller</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <article-title>Towards Ontology-based Disambiguation of Geographical Identifiers</article-title>
          .
          <source>In Proceedings of the 16th International World Wide Web Conference</source>
          (Banff, Canada). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ho</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Extracting Citation Relationships from Web Documents for Author Disambiguation</article-title>
          ,
          <source>Technical Report No. TR-IIS-06-017</source>
          , Institute of Information Science, Taipei, Taiwan.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>