URI Disambiguation in the Context of Linked Data
Afraz Jaffri (a.o.jaffri@ecs.soton.ac.uk)
Hugh Glaser (hg@ecs.soton.ac.uk)
Ian Millard (icm@ecs.soton.ac.uk)
School of Electronics and Computer Science, University of Southampton


ABSTRACT
The Linked Data initiative has given rise to an increasing number of RDF datasets, many of which are freely accessible online. These resources often arise as a result of database exports; however, sufficient consideration may not be given to the unseen implications caused when they are used in the wider context of the Semantic Web. This paper investigates two popular resources, DBLP and DBpedia, and discusses whether the issues regarding identity management and co-reference resolution have been suitably addressed. We find that the publications of many DBLP authors with common names have been conflated, and that disambiguation pages have been incorrectly linked using owl:sameAs within DBpedia. Systems for dealing with these issues are presented, and directions are given for future research.

Categories and Subject Descriptors
H.3.5 [Information Systems]: Information Storage and Retrieval: Online Information Services – data sharing, web-based services.

General Terms
Management, Design, Reliability

Keywords
Linked Data, URI, Co-reference

Copyright is held by the author/owner(s).
LDOW2008, April 22, 2008, Beijing, China.

1. INTRODUCTION
As the Linking Open Data project gathers pace, more and more repositories of knowledge are being added to the Linked Data Cloud, covering a wide range of topics. Many datasets stem from the focal point of the Linked Data Cloud, DBpedia [2]. Since DBpedia has harvested knowledge from Wikipedia, there is the potential to create links to any subject that is described in Wikipedia.

The datasets that have been interlinked so far contain knowledge relating to people, places, books, songs and CYC [8] concepts, as well as many others. Entities such as these are often prone to the problems of duplication and co-reference.

Whilst extensive linking between datasets has been widely encouraged, there has been little analysis of the accuracy of the links or of the datasets themselves. Datasets are often converted from existing sources which can themselves be incomplete or inaccurate. The linking process accentuates these inconsistencies and produces a snowball effect as more datasets are added. If the Semantic Web is to provide a meaningfully interconnected web of assertions and relations, there must also be some guarantee or measure of the correctness of the information.

One of the main areas in which errors occur, both in databases and in digital libraries (the kinds of repositories that have been converted into RDF for use with linked data), is co-reference. Co-reference is the problem of ensuring that two different entities do not share the same name or identifier and, conversely, of identifying when two identifiers refer to the same entity. In the context of the Semantic Web we are therefore concerned with URIs.

This paper presents some analysis of datasets used to link data and raises the question of how to manage the identity and meaning of URIs in the Semantic Web. The next section describes related work in the field of co-reference and author disambiguation, while Section 3 describes the problem of co-reference and where it occurs in DBLP and DBpedia. Section 4 goes on to describe possible solutions to the problem that are currently in deployment. Section 5 concludes and issues an invitation to help provide an infrastructure where data can be confidently used on the Semantic Web.

2. RELATED WORK

2.1 Author Disambiguation
The problem of co-reference arises in many different disciplines. A brief overview of the problems and solutions that appear in Information Science and database design can be found in [16]. One of the main areas in which co-reference becomes a major problem is author disambiguation. Many authors share the same name, and distinguishing between them is a vital part of any digital library or citation system. Not only do authors share the same names, but variations in the spelling of names can also lead to a single author having multiple identities. For example, the author 'Hugh Glaser' could be represented by his full name, by 'H. Glaser', or by 'Glaser, H.'

A wide variety of methods have been employed to try to solve the problem of author disambiguation. Some of these include record linkage [10] used in databases, citation matching [17],
name matching [5] and name equivalence identification [9]. These methods involve some form of string matching and word sense disambiguation.

Although these methods can help in identifying names with different spellings or names written in different formats, the problem of disambiguating authors with exactly the same name remains a challenge. Recent attempts have taken a different approach from the traditional string-based systems. Using the Web as a means of author disambiguation has been highlighted as a possible solution to the problem. Since Web pages often contain information about people that is not included in citation references, automatic scripts can be written that check the results of search engine queries made on the names of authors [19]. Another web-based approach attempts to find the publication page of an author from his or her institution's website and match the publications listed on that page to citations in the repository [22]. The accuracy of such web-based systems ranges from 73% to 84%. These systems also rely on there being sufficient information available on the Web about each author. This is not always the case, especially with older publications and publications outside the field of computer science.

Another method that has been put into practice is an unsupervised machine learning approach that uses k-way spectral clustering to disambiguate authors in citations [12]. This study focused on the DBLP dataset and chose the top-ranked ambiguous names, such as 'J. Lee', 'S. Lee', 'Y. Chen', 'C. Chen', 'J Anderson' and 'J Smith'. The unsupervised learning technique used co-author names, publication titles and publication venue titles for author disambiguation. This assumes that individuals will quite often write with the same co-authors and publish in the same venues. The results of this experiment show that an average of around 65% of authors can be successfully disambiguated.

The purpose of mentioning ongoing work on author disambiguation in a different domain is to highlight the importance of a problem that is only beginning to be appreciated on the Semantic Web. Section 3 will elaborate on this. The next section looks at how co-reference is being managed on the Semantic Web.

2.2 Disambiguation on the Semantic Web
There has been much discussion about identity and meaning on the Semantic Web from a theoretical point of view. Such discussions will continue as questions fundamental to the architecture of the Semantic Web are debated. Attention is now turning towards practical solutions for managing co-reference, or URI identity management. Since co-reference between datasets is essential for linked data to work properly, linked data offers an ideal opportunity to test some of the methods and solutions that have been proposed.

The methods that have been suggested for managing co-reference and identity on the Semantic Web range from ontology-based approaches [22, 11] and object consolidation [15] to complete management systems [7, 16]. These approaches have been applied to geographical data [21], wikis [11] and general Semantic Web data [7, 16].

There has been valuable work on the reliability and stability of the Wikipedia URIs [14] that are being used by DBpedia. This study suggests that the meaning of a URI found in Wikipedia stays stable approximately 93% of the time. While such results have been documented, there has been little attempt to quantify how big a problem co-reference on the Semantic Web actually is. In the next section we consider how well the meaning of URIs from Wikipedia has carried over to DBpedia. We also present a study of the DBLP bibliographic database, which is available both as linked data and as a database.

3. THE PROBLEM OF CO-REFERENCE
Co-reference on the Semantic Web can occur in two ways: first, when a single URI identifies more than one resource, and second, when multiple URIs identify the same resource. Both situations occur frequently in linked data. As an example of the first situation, many URIs in the DBLP dataset are used to identify a single author when, in fact, a number of people with the same name are being incorrectly identified as the same person. The second situation occurs much more frequently, as different datasets use their own URIs to identify the same resource. People and places are entities that suffer particularly from URI multiplicity. Spain, for example, has at least four URIs:
http://dbpedia.org/resource/Spain
http://www4.wiwiss.fu-berlin.de/factbook/resource/Spain
http://sws.geonames.org/2510769
http://www4.wiwiss.fu-berlin.de/eurostat/resource/countries/Espa%C3%B1a

'Hugh Glaser' has at least eight URIs:
http://acm.rkbexplorer.com/rdf/resource-P112732
http://citeseer.rkbexplorer.com/rdf/resource-CSP109020
http://citeseer.rkbexplorer.com/rdf/resource-CSP109013
http://citeseer.rkbexplorer.com/rdf/resource-CSP109011
http://citeseer.rkbexplorer.com/rdf/resource-CSP109002
http://dblp.rkbexplorer.com/rdf/resource-27de9959
http://europa.eu/People/#person-0ff816fa
http://resist.ecs.soton.ac.uk/wiki/User:hugh_glaser
http://www.ecs.soton.ac.uk/info/#person-00021

This is to be expected and does not present a problem in itself. The problem occurs when these URIs are linked to other URIs via owl:sameAs. Since the identity of a URI can often depend on the context in which it is used [6], there can be no guarantee that two linked URIs in fact identify the same entity. The next section supports this assertion by examining the DBLP and DBpedia datasets to reveal inconsistencies in the linking and naming of resources.

3.1 DBLP
The DBLP database reportedly contains over 900 000 articles from over 500 000 different authors in computer science and related disciplines. The database is available as RDF through a D2R Server [4] and has been converted into linked data by adding owl:sameAs links to authors who are also in DBpedia. Whilst DBLP provides a comprehensive repository of scientific publications, a number of inconsistencies appear in the data. This problem is not only found in DBLP but also in other digital repositories. Due to a lack of resources, there is often not enough time to rigorously check the input for correctness or completeness. As a result, many authors have had publications incorrectly attributed to them, with some credited with more titles than they wrote and others with fewer. This will have a major impact on the Semantic Web when such
repositories are used as data sources without any attempt to manage the inconsistencies or 'clean' the data.

To assess the quality of the data stored in DBLP, we looked at some of the most common names and tried to ascertain whether each name belonged to a single author. This was achieved by looking at the publications attributed to each name and performing a Web search on each publication to find out the institution with which the author was affiliated. The remaining publications were then checked in the same way, and authors who came from the same institution were grouped together. Since authors also frequently change institutions, a name found at a different institution was assumed to belong to a different author unless:
1. The co-authors of any publication were the same.
2. The publication venue was the same.
3. The area of research was similar.
The author's own publication page was also used if one could be found. This process allowed a conservative estimate to be made of the number of different authors who appeared under the same name. Single-author papers, and papers where there was a gap of more than four years between publications, were excluded, as authors can change their field of research over time.

Names were chosen to provide a worst-case scenario for authors not having been disambiguated. The ten most common surnames in the UK, along with a list of common forenames, were used. A total of 49 names were investigated by combining five forenames with each of the nine most common surnames, and four forenames with the remaining surname. The DBLP dataset used was from October 2006 and contains a total of 491 796 authors. Thus, the selected names were almost 0.01% of the total population.

The results showed that for 92% of the names chosen there were at least two different authors whose publications had been incorrectly merged. The highest number of different authors was 15, for the name 'David Smith'. The mean number of authors for each name was 3.8, with a standard deviation of 2.6. The ten most ambiguous author names are shown in Table 1.

As well as several authors being treated as one, there are also a number of cases where one author appears under more than one name because initials are used instead of full names. For example, 'C.B. Jones', 'Cliff B. Jones' and 'Cliff Jones' are all the same author, yet his publications appear under these three different names.

To estimate the number of names in the entire population that conflate at least two separate authors, a Laplace point estimate can be calculated, together with a 95% confidence interval, using the Adjusted Wald method [1]. With 45 of the 49 sampled names conflated, the Laplace point estimate is (45 + 1)/(49 + 2) ≈ 0.902. Multiplying the total number of author names in DBLP (491 796) by this estimate gives approximately 443 600 names. This will not be a truly accurate estimate, since common names were chosen rather than random ones.

Nevertheless, we can conclude that if a person has a common name, the probability of their publications being merged with those of other authors is around 90%. These results should concern those working on the Semantic Web, and especially those who deploy linked data.

When existing data sources are used for Semantic Web data integration, it is important to consider the consistency and completeness of the data before assigning links to other datasets, especially links in the form of owl:sameAs.

Each name in DBLP has its own URI, which is assumed to identify a single author with that name. As these results show, for common names this is usually not the case.

Table 1. Names with the largest number of distinct authors

Name              No. of authors
David Smith       15
David Williams    10
David Jones       8
David Evans       7
Alan Williams     6
Matthew Jones     4
Andrew Taylor     4
Michael Taylor    4
Andrew Brown      4
Ben Smith         4

This identity problem is not just theoretical; it also has implications for the future, when more applications will be built that reason with and use Semantic Web data. In particular, consider the attempt being made in the UK to allocate research funding and judge research excellence by citation impact [13]. One can easily imagine a Semantic Web application that amalgamates all bibliographic data from DBLP and other repositories and ranks people or institutions based on their publications. If the issue of co-reference is not taken into consideration, it is clear that not everyone will be fairly represented.

Now that the problem of co-reference has been highlighted in DBLP, we move on to examine how well the problem is handled in DBpedia.

3.2 DBpedia
The huge amount of data that has been extracted from Wikipedia has led to a rapid increase in the number of URIs that can be used to identify people, places and things. At present DBpedia has identifiers for close to two million entities. This has enabled many other datasets to become linked with DBpedia entities through the use of owl:sameAs, giving rise to the Web of Data.

Whilst DBpedia provides a valuable resource for data providers and application developers, the conversion process has not taken into account the different needs that DBpedia has in comparison to Wikipedia. In particular, the issues of ambiguity and co-reference raised in this paper have not been addressed.

Wikipedia deals with the issue of co-reference by having special 'disambiguation' pages. These pages are created when more than one entry has the same name but carries a different meaning. Disambiguation pages are mainly intended for humans searching on a particular topic who may need some help in locating the page that they are looking for. These same disambiguation pages have been carried over into DBpedia, where
there is no real need for them. Instead of making entities unambiguous, as in Wikipedia, the DBpedia URIs actually introduce more ambiguity.

Consider a person or machine wanting to use a URI for Robert Williams, the American politician. Looking up the URI http://dbpedia.org/resource/Robert_Williams reveals that properties belonging to Sir Robert Williams of Dorset, Robbie Williams the singer and Robert Williams the actor have all been merged onto one page. This happens with a large number of pages that fall into the Wikipedia category 'Disambiguation'. DBpedia 2.0 provides a number of examples where URIs are not sufficiently disambiguated. One example is the URI http://dbpedia.org/resource/Nancy_Wilson: if this URI refers to Nancy Wilson the singer, then its dbpedia:spouse value is actually that of Nancy Wilson the guitarist.

There are, of course, other URIs which have all the properties belonging to the correct person. The URI http://dbpedia.org/resource/Nancy_Wilson_%28guitarist%29, for example, is the correct URI for the guitarist Nancy Wilson. This is simple for a human to work out, but machines will struggle. This is demonstrated by the fact that entering 'Robert Williams' or 'Nancy Wilson' into Sindice [20] ranks the ambiguous URI higher than the 'real' URIs. Therefore the disambiguation URIs used in DBpedia act only as URI 'noise' and should probably be removed.

It is pleasing to note that DBpedia 3.0 has given much more attention to the issue of disambiguation. However, whilst a new 'disambiguates' property has been created, rogue properties belonging to other, distinct URIs still appear on URIs referring to disambiguation pages. There are approximately 150 000 of these URIs, which can be detected with relative ease. It is hoped that successive improvements to the way in which URIs are disambiguated will allow the co-reference resolution of URIs to be handled by external systems, as described in Section 4.

A second problem arises from the strong implications of the owl:sameAs property. By stating that one URI is owl:sameAs another, one is stating that the two references identify the same resource, and that each should share the properties of the other [3]. Looking at the owl:sameAs links in DBpedia, one can see that some URIs are declared to be the same as several URIs with different meanings. For example, http://dbpedia.org/resource/Welsh is taken from a Wikipedia disambiguation page for the term 'Welsh'; in DBpedia this URI is owl:sameAs:

<http://sw.cyc.com/2006/07/27/cyc/EthnicGroupOfWelsh>
<http://sw.cyc.com/2006/07/27/cyc/Welsh-TheWord>
<http://sw.cyc.com/2006/07/27/cyc/WelshLanguage>
<http://sw.cyc.com/2006/07/27/cyc/Welshing-Cheating>

Since owl:sameAs is symmetric and transitive, these links also imply that the four CYC resources are the same as one another, so that the Welsh language is asserted to be identical to the Welsh ethnic group and to 'welshing' (cheating). None of these links is made from the URIs that actually identify these concepts, such as http://dbpedia.org/resource/Welsh_language. In another example, the URI http://dbpedia.org/resource/H.P._Lovecraft is owl:sameAs both the CYC URI identifying the author and the Zitgist URI identifying the music band. Clearly the two are not the same. Yet the DBpedia URI that actually identifies the music band has no owl:sameAs link.

These issues demonstrate the need for dedicated systems to manage co-reference resolution on the Semantic Web. The next section looks at two such systems that are currently in production. The problems encountered while creating these systems confirm that this is a significant issue that needs to be tackled.

4. POSSIBLE SOLUTIONS
Two main initiatives have been set up to confront the issue of co-reference on the Semantic Web. Our own ReSIST [18] project has gathered metadata from publications and institutions and exposed it as linked data. The Okkam project [7] is a relatively new project that will formally begin this year, although its initial architecture has already been conceived.

4.1 Consistent Reference Services
A Consistent Reference Service (CRS) sits on the Semantic Web as any other knowledge base or database would. Each data provider maintains one or more CRSs for its own knowledge. In the ReSIST project there are over 15 repositories, each with its own CRS.

The CRS introduces the concept of a bundle to group together resources that have been deemed to refer to the same concept within a given context. Different bundles may be used to group together URIs of the same resource in different contexts. For example, there may be a bundle containing all of the URIs about a person in the context of institution 1, and another bundle containing all of the URIs about the same person in the context of institution 2. Each CRS can use different algorithms to identify equivalent resources. A full description of the service can be found in [16].

The system is being used on a live site at http://www.rkbexplorer.com. Extending this system for use with DBpedia and other sites would involve running the linking algorithms for each dataset and storing the resulting links in a CRS. Each dataset would have one or more CRSs acting as an authority for its data. An application may choose to give precedence to a CRS hosted on the same domain as the URI in question. Taking the owl:sameAs links out of the data ensures that the knowledge remains semantically correct, without introducing significant overhead. However, if owl:sameAs links are still wanted, the CRS can be used to provide them.

4.2 Okkam
The Okkam project has been created to enable a 'Web of Entities' [7]. Whereas the CRS is a fully distributed system, the Okkam system is centralised. Its main aims are to create a naming service for entities and a directory containing entity profiles under the single control of one authority.

The main service, OkkamCore, supports publishing, modifying and removing entities and assertions of identity, and provides a retrieval service based on a set of criteria. A prototype application has been built and will be progressively improved and upgraded throughout the project. By holding identifiers for all types of entities, the project hopes to avoid the proliferation of URIs that is currently occurring. For the purposes of linked data, it is yet to be seen what the final system will
provide. The project will be monitored with interest as it progresses.

5. CONCLUSION
This paper has attempted to provide some motivation for finding solutions to the co-reference problem. With the linked data initiative in its early stages, it is important to think about the integrity of the data being provided before errors are found in the applications that attempt to use the data.

We would stress that DBLP and Wikipedia/DBpedia are valuable and hard-won facilities that deliver searchable resources very effectively to their many users. The problem that is arising is that, in the context of the Semantic Web and Linked Data, different measures of quality pertain. It is the very Network Effect that the Linked Data community is seeking that causes the difference.

The issue has attracted significant theoretical debate, yet the only systems attempting to solve the problem are the two mentioned in Section 4. It would be in the interest of the whole Semantic Web community if this issue were carefully considered as a fundamental part of the architecture needed for the Semantic Web to gain widespread adoption.

6. ACKNOWLEDGEMENTS
This work is supported under the ReSIST Network of Excellence (NoE), which is sponsored by the Information Society Technology (IST) priority of the EU Sixth Framework Programme (FP6) under contract number IST-4-026764-NOE.

7. REFERENCES
[1] Agresti, A. and Coull, B.A. Approximate is better than
[8] Cycorp Inc. http://www.cyc.com
[9] Feitelson, D.G. On Identifying Name Equivalences in Digital Libraries. Information Research, 9(4), p. 192, 2004.
[10] Fellegi, I.P. and Sunter, A.B. A Theory for Record Linkage. Journal of the American Statistical Association, 64(328), pp. 1183-1210, December 1969.
[11] Gangemi, A. and Presutti, V. A Grounded Ontology for Identity and Reference of Web Resources. In Proceedings of the 16th International World Wide Web Conference (Banff, Canada), ACM.
[12] Han, H., Zha, H. and Giles, C.L. Name Disambiguation in Author Citations using a K-Way Spectral Clustering Method. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (Denver), ACM.
[13] Harnad, S., Carr, L., Brody, T. and Oppenheim, C. Mandated online RAE CVs linked to university eprint archives: enhancing UK research impact and assessment. Ariadne. http://www.ariadne.ac.uk/issue35/harnad/
[14] Hepp, M., Siorpaes, K. and Bachlechner, D. Harvesting Wiki Consensus: Using Wikipedia Entries as Vocabulary for Knowledge Management. IEEE Internet Computing, 11(5), pp. 54-65, Sep 2007.
[15] Hogan, A., Harth, A. and Decker, S. Performing Object Consolidation on the Semantic Web Data Graph. In Proceedings of the 16th International World Wide Web Conference (Banff, Canada), ACM.
[16] Jaffri, A., Glaser, H. and Millard, I. URI Identity Management for Semantic Web Data Integration and Linkage. In Proceedings of the Workshop on Scalable Semantic Web Systems (Vilamoura, Portugal, 2007).
    ‘Exact’ for Interval Estimation of Binomial Proportions. The            Springer.
    American Statistician. 52 pp.119-126.1998
                                                                       [17] McCallum, A., Niham, R., and Ungar, L.H. Efficient
[2] Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R.           Clustering of High-Dimensional Data Sets with Application
    and Ives, Z. DBpeddia: A Nucleus for a Web of Open Data.                to Reference Matching. In Proceedings of the sixth ACM
    In Proceedings of the 6th International Semantic Web                    SIGKDD international conference on Knowledge discovery
    Conference (Busan, Korea 2007). Springer.                               and data mining. (Boston, USA 2000). ACM Press.
[3] Bechofer, S., Van Harmelen, F., Hendler, J., Horrocks, I.,         [18] Resilience for Survivability in IST (ReSIST) Network of
    Mcguiness, D.L., Schneider, P.F. and Stein, L.A.OWL Web                 Excellence. http://resist-noe.eu
    Ontology Language Reference, Technical Report, W3C,
    http://www.w3.org/TR/owl-ref/                                      [19] Tan, Y.F., Kan, M.-Y. and Lee, D. Search Engine Driven
                                                                            Author Disambiguation, Proceedings 6th ACM/IEEE-CS
[4] Bizer, C. and Cyganiak, R. D2R Server – Publishing                      Joint Conference on Digital Libraries, pp.314-315, ACM
    Relational Databases on the Web as SPARQL Endpoints. In                 Press, New York.
    Proceedings of the 15th International World Wide Web
    Conference.(Edinburgh, Scotland 2006).ACM                          [20] Tummarello, G., Delbru, R and Oren, E. Sindice.com:
                                                                            Weaving the Open Linked Data. In Proceedings of the 6th
[5] Bilenko, M., Mooney, R., Cohen, W., Ravikumar, P., and                  International Semantic Web Conference (Busan, Korea
    Fienberg, S. Adaptive Name Matching in Information                      2007) ACM
    Integration. IEEE Intelligent Systems, 18(5) pp.16-23,2003
                                                                       [21] Volz, R., Kleb, J., and Mueller, W. Towards Ontology-based
[6] Booth, D. URIs and the Myth of Resource Identity,                       Disambiguation of Geographical Identifiers. In Proceedings
    Proceedings of the Workshop on Identity, Meaning and the                of the 16th International World Wide Web Conference
    Web (IMW06) at International World Wide Web                             (Banff, Canada) ACM.
    Conference. (Edinburgh, Scotland. 2006) ACM
                                                                       [22] Yang, K., Jiang, J., Lee, H. and Ho, J. Extracting Citation
[7] Bouquet, P., Stoermer, H and Giacomuzzi, D. OKKAM:                      Relationships from Web Documents for Author
    Enabling a Web of Entities. In Proceedings of the 16th                  Disambiguation, Technical Report No.TR-IIS-06-
    International World Wide Web Conference (Banff, Canada)                 017,Institute of Information Science, Taipei, Taiwan
    ACM.