<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>AIDA-light: High-Throughput Named-Entity Disambiguation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dat Ba Nguyen</string-name>
          <email>datnb@mpi-inf.mpg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Theobald</string-name>
          <email>martin.theobald@uantwerpen.be</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Hoffart</string-name>
          <email>jhoffart@mpi-inf.mpg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gerhard Weikum</string-name>
          <email>weikum@mpi-inf.mpg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Max Planck Institute for Informatics</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Antwerp</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>8</volume>
      <issue>2014</issue>
      <abstract>
        <p>To advance the Web of Linked Data, mapping ambiguous names in structured and unstructured contents onto knowledge bases would be a vital asset. State-of-the-art methods for Named Entity Disambiguation (NED) face major tradeoffs regarding efficiency/scalability vs. accuracy. Fast methods use relatively simple context features and avoid computationally expensive algorithms for joint inference. While doing very well on prominent entities in clear input texts, these methods achieve only moderate accuracy when fed with difficult inputs. On the other hand, methods that rely on rich context features and joint inference for mapping names onto entities pay the price of being much slower. This paper presents AIDA-light, which achieves high accuracy on difficult inputs while also being fast and scalable. AIDA-light uses a novel kind of two-stage mapping algorithm. It first identifies a set of "easy" mentions with low ambiguity and links them to entities in a very efficient manner. This stage also determines the thematic domain of the input text as an important and novel kind of feature. The second stage harnesses the high-confidence linkage for the "easy" mentions to establish more reliable contexts for the disambiguation of the remaining mentions. Our experiments with four different datasets demonstrate that the accuracy of AIDA-light is competitive with the very best NED systems, while its run-time is comparable to or better than the performance of the fastest systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.3.1 [Information Storage and Retrieval]: Content
Analysis and Indexing</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
    </sec>
    <sec id="sec-3">
      <title>1.1 Motivation and Approach</title>
      <p>
        The Web of Linked Data [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] provides a wealth of data
and knowledge sources that are richly interlinked at the
entity level. The data itself is uniformly represented as RDF
triples. However, RDF data can also be embedded in Web
pages with surrounding text, and the Web also contains a
huge amount of semi-structured contents like user-created
HTML tables within Web pages. To further grow the Web
of Linked Data and advance its value for semantic
applications, we propose to consider also unstructured and
semi-structured contents, detect names of embedded entities, and
link these to knowledge bases like DBpedia, Freebase, or
YAGO.
      </p>
      <p>
        In computational linguistics, the problem of mapping
ambiguous entity mentions (or just mentions for short)
occurring in natural-language texts onto a set of known
target entities is referred to as Named Entity Disambiguation
(NED). State-of-the-art NED methods [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] face major
tradeoffs regarding output accuracy vs. run-time efficiency and
scalability. Fast methods, like TagMe2 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Illinois
Wikier [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] or DBpedia Spotlight [
        <xref ref-type="bibr" rid="ref17 ref5">17, 5</xref>
        ] use relatively simple
contextual features, and avoid combinatorially expensive
algorithms that use collective inference for jointly mapping
all mentions in a text. These methods perform very well on
straightforward input texts about prominent entities;
however, on very difficult inputs with highly ambiguous names
and mentions of long-tail entities, the accuracy of these
methods is only moderate. On the other hand, more
sophisticated methods that rely on rich contextual features
such as key phrases and on joint-inference algorithms, such
as AIDA [
        <xref ref-type="bibr" rid="ref12 ref14">14, 12</xref>
        ], tend to be much slower and thus face
scalability limitations.
      </p>
      <p>Based on our prior experience with AIDA, this paper
reconsiders the design space of possible context features and
mapping functions, in order to develop an NED system that
achieves both high throughput and high output accuracy.
We present the AIDA-light system that performs very well
even on difficult input texts, while also being as efficient
as the fastest methods. AIDA-light employs simpler
features than AIDA, thus reducing computational cost, and
also adds a new kind of feature: thematic domains like pop
music, football, skiing, etc. A major novelty of AIDA-light
is its two-stage mapping algorithm: First, our method
identifies a set of "easy" mentions which exhibit low ambiguity
and can be mapped onto entities with high confidence. The
second stage harnesses this intermediate result by inferring
the primary domain of the input text and uses this as an
additional feature to establish more reliable contexts for the
remaining, more difficult, mentions.</p>
      <p>Running Example. Consider the following natural-language
input sentence as a running example:
"Under Fergie, United won the Premier League title 13 times."
It is obvious that the above mentions of "Fergie" and "United",
with hundreds of candidate entities, are more ambiguous
than the mention of "Premier League", which exhibits just
a few candidate entities when using a
DBpedia/Wikipedia-based set of target entities. So the risk of disambiguating the
mention "Premier League" incorrectly is fairly low.
Moreover, once we fix this mention to the entity Premier_League,
the English professional football league, we can infer that
the above text snippet likely (and primarily) relates to the
general domain of "Football". Within the context of this
domain, it thus becomes easier to map "United" to the entity
Manchester_United_F.C. (rather than to United_Airlines)
and "Fergie" to the entity Alex_Ferguson (rather than to
Fergie_(singer) or Sarah,_Duchess_of_York).</p>
    </sec>
    <sec id="sec-4">
      <title>1.2 State of the Art</title>
      <p>
        Bunescu and Pasca [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] were the first to investigate the
use of a Wikipedia-based set of target entities by defining
a similarity measure that compares the textual context
of a mention to the Wikipedia categories associated with
each entity candidate. This initial idea was later extended
by using richer features for the similarity comparison [
        <xref ref-type="bibr" rid="ref10 ref18 ref4">4, 10,
18</xref>
        ]. In particular, instead of using the similarity function
directly, Milne [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] introduced a supervised learning step to
better estimate the weights of the underlying feature
functions from labeled training data. Milne [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] also added a
notion of semantic relatedness between candidate entities
among unambiguous mentions. However, these approaches
are still limited to mapping each mention individually and
iteratively. That is, their disambiguation objective does not
consider any mutual (i.e., joint) dependencies among the
possible target entities.
      </p>
      <p>
        Collective-learning models perform a joint mapping of all
mentions [
        <xref ref-type="bibr" rid="ref12 ref14 ref16">16, 14, 12</xref>
        ] onto their matching target entities "in
one shot". These approaches typically yield the highest NED
accuracy for difficult input texts. Because of the high
combinatorial cost of the joint mapping (which typically leads to
NP-hard problems), finding an exact solution to this
objective is prohibitive. To relax the problem, and due to the
sparseness of the available training data, the above methods build
a graph with edges or "factors" for mention-entity pairs and
entity-entity pairs. Even when considering only pairs of
target entities, finding the most likely mapping over the joint
probability distribution of all mappings in a probabilistic
interpretation (a form of Maximum-A-Posteriori (MAP)
inference [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]) remains an NP-hard optimization problem [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Therefore, various approximations and variations have been
developed [
        <xref ref-type="bibr" rid="ref12 ref14 ref20">14, 20, 12</xref>
        ].
      </p>
      <p>
        In addition, some of these high-accuracy NED methods
also face high computational cost due to their rich contextual
features, most notably, key phrases of entities, entity-entity
relatedness measures, and corresponding statistics [
        <xref ref-type="bibr" rid="ref14 ref20">14, 20</xref>
        ].
This prevents such techniques from being applied to
Web-scale corpora.
      </p>
      <p>
        Fast NED methods, on the other hand, use simpler
features, for example, only characteristic words and
precomputed statistics. Thanks to the more compact feature space, they
also avoid interacting with databases and instead use
customized in-memory data structures. The probably fastest
publicly available NED systems are TagMe [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
Spotlight [
        <xref ref-type="bibr" rid="ref17 ref5">17, 5</xref>
        ]. TagMe uses a light-weight form of coherence
by averaging relatedness scores over all candidate entities
for a mention that co-occurs with a given mention. These
scores are precomputed for NED throughput. Spotlight uses
word-level statistics to estimate the probability of an entity
given the context of a mention. This is combined, in a
probabilistic manner, with prior distributions on entity names
and entity prominence. Compared to full-fledged
collective-inference methods, this approach is much more light-weight
by avoiding key phrases and relatedness measures over entity
pairs.
      </p>
    </sec>
    <sec id="sec-5">
      <title>1.3 Contributions</title>
      <p>We summarize the novel aspects of this work as follows.
AIDA-light makes a judicious choice of contextual features
to compute pairwise mention-entity and entity-entity
similarities. These feature functions allow us to store the
underlying context statistics with low memory footprint,
which contributes to faster processing.</p>
      <p>AIDA-light is the first approach that harnesses a thematic
domain hierarchy to capture measures for domain-entity
and entity-entity coherence.</p>
      <p>AIDA-light employs a novel two-stage algorithm in which
we determine the "easy and low-cost" mappings first. By
doing so, we reduce both the complexity and difficulty of
the second-stage disambiguations.</p>
      <p>AIDA-light is a complete system for NED, which is
orders of magnitude faster than AIDA while achieving
comparable output quality. Our experimental comparisons
against AIDA, Spotlight, and Illinois Wikifier show that
AIDA-light is consistently close to the best competitor (or
is even the best) in terms of both run-time and accuracy.</p>
    </sec>
    <sec id="sec-6">
      <title>2. SYSTEM ARCHITECTURE</title>
      <p>
        As shown in Figure 1, AIDA-light focuses on the
disambiguation of named-entity mentions in natural-language
input texts such as news articles, Web pages and Wikipedia
articles. We assume that the mentions in the input text have
been recognized and annotated by an NER tool, such as the
Stanford tagger [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or a similar annotation tool.
      </p>
      <p>
        In its first processing stage, AIDA-light divides the
sequence of mentions, as they have been marked in the input
text, into two parts using its mention partitioner component.
This partitioning is based on the number of possible
candidate entities each of the mentions can be mapped
to in the background KB. At this stage, we thus focus on
detecting easy mentions, i.e., those mentions with very few
candidate entities, which exhibit a low level of ambiguity
and thus can be mapped to the correct target entity with
high confidence. Once these easy mentions are fixed to their
respective sets of target entities, the more general domain
of the input text is discovered via the entity-domain
dictionary. For example, if there are multiple occurrences of
football clubs in the text snippet, then the text likely
relates to the domain "Football". [Figure 1: (a) the system
architecture; (b) the collective mapping algorithm <xref ref-type="bibr" rid="ref14">14</xref>.]
The chosen target entities as well as their domains are added
to the mention context of the remaining, still ambiguous
mentions. Finally, the detailed similarities (e.g., mention-entity
context similarities) are recomputed for the second stage of
disambiguation. At this stage, our collective mapping
algorithm performs the disambiguation of the remaining
mentions using the set of feature functions described in Section 3.
      </p>
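<p>The first-stage partitioning described above can be sketched as follows. This is a minimal illustration under our own assumptions: the function name, the candidate-count threshold, and the toy dictionary are hypothetical, not AIDA-light's actual code or settings.</p>

```python
# Sketch of the mention partitioner: mentions with at most `threshold`
# candidate entities count as "easy" and are disambiguated first.
# All names and the threshold are illustrative assumptions.

def partition_mentions(mentions, dictionary, threshold=2):
    """Split mentions into easy (few candidates) and hard (ambiguous)."""
    easy, hard = [], []
    for mention in mentions:
        candidates = dictionary.get(mention, set())
        if 0 < len(candidates) <= threshold:
            easy.append(mention)
        else:
            hard.append(mention)
    return easy, hard

# Toy candidate dictionary for the running example.
dictionary = {
    "Premier League": {"Premier_League"},
    "United": {"Manchester_United_F.C.", "United_Airlines", "United_(band)"},
    "Fergie": {"Alex_Ferguson", "Fergie_(singer)", "Sarah,_Duchess_of_York"},
}
easy, hard = partition_mentions(["Fergie", "United", "Premier League"], dictionary)
```
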
      <p>Figure 2 illustrates how AIDA-light processes our earlier
example sentence "Under Fergie, United won the Premier
League title 13 times." After the preprocessing steps
(including tokenization and NER) are performed, a sliding window
is contiguously moved over the resulting token and mention
sequences, thus keeping the current mention that is about
to be disambiguated at the center of the window. Hence,
for each further mention that we disambiguate, this window
is moved forward. Both the surrounding tokens and mentions
within this window are taken into the context of this central
mention. We remark that in this example, some
intermediate results are hidden. For example, the tokenizer (e.g.,
the Stanford tokenizer) splits "Premier League" into the two tokens
"Premier" and "League"; however, these two tokens form a
mention based on the results of NER, and thus AIDA-light
only considers this phrase as a single unit. As shown in the
figure, the ambiguities of the three mentions in our example
sentence vary substantially. The mention "Premier League"
is mapped to only very few candidate entities (even just to
one in this example). Thus, we obtain a high-confidence
disambiguation of this mention to the entity Premier_League.
Thanks to this entity and the domain hierarchy, which maps
an entity to its respective domains (such as football, politics,
etc.), AIDA-light is able to predict that the text relates to
the domain "Football". As a result, it is now much easier
to choose the correct entities for the other mentions as well,
for example, to map the mention of "United" to the entity
Manchester_United_F.C. rather than to United_Airlines.</p>
    </sec>
    <sec id="sec-7">
      <title>3. FEATURE MODEL</title>
      <p>
        As mentioned before, AIDA-light employs a combination
of various feature functions to perform NED via a two-stage
mapping algorithm. Specifically, we investigate the use of
seven different feature functions that indicate either
1) the likelihood of a mention M within its given textual
context to represent a named entity E out of a set of
known entities E in our background KB, or
2) the coherence (i.e., the semantic similarity) of a pair of
entity candidates E1, E2 ∈ E with respect to this KB.
These feature functions, which are defined in detail in the
following subsections, partly consist of simplified context
features used in AIDA (e.g., using sets of plain key tokens
instead of sets of key phrases as in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]). In addition, we
employ two novel features, which are based on a predefined
domain hierarchy and the Wikipedia category system with
which the candidate entities are associated.
      </p>
    </sec>
    <sec id="sec-8">
      <title>3.1 Preprocessing Steps</title>
      <p>
        Token Sequences. We start by tokenizing a given input
text into a token sequence T = ⟨T1, …, Tt⟩ of white-space-
and punctuation-separated terms. All tokens used as
input for our context features are stemmed using the Porter
stemmer. This token sequence serves as input to the various
context features employed by the disambiguation algorithm.
Mention Sequences. We assume that a subset T′ ⊆ T
of these tokens has been annotated as a sequence of
mentions M = ⟨M1, …, Mm⟩ by an auxiliary NER tool (using,
e.g., the Stanford [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or Illinois [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] taggers). Formally, we
designate an annotator function NER : T → M for this
purpose, which allows us to obtain the mention Mj := NER(Ti)
that is associated with a given token Ti (or null if Ti is not
associated with any mention).
      </p>
      <p>
        Mention Dictionary. In addition to the NER
function, we also assume the availability of a dictionary function
Dict : M → 2^E that identifies the set of candidate entities
Ej ⊆ E for a given mention Mj, i.e., Ej := Dict(Mj), to be provided as
input to our NED algorithm. Just like AIDA [
        <xref ref-type="bibr" rid="ref12 ref14">14, 12</xref>
        ],
AIDA-light employs the means relation of YAGO2 to identify this
set of candidate entities for a (possibly ambiguous)
mention Mj. The entries in the dictionary were extracted from
link anchors, disambiguation pages and redirection links in
Wikipedia.
      </p>
      <p>
        Dictionary Augmentation via LSH. In order to
improve recall over a variety of input texts, including Web
pages where many out-of-dictionary mentions occur, we
apply Locality Sensitive Hashing (LSH) in combination with
a min-wise permutation scheme [
        <xref ref-type="bibr" rid="ref1 ref15 ref9">9, 15, 1</xref>
        ] to cover more
spelling variations among these mentions. That is, once an
out-of-dictionary mention occurs, AIDA-light first attempts
to find similar mentions for it via LSH. Second, all possible
entity candidates of these similar mentions are assigned to
this mention as candidate entities. Let us take the
mention "Northwest Fur Company" as an example. This
mention does not exist in the means relation of YAGO2
(version 2.4.0, http://www.mpi-inf.mpg.de/yago-naga/yago/), and
thus there is no entity candidate for it. However, via LSH
we are able to detect that "Northwest Fur Company" and
the dictionary entry "North West Fur Company" are highly
similar, such that AIDA-light is able to identify the entity
North_West_Company as a candidate entity for this mention.
KORE [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is the first work that integrated LSH into an
NED system, clustering key phrases for an efficient form of
similarity computation. Our work touches upon another
important issue, which is to deal with the lack of dictionary
resources in an NED system.
      </p>
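<p>The LSH lookup described above can be sketched as follows: min-wise signatures over character 3-grams are split into bands, so near-duplicate spellings tend to share a bucket. The band/row counts and the md5-based hash scheme are illustrative assumptions, not AIDA-light's actual configuration.</p>

```python
import hashlib

# Sketch of LSH-based dictionary augmentation (illustrative parameters).

def shingles(name, n=3):
    # Lower-case and strip spaces before shingling.
    s = name.lower().replace(" ", "")
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def minhash(shingle_set, num_hashes=32):
    # One min-wise value per seed; md5 stands in for a random permutation.
    return [min(int(hashlib.md5(f"{seed}:{sh}".encode()).hexdigest(), 16)
                for sh in shingle_set)
            for seed in range(num_hashes)]

def lsh_index(names, bands=8, rows=4):
    """Map each band of a name's signature to a bucket of names."""
    buckets = {}
    for name in names:
        sig = minhash(shingles(name), bands * rows)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets.setdefault(key, set()).add(name)
    return buckets

index = lsh_index(["North West Fur Company", "United Airlines"])

# Query with an out-of-dictionary spelling variant.
query_sig = minhash(shingles("Northwest Fur Company"))
matches = set()
for b in range(8):
    matches |= index.get((b, tuple(query_sig[b * 4:(b + 1) * 4])), set())
```

Because spaces are stripped before shingling, the two spellings of the fur-company name yield identical shingle sets and thus collide in every band.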
      <p>We remark that, according to the above definitions, two or
more mentions that are represented by identical tokens
occurring at different locations in the input sequence may
be mapped to different entities. That is, the same annotated
name may very well be disambiguated to different meanings
if this name occurs at different positions of the input text.</p>
      <p>We next explain how we maintain the contexts of both
mentions and candidate entities which form the basis for
the actual disambiguation step.
</p>
      <sec id="sec-8-1">
        <title>3.1.1 Mention Context</title>
        <p>The mention context is extracted from the annotated text
snippet T that contains a mention Mj which we aim to
disambiguate. For both efficiency and robustness reasons,
this mention context is limited to a sliding window M′ =
⟨Mj−k, …, Mj, …, Mj+k⟩ of 2k + 1 mentions that
surround Mj. Similarly, we also consider a sliding window
T′ = ⟨Ti−l, …, Ti, …, Ti+l⟩ of 2l + 1 tokens surrounding
Mj, such that Mj = NER(Ti).</p>
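<p>The sliding-window context extraction can be sketched as follows; the window radii and the toy token/mention sequences are illustrative, not the paper's actual values of k and l.</p>

```python
# Sketch of the sliding-window context: items within `radius` positions
# of the center element (clipped at the sequence boundaries).

def window(seq, center, radius):
    """Return seq items within `radius` positions of seq[center]."""
    lo = max(0, center - radius)
    return seq[lo:center + radius + 1]

tokens = ["Under", "Fergie", ",", "United", "won", "the",
          "Premier", "League", "title", "13", "times"]
mentions = ["Fergie", "United", "Premier League"]

# Token window (l = 3) and mention window (k = 1) around "United".
token_ctx = window(tokens, tokens.index("United"), 3)
mention_ctx = window(mentions, mentions.index("United"), 1)
```
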
        <p>Domains. Within such a mention context, each candidate
entity E ∈ E may be assigned to (i.e., be mapped to) zero or
more domains via a domain function Dom : E → 2^D. This
set of domains D consists of a collection of 46 manually
selected classes (such as "Football") of the subclassOf
hierarchy of YAGO2, shown in Figure 3. As inherited from the
original subclassOf relation, these 46 domains form a DAG
structure.</p>
        <p>In the first stage of the NED algorithm, both neighboring
tokens and mentions that fall within their respective sliding
windows are taken into account by our disambiguation
algorithm. Later, as the algorithm proceeds from its first to
its second stage, the context of a mention might hold extra
information, such as the domains that the neighboring
mentions under this context are associated with. At the first
stage, however, these domains are unknown, and the
iterative mapping described in Section 4.2 updates the mention
contexts after the first round by the set of chosen entities
and their respective domains.</p>
      </sec>
      <sec id="sec-8-2">
        <title>3.1.2 Entity Context</title>
        <p>
          In analogy to the mention context, the entity context is
used to describe each entity E out of the set of known entities
E with respect to a given background KB such as YAGO2.
We focus on two main components for this purpose. The
first is a set of key tokens extracted from all the key phrases
provided by AIDA for the respective entity. The second
component consists of a set of Wikipedia categories, which
is again obtained from YAGO2 as the background KB.
Key Tokens. Each entity E ∈ E is assigned to zero or more
key tokens of type string via a token function Tok : E →
2^string. These tokens are obtained by simplifying the key
phrases provided by AIDA [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] for each entity in YAGO2.
For example, the set consisting of the two key phrases { "U.S.
President", "President of the U.S." } in AIDA is reduced to
the set of key tokens { "president", "U.S." } in AIDA-light.
Wikipedia Categories. Additionally, each entity E ∈
E is assigned to zero or more Wikipedia categories via a
category function Cat : E → 2^C. These categories C are
obtained from the type relation of YAGO2 (such as
wikicategory:English_footballers), which thus form the leaves
of the type system in the YAGO2 knowledge base.
        </p>
        <p>Both of these components together form the basis for
defining the pairwise coherence of two entities, or the
pairwise similarity between a mention and an entity if the
domains are known, respectively.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>3.2 Basic Feature Functions</title>
      <p>For the following steps, we consider only the sets of
candidate entities Ei = Dict(Mi). That is, we consider
candidate entities only for those mentions Mi that occur in
the mention context M′ = ⟨Mj−k, …, Mj, …, Mj+k⟩ of a
mention Mj that is about to be disambiguated. Moreover,
these mentions need to match an entry in our mention
dictionary as described earlier. Both these points help us to
significantly reduce the number of candidate entities for a
given mention. (The 46 domains are: badminton, baseball,
basketball, cricket, football, golf, table tennis, rugby, soccer,
tennis, volleyball, cycling, skating, skiing, hockey,
mountaineering, rowing, swimming, sub, diving, racing, athletics,
wrestling, boxing, fencing, archery, fishing, hunting, bowling,
agriculture, alimentation, architecture, computer science,
engineering, medicine, veterinary, astronomy, biology,
chemistry, earth, mathematics, physics, economy, fashion,
industry, politics.)</p>
      <sec id="sec-9-1">
        <title>3.2.1 Prior</title>
        <p>The function prior reflects the likelihood that a mention
Mi refers to an entity Ei,j with respect to the link structure
of Wikipedia. It is thus defined as the relative frequency
with which a link with anchor Mi points to a Wikipedia
article representing entity Ei,j:
∀Mi ∈ M′, ∀Ei,j ∈ Ei := Dict(Mi):
f1(Mi, Ei,j) := prior(Mi, Ei,j) := count(Mi → Ei,j) / count(Mi)   (1)
The respective counts are obtained from a recent WikiMedia
dump of English Wikipedia articles
(http://dumps.wikimedia.org/enwiki/20130103/). Each such count is
normalized by the number of times Mi occurs as an anchor
among all links in Wikipedia. It thus produces a real value
in the range [0, 1].</p>
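<p>The prior feature can be sketched as follows; the anchor counts below are invented for illustration, not taken from the actual Wikipedia dump.</p>

```python
from collections import Counter

# Sketch of the prior feature f1 (Eq. 1): relative link-anchor frequency.
# Counts are made-up illustrative values.
anchor_counts = Counter({
    ("United", "Manchester_United_F.C."): 700,
    ("United", "United_Airlines"): 300,
})

def prior(mention, entity):
    """Fraction of links with this anchor that point to this entity."""
    total = sum(c for (m, _), c in anchor_counts.items() if m == mention)
    return anchor_counts[(mention, entity)] / total if total else 0.0

p = prior("United", "Manchester_United_F.C.")  # 700 / 1000 = 0.7
```
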
      </sec>
      <sec id="sec-9-2">
        <title>3.2.2 Mention-Name Similarity</title>
        <p>The similarity between a mention (e.g., "Obama") and an
entity name (e.g., Barack_Obama) is measured by the Jaccard
similarity over 3-gram character sequences extracted from
the respective mention and entity strings:
∀Mi ∈ M′, ∀Ei,j ∈ Ei := Dict(Mi):
f2(Mi, Ei,j) := Jaccard(Mi, Ei,j) := |3-grams(Mi) ∩ 3-grams(Ei,j)| / |3-grams(Mi) ∪ 3-grams(Ei,j)|   (2)
White-spaces and truncators (such as "_") are removed from
the input strings before the 3-grams are generated. Hence,
Jaccard also produces a real value between 0 and 1.</p>
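<p>A minimal sketch of this feature, with spaces and underscores stripped before generating the character 3-grams:</p>

```python
# Sketch of the mention-name similarity f2 (Eq. 2): Jaccard similarity
# over character 3-gram sets.

def three_grams(s):
    s = s.replace(" ", "").replace("_", "")
    return {s[i:i + 3] for i in range(len(s) - 2)}

def name_similarity(mention, entity_name):
    a, b = three_grams(mention), three_grams(entity_name)
    return len(a & b) / len(a | b) if a | b else 0.0

# "Obama" shares 3 of 9 distinct 3-grams with "BarackObama".
sim = name_similarity("Obama", "Barack_Obama")
```
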
      </sec>
      <sec id="sec-9-3">
        <title>3.2.3 Context Similarity</title>
        <p>The function context similarity defines how similar the
token context T′ = ⟨Ti−l, …, Ti, …, Ti+l⟩ surrounding a
mention Mi and the key tokens Tok(Ei,j) of an entity
candidate Ei,j are:
∀Mi ∈ M′, ∀Ei,j ∈ Ei := Dict(Mi):
f3(Mi, Ei,j) := Overlap(T′, Tok(Ei,j)) := |T′ ∩ Tok(Ei,j)| / min(|T′|, |Tok(Ei,j)|)   (3)
It is thus estimated as the overlap coefficient of the tokens
extracted from the mention context T′ and the key tokens
Tok(Ei,j) (using word stems as tokens and excluding
stopwords). It produces a real value between 0 and 1.</p>
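<p>The overlap coefficient of Eq. 3 can be sketched as follows; stemming and stopword removal are omitted for brevity, and the token sets are illustrative.</p>

```python
# Sketch of the context-similarity feature f3 (Eq. 3): overlap
# coefficient between window tokens and an entity's key tokens.

def overlap(a, b):
    """Overlap coefficient: |a ∩ b| / min(|a|, |b|)."""
    a, b = set(a), set(b)
    denom = min(len(a), len(b))
    return len(a & b) / denom if denom else 0.0

ctx = {"won", "premier", "league", "title"}
key_tokens = {"premier", "league", "football", "england"}
score = overlap(ctx, key_tokens)  # 2 / min(4, 4) = 0.5
```
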
      </sec>
      <sec id="sec-9-4">
        <title>3.2.4 Surrounding Mention Coherence</title>
        <p>This feature function measures the semantic coherence
among the mentions Mt within the context of a mention Mi
which we aim to disambiguate. Thus, given the mention
context M′ = ⟨Mi−k, …, Mi, …, Mi+k⟩ surrounding
a mention Mi, this feature function computes the pairwise
coherence between Mi and the remaining mentions Mt ∈ M′,
t ≠ i, based on their co-occurrence counts in Wikipedia.
For example, if there is a pair of mentions "Victoria" and
"David Beckham", then "Victoria" tends to be mapped to
Victoria_Beckham and "David Beckham" to David_Beckham,
respectively.</p>
        <p>f4(Ei,j, Mt) := P(Ei,j | Mt) := count(Ei,j, Mt) / count(Mt)   (4)
This feature thus resembles the conditional probability
P(Ei,j | Mt) that an entity Ei,j occurs when we are given
a surrounding mention Mt. In YAGO2, this probability is
again estimated over an English Wikipedia dump
(http://dumps.wikimedia.org/enwiki/20130103/) by
counting how many times entity Ei,j and mention Mt occur
together in an article. This count is normalized by the absolute
number of times mention Mt occurs (i.e., using count(Mt))
in all articles. In order to avoid counts of zero, we apply a
simple smoothing technique in our implementation by
assigning 1 to unseen mention-entity pairs. This already
produces a real value from 0 to 1. However, it is quite small in
most cases, and thus we apply a linear normalization to all
values among the candidate set Ei of a mention.</p>
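<p>The smoothed co-occurrence estimate and the final linear normalization can be sketched as follows; all counts and entity names are invented for illustration.</p>

```python
# Sketch of feature f4: co-occurrence-based P(E | M_t), with unseen
# pairs smoothed to a count of 1, then linearly rescaled over the
# candidate set of a mention. Counts are illustrative.

cooccur = {("David Beckham", "Victoria_Beckham"): 50}
mention_count = {"David Beckham": 100}

def cond_prob(surrounding_mention, entity, smooth=1):
    """Estimate P(entity | surrounding_mention) from article counts."""
    c = cooccur.get((surrounding_mention, entity), smooth)
    return c / mention_count[surrounding_mention]

def normalize(scores):
    """Linearly rescale candidate scores into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {e: 1.0 for e in scores}
    return {e: (s - lo) / (hi - lo) for e, s in scores.items()}

scores = {e: cond_prob("David Beckham", e)
          for e in ["Victoria_Beckham", "Victoria_(Australia)"]}
normalized = normalize(scores)
```
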
      </sec>
      <sec id="sec-9-5">
        <title>3.2.5 Entity-Entity Context Coherence</title>
        <p>The entity-entity context coherence function finally
reflects the pairwise relatedness between two entities via
their key-token sets Tok(Ei,j) and Tok(Et,v):
∀Mi, Mt ∈ M′, ∀Ei,j ∈ Ei := Dict(Mi), ∀Et,v ∈ Et := Dict(Mt):
f6(Ei,j, Et,v) := Overlap(Tok(Ei,j), Tok(Et,v)) := |Tok(Ei,j) ∩ Tok(Et,v)| / min(|Tok(Ei,j)|, |Tok(Et,v)|)   (5)
It is once more computed as the overlap coefficient of the two
key-token sets. It produces a real value from 0 to 1.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>3.3 Domain-Oriented Feature Functions</title>
      <sec id="sec-10-1">
        <title>3.3.1 Domain Hierarchy</title>
        <p>
          Despite the obvious usefulness of Wikipedia categories for
estimating the coherence of two entities, or the coherence of
an entity and a domain, there has so far not been much
work investigating this particular issue. We thus propose to
build hierarchies between Wikipedia categories (instances
of the type relation) and domains (leaves in the subclassOf
relation system). Both kinds of semantic relations are
available in YAGO2 [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] via the yagoWordnetDomains and
wordnetDomainHierarchy relations (available at
http://www.mpi-inf.mpg.de/yago-naga/yago/downloads.html). That is, a Wikipedia
category is first mapped to its WordNet [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] type, and it is
then traced back to a more generic domain among the 46
domains, such as "Sports" (including football, basketball,
etc.), "Science" (including computer science, mathematics,
etc.), and so on, which are second-level nodes in the
wordnetDomainHierarchy relation in YAGO2. Note that we
manually removed overly generic (and thus noisy) domains, such
as the "Person" domain, which is associated with the
wikicategory:Living_people category in Wikipedia and thus
spans multiple of the aforementioned domains.
        </p>
        <p>Once these hierarchies (as shown in Figure 3) are built,
the coherence between two such domains, or between a
category and a domain, can be computed as the inverse of the
length of the shortest path between them in this hierarchy.
In addition, this feature may help to predict the domain of
the input text when some of the mentions with low
ambiguity are already fixed to entities, which results in a better
local similarity estimation for the remaining, more
ambiguous mentions. For example, if a number of "easy mentions"
have been mapped to soccer players, clubs or leagues, we can
predict that the text likely relates to the domain "Football".</p>
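<p>The inverse-shortest-path coherence can be sketched as follows; the toy hierarchy and the treatment of DAG edges as undirected for distance purposes are our own assumptions, not YAGO2's actual wordnetDomainHierarchy.</p>

```python
from collections import deque

# Sketch of hierarchy-based coherence: 1 / shortest-path length between
# two nodes of the domain DAG. The hierarchy below is illustrative.

children = {"Sports": ["Football", "Basketball"],
            "Football": [], "Basketball": []}

def neighbors(node):
    # Both children and parents count as neighbors for path finding.
    down = set(children.get(node, []))
    up = {p for p, cs in children.items() if node in cs}
    return down | up

def distance(a, b):
    """Length of the shortest path from a to b (BFS)."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, d = queue.popleft()
        if node == b:
            return d
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return float("inf")

def coherence(a, b):
    d = distance(a, b)
    return 1.0 if d == 0 else 1.0 / d
```
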
      </sec>
      <sec id="sec-10-2">
        <title>3.3.2 Domain-Entity Coherence</title>
        <p>The domain-entity coherence function reflects the (binary)
relevance of assigning an entity to one of the predefined
domains (e.g., "Football"):
∀Mi ∈ M′, ∀Ei,j ∈ Ei := Dict(Mi), domains D:
f5(Mi, Ei,j) := 1 if ∃D ∈ D such that Ei,j ∈ D, and 0 otherwise.   (6)
In case a chosen entity candidate Ei,j belongs to one of our
predefined domains D, we are thus able to take the coherence
between this entity candidate and the domain hierarchy into
account. This feature function only produces a binary value
of 0 or 1, which reflects whether there is a domain D ∈ D
holding Ei,j, or not.</p>
      </sec>
      <sec id="sec-10-3">
        <title>Entity-Entity Category Coherence</title>
        <p>When using either YAGO2 or DBpedia 3.8 as background
KB, Wikipedia categories (such as wiki-category:English_footballers
or wiki-category:FIFA_World_Cup) may
provide good hints on the coherence among pairs of entities,
especially for long-tail entities with otherwise poor context
information. The entity-entity category coherence function
thus estimates the relatedness between two entities via
their assigned Wikipedia categories Cat(Ei,j) and Cat(Et,v).
∀Mi, Mt ∈ M′, ∀Ei,j ∈ Ei := Dict(Mi), ∀Et,v ∈ Et := Dict(Mt):
f7(Ei,j, Et,v) := Coherence(Cat(Ei,j), Cat(Et,v))
:= max over Ci,j,x ∈ Cat(Ei,j), Ct,v,y ∈ Cat(Et,v) of 1 / Distance(Ci,j,x, Ct,v,y) (7)</p>
        <p>Here, the function Distance of two categories is computed
as the length of the shortest path between them in the domain hierarchy 𝒟.</p>
        <p>The coherence is thus computed as the maximum over all pairs of
Wikipedia categories the two candidate entities belong to,
using the domain hierarchy described above, and it produces
a real value between 0 and 1.</p>
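        <p>A sketch of this maximum over category pairs follows; the distance table and category names below are hypothetical stand-ins for the real shortest-path computation on the domain hierarchy.</p>

```python
# Hypothetical shortest-path distances between Wikipedia categories
# in the domain hierarchy (a stand-in for the real computation).
DIST = {
    ("English_footballers", "FIFA_World_Cup"): 2,
    ("English_footballers", "Living_people"): 5,
    ("FIFA_World_Cup", "Living_people"): 5,
}

def distance(c1, c2):
    """Symmetric lookup; identical categories count as distance 1,
    so that their coherence is capped at 1.0."""
    if c1 == c2:
        return 1
    return DIST.get((c1, c2)) or DIST.get((c2, c1))

def f7(cats_e1, cats_e2):
    """Entity-entity category coherence: the maximum, over all
    category pairs, of the inverse shortest-path distance."""
    best = 0.0
    for c1 in cats_e1:
        for c2 in cats_e2:
            d = distance(c1, c2)
            if d:
                best = max(best, 1.0 / d)
    return best
```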
      </sec>
    </sec>
    <sec id="sec-11">
      <title>Objective Function</title>
      <p>Aiming for a robust disambiguation algorithm, we
combine the seven individual feature functions described above
into a single objective function (all functions except the prior
are computed at runtime). In doing so, we disambiguate
the entire sequence of mentions occurring in an input
document jointly. Thus let MD = ⟨M1, ..., Mk⟩ denote this
sequence of mentions occurring in the input document, and let
ED = ⟨E1, ..., Ek⟩, Ei ∈ Dict(Mi), denote its corresponding
set of disambiguated entities. We select exactly one entity
candidate Ei for each mention Mi ∈ MD, i = 1..k, subject
to the following optimization objective NED: 2^M → 2^E.</p>
      <p>NED(⟨M1, ..., Mk⟩) := ⟨E1, ..., Ek⟩
:= argmax over E1 ∈ Dict(M1), ..., Ek ∈ Dict(Mk) of
Σ_{i=1..k} Σ_{j=1..5} pj fj(Mi, Ei) + Σ_{t=1..k} Σ_{v=1..k, v≠t} Σ_{j=6..7} pj fj(Et, Ev) (8)
such that Σ_{j=1..5} pj = 1 and p6 + p7 = 1.</p>
      <p>The first five feature functions f1, ..., f5 thus reflect the
likelihood of choosing an entity based on the respective mention
and entity contexts, while functions f6 and f7 compute the
pairwise coherences among the chosen entities.</p>
      <p>We remark that, as described before for the feature
functions, the mention context for each mention Mi ∈ MD is
still based on a sliding window of mentions M′ and tokens T′
surrounding that mention (see Section 3.1.1).</p>
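      <p>For very small inputs, objective (8) can be evaluated by brute force, which may help clarify what the greedy algorithm of the next section approximates. In this sketch, local_sim stands in for the p-weighted sum of f1, ..., f5 and coh for the p-weighted sum of f6 and f7; all entity names and scores are illustrative.</p>

```python
from itertools import product

def score(entities, local_sim, coh):
    """Objective (8): weighted local similarities plus pairwise
    coherences over all ordered entity pairs with t != v."""
    s = sum(local_sim[i][e] for i, e in enumerate(entities))
    for t, et in enumerate(entities):
        for v, ev in enumerate(entities):
            if t != v:
                s += coh(et, ev)
    return s

def ned_brute_force(candidates, local_sim, coh):
    """Exhaustively pick one entity per mention maximizing (8);
    feasible only for a handful of mentions and candidates."""
    return list(max(product(*candidates),
                    key=lambda combo: score(combo, local_sim, coh)))
```

      <p>On a toy input where the coherence term rewards the football-team reading of "Japan" next to "FIFA World Cup", the joint optimum overrides the higher prior of the country entity.</p>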
    </sec>
    <sec id="sec-12">
      <title>DISAMBIGUATION ALGORITHM</title>
      <p>
        We construct a weighted, undirected dependency graph
whose nodes are formed by all mentions and their candidate
entities, and whose weighted edges are created by the
feature functions described in the previous section [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
Consequently, the objective function described in Section 3.4
is solved by finding the best subgraph in which each
mention is mapped onto only one entity candidate. This
dense-subgraph problem is known to be NP-hard [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] as it
generalizes the well-studied Steiner-tree problem. We thus adopt
the greedy approximation algorithm proposed in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and
extend it into a new, two-stage disambiguation algorithm.
      </p>
    </sec>
    <sec id="sec-13">
      <title>Entity-Mention Dependency Graph</title>
      <p>The mention-entity dependency graph (see Figure 1 for
an example) is typically dense on the entity side, often
having hundreds or thousands of nodes because there might be
many candidate entities for common mentions (e.g.,
common first names, last names, etc.). Therefore, in order to
stay efficient, we first propose to keep at most k of the best
entity candidates for each mention in the graph. Thanks
to the multi-phase computation (see the next subsection),
which iteratively updates the context and thus gives a
better initial mention-entity similarity estimation, this cut-off
at the top k (even with fairly small values of k, e.g., k = 5)
does not affect accuracy too much. Second, an entity node
which is not the last remaining candidate entity of a mention</p>
      <sec id="sec-13-1">
        <title>Algorithm 1 Collective Mapping Algorithm.</title>
        <p>Require: Weighted dependency graph of mentions and
entities.</p>
        <p>Ensure: Best sub-graph with one edge per mention.
1: // build the graph
2: for each mention M do
3:   for each candidate entity Ei of mention M do
4:     sim(M, Ei) := Σ_{t=1..5} pt ft(M, Ei); // local similarity
5:   end for
6:   Keep the k best candidates, drop the others;
7: end for
8: for each candidate entity Ei of mention M do
9:   Wi := sim(M, Ei); // init weight
10:  for each entity Ej not generated from mention M do
11:    // compute coherence
12:    coh(Ei, Ej) := p6 f6(Ei, Ej) + p7 f7(Ei, Ej);
13:    Wi += coh(Ei, Ej); // update weight
14:  end for
15: end for
16: /* an entity is non-taboo if it is not the last candidate of a mention */
17: while graph has a non-taboo entity do
18:   Determine the non-taboo entity node with the lowest weight, and remove it together with all its incident edges;
19:   Recompute the weights;
20: end while
(and hence a "non-taboo" node), and which currently holds
the smallest weighted degree, is iteratively removed from
the mention-entity graph, as shown in Algorithm 1.</p>
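        <p>A compact sketch of Algorithm 1 follows. The top-k pruning of line 6 is omitted for brevity, local_sim and coh stand in for the weighted feature functions, and all entity names in the usage example are hypothetical; this is an illustration, not the original implementation.</p>

```python
def collective_mapping(candidates, local_sim, coh):
    """Greedy dense-subgraph heuristic: repeatedly drop the
    non-taboo entity with the lowest weighted degree until
    exactly one candidate per mention remains.

    candidates: dict mapping each mention to a set of entities
    local_sim:  dict mapping (mention, entity) to a local score
    coh:        function of two entities returning a coherence score
    """
    cands = {m: set(es) for m, es in candidates.items()}

    def weight(m, e):
        # local similarity plus coherence edges to all surviving
        # entities of the other mentions
        w = local_sim[(m, e)]
        for m2, es2 in cands.items():
            if m2 != m:
                w += sum(coh(e, e2) for e2 in es2)
        return w

    while True:
        # non-taboo: the entity is not the last candidate of its mention
        non_taboo = [(weight(m, e), m, e)
                     for m, es in cands.items() if len(es) != 1
                     for e in es]
        if not non_taboo:
            break
        _, m, e = min(non_taboo)  # lowest weighted degree
        cands[m].discard(e)       # remove node and incident edges

    return {m: next(iter(es)) for m, es in cands.items()}
```

        <p>The weights are recomputed from scratch in every iteration, mirroring line 19 of the pseudocode; a production implementation would update them incrementally.</p>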
      </sec>
    </sec>
    <sec id="sec-14">
      <title>Two-Stage Computation</title>
      <p>Current NED approaches either map each mention
separately in a "context-free" manner, which cannot capture the
intricate coherences among entities, or they map all
mentions together in an expensive joint-mapping step. In practice,
however, the level of ambiguity of the mentions occurring
in a text snippet often varies substantially. Often, there
are mentions that are not very difficult to disambiguate
because only very few candidates (e.g., fewer than 4
candidates, based on our experiments) come into question for
them. We thus propose to organize the mapping algorithm
as a two-stage computation, as shown in Algorithm 2.</p>
      <p>In each stage of the disambiguation algorithm, we apply
the collective mapping algorithm described in the previous
subsection, using the seven feature functions from Section 3.
The fact that the context-related domains 𝒟′ are unknown in
the first stage does not affect the algorithm, because the
corresponding feature function f6 then simply returns 0. The
multi-phase computation not only helps to reduce the complexity
of the collective mapping algorithm, but it also contributes
to a better local similarity estimation for "highly ambiguous"
mentions (in the second stage) by updating the mention contexts,
such as the context-related domains or the already chosen
entities (from the first stage).</p>
    </sec>
    <sec id="sec-15">
      <title>EXPERIMENTS</title>
    </sec>
    <sec id="sec-16">
      <title>Setup</title>
      <p>Test corpora. We conducted experiments on four
different test corpora, with different characteristics:</p>
      <sec id="sec-16-1">
        <title>Algorithm 2 Multi-Phase Computation.</title>
        <p>Require: Sequence of mentions M′;
1: Initialize context-related domains 𝒟′ := ∅;
2: Find the sets Ei = Dict(Mi), Mi ∈ M′, of "easy mentions" such that |Ei| ≤ 4;
3: Jointly disambiguate the sets of candidate entities for these easy mentions using Algorithm 1, and let M″ ⊆ M′ denote the set of mentions disambiguated after the first stage;
4: for each domain D ∈ 𝒟 do
5:   if at least half of the entities chosen in the first stage relate to D then
6:     add D to 𝒟′;
7:   end if
8: end for
9: Recompute the mention-entity similarities;
10: Disambiguate M′ \ M″ using Algorithm 1;</p>
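        <p>The control flow of Algorithm 2 can be sketched as follows, assuming a disambiguation routine in the spirit of Algorithm 1 and a hypothetical entity_domains lookup; names and thresholds other than the "at most 4 candidates" rule are illustrative.</p>

```python
def two_stage_ned(candidates, disambiguate, entity_domains, all_domains):
    """Two-stage computation (sketch of Algorithm 2).

    candidates:     dict mapping each mention to a set of entities
    disambiguate:   routine taking (candidates, context_domains) and
                    returning a mention-to-entity mapping (Algorithm 1)
    entity_domains: dict mapping an entity to the domains it relates to
    """
    # Stage 1: mentions with at most 4 candidates are "easy";
    # the context-related domain set is still empty here.
    easy = {m: es for m, es in candidates.items()
            if len(es) in range(1, 5)}
    stage1 = disambiguate(easy, context_domains=set())

    # A domain qualifies as context domain if at least half of the
    # stage-1 entities relate to it.
    context = set()
    for d in all_domains:
        hits = sum(1 for e in stage1.values()
                   if d in entity_domains.get(e, ()))
        if stage1 and max(2 * hits, len(stage1)) == 2 * hits:
            context.add(d)

    # Stage 2: disambiguate the remaining mentions with the
    # richer, domain-aware context.
    hard = {m: es for m, es in candidates.items() if m not in easy}
    stage2 = disambiguate(hard, context_domains=context)
    return {**stage1, **stage2}
```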
        <p>
          CoNLL-YAGO: The CoNLL-YAGO testb corpus was
originally used in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. It is based on the CoNLL 2003
dataset, which consists of 231 news articles (4485
mentions) with an average article length of 216 words. The
corpus contains long-tail entities referred to by short mentions,
which makes it challenging for NED systems to achieve high
precision. For example, it is challenging to map the mention
"Japan" in the sentence "SOCCER- Japan get lucky win." to the
correct entity Japan_national_football_team rather than to the
prominent entity Japan.
        </p>
        <p>
          WP: The WP corpus [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] consisting of 2019 difficult
sentences (10318 mentions) is a specific sample of Wikipedia
articles, with emphasis on short contexts (52 words on
average), highly ambiguous mentions, and long-tail entities.
Consider, for example, a text mentioning the band AC/DC:
"Johnson, Young, Rudd, Young and Williams, live in
Tacoma, Washington, 31 August 2009."
Such short person names with hundreds of entity
candidates are hard for any current NED system to deal
with.
        </p>
        <p>Wiki-links: We randomly extracted a subset of the
Wikilinks dataset, consisting of 10665 annotated articles
(21643 mentions) with an average article length of 5344
words. These are web pages from a number of websites,
including blogspot, livejournal, etc. Note that in this
dataset, the annotated mentions are limited to href
anchor texts that point to a Wikipedia page. This way, each
article has only a few marked-up mentions (often only one or
two), although many additional entities are mentioned in
the text. Moreover, most annotated mentions refer to
prominent entities.</p>
        <p>Wikipedia articles: We randomly extracted 153 Wikipedia
featured articles, with a total of 19264 mentions and an
average article length of 9487 words. This corpus covers
a variety of topics such as football (the Arsenal_F.C page),
entertainment (the Batman page), people (the John_Calvin
page), etc.
For each of these four NED tasks, ground truth is given,
either by manual annotation or by the href links in Wikipedia
(which, of course, are not given to the NED methods). In
all cases, we consider only mentions that refer to individual
entities like people, organizations, places, events, etc. We
disregard text phrases that refer to general concepts such
as "peace", "harmony", "logic", "statistics", etc. In so-called
Wikification tasks, such as TAC KBP Entity Linking, all
Wikipedia articles are considered as targets, including
general concepts. For the Web of Linked Data and this paper's
emphasis on NED, this does not make sense; hence our focus
on individual entities. We implement this restriction by
considering only mentions whose ground-truth entities are
in both Wikipedia and YAGO (as YAGO does not include
general concepts of the above kind).</p>
        <p>Parameters. The parameters p1 ... p7 combining
AIDA-light's feature functions were manually tuned with the goal
of maximizing the precision of AIDA-light on the CoNLL-YAGO
testa training corpus. In addition, based on our experiments, we
chose a sliding window of 11 sentences (5 sentences before,
5 sentences after, and the sentence holding the mention) to
extract the context of a mention.</p>
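        <p>The sliding-window extraction can be sketched as follows (a toy sketch; sentence segmentation in the real system comes from the NER pipeline):</p>

```python
def context_window(sentences, idx, before=5, after=5):
    """Return the sentences inside the sliding window around the
    sentence at position idx: up to 5 before, up to 5 after,
    plus the sentence holding the mention itself."""
    start = max(0, idx - before)
    return sentences[start : idx + after + 1]
```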
        <p>Performance measures. Our measure for output
quality is the per-mention precision: the fraction of mentions
that are correctly mapped to the proper entities. When a
system chooses to abstain on a difficult mention by
reporting "null" although the proper entity is in the
knowledge base, we count this as a false mapping. Alternatively,
we could count precision only on the non-null entities, and
additionally consider the correctly mapped fraction of all
entities as recall. However, our notion of precision captures
both of these aspects in a single metric.</p>
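        <p>This notion of per-mention precision can be sketched as follows (a minimal illustration; abstentions are represented as None):</p>

```python
def per_mention_precision(predictions, ground_truth):
    """Fraction of mentions mapped to the proper entity.
    An abstention (None) on a mention whose entity is in the
    knowledge base counts as a false mapping."""
    correct = sum(1 for m, gold in ground_truth.items()
                  if predictions.get(m) == gold)
    return correct / len(ground_truth)
```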
        <p>Our measure for efficiency is the run-time, measured in
single-threaded mode with cold caches on a server with 8
(+8 in hyper-threading mode) Intel Xeon CPUs at 2.4GHz
and 48GB of memory.</p>
        <p>Systems under comparison. We compared AIDA-light
against the following state-of-the-art methods:</p>
        <p>
          AIDA [
          <xref ref-type="bibr" rid="ref12 ref14">14, 12</xref>
          ]: using rich features stored in a SQL database;
Spotlight [
          <xref ref-type="bibr" rid="ref17 ref5">17, 5</xref>
          ]: statistical version without using any
thresholds to get as many entities as possible.
        </p>
        <p>
          We did not include TagMe [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] in our comparison, as it uses
its own special method for mention recognition, so we could
not ensure that it would process exactly the same mentions
as the other methods. Besides, TagMe has been designed
for very short texts like tweets, a use case not considered
here.
        </p>
        <p>In the following, we first report on experiments with
different AIDA-light configurations, then on our comparison
against state-of-the-art baselines regarding output quality,
and finally on run-time measurements.</p>
      </sec>
    </sec>
    <sec id="sec-17">
      <title>AIDA-light in Different Configurations</title>
      <p>In order to study the influence of the different assets of
AIDA-light, we configured our system in different ways and
ran it on the CoNLL data, which is presumably the most
difficult one of our four test sets.</p>
      <p>One possible concern about our architecture is that the
multi-stage algorithm could suffer from early mistakes that
propagate through the processing pipeline. Our experiments showed
a precision of 96% for the "easy" mentions identified in stage 1,
so this potential concern was not an issue.</p>
      <p>A second concern was that using only the top k candidate
entities for each mention risks missing the
proper entity. We compared two different settings in this
regard:</p>
      <p>Top 5: Iterative updating of the context in our
multi-phase computation is deactivated. The local similarity is
computed without the context-related domains.
Top 5 with domain coherence: The local similarity is
updated based on the entities chosen for the "easy mentions";
that is, the entity-domain coherence is taken into account.
Table 1 shows the top-5 precision results for these two
settings. When the correct entity of a mention was among the
top-5 candidates, we count this stage as successful;
otherwise it is counted as false. The results demonstrate
that the context-related domains have a substantial benefit:
3% better precision. In general, the precision of 95.2% at
this first stage is very high, especially considering that the
CoNLL corpus contains very difficult texts (with short contexts,
long-tail entity names, metonyms, etc.).</p>
      <p>Finally, we report the results of AIDA-light on all
mentions of CoNLL, in the following three configurations:
KW: AIDA-light uses the baseline features, including a
bag of key tokens and surrounding mentions. The
collective algorithm is applied in a single round to the entire
mention set (deactivating the two-stage approach).
KW and easy mentions: The multi-phase computation
is activated. However, AIDA-light does not update the
context (e.g., the domain feature) after the first round.
KW, easy mentions, and domains: All components
of AIDA-light, including the multi-phase computation and the
domain features, are turned on.</p>
      <p>The results are shown in Table 2. Obviously, each
additional asset helps AIDA-light gain output quality. Most
notably, enabling the "easy mention" disambiguation stage
improves precision by almost 4%. Note that this does not come
at the expense of increased run-time.</p>
    </sec>
    <sec id="sec-18">
      <title>Comparisons on NED Quality</title>
      <p>
        In our experiments, Illinois Wikifier performed poorly, with
precision results around 70% or worse, probably because of
code changes (and perhaps resulting bugs) compared to the
version used in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] . To avoid improper comparisons, we
therefore discuss only the results of the remaining three
competitors: AIDA-light, AIDA, and Spotlight.
      </p>
      <p>Table 3 shows the precision results for the three systems.
AIDA-light outperforms the other two systems on CoNLL
and Wiki-links. In particular, a paired t-test shows that
AIDA-light is significantly better than Spotlight on both corpora
(p-value &lt; 0.01). Particularly on CoNLL, AIDA-light can
handle many difficult entities, such as football teams
mentioned in the form of city names, national sports teams in
the form of country names, etc. On Wiki-links, what yields
4% higher precision than the other systems is AIDA-light's
strong ability to handle mentions that differ from the official
names of entities (e.g., "Northwest Fur Company" for the
entity North_West_Company). AIDA-light is slightly worse
than AIDA on WP, but still much better than Spotlight
(20% higher precision). On Wikipedia articles, the best
system is AIDA (90.0%), followed by Spotlight (89.6%) and
AIDA-light (88.3%). However, there is no significant
difference among the three systems.</p>
      <p>Overall, AIDA-light consistently exhibits high output
quality, whereas Spotlight is inferior on very difficult input texts.
AIDA is usually close in performance to AIDA-light, but
clearly loses on Wiki-links because of the small number of
mentions per document for this test case.</p>
    </sec>
    <sec id="sec-19">
      <title>Comparisons on Run-Time Efficiency</title>
      <p>The original AIDA system uses a SQL database as a
backend. Therefore, it is an order of magnitude slower than
the systems that use customized in-memory data structures.
To avoid reporting apples vs. oranges, we exclude AIDA
from the following run-time comparisons, leaving us with
two competitors: AIDA-light and Spotlight.</p>
      <p>Table 4 shows the average run-times per document for the
four test corpora. Both systems are very fast. AIDA-light is
faster on CoNLL, WP, and Wiki-links, and slightly slower on
Wikipedia articles. The reason is that AIDA-light computes
coherence, including context coherence and type coherence,
over all entity pairs, and thus becomes slower when processing
large documents with hundreds of mentions.</p>
      <p>Table 5 shows the run-times on the Wiki-links corpus in
more detail. The reason for choosing Wiki-links is that
it is the largest one among our corpora with 10665
documents. The table gives 90% confidence intervals for the
per-document average run-time, as well as the standard
deviation. AIDA-light is much faster than Spotlight on average.
However, the two are almost the same at the 90% quantile,
that is, on the 10% longest or most difficult documents. The
reason is the increasing complexity of AIDA-light when the
number of mentions increases. Overall, however, AIDA-light
clearly outperformed its competitor on this dataset.</p>
    </sec>
    <sec id="sec-20">
      <title>CONCLUSION</title>
      <p>AIDA-light is a new NED system that reconciles high
output quality with fast run-time. Our experiments have shown
that AIDA-light consistently performs as well as or better
than state-of-the-art competitors, over a variety of corpora
including news (CoNLL), difficult and short contexts (WP),
Web pages (Wiki-links), and Wikipedia articles.</p>
      <p>By its two-stage algorithm, its simple but expressive
features including thematic domains, and its low footprint,
AIDA-light is geared for high-throughput usage at Web scale.
Our measurements showed that AIDA-light can process a
typical Web page (from the Wiki-links corpus) within 0.2
seconds on mid-range hardware. Its architecture easily
allows it to scale out the processing of a large corpus across
the nodes of a cluster or distributed platform. By its small
memory consumption, AIDA-light runs well even on low-end
nodes of such platforms. Thus, with a single-thread
throughput of 5 pages/second, we can process, for example, the
complete ClueWeb'12 collection (lemurproject.org/clueweb12/)
with ca. 700 million pages, in about one week (including
NER) on a mid-range cluster with 32 nodes and 16 cores per node.</p>
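      <p>A quick back-of-the-envelope check of this estimate, assuming ideal parallel speed-up and ignoring NER overhead:</p>

```python
PAGES = 700_000_000      # ca. 700 million ClueWeb'12 pages
PER_THREAD = 5           # pages per second, single-threaded
THREADS = 32 * 16        # 32 nodes with 16 cores each

seconds = PAGES / (PER_THREAD * THREADS)
days = seconds / 86_400
print(round(days, 1))    # pure NED time; NER overhead and imperfect
                         # scaling push the total toward one week
```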
      <p>Our future work includes further improvements of the
multi-stage algorithm, and especially a deeper integration
of the mention recognition (NER, currently implemented by
the CRF-based Stanford Tagger) with the NED method.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. Z.</given-names>
            <surname>Broder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Charikar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Frieze</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Mitzenmacher</surname>
          </string-name>
          .
          <article-title>Min-wise Independent Permutations</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bunescu</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pasca</surname>
          </string-name>
          .
          <article-title>Using Encyclopedic Knowledge for Named Entity Disambiguation</article-title>
          .
          <source>EACL</source>
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cornolti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ferragina</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciaramita</surname>
          </string-name>
          .
          <article-title>A Framework for Benchmarking Entity-annotation Systems</article-title>
          .
          <source>WWW</source>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cucerzan</surname>
          </string-name>
          .
          <article-title>Large-scale Named Entity Disambiguation Based on Wikipedia Data</article-title>
          . EMNLP-CoNLL
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Daiber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hokamp</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          .
          <article-title>Improving Efficiency and Accuracy in Multilingual Entity Extraction</article-title>
          .
          <source>I-SEMANTICS</source>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          .
          <article-title>WordNet: An Electronic Lexical Database</article-title>
          . MIT Press 1998.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ferragina</surname>
          </string-name>
          and
          <string-name>
            <given-names>U.</given-names>
            <surname>Scaiella</surname>
          </string-name>
          . TagMe:
          <article-title>On-the-fly Annotation of Short Text Fragments (by Wikipedia Entities)</article-title>
          .
          <source>CIKM</source>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Finkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Grenager</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling</article-title>
          .
          <source>ACL</source>
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gionis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Indyk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Motwani</surname>
          </string-name>
          .
          <article-title>Similarity Search in High Dimensions via Hashing</article-title>
          .
          <source>VLDB</source>
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Han</surname>
          </string-name>
          and
          <string-name>
            <given-names>J</given-names>
            .
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>Named Entity Disambiguation by Leveraging Wikipedia Semantic Knowledge</article-title>
          .
          <source>CIKM</source>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Linked Data: Evolving the Web into a Global Data Space</article-title>
          .
          <source>Morgan Claypool</source>
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffart</surname>
          </string-name>
          , S. Seufert,
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Theobald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          . KORE:
          <article-title>Keyphrase Overlap Relatedness for Entity Disambiguation</article-title>
          .
          <source>CIKM</source>
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Berberich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lewis-Kelham</surname>
          </string-name>
          , G. de Melo, and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>YAGO2: Exploring and Querying World Knowledge in Time, Space, Context, and Many Languages</article-title>
          .
          <source>WWW</source>
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Yosef</surname>
          </string-name>
          , I. Bordino, H. Furstenau, M. Pinkal,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spaniol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Taneva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thater</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>Robust Disambiguation of Named Entities in Text</article-title>
          .
          <source>EMNLP</source>
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Indyk</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Motwani</surname>
          </string-name>
          . Approximate Nearest Neighbors:
          <article-title>Towards Removing the Curse of Dimensionality</article-title>
          .
          <source>STOC</source>
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kulkarni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          .
          <article-title>Collective Annotation of Wikipedia Entities in Web Text</article-title>
          .
          <source>KDD</source>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García-Silva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>DBpedia Spotlight: Shedding Light on the Web of Documents</article-title>
          .
          <source>I-SEMANTICS</source>
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Milne</surname>
          </string-name>
          and
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <article-title>Learning to Link with Wikipedia</article-title>
          .
          <source>CIKM</source>
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ratinov</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <article-title>Design Challenges and Misconceptions in Named Entity Recognition</article-title>
          .
          <source>CoNLL</source>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ratinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Anderson</surname>
          </string-name>
          .
          <article-title>Local and Global Algorithms for Disambiguation to Wikipedia</article-title>
          .
          <source>HLT</source>
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sozio</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Gionis</surname>
          </string-name>
          .
          <article-title>The Community-search Problem and How to Plan a Successful Cocktail Party</article-title>
          .
          <source>KDD</source>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>