            GEEK: Incremental Graph-based Entity Disambiguation
Alexios Mandalios, National Technical University of Athens, Athens, Greece (amandalios@islab.ntua.gr)
Konstantinos Tzamaloukas, National Technical University of Athens, Athens, Greece (kot@dblab.ece.ntua.gr)
Alexandros Chortaras, National Technical University of Athens, Athens, Greece (achort@cs.ntua.gr)
Giorgos Stamou, National Technical University of Athens, Athens, Greece (gstam@cs.ntua.gr)

ABSTRACT
A document may include mentions of people, locations, organizations, films, product brands and other kinds of entities. Such mentions are often ambiguous, with no obvious way for a machine to map them to real world entities, due to reasons like homonymy and polysemy. The process of recognizing such mentions in unstructured texts and disambiguating them by mapping them to entities stored in a knowledge base is known as Named Entity Recognition and Disambiguation (NERD) or Entity Linking.
   In this paper, we introduce GEEK (Graphical Entity Extraction Kit), a NERD system that extracts named entities in text and links them to a knowledge base using a graph-based method, taking into account measures of entity commonness, relatedness, and contextual similarity. All relevant data is retrieved at runtime using public RESTful APIs. GEEK tries to push the performance limits of a straightforward disambiguation method that does not require arduous training or a complex mathematical foundation.

CCS CONCEPTS
• Information systems → Entity resolution; Information extraction; • Computing methodologies → Natural language processing; • Mathematics of computing → Approximation algorithms;

KEYWORDS
Named Entity Recognition, NER, Named Entity Disambiguation, NED, NERD, Google Knowledge Graph, Wikipedia, k-partite graph, max weight k-clique, worst out heuristic

1   INTRODUCTION
Most pieces of text one can find online are to a large extent unstructured and unlabeled. Extracting the mapping between named entities appearing in a text and real world objects stored in a knowledge base contributes a great deal towards understanding the unprocessed written word. As an example, consider the sentence: "Arizona and Oklahoma are two of the ships that sank in Pearl Harbor during the events of World War II." A human reader, even one that is not familiar with 20th century history, can easily deduce the mapping of Figure 1. However, if we try to reproduce the same results using automated methods, then considerable effort is required in order to overcome the inherent ambiguity of natural language.
   Even though NERD is a challenging NLP task, it has been extensively addressed in past research [22], as it is a crucial component for any system hoping to grasp the essence of natural language. The fact that NERD moderates (or ideally eliminates) the equivocal nature of natural language becomes evident even from our simple example of Figure 1. Knowing that a document contains references to two battleships, a military strike, and a war makes it easy to extract its topics and determine its position in an extended document collection, with minimal further processing.
   This paper introduces GEEK (Graphical Entity Extraction Kit), a pipeline of tools and methods to perform NERD. It relies on Stanford CoreNLP for the named entity mention extraction task, and on Google's Knowledge Graph and Wikipedia APIs for collecting information on the extracted entities. The collected information is then mapped onto a graph, thus transforming the task of entity linking into a graph problem, which is easier to handle. The resulting graph problem is solved using a heuristic method, which incrementally refines the candidate entities. The entire pipeline is evaluated against established NERD systems on the GERBIL framework. GEEK is found to be competitive with these systems on the NERD task, suggesting that it is a potent alternative to methods having a more complex analytical foundation.
   The rest of the paper is structured as follows: Section 2 models the NERD process in general terms. In Section 3 we start the presentation of the proposed system by discussing how it performs the named entity recognition step, and in Section 4 we discuss the core of GEEK, the disambiguation step. Section 5 presents an extensive evaluation of the proposed system by comparing it to other state-of-the-art systems. Finally, Section 6 discusses related work, and Section 7 concludes the paper.
2   NERD MODELING
In this section we model NERD in a simple way that may serve as a general framework for NERD algorithms. We assume that we have a knowledge base KB, which contains a mapping between entities of the world and Unique Resource Identifiers (URIs), which are concise and unambiguous. We also assume that there exists a set N of all names that can be used in natural language texts to denote said entities in KB.
According to our natural language conventions, there is a mapping f between the entities in KB and those subsets of N that can be used to refer to them:

   f : KB → 2^N

This means that every entity in KB can be referred to in texts using only a specific set of names. If we model a text T as a simple string consisting of characters, then the appearance of a name n ∈ N in T means that there is a possible mention of the corresponding KB entity. The process of finding named entities in unstructured texts is called named entity recognition (NER), and can be described as finding substrings of T that map to any name n ∈ N:

   NER : T ↦ M

where

   M = {m | m is a substring of T and ∃ URI ∈ KB s.t. m ∈ f(URI)}

The process of mapping the named entities in M to specific URIs of KB is called named entity disambiguation (NED). NED is required because the same name may be used for different entities, i.e. there may be distinct URIs e1, e2 s.t. f(e1) ∩ f(e2) ≠ ∅. The first step for disambiguating a mention m is to generate an appropriate candidate entity set for m, denoted as E_m. This is defined as the set of individuals in KB that can be referred to as m:

   E_m = {URI ∈ KB | m ∈ f(URI)}

After we generate candidate entity sets for all named entities in a text T, NED selects the most appropriate entity from each of those sets:

   NED : m ↦ e ∈ E_m, for each m ∈ M

To sum up, NER identifies individual names in a text T and NED maps those names to the most appropriate candidates for them in KB. Clearly, to be more efficient, NED could disambiguate the entire M jointly. The process of named entity recognition and disambiguation (NERD) combines NER and NED:

   NERD : T --NER--> M --NED--> E

where E is the set of URIs finally selected by NED.
   Consider the text T = "Arizona and Oklahoma are two of the ships that sank in Pearl Harbor during the events of World War II" of Figure 1. NER gives us the set of names M = {Arizona, Oklahoma, Pearl Harbor, World War II}. Next, we use Google Knowledge Graph (GKG) as knowledge base KB and apply NED to map these mentions to URIs in KB. The problem is the ambiguity of mentions. For example:

   E_Arizona = {/m/0vmt, /m/019r32, /m/06rxnl, ...}

In particular, "Arizona" could refer to the American state of Arizona^1, a Pennsylvania-class battleship named Arizona^2, and the beverage manufacturing company known as Arizona^3, among other candidates. NED needs to identify the correct choice for the specific text T, one that abides by human understanding of the written word:

   Arizona ↦ /m/019r32
   Oklahoma ↦ /m/01b8zk
   Pearl Harbor ↦ /m/0gc1_
   World War II ↦ /m/081pw

This mapping is also illustrated in Figure 1.

[Figure 1: Named entity disambiguation in a short text. Example text: "Arizona and Oklahoma are two of the ships that sank in Pearl Harbor during the events of World War II."]

1 https://g.co/kg/m/0vmt
2 https://g.co/kg/m/019r32
3 https://g.co/kg/m/06rxnl
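To make the notation concrete, the following minimal Python sketch spells out the objects defined above for the text of Figure 1. The tiny in-memory KB and the name sets attached to each URI are illustrative assumptions, not part of GEEK.

# A toy knowledge base: URI -> f(URI), the set of surface names for that entity.
KB = {
    "/m/0vmt":   {"Arizona", "State of Arizona"},            # the U.S. state
    "/m/019r32": {"Arizona", "USS Arizona"},                  # the Pennsylvania-class battleship
    "/m/06rxnl": {"Arizona", "Arizona Beverage Company"},     # the beverage company
    "/m/01b8zk": {"Oklahoma", "USS Oklahoma"},                # the battleship
    "/m/0gc1_":  {"Pearl Harbor", "Attack on Pearl Harbor"},  # the military strike
    "/m/081pw":  {"World War II", "WWII"},                    # the war
}

def candidate_set(mention):
    """E_m = {URI in KB | mention is one of the names in f(URI)}."""
    return {uri for uri, names in KB.items() if mention in names}

# NER output for the example text T of Figure 1.
M = ["Arizona", "Oklahoma", "Pearl Harbor", "World War II"]

# NED must pick exactly one URI out of each candidate set, e.g. the mapping of Figure 1:
gold = {"Arizona": "/m/019r32", "Oklahoma": "/m/01b8zk",
        "Pearl Harbor": "/m/0gc1_", "World War II": "/m/081pw"}

for m in M:
    print(m, "->", sorted(candidate_set(m)))   # "Arizona" alone has three candidates here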
3   NAMED ENTITY RECOGNITION
As the above-presented model suggests, the first step in any NERD pipeline is the identification of the named entity mention set M in a given text T. This step is important, since the discovery of exactly those noun phrases that denote named entities directly affects the quality of the final disambiguation results. For the NER part of GEEK we use Stanford CoreNLP^4 [15], a constantly evolving NLP toolkit. In particular, we use its Entity Mentions Annotator, which analyzes a text T and outputs a list of named entity mentions in T that can be used as an approximation of M. Stanford CoreNLP offers three named entity recognition models, developed using different training data:
   • 3 class model, which discovers entities of type Location, Person, or Organization.
   • 4 class model, which discovers entities of type Location, Person, Organization, or Misc, so that there is a category for entities that don't match the first three (miscellaneous).
   • 7 class model, which discovers entities of type Location, Person, Organization, Money, Percent, Date, or Time.
Experimentation with the above CoreNLP models revealed that, while the 3 class model is able to capture most relevant entities, in some cases it can be complemented by the 4 class and 7 class models. Moreover, the 3 class model is relatively better at detecting the full span of multi-word named entities. Thus, aiming for maximum named entity recognition coverage, we combine Stanford CoreNLP's models using the following three step procedure (a code sketch of the combination is given at the end of this section):
   (1) We extract named entities using Stanford CoreNLP's 3 class model.
   (2) We extract named entities using Stanford CoreNLP's 4 class model, but keep only those entities that don't overlap with the ones we got from the first step.
   (3) We extract named entities using Stanford CoreNLP's 7 class model, but keep only those entities that don't overlap with the ones we got from the first two steps. Furthermore, we reject any entities that have types of Money, Percent, Date, or Time, as we aren't interested in quantitative or temporal entities.
   We should note that the type Stanford CoreNLP provides for each detected named entity may not always be correct (people may be recognized as locations, locations as organizations, and so on). For this reason, we use Stanford CoreNLP only to identify the named entity mentions in the text, and we do not use the provided type information in the disambiguation process.

4 https://stanfordnlp.github.io/CoreNLP/
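As an illustration of the three step combination, the sketch below runs the three CRF models against a CoreNLP server assumed to be listening on localhost:9000 and keeps only non-overlapping spans. The model paths, the overlap test, and the uniform type filter are our own simplification of the procedure, not GEEK's actual implementation.

import json
import requests

CORENLP = "http://localhost:9000"   # assumed locally running Stanford CoreNLP server
MODELS = [                          # 3, 4 and 7 class CRF models shipped with CoreNLP
    "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz",
    "edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz",
    "edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz",
]
REJECTED_TYPES = {"MONEY", "PERCENT", "DATE", "TIME"}  # quantitative/temporal types (step 3)

def entity_mentions(text, model):
    """Run the Entity Mentions Annotator with a specific NER model and return the spans."""
    props = {"annotators": "tokenize,ssplit,pos,lemma,ner,entitymentions",
             "ner.model": model, "outputFormat": "json"}
    reply = requests.post(CORENLP, params={"properties": json.dumps(props)},
                          data=text.encode("utf-8")).json()
    return [(m["characterOffsetBegin"], m["characterOffsetEnd"], m["text"], m["ner"])
            for sentence in reply["sentences"] for m in sentence["entitymentions"]]

def overlaps(span, accepted):
    begin, end = span
    return any(not (end <= b or begin >= e) for b, e, _, _ in accepted)

def recognize(text):
    """Combine the 3, 4 and 7 class models, keeping earlier (3 class) spans on overlap."""
    accepted = []
    for model in MODELS:
        for begin, end, surface, label in entity_mentions(text, model):
            if label in REJECTED_TYPES or overlaps((begin, end), accepted):
                continue
            accepted.append((begin, end, surface, label))
    return sorted(accepted)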
4   NAMED ENTITY DISAMBIGUATION

4.1   Candidate Entity Generation
After discovering the set M of named entities in text T, the next step is to find the best mapping between those named entities and the canonical entities that reside in KB. To that end, for each m ∈ M we generate the set of candidate entities E_m that contains only those entities that one could refer to by saying m, using Google's Knowledge Graph (GKG)^5 as our KB. Technically, we achieve this by using Google's Knowledge Graph Search API (GKG API)^6, Google's replacement for their deprecated Freebase^7. Using GKG API, one can get a ranked list of GKG entities related to a given query string. In particular, for each matching entity, GKG API returns its names, the corresponding Wikipedia^8 information, its schema.org^9 types, etc. In our case, we use the API to retrieve GKG entities that match a certain named entity mention. For example, querying for "Arizona" fetches a number of entities that match that string, like the state of Arizona, the Pennsylvania-class battleship, the beverage company, and so on.
   Unfortunately, not all entities this method yields belong to the set of candidate entities E_m for a mention m. For example, it is not unusual for GKG API to return an entity that is missing fields critical for our disambiguation framework, such as the corresponding Wikipedia information. Entities missing such important response fields are deemed useless for our disambiguation pipeline, and are immediately rejected.
   Another inconvenience is that GKG API is a general purpose tool that has not been specifically designed to serve as a generator of candidate entities to be used for NED. For example, when we query for "Arizona", GKG API returns, among other entities, Phoenix, the capital of the state of Arizona. It is pretty clear what happens here: GKG API returns not only those entities that could be referred to by saying "Arizona", but also entities closely related to "Arizona". That may make perfect sense for some applications, but in our case it poses a serious problem. Given that for a mention m we want E_m to be comprised of only those entities that we can refer to by saying m, we shouldn't allow Phoenix to be a candidate entity for "Arizona", as no one would use the name of a state to refer to its capital city. In order to mitigate the effect of this problem, we turn to Wikipedia for help. We try to make an educated decision about whether or not an entity can be referred to as m by querying Wikipedia's disambiguation pages and redirects. This is achieved by consulting Wikipedia's disambiguation pages for candidate entities and Wikipedia's redirect pages for possible aliases of entities, and giving priority to those entities returned by GKG API that are also returned by Wikipedia (as GKG API returns each entity's corresponding Wikipedia article). We fetch the required information using services provided by the MediaWiki API^10. We access the disambiguation pages and the redirects via the MediaWiki Links API^11 and the MediaWiki Redirects API^12 respectively.
   By combining the aforementioned methods, we manage to construct an approximation of a mention's set of candidate entities. Building the sets of candidate entities properly is crucial for a successful disambiguation system [7]: on one hand, including too few entities in a candidate entity set could lead to the exclusion of the appropriate entity; on the other hand, polluting a candidate entity set with too many entities could lead the disambiguation process astray.
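The sketch below is one possible reading of this candidate generation step, using the public Knowledge Graph Search API and the MediaWiki API. The disambiguation-page title pattern, the API key handling, and the prioritization rule are assumptions made for illustration rather than GEEK's exact logic.

import requests

GKG_API = "https://kgsearch.googleapis.com/v1/entities:search"
WIKI_API = "https://en.wikipedia.org/w/api.php"
API_KEY = "..."   # a Google API key is required; left elided here

def gkg_candidates(mention, limit=100):
    """Entities GKG API returns for a mention, dropping those without Wikipedia information."""
    reply = requests.get(GKG_API, params={"query": mention, "key": API_KEY,
                                          "limit": limit, "languages": "en"}).json()
    candidates = []
    for element in reply.get("itemListElement", []):
        result = element["result"]
        wiki = result.get("detailedDescription", {})
        if not wiki.get("url"):            # missing Wikipedia fields: reject immediately
            continue
        candidates.append({"id": result["@id"], "name": result.get("name", ""),
                           "wiki_url": wiki["url"], "articleBody": wiki.get("articleBody", ""),
                           "resultScore": element.get("resultScore", 0.0)})
    return candidates

def wiki_title(candidate):
    return candidate["wiki_url"].rsplit("/", 1)[-1].replace("_", " ")

def disambiguation_links(mention):
    """Article titles listed on the mention's disambiguation page (MediaWiki Links API)."""
    params = {"action": "query", "format": "json", "prop": "links", "pllimit": "max",
              "titles": f"{mention} (disambiguation)"}   # assumed title pattern
    pages = requests.get(WIKI_API, params=params).json()["query"]["pages"]
    return {link["title"] for page in pages.values() for link in page.get("links", [])}

def redirect_aliases(title):
    """Surface forms that redirect to the given article (MediaWiki Redirects API)."""
    params = {"action": "query", "format": "json", "prop": "redirects",
              "rdlimit": "max", "titles": title}
    pages = requests.get(WIKI_API, params=params).json()["query"]["pages"]
    return {r["title"] for page in pages.values() for r in page.get("redirects", [])}

def candidate_entities(mention):
    """Prioritize GKG entities that Wikipedia also associates with the mention."""
    candidates = gkg_candidates(mention)
    allowed = disambiguation_links(mention)
    preferred = [c for c in candidates
                 if wiki_title(c) in allowed or mention in redirect_aliases(wiki_title(c))]
    return preferred or candidates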
4.2   NED measures
After building the set of candidate entities E_m for a named entity mention m, we need to select a concrete entity e ∈ E_m to realize the mapping m ↦ e. As stated above, our goal is to try to make the same mapping a human reader would make. In order to do that, GEEK employs three measures akin to human thinking patterns.

   4.2.1 GKG resultScore. When we query GKG API with a string m, each returned entity is accompanied by a resultScore field. This is a positive real number that indicates how good a match the returned entity is for the given request. In our framework, this value is used as an alternative measure of an entity's popularity prior (also known as prior probability or commonness). Popularity prior is a measure of the conditional probability of one referring to an entity e, given the mention string m:

   Popularity Prior[e | m] = P[m ↦ e | mention m appears in text]

   This conditional probability can be approximated in various ways, most prominently using Wikipedia and its links to deduce how often each surface form is used to refer to a specific entity in the encyclopedia. Popularity prior's usefulness as a disambiguation measure can be explained using what is seemingly a tautology: a mention m usually refers to the entity e that is referred to by m most of the time. Nevertheless, it is one of the most widely used measures in NED systems [6, 12–14, 17, 21, 23–25]. Before we can use resultScore, we normalize it for all entities in each set of candidate entities E_m. As a result, in each E_m, the most popular entity will have a resultScore of 1, and all the rest will have 0 < resultScore < 1. For example, the most popular entity for m = Arizona seems to be the state of Arizona, as expected.
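A sketch of this per-mention normalization, continuing the candidate dictionaries of the previous sketch (the field names are ours, not GEEK's):

def normalize_result_scores(candidates):
    """Scale resultScore within one candidate set E_m so that the top entity gets exactly 1."""
    if not candidates:
        return candidates
    top = max(c["resultScore"] for c in candidates)
    for c in candidates:
        c["rs"] = c["resultScore"] / top if top else 0.0
    return candidates

# e.g. normalize_result_scores(candidate_entities("Arizona")) should assign rs = 1.0
# to the state of Arizona and smaller values to the other candidates.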
5 https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html
6 https://developers.google.com/knowledge-graph/
7 https://developers.google.com/freebase/
8 https://en.wikipedia.org/wiki/Main_Page
9 http://schema.org/
10 https://www.mediawiki.org/wiki/API:Main_page
11 https://www.mediawiki.org/wiki/API:Links
12 https://www.mediawiki.org/wiki/API:Redirects
   4.2.2 Document Similarity. In prior work [13], the first few sentences of an entity's Wikipedia article have been used to extract the most relevant terms for the entity. We follow a similar approach. For every entity e returned by GKG API, the articleBody field contains the first few words of the corresponding Wikipedia article that describes the entity, denoted as T_e. Comparing the entire text T as a bag of words with those short descriptions can help us discover which entities are appropriate for the given context. The steps are:
   (1) Tokenize T and T_e, remove stopwords, and stem the remaining tokens.
   (2) Search for T's tokens in T_e using fuzzy string matching, for increased flexibility.
   (3) Calculate a document similarity measure using the formula log(1 + |T ∩ T_e|) / log(1 + |T|). The logarithms serve to make sure we don't need a high overlap of used words in T and T_e to achieve a high value. Also, the returned value is by definition normalized.
   Simply comparing the words contained in two documents has the potential to guide the disambiguation process. For example, returning to Figure 1, the word "ship" would help a human understand that Arizona and Oklahoma are ships, and not states. The same goes for a system that uses this measure to calculate document similarity, as the word "ship", as well as several of its derivatives, appear multiple times in the respective articleBody fields.
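The three steps can be sketched as follows. NLTK's tokenizer, stopword list and Porter stemmer, and difflib's fuzzy matching are stand-ins for components the paper does not pin down, and interpreting |T| as the number of distinct processed tokens is our assumption.

import math
from difflib import get_close_matches
from nltk.corpus import stopwords          # requires the usual NLTK data downloads
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

STOPWORDS = set(stopwords.words("english"))
stem = PorterStemmer().stem

def preprocess(text):
    """Step 1: tokenize, drop stopwords, stem."""
    return [stem(t.lower()) for t in word_tokenize(text)
            if t.isalnum() and t.lower() not in STOPWORDS]

def document_similarity(T, Te):
    """Steps 2 and 3: fuzzy-match T's tokens against Te, score log(1+|T∩Te|)/log(1+|T|)."""
    t_tokens, te_tokens = set(preprocess(T)), preprocess(Te)
    if not t_tokens:
        return 0.0
    overlap = sum(1 for token in t_tokens
                  if get_close_matches(token, te_tokens, n=1, cutoff=0.85))
    return math.log(1 + overlap) / math.log(1 + len(t_tokens))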
   4.2.3 Entity Relatedness. In the literature, it is common to assume that a text contains a few coherent topics [3, 4, 6, 8, 9, 12, 13, 21, 23–25, 27], so its entities are semantically related. This sort of "semantic locality" allows for joint or collective methods of disambiguation. These methods process all candidate entity sets of a document at the same time, and aim to select those entities (one from each set) that demonstrate maximum semantic relatedness. In most cases, Wikipedia's link structure is used as a way to calculate semantic relatedness between entities. Specifically, the more incoming links are shared between the Wikipedia articles describing two entities, the more semantically similar those entities are assumed to be. Most systems [4, 9, 11, 13, 14, 21, 24, 25] utilize an efficient derivative of the Normalized Google Distance [2] suggested by Milne and Witten [16], known as the Wikipedia Link-based Measure (WLM). One can easily gather an article's incoming links using the MediaWiki Linkshere API^13. If IN_1 is the set of incoming links for e_1's Wikipedia article, IN_2 is the set of incoming links for e_2's Wikipedia article, and WP is the set of articles in Wikipedia, then:

   WLM(e_1, e_2) = 1 − [log(max(|IN_1|, |IN_2|)) − log(|IN_1 ∩ IN_2|)] / [log(|WP|) − log(min(|IN_1|, |IN_2|))]

Returning to Figure 1, it is clear why relatedness is so important. It is much easier to disambiguate Arizona and Oklahoma as battleships, given that Pearl Harbor is the military strike, and, on the same note, it is pretty straightforward to disambiguate Pearl Harbor as the military strike, given that the text talks about World War II.

13 https://www.mediawiki.org/wiki/API:Linkshere
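A sketch of WLM over incoming-link sets fetched through the Linkshere API; the article-count constant used for |WP| and the handling of zero overlap are assumptions of this sketch.

import math
import requests

WIKI_API = "https://en.wikipedia.org/w/api.php"
WP_SIZE = 5_500_000   # rough count of English Wikipedia articles, used as |WP|

def incoming_links(title):
    """Titles of the articles that link to `title` (MediaWiki Linkshere API, with continuation)."""
    links, params = set(), {"action": "query", "format": "json", "prop": "linkshere",
                            "lhlimit": "max", "lhnamespace": 0, "titles": title}
    while True:
        reply = requests.get(WIKI_API, params=params).json()
        for page in reply["query"]["pages"].values():
            links |= {l["title"] for l in page.get("linkshere", [])}
        if "continue" not in reply:
            return links
        params.update(reply["continue"])

def wlm(in1, in2):
    """Wikipedia Link-based Measure of Milne and Witten over two incoming-link sets."""
    common = len(in1 & in2)
    if not in1 or not in2 or common == 0:
        return 0.0
    numerator = math.log(max(len(in1), len(in2))) - math.log(common)
    denominator = math.log(WP_SIZE) - math.log(min(len(in1), len(in2)))
    return max(0.0, 1 - numerator / denominator)

# e.g. wlm(incoming_links("USS Arizona (BB-39)"), incoming_links("Attack on Pearl Harbor"))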
4.3   Building the candidate entity graph
Previously, we touched on the positive correlation that exists between the proposed NED measures and a disambiguation that seems natural to a human reader. However, we need to find a way to combine those measures in a sensible way, as, more often than not, each of them favors a different disambiguation decision. For example, in the text shown in Figure 1, the GKG resultScore measure indicates that Arizona and Oklahoma are states, while document similarity and entity relatedness indicate that they are, in fact, ships.
   We combine those three measures on a graph G, the cornerstone of GEEK, which we call the candidate entity graph. Given a text T that contains k named entity mentions M = {m_1, m_2, ..., m_k}, we generate candidate entity sets E_1, E_2, ..., E_k. Then, for each candidate entity set E_i = {e_i1, ..., e_in_i}, where n_i = |E_i|, we add to G the nodes e_i1, ..., e_in_i, where each node e_ij corresponds to the j-th candidate entity for mention m_i. We complete the construction of G by connecting each node e_ij to each node e_uv, where i ≠ u, with an undirected weighted edge. The edge's weight is calculated as a linear combination of the three NED measures introduced above, applied for both of the candidate entities e_ij (j-th candidate entity for mention m_i) and e_uv (v-th candidate for mention m_u):

   weight(e_ij, e_uv) = a · ( [b · rs(e_ij) + (1 − b) · sim(T, T_e_ij)] / 2 + [b · rs(e_uv) + (1 − b) · sim(T, T_e_uv)] / 2 ) + (1 − a) · WLM(e_ij, e_uv)

   where rs ≡ normalized GKG resultScore, sim ≡ binary document similarity, and 0 ≤ a, b ≤ 1.

The candidate entity graph G is undirected, weighted, complete, and k-partite. That means G's nodes are distributed among k independent sets. If two nodes belong to the same independent set, then they are not adjacent. If they belong to different independent sets, then they are adjacent and connected by an undirected weighted edge. The idea behind this is simple: there is no point in connecting two entities that are candidates for the same named entity mention m_i, as they are mutually exclusive, that is, if e_ij, e_il ∈ E_i, then

   (m_i ↦ e_ij) ⟹ ¬(m_i ↦ e_il)   and   (m_i ↦ e_il) ⟹ ¬(m_i ↦ e_ij)

The parameters a and b serve to add two degrees of freedom to the way we prioritize NED measures. In particular:
   • a determines how much we value the matching between a mention's string (calculated as GKG resultScore) as well as the complete text's string (calculated by document similarity) and the candidate entity's attributes, versus how much we value WLM.
   • b determines how much we value GKG's resultScore as a degree of entity commonness, versus how much we value term-overlap document similarity.
Given that all NED measures are normalized (their values range from 0 to 1), it follows from the way we calculate the weights of G's edges that those weights are also normalized.
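Putting the three measures together, the construction of G can be sketched as follows. networkx is our choice of graph library here, and document_similarity, wlm and the normalized rs field come from the earlier sketches; none of this is GEEK's actual code.

from itertools import combinations
import networkx as nx

def build_candidate_graph(T, candidate_sets, a=0.5, b=0.85):
    """candidate_sets: one list per mention; each candidate is a dict carrying
    'rs' (normalized resultScore), 'articleBody' and 'inlinks' (incoming link titles)."""
    def local(e):   # the per-entity half of the a-weighted term
        return (b * e["rs"] + (1 - b) * document_similarity(T, e["articleBody"])) / 2

    G = nx.Graph()
    for i, Ei in enumerate(candidate_sets):
        for j, e in enumerate(Ei):
            G.add_node((i, j), **e)          # node (i, j) stands for e_ij
    for (i, Ei), (u, Eu) in combinations(list(enumerate(candidate_sets)), 2):
        for j, e_ij in enumerate(Ei):
            for v, e_uv in enumerate(Eu):
                w = a * (local(e_ij) + local(e_uv)) + (1 - a) * wlm(e_ij["inlinks"], e_uv["inlinks"])
                G.add_edge((i, j), (u, v), weight=w)
    return G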
4.4   Solving the candidate entity graph
Building the candidate entity graph G is an important step towards disambiguating the named entities found in text T. That is because we transform an unstructured, hard to process piece of text into a well-defined, abstract data structure we can work with.
Of course, building G is not the end of the story. We need to find a way to map G to a disambiguation for T's entities. This mapping represents GEEK's main contribution when compared to the literature's graph-based NERD solutions: a straightforward subgraph extraction method that incrementally eliminates unsuitable candidate entities. This leads to a sequence of candidate entity graphs G(1), G(2), G(3), ..., where each G(x+1) better approximates the correct disambiguation result compared to G(x).
   Given that G is comprised of k independent sets of nodes, each set E_i containing candidate entities for named entity mention m_i, it is clear that we need to select exactly one node e_ij from each independent set E_i. This process is equivalent to mapping each named entity mention to exactly one of its candidate entities. However, what is not clear is the optimal way to perform said mapping. In our framework, we assume that the correct disambiguation lies in G's maximum weight k-clique, denoted as G*. G* is the induced subgraph of G that contains exactly one node from each independent set E_i, such that the total weight of the connecting edges is maximized. Hence, we face the problem of finding a maximum weight k-clique in an undirected, weighted, complete, and k-partite graph. Such combinatorial optimization problems have been tackled by NERD frameworks in the past, and have been proved to be NP-hard [12–14, 23]. This is pretty clear on an intuitive level, as we have to choose one entity from each of k candidate entity sets E_i, 1 ≤ i ≤ k. If each E_i contains n candidate entities on average, then our choices for the disambiguation are exponentially many: n^k. Consequently, we cannot hope to create an exact algorithm to find G*, given reasonable runtime and resource consumption constraints.
   In this context, we need to come up with an approximation for G*. To this end, we formulate a heuristic method that is tailored to our problem specifications. Our goal isn't to find a general purpose solution for the maximum weight k-clique optimization problem, but only a way to solve the problem, given that it arises from a semantically coherent text and its candidate entities. The method we suggest is based on the fact that a lot of candidate entities seem outlandish for a specific textual context. Thus, finding these out of place entities is the key to our disambiguation algorithm. For example, in the text of Figure 1, the candidate entity Arizona Beverage Company seems to be a bad choice for the disambiguation of the mention "Arizona", as suggested by all three NED measures. Indeed, the Arizona Beverage Company isn't usually what one refers to when one says "Arizona" (low commonness), the text doesn't contain any terms that would suggest it's talking about this company (low document similarity), and, finally, the beverage company isn't related to the other entities in the text (low entity relatedness).
   In order to identify those alien entities in the candidate entity space, we consider for each candidate entity e_ij the disambiguation in which e_ij has the maximum weight contribution. For each candidate entity e_ij, we define its maximum contribution graph G_e_ij as the G* candidate in which node e_ij has the maximum possible weight of incident edges. Calculating the G_e_ij graphs for all candidate entities is straightforward, as it only requires visiting e_ij's neighbors in G. Each G_e_ij graph suggests a disambiguation for all entities in T, and those graphs can be used to identify the alien entities discussed above. To elaborate further, when we construct G_e_ij for a candidate entity e_ij, it's like we are forcing e_ij into the given context, in order to see what happens with our objective function, which is the resulting graph's total weight. We hypothesize that there are two cases:
   • e_ij is the correct disambiguation choice for mention m_i, which will be reflected in G_e_ij's increased total weight, as the NED measures' values will be high.
   • e_ij is not the correct disambiguation choice for mention m_i, which will be reflected in G_e_ij's decreased total weight, as the NED measures' values will be low.
Aiming for an incremental disambiguation process that eliminates said alien entities, we developed a worst out heuristic method, which removes the entity that seems most unfitting in each step, until we achieve disambiguation of all entities. This is more effective than a best in heuristic, as it is easier to identify the many entities that don't fit, rather than those few entities that are correct. An outline of this method is presented in Algorithm 1. An example of building the maximum contribution graph for a candidate entity is presented in Figure 2. It's important to note that after each removal step, we need to recalculate the maximum contribution graphs for our candidate entities, as demonstrated in Figure 3. Solving the candidate entity graph is the final step in the disambiguation process, and the entire pipeline is illustrated by a flow diagram in Figure 4.

 1: function WorstOutHeuristic(G)
 2:    if |E_i| ≤ 1 ∀ i = 1, 2, ..., k then
 3:        return G                          ▷ disambiguation complete
 4:    end if
 5:    for each e_ij ∈ G do
 6:        calculate G_e_ij
 7:    end for
 8:    tied_nodes = {e_ij ∈ G : |E_i| > 1 ∧ ∄ e_uv with |E_u| > 1 such that weight(G_e_ij) > weight(G_e_uv)}
 9:    find e ∈ tied_nodes with minimum incident edges weight
10:    return WorstOutHeuristic(G \ e)
11: end function

Algorithm 1: Outline of the worst out heuristic method used to disambiguate named entities in text. Every function call isolates the "most outlandish" entity, which is then removed from graph G, until disambiguation is achieved. Note that the total weight of the maximum contribution graphs is the first thing we take into account. However, we use the nodes' incident edge weights to resolve ties.
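A Python transcription of Algorithm 1 over the networkx graph built in the previous sketch; the (i, j) node naming and the tie handling follow the two criteria described in the Algorithm 1 and Figure 2 captions, but the code is a sketch rather than GEEK's implementation.

from itertools import combinations

def max_contribution(G, node):
    """G_e for `node`: keep the heaviest edge from `node` into every other independent set,
    then close the clique over the chosen neighbours. Returns (total weight, node's contribution)."""
    best = {}                                    # independent set index -> (neighbour, edge weight)
    for nb in G.neighbors(node):
        w = G[node][nb]["weight"]
        if nb[0] not in best or w > best[nb[0]][1]:
            best[nb[0]] = (nb, w)
    chosen = [nb for nb, _ in best.values()]
    contribution = sum(w for _, w in best.values())
    closure = sum(G[x][y]["weight"] for x, y in combinations(chosen, 2))
    return contribution + closure, contribution

def worst_out_heuristic(G):
    """Repeatedly remove the node whose maximum contribution graph is lightest
    (ties broken by the node's own incident-edge contribution) until each E_i is a singleton."""
    while True:
        sets = {}
        for node in G.nodes:
            sets.setdefault(node[0], []).append(node)
        removable = [n for n in G.nodes if len(sets[n[0]]) > 1]
        if not removable:
            return {i: nodes[0] for i, nodes in sets.items()}   # mention index -> chosen candidate
        scores = {n: max_contribution(G, n) for n in removable}
        G.remove_node(min(removable, key=lambda n: scores[n]))

Note that max_contribution is recomputed on every pass, matching the recalculation requirement illustrated in Figure 3.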
[Figure 2: Building the maximum contribution graph for node e_11 of independent set E_1 in the case of a three-mention text T. Node e_11 greedily chooses nodes e_21 and e_31 for G_e_11. This means β ≥ δ. Edges e_11–e_21 and e_11–e_31 are chosen because they are the maximum weight connectors between e_11 and the respective node sets E_2 and E_3. Edge e_21–e_31 completes G_e_11's edges. We have weight(G_e_11) = α + β + γ and contribution(G_e_11, e_11) = α + β. These two criteria, in that order, applied for all nodes in G, are used to determine the node to be eliminated. In this illustration, the number of nodes in sets E_1, E_2, and E_3 is limited only for the sake of simplicity.]

[Figure 3: Continuing on Figure 2, we assume that node e_31 has been eliminated. Among other nodes, e_11 needs to update its maximum contribution graph G_e_11. Now e_11 is connected to e_32, as it offers the next highest weight connection to set E_3. Edges e_11–e_21 and e_11–e_32 are chosen because they are the maximum weight connectors between e_11 and the respective node sets E_2 and E_3. Edge e_21–e_32 completes G_e_11's edges. Now we have weight(G_e_11) = α + δ + ϵ and contribution(G_e_11, e_11) = α + δ. We conclude that both elimination criteria have been altered, a fact that demonstrates why we need to update the maximum contribution graphs after every node removal.]

5   EXPERIMENTAL EVALUATION

5.1   Datasets
We evaluated GEEK using three kinds of datasets: a small texts dataset we manually generated, a medium-sized texts dataset comprised of Reuters news articles, and a selection of two other datasets provided by GERBIL [28] (again a dataset containing smaller texts and another comprised of larger documents), on its web-based platform that facilitates the comparison of several NERD systems^14.
We note that we didn't experiment on annotators that have already been outperformed by the NERD tools provided by GERBIL, such as Milne and Witten's Wikipedia Miner system [16].
   Small texts dataset. In the interest of our system's evaluation, we collected and compiled a dataset of small texts, which are dense in highly ambiguous named entities. The entities in this dataset were manually extracted and annotated. Each named entity was mapped to its best match in GKG and Wikipedia. This dataset was inspired by the KORE 50 NIF NER Corpus^15. Its texts were gathered from online sources, such as encyclopedia articles, news feeds, blog posts, social media, and so on. It aims to test our system's disambiguation capabilities given a limited context and a well defined semantic core. From now on, it will be denoted as SAT-300 (300 Short Ambiguous Texts). The SAT-300 dataset is available as part of this paper^16.
   Medium-sized texts dataset. The second part of our experimental evaluation was performed using a standard benchmark dataset in the field of NERD. This is a dataset of 231 Reuters news-wire articles used by Hoffart [11] to compare his own NERD system with other disambiguation systems found in the literature. Much like SAT-300's texts, these were processed by hand, and their named entities were mapped to a number of knowledge bases, including Freebase (GKG's predecessor) and Wikipedia. This dataset aims to test our system's disambiguation capacity when it comes to longer texts that may include a larger number of possibly unrelated topics. From now on, it will be denoted as Reuters-231.
   Other datasets. For further experimental evaluation of GEEK, we turn to GERBIL and use the KORE-50 dataset in order to assess its performance when it comes to short texts, as well as the complete CoNLL dataset to test our system on larger texts.

5.2   Evaluation Measures
In order to assess the disambiguation that a system produces for a single text T, the precision, recall, F1 score, and accuracy measures may be used. In the context of NERD, precision is the fraction of correctly disambiguated named entity mentions among those generated by the system:

   precision = |correctly disambiguated entities| / |disambiguated entities produced by the system|

Recall is the fraction of correctly disambiguated named entity mentions among those that should be disambiguated:

   recall = |correctly disambiguated entities| / |entities that should be disambiguated|

The F1 score is the harmonic mean of precision and recall:

   F1 = 2 · (precision · recall) / (precision + recall)

14 http://gerbil.aksw.org/gerbil/
15 https://datahub.io/dataset/kore-50-nif-ner-corpus
16 https://github.com/WWW-2018-submission-SAT-300-dataset/SAT-300-dataset
[Figure 4: GEEK's named entity recognition and disambiguation pipeline. First we have the NER step, followed by the NED step. In the disambiguation step, which is the most critical, we start by defining a crude estimate of the candidate entity set by using the GKG API, which is refined with the use of Wikipedia's APIs. Next we collect the information we need to build the candidate entity graph, which is solved in a heuristic manner. (Pipeline steps shown: use the given mentions if the named entity mentions in T are provided, otherwise use CoreNLP to extract mentions; query GKG API for candidate entities; query Wikipedia disambiguation pages for T's mentions using the MediaWiki Links API; find Wikipedia redirects for all candidates using the MediaWiki Redirects API; normalize GKG resultScore for each candidate entity; calculate term-based document similarity for each candidate entity; query the MediaWiki Linkshere API to find incoming links for each candidate entity; build graph G with parameters a, b; solve graph G using the heuristic method, as in Algorithm 1.)]

The evaluation measures of precision and recall are used when the NERD system is responsible for the NER step. Oftentimes that step is skipped, because the named entities to be disambiguated have already been marked and are provided to the disambiguation system straightaway. In this case, only the NED part of the system is evaluated, and accuracy is used as the only evaluation measure:

   accuracy = |correctly disambiguated entities| / |entities marked in text|

   In the case of a collection of texts, one could use precision and recall to evaluate a system's NER + NED performance, and accuracy to evaluate its NED performance, for each and every one of the texts separately. However, this would turn the experimental results into a long, difficult to interpret list of numbers. That is why we use the Micro Average and Macro Average aggregation approach:
   • Micro Average: We assume that the dataset's texts are combined to form one large text, on which we are able to use the above-provided definitions of precision, recall, and accuracy.
   • Macro Average: We evaluate the system's performance on the dataset by averaging precision, recall, and accuracy calculated on each of the collection's texts.
For the sake of space preservation, we use the abbreviations Micro Average F1 score ≡ µAF1, Macro Average F1 score ≡ MAF1, Micro Average Accuracy ≡ µAA, and Macro Average Accuracy ≡ MAA. Similarly to the case of a single text, we use the aggregated version of accuracy in cases where the named entities to be disambiguated are given to the disambiguation system as part of its input.
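The measures above can be sketched directly; gold and predicted disambiguations are represented here as hypothetical dictionaries from mention spans to URIs, one pair per text.

def precision_recall_f1(gold, predicted):
    """gold, predicted: {mention span: URI} for a single text."""
    correct = sum(1 for mention, uri in predicted.items() if gold.get(mention) == uri)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(gold, predicted):
    """Used when the mentions are given: correct disambiguations over marked entities."""
    return sum(1 for m, uri in predicted.items() if gold.get(m) == uri) / len(gold) if gold else 0.0

def macro_f1(texts):
    """MAF1: average the per-text F1 scores. texts: list of (gold, predicted) pairs."""
    return sum(precision_recall_f1(g, p)[2] for g, p in texts) / len(texts)

def micro_f1(texts):
    """µAF1: pool all texts into one large text, then compute F1 once."""
    gold, predicted = {}, {}
    for doc_id, (g, p) in enumerate(texts):
        gold.update({(doc_id, m): uri for m, uri in g.items()})
        predicted.update({(doc_id, m): uri for m, uri in p.items()})
    return precision_recall_f1(gold, predicted)[2]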
                                                                                                         the early stages of our analysis. Our testing concurs with
                                                                                                         this, as the system seems to perform better when the other
The evaluation measures of precision and recall are used when                                            NED measures undertake the heaving lifting.
the NERD system is responsible for the step of NER. Oftentimes,                                        • a ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}, meaning
that is avoided, when the named entities to be disambiguated have                                        we make a transition from a system that only cares about
already been marked and are provided to the disambiguation sys-                                          WLM (a = 0.0) to a system that only cares about the combi-
tem straightaway. In this case, only the NED part of the system is                                       nation of GKG resultScore and document similarity (a = 1.0).
evaluated, and accuracy is used as the only evaluation measure:                                          We didn’t choose only one value for a, as the best value of a
                                   |correctly disambiguated entities|                                    seems to vary depending on the text in hand. Also, observing
            accuracy =                                                                                   the system’s behaviour with different values of a can give
                                        |entities marked in text|
                                                                                                         us insight into its optimal configuration.
   In the case of a collection of texts, one could use precision and
recall to evaluate a system’s NER + NED performance, and accuracy
                                                                                                 5.4     Experimental Results
to evaluate its NED performance, for each and every one of the
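For reference, the settings described in this subsection can be collected in a single configuration object; this is a hypothetical structure for illustration, not part of GEEK's code base.

from dataclasses import dataclass

@dataclass(frozen=True)
class GeekConfig:
    gkg_limit: int = 100        # candidate entities requested from GKG API per mention
    b: float = 0.85             # GKG resultScore vs. term-based document similarity
    a_values: tuple = tuple(round(0.1 * i, 1) for i in range(11))   # 0.0, 0.1, ..., 1.0 sweep

config = GeekConfig()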
texts separately. However, this would turn the experimental results                                  SAT-300 dataset. The performance of our system on the SAT-300
into a long, difficult to interpret list of numbers. That is why we                              dataset in terms of the µAF 1 and MAF 1 measures can be seen in
use the Micro Average and Macro Average aggregation approach:                                    the SAT-300 relevant parts of Table 1 and Figure 5. We used the F 1
…which covers most cases without excluding the correct entities from the candidate entity sets or making the candidate entity graph unnecessarily complex.
    • b = 0.85, which means we value GKG resultScore much more than document similarity. The fact that term-based document similarity is a rather naive NED measure was obvious from the early stages of our analysis. Our testing concurs with this, as the system seems to perform better when the other NED measures do the heavy lifting.
    • a ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}, meaning we make a transition from a system that only cares about WLM (a = 0.0) to a system that only cares about the combination of GKG resultScore and document similarity (a = 1.0). We did not choose a single value for a, as the best value seems to vary depending on the text at hand. Also, observing the system's behaviour with different values of a gives us insight into its optimal configuration (see the sketch after this list).
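The exact way GEEK combines the three measures is defined earlier in the paper and is not repeated here; the snippet below is only one plausible reading of the roles of a and b described above (a hypothetical convex combination over scores assumed to be normalized to [0, 1]), meant to make the parameter sweep easier to follow.

    # Hypothetical illustration of the roles of a and b (not GEEK's actual formula):
    # a trades the "local" evidence off against entity relatedness (WLM), and
    # b weights GKG resultScore against term-based document similarity.
    def combined_score(result_score, doc_sim, wlm, a=0.5, b=0.85):
        # all three inputs are assumed to be normalized to [0, 1]
        local = b * result_score + (1 - b) * doc_sim
        return a * local + (1 - a) * wlm   # a = 0.0 -> WLM only; a = 1.0 -> resultScore + doc_sim only

Under this reading, b = 0.85 keeps document similarity as a weak tie-breaker, while the sweep over a in Table 1 traces the move from relatedness-only to local-evidence-only scoring.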
5.4 Experimental Results

SAT-300 dataset. The performance of our system on the SAT-300 dataset in terms of the µAF1 and MAF1 measures can be seen in the SAT-300 columns of Table 1 and in Figure 5. We used the F1 score because in this experiment we fed the texts to the system as plain strings, and it was the system's responsibility to both recognize and disambiguate the named entities.

For comparison, the results obtained from GERBIL for a sample of its available annotators on the same dataset, using the A2KB task, are displayed in Table 2 (full SAT-300 results: http://gerbil.aksw.org/gerbil/experiment?id=201801240021).
Table 1: GEEK scores on the SAT-300 and Reuters-231 datasets for different values of a (all values in %).

    a      SAT-300               Reuters-231
           µAF1      MAF1        µAA       MAA
    0.0    72.48     71.91       57.53     51.83
    0.1    77.65     76.60       61.42     57.78
    0.2    83.25     82.55       69.83     67.35
    0.3    88.01     87.48       79.37     76.67
    0.4    92.13     92.21       85.29     84.03
    0.5    92.87     93.58       87.66     87.84
    0.6    88.64     89.54       84.21     85.70
    0.7    82.41     82.99       82.26     84.31
    0.8    74.48     74.90       81.35     83.51
    0.9    68.57     68.29       78.65     81.36
    1.0    65.08     64.60       76.86     80.27

Table 2: Scores of GERBIL's annotators when faced with the A2KB task of finding named entities in SAT-300's texts and then linking them to a knowledge base. The best GEEK configuration is appended for comparison (all values in %).

    Annotator                   µAF1      MAF1
    AIDA                        68.28     63.66
    Babelfy                     57.29     56.08
    DBpedia Spotlight           58.62     54.43
    Dexter                      46.81     41.35
    Entityclassifier.eu NER     35.26     33.39
    FOX                         39.18     34.97
    Kea                          2.64      1.39
    WAT                         60.55     51.92
    GEEK                        92.87     93.58

[Figure 5: Illustration of our system's performance on the SAT-300 and Reuters-231 datasets, as the parameter a increases from 0 to 1. The plot shows the evaluation measures (%), namely SAT-300 µAF1, SAT-300 MAF1, Reuters-231 µAA, and Reuters-231 MAA, against the value of parameter a.]

We note that our system outperforms the other systems on these small, ambiguous texts. Based on this experimental analysis, we can draw several conclusions about our system:

    • The system's best configuration is a = 0.5. This means that balancing GKG resultScore and document similarity against entity relatedness yields the best results.
    • We notice from Table 1 and Figure 5 that a = 0.0 gives us better results than a = 1.0, which means that the most important measure is entity relatedness. This reflects the nature of our dataset: we anticipate small, semantically coherent texts to contain highly related entities. We also observe that GEEK performs better on SAT-300 than on Reuters-231. That difference is due to the entities found in the two datasets: SAT-300 only contains entities stored both in GKG and Wikipedia, so there are rarely out-of-knowledge-base entities; this is not true for Reuters-231, nor for the larger CoNLL dataset below, where many entities do not exist in GKG. Those inevitably lead to false positives for an annotator that always tries to map all mentions to entities.
    • Both the Micro Average and the Macro Average aggregation measures give us similar results.

Reuters-231 dataset. The results of the experimental evaluation of our system on the Reuters-231 dataset can be seen in the Reuters-231 columns of Table 1 and in Figure 5. For this dataset we only calculate accuracy, because this is the measure used by Hoffart [11] in the original article. This means that in this case the system performed only the NED task: we fed the system with the texts to be disambiguated, along with the named entity mentions they contain. Our conclusions are:

    • The system's best configuration is still the balanced value a = 0.5. In that case, accuracy exceeds 87%, a 5% improvement over Hoffart's AIDA system.
    • We notice from Table 1 and Figure 5 that a = 1.0 gives us better results than a = 0.0, which means that the most important measures are GKG resultScore and document similarity. That is no surprise, as Reuters-231 consists of news articles, which contain a fair number of more or less unambiguous mentions that can be resolved without resorting to entity relatedness.
    • In contrast to SAT-300's experimental results, we see that the Micro Average and Macro Average approaches behave somewhat differently in the case of Reuters-231. More specifically, we have µAA > MAA for a < 0.5 and µAA < MAA for a > 0.5. This variance offers us insight into the way our NED measures function:
      – For small values of a, we prioritize coherence in our texts, expressed by the relatedness between entities. This decision works in our favor for texts that contain a large number of entities, as those are expected to be semantically correlated and thus easy to disambiguate using WLM. On the other hand, texts that contain few, possibly unrelated entities will be harder to process. This means that most errors are expected to occur in entity-sparse texts, lowering their individual accuracy measure. For example, if we have a text T that contains two named entities and we get one of them wrong, its accuracy immediately drops to 50%.
        Returning to the definition of the Micro Average and Macro Average aggregation measures, we conclude that our MAA will be affected, as we average significantly smaller numbers. Of course, µAA is not affected in any way, since this measure does not look at the dataset's texts individually. That is why we get µAA > MAA for a < 0.5.
      – For larger values of a, we prioritize commonness and document similarity in our texts, expressed by GKG resultScore and term overlap. This decision works in our favor for texts that contain few, unrelated entities, where coherence would not help. On the other hand, texts that contain many semantically similar entities are harder to disambiguate. This means that most errors are expected to occur in texts containing a high number of entities, which does not affect their individual accuracy measure as much. For example, if we have a text T that contains a hundred named entities and we get one of them wrong, its accuracy barely drops, to 99%. In contrast to what we noticed above, we conclude that our MAA will not be drastically affected; again, µAA is not affected in any way. This analysis explains why we get µAA < MAA for a > 0.5 (a short numerical illustration follows this list).
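To make the asymmetry concrete, here is a tiny illustrative calculation with made-up counts (not taken from our datasets): the same single error leaves µAA untouched, but moves MAA sharply only when it falls in an entity-sparse text.

    # Illustrative only: one wrong entity out of 102 in total, in a two-text collection.
    sparse_error = [(1, 2), (100, 100)]   # (correct, marked) per text; the error is in the 2-entity text
    dense_error = [(2, 2), (99, 100)]     # the error is in the 100-entity text

    for per_text in (sparse_error, dense_error):
        micro = sum(c for c, _ in per_text) / sum(t for _, t in per_text)
        macro = sum(c / t for c, t in per_text) / len(per_text)
        print(round(micro, 3), round(macro, 3))
    # Prints 0.99 0.75 for the sparse-text error and 0.99 0.995 for the dense-text error.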
KORE-50 and CoNLL datasets. Setting a = 0.5, which the experimental evaluation showed to be the best overall performing value, we further tested GEEK by comparing its performance on GERBIL's A2KB task against the service's annotators on the KORE-50 dataset (full results: http://gerbil.aksw.org/gerbil/experiment?id=201801280015), a dataset similar to SAT-300 but with a higher level of ambiguity, as well as on the complete CoNLL collection of documents (full results: http://gerbil.aksw.org/gerbil/experiment?id=201801280016). The results, in terms of the µAF1 and MAF1 scores, can be found in Table 3. We conclude that GEEK outperforms the other annotators on the small documents of KORE-50, but it does not achieve top performance on the larger texts of CoNLL. This is due to the nature of the disambiguation algorithm applied by GEEK, in which in-document coherence is crucial, an attribute that is less prominent in larger documents.

Table 3: Comparison of GEEK with GERBIL's annotators when faced with the A2KB task of finding named entities in KORE-50 and CoNLL texts and then linking them to a knowledge base (all values in %).

    Annotator                   KORE-50               CoNLL
                                µAF1      MAF1        µAF1      MAF1
    AIDA                        58.40     52.58       67.35     64.23
    Babelfy                     56.45     52.63       44.81     39.66
    DBpedia Spotlight           35.24     28.50       53.92     51.27
    Dexter                      23.28     17.00       47.47     43.40
    Entityclassifier.eu NER     29.97     26.97       44.92     42.03
    FOX                         28.02     25.31       57.23     57.26
    Kea                         50.31     46.27       39.81     36.10
    WAT                         51.95     39.63       67.22     64.21
    GEEK                        62.90     61.50       53.69     51.16

To sum up, GEEK seems to outperform the state of the art on short documents, while being competitive on longer documents.

6 RELATED WORK

6.1 NER
As mentioned above, the subtask of NER is critical for a NERD pipeline. When it comes to NER, two main approaches have been followed in the literature:

    • Building a dedicated tool. This could mean anything from using a statistical approach [3] to training a classifier on Wikipedia's links [16]. This approach is the most flexible, as the developers of the NERD system can tailor the NER component to their needs, or even blur the boundaries between NER and NED [6, 20, 26].
    • Using an off-the-shelf tool. This limits the flexibility of the NER component, but offers the upside of an established and well-tuned framework for the preprocessing step of entity recognition, which decouples the tasks of NER and NED. One such tool is Stanford NER (https://nlp.stanford.edu/software/CRF-NER.shtml), used by Hoffart [11] in his NERD system. An evolution of this tool, in the form of Stanford CoreNLP, is also used in our framework.

6.2 NED
Even though the NER options are pretty straightforward, the NED methods vary wildly across the literature. A non-exhaustive list of NED systems that stand out for the novel ideas they introduced follows. Bunescu and Pasca [1] were the first to appreciate the value of Wikipedia's structured information, such as articles, redirects, and disambiguation pages, for entity disambiguation. They introduced Wikification, the idea of mapping textual mentions to Wikipedia articles; after that point, all successful NERD systems used Wikipedia as a resource in one way or another. Cucerzan [3] created the first system that executes NERD in an end-to-end process, taking unstructured text as input, recognizing entities, and mapping them to Wikipedia. His work also introduced the potential value of joint or collective disambiguation. Milne and Witten [16] introduced a novel way to recognize mentions by training a classifier on Wikipedia data and stressed the importance of prior probability, or commonness, in the NED process. However, their greatest contribution was the repackaging of the Normalized Google Distance into the Wikipedia Link-based Measure, an efficient and effective way of calculating entity relatedness, which was used by the majority of NERD frameworks that followed. Kulkarni et al. [13] implemented the first full-fledged collective disambiguation system. They also made significant contributions to the algorithmic side of collective disambiguation, recognizing its high complexity and resorting to heuristic methods. Graph-based collective inference methods are among the most successful in the field of entity linking. As is the case with GEEK, these methods always reduce the text to an appropriate graph, which can guide the disambiguation process. Han et al. [10] and Hoffart [11] proposed graph-based structures that combine the disambiguation features of the respective systems and model the entity disambiguation task in an intuitive manner. Moro et al. [18] decided to build their graph using random walks,
while Piccinno et al. [19] suggested a plethora of different algorithms for solving their graph. Ganea et al. [5] introduced a probabilistic graphical model for collective disambiguation.

7 CONCLUSIONS
In this paper we introduced GEEK, a framework that tackles the difficult task of NERD by combining well-established measures, such as commonness of entities and coherence, with new, potent tools, such as the Google Knowledge Graph Search API. We proposed a graph representation of texts, which allowed us to model disambiguation as a concise graph-based problem and solve it using a heuristic method that finds the best clique in a stepwise manner.

The main strength of GEEK is that it is built on top of publicly available data sources and has a straightforward graph algorithm at its core. Moreover, the experimental evaluation on GERBIL showed that GEEK produces competitive scores in comparison with other state-of-the-art systems, which suggests that one does not always need specialized knowledge bases or sophisticated disambiguation algorithms to achieve a high-quality disambiguation result.

ACKNOWLEDGEMENTS
We acknowledge support of this work by the project "APOLLONIS" (MIS 5002738), which is implemented under the Action "Reinforcement of the Research and Innovation Infrastructure", funded by the Operational Programme "Competitiveness, Entrepreneurship and Innovation" (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund).

REFERENCES
[1] Razvan C. Bunescu and Marius Pasca. 2006. Using Encyclopedic Knowledge for Named Entity Disambiguation. In EACL, Vol. 6. 9–16.
[2] R. L. Cilibrasi and P. M. B. Vitanyi. 2007. The Google Similarity Distance. IEEE Transactions on Knowledge and Data Engineering 19, 3 (March 2007), 370–383.
[3] Silviu Cucerzan. 2007. Large-Scale Named Entity Disambiguation Based on Wikipedia Data. In Proceedings of EMNLP-CoNLL 2007. 708–716.
[4] Paolo Ferragina and Ugo Scaiella. 2010. TAGME: On-the-fly Annotation of Short Text Fragments (by Wikipedia Entities). In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM '10). ACM, New York, NY, USA, 1625–1628.
[5] Octavian-Eugen Ganea, Marina Ganea, Aurelien Lucchi, Carsten Eickhoff, and Thomas Hofmann. 2016. Probabilistic Bag-Of-Hyperlinks Model for Entity Linking. In Proceedings of the 25th International Conference on World Wide Web (WWW '16). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 927–938.
[6] Stephen Guo, Ming-Wei Chang, and Emre Kiciman. 2013. To Link or Not to Link? A Study on End-to-End Tweet Entity Linking. In HLT-NAACL. 1020–1030.
[7] Ben Hachey, Will Radford, Joel Nothman, Matthew Honnibal, and James R. Curran. 2013. Evaluating Entity Linking with Wikipedia. Artificial Intelligence 194 (2013), 130–150.
[8] Xianpei Han and Le Sun. 2011. A Generative Entity-mention Model for Linking Entities with Knowledge Base. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (HLT '11). Association for Computational Linguistics, Stroudsburg, PA, USA, 945–954.
[9] Xianpei Han and Le Sun. 2012. An Entity-topic Model for Entity Linking. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL '12). Association for Computational Linguistics, Stroudsburg, PA, USA, 105–115.
[10] Xianpei Han, Le Sun, and Jun Zhao. 2011. Collective Entity Linking in Web Text: A Graph-based Method. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '11). ACM, New York, NY, USA, 765–774.
[11] Johannes Hoffart. 2015. Discovering and Disambiguating Named Entities in Text. Ph.D. Dissertation. Universität des Saarlandes, Postfach 151141, 66041 Saarbrücken.
[12] Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. 2011. Robust Disambiguation of Named Entities in Text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11). Association for Computational Linguistics, Stroudsburg, PA, USA, 782–792.
[13] Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, and Soumen Chakrabarti. 2009. Collective Annotation of Wikipedia Entities in Web Text. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09). ACM, New York, NY, USA, 457–466.
[14] Xiaohua Liu, Yitong Li, Haocheng Wu, Ming Zhou, Furu Wei, and Yi Lu. 2013. Entity Linking for Tweets. In ACL (1). 1304–1311.
[15] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Association for Computational Linguistics (ACL) System Demonstrations. 55–60.
[16] David Milne and Ian H. Witten. 2008. Learning to Link with Wikipedia. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM '08). ACM, New York, NY, USA, 509–518.
[17] Sean Monahan, John Lehmann, Timothy Nyberg, Jesse Plymale, and Arnold Jung. 2011. Cross-Lingual Cross-Document Coreference with Entity Linking. In TAC.
[18] Andrea Moro, Alessandro Raganato, and Roberto Navigli. 2014. Entity Linking meets Word Sense Disambiguation: A Unified Approach. Transactions of the Association for Computational Linguistics 2 (2014).
[19] Francesco Piccinno and Paolo Ferragina. 2014. From TagME to WAT: A New Entity Annotator. In Proceedings of the First International Workshop on Entity Recognition & Disambiguation (ERD '14). 55–62.
[20] Ken Q. Pu, Oktie Hassanzadeh, Richard Drake, and Renée J. Miller. 2010. Online Annotation of Text Streams with Structured Entities. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM '10). ACM, New York, NY, USA, 29–38.
[21] Lev Ratinov, Dan Roth, Doug Downey, and Mike Anderson. 2011. Local and Global Algorithms for Disambiguation to Wikipedia. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (HLT '11). Association for Computational Linguistics, Stroudsburg, PA, USA, 1375–1384.
[22] W. Shen, J. Wang, and J. Han. 2015. Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions. IEEE Transactions on Knowledge and Data Engineering 27, 2 (Feb 2015), 443–460.
[23] Wei Shen, Jianyong Wang, Ping Luo, and Min Wang. 2012. LIEGE: Link Entities in Web Lists with Knowledge Base. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12). ACM, New York, NY, USA, 1424–1432.
[24] Wei Shen, Jianyong Wang, Ping Luo, and Min Wang. 2012. LINDEN: Linking Named Entities with Knowledge Base via Semantic Knowledge. In Proceedings of the 21st International Conference on World Wide Web (WWW '12). ACM, New York, NY, USA, 449–458.
[25] Wei Shen, Jianyong Wang, Ping Luo, and Min Wang. 2013. Linking Named Entities in Tweets with Knowledge Base via User Interest Modeling. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '13). ACM, New York, NY, USA, 68–76.
[26] Avirup Sil and Alexander Yates. 2013. Re-ranking for Joint Named-entity Recognition and Linking. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (CIKM '13). ACM, New York, NY, USA, 2369–2374.
[27] Tadej Štajner and Dunja Mladenić. 2009. Entity Resolution in Texts Using Statistical Learning and Ontologies. Springer Berlin Heidelberg, Berlin, Heidelberg, 91–104.
[28] Ricardo Usbeck, Michael Röder, Axel-Cyrille Ngonga Ngomo, Ciro Baron, Andreas Both, Martin Brümmer, Diego Ceccarelli, Marco Cornolti, Didier Cherix, Bernd Eickmann, Paolo Ferragina, Christiane Lemke, Andrea Moro, Roberto Navigli, Francesco Piccinno, Giuseppe Rizzo, Harald Sack, René Speck, Raphaël Troncy, Jörg Waitelonis, and Lars Wesemann. 2015. GERBIL: General Entity Annotator Benchmarking Framework. In Proceedings of the 24th International Conference on World Wide Web (WWW '15). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1133–1143.