A Topic-Sensitive Model for Salient Entity
                      Linking

                   Lei Zhang, Cong Liu, and Achim Rettinger

        Institute AIFB, Karlsruhe Institute of Technology (KIT), Germany
          {l.zhang, rettinger}@kit.edu, cong.liu@student.kit.edu


      Abstract. In recent years, the amount of entities in large knowledge
      bases available on the Web has been increasing rapidly. Such entities
      can be used to bridge textual data with knowledge bases and thus help
      with many tasks, such as text understanding, word sense disambiguation
      and information retrieval. The key issue is to link the entity mentions in
      documents with the corresponding entities in knowledge bases, referred
      to as entity linking. In addition, for many entity-centric applications,
      entity salience for a document has become a very important factor. This
      raises an impending need to identify a set of salient entities that are
      central to the input document. In this paper, we introduce a new task of
      salient entity linking and propose a graph-based disambiguation solution,
      which integrates several features, especially a topic-sensitive model based
      on Wikipedia categories. Experimental results show that our method
      significantly outperforms the state-of-the-art entity linking methods in
      terms of precision, recall and F-measure.


1   Introduction
In recent years, large repositories of structured knowledge publicly available
on the Web, such as Wikipedia, DBpedia, Freebase and YAGO, have become
valuable resources for information extraction. In this regard, entity linking, which
leverages such knowledge bases to link words or phrases in natural language text
with the corresponding entities, has emerged as a topic of major interest.
    The challenges of entity linking lie in entity recognition and disambiguation.
The first stage serves to detect words or phrases in text, also called mentions,
that are likely to denote entities; the second stage performs the disambiguation
of the recognized mentions into entities. Many methods [1,2,3,4,5,6,7,8] have
been proposed to address the problems of entity disambiguation and linking.
However, these methods do not take into account the actual importance of
entities w.r.t. the topics of the input document. In this work, the relation between
the candidate entities and their associated categories are utilized to opt the
entities that are related to the document topics.
    In addition, there is an impending need to identify a set of salient entities
in a document that play an important role in the content of the document,
which would help to better understand its meaning or aboutness [9]. This paper
focuses on the task of salient entity linking, especially the disambiguation of
                      Fig. 1: Salient entity linking framework.

mentions into salient entities in a document. For this purpose, we propose a
graph-based disambiguation framework, which utilizes a topic-sensitive model
based on Wikipedia categories.
   The rest of the paper is organized as follows. We start with an overview of
our framework for salient entity linking in Sec. 2. The details of features and
measures used for salient entity disambiguation are provided in Sec. 3. Based
on them, we discuss the graph-based disambiguation utilizing a topic-sensitive
model in Sec. 4. Evaluation results are then presented in Sec. 5, followed by the
conclusions in Sec. 6.


2    Framework
Before we discuss our salient entity linking framework, we first formulate the
task of entity linking and then introduce the problem of salient entity linking,
an extension of the general entity linking task.
Definition 1 (Entity Linking). Let M = {m1 , m2 , . . . , mp } denote a set of
entity mentions in a document D. Given a knowledge base KB containing a set
of entities E = {e1 , e2 , . . . , en }, the objective of entity linking is to determine
the referent entities in KB for the mentions in M , where two functions are to be
found. For entity recognition, the mentions need to be extracted from D, where a
recognition function er : D → 2M will be computed. The resulting mentions (i.e.,
a subset µ ⊆ M ) are then mapped to entities in KB, where a disambiguation
function ed : µ → E must be derived.
Definition 2 (Salient Entity Linking). Given a knowledge based KB and a
document D, the recognition function of salient entity linking is same as general
entity linking, i.e., er : D → 2M . For the set of mentions µ ⊆ M yielded by the
recognition function, the disambiguation function ed : µ → E ∪ {Non-Salient},
which maps the set of mentions µ to entities in the KB or to non-salient entities,

                                           2
must be derived, where non-salient entities are entities with no focus of attention
in D, i.e., the document D is really not about such entities.

    An illustration of our salient entity linking framework consisting of several
components is given in Fig. 1. In the following, we first introduce the components
w.r.t. general entity linking and then discuss its extension with the components
for salient entity linking by utilizing a topic-sensitive model.
    For both general and salient entity linking, the input text is first processed by
entity recognition, which detects the boundaries of mentions without knowing the
actual referent entities or whether they are salient or non-salient entities. Then
these mentions serve as the input of entity disambiguation, which is the focus
of this work since we do not aim to compare the method’s ability to recognize
entity names in the input text.
    Given a detected mention, its candidate referent entities are extracted from
the knowledge base. For entity disambiguation regarding general entity linking,
our framework combines different features including prior mention importance,
mention-entity compatibility and entity-entity coherence. The feature of prior
mention importance assigns the prior importance to each detected mention as
weight and it will be used as the initial evidence for graph-based disambiguation.
While the local feature of mention-entity compatibility captures the most likely
entity behind the mention and the entity that best fits the context, the global
feature of entity-entity coherence collectively captures the linked entities in a
document that are related to each other. These features are then employed by
graph-based disambiguation based on a personalized PageRank algorithm.
    To aim for effective salient entity linking, we first perform text classification
on the input text using a multi-class support vector machine (SVM) classifier
based on Wikipedia categories1 aligned with the training corpus. For each
category, we compute the category probability of the input document that serves
as the feature of document-specific category importance. In addition, we compute
the strength of entity-category association based on the depth between each
candidate entity and its categories. Such features are then incorporated into
graph-based disambiguation using a topic-sensitive PageRank algorithm.


3     Features and Measures

In this section, we discuss the features and measures needed for salient entity
disambiguation, while the graph model and algorithm will be presented in Sec. 4.
    Prior Mention Importance. We employ the Wikipedia link structures for
determining the prior mention importance. As each Wikipedia article describes
an entity, article titles, redirect pages and link anchors can be used to refer to
the entity. Based on the above sources, we extract all surface forms of entities.
1
    In this work, we employ the 16 second-level categories including Mathematics, People,
    Science, Sport, Geography, Culture, Politics, Nature, Technology, Education, Health,
    Business, Belief, Society, Life and Concepts in Wikipedia, where the first-level
    category is the fundamental category.


                                            3
For each mention m with the name m.s as surface form of an entity, we define
the probability P (m.s) that captures how likely m.s refers to an entity as

                                        countlink (m.s)
                   P (m.s) =                                                     (1)
                               countlink (m.s) + counttext (m.s)

where countlink (m.s) is the number of articles that contain m.s as anchor text
and counttext (m.s) is the number of articles where m.s appears as raw text.
    Mention-Entity Compatibility. For each mention m and its candidate
referent entity e, we calculate the semantic similarity SS(m, e) representing the
local mention-entity compatibility of m and e as follows

                     SS(m, e) = α · LP (m, e) + β · CS(m, e)                     (2)

where LP (m, e) is the link probability of e for m and CS(m, e) is the context
similarity between m and e, α and β are tunable parameters with α + β = 1.
The link probability LP (m, e) can be calculated using the probability P (e|m.s)
capturing how likely the mention name m.s refers to the entity e as follows

                                                   countlink (e, m.s)
              LP (m, e) = P (e|m.s) = P                                          (3)
                                               ei ∈Em.s countlink (ei , m.s)

where countlink (e, m.s) denotes the number of links using m.s as anchor text
pointing to e as destination and Em.s is the set of entities that have the surface
form m.s. An entity e is characterized by its textual description e.c, called context
of e and a mention m is characterized by its surrounding sentences m.c, called
context of m. The context similarity CS(m, e) between m and e can be calculated
using cosine similarity on the term vectors e.c of e.c and m.c of m.c as

                                                        he.c, m.ci
                     CS(m, e) = cos(e.c, m.c) =                                  (4)
                                                       |e.c| · |m.c|

    Entity-Entity Coherence. The disambiguation is based on the feature of
entity-entity coherence, which collectively captures the referent entities of the
mentions contained in the same document that are related to each other. In this
regard, we calculate the semantic relatedness between each pair of entities ei
and ej by adopting the Wikipedia link-based measure described in [10], which is
originally modeled after the Normalized Google Distance (NGD) [11], as follows

                                  log(max(|Ei |, |Ej |)) − log(|Ei ∩ Ej |)
             SR(ei , ej ) = 1 −                                                  (5)
                                     log(|E|) − log(min(|Ei |, |Ej |))

where Ei and Ej are the sets of entities that link to ei and ej in KB respectively,
and E is the set of all entities in KB.
    Document-specific Category Importance. For text classification of the
input document, we employ John C. Platt’s sequential minimal optimization
for training a support vector machine (SVM) classifier [12,13]. Multi-category
problems are solved using pairwise classification. To obtain proper probability


                                           4
estimates, we use the option that fits logistic regression models to the outputs of
the SVM classifier. In our multi-category scenario, the predicted probabilities are
coupled using Hastie and Tibshirani’s pairwise coupling method [14]. All these
algorithms have been integrated into Weka2 , a collection of machine learning
algorithms for data mining tasks. Based on that, we calculate the category
probability P (ci ) of the input text for each assigned category ci , which reflects
the document-specific category importance.
    Entity-Category Association. All candidate entities are mapped to the
selected Wikipedia categories. In order to measure the entity-category association
between an entity e and its assigned category c, we define the distance d(c, e)
as the minimum depth at which the entity e is located in Wikipedia’s category
tree with the category c as the root. This is computed offline by performing a
breadth-first search starting from the fundamental category that forms the root
of Wikipedia’s hierarchy to each entity. Then the semantic association SA(c, e)
between entity e and category c can be calculated as
                                                  1
                                 SA(c, e) =                                     (6)
                                                d(c, e)

4     Graph Model and Algorithm
Based on the features and measures discussed in Sec. 3, we construct a directed
weighted graph G = {N, R}, called disambiguation graph, where N = NM ]
NE ] NC is the disjoint union of mention nodes NM , entity nodes NE and
category nodes NC , and R is the set of directed edges representing relationships
between these nodes. All detected mentions and their candidate referent entities
are added into NM and NE , respectively, while the categories that the input
text belongs to are added into NC . For each mention m and its candidate entity
e, we add an edge from m to e into R. Additionally, we add an edge between
ei and ej into R if they are connected in KB. Furthermore, for each association
between an entity e and a category c, an edge from c to e will be added into R.
    Once the disambiguation graph G is built, we apply a personalized PageRank
algorithm [15,16] over it. The calculation of the PageRank vector P r over G is
equivalent to resolving the following equation

                           P r = d · T · P r + (1 − d) · v                      (7)

where T is the transition probability matrix, v is the initial evidence vector and
d is the so called damping factor, usually set as 0.85. Each entry Tij in T is the
evidence propagation ratio from node i to node j, which is computed in Eq. 8.
                     
                           SS(mi ,ej )
                      P
                      k∈NE (i) SS(mi ,ek )
                                              if i ∈ NM , j ∈ NE
                     
                           SR(ei ,ej )
               Tij = Pk∈N (i) SR(ei ,ek )      if i ∈ NE , j ∈ NE               (8)
                           E
                          SA(c i ,ej )
                     
                     P                        if i ∈ N , j ∈ N
                                                          C      E
                         k∈NE (i) SA(ci ,ek )

2
    http://www.cs.waikato.ac.nz/ml/weka


                                            5
where NE (i) is the set of entity nodes such that for each node k ∈ NE (i), there is
an edge from i to k in G. The entry vi in v is the initial evidence representing the
prior importance of a mention mi if i ∈ NM or the document-specific importance
of an category ci if i ∈ NC , which is calculated as follows
                   
                                  λ·P (mi )
                    λ·Pk∈N P (mk )+η·Pk∈N P (ck )
                                                           if i ∈ NM
                             M                C
                   
             vi =                  η·P (c )                                      (9)
                        P                i P
                      λ· k∈N P (mk )+η· k∈N P (ck )         if i ∈ NC
                   
                            M                C
                                    0                      otherwise
where λ and η are tunable parameters with λ+η = 1, which reflect the sensitivity
of prior mention importance and document-specific category importance to the
final probability of each candidate entity. When η = 0, our method reduces to
general entity linking without considering the topic-sensitive model. In contrast,
when λ = 0, the initial evidence of the graph-based disambiguation only depends
on the category importance.
    As a result of the personalized PageRank algorithm, each candidate entity e
receives a final probability P (e). For each mention m having a set of candidate
entities Em , we choose the entity with the maximal probability as the predicted
linking entity, i.e., em = arg maxe∈Em P (e). The process discussed above doesn’t
distinguish between salient and non-salient entities. In order to deal with salient
entity linking, one important task of the topic-sensitive model is to validate
whether the predicted linking entity em for mention m is a salient entity. For
this purpose, we learn a threshold τ such that if P (em ) is greater than τ we
return em as the linking entity for m, otherwise we return Non-Salient.


5     Experiments
We now discuss the experiments we performed to assess the performance of
our approach. As the knowledge base, we used the English Wikipedia snapshot
from July 2013. We employed the Reuters-128 entity salience dataset3 , which is
an extension of a part of the N3 entity linking datasets [17]. The Reuters-128
dataset is an English corpus and it contains 128 economic news articles. The
dataset contains information for 880 named entities with their position in the
document and a URI of a DBpedia resource identifying each entity. The salience
dataset extends the Reuters-128 dataset also with 3,551 common entities.
    In order to construct the dataset, entity salience information was obtained by
crowdsourcing salience information using the CrowdFlower platform. For each
named and common entity in the Reuters-128 dataset, the authors of the dataset
collected at least three judgements. Only judgments from annotator with trust
score higher than 70% were considered as trusted judgements. If the trust score of
an annotator falls bellow 70%, all his/her judgements were disregarded. Finally,
each named and common entity in the dataset has been classified in one of the
following classes4 :
3
    https://github.com/KIZI/ner-eval-collection
4
    http://ner.vse.cz/datasets/entitysalience-collection


                                         6
Methods                 Mic. Prec. Mic. Rec. Mic. F1 Mac. Prec. Mac. Rec. Mac. F1.
DBpedia Spotlight [2]     0.45       0.39     0.41      0.45       0.37      0.40
Wikipedia Miner [1]       0.60       0.48     0.54      0.60       0.42      0.52
NERD-ML [5,7]             0.67       0.50     0.57      0.65       0.46      0.54
WAT [4,8]                 0.35       0.32     0.34      0.36       0.33      0.34
AGDISTIS [6]              0.73       0.50     0.59      0.73       0.48      0.58
Our Method (General)      0.70       0.46     0.56      0.69       0.45      0.55
Our Method (Salient)      0.83       0.51     0.63      0.82       0.50      0.62
                        Table 1: The experimental results.

 – Most Salient - Entities with the highest focus of attention in the article. The
   document is mostly about the these entities, or the entities play a prominent
   role in the content of the article.
 – Less Salient - Entities with less focus of attention in the article. The entities
   play an important role in some parts of the content of the article.
 – Not Salient - The article is really not about the entities
    In our experiments, we consider the entities in both classes Most Salient
and Less Salient as salient entities, while entitites belonging to Not Salient are
considered as non-salient entities. Using the Reuters-128 entity salience dataset,
we conducted the experiments to compare our approach with several entity
linking methods. We used two variants of our approach, one employs only the
graph-based disambiguation for general entity linking (λ = 1 and η = 0) and the
other integrates the topic-sensitive model with the goal of salient entity linking
(λ = 0.2 and η = 0.8). All the methods should label each mention with either
the correct entity or Not Salient. Note that we restrict the input to the labeled
mentions to compare the method’s ability to distinguish between salient entity
and non-salient entity, not its ability to recognize entity names in the input text.
The adopted evaluation criteria include Micro-Precision, Micro-Recall, Micro-F1,
Macro-Precision, Macro-Recall and Macro-F1.
    The experimental results are shown in Table 1. By utilizing the topic-sensitive
model, our approach to salient entity disambiguation significantly outperforms
the baselines in terms of all evaluation criteria. Regarding the two variants of
our approach, it clearly shows that the topic-sensitive model indeed contributes
to the final performance improvement.

6   Conclusions
In this paper, we introduce the task of salient entity linking that existing entity
linking solutions cannot well address. For tackling this new problem, we propose
a graph-based disambiguation framework, which integrates several features
including prior mention importance, mention-entity compatibility, entity-entity
coherence and in particular a topic-sensitive model capturing entity-category
association and document-specific category importance. We have experimentally
shown that our approach achieves a significant improvement over the baselines.
The evaluation results also show that the topic-sensitive model indeed helps with
the salient entity disambiguation.

                                         7
Acknowledgments. The research leading to these results has received funding
from the European Union Seventh Framework Programme (FP7/2007-2013)
under grant agreement no. 611346.


References
 1. Milne, D.N., Witten, I.H.: Learning to link with wikipedia. In: CIKM. (2008)
    509–518
 2. Mendes, P.N., Jakob, M., Garcı́a-Silva, A., Bizer, C.: Dbpedia spotlight: shedding
    light on the web of documents. In: I-SEMANTICS. (2011) 1–8
 3. Han, X., Sun, L., Zhao, J.: Collective entity linking in web text: a graph-based
    method. In: SIGIR. (2011) 765–774
 4. Ferragina, P., Scaiella, U.: Fast and accurate annotation of short texts with
    wikipedia pages. IEEE Software 29(1) (2012) 70–75
 5. van Erp, M., Rizzo, G., Troncy, R.: Learning with the web: Spotting named entities
    on the intersection of NERD and machine learning. In: #MSM. (2013) 27–30
 6. Usbeck, R., Ngomo, A.N., Röder, M., Gerber, D., Coelho, S.A., Auer, S., Both,
    A.: AGDISTIS - graph-based disambiguation of named entities using linked data.
    In: ISWC. (2014) 457–471
 7. Rizzo, G., van Erp, M., Troncy, R.:            Benchmarking the extraction and
    disambiguation of named entities on the semantic web. In: LREC. (2014)
    4593–4600
 8. Piccinno, F., Ferragina, P.: From tagme to WAT: a new entity annotator. In:
    ERD@SIGIR. (2014) 55–62
 9. Gamon, M., Yano, T., Song, X., Apacible, J., Pantel, P.: Identifying salient entities
    in web pages. In: CIKM. (2013) 2375–2380
10. Witten, I., Milne, D.: An effective, low-cost measure of semantic relatedness
    obtained from wikipedia links. In: WIKIAI. (2008) 25–30
11. Cilibrasi, R., Vitányi, P.M.B.: The google similarity distance. IEEE Trans. Knowl.
    Data Eng. 19(3) (2007) 370–383
12. Platt, J.C.: Advances in kernel methods. MIT Press, Cambridge, MA, USA (1999)
    185–208
13. Keerthi, S.S., Shevade, S.K., Bhattacharyya, C., Murthy, K.R.K.: Improvements
    to platt’s smo algorithm for svm classifier design. Neural Comput. 13(3) (2001)
    637–649
14. Hastie, T., Tibshirani, R.: Classification by pairwise coupling. In: NIPS. (1997)
    507–513
15. Jeh, G., Widom, J.: Scaling personalized web search. In: WWW. (2003) 271–279
16. Haveliwala, T.H.: Topic-sensitive pagerank: A context-sensitive ranking algorithm
    for web search. IEEE Trans. Knowl. Data Eng. 15(4) (2003) 784–796
17. Röder, M., Usbeck, R., Hellmann, S., Gerber, D., Both, A.: N3 - A collection of
    datasets for named entity recognition and disambiguation in the NLP interchange
    format. In: LREC. (2014) 3529–3533


                                           8