=Paper=
{{Paper
|id=Vol-1141/paper_18
|storemode=property
|title=E2E: An End-to-End Entity Linking System for Short and Noisy Text
|pdfUrl=https://ceur-ws.org/Vol-1141/paper_18.pdf
|volume=Vol-1141
|dblpUrl=https://dblp.org/rec/conf/msm/ChangHMLW14
}}
==E2E: An End-to-End Entity Linking System for Short and Noisy Text==
E2E: An End-to-End Entity Linking System for Short and Noisy Text
Ming-Wei Chang Bo-June Hsu Hao Ma Ricky Loynd Kuansan Wang
Microsoft Research
Redmond, WA
{minchang|paulhsu|haoma|riloynd|kuansanw}@microsoft.com
ABSTRACT
We present E2E, an end-to-end entity linking system that is designed for the short and noisy text found in microblogs and text messages. Mining and extracting entities from short text is an essential step for many content analysis applications. By jointly optimizing entity recognition and disambiguation as a single task, our system can process short and noisy text robustly.

Keywords
Information Extraction, Social Media, Entity Linking

1. INTRODUCTION
In this paper, we describe our entity linking system, called E2E, for the #Microposts2014 NEEL Challenge [1]. Our system focuses on the task of extracting and linking entities from short and noisy text given entity databases like Wikipedia or Freebase. An entity linking system usually needs to perform two key functions: mention recognition and entity disambiguation. In mention recognition, the system identifies each mention (surface form) of an entity in the text. In entity disambiguation, the system maps mentions to canonical entities. E2E has been carefully designed to treat entity recognition and disambiguation as a single task.

2. THE ARCHITECTURE OF E2E
When a short message is received, E2E processes the message in four stages: Text Normalization, Candidate Generation, Joint Recognition-and-Disambiguation, and Overlap Resolution.

Text Normalization.
In this stage, a short message is normalized and tokenized. For tweets, the retweet symbols and some other special symbols are removed. Punctuation symbols are in general represented as separate tokens.

Candidate Generation.
The next step is to generate a list of surface form candidates that could potentially link to entities. E2E uses a lexicon to generate the candidate surface forms. A lexicon is a dictionary that maps a surface form to its possible entity set. For example, the word “giants” could refer to “New York Giants”, “San Francisco Giants”, etc. Our lexicon is mainly composed by extracting information from Wikipedia and Freebase. The dictionary is constructed to support fuzzy mention matching based on edit distance. Note that we over-generate candidates at this stage, and no filtering is performed.

Joint Recognition and Disambiguation.
This stage is the key component of the E2E framework. Given a message, the goal here is to figure out the entity assignment of each candidate mention generated from the previous stages. Note that a candidate mention may be rejected altogether (mapped to the null entity). Our model is based on a supervised learning method. Given a message m and a candidate mention a, the entity assignment is generated from the ranking of all possible entities in the entity set E(a):

    arg max_{e ∈ E(a) ∪ {∅}} f(Φ(m, a, e)),    (1)

where f is the scoring function of the model, and Φ is a feature function over the input m, the mention a, and the candidate output e. Note that it is quite possible for E2E to reject a candidate and not link it to any entity (i.e., to link a to ∅). The joint approach that recognizes and disambiguates entity mentions together is crucial for E2E to properly link surface forms to the corresponding entities.

Overlap Resolution.
At this point, many of the linked mentions will overlap each other. Dynamic programming resolves these conflicts by choosing the best-scoring set of non-overlapping mention-entity mappings. The experimental results show that resolving overlap improves the model's performance consistently in different settings.
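As a concrete illustration, the candidate-generation stage can be sketched as a dictionary lookup over token n-grams. The lexicon entries and the n-gram window below are illustrative assumptions, not E2E's actual dictionary, and the edit-distance fuzzy matching is omitted for brevity:

```python
# Minimal sketch of lexicon-based candidate generation.
# The lexicon maps a surface form to its possible entity set;
# entries here are illustrative, not E2E's actual dictionary.
LEXICON = {
    "giants": {"New_York_Giants", "San_Francisco_Giants"},
    "new york giants": {"New_York_Giants"},
    "seattle": {"Seattle", "Seattle_Seahawks"},
}

def candidate_mentions(tokens, max_ngram=3):
    """Over-generate candidate (span, surface, entity set) triples; no filtering."""
    candidates = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_ngram, len(tokens)) + 1):
            surface = " ".join(tokens[i:j]).lower()
            entities = LEXICON.get(surface)
            if entities:
                candidates.append(((i, j), surface, entities))
    return candidates

cands = candidate_mentions("the giants beat seattle".split())
# picks out "giants" and "seattle" with their candidate entity sets
```

Every lexicon hit survives this stage; deciding which candidates are real mentions is deliberately deferred to the joint recognition-and-disambiguation stage.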
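The decision rule of Equation (1) ranks every entity in E(a) together with the null entity ∅, so recognition (reject or keep) and disambiguation (which entity) happen in one arg max. A minimal sketch follows; the per-entity values and weights are toy stand-ins for f and Φ, not the learned E2E model:

```python
NULL = None  # the null entity ∅: rejecting the candidate mention

# Illustrative per-entity values standing in for Φ(m, a, e); in E2E
# these come from textual, entity-graph, and statistical features.
P_E_GIVEN_A = {"New_York_Giants": 0.7, "San_Francisco_Giants": 0.3}

def f(m, a, e, w_prob=1.0, w_null=0.5):
    """Stand-in for the model score f(Φ(m, a, e))."""
    if e is NULL:
        return w_null  # score of rejecting the mention outright
    return w_prob * P_E_GIVEN_A.get(e, 0.0)

def link(m, a, entity_set):
    """Equation (1): arg max over e in E(a) ∪ {∅}."""
    return max(list(entity_set) + [NULL], key=lambda e: f(m, a, e))
```

With these toy scores, a surface form whose best entity scores below the null score is rejected, which is exactly how a candidate gets mapped to ∅.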
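The overlap-resolution stage is an instance of weighted interval scheduling, which dynamic programming solves exactly. A self-contained sketch, assuming each linked mention is a (start, end, score) token span with an exclusive end; the spans and scores are illustrative:

```python
import bisect

def resolve_overlaps(mentions):
    """mentions: list of (start, end, score) token spans, end exclusive.
    Returns the non-overlapping subset with maximum total score."""
    mentions = sorted(mentions, key=lambda m: m[1])  # sort by end position
    ends = [m[1] for m in mentions]
    n = len(mentions)
    best = [0.0] * (n + 1)   # best[i] = best score using the first i spans
    take = [False] * n
    for i, (start, end, score) in enumerate(mentions):
        # rightmost earlier span that ends at or before this span starts
        j = bisect.bisect_right(ends, start, 0, i)
        if best[j] + score > best[i]:
            best[i + 1] = best[j] + score
            take[i] = True
        else:
            best[i + 1] = best[i]
    chosen, i = [], n            # backtrack to recover the chosen spans
    while i > 0:
        if take[i - 1]:
            chosen.append(mentions[i - 1])
            i = bisect.bisect_right(ends, mentions[i - 1][0], 0, i - 1)
        else:
            i -= 1
    return chosen[::-1]
```

For example, a high-scoring mention spanning tokens 1–3 beats two cheaper mentions it overlaps, while two disjoint mentions can jointly beat a single overlapping one.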
3. SYSTEM IMPLEMENTATION
Our database is constructed from both Wikipedia and Freebase. The whole system is implemented in C#.
Entity linking systems often require a large amount of memory due to the size of the structured and unstructured data attached to many entities. High memory consumption restricts the scale of an entity linking system, limiting the number of entities that can be handled. Long loading times also reduce the efficiency of conducting experiments. In E2E, we adopt the completion trie data structure proposed in [4] instead of a hash map dictionary. The completion trie greatly reduces the memory footprint and loading time of E2E.
We have tested two learning methods when developing E2E: a structured support vector machine algorithm [2] and a fast implementation of the MART gradient boosting algorithm [3]. The structural SVM model is a linear model that takes all of the candidates in the same tweet into account together. MART learns an ensemble of decision/regression trees with scalar values at the leaves, but treats each candidate separately. The submitted results are generated using MART due to its superior performance on our development set.

Copyright © 2014 held by author(s)/owner(s); copying permitted only for private and academic purposes. Published as part of the 4th Workshop on Making Sense of Microposts (#Microposts2014) proceedings at @WWW2014, available online as CEUR Vol-1141 (http://ceur-ws.org/Vol-1141). #Microposts2014, April 7th, 2014, Seoul, Korea.

Figure 1: Results of E2E on the development set (Precision, Recall, and F1 as s varies from 0 to 4.5).
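The prefix-sharing idea behind replacing the hash map is easy to sketch. Note this minimal character trie only illustrates the motivation; it is not the space-efficient, score-ordered completion trie of [4]:

```python
class TrieNode:
    __slots__ = ("children", "entities")

    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.entities = None  # entity set if a surface form ends here

class LexiconTrie:
    """Toy lexicon trie: surface forms sharing prefixes share nodes,
    unlike a hash map, which stores every key string in full."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, surface, entities):
        node = self.root
        for ch in surface:
            node = node.children.setdefault(ch, TrieNode())
        node.entities = set(entities)

    def lookup(self, surface):
        node = self.root
        for ch in surface:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.entities  # None if surface is only a prefix
```

Here “giant” and “giants” share all but one node; over millions of Wikipedia and Freebase surface forms, that sharing is what shrinks the memory footprint.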
Features.
Three groups of features were used in our system. The textual features concern the textual properties of the surface form and its context. For example, one feature indicates whether the current surface form and the surrounding words are capitalized or not. We also use features generated from the output of an in-house named entity recognition system that is specially designed to be robust on non-capitalized words. The entity graph features capture the semantic cohesiveness of the entity-entity and entity-mention pairs. This group of features was mainly calculated using the entity database and its structured data. Finally, the statistical features indicate word usage and entity popularity using information collected from the web.
Among the three feature groups, the statistical group is the most important one. We describe some of the most important features in the following. Let a denote the surface form of a candidate, and e denote an entity. One important feature is the link probability feature P_l(a), which indicates the probability that a phrase is used as an anchor in Wikipedia. For each phrase a, we also collect statistics about the probability that the phrase is capitalized in Wikipedia. We refer to this feature as the capitalization rate feature, P_c(a).
We also compute features that capture the relationships between an anchor a and an entity e. The probability P(e|a) captures the likelihood of an anchor linking to a Wikipedia entity. We have also downloaded Wikipedia page view counts representing page view information from 2012 (http://dammit.lt/wikistats). Based on this popularity information, we add another probability feature that captures the relative popularity of the pages that could be linked from the anchor a. More precisely, P_v(e|a) = v(e) / Σ_{e′ ∈ E(a)} v(e′), where v(e) represents the view count for the page e.

4. RESULTS
In our experiments, we split the training set into two sets that contain 1534 and 800 tweets, respectively. The 800-tweet data is used as our development set. Our analysis shows that robust mention detection is often the source of errors in current entity linking systems. In order to achieve a better F1 score, we change the prediction function to

    arg max_{e ∈ E(a) ∪ {∅}} f(Φ(m, a, e)) − s[e = ∅],    (2)

where [·] is an indicator function. When s increases, the system will produce more entities. From the results in Figure 1, we found that tuning s does impact results significantly. After the learning parameters and the desired value of s are chosen, we retrain E2E using the full training data and generate the final results with s = 0, 2.5 and 3.5, respectively.

Error Analysis.
We analyze our results on the development set with s = 3.5. In the development set, there are 1304 mentions, and E2E generates a total of 18746 candidates in the candidate generation stage. Our error analysis shows that E2E misses 340 entity mentions and predicts 284 extra mentions. Among the errors, E2E has trouble with “number” entities (e.g. 1_(number)). Further investigation reveals that the tokenization choices of E2E play a big part in these errors, given that most punctuation symbols are treated as separate tokens. Interestingly, there are only 44 cases where E2E correctly recognizes the mention but links it to the wrong entity. Most errors occur when E2E fails to recognize mentions correctly.

5. CONCLUSIONS
In this paper, we presented E2E, a system that performs joint entity recognition and disambiguation on short and noisy text. We found that the substance of a successful entity linking system consists of successfully combining all of the components.
Due to the time limitation, the submitted system still has plenty of room to improve. For example, one important direction is to explore the relationships between different tweets to improve entity linking results. Developing a robust mention detection algorithm is an important research direction as well.

6. REFERENCES
[1] A. E. Cano Basave, G. Rizzo, A. Varga, M. Rowe, M. Stankovic, and A.-S. Dadzie. Making Sense of Microposts (#Microposts2014) Named Entity Extraction & Linking Challenge. In Proc. #Microposts2014, pages 54–60, 2014.
[2] M.-W. Chang and W.-T. Yih. Dual coordinate descent algorithms for efficient large margin structured prediction. TACL, 2013.
[3] J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 1999.
[4] B.-J. P. Hsu and G. Ottaviano. Space-efficient data structures for top-k completion. In WWW, 2013.