=Paper=
{{Paper
|id=Vol-1141/neel-challenge-report
|storemode=property
|title=Making Sense of Microposts (#Microposts2014) Named Entity Extraction & Linking Challenge
|pdfUrl=https://ceur-ws.org/Vol-1141/microposts2014_neel-challenge-report.pdf
|volume=Vol-1141
|dblpUrl=https://dblp.org/rec/conf/msm/BasaveRVRSD14
}}
==Making Sense of Microposts (#Microposts2014) Named Entity Extraction & Linking Challenge==
Amparo E. Cano (Knowledge Media Institute, The Open University, UK), amparo.cano@open.ac.uk
Giuseppe Rizzo (Università di Torino, Italy / EURECOM, France), giuseppe.rizzo@di.unito.it
Andrea Varga (The OAK Group, The University of Sheffield, UK), a.varga@dcs.shef.ac.uk
Matthew Rowe (School of Computing and Communications, Lancaster University, UK), m.rowe@lancaster.ac.uk
Milan Stankovic (Sépage / Université Paris-Sorbonne, France), milstan@gmail.com
Aba-Sah Dadzie (School of Computer Science, University of Birmingham, UK), a.dadzie@cs.bham.ac.uk
ABSTRACT
Microposts are small fragments of social media content and a popular medium for sharing facts, opinions and emotions. They comprise a wealth of data which is increasing exponentially, and which therefore presents new challenges for the information extraction community, among others. This paper describes the 'Making Sense of Microposts' (#Microposts2014) Workshop's Named Entity Extraction and Linking (NEEL) Challenge, held as part of the 2014 World Wide Web conference (WWW'14). The task of this challenge consists of the automatic extraction and linkage of entities appearing within English Microposts on Twitter. Participants were set the task of engineering a named entity extraction and DBpedia linkage system targeting a predefined taxonomy, to be run on the challenge data set, comprising a manually annotated training and a test corpus of Microposts. 43 research groups expressed intent to participate in the challenge, of which 24 signed the agreement required to be given a copy of the training and test datasets. 8 groups fulfilled all submission requirements, out of which 4 were accepted for presentation at the workshop and a further 2 as posters. The submissions covered sequential and joint methods for approaching the named entity extraction and entity linking tasks. We describe the evaluation process and discuss the performance of the different approaches to the #Microposts2014 NEEL Challenge.

Keywords
Microposts, Named Entity, Evaluation, Extraction, Linking, Disambiguation, Challenge

Copyright © 2014 held by author(s)/owner(s); copying permitted only for private and academic purposes. Published as part of the #Microposts2014 Workshop proceedings, available online as CEUR Vol-1141 (http://ceur-ws.org/Vol-1141). #Microposts2014, April 7th, 2014, Seoul, Korea.

1. INTRODUCTION
Since the first Making Sense of Microposts (#MSM2011) workshop at the Extended Semantic Web Conference in 2011 through to the most recent workshop in 2014 we have received over 80 submissions covering a wide range of topics related to mining information and (re-)using the knowledge content of Microposts. Microposts are short text messages published using minimal effort via social media platforms. They provide a publicly available wealth of data which has proven to be useful in different applications and contexts (e.g. music recommendation, social bots, emergency response situations). However, gleaning useful information from Micropost content presents various challenges, due, among others, to the inherent characteristics of this type of data:

i) the limited length of Microposts;
ii) the noisy lexical nature of Microposts, where terminology differs between users when referring to the same thing, and abbreviations are commonplace.

A commonly used approach for mining Microposts is the use of cues that are available in textual documents, providing contextual features to this content. One example of such a cue is the use of named entities (NE). Extracting named entities in Micropost content has proved to be a challenging task; this was the focus of the first challenge, in #MSM2013 [3]. A step further into the use of such cues is to be able not only to recognize and classify them but also to provide further information, in other words, to disambiguate entities. This prompted the Named Entity Extraction and Linking (NEEL) Challenge, held as part of the Making Sense of Microposts Workshop (#Microposts2014) at the 2014 World Wide Web Conference (WWW'14).

The purpose of this challenge was to set up an open and competitive environment that would encourage participants to deliver novel or improved approaches to extract entities from Microposts and link them to their DBpedia counterpart resources (if defined). This report describes the #Microposts2014 NEEL Challenge, our collaborative annotation of a corpus of Microposts and our evaluation of the performance of each submission. We also describe the approaches taken in the participants' systems, which use both established and novel, alternative approaches to entity extraction and linking. We describe how well they performed and how system performance differed across approaches. The resulting body of work has implications for researchers interested in the task of information extraction from social media.
2. THE CHALLENGE
In this section we describe the goal of the challenge, the task set, and the process we followed to generate the corpus of Microposts. We conclude the section with the list of the accepted submissions.
2.1 The Task and Goal
The NEEL Challenge task required participants to build semi-automated systems addressing two stages:

(i) Named Entity Extraction (NEE), in which participants were to extract entity mentions from a tweet; and
(ii) Named Entity Linking (NEL), in which each entity extracted is linked to an English DBpedia v3.9 resource.

For this task we considered the definition of an entity in the general sense of being, in which an object or a set of objects do not necessarily need to have a material existence, but must however be characterized as an instance of a taxonomy class. To facilitate the creation of the gold standard (GS) we limited the entity types evaluated in this challenge by specifying the taxonomy to be used: the NERD ontology v0.5 (http://nerd.eurecom.fr/ontology/nerd-v0.5.n3) [16]. To this we added a few concepts from the DBpedia taxonomy. The taxonomy was not considered as normative in the evaluation of the submissions, nor for the ranking. This is a deliberate choice, to increase the complexity of the task and to let participants perform taxonomy matching starting from the distribution of the entities in the GS. The list of classes in the taxonomy used is distributed with the released GS (http://ceur-ws.org/Vol-1141/microposts2014-neel_challenge_gs.zip).

Besides the typical word-tokens found in a Micropost, new to this year's challenge we considered special social media markers as entity mentions as well. These Twitter markers are tokens introduced with a special symbol. We considered two such markers: hashtags, prefixed by #, denoting the topic of a Micropost (e.g. #londonriots, #surreyriots, #osloexpl), and mentions, prefixed by @, referring to Twitter user names, which include entities such as organizations (e.g. @bbcworldservice) and celebrities (e.g. @ChadMMurray, @AmyWinehouse).

Participants were required to recognize these different entity types within a given Micropost, and to extract the corresponding entity link tuples. Consider the following example, taken from our annotated corpus:

Source (tweet text):
RT @bbcworldservice police confirms bomb in Oslo #oslexp

The 2nd token (the mention @bbcworldservice) in this Micropost refers to the international broadcaster, the BBC World Service; the 7th token refers to the location Oslo; while the 8th token (the hashtag #oslexp) refers to the 2011 Norway terrorist attack. An entry to the challenge would be required to spot these tokens and display the result as a set of annotations, where each line corresponds to a tab-separated entity mention and entity link:

Correctly formatted result:
bbcworldservice    dbpedia:BBC_World_Service
Oslo    dbpedia:Oslo
oslexp    dbpedia:2011_Norway_attacks

Note that the annotated result returns tokens without the social media markers (# and @) used in the original Micropost; in this example "dbpedia:" refers to the namespace prefix of a DBpedia resource (see http://dbpedia.org/resource). A small sketch of this output format is given at the end of this subsection.

We also consider the case where an entity is referenced in a tweet either as a noun or a noun phrase, if it:

a) belongs to one of the categories specified in the taxonomy;
b) is disambiguated by a DBpedia URI within the context of the tweet. Hence any (single word or phrase) entity without a disambiguation URI is disregarded;
c) subsumes other entities. The longest entity phrase within a Micropost, composed of multiple sequential entities and that can be disambiguated by a DBpedia URI, takes precedence over its component entities.

Consider the following examples:

1. [Natural History Museum at Tring];
2. [News International chairman James Murdoch]'s evidence to MPs on phone hacking;
3. [Sony]'s [Android Honeycomb] Tablet

For the 3rd case, even though it may appear to be a coherent phrase, since there are no DBpedia URIs for [Sony's Android Honeycomb] or [Sony's Android Honeycomb Tablet], the entity phrase is split into what are the (valid) component entities highlighted above.

To encourage competition we solicited sponsorship for the winning submission. This was provided by the European project LinkedTV (http://www.linkedtv.eu), who offered a prize of an iPad. This generous sponsorship is testament to the growing interest in issues related to automatic approaches for gleaning information from (the very large amounts of) social media data.
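As a small illustration of the expected output format, the following Python sketch (our own, not part of the official challenge tooling; the function and variable names are hypothetical) serialises the annotation pairs for the example tweet above as tab-separated mention/link lines, with the # and @ markers stripped and "dbpedia:" abbreviating http://dbpedia.org/resource/:

```python
# Minimal sketch (hypothetical helper, not the official challenge tooling):
# serialise (entity mention, DBpedia link) pairs for one tweet as the
# tab-separated lines shown in the "Correctly formatted result" above.
DBPEDIA_PREFIX = "dbpedia:"  # abbreviates http://dbpedia.org/resource/

annotations = [
    ("bbcworldservice", DBPEDIA_PREFIX + "BBC_World_Service"),
    ("Oslo", DBPEDIA_PREFIX + "Oslo"),
    ("oslexp", DBPEDIA_PREFIX + "2011_Norway_attacks"),
]

def format_annotations(pairs):
    # One line per pair: <mention>\t<link>; markers (#, @) already removed.
    return "\n".join(f"{mention}\t{link}" for mention, link in pairs)

if __name__ == "__main__":
    print(format_annotations(annotations))
```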
2.2 Data Collection and Annotation
The challenge data set comprises 3,505 tweets extracted from a collection of over 18 million tweets. This collection, provided by the Redites project (http://demeter.inf.ed.ac.uk/redites), covers event-annotated tweets collected for the period 15th July 2011 to 15th August 2011 (31 days). It extends over multiple notable events, including the death of Amy Winehouse, the London Riots and the Oslo bombing. Since the NEEL Challenge task is to automatically extract and link entities, we built our data set considering both event and non-event tweets. Event tweets are more likely to contain entities; non-event tweets therefore enable us to evaluate the performance of a system in avoiding false positives in the entity extraction phase.

Statistics describing the training and test sets are provided in Table 1. The dataset was split into training (70%) and test (30%) sets. The training set contains 2,340 tweets, with 41,037 tokens and 3,819 named entities; the test set contains 1,165 tweets, with 20,224 tokens and 1,458 named entities. The tweets are relatively long in both data sets; the average number of tokens per tweet is 17.54±5.70 in the training, and 17.36±5.59 in the test set. The average number of entities per tweet is also relatively high, at 3.26±3.37 for the training and 2.50±2.94 for the test dataset. The percentage of tweets without any valid entities is 32% (775 tweets) in the training, and 40% (469 tweets) in the test set. There is a fair bit of overlap of entities between the training and test data: 13.27% (316) of the named entities in the training data also occur in the test dataset. With regard to the tokens carrying hashtag and mention social media markers in the original tweets, a total of 406 hashtags represented valid entities in the training set, with 184 in the test set; the corresponding numbers of @-mentions representing valid entities were 133 in the training and 73 in the test data set.
Table 1: General statistics of the training and test data sets. Posts: the number of tweets in a data set; Words: the number of unique words; Tokens: the total number of words; AvgTokens/Post: the average number of tokens per tweet (mean ± standard deviation); NEs: the number of unique NEs; totalNEs: the total number of NEs; AvgNEs/Post: the average number of NEs per post (mean ± standard deviation).

Dataset | Posts | Words/Tokens  | AvgTokens/Post | NEs   | totalNEs | AvgNEs/Post
train   | 2,340 | 12,758/41,037 | 17.54±5.70     | 1,862 | 3,819    | 3.26±3.37
test    | 1,165 | 6,858/20,224  | 17.36±5.59     | 834   | 1,458    | 2.50±2.94
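As a rough illustration of how the per-post figures in Table 1 are defined (this is our own sketch, not the challenge code; the toy corpus and tokenisation are assumptions), the following computes Words/Tokens counts and the mean ± standard deviation statistics from a list of tokenised, annotated tweets:

```python
# Illustrative sketch only: compute Table 1-style statistics from a toy corpus.
# Each item is (list of tokens, list of named-entity mentions in that tweet).
from statistics import mean, pstdev

corpus = [
    (["RT", "bbcworldservice", "police", "confirms", "bomb", "in", "Oslo", "oslexp"],
     ["bbcworldservice", "Oslo", "oslexp"]),
    (["Amy", "Winehouse", "found", "dead", "in", "London"],
     ["Amy Winehouse", "London"]),
]

posts = len(corpus)
tokens = sum(len(toks) for toks, _ in corpus)                  # total number of words
words = len({t.lower() for toks, _ in corpus for t in toks})   # unique words
total_nes = sum(len(nes) for _, nes in corpus)
unique_nes = len({ne for _, nes in corpus for ne in nes})

tokens_per_post = [len(toks) for toks, _ in corpus]
nes_per_post = [len(nes) for _, nes in corpus]

print(f"Posts={posts} Words/Tokens={words}/{tokens} "
      f"AvgTokens/Post={mean(tokens_per_post):.2f}±{pstdev(tokens_per_post):.2f} "
      f"NEs={unique_nes} totalNEs={total_nes} "
      f"AvgNEs/Post={mean(nes_per_post):.2f}±{pstdev(nes_per_post):.2f}")
```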
The annotation of each Micropost in the training set gave all participants a common base from which to learn extraction patterns. In order to assess the performance of the submissions we used an underlying gold standard (GS), generated by 14 annotators with different backgrounds, including computer scientists, social scientists, social web experts, semantic web experts and linguists.

The annotation process comprised the following phases (we aim to provide a more detailed explanation of the annotation process and the rest of the NEEL Challenge evaluation process in a separate publication):

Phase 1. Unsupervised annotation of the corpus was performed, to extract candidate links that were used as input to the next stage. The candidates were extracted using the NERD framework [15].

Phase 2. The data set was divided into batches, with three different annotators assigned to each batch. In this phase annotations were performed using CrowdFlower (http://crowdflower.com). The annotators were asked to analyze the NERD links generated in phase 1, adding or removing entity annotations as required. The annotators were also asked to mark any ambiguous cases encountered.

Phase 3. In the final stage, consistency checking, three experts double-checked the annotations and generated the GS (for both the training and test sets). Three main tasks were carried out here: (1) cross-consistency check of entity types; (2) cross-consistency check of URIs; (3) resolution of ambiguous cases raised by the 14 annotators.

The complete data set, including a list of changes and the gold standard, is available for download (http://ceur-ws.org/Vol-1141/microposts2014-neel_challenge_gs.zip) with the #Microposts2014 Workshop proceedings, accessible under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Following the Twitter ToS we only provide tweet IDs and annotations for the training set, and tweet IDs for the test set.

2.3 Challenge Submissions
The challenge attracted a lot of interest from research groups spread across the world. Initially, 43 groups expressed their intent to participate in the challenge; however, only 8 completed a submission. Each submission consisted of a short paper explaining the system approach, and up to three different test set annotations generated by running the system with different settings. After peer review, 4 submissions were accepted for presentation, and a further 2 as posters. The submission run with the best overall performance for each system was used in the rankings (see Table 4). The submissions accepted are listed in Table 2.

Table 2: Submissions accepted, ordered by submission number, with team affiliations and number of runs for each.

ID | Affiliation           | Authors              | Runs
13 | UTwente               | Habib, M. et al.     | 2
15 | Max Planck            | Amir, M. et al.      | 3
16 | IIT Hyderabad         | Bansal, R. et al.    | 1
18 | Microsoft             | Chang, M.            | 3
19 | Net7-Spaziodati-UPisa | Scaiella, U. et al.  | 2
20 | SAP                   | Dahlmeier, D. et al. | 1

2.4 System Descriptions
We present next an analysis of the participants' systems for the Named Entity Extraction and Linking (NEEL) tasks. Except for submission 18, which treated the NEEL task as a joint Named Entity Extraction (NEE) and Named Entity Linking (NEL) task, all participants approached the NEEL task as two sequential sub-tasks (i.e. NEE first, followed by NEL). A summary of these approaches includes:

i) use of external systems;
ii) main features used;
iii) type of strategy used;
iv) use of external sources.

Table 3 provides a detailed summary of the approaches used for both the NEE and NEL tasks.

Table 3: Automated approaches used in the #Microposts2014 NEEL Challenge (DBpedia abbreviated to 'DB' [1]; Wikipedia abbreviated to 'Wiki').

Named Entity Extraction (NEE)
ID | Team | External System | Main Features | NE Extraction Strategy | Linguistic Knowledge
13 | UTwente | TwiNER [10] | Regular expressions, entity phrases, N-gram gazetteer | TwiNER [10] + CRF | DB Gazetteer [1], Wiki
15 | MaxPlanck | StanfordNER [6] | - | - | NER Dictionary
16 | IIT Hyderabad | RitterNER [14], TwitterNLP (2011) [8] | Proper noun sequences, N-grams | - | Wiki
18 | Microsoft | - | N-grams, stop word removal, punctuation as tokens | Rule-based (candidate filter) | Wiki and Freebase [2] lexicons
19 | Net7-Spaziodati-UPisa | TAGME [5] | Wiki anchor texts, N-grams | Collective agreement + Wiki stats | Wiki
20 | SAP | DBSpotlight [11], TwitterNLP (2013) [13] | Unigram, POS, lower/title/upper case, stripped words, isNumber, word cluster, DBpedia | CRF | DB Gazetteer, Brown Clusters [18]

Named Entity Linking (NEL)
ID | Team | External System | Main Features | Linking Strategy | Linguistic Knowledge
13 | UTwente | Google Search (https://developers.google.com/web-search/docs) | N-grams, DB links, Wiki links, capitalization | SVM | Wiki, DB, WordNet [12], Web N-Gram [19], Yago [9]
15 | MaxPlanck | - | Prefix, POS, suffix, Twitter account metadata, normalized mentions, tri-grams | Entity aggregate prior + prefix-tree data structure + DB match | Wiki, DB, Yago [9]
16 | IIT Hyderabad | - | Wiki context-based measure, anchor text measure, entity popularity (in Twitter) | LambdaMART [20] (ranking/disambiguation) | Wiki Gazetteer, Google CrossWiki Dictionary [17]
18 | Microsoft | - | N-grams, lower case, entity graph features (entity semantic cohesiveness), popularity-based statistical features (clicks and visiting information from the Web) | DCD-SSVM [4] + MART gradient boosting [7] | Wiki, Freebase [2]
19 | Net7-Spaziodati-UPisa | TAGME [5] | Link probability, mention-link commonness | C4.5 (for taxonomy filter) | Wiki, DB [1]
20 | SAP | Search API (Wiki, DBSpotlight [11], Google) | Entity mention | Search + rules | Wiki, DB [1]

The NEE task on Microposts is on its own challenging. One of the main strategies was to use off-the-shelf named entity recognition (NER) tools, improved through the use of extended gazetteers. System 18 approached the NEE task from scratch using a rule-based approach; all others made use of external toolkits. Some of these were Twitter-tuned and were applied for:

i) feature extraction, including the use of the TwitterNLP (2013) [13] and TwitterNLP (2011) [8] toolkits for POS tagging (systems 16, 20);
ii) entity extraction with TwiNER [10], Ritter's NER [14] and TAGME [5] (systems 13, 16, 19).

Other external toolkits which address NEE in longer newswire texts were also applied, including Stanford NER [6] and DBpedia Spotlight [11] (systems 15, 20).

Another common trend across these systems was the use of gazetteer-based, rule-matching approaches to improve the coverage of the off-the-shelf tools. System 13 applied simple regular expression rules to detect additional named entities not found by the NE extractor (such as numbers and dates); systems 15 and 18 applied rules to find candidate entity mentions using a knowledge base (among others, Freebase [2]). Some systems also applied name normalization for feature extraction (systems 15, 18). This strategy was particularly useful for catering for entities originally appearing as hashtags or username mentions. For example, hashtags such as #BarackObama were normalized into the composite entity mention "Barack Obama", and "@EmWatson" into "Emma Watson".

The NEL task involved in some cases the use of off-the-shelf tools, for finding candidate links for each entity mention and/or for deriving mention features (systems 13, 19, 20). A common trend across systems was the use of external knowledge sources, including:

i) NER dictionaries (e.g. Google CrossWiki [17]);
ii) knowledge base gazetteers (e.g. Yago [9], DBpedia [1]);
iii) weighted lexicons (using e.g. Freebase [2], Wikipedia);
iv) other sources (e.g. Microsoft Web N-gram [19]).

A wide range of different features was investigated for the linking strategies. Some systems characterized an entity using Micropost-derived features together with knowledge base (KB)-derived features (systems 13, 15, 16, 19). Micropost-derived features include the use of lexical (e.g., N-grams, capitalization) and syntactical (e.g., POS) features, while KB-derived features included the use of URIs, anchor text and link-based probabilities (see Table 3). Additionally, features were extended by capturing jointly the local (within a Micropost) and global (within a knowledge base) contextual information of an entity, via graph-based features such as entity semantic cohesiveness (system 18). Further novel features included the use of Twitter account metadata for characterizing mentions and popularity-based statistical features for characterizing entities (systems 16, 18).

The classification strategies used for entity linking included supervised approaches (systems 13, 15, 16, 18, 19) and existing off-the-shelf approaches enhanced with simple heuristics, e.g. search+rules (system 20).
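The name normalization step mentioned above (e.g. #BarackObama to "Barack Obama", @EmWatson to "Emma Watson") can be approximated with a simple heuristic. The sketch below is our own illustration, not a participant's implementation; the camel-case splitting rule and the small alias table are assumptions:

```python
# Illustrative sketch of hashtag/username normalization (not from any submission):
# strip the marker, split camel-case hashtags into words, and map known user
# names to canonical surface forms via a (hypothetical) alias table.
import re

USER_ALIASES = {"EmWatson": "Emma Watson"}  # assumed lookup, e.g. from profile data

def normalize(token: str) -> str:
    if token.startswith("#"):
        # "#BarackObama" -> "Barack Obama"
        return " ".join(re.findall(r"[A-Z][a-z]+|[A-Z]+(?![a-z])|[a-z]+|\d+", token[1:]))
    if token.startswith("@"):
        name = token[1:]
        return USER_ALIASES.get(name, name)
    return token

print(normalize("#BarackObama"))  # Barack Obama
print(normalize("@EmWatson"))     # Emma Watson
```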
3. EVALUATION OF CHALLENGE SUBMISSIONS
We describe next the evaluation measures used to assess the goodness of the submissions, and conclude with the final challenge rankings, with submissions ordered according to the F1 measure.

3.1 Evaluation Measures
We evaluate the goodness of a system S in terms of its performance in both recognizing and linking the entities in a test set TS. For each instance in TS, a system provides a set of pairs P of the form: entity mention (e) and link (l). A link is any valid DBpedia URI that points to an existing resource (e.g. http://dbpedia.org/resource/Barack_Obama); we consider all DBpedia v3.9 resources valid. The evaluation consists of comparing submission entry pairs against those in the gold standard GS. The measures used to evaluate each pair are precision P, recall R, and f-measure F1. The evaluation is based on micro-averages.

First, a cleansing stage is performed over each submission, resolving, where needed, the redirects. Then, to assess the correctness of the pairs provided by a system S, we perform an exact-match evaluation, in which a pair is correct only if both the entity mention and the link match the corresponding pair in the GS. Pair order is also relevant. We define (e, l)_S ∈ S as the set of pairs extracted by the system S, and (e, l)_GS ∈ GS as the set of pairs in the gold standard. We define the set of true positives TP, false positives FP, and false negatives FN for a given system as:

TP = {(e, l)_S | (e, l)_S ∈ S ∩ GS}    (1)

FP = {(e, l)_S | (e, l)_S ∈ S ∧ (e, l)_S ∉ GS}    (2)

FN = {(e, l)_GS | (e, l)_GS ∈ GS ∧ (e, l)_GS ∉ S}    (3)

Thus TP defines the set of relevant pairs in TS, in other words the set of pairs in TS that match corresponding ones in GS. FP is the set of irrelevant pairs in TS, in other words the pairs in TS that do not match the pairs in GS. FN is the set of false negatives, denoting the pairs that are not recognised by the system, yet appear in GS. Since our evaluation is based on a micro-average analysis, we sum the individual true positives, false positives, and false negatives of each system across all Microposts. As we require an exact match for pairs (e, l) we are looking for strict entity recognition and linking matches; each system has to link each entity e recognised to the correct resource l.

From this set of definitions, we define precision, recall, and f-measure as follows:

P = |TP| / |TP ∪ FP|    (4)

R = |TP| / |TP ∪ FN|    (5)

F1 = 2 * (P * R) / (P + R)    (6)

The evaluation framework used in the challenge is available at https://github.com/giusepperizzo/neeleval.
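A compact sketch of this micro-averaged, exact-match scoring, written by us under the definitions above (it is not the neeleval implementation; the redirect-resolution step is reduced to a placeholder lookup), is:

```python
# Illustrative micro-averaged exact-match scoring over (mention, link) pairs,
# following equations (1)-(6) above. This is not the official neeleval code.
REDIRECTS = {}  # placeholder: canonical-URI lookup used by the cleansing stage

def cleanse(pairs):
    # Resolve redirects so that equivalent DBpedia URIs compare equal.
    return {(mention, REDIRECTS.get(link, link)) for mention, link in pairs}

def micro_prf(system_pairs_per_tweet, gold_pairs_per_tweet):
    tp = fp = fn = 0
    for system_pairs, gold_pairs in zip(system_pairs_per_tweet, gold_pairs_per_tweet):
        s, g = cleanse(system_pairs), cleanse(gold_pairs)
        tp += len(s & g)  # exact matches of (mention, link)
        fp += len(s - g)  # system pairs not in the gold standard
        fn += len(g - s)  # gold pairs the system missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy usage with the example tweet from Section 2.1:
gold = [{("bbcworldservice", "dbpedia:BBC_World_Service"),
         ("Oslo", "dbpedia:Oslo"),
         ("oslexp", "dbpedia:2011_Norway_attacks")}]
system = [{("bbcworldservice", "dbpedia:BBC_World_Service"),
           ("Oslo", "dbpedia:Oslo")}]
print(micro_prf(system, gold))  # (1.0, 0.666..., 0.8)
```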
3.2 Evaluation Results
Table 4 reports the performance of participants' systems, using the best run for each. The ranking is based on the F1 measure.

Table 4: P, R, F1 breakdown figures per submission.

Rank | System | Entry                 | P     | R     | F1
1    | 18-2   | Microsoft             | 77.10 | 64.20 | 70.06
2    | 13-2   | UTwente               | 57.30 | 52.74 | 54.93
3    | 19-2   | Net7-Spaziodati-UPisa | 60.93 | 42.25 | 49.90
4    | 15-3   | MaxPlanck             | 53.28 | 39.51 | 45.37
5    | 16-1   | IIT Hyderabad         | 50.95 | 40.67 | 45.23
6    | 20-1   | SAP                   | 49.58 | 32.17 | 39.02

System 18 clearly outperformed the other systems, with an F1 more than 15 points higher than the next best system. System 18 differed from all other systems by using a joint approach to the NEEL task; the others each divided the task into sequential entity extraction and linking steps. The approach in System 18 made use of features which capture jointly an entity's local and global contextual information, resulting in the best approach submitted to the #Microposts2014 NEEL Challenge.

4. CONCLUSIONS
The aim of the #Microposts2014 Named Entity Extraction & Linking Challenge was to foster an open initiative that would encourage participants to develop novel approaches for extracting and linking entity mentions appearing in Microposts. The NEEL task involved the extraction of entity mentions in Microposts and the linking of these entity mentions to DBpedia resources (where such exist).

Our motivation for hosting this challenge is the increased availability of third-party entity extraction and entity linking tools. Such tools have proven to be a good starting point for entity linking, even for Microposts. However, the evaluation results show that the NEEL task remains challenging when applied to social media content with its peculiarities, when compared to standard length text employing regular language.

As a result of this challenge, and the collaboration of annotators and participants, we also generated a manually annotated data set, which may be used in conjunction with the NEEL evaluation framework (neeleval). To the best of our knowledge this is the largest publicly available data set providing entity/resource annotations for Microposts. We hope that both the data set and the neeleval framework will facilitate the development of future approaches in this and other such tasks.

The results of this challenge highlighted the relevance of normalization and time-dependent features (such as popularity) for dealing with this type of progressively changing content. They also indicated that learning entity extraction and linking as a joint task may be beneficial for boosting performance in entity linking in Microposts.

We aim to continue to host additional challenges targeting more complex tasks, within the context of data mining of Microposts.

5. ACKNOWLEDGMENTS
The authors thank Nikolaos Aletras for helping to set up the CrowdFlower experiments for annotating the gold standard data set. Special thanks to Andrés García-Silva, Daniel Preoţiuc-Pietro, Ebrahim Bagheri, José M. Morales del Castillo, Irina Temnikova, Georgios Paltoglou, Pierpaolo Basile and Leon Derczynski, who took part in the annotation tasks. We also thank the participants who helped us improve the gold standard. We finally thank the LinkedTV project for supporting the challenge by sponsoring the prize for the winning submission.

6. REFERENCES
[1] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, and Z. Ives. DBpedia: A Nucleus for a Web of Open Data. In 6th International Semantic Web Conference (ISWC'07), 2007.
[2] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: A collaboratively created graph database for structuring human knowledge. In ACM SIGMOD International Conference on Management of Data (SIGMOD'08), 2008.
[3] A. E. Cano Basave, A. Varga, M. Rowe, M. Stankovic, and A.-S. Dadzie. Making Sense of Microposts (#MSM2013) Concept Extraction Challenge. In Making Sense of Microposts (#MSM2013) Concept Extraction Challenge, 2013.
[4] M.-W. Chang and W.-T. Yih. Dual coordinate descent algorithms for efficient large margin structured prediction. Transactions of the Association for Computational Linguistics, 2013.
[5] P. Ferragina and U. Scaiella. Fast and accurate annotation of short texts with Wikipedia pages. IEEE Software, 29(1):70–75, 2012.
[6] J. R. Finkel, T. Grenager, and C. Manning. Incorporating non-local information into information extraction systems by Gibbs sampling. In 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), 2005.
[7] J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29:1189–1232, 2000.
[8] K. Gimpel, N. Schneider, B. O'Connor, D. Das, D. Mills, J. Eisenstein, M. Heilman, D. Yogatama, J. Flanigan, and N. A. Smith. Part-of-speech tagging for Twitter: Annotation, features, and experiments. In 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011.
[9] J. Hoffart, F. M. Suchanek, K. Berberich, E. Lewis-Kelham, G. de Melo, and G. Weikum. YAGO2: Exploring and querying world knowledge in time, space, context, and many languages. In 20th International Conference Companion on World Wide Web (WWW'11), 2011.
[10] C. Li, J. Weng, Q. He, Y. Yao, A. Datta, A. Sun, and B.-S. Lee. TwiNER: Named entity recognition in targeted Twitter stream. In 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'12), 2012.
[11] P. N. Mendes, M. Jakob, A. García-Silva, and C. Bizer. DBpedia Spotlight: Shedding light on the web of documents. In 7th International Conference on Semantic Systems (I-Semantics'11), 2011.
[12] G. A. Miller. WordNet: A lexical database for English. Commun. ACM, 38(11):39–41, 1995.
[13] O. Owoputi, C. Dyer, K. Gimpel, N. Schneider, and N. A. Smith. Improved part-of-speech tagging for online conversational text with word clusters. In NAACL, 2013.
[14] A. Ritter, S. Clark, Mausam, and O. Etzioni. Named entity recognition in tweets: An experimental study. In EMNLP, 2011.
[15] G. Rizzo and R. Troncy. NERD: A framework for unifying named entity recognition and disambiguation extraction tools. In 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL'12), 2012.
[16] G. Rizzo, R. Troncy, S. Hellmann, and M. Bruemmer. NERD meets NIF: Lifting NLP extraction results to the Linked Data Cloud. In 5th International Workshop on Linked Data on the Web (LDOW'12), 2012.
[17] V. I. Spitkovsky and A. X. Chang. A cross-lingual dictionary for English Wikipedia concepts. In LREC, 2012.
[18] J. Turian, L. Ratinov, and Y. Bengio. Word representations: A simple and general method for semi-supervised learning. In 48th Annual Meeting of the Association for Computational Linguistics (ACL'10), 2010.
[19] K. Wang, C. Thrasher, E. Viegas, X. Li, and B.-j. P. Hsu. An overview of Microsoft Web N-gram corpus and applications. In NAACL HLT 2010 Demonstration Session (HLT-DEMO'10), 2010.
[20] Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Adapting boosting for information retrieval measures. Information Retrieval, 13(3):254–270, 2010.