Making Sense of Microposts (#Microposts2014)
Named Entity Extraction & Linking Challenge

Amparo E. Cano, Knowledge Media Institute, The Open University, UK (amparo.cano@open.ac.uk)
Giuseppe Rizzo, Università di Torino, Italy / EURECOM, France (giuseppe.rizzo@di.unito.it)
Andrea Varga, The OAK Group, The University of Sheffield, UK (a.varga@dcs.shef.ac.uk)
Matthew Rowe, School of Computing and Communications, Lancaster University, UK (m.rowe@lancaster.ac.uk)
Milan Stankovic, Sépage, France / Université Paris-Sorbonne (milstan@gmail.com)
Aba-Sah Dadzie, School of Computer Science, University of Birmingham, UK (a.dadzie@cs.bham.ac.uk)

ABSTRACT
Microposts are small fragments of social media content and a popular medium for sharing facts, opinions and emotions. They comprise a wealth of data which is increasing exponentially, and which therefore presents new challenges for the information extraction community, among others. This paper describes the 'Making Sense of Microposts' (#Microposts2014) Workshop's Named Entity Extraction and Linking (NEEL) Challenge, held as part of the 2014 World Wide Web Conference (WWW'14). The task set in this challenge was the automatic extraction and linking of entities appearing within English Microposts on Twitter. Participants were asked to engineer a named entity extraction and DBpedia linking system targeting a predefined taxonomy, to be run on the challenge data set, comprising a manually annotated training corpus and a test corpus of Microposts. 43 research groups expressed intent to participate in the challenge, of which 24 signed the agreement required to be given a copy of the training and test datasets. 8 groups fulfilled all submission requirements, out of which 4 were accepted for presentation at the workshop and a further 2 as posters. The submissions covered sequential and joint methods for approaching the named entity extraction and entity linking tasks. We describe the evaluation process and discuss the performance of the different approaches to the #Microposts2014 NEEL Challenge.

Keywords
Microposts, Named Entity, Evaluation, Extraction, Linking, Disambiguation, Challenge

Copyright © 2014 held by author(s)/owner(s); copying permitted only for private and academic purposes. Published as part of the #Microposts2014 Workshop proceedings, available online as CEUR Vol-1141 (http://ceur-ws.org/Vol-1141). #Microposts2014, April 7th, 2014, Seoul, Korea.

1. INTRODUCTION
Since the first Making Sense of Microposts (#MSM2011) workshop at the Extended Semantic Web Conference in 2011 through to the most recent workshop in 2014, we have received over 80 submissions covering a wide range of topics related to mining information from, and (re-)using the knowledge content of, Microposts. Microposts are short text messages published with minimal effort via social media platforms. They provide a publicly available wealth of data which has proven useful in different applications and contexts (e.g. music recommendation, social bots, emergency response situations). However, gleaning useful information from Micropost content presents various challenges, due, among others, to the inherent characteristics of this type of data:

i) the limited length of Microposts;
ii) the noisy lexical nature of Microposts, where terminology differs between users when referring to the same thing, and abbreviations are commonplace.

A commonly used approach for mining Microposts is the use of cues that are available in textual documents, providing contextual features for this content. One example of such a cue is the use of named entities (NE). Extracting named entities from Micropost content has proved to be a challenging task; this was the focus of the first challenge, in #MSM2013 [3]. A step further in the use of such cues is to be able not only to recognize and classify them, but also to provide further information about them, in other words, to disambiguate entities. This prompted the Named Entity Extraction and Linking (NEEL) Challenge, held as part of the Making Sense of Microposts Workshop (#Microposts2014) at the 2014 World Wide Web Conference (WWW'14).

The purpose of this challenge was to set up an open and competitive environment that would encourage participants to deliver novel or improved approaches for extracting entities from Microposts and linking them to their DBpedia counterpart resources (where defined). This report describes the #Microposts2014 NEEL Challenge, our collaborative annotation of a corpus of Microposts and our evaluation of the performance of each submission. We also describe the approaches taken in the participants' systems, which use both established and novel, alternative approaches to entity extraction and linking. We describe how well they performed and how system performance differed across approaches. The resulting body of work has implications for researchers interested in the task of information extraction from social media.

2. THE CHALLENGE
In this section we describe the goal of the challenge, the task set, and the process we followed to generate the corpus of Microposts. We conclude the section with the list of the accepted submissions.
2.1 The Task and Goal
The NEEL Challenge task required participants to build semi-automated systems in two stages:

(i) Named Entity Extraction (NEE), in which participants were to extract entity mentions from a tweet; and
(ii) Named Entity Linking (NEL), in which each extracted entity is linked to an English DBpedia v3.9 resource.

For this task we considered the definition of an entity in the general sense of being: an object or set of objects that does not necessarily have a material existence, but which must be characterizable as an instance of a taxonomy class. To facilitate the creation of the gold standard (GS) we limited the entity types evaluated in this challenge by specifying the taxonomy to be used: the NERD ontology v0.5 (http://nerd.eurecom.fr/ontology/nerd-v0.5.n3) [16]. To this we added a few concepts from the DBpedia taxonomy. The taxonomy was not considered normative in the evaluation of the submissions, nor for the ranking. This was a deliberate choice, to increase the complexity of the task and to let participants perform taxonomy matching starting from the distribution of the entities in the GS. The list of classes in the taxonomy used is distributed with the released GS (available for download from http://ceur-ws.org/Vol-1141/microposts2014-neel_challenge_gs.zip).

Besides the typical word tokens found in a Micropost, new to this year's challenge we also considered special social media markers as entity mentions. These Twitter markers are tokens introduced by a special symbol. We considered two such markers: hashtags, prefixed by #, denoting the topic of a Micropost (e.g. #londonriots, #surreyriots, #osloexpl), and mentions, prefixed by @, referring to Twitter user names, which include entities such as organizations (e.g. @bbcworldservice) and celebrities (e.g. @ChadMMurray, @AmyWinehouse).

Participants were required to recognize these different entity types within a given Micropost, and to extract the corresponding entity/link tuples. Consider the following example, taken from our annotated corpus:

Source (tweet text):
RT @bbcworldservice police confirms bomb
in Oslo #oslexp

The 2nd token (the mention @bbcworldservice) in this Micropost refers to the international broadcaster, the BBC World Service; the 7th token refers to the location Oslo; while the 8th token (the hashtag #oslexp) refers to the 2011 Norway terrorist attacks. An entry to the challenge would be required to spot these tokens and display the result as a set of annotations, where each line corresponds to a tab-separated entity mention and entity link. Note that the annotated result returns tokens without the social media markers (# and @) of the original Micropost; "dbpedia:" refers to the namespace prefix of a DBpedia resource (see http://dbpedia.org/resource).

Correctly formatted result:
bbcworldservice dbpedia:BBC_World_Service
Oslo dbpedia:Oslo
oslexp dbpedia:2011_Norway_attacks
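To make the expected submission format concrete, the following minimal Python sketch (our illustration, not part of the official challenge tooling) serialises a system's annotations for the example tweet above into the required tab-separated mention/link lines; the annotations list is a hypothetical stand-in for the output of a real extraction and linking system.

annotations = [
    ("bbcworldservice", "dbpedia:BBC_World_Service"),
    ("Oslo", "dbpedia:Oslo"),
    ("oslexp", "dbpedia:2011_Norway_attacks"),
]

def serialise(pairs):
    # One line per annotation: the entity mention (social media markers
    # already stripped), a tab character, then the DBpedia link.
    return "\n".join(f"{mention}\t{link}" for mention, link in pairs)

print(serialise(annotations))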
We also consider the case where an entity is referenced in a tweet either as a noun or as a noun phrase, if it:

a) belongs to one of the categories specified in the taxonomy;
b) is disambiguated by a DBpedia URI within the context of the tweet; hence any (single-word or phrase) entity without a disambiguating URI is disregarded;
c) subsumes other entities: the longest entity phrase within a Micropost, composed of multiple sequential entities and disambiguated by a DBpedia URI, takes precedence over its component entities (a selection sketch follows the examples below).

Consider the following examples:

1. [Natural History Museum at Tring];
2. [News International chairman James Murdoch]'s evidence to MPs on phone hacking;
3. [Sony]'s [Android Honeycomb] Tablet

In the third case, even though it may appear to be a coherent phrase, since there are no DBpedia URIs for [Sony's Android Honeycomb] or [Sony's Android Honeycomb Tablet], the entity phrase is split into the (valid) component entities highlighted above.
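The following minimal Python sketch illustrates rules (b) and (c): among overlapping candidate phrases, a longer phrase that can be disambiguated to a DBpedia URI takes precedence over the entities it subsumes. The linkable predicate is a hypothetical stand-in; an actual system would check candidates against DBpedia or a local index.

def select_entities(candidates, linkable):
    """candidates: (start, end, phrase) token spans; linkable: phrase -> bool."""
    kept = []
    # Consider longer spans first, so a subsumed span is kept only when no
    # enclosing span could be linked.
    for start, end, phrase in sorted(candidates, key=lambda c: c[1] - c[0], reverse=True):
        if not linkable(phrase):
            continue  # rule (b): entities without a disambiguating URI are disregarded
        if any(s <= start and end <= e for s, e, _ in kept):
            continue  # rule (c): subsumed by an already-kept, longer entity
        kept.append((start, end, phrase))
    return sorted(kept)

For example 3 above, the span "Sony's Android Honeycomb Tablet" fails the linkable test, so the component spans "Sony" and "Android Honeycomb" are kept instead.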




To encourage competition we solicited sponsorship for the winning submission. This was provided by the European project LinkedTV (http://www.linkedtv.eu), who offered a prize of an iPad. This generous sponsorship is testament to the growing interest in issues related to automatic approaches for gleaning information from (the very large amounts of) social media data.

2.2 Data Collection and Annotation
The challenge data set comprises 3,505 tweets extracted from a collection of over 18 million tweets. This collection, provided by the Redites project (http://demeter.inf.ed.ac.uk/redites), covers event-annotated tweets collected for the period 15th July 2011 to 15th August 2011 (31 days). It extends over multiple notable events, including the death of Amy Winehouse, the London Riots and the Oslo bombing. Since the NEEL Challenge task is to automatically extract and link entities, we built our data set considering both event and non-event tweets. Event tweets are more likely to contain entities; non-event tweets therefore enable us to evaluate the ability of a system to avoid false positives in the entity extraction phase.
Table 1: General statistics of the training and test data sets. Posts refers to the number of tweets in a data set; Words to the number of unique words; Tokens to the total number of words; AvgTokens/Post to the average number of tokens per tweet; NEs to the number of unique NEs; totalNEs to the total number of NEs; and AvgNEs/Post to the average number of NEs per post. AvgTokens/Post and AvgNEs/Post are reported as mean ± standard deviation.

Dataset   Posts   Words/Tokens    AvgTokens/Post   NEs     totalNEs   AvgNEs/Post
train     2,340   12,758/41,037   17.54±5.70       1,862   3,819      3.26±3.37
test      1,165    6,858/20,224   17.36±5.59         834   1,458      2.50±2.94

Statistics describing the training and test sets are provided in Table 1. The dataset was split into training (70%) and test (30%) sets. The training set contains 2,340 tweets, with 41,037 tokens and 3,819 named entities; the test set contains 1,165 tweets, with 20,224 tokens and 1,458 named entities. The tweets are relatively long in both data sets; the average number of tokens per tweet is 17.54±5.70 in the training set, and 17.36±5.59 in the test set. The average number of entities per tweet is also relatively high, at 3.26±3.37 for the training and 2.50±2.94 for the test dataset. The percentage of tweets without any valid entities is 32% (775 tweets) in the training set, and 40% (469 tweets) in the test set. There is a fair amount of entity overlap between the training and test data: 13.27% (316) of the named entities in the training data also occur in the test dataset. With regard to tokens carrying hashtag and mention social media markers in the original tweets, a total of 406 hashtags represented valid entities in the training set, and 184 in the test set. The total number of @-mentions representing valid entities was 133 in the training set, and 73 in the test set.
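As a minimal sketch of how the per-post figures in Table 1 are derived (assuming simple whitespace tokenisation; the tokeniser actually used for the corpus may differ), the mean ± standard deviation statistics can be computed as follows:

from statistics import mean, stdev

def avg_tokens_per_post(tweets):
    # Token counts per tweet, reported in Table 1 as mean ± standard deviation.
    counts = [len(tweet.split()) for tweet in tweets]
    return mean(counts), stdev(counts)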
The annotation of each Micropost in the training set gave all participants a common base from which to learn extraction patterns. In order to assess the performance of the submissions we used an underlying gold standard (GS), generated by 14 annotators with different backgrounds, including computer scientists, social scientists, social web experts, semantic web experts and linguists.

The annotation process comprised the following phases (we aim to provide a more detailed explanation of the annotation process and the rest of the NEEL Challenge evaluation process in a separate publication):

Phase 1. Unsupervised annotation of the corpus was performed, to extract candidate links that were used as input to the next stage. The candidates were extracted using the NERD framework [15].

Phase 2. The data set was divided into batches, with three different annotators assigned to each batch. In this phase annotations were performed using CrowdFlower (http://crowdflower.com). The annotators were asked to analyze the NERD links generated in Phase 1, adding or removing entity annotations as required. The annotators were also asked to mark any ambiguous cases encountered.

Phase 3. In the final stage, consistency checking, three experts double-checked the annotations and generated the GS (for both the training and test sets). Three main tasks were carried out here: (1) cross-consistency checking of entity types; (2) cross-consistency checking of URIs; (3) resolution of ambiguous cases raised by the 14 annotators.

The complete data set, including a list of changes and the gold standard, is available for download (http://ceur-ws.org/Vol-1141/microposts2014-neel_challenge_gs.zip) with the #Microposts2014 Workshop proceedings, accessible under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. (Following the Twitter ToS we only provide tweet IDs and annotations for the training set, and tweet IDs for the test set.)

2.3 Challenge Submissions
The challenge attracted a lot of interest from research groups across the world. Initially, 43 groups expressed their intent to participate in the challenge; however, only 8 completed the submission process. Each submission consisted of a short paper explaining the system approach, and up to three different test set annotations generated by running the system with different settings. After peer review, 4 submissions were accepted for presentation at the workshop and a further 2 as posters. The submission run with the best overall performance for each system was used in the rankings (see Table 4). The submissions accepted are listed in Table 2.

Table 2: Submissions accepted, ordered by submission number, with team affiliations and number of runs for each.

ID   Affiliation             Authors                Runs
13   UTwente                 Habib, M. et al.       2
15   Max Planck              Amir, M. et al.        3
16   IIT Hyderabad           Bansal, R. et al.      1
18   Microsoft               Chang, M.              3
19   Net7-Spaziodati-UPisa   Scaiella, U. et al.    2
20   SAP                     Dahlmeier, D. et al.   1

2.4 System Descriptions
We present next an analysis of the participants' systems for the Named Entity Extraction and Linking (NEEL) tasks. Except for submission 18, which treated NEEL as a joint Named Entity Extraction (NEE) and Named Entity Linking (NEL) task, all participants approached NEEL as two sequential sub-tasks (i.e. NEE first, followed by NEL). A summary of these approaches covers:

i) use of external systems;
ii) main features used;
iii) type of strategy used;
iv) use of external sources.

Table 3 provides a detailed summary of the approaches used for both the NEE and NEL tasks.

The NEE task on Microposts is on its own challenging. One of the main strategies was to use off-the-shelf named entity recognition (NER) tools, improved through the use of extended gazetteers. System 18 approached the NEE task from scratch using a rule-based approach; all others made use of external toolkits. Some of these were Twitter-tuned and were applied for:

i) feature extraction, including the use of the TwitterNLP (2013) [13] and TwitterNLP (2011) [8] toolkits for POS tagging (systems 16, 20);
ii) entity extraction, with TwiNER [10], Ritter's NER [14] and TAGME [5] (systems 13, 16, 19).

Other external toolkits which address NEE in longer newswire texts were also applied, including Stanford NER [6] and DBpedia Spotlight [11] (systems 15, 20).

Another common trend across these systems was the use of gazetteer-based, rule-matching approaches to improve the coverage of the off-the-shelf tools. System 13 applied simple regular expression rules to detect additional named entities not found by the NE extractor (such as numbers and dates); systems 15 and 18 applied rules to find candidate entity mentions using a knowledge base (among others, Freebase [2]). Some systems also applied name normalization for feature extraction (systems 15, 18). This strategy was particularly useful for catering for entities originally appearing as hashtags or username mentions. For example, hashtags such as #BarackObama were normalized into the composite entity mention "Barack Obama", and "@EmWatson" into "Emma Watson".
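The following minimal Python sketch illustrates this kind of marker normalization: camel-case hashtags are split into composite mentions, and @-handles are mapped to canonical names. The handle_names mapping is a hypothetical stand-in; participating systems resolved handles against Twitter account metadata or a knowledge base.

import re

def normalise_token(token, handle_names=None):
    if token.startswith("#"):
        # Split a camel-case hashtag: "#BarackObama" -> "Barack Obama".
        parts = re.findall(r"[A-Z][a-z]+|[A-Z]+(?![a-z])|[a-z]+|\d+", token[1:])
        return " ".join(parts)
    if token.startswith("@"):
        # Map a handle to a canonical name: "@EmWatson" -> "Emma Watson".
        return (handle_names or {}).get(token[1:].lower(), token[1:])
    return token

print(normalise_token("#BarackObama"))                            # Barack Obama
print(normalise_token("@EmWatson", {"emwatson": "Emma Watson"}))  # Emma Watson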
                                                                          and false negatives F N for a given system as:
The NEL task involved in some cases the use of off-the-shelf tools for finding candidate links for each entity mention and/or for deriving mention features (systems 13, 19, 20). A common trend across systems was the use of external knowledge sources, including:

i) NER dictionaries (e.g. Google CrossWiki [17]);
ii) knowledge base gazetteers (e.g. Yago [9], DBpedia [1]);
iii) weighted lexicons (using e.g. Freebase [2], Wikipedia);
iv) other sources (e.g. Microsoft Web N-gram [19]).

A wide range of different features was investigated for the linking strategies. Some systems characterized an entity using Micropost-derived features together with knowledge base (KB)-derived features (systems 13, 15, 16, 19). Micropost-derived features include lexical (e.g. N-grams, capitalization) and syntactic (e.g. POS) features, while KB-derived features include URIs, anchor text and link-based probabilities (see Table 3). Additionally, features were extended by jointly capturing the local (within a Micropost) and global (within a knowledge base) contextual information of an entity, via graph-based features such as entity semantic cohesiveness (system 18). Further novel features included the use of Twitter account metadata for characterizing mentions, and popularity-based statistical features for characterizing entities (systems 16, 18).

The classification strategies used for entity linking included supervised approaches (systems 13, 15, 16, 18, 19) and existing off-the-shelf approaches enhanced with simple heuristics, e.g. search+rules (system 20).

3. EVALUATION OF CHALLENGE SUBMISSIONS
We describe next the evaluation measures used to assess the goodness of the submissions, and conclude with the final challenge rankings, with submissions ordered according to the F1 measure.

3.1 Evaluation Measures
We evaluate the goodness of a system S in terms of its performance in both recognizing and linking the entities in a test set TS. For each instance in TS, a system provides a set of pairs P of the form: entity mention (e) and link (l). A link is any valid DBpedia URI (we consider all DBpedia v3.9 resources valid) that points to an existing resource (e.g. http://dbpedia.org/resource/Barack_Obama). The evaluation consists of comparing submission entry pairs against those in the gold standard GS. The measures used to evaluate each pair are precision P, recall R, and f-measure F1. The evaluation is based on micro-averages.

First, a cleansing stage is performed over each submission, resolving redirects where needed. Then, to assess the correctness of the pairs provided by a system S, we perform an exact-match evaluation, in which a pair is correct only if both the entity mention and the link match the corresponding pair in the GS. Pair order is also relevant. We denote by S the set of (e, l) pairs extracted by the system, and by GS the set of pairs in the gold standard. We define the sets of true positives TP, false positives FP, and false negatives FN for a given system as:

    TP = {(e, l) | (e, l) ∈ S ∩ GS}                (1)

    FP = {(e, l) | (e, l) ∈ S ∧ (e, l) ∉ GS}       (2)

    FN = {(e, l) | (e, l) ∈ GS ∧ (e, l) ∉ S}       (3)

Thus TP is the set of relevant pairs, in other words the pairs produced for TS that match corresponding ones in GS. FP is the set of irrelevant pairs, in other words the pairs produced for TS that do not match any pair in GS. FN is the set of false negatives, denoting the pairs that are not recognised by the system, yet appear in GS. Since our evaluation is based on a micro-average analysis, we sum the individual true positives, false positives, and false negatives of each system across all Microposts. As we require an exact match for pairs (e, l), we are looking for strict entity recognition and linking matches; each system has to link each recognised entity e to the correct resource l.

From this set of definitions, we define precision, recall, and f-measure as follows:

    P = |TP| / |TP ∪ FP|                           (4)

    R = |TP| / |TP ∪ FN|                           (5)

    F1 = 2 * (P * R) / (P + R)                     (6)

The evaluation framework used in the challenge is available at https://github.com/giusepperizzo/neeleval.
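As a minimal illustration of this micro-averaged, exact-match scoring (a sketch of equations (1)-(6), not the official implementation, which is the neeleval framework linked above and which additionally performs the redirect-resolving cleansing stage):

def evaluate(system, gold):
    """system, gold: one set of (mention, link) pairs per Micropost."""
    tp = fp = fn = 0
    for S, GS in zip(system, gold):
        tp += len(S & GS)   # eq. (1): pairs matching the gold standard exactly
        fp += len(S - GS)   # eq. (2): produced pairs absent from the gold standard
        fn += len(GS - S)   # eq. (3): gold pairs the system failed to produce
    # TP and FP are disjoint, so |TP ∪ FP| = tp + fp (likewise for FN).
    p = tp / (tp + fp) if tp + fp else 0.0          # eq. (4)
    r = tp / (tp + fn) if tp + fn else 0.0          # eq. (5)
    f1 = 2 * p * r / (p + r) if p + r else 0.0      # eq. (6)
    return p, r, f1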
Table 3: Automated approaches used in the #Microposts2014 NEEL Challenge. DBpedia is abbreviated to 'DB' [1] and Wikipedia to 'Wiki'.

Named Entity Extraction (NEE)

13 UTwente. External system: TwiNER [10]. Main features: regular expressions, entity phrases, N-grams. Extraction strategy: TwiNER [10] + CRF. Linguistic knowledge: DB Gazetteer [1], Wiki Gazetteer.

15 MaxPlanck. External system: StanfordNER [6]. Main features: -. Extraction strategy: -. Linguistic knowledge: NER Dictionary.

16 IIT Hyderabad. External systems: RitterNER [14], TwitterNLP (2011) [8]. Main features: proper noun sequences, N-grams. Extraction strategy: -. Linguistic knowledge: Wiki.

18 Microsoft. External system: -. Main features: N-grams, stop word removal, punctuation as tokens. Extraction strategy: rule-based (candidate filter). Linguistic knowledge: Wiki and Freebase [2] lexicons.

19 Net7-Spaziodati-UPisa. External system: TAGME [5]. Main features: Wiki anchor texts, N-grams. Extraction strategy: collective agreement + Wiki stats. Linguistic knowledge: Wiki.

20 SAP. External systems: DBSpotlight [11], TwitterNLP (2013) [13]. Main features: unigrams, POS, lower/title/upper case, stripped words, isNumber, word cluster, DBpedia. Extraction strategy: CRF. Linguistic knowledge: DB Gazetteer, Brown Clusters [18].

Named Entity Linking (NEL)

13 UTwente. External system: Google Search (https://developers.google.com/web-search/docs). Main features: N-grams, DB links, Wiki links, capitalization. Linking strategy: SVM. Linguistic knowledge: Wiki, DB, WordNet [12], Web N-Gram [19], Yago [9].

15 MaxPlanck. External system: -. Main features: prefix, POS, suffix, Twitter account metadata, normalized mentions, tri-grams. Linking strategy: entity aggregate prior + prefix-tree data structure + DB match. Linguistic knowledge: Wiki, DB, Yago [9].

16 IIT Hyderabad. External system: -. Main features: Wiki context-based measure, anchor text measure, entity popularity (in Twitter). Linking strategy: LambdaMART [20] (ranking/disambiguation). Linguistic knowledge: Wiki Gazetteer, Google CrossWiki Dictionary [17].

18 Microsoft. External system: -. Main features: N-grams, lower case, entity graph features (entity semantic cohesiveness), popularity-based statistical features (clicks and visiting information from the Web). Linking strategy: DCD-SSVM [4] + MART gradient boosting [7]. Linguistic knowledge: Wiki, Freebase [2].

19 Net7-Spaziodati-UPisa. External system: TAGME [5]. Main features: link probability, mention-link commonness. Linking strategy: C4.5 (for taxonomy filter). Linguistic knowledge: Wiki, DB [1].

20 SAP. External systems: Search API (Wiki, DBSpotlight [11], Google). Main features: entity mention. Linking strategy: search + rules. Linguistic knowledge: Wiki, DB [1].
3.2 Evaluation Results
Table 4 reports the performance of participants' systems, using the best run for each. The ranking is based on F1.

Table 4: P, R, F1 breakdown per submission.

Rank   System   Entry                   P       R       F1
1      18-2     Microsoft               77.10   64.20   70.06
2      13-2     UTwente                 57.30   52.74   54.93
3      19-2     Net7-Spaziodati-UPisa   60.93   42.25   49.90
4      15-3     MaxPlanck               53.28   39.51   45.37
5      16-1     IIT Hyderabad           50.95   40.67   45.23
6      20-1     SAP                     49.58   32.17   39.02

System 18 clearly outperformed the other systems, with an F1 more than 15 points higher than that of the next best system. System 18 differed from all other systems by using a joint approach to the NEEL task; the others each divided the task into sequential entity extraction and linking tasks. The approach of System 18 made use of features which jointly capture an entity's local and global contextual information, resulting in the best approach submitted to the #Microposts2014 NEEL Challenge.

4. CONCLUSIONS
The aim of the #Microposts2014 Named Entity Extraction & Linking Challenge was to foster an open initiative that would encourage participants to develop novel approaches for extracting and linking entity mentions appearing in Microposts. The NEEL task involved the extraction of entity mentions in Microposts and the linking of these entity mentions to DBpedia resources (where such exist).

Our motivation for hosting this challenge is the increased availability of third-party entity extraction and entity linking tools. Such tools have proven to be a good starting point for entity linking, even for Microposts. However, the evaluation results show that the NEEL task remains challenging when applied to social media content with its peculiarities, compared to standard-length text employing regular language.

As a result of this challenge, and the collaboration of annotators and participants, we also generated a manually annotated data set, which may be used in conjunction with the NEEL evaluation framework (neeleval). To the best of our knowledge this is the largest publicly available data set providing entity/resource annotations for Microposts. We hope that both the data set and the neeleval framework will facilitate the development of future approaches to this and other such tasks.

The results of this challenge highlighted the relevance of normalization and of time-dependent features (such as popularity) for dealing with this type of progressively changing content. They also indicated that learning entity extraction and linking as a joint task may be beneficial for boosting performance in entity linking in Microposts.

We aim to continue to host additional challenges targeting more complex tasks within the context of data mining of Microposts.

5. ACKNOWLEDGMENTS
The authors thank Nikolaos Aletras for helping to set up the CrowdFlower experiments for annotating the gold standard data set. A special thanks to Andrés García-Silva, Daniel Preoţiuc-Pietro, Ebrahim Bagheri, José M. Morales del Castillo, Irina Temnikova, Georgios Paltoglou, Pierpaolo Basile and Leon Derczynski, who took part in the annotation tasks. We also thank the participants who helped us improve the gold standard. We finally thank the LinkedTV project for supporting the challenge by sponsoring the prize for the winning submission.

6. REFERENCES
[1] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, and Z. Ives. DBpedia: A Nucleus for a Web of Open Data. In 6th International Semantic Web Conference (ISWC'07), 2007.
[2] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: A collaboratively created graph database for structuring human knowledge. In ACM SIGMOD International Conference on Management of Data (SIGMOD'08), 2008.
[3] A. E. Cano Basave, A. Varga, M. Rowe, M. Stankovic, and A.-S. Dadzie. Making Sense of Microposts (#MSM2013) Concept Extraction Challenge. In Making Sense of Microposts (#MSM2013) Concept Extraction Challenge, 2013.
[4] M.-W. Chang and W.-T. Yih. Dual coordinate descent algorithms for efficient large margin structured prediction. Transactions of the Association for Computational Linguistics, 2013.
[5] P. Ferragina and U. Scaiella. Fast and accurate annotation of short texts with Wikipedia pages. IEEE Software, 29(1):70–75, 2012.
[6] J. R. Finkel, T. Grenager, and C. Manning. Incorporating non-local information into information extraction systems by Gibbs sampling. In 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), 2005.
[7] J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.
[8] K. Gimpel, N. Schneider, B. O'Connor, D. Das, D. Mills, J. Eisenstein, M. Heilman, D. Yogatama, J. Flanigan, and N. A. Smith. Part-of-speech tagging for Twitter: Annotation, features, and experiments. In 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011.
[9] J. Hoffart, F. M. Suchanek, K. Berberich, E. Lewis-Kelham, G. de Melo, and G. Weikum. YAGO2: Exploring and querying world knowledge in time, space, context, and many languages. In 20th International Conference Companion on World Wide Web (WWW'11), 2011.
[10] C. Li, J. Weng, Q. He, Y. Yao, A. Datta, A. Sun, and B.-S. Lee. TwiNER: Named entity recognition in targeted Twitter stream. In 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'12), 2012.
[11] P. N. Mendes, M. Jakob, A. García-Silva, and C. Bizer. DBpedia Spotlight: Shedding light on the web of documents. In 7th International Conference on Semantic Systems (I-Semantics'11), 2011.
[12] G. A. Miller. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
[13] O. Owoputi, C. Dyer, K. Gimpel, N. Schneider, and N. A. Smith. Improved part-of-speech tagging for online conversational text with word clusters. In NAACL, 2013.
[14] A. Ritter, S. Clark, Mausam, and O. Etzioni. Named entity recognition in tweets: An experimental study. In EMNLP, 2011.
[15] G. Rizzo and R. Troncy. NERD: A framework for unifying named entity recognition and disambiguation extraction tools. In 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL'12), 2012.
[16] G. Rizzo, R. Troncy, S. Hellmann, and M. Bruemmer. NERD meets NIF: Lifting NLP extraction results to the Linked Data Cloud. In 5th International Workshop on Linked Data on the Web (LDOW'12), 2012.
[17] V. I. Spitkovsky and A. X. Chang. A cross-lingual dictionary for English Wikipedia concepts. In LREC, 2012.
[18] J. Turian, L. Ratinov, and Y. Bengio. Word representations: A simple and general method for semi-supervised learning. In 48th Annual Meeting of the Association for Computational Linguistics (ACL'10), 2010.
[19] K. Wang, C. Thrasher, E. Viegas, X. Li, and B.-J. P. Hsu. An overview of Microsoft Web N-gram corpus and applications. In NAACL HLT 2010 Demonstration Session (HLT-DEMO'10), 2010.
[20] Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Adapting boosting for information retrieval measures. Information Retrieval, 13(3):254–270, 2010.