Preface

The 4th Workshop on Making Sense of Microposts (#Microposts2014) was held in Seoul, Korea, on the 7th of April 2014, during the 23rd International Conference on the World Wide Web (WWW’14). #Microposts2014 sees a change in the workshop acronym from #MSM, to highlight our focus on Microposts – small chunks of information published online with minimal effort, via a variety of platforms and devices. The #Microposts journey started in 2011 at the 8th Extended Semantic Web Conference (ESWC 2011), then moved to WWW in 2012, where it has stayed, for the third year now.

The #Microposts series of workshops is unique in targeting researchers from a range of fields spanning both Computer Science and the Social Sciences. The aim is to harness the benefits different fields bring to research involving Microposts, and to maintain a focus on the end user and their interaction with other users and the physical and online worlds – the community who collectively publish this rich, varied information.

Microblogging platforms and other restricted size, text only and multi-media capable, instant communication tools are now so commonplace that applications are constantly being developed to enable their use not only on the desktop, but also on the go, from ordinary and smart phones, tablets and even public kiosks. While the well-known microblogging platforms – Twitter, Facebook, MySpace, Google+, Tumblr, Foursquare, Instagram and Pinterest, among others – cover a large portion of the online user base, other country and/or language-specific platforms such as Sina Weibo, and mobile-based messaging tools such as WhatsApp, are increasingly being used to share Micropost-type information. This medium of communication is no longer patronised predominantly by individual users sharing information informally within private networks and also with the wider public, but is used as a mouthpiece by enterprise organisations and public bodies, to foster a feeling of more personal interaction with consumers and the wider, participating public. Disaster and emergency response and management, and political upheaval and crises, are two areas where social media has been shown to be particularly powerful for disseminating critical information to the individuals involved, and for broadcasting events as they occur to the outside world. Citizen reporters and scientists are now commonly accepted, or even anticipated, by organisations that traditionally relied mainly on trained experts. Microblogging and other social media platform usage is seen across all walks of life, from opinion mining and feedback solicitation for public consultations, to election campaigns and classroom participation.

With increasingly lower cost methods for publishing Microposts (often via mobile devices), and widespread use of informal and abbreviated language, the sheer scale and heterogeneity of Micropost data presents challenges for analysis, knowledge extraction and aggregation, further dissemination and reuse in any of a range of applications. At the same time, today’s end user, understandably, has very high expectations for intuitive, minimal effort applications for tailored search and information retrieval across myriad, interconnected devices, customised to their current context – situation, location and proximity of others within their social and other networks, and influenced by unknown users with similar interests.

The #Microposts workshop was created to bring together researchers exploring novel methods for analysing Microposts, and for reusing the resulting collective knowledge extracted from such posts, both online and in the physical world. With each year we have seen novel, leading edge approaches to exploring this now ubiquitous, but still very valued, means of communication and the knowledge it generates. We are able to report wide interest in the workshop, with a good number of submissions from a range of fields in and across disciplines, mainly from Computer Science and the Social Sciences. Along with reports of applications in different domains, our contributors and audience have re-confirmed each year the importance of Microposts to the ordinary end user and, increasingly, public organisations and industry.

Many hearty thanks to all our contributors and participants. Submissions came from institutions all over the world – the main track saw authors from institutions across 11 different countries, and the challenge from 7. Interestingly, while challenges are often more popular with students, half the challenge submissions included authors from research institutions, including Microsoft Research, the Max Planck Institute, CNRS (France) and SAP Research.

Our Programme Committee are even more varied, coming from universities and research institutions around the world, as well as from industry, more than half of whom have reviewed for each of our four workshops. A very special thanks goes to each of them; their valued feedback resulted in a rich collection of papers and posters, each of which adds to the state of the art in leading edge research. We are confident that the #Microposts series of workshops will continue to foster a vibrant community, as we continue to work with the rich body of knowledge generated by the many and varied end users whose social and working lives span the physical and online worlds.

Matthew Rowe, University of Lancaster, UK
Milan Stankovic, Sépage / Université Paris-Sorbonne, France
Aba-Sah Dadzie, The University of Birmingham, UK

#Microposts2014 Organising Committee, April 2014

i · #Microposts2014 · 4th Workshop on Making Sense of Microposts · @WWW2014

Introduction to the Proceedings

The main workshop track attracted 12 submissions, 6 of which, all long papers, were accepted, along with a poster. These covered topics from machine learning, on Micropost classification and extraction, to data mining and analysis, and sentic and sentiment analysis. Applications were seen in incident and emergency response and management, and topic and opinion mining. We provide a brief introduction to these below.

The proceedings include the abstract of the keynote, ‘Computational Social Science and Microblogs – The Good, the Bad and the Ugly’, presented by Markus Strohmaier of the Dept. of Computer Science, University of Koblenz-Landau, Germany.

Main Track Presentations

Micropost Mining and Analysis

Panisson et al., in their paper Mining Concurrent Topical Activity in Microblog Streams, present a novel approach to topic mining from Twitter streams, in the context of recreating event timelines. Their evaluation, performed on a dataset sampled from the London 2012 Summer Olympics, shows a high degree of matching between the inferred timeline and the actual Olympics schedule.

Prapula G et al. introduce the notion of episode in the extraction of events from tweets, in TEA: Episode Analytics on Short Messages. Detection of episodes – significant moments when a particular entity gets traction on Twitter – constitutes the basis for the application scenarios they present. Using data visualisation and social media monitoring, the approach is evaluated on selected famous personalities and entities, including sports and brands.

The paper Sentic API: A Common-Sense Based API for Concept-Level Sentiment Analysis, by Cambria et al., presents Sentic API, which makes use of a “bag-of-concepts” model, based on ontologies and semantic networks, and a “common-sense” knowledge base, with a combination of techniques: CF-IOF, the AffectiveSpace vector space and the “Hourglass of Emotions”, to improve on automatic extraction of semantics, sentics and sentiment from text. The authors conclude with a description of the application of Sentic API for opinion mining and sentiment analysis of patient opinion about the UK National Health Service, captured using a microblog-type feedback service.

The poster paper Sentiment Analysis of Wimbledon Tweets, by Sinha et al., introduces novel ideas that may inspire future work on sentiment analysis on Twitter. The poster focuses on televised events where parallel annotation of video content and Twitter streams may give novel insight into the understanding of the emotional content of the events.

Micropost Classification and Extraction

Evaluating Multi-label Classification of Incident-related Tweets, by A. Schulz et al., addresses the problem of assigning multiple labels to tweets, where such tweets are related to incidents that have occurred. The approach uses dependencies between labels to boost the performance of a multi-label classifier trained on specific label sequences. Schulz et al. demonstrate a good level of performance using this approach, tested on identification and classification of data concerning incidents and emergencies, with an exact-match percentage of 84.35%.

In Combining Named Entity Recognition Methods for Concept Extraction in Microposts, Dlugolinsky et al. present an approach for combining multiple named entity recognisers. The authors demonstrate the improved performance that can be achieved, in particular in relation to recall, when using multiple, combined recognisers. We are pleased to report that Dlugolinsky et al. make use of the #MSM2013 Concept Extraction Challenge data [1], and reference their own and other contributions to the challenge.

Bellaachia & Al-Dhelaan, in HG-RANK: A Hypergraph-based Keyphrase Extraction for Short Documents in Dynamic Genre, propose an approach for extracting keyphrases from Microposts, by modeling the information as a hypergraph. The authors use a random walk approach to rank key phrases, and using the Opinosis dataset, containing Micropost-length product reviews, demonstrate the superiority of their approach with regard to state of the art baselines.

Named Entity Extraction & Linking (NEEL) Challenge

The workshop has over the years highlighted novel research directions while improving the analysis and reuse of Microposts using approaches in Information Extraction, Data Mining, Information Visualisation, Social Studies and other relevant areas. Each of these tackles these challenges from different perspectives, using a variety of state of the art and novel techniques. At the same time, the contributions to Making Sense of Microposts highlight the challenges still faced in research and applications using Micropost data. To respond to this challenge, #Microposts2014 hosted a Named Entity Extraction & Linking (NEEL) Challenge. While this and the first (Concept Extraction) challenge, in #MSM2013, directly targeted only a sub-set of the Microposts and Social Web community, the dataset in each may be reused for other purposes, beyond information extraction and data mining. We aim to extend the challenge in the future to widen inclusion.

The #Microposts2014 NEEL challenge attracted good interest from the community, with 43 intents to submit, out of which 24 applied for a copy of the dataset, and 8 completed submission. Of these, 4 were accepted, and a further 2 as posters. All challenge submissions also took part in the workshop’s poster session, whose aim is to exhibit practical application in the field, and foster further discussion about the ways in which knowledge content is extracted from Microposts and reused.

The NEEL challenge was chaired by A. Elizabeth Cano and Giuseppe Rizzo, with Andrea Varga as dataset chair. Many thanks to those who helped with the annotation of the training dataset – we name these contributors in the challenge summary paper.

We provide a brief introduction to the challenge submissions here, and more detail about the evaluation process in the challenge summary paper included in the proceedings.

Chang et al., who submitted the run with the highest F1 score, in E2E: An End-to-end Entity Linking System for Short and Noisy Text, present a novel approach to the NEEL task. They jointly optimised the recognition and disambiguation tasks. Based on the local and global contexts of an entity mention they generated a set of surface candidates using normalised entity lexicons, and applied overlap resolution techniques to recognise and disambiguate entity mentions.

Habib et al., in Named Entity Extraction and Linking Challenge: University of Twente at #Microposts2014, present a sequential approach to the NEEL task by first extracting entities and then disambiguating them into a DBpedia link. They make use of state of the art features for identifying named entity (NE) candidates, including the use of Tweet segments and regular expressions. Habib et al. followed an NE dictionary approach for matching for the named entity linking step.

In the submission DataTXT at #Microposts2014 Challenge, Scaiella et al. propose an approach which builds on their TAGME system. They train their disambiguation algorithm with the NEEL challenge dataset. The approach in Scaiella et al. relies strongly on Wikipedia features for extraction and disambiguation. Their final entity linking step integrates Wikipedia categories and DBpedia RDF types as features for deploying a C4.5 classifier. DataTXT assigns a confidence score to each entity annotation, and discards those that fall below a specified threshold.

Yosef et al., in the submission Adapting AIDA, extend an existing tool for entity disambiguation, AIDA. AIDA extracts entity mentions from natural language text and maps these mentions to canonical entities appearing in YAGO. To cater for Micropost content they normalise abbreviations appearing as entity mentions and support entity mentions appearing as usernames and/or hashtags. Yosef et al. employ a graph-based approach for linking entities, which uses different similarity measures for weighting mention-entity edges.

Bansal et al., in Linking Entities in #Microposts, present a sequential approach to NEEL (comprising NEE + NEL). They make use of existing off-the-shelf tools for Named Entity Extraction (NEE), and also introduce novel features for entity linking. The latter rely on the recent popularity of an entity mention on Twitter. Along with other state-of-the-art features based on Wikipedia, they applied a LambdaMART approach for the final entity disambiguation and linking step.

Finally, in Part-of-Speech is (almost) enough: SAP Research & Innovation at the #Microposts2014 NEEL Challenge, Dahlmeier et al. present a sequential approach which makes use of off-the-shelf tools for both NEE and NEL. They extended these toolkits using gazetteers, and employed a series of heuristics for improving the disambiguation and linking steps.

[1] The proceedings of the 2013 ‘Making Sense of Microposts’ (#MSM2013) Concept Extraction Challenge are available at: http://ceur-ws.org/Vol-1019

Workshop Awards

Yorkshire Tea [2], manufactured by Taylor’s of Harrogate, sponsored the best paper awards. Nominations were sought from the reviewers, and a final decision agreed by the Chairs, based on the nominations and review scores. The #Microposts2014 best paper award went to:

André Panisson, Laetitia Gauvin, Marco Quaggiotto & Ciro Cattuto
for their submission entitled:
Mining Concurrent Topical Activity in Microblog Streams

LinkedTV [3] sponsored the NEEL Challenge award, an iPad, for the best submission. The challenge award was also determined by the results of the quantitative evaluation. The #Microposts NEEL Challenge award went to:

Ming-Wei Chang, Bo-June Hsu, Hao Ma, Ricky Loynd & Kuansan Wang
for their submission entitled:
E2E: An End-to-end Entity Linking System for Short and Noisy Text

Additional Material

The call for participation and all paper, poster and challenge abstracts are available on the #Microposts2014 website [4]. The full proceedings are also available on the CEUR-WS server, as Vol-1141 [5]. The gold standard for the NEEL Challenge is available for download [6].

The proceedings for the #MSM2013 main track are available as part of the WWW’13 Proceedings Companion [7]. The #MSM2013 Concept Extraction Challenge proceedings are published as a separate volume as CEUR Vol-1019 [8], and the gold standard is available for download [9]. The proceedings for #MSM2012 and #MSM2011 are available as CEUR Vol-838 [10] and CEUR Vol-718 [11], respectively.

[2] http://www.yorkshiretea.co.uk
[3] http://www.linkedtv.eu
[4] http://www.scc.lancs.ac.uk/microposts2014
[5] #Microposts2014 Proc.: http://ceur-ws.org/Vol-1141
[6] http://ceur-ws.org/Vol-1141/microposts2014-neel_challenge_gs.zip
[7] WWW’13 Companion: http://dl.acm.org/citation.cfm?id=2487788
[8] #MSM2013 CE Challenge Proc.: http://ceur-ws.org/Vol-1019
[9] http://ceur-ws.org/Vol-1019/msm2013-ce_challenge_gs.zip
[10] #MSM2012 Proc.: http://ceur-ws.org/Vol-838
[11] #MSM2011 Proc.: http://ceur-ws.org/Vol-718

Main Track Programme Committee

Gholam R. Amin, Sultan Qaboos University, Oman
Pierpaolo Basile, University of Bari, Italy
Julie Birkholz, Vrije University, The Netherlands
Uldis Bojars, SIOC Project, Latvia
John Breslin, NUIG, Ireland
A. Elizabeth Cano, KMi, The Open University, UK
Marco A. Casanova, Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Óscar Corcho, Universidad Politécnica de Madrid, Spain
Ali Emrouznejad, Aston Business School, UK
Guillaume Erétéo, Orange Labs, France
Miriam Fernandez, KMi, The Open University, UK
Fabien Gandon, INRIA, Sophia-Antipolis, France
Andrés Garcia-Silva, Universidad Politécnica de Madrid, Spain
Anna Lisa Gentile, The University of Sheffield, UK
Robert Jäschke, University of Kassel, Germany
Jelena Jovanovic, University of Belgrade, Serbia
Mathieu Lacage, Alcméon, France
Vita Lanfranchi, The University of Sheffield, UK
Philipe Laublet, Université Paris-Sorbonne, France
João Magalhães, Universidade Nova de Lisboa, Portugal
Diana Maynard, The University of Sheffield, UK
José M. Morales del Castillo, El Colegio de México, Mexico
Fabrizio Orlandi, DERI, Ireland
Alexandre Passant, seevl.net / MDG Web, Ireland
Bernardo Pereira Nunes, Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Danica Radovanovic, University of Belgrade, Serbia
Yves Raimond, BBC, UK
Giuseppe Rizzo, Università di Torino, Italy
Harald Sack, University of Potsdam, Germany
Bernhard Schandl, Gnowsis.com, Austria
Sean W. M. Siqueira, Universidade Federal do Estado do Rio de Janeiro, Brazil
Victoria Uren, Aston Business School, UK
Andrea Varga, The University of Sheffield, UK
Shenghui Wang, Vrije University, The Netherlands
Katrin Weller, GESIS Leibniz Institute for the Social Sciences, Germany
Ziqi Zhang, The University of Sheffield, UK

Sub Reviewers

Pavan Kapanipathi, Knoesis Center, Wright State University, USA
Flavio Martins, Universidade Nova de Lisboa, Portugal
Filipa Peleja, Universidade Nova de Lisboa, Portugal
Víctor Rodríguez Doncel, Universidad Politécnica de Madrid, Spain
Nadine Steinmetz, Hasso-Plattner Institute, Germany

NEEL Challenge Evaluation Committee

Ebrahim Bagheri, Ryerson University, Canada
Pierpaolo Basile, University of Bari, Italy
Óscar Corcho, Universidad Politécnica de Madrid, Spain
Leon Derczynski, The University of Sheffield, UK
Guillaume Erétéo, Orange Labs, France
Miriam Fernandez, KMi, The Open University, UK
Andrés Garcia-Silva, Universidad Politécnica de Madrid, Spain
Anna Lisa Gentile, The University of Sheffield, UK
Robert Jäschke, University of Kassel, Germany
Diana Maynard, The University of Sheffield, UK
José M. Morales del Castillo, El Colegio de México, Mexico
Georgios Paltoglou, University of Wolverhampton, UK
Bernardo Pereira Nunes, Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Daniel Preoţiuc-Pietro, The University of Sheffield, UK
Irina Temnikova, Bulgarian Academy of Sciences, Bulgaria
Raphaël Troncy, Eurecom, France
Victoria Uren, Aston Business School, UK