Preface

#Microposts2015, the 5th Workshop on Making Sense of Microposts, was held in Florence, Italy, on the 18th of May 2015, during WWW'15, the 24th International Conference on the World Wide Web. The #Microposts journey started at the 8th Extended Semantic Web Conference (ESWC 2011, as #MSM, with the change in acronym from 2014), and moved to WWW in 2012, where it has stayed, for the fourth year now. #Microposts2015 continues to highlight the importance of the medium, as we see end users appropriating Microposts, small chunks of information published online with minimal effort, as part of daily communication and to interact with increasingly wider networks and new publishing arenas.

The #Microposts workshops are unique in that they solicit participation not just from Computer Science, but encourage interdisciplinary work. We welcome research that looks at computational analysis of Microposts, as well as studies that employ mixed methods, and also those that examine the humans generating and consuming Microposts and interacting with other users via this publishing venue. New to #Microposts2015 is a dedicated Social Sciences track, to encourage, in particular, contributions from the Social Sciences, and to harness the advantages that approaches to analysing Microposts from this perspective bring to the field.

The term Micropost now rarely needs definition. Microposts are here to stay, and have evolved from text only to include images and, now, audio and video. New platforms are developed each year to serve specific markets, and niche services compete with each other for a share of the audience. Twitter's Periscope is a new service similar to Meerkat, both of which use microblogging platforms to alert a network to a live video stream. Microposts now often serve also as a portal, and are harnessed by recommendation services, marketing and other enterprise to advertise or push information, products and services on other platforms. This is an unsurprising means to access potential users, who now exchange Microposts round the clock, using a variety of publishing platforms. Media trends show that users are doing so increasingly from personal, mobile devices, as a preferred or convenient option that started to overtake usage on PCs in 2014. To extend reach, in both developed and emerging markets, services for publishing Microposts from feature phones are being developed. These include the usual suspects, Twitter and Facebook, who employ native apps or the mobile web, and also newer entrants with dedicated services and apps, such as Saya. Country and language-specific platforms such as Sina Weibo, while not as widespread, serve a specific region and market, especially where any of a number of reasons prevent access to the more well-known microblogging platforms. Political movements such as the Arab Spring have been reported to have increased the use of social media services, and microblogging particularly, in the regions concerned, as the quick, low-cost means for sharing, in the moment, breaking news, local and context-specific information and personal stories resulted in an increased sense of community and solidarity. Interestingly, in response to emergencies, mass demonstrations and other social events such as festivals and conferences, when regular access to communication services is often interrupted and/or unreliable, developers are quick to offer alternatives that end users piggyback on to post information. Line was born to serve such a need, to provide an alternative communication service and support emergency response during a natural disaster in Japan in 2011. Its popularity continued beyond its initial purpose, and Line has grown into a popular (regional) microblogging service.

The #Microposts workshop was created to bring together researchers in different fields studying the publication, analysis and reuse of these very small chunks of information, shared in private, semi-public and fully open, social and formal networks. Microposts collectively make up a vast knowledge store, contained in what is today described as "big data": heterogeneous, increasing at phenomenal rates, and with multiple, unbridled authors, covering myriad topics with varying degrees of accuracy and veracity. With each year we have seen submissions tackling different aspects of Microposts, with new methods and techniques developed to analyse this valuable dataset and also its publishers, human or bot, and examining the different ways in which the medium is used. With the increase in the use of Microposts as a portal to other services, we saw, this year, studies on the detection and analysis of spam, and the use of open posting as a cover for disseminating extremist opinions or to swamp dissenting views. Reflecting the very social nature of the publishing platform, submissions also covered analysis of the human reaction to recent, provoking news events.

We thank all contributors and participants: each author's work adds to research that continues to advance the field. Submissions to the two research tracks came from institutions in ten countries around the world. The challenge also continues to see wide interest, with final submissions from academia and industry, across six countries. Our programme committee is even more varied, working in academia, independent research institutions and industry, and spanning an even larger number of countries. Most of our PC have reviewed for more than one #Microposts workshop, and a good percentage for all five. Very special thanks to our committee, without whom we would not be able to run the workshop; their dedication is seen in the feedback provided to us and to authors. Thanks also to the chairs of the Social Sciences Track and the NEEL Challenge, whose work has been invaluable in pulling the three parts together into a unified, successful workshop.

Matthew Rowe, Lancaster University, UK
Milan Stankovic, Sépage / Université Paris-Sorbonne, France
Aba-Sah Dadzie, KMi, The Open University, UK

#Microposts2015 Organising Committee, May 2015

#Microposts2015 · 5th Workshop on Making Sense of Microposts · @WWW2015

Introduction to the Proceedings

Main Track

The main workshop track attracted nine submissions, out of which two long papers and one short were accepted, in addition to an extended abstract and a poster. It should be noted that two of these crossed the boundary between Computer and Social Sciences, and were therefore assigned reviewers from both tracks. Topics covered ranged from machine learning and named entity recognition to Micropost classification and extraction. Applications were seen in topic, event and spam detection. We provide a brief introduction to each below.

De Boom, Van Canneyt & Dhoedt, in Semantics-driven Event Clustering in Twitter Feeds, present a novel perspective on event detection in tweets, by associating semantics to tweets and hashtags. They demonstrate how an approach that combines machine learning with explicit semantics detection can yield considerable improvement over state of the art event clustering approaches.

In the paper Making the Most of Tweet-Inherent Features for Social Spam Detection on Twitter, Wang, Zubiaga, Liakata & Procter investigate the use of a variety of feature sets and classifiers for the detection of social spam on Twitter. These include user features (social network properties of the tweeter, such as their in- and out-degrees); content features (number of hashtags and mentions); n-gram features (mined from textual aspects); and sentiment features, based on both manually and automatically created semantic lexicons. Classifiers tested include naïve Bayes, k-Nearest Neighbours, Support Vector Machines, Decision Trees and Random Forests. The paper presents an interesting investigation, classifying users as spammers (or not), as opposed to existing work which attempts to classify content as spam (or not).

In User Interest Modeling in Twitter with Named Entity Recognition, Karatay & Karagoz explore techniques for user profiling using Named Entity detection in tweets, a topic of increasing importance in the era of information overload, where filtering and personalising information is crucial for user engagement and experience. The in-depth view of appropriate techniques and issues related to Named Entity-based user profiling on Twitter will interest both academic and industrial audiences.

Within the broader area of spam, misconduct and automated accounts on Twitter, Edwards & Guy study the Connections between Twitter Spammer Categories. Unlike most other work in this area, they do not only distinguish spam from non-spam, but assume there are different types of spam accounts, which they categorise as "advertising", "explicit", "follower gain", "celebrity" and "bot". They show, in their extended abstract, that each type of spammer behaves differently with respect to establishing follower relations with other spam accounts. They also observe that genuine Twitter users can be found as followers of all types of spam accounts, but are more likely to connect with specific types of spammers.

Agarwal & Sureka, in A Topical Crawler for Uncovering Hidden Communities of Extremist Micro-Bloggers on Tumblr, discuss the use of microblogging systems such as Tumblr to promote extremism, taking advantage of the ability to post information anonymously. The poster paper describes a process that uses pre-identified keywords to flag relevant posts and, hence, identify suspect tags in textual posts. A random walk from a seed blogger is then used to identify further individuals and communities promoting extremism. The authors report misclassification of 13% and accuracy of 77% for predicting "hate promoting bloggers", with misclassification of unknown bloggers at 34%.

Social Sciences Track

The Social Sciences track attracted three submissions, of which two were accepted. In addition to data mining and/or statistical analysis over the very large amounts of data involved, each submission carried out in-depth, qualitative analysis to tease out nuanced information that is more difficult to identify with automated methods. The track was chaired by Katrin Weller and Danica Radovanović.

One of the major contemporary events that spiked user engagement on social media during the first months of 2015 was the Charlie Hebdo shooting in France on January 7th. Giglietto & Lee provide one of the first studies of Twitter users' reactions to this event, in To Be or Not to Be Charlie: Twitter Hashtags as a Discourse and Counter-discourse in the Aftermath of the 2015 Charlie Hebdo Shooting in France. In particular, they study the use of the hashtag #JeNeSuisPasCharlie, which was used in contrast to the initial #JeSuisCharlie hashtag. Using different approaches to data analysis (including activity patterns and word frequencies), the authors demonstrate how tweets including #JeNeSuisPasCharlie rather resemble crisis communication patterns, and at the same time support different expressions of self-identity such as grief and resistance.

Coelho, Lapa, Ramos & Malini, in A Research Design for the Analysis of Contemporary Social Movements, present a research method to identify elements that promote social empowerment in the political vitality present in digital culture. They developed a model of investigation that allows discursive analysis of posts generated within net activist groups. Methods, instruments and resources were created and articulated for the collection and treatment of big data and for further qualitative analysis of content. In addition to contributing to ICT, by proposing a qualitative investigation of social networks, this research design contributes to the field of Education, as the results of its application can be used to develop guidelines for teachers, to support critical appropriation and education of social networks.

Named Entity rEcognition & Linking (NEEL) Challenge

The #Microposts2015 NEEL challenge again increased in complexity, to address further challenges encountered in the analysis of Micropost data. This year's challenge required participants to recognise entities and their types, and also link them, where found, to corresponding DBpedia resources.

The challenge attracted good interest from the community, with 29 intents to submit, out of which 21 applied for the final evaluation. Seven took part in the quantitative evaluation and six completed submission (including a written abstract). Of these, three were accepted for presentation and a further three as posters. All accepted submissions also took part in the workshop's poster session, whose aim is to exhibit practical application in the field and foster further discussion about the ways in which knowledge content is extracted from Microposts and reused.

The NEEL challenge was chaired by A. Elizabeth Cano and Giuseppe Rizzo, with Andrea Varga and Bianca Pereira as dataset chairs. As in previous years, the challenge committee prepared a gold standard from the challenge corpus, which covered events in 2011, 2013 and 2014, for example the London Riots, the Oslo bombing and the UCI Cyclo-cross World Cup. Changes to the submission and evaluation protocols included wrapping submissions as a publicly accessible, REST-based service. Up to ten runs were allowed per submission, of which the best three were used in computing the final rankings, using three weighted metrics, tagging (0.3), linking (0.3) and clustering (0.4), with a fourth, latency (computation time), used to sort in case of a tie.

We provide here a brief introduction to participants' abstracts describing their submissions, and more detail about the preparation and evaluation processes in the challenge summary paper included in the proceedings.

Yamada, Takeda & Takefuji, in An End-to-End Entity Linking Approach for Tweets, present a five stage approach: (1) preprocessing, (2) candidate mention generation, (3) mention detection and disambiguation, (4) NIL mention detection and (5) type prediction. In preprocessing, they utilise tokenisation and POS tagging based on state of the art algorithms, along with extraction of tweet timestamps. Yamada et al. tackle candidate mention generation and disambiguation using fuzzy search of Wikipedia for candidate entity mentions, and popularity of Wikipedia pages for ranking the set of candidate entities. Finally, they tackle selection of NIL mentions and entity typing as supervised learning problems.

In Entity Recognition and Linking on Tweets with Random Walks, Guo & Barbosa present a sequential approach to the NEEL task by, first, recognising entities using TwitIE, and then linking them to corresponding DBpedia entities. Starting from the (DBpedia) candidate entities, Guo & Barbosa build a subgraph by adding all adjacent entities to the candidates. They execute a personalised PageRank, giving more importance to unambiguous entities. They then measure semantic relatedness between entity candidates and the "unambiguous" entities for the "document", and employ threshold and name similarity for NIL prediction and clustering.

In the submission Combining Multiple Signals for Semanticizing Tweets: University of Amsterdam at #Microposts2015, Gârbacea, Odijk, Graus, Sijaranamual & de Rijke employ a sequential approach composed of four stages: (1) candidate mention detection, (2) candidate typing and linking, (3) NIL clustering and (4) overlap resolution. The first stage is tackled with an annotation-based process that takes as input the lexical content of Wikipedia and an NER classifier trained using the challenge dataset. To resolve candidate mention overlaps, the authors propose an algorithm based on the results of the linking stage and the Viterbi path resolution output. A "learning to rank" supervised model is used to select the most representative DBpedia reference entity and, therefore, the type of each candidate mention, normalising the type via manual alignment between the DBpedia ontology and the NEEL taxonomy. Finally, Gârbacea et al. resolve NIL mentions using a clustering algorithm operating on the lexical similarity of the candidate mentions for which no counterparts are found in DBpedia.

Basile, Caputo & Semeraro, in UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Linking Entities in Tweets, introduce an unsupervised approach which uses a modified version of their Lesk algorithm. Basile et al. use similarity of "distributional semantic spaces" for disambiguation, and two alternative, state of the art approaches for the candidate identification phase, based on either POS tagging or n-gram similarity. Entities are typed through inheritance of the type of the DBpedia reference entity pointed to, which is in turn manually aligned to the NEEL taxonomy.

In AMRITA - CEN@NEEL: Identification and Linking of Twitter Entities, Barathi Ganesh, Abinaya, Anand Kumar, Soman & Vinaykumar address the NEEL task sequentially by, first, tokenising and tagging the tweets using TwitIE. They then classify entity mentions by applying supervised learning using direct features (POS tags) and indirect features (the two words before and after a candidate mention entity). Using a total of 34 lexical features, the authors experiment with three supervised learning algorithms to determine the recognition configuration that would achieve the best performance in the development test. Barathi Ganesh et al. tackle the linking task by looking up DBpedia reference entries; the entry that maximises the similarity score between related entries and the named entities is designated the representative. Named entities without related links are assigned as NIL.

Finally, Sinha & Barik, in Named Entity Extraction and Linking in #Microposts, present a sequential approach to the NEEL task which recognises entities and then links them. The first stage is grounded on linguistic clues extracted from conventional approaches such as POS tagging, word capitalisation and hashtags in the tweet. They then train a CRF with the linguistic features and the contextual similarity of adjacent tokens, with the token window set to 5. Sinha & Barik perform the linking task using an entity resolution mechanism that takes as input the output of the NER stage and that of DBpedia Spotlight. For each entity returned by DBpedia Spotlight that is a substring of any of the entities extracted in the NER stage, the corresponding URI is returned and assigned to it. Otherwise the entity is assigned as NIL.

Workshop Awards

Main Track. The #Microposts2015 best paper award went to:

Cedric De Boom, Steven Van Canneyt & Bart Dhoedt
for their submission entitled:
Semantics-driven Event Clustering in Twitter Feeds

Social Sciences Track. GESIS [1], the Leibniz Institute for the Social Sciences, sponsored the best paper award for the Social Sciences track. We teamed up with GESIS, the largest service and infrastructure institution for the Social Sciences in Germany, to highlight the role of interdisciplinary approaches in obtaining a better understanding of the users behind social media and Microposts. As in the main track, the decision was guided by nominations from the reviewers and review scores. The #Microposts2015 Social Sciences Track best paper award went to:

Fabio Giglietto & Yenn Lee
for their submission entitled:
To Be or Not to Be Charlie: Twitter Hashtags as a Discourse and Counter-discourse in the Aftermath of the 2015 Charlie Hebdo Shooting in France

NEEL Challenge. SpazioDati [2], an Italian startup that took part in the #Microposts2014 NEEL challenge, sponsored the award for the best submission. SpazioDati aim to provide access to a single source of common-sense knowledge, mined and synthesised from a large number of open and closed data sources. By sponsoring the challenge, SpazioDati reinforce the value in the content of the increasingly large knowledge source that is Micropost data. The challenge award was also determined by the results of the quantitative evaluation. The #Microposts NEEL Challenge award went to:

Ikuya Yamada, Hideaki Takeda & Yoshiyasu Takefuji
for their submission entitled:
An End-to-End Entity Linking Approach for Tweets

Additional Material

The call for participation and all paper, poster and challenge abstracts are available on the #Microposts2015 website [3]. The full proceedings are also available on the CEUR-WS server, as Vol-1395 [4]. The gold standard for the NEEL Challenge is available for download [5].

The proceedings for #Microposts2014 are available as Vol-1141 [6]. The proceedings for the #MSM2013 main track are available as part of the WWW'13 Proceedings Companion [7]. The #MSM2013 Concept Extraction Challenge proceedings are published as a separate volume as CEUR Vol-1019 [8], and the gold standard is available for download [9]. The proceedings for #MSM2012 and #MSM2011 are available as CEUR Vol-838 [10] and CEUR Vol-718 [11], respectively.

[1] http://www.gesis.org
[2] http://spaziodati.eu
[3] http://www.scc.lancs.ac.uk/microposts2015
[4] #Microposts2015 Proc. http://ceur-ws.org/Vol-1395
[5] http://ceur-ws.org/Vol-1395/microposts2015_neel-challenge-report/microposts2015-neel_challenge_gs.zip
[6] #Microposts2014 Proc. http://ceur-ws.org/Vol-1141
[7] WWW'13 Companion: http://dl.acm.org/citation.cfm?id=2487788
[8] #MSM2013 CE Challenge Proc. http://ceur-ws.org/Vol-1019
[9] http://ceur-ws.org/Vol-1019/msm2013-ce_challenge_gs.zip
[10] #MSM2012 Proc. http://ceur-ws.org/Vol-838
[11] #MSM2011 Proc. http://ceur-ws.org/Vol-718

Main Track Programme Committee

Pierpaolo Basile, University of Bari, Italy
Julie Birkholz, CHEGG, Ghent University, Belgium
John Breslin, National University of Ireland Galway, Ireland
A. Elizabeth Cano, KMi, The Open University, UK
Marco A. Casanova, Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Óscar Corcho, Universidad Politécnica de Madrid, Spain
Guillaume Erétéo, Vigiglobe, France
Miriam Fernandez, KMi, The Open University, UK
Andrés Garcia-Silva, Universidad Politécnica de Madrid, Spain
Anna Lisa Gentile, The University of Sheffield, UK
Jelena Jovanovic, University of Belgrade, Serbia
Mathieu Lacage, Alcméon, France
Philipe Laublet, Université Paris-Sorbonne, France
José M. Morales del Castillo, El Colegio de México, Mexico
Fabrizio Orlandi, University of Bonn, Germany
Bernardo Pereira Nunes, Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Danica Radovanović, University of Belgrade, Serbia
Giuseppe Rizzo, Eurecom, France
Harald Sack, HPI, University of Potsdam, Germany
Bernhard Schandl, mySugr GmbH, Austria
Sean W. M. Siqueira, Universidade Federal do Estado do Rio de Janeiro, Brazil
Victoria Uren, Aston Business School, UK
Andrea Varga, Swiss Re, UK
Katrin Weller, GESIS Leibniz Institute for the Social Sciences, Germany
Alistair Willis, The Open University, UK
Ziqi Zhang, The University of Sheffield, UK

Sub Reviewers

Tamara Bobic, HPI, University of Potsdam, Germany

Social Sciences Track Programme Committee

Gholam R. Amin, Sultan Qaboos University, Oman
Julie Birkholz, CHEGG, Ghent University, Belgium
Tim Davies, University of Southampton, UK
Munmun De Choudhury, Georgia Tech, USA
Ali Emrouznejad, Aston Business School, UK
Fabio Giglietto, Università di Urbino Carlo Bo, Italy
Simon Hegelich, Universität Siegen, Germany
Kim Holmberg, University of Turku, Finland
Athina Karatzogianni, University of Leicester, UK
José M. Morales del Castillo, El Colegio de México, Mexico
Raquel Recuero, Universidade Católica de Pelotas, Brazil
Bianca C. Reisdorf, University of Leicester, UK
Luca Rossi, Università di Urbino Carlo Bo, Italy
Saskia Vanmanen, The Open University, UK
Alistair Willis, The Open University, UK
Taha Yasseri, University of Oxford, UK
Victoria Uren, Aston Business School, UK

NEEL Challenge Evaluation Committee

Gabriele Antonelli, SpazioDati, Italy
Ebrahim Bagheri, Ryerson University, Canada
Pierpaolo Basile, University of Bari, Italy
Grégoire Burel, KMi, The Open University, UK
Óscar Corcho, Universidad Politécnica de Madrid, Spain
Leon Derczynski, The University of Sheffield, UK
Milan Dojchinovski, Czech Technical University in Prague, Czech Republic
Guillaume Erétéo, Vigiglobe, France
Andrés Garcia-Silva, Universidad Politécnica de Madrid, Spain
Anna Lisa Gentile, The University of Sheffield, UK
Miguel Martinez-Alvarez, Signal, UK
José M. Morales del Castillo, El Colegio de México, Mexico
Bernardo Pereira Nunes, Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Daniel Preoţiuc-Pietro, University of Pennsylvania, USA
Giles Reger, Otus Labs, UK
Irina Temnikova, Qatar Computing Research Institute, Qatar
Victoria Uren, Aston Business School, UK
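
Illustrative note on the NEEL ranking scheme. The Introduction to the Proceedings states that challenge runs were ranked using weighted metrics, tagging (0.3), linking (0.3) and clustering (0.4), with latency used to sort in case of a tie. The following is a minimal sketch of how such a ranking could be computed, not the challenge committee's evaluation code. It assumes that each metric is a value in [0, 1], that the combined score is a plain weighted sum, and that lower latency wins a tie. The names Run, weighted_score and rank_runs are hypothetical, as are the example values. The selection of the best three of up to ten runs per submission is not modelled.

```python
from dataclasses import dataclass

# Weights as stated in the challenge description.
WEIGHTS = {"tagging": 0.3, "linking": 0.3, "clustering": 0.4}


@dataclass(frozen=True)
class Run:
    """One evaluated run (hypothetical structure, for illustration only)."""
    team: str
    tagging: float     # assumed to lie in [0, 1]
    linking: float     # assumed to lie in [0, 1]
    clustering: float  # assumed to lie in [0, 1]
    latency: float     # computation time in seconds; assumed lower is better


def weighted_score(run: Run) -> float:
    """Weighted sum of the three quality metrics."""
    return (WEIGHTS["tagging"] * run.tagging
            + WEIGHTS["linking"] * run.linking
            + WEIGHTS["clustering"] * run.clustering)


def rank_runs(runs):
    """Sort by descending score; break ties by ascending latency.

    Scores are rounded so that floating-point noise does not hide a tie.
    """
    return sorted(runs, key=lambda r: (-round(weighted_score(r), 6), r.latency))


# Made-up values: runs "a" and "b" tie on score, so latency decides.
example_runs = [
    Run("a", tagging=0.8, linking=0.7, clustering=0.6, latency=2.0),
    Run("b", tagging=0.6, linking=0.7, clustering=0.75, latency=1.0),
    Run("c", tagging=0.9, linking=0.9, clustering=0.9, latency=5.0),
]
ranking = [r.team for r in rank_runs(example_runs)]
```

Keeping latency out of the weighted sum and using it only as a secondary sort key follows the wording "to sort in case of a tie"; speed affects the ranking only when two runs are otherwise indistinguishable.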