Preface


#Microposts2015, the 5th Workshop on Making Sense of Microposts, was held in Florence, Italy, on the 18th of May 2015, during the 24th International Conference on the World Wide Web (WWW’15). The #Microposts journey started at the 8th Extended Semantic Web Conference (ESWC 2011), as #MSM (the acronym changed in 2014), and moved to WWW in 2012, where it has stayed, now for the fourth year. #Microposts2015 continues to highlight the importance of the medium, as we see end users appropriating Microposts, small chunks of information published online with minimal effort, as part of daily communication and to interact with increasingly wider networks and new publishing arenas.

The #Microposts workshops are unique in that they solicit participation not just from Computer Science, but also encourage interdisciplinary work. We welcome research that looks at computational analysis of Microposts, as well as studies that employ mixed methods, and also those that examine the humans generating and consuming Microposts and interacting with other users via this publishing venue. New to #Microposts2015 is a dedicated Social Sciences track, intended particularly to encourage contributions from the Social Sciences and to harness the advantages that analysing Microposts from this perspective brings to the field.

The term Micropost now rarely needs definition. Microposts are here to stay, and have evolved from text only to include images and, now, audio and video. New platforms are developed each year to serve specific markets, and niche services compete with each other for a share of the audience. Twitter’s Periscope is a new service similar to Meerkat; both use microblogging platforms to alert a network to a live video stream. Microposts now also often serve as a portal, and are harnessed by recommendation services, marketing and other enterprises to advertise or push information, products and services on other platforms. This is an unsurprising means of reaching potential users, who now exchange Microposts round the clock, using a variety of publishing platforms. Media trends show that users are doing so increasingly from personal, mobile devices, a preferred or convenient option whose usage started to overtake that of PCs in 2014. To extend reach, in both developed and emerging markets, services for publishing Microposts from feature phones are being developed. These include the usual suspects, Twitter and Facebook, which employ native apps or the mobile web, and also newer entrants with dedicated services and apps, such as Saya. Country- and language-specific platforms such as Sina Weibo, while not as widespread, serve a specific region and market, especially where any of a number of reasons prevent access to the better-known microblogging platforms. Political movements such as the Arab Spring have been reported to have increased the use of social media services and microblogging, particularly in the regions concerned, as a quick, low-cost means of sharing, in the moment, breaking news, local and context-specific information and personal stories, resulting in an increased sense of community and solidarity. Interestingly, in response to emergencies, mass demonstrations and other social events such as festivals and conferences, when regular access to communication services is often interrupted and/or unreliable, developers are quick to offer alternatives that end users piggyback on to post information. Line was born to serve such a need: to provide an alternative communication service and support emergency response during a natural disaster in Japan in 2011. Its popularity continued beyond its initial purpose, and Line has grown into a popular (regional) microblogging service.

The #Microposts workshop was created to bring together researchers in different fields studying the publication, analysis and reuse of these very small chunks of information, shared in private, semi-public and fully open, social and formal networks. Microposts collectively make up a vast knowledge store, contained in what is today described as “big data”: heterogeneous, increasing at phenomenal rates, and with multiple, unbridled authors, covering myriad topics with varying degrees of accuracy and veracity. Each year we have seen submissions tackling different aspects of Microposts, with new methods and techniques developed to analyse this valuable dataset and also its publishers, human or bot, and to examine the different ways in which the medium is used. With the increase in the use of Microposts as a portal to other services, we saw, this year, studies on the detection and analysis of spam, and on the use of open posting as a cover for disseminating extremist opinions or swamping dissenting views. Reflecting the very social nature of the publishing platform, submissions also covered analysis of the human reaction to recent, provocative news events.

We thank all contributors and participants: each author’s work adds to research that continues to advance the field. Submissions to the two research tracks came from institutions in ten countries around the world. The challenge also continues to see wide interest, with final submissions from academia and industry across six countries. Our programme committee is even more varied, working in academia, independent research institutions and industry, and spanning an even larger number of countries. Most of our PC members have reviewed for more than one #Microposts workshop, and a good percentage for all five. Very special thanks to our committee, without whom we would not be able to run the workshop; their dedication is seen in the feedback provided to us and to authors. Thanks also to the chairs of the Social Sciences Track and the NEEL Challenge, whose work has been invaluable in pulling the three parts together into a unified, successful workshop.

Matthew Rowe Lancaster University, UK
Milan Stankovic Sépage / Université Paris-Sorbonne, France
Aba-Sah Dadzie KMi, The Open University, UK
#Microposts2015 Organising Committee, May 2015


· #Microposts2015 · 5th Workshop on Making Sense of Microposts · @WWW2015
Introduction to the Proceedings

Main Track

The main workshop track attracted nine submissions, of which two long papers and one short paper were accepted, in addition to an extended abstract and a poster. Two of these crossed the boundary between Computer and Social Sciences, and were therefore assigned reviewers from both tracks. Topics covered ranged from machine learning and named entity recognition to Micropost classification and extraction, with applications in topic, event and spam detection. We provide a brief introduction to each below.

De Boom, Van Canneyt & Dhoedt, in Semantics-driven Event Clustering in Twitter Feeds, present a novel perspective on event detection in tweets, by associating semantics with tweets and hashtags. They demonstrate how an approach that combines machine learning with explicit semantics detection can yield considerable improvement over state-of-the-art event clustering approaches.

In the paper Making the Most of Tweet-Inherent Features for Social Spam Detection on Twitter, Wang, Zubiaga, Liakata & Procter investigate the use of a variety of feature sets and classifiers for the detection of social spam on Twitter. The feature sets include user features (social network properties of the tweeter, such as their in- and out-degrees); content features (number of hashtags and mentions); n-gram features (mined from textual aspects); and sentiment features, based on both manually and automatically created semantic lexicons. Classifiers tested include naïve Bayes, k-Nearest Neighbours, Support Vector Machines, Decision Trees and Random Forests. The paper presents an interesting investigation, classifying users as spammers (or not), as opposed to existing work, which attempts to classify content as spam (or not).

In User Interest Modeling in Twitter with Named Entity Recognition, Karatay & Karagoz explore techniques for user profiling using Named Entity detection in tweets, a topic of increasing importance in the era of information overload, where filtering and personalising information is crucial for user engagement and experience. The in-depth view of appropriate techniques and issues related to Named Entity-based user profiling on Twitter will interest both academic and industrial audiences.

Within the broader area of spam, misconduct and automated accounts on Twitter, Edwards & Guy study the Connections between Twitter Spammer Categories. Unlike most other work in this area, they do not merely distinguish spam from non-spam, but assume there are different types of spam accounts, which they categorise as “advertising”, “explicit”, “follower gain”, “celebrity” and “bot”. They show, in their extended abstract, that each type of spammer behaves differently with respect to establishing follower relations with other spam accounts. They also observe that genuine Twitter users can be found as followers of all types of spam accounts, but are more likely to connect with specific types of spammers.

Agarwal & Sureka, in A Topical Crawler for Uncovering Hidden Communities of Extremist Micro-Bloggers on Tumblr, discuss the use of microblogging systems such as Tumblr to promote extremism, taking advantage of the ability to post information anonymously. The poster paper describes a process that uses pre-identified keywords to flag relevant posts and, hence, identify suspect tags in textual posts. A random walk from a seed blogger is then used to identify further individuals and communities promoting extremism. The authors report a misclassification rate of 13% and an accuracy of 77% for predicting “hate promoting bloggers”, with misclassification of unknown bloggers at 34%.

Social Sciences Track

The Social Sciences track attracted three submissions, of which two were accepted. In addition to data mining and/or statistical analysis over the very large amounts of data involved, each submission carried out in-depth, qualitative analysis to tease out nuanced information that is more difficult to identify with automated methods. The track was chaired by Katrin Weller and Danica Radovanović.

One of the major contemporary events that spiked user engagement on social media during the first months of 2015 was the Charlie Hebdo shooting in France on January 7th. Giglietto & Lee provide one of the first studies of Twitter users’ reactions to this event, in To Be or Not to Be Charlie: Twitter Hashtags as a Discourse and Counter-discourse in the Aftermath of the 2015 Charlie Hebdo Shooting in France. In particular, they study the use of the hashtag #JeNeSuisPasCharlie, which was used in contrast to the initial #JeSuisCharlie hashtag. Using different approaches to data analysis (including activity patterns and word frequencies), the authors demonstrate how tweets including #JeNeSuisPasCharlie tend to resemble crisis communication patterns, and at the same time support different expressions of self-identity such as grief and resistance.

Coelho, Lapa, Ramos & Malini, in A Research Design for the Analysis of Contemporary Social Movements, present a research method to identify elements that promote social empowerment in the political vitality present in digital culture. They developed a model of investigation that allows discursive analysis of posts generated within net activist groups. Methods, instruments and resources were created and articulated for the collection and treatment of big data and for further qualitative analysis of content. In addition to contributing to ICT, by proposing a qualitative investigation of social networks, this research design contributes to the field of Education, as the results of its application can be used to develop guidelines for teachers, to support critical appropriation and education of social networks.

Named Entity rEcognition & Linking (NEEL) Challenge

The #Microposts2015 NEEL challenge again increased in complexity, to address further challenges encountered in the analysis of Micropost data. This year’s challenge required participants to recognise entities and their types, and also link them, where found, to corresponding DBpedia resources.


The challenge attracted good interest from the community, with 29 intents to submit, of which 21 applied for the final evaluation. Seven took part in the quantitative evaluation and six completed submission (including a written abstract). Of these, three were accepted for presentation and a further three as posters. All accepted submissions also took part in the workshop’s poster session, whose aim is to exhibit practical application in the field and foster further discussion about the ways in which knowledge content is extracted from Microposts and reused.

The NEEL challenge was chaired by A. Elizabeth Cano and Giuseppe Rizzo, with Andrea Varga and Bianca Pereira as dataset chairs. As in previous years, the challenge committee prepared a gold standard from the challenge corpus, which covered events in 2011, 2013 and 2014, such as the London Riots, the Oslo bombing and the UCI Cyclo-cross World Cup. Changes to the submission and evaluation protocols included wrapping submissions as a publicly accessible, REST-based service. Up to ten runs were allowed per submission, of which the best three were used in computing the final rankings. The rankings used four metrics: tagging (weight 0.3), linking (0.3) and clustering (0.4), with latency (computation time) used to break ties.

We provide here a brief introduction to participants’ abstracts describing their submissions; more detail about the preparation and evaluation processes is given in the challenge summary paper included in the proceedings.

Yamada, Takeda & Takefuji, in An End-to-End Entity Linking Approach for Tweets, present a five-stage approach: (1) preprocessing, (2) candidate mention generation, (3) mention detection and disambiguation, (4) NIL mention detection and (5) type prediction. In preprocessing, they utilise tokenisation and POS tagging based on state-of-the-art algorithms, along with extraction of tweet timestamps. Yamada et al. tackle candidate mention generation and disambiguation using fuzzy search of Wikipedia for candidate entity mentions, and the popularity of Wikipedia pages for ranking the set of candidate entities. Finally, they tackle selection of NIL mentions and entity typing as supervised learning problems.

In Entity Recognition and Linking on Tweets with Random Walks, Guo & Barbosa present a sequential approach to the NEEL task by, first, recognising entities using TwitIE, and then linking them to corresponding DBpedia entities. Starting from the (DBpedia) candidate entities, Guo & Barbosa build a subgraph by adding all entities adjacent to the candidates. They execute a personalised PageRank, giving more importance to unambiguous entities. They then measure semantic relatedness between entity candidates and the “unambiguous” entities for the “document”, and employ a threshold and name similarity for NIL prediction and clustering.

In the submission Combining Multiple Signals for Semanticizing Tweets: University of Amsterdam at #Microposts2015, Gârbacea, Odijk, Graus, Sijaranamual & de Rijke employ a sequential approach composed of four stages: (1) candidate mention detection, (2) candidate typing and linking, (3) NIL clustering and (4) overlap resolution. The first stage is tackled with an annotation-based process that takes as input the lexical content of Wikipedia and an NER classifier trained using the challenge dataset. To resolve candidate mention overlaps, the authors propose an algorithm based on the results of the linking stage and the Viterbi path resolution output. A “learning to rank” supervised model is used to select the most representative DBpedia reference entity, and, therefore, type of each candidate mention, normalising the type via manual alignment between the DBpedia ontology and the NEEL taxonomy. Finally, Gârbacea et al. handle NIL mentions using a clustering algorithm operating on the lexical similarity of the candidate mentions for which no counterparts are found in DBpedia.

Basile, Caputo & Semeraro, in UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Linking Entities in Tweets, introduce an unsupervised approach which uses a modified version of their Lesk algorithm. Basile et al. use similarity in “distributional semantic spaces” for disambiguation, and two alternative, state-of-the-art approaches for the candidate identification phase, based on either POS tagging or n-gram similarity. Entities are typed by inheriting the type of the DBpedia reference entity they point to, which is in turn manually aligned to the NEEL taxonomy.

In AMRITA - CEN@NEEL: Identification and Linking of Twitter Entities, Barathi Ganesh, Abinaya, Anand Kumar, Soman & Vinaykumar address the NEEL task sequentially by, first, tokenising and tagging the tweets using TwitIE. They then classify entity mentions by applying supervised learning using direct (POS tags) and indirect features (the two words before and after a candidate mention entity). Using a total of 34 lexical features, the authors experiment with three supervised learning algorithms to determine the recognition configuration that achieves the best performance in the development test. Barathi Ganesh et al. tackle the linking task by looking up DBpedia reference entries; the entry that maximises the similarity score between related entries and the named entity is designated the representative. Named entities without related links are assigned as NIL.

Finally, Sinha & Barik, in Named Entity Extraction and Linking in #Microposts, present a sequential approach to the NEEL task which recognises entities and then links them. The first stage is grounded in linguistic clues extracted using conventional approaches such as POS tagging, word capitalisation and hashtags in the tweet. They then train a CRF with the linguistic features and the contextual similarity of adjacent tokens, with the token window set to 5. Sinha & Barik perform the linking task using an entity resolution mechanism that takes as input the output of the NER stage and that of DBpedia Spotlight. For each entity returned by DBpedia Spotlight that is found to be a substring of any of the entities extracted in the NER stage, the corresponding URI is returned and assigned to it. Otherwise, the entity is assigned as NIL.
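As an illustration of the substring-based linking rule summarised for Sinha & Barik, the following minimal Python sketch gives each NER mention the URI of the first DBpedia Spotlight annotation whose surface form is a substring of that mention, and NIL otherwise. It is not the authors' code: the function name `link_by_substring`, the representation of Spotlight output as (surface form, URI) pairs, the first-match-wins policy and the sample URIs are assumptions made for illustration.

```python
from typing import List, Tuple

NIL = "NIL"


def link_by_substring(
    ner_mentions: List[str],
    spotlight_annotations: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Link NER mentions using (hypothetical) pre-parsed Spotlight output.

    `spotlight_annotations` is assumed to be a list of (surface form, URI)
    pairs already extracted from a DBpedia Spotlight response. A mention
    takes the URI of the first annotation whose surface form is a substring
    of it (first match wins, an assumption); unmatched mentions get NIL.
    """
    links: List[Tuple[str, str]] = []
    for mention in ner_mentions:
        uri = NIL
        for surface, candidate_uri in spotlight_annotations:
            # Plain, case-sensitive substring test; empty surfaces are skipped.
            if surface and surface in mention:
                uri = candidate_uri
                break
        links.append((mention, uri))
    return links


if __name__ == "__main__":
    mentions = ["Barack Obama", "Oslo", "#LondonRiots"]
    annotations = [
        ("Obama", "http://dbpedia.org/resource/Barack_Obama"),
        ("Oslo", "http://dbpedia.org/resource/Oslo"),
    ]
    for mention, uri in link_by_substring(mentions, annotations):
        print(mention, "->", uri)
```

Returning a list of pairs rather than a dictionary keeps repeated mentions of the same string in a tweet as separate entries.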


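The personalised PageRank step summarised for Guo & Barbosa can be sketched as a power iteration in which random jumps return only to the unambiguous entities, so that candidates lying close to them in the subgraph accumulate more score. This is a generic textbook formulation, not the authors' implementation; the function name `personalised_pagerank`, the damping factor of 0.85, the fixed iteration count, the adjacency-list representation and the toy graph are all assumptions.

```python
from typing import Dict, Iterable, List


def personalised_pagerank(
    graph: Dict[str, List[str]],
    seeds: Iterable[str],
    damping: float = 0.85,   # assumed value, not taken from the submission
    iterations: int = 50,    # assumed fixed number of power iterations
) -> Dict[str, float]:
    """Personalised PageRank by power iteration.

    `graph` maps every node to its list of neighbours (every neighbour must
    also be a key). With probability 1 - damping the walk jumps back to a
    seed node; the seeds stand in for the unambiguous entities.
    """
    nodes = list(graph)
    seed_set = set(seeds) & set(nodes)
    if not seed_set:
        raise ValueError("at least one seed must be a node of the graph")
    # Teleport distribution: uniform over the seeds, zero elsewhere.
    teleport = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    scores = dict(teleport)
    for _ in range(iterations):
        new_scores = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            neighbours = graph[node]
            if neighbours:
                # Spread this node's mass evenly over its neighbours.
                share = damping * scores[node] / len(neighbours)
                for nb in neighbours:
                    new_scores[nb] += share
            else:
                # Dangling node: return its mass to the seeds.
                for n in nodes:
                    new_scores[n] += damping * scores[node] * teleport[n]
        scores = new_scores
    return scores


if __name__ == "__main__":
    # Hypothetical subgraph: "Paris" and "Paris_Hilton" are candidates for an
    # ambiguous mention; "France" and "Eiffel_Tower" are unambiguous.
    toy_graph = {
        "Paris": ["France", "Eiffel_Tower"],
        "France": ["Paris", "Eiffel_Tower"],
        "Eiffel_Tower": ["Paris", "France"],
        "Paris_Hilton": ["Hilton_Hotels"],
        "Hilton_Hotels": ["Paris_Hilton"],
    }
    ranks = personalised_pagerank(toy_graph, ["France", "Eiffel_Tower"])
    for entity in ("Paris", "Paris_Hilton"):
        print(entity, round(ranks[entity], 3))
```

In the toy graph, the candidate Paris is connected to the unambiguous seeds and receives a positive score, while Paris_Hilton sits in a disconnected component and receives none; the later relatedness, threshold and NIL steps are not modelled.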
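The final-ranking rule for the NEEL challenge combines three weighted quality metrics and uses latency only to break ties. A minimal Python sketch of that rule follows. It assumes each submission has already been reduced to a single value per metric (how the best three runs are aggregated is covered in the challenge summary paper and is not modelled here); the names `Submission` and `rank_submissions`, the rounding used when comparing scores and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Metric weights stated for the #Microposts2015 NEEL final rankings.
WEIGHTS = {"tagging": 0.3, "linking": 0.3, "clustering": 0.4}


@dataclass
class Submission:
    """One submission, assumed already reduced to one value per metric."""
    name: str
    tagging: float
    linking: float
    clustering: float
    latency: float  # computation time; lower is better

    @property
    def score(self) -> float:
        """Weighted combination of the three quality metrics."""
        return (WEIGHTS["tagging"] * self.tagging
                + WEIGHTS["linking"] * self.linking
                + WEIGHTS["clustering"] * self.clustering)


def rank_submissions(submissions: List[Submission]) -> List[Submission]:
    """Sort by weighted score (descending), breaking ties by latency (ascending).

    Scores are rounded to 6 digits (an assumption) so that floating-point
    noise does not hide a genuine tie.
    """
    return sorted(submissions, key=lambda s: (-round(s.score, 6), s.latency))


if __name__ == "__main__":
    # Illustrative numbers only, not challenge results.
    entries = [
        Submission("team_a", 0.6, 0.5, 0.4, latency=2.0),
        Submission("team_b", 0.6, 0.5, 0.4, latency=1.0),
        Submission("team_c", 0.9, 0.9, 0.9, latency=5.0),
    ]
    for rank, sub in enumerate(rank_submissions(entries), start=1):
        print(rank, sub.name, round(sub.score, 2))
```

Here team_a and team_b tie on the weighted score, so the faster team_b is ranked above team_a, while team_c ranks first despite the highest latency.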
Workshop Awards

Main Track. The #Microposts2015 best paper award went to:

       Cedric De Boom, Steven Van Canneyt & Bart Dhoedt
       for their submission entitled:
            Semantics-driven Event Clustering in Twitter Feeds

Social Sciences Track. GESIS[1], the Leibniz Institute for the Social Sciences, sponsored the best paper award for the Social Sciences track. We teamed up with GESIS, the largest service and infrastructure institution for the Social Sciences in Germany, to highlight the role of interdisciplinary approaches in obtaining a better understanding of the users behind social media and Microposts. As in the main track, the decision was guided by nominations from the reviewers and review scores. The #Microposts2015 Social Sciences Track best paper award went to:

       Fabio Giglietto & Yenn Lee
       for their submission entitled:
            To Be or Not to Be Charlie: Twitter Hashtags as a Discourse
            and Counter-discourse in the Aftermath of the 2015 Charlie
            Hebdo Shooting in France

NEEL Challenge. SpazioDati[2], an Italian startup that took part in the #Microposts2014 NEEL challenge, sponsored the award for the best submission. SpazioDati aim to provide access to a single source of common-sense knowledge, mined and synthesised from a large number of open and closed data sources. By sponsoring the challenge, SpazioDati reinforce the value of the content in the increasingly large knowledge source that is Micropost data. The challenge award was also determined by the results of the quantitative evaluation. The #Microposts2015 NEEL Challenge award went to:

       Ikuya Yamada, Hideaki Takeda & Yoshiyasu Takefuji
       for their submission entitled:
            An End-to-End Entity Linking Approach for Tweets

Additional Material

The call for participation and all paper, poster and challenge abstracts are available on the #Microposts2015 website[3]. The full proceedings are also available on the CEUR-WS server, as Vol-1395[4]. The gold standard for the NEEL Challenge is available for download[5].

The proceedings for #Microposts2014 are available as CEUR Vol-1141[6]. The proceedings for the #MSM2013 main track are available as part of the WWW’13 Proceedings Companion[7]. The #MSM2013 Concept Extraction Challenge proceedings are published as a separate volume, CEUR Vol-1019[8], and the gold standard is available for download[9]. The proceedings for #MSM2012 and #MSM2011 are available as CEUR Vol-838[10] and CEUR Vol-718[11], respectively.

[1] http://www.gesis.org
[2] http://spaziodati.eu
[3] http://www.scc.lancs.ac.uk/microposts2015
[4] #Microposts2015 Proc. http://ceur-ws.org/Vol-1395
[5] http://ceur-ws.org/Vol-1395/microposts2015_neel-challenge-report/microposts2015-neel_challenge_gs.zip
[6] #Microposts2014 Proc. http://ceur-ws.org/Vol-1141
[7] WWW’13 Companion: http://dl.acm.org/citation.cfm?id=2487788
[8] #MSM2013 CE Challenge Proc. http://ceur-ws.org/Vol-1019
[9] http://ceur-ws.org/Vol-1019/msm2013-ce_challenge_gs.zip
[10] #MSM2012 Proc. http://ceur-ws.org/Vol-838
[11] #MSM2011 Proc. http://ceur-ws.org/Vol-718


Main Track Programme Committee
Pierpaolo Basile University of Bari, Italy
Julie Birkholz CHEGG, Ghent University, Belgium
John Breslin National University of Ireland Galway, Ireland
A. Elizabeth Cano KMi, The Open University, UK
Marco A. Casanova Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Óscar Corcho Universidad Politécnica de Madrid, Spain
Guillaume Erétéo Vigiglobe, France
Miriam Fernandez KMi, The Open University, UK
Andrés Garcia-Silva Universidad Politécnica de Madrid, Spain
Anna Lisa Gentile The University of Sheffield, UK
Jelena Jovanovic University of Belgrade, Serbia
Mathieu Lacage Alcméon, France
Philippe Laublet Université Paris-Sorbonne, France
José M. Morales del Castillo El Colegio de México, Mexico
Fabrizio Orlandi University of Bonn, Germany
Bernardo Pereira Nunes Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Danica Radovanović University of Belgrade, Serbia
Giuseppe Rizzo Eurecom, France
Harald Sack HPI, University of Potsdam, Germany
Bernhard Schandl mySugr GmbH, Austria
Sean W. M. Siqueira Universidade Federal do Estado do Rio de Janeiro, Brazil
Victoria Uren Aston Business School, UK
Andrea Varga Swiss Re, UK
Katrin Weller GESIS Leibniz Institute for the Social Sciences, Germany
Alistair Willis The Open University, UK
Ziqi Zhang The University of Sheffield, UK

Sub Reviewers

Tamara Bobic HPI, University of Potsdam, Germany

Social Sciences Track Programme Committee
Gholam R. Amin Sultan Qaboos University, Oman
Julie Birkholz CHEGG, Ghent University, Belgium
Tim Davies University of Southampton, UK
Munmun De Choudhury Georgia Tech, USA
Ali Emrouznejad Aston Business School, UK
Fabio Giglietto Università di Urbino Carlo Bo, Italy
Simon Hegelich Universität Siegen, Germany
Kim Holmberg University of Turku, Finland
Athina Karatzogianni University of Leicester, UK
José M. Morales del Castillo El Colegio de México, Mexico
Raquel Recuero Universidade Católica de Pelotas, Brazil
Bianca C. Reisdorf University of Leicester, UK
Luca Rossi Università di Urbino Carlo Bo, Italy
Saskia Vanmanen The Open University, UK
Alistair Willis The Open University, UK
Taha Yasseri University of Oxford, UK
Victoria Uren Aston Business School, UK

NEEL Challenge Evaluation Committee
Gabriele Antonelli SpazioDati, Italy
Ebrahim Bagheri Ryerson University, Canada
Pierpaolo Basile University of Bari, Italy
Grégoire Burel KMi, The Open University, UK
Óscar Corcho Universidad Politécnica de Madrid, Spain
Leon Derczynski The University of Sheffield, UK
Milan Dojchinovski Czech Technical University in Prague, Czech Republic
Guillaume Erétéo Vigiglobe, France
Andrés Garcia-Silva Universidad Politécnica de Madrid, Spain
Anna Lisa Gentile The University of Sheffield, UK
Miguel Martinez-Alvarez Signal, UK
José M. Morales del Castillo El Colegio de México, Mexico
Bernardo Pereira Nunes Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Daniel Preoţiuc-Pietro University of Pennsylvania, USA
Giles Reger Otus Labs, UK
Irina Temnikova Qatar Computing Research Institute, Qatar
Victoria Uren Aston Business School, UK

