=Paper=
{{Paper
|id=Vol-2482/paper37
|storemode=property
|title=Stance Classification for Rumour Analysis in Twitter: Exploiting Affective Information and Conversation Structure
|pdfUrl=https://ceur-ws.org/Vol-2482/paper37.pdf
|volume=Vol-2482
|authors=Endang Wahyu Pamungkas,Valerio Basile,Viviana Patti
|dblpUrl=https://dblp.org/rec/conf/cikm/PamungkasBP18
}}
==Stance Classification for Rumour Analysis in Twitter: Exploiting Affective Information and Conversation Structure==
Endang Wahyu Pamungkas, Valerio Basile, Viviana Patti
Dipartimento di Informatica, Università degli Studi di Torino
{pamungka, basile, patti}@di.unito.it

Abstract

Analysing how people react to rumours associated with news in social media is an important task to prevent the spreading of misinformation, which is nowadays widely recognized as a dangerous tendency. In social media conversations, users show different stances and attitudes towards rumourous stories. Some users take a definite stance, supporting or denying the rumour at issue, while others just comment on it or ask for additional evidence on the rumour's veracity. A shared task focused on rumour stance classification in English tweets has been proposed at SemEval-2017 (Task 8, Subtask A). The goal is to predict user stance towards emerging rumours on Twitter, in terms of supporting, denying, querying, or commenting on the original rumour, looking at the conversation threads originated by the rumour. This paper describes a new approach to this task, exploring the use of conversation-based and affective-based features covering different facets of affect. Our classification model outperforms the best-performing systems for stance classification at SemEval-2017, showing the effectiveness of the proposed feature set.

Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Nowadays, people increasingly tend to use social media like Facebook and Twitter as their primary source of information and news consumption. There are several reasons behind this tendency, such as the simplicity of gathering and sharing news, and the possibility of staying abreast of the latest news faster than with traditional media. An important factor is also that people can engage in conversations on the latest breaking news with their contacts on these platforms. The Pew Research Center's newest report (http://www.journalism.org/2017/09/07/news-use-across-social-media-platforms-2017/) shows that two-thirds of U.S. adults gather their news from social media, with Twitter the most used platform. However, the absence of a systematic approach to fact and veracity checking may also encourage the spread of rumourous stories and misinformation [PVV13]. Indeed, in social media, unverified information can spread very quickly and easily become viral, enabling the diffusion of false rumours and fake information.

Within this scenario, it is crucial to analyse people's attitudes towards rumours in social media and to resolve their veracity as soon as possible. Several approaches have been proposed to check rumour veracity in social media [SSW+17]. This paper focuses on a stance-based analysis of event-related rumours, following the approach proposed at SemEval-2017 in the new RumourEval shared task (Task 8, Subtask A) [DBL+17]. In this task, English tweets from conversation threads, each associated to a newsworthy event and the rumours around it, are provided as data. The goal is to determine whether a tweet in the thread supports, denies, queries, or comments on the original rumour which started the conversation. It can be considered a stance classification task, where we have to predict the user's stance towards the rumour from a tweet, in the context of a given thread. This task has been defined as an open stance classification task, and it is conceived as a key step in rumour resolution, providing an analysis of people's reactions towards an emerging rumour [PVV13, ZLP+16]. The task is also different from detecting stance towards a specific target entity [MKS+16].

Contribution. We describe a novel classification approach, proposing a new feature matrix which includes two new groups: (a) features exploiting the conversational structure of the dataset [DBL+17]; (b) affective features relying on a wide range of affective resources capturing different facets of sentiment and other affect-related phenomena. We were also inspired by the fake news study on Twitter in [VRA18], showing that false stories inspire fear, disgust, and surprise in replies, while true stories inspire anticipation, sadness, joy, and trust. Meanwhile, from a dialogue act perspective, the study of [NS13] found that a relationship exists between the use of an affective lexicon and the communicative intention of an utterance, including AGREE-ACCEPT (support), REJECT (deny), INFO-REQUEST (question), and OPINION (comment); they exploited several LIWC categories to analyse the role of affective content. Our results show that our model outperforms the state of the art on the SemEval-2017 benchmark dataset. Feature analysis highlights the contribution of the different feature groups, and error analysis sheds some light on the main difficulties and challenges which still need to be addressed.

Outline. The paper is organized as follows. Section 2 introduces the SemEval-2017 Task 8. Section 3 describes our approach to open stance classification, exploiting different groups of features. Section 4 describes the evaluation and includes a qualitative error analysis. Finally, Section 5 concludes the paper and points to future directions.

2 SemEval-2017 Task 8: RumourEval

The main objective of SemEval-2017 Task 8, Subtask A [DBL+17] is to determine the stance of the users in a Twitter thread towards a given rumour, in terms of support, denying, querying, or commenting (SDQC) on the original rumour. A rumour is defined as a "circulating story of questionable veracity, which is apparently credible but hard to verify, and produces sufficient skepticism and/or anxiety so as to motivate finding out the actual truth" [ZLP+15].

Dataset. The data for this task (http://alt.qcri.org/semeval2017/task8/index.php?id=data-and-tools) are taken from Twitter conversations about news-related rumours collected by [ZLP+16]. They were annotated using four labels (SDQC): support (S), when the tweet's author supports the rumour's veracity; deny (D), when the author denies it; query (Q), when the author asks for additional information or evidence; and comment (C), when the author just makes a comment that does not give important information to assess the rumour's veracity. The data consist of three sets (development, training, and test), summarized in Table 1 together with the label distribution and the news events the rumours relate to. The training data consist of 297 Twitter conversations and 4,238 tweets in total, with related direct and nested replies, where conversations are associated to seven different breaking news events. The test data consist of 1,049 tweets, where two new rumourous topics were added.

Development data
Rumour             S    D    Q    C
Germanwings        69   11   28   173

Training data
Rumour             S    D    Q    C
Charlie Hebdo      239  58   53   721
Ebola Essien       6    6    1    21
Ferguson           176  91   99   718
Ottawa Shooting    161  76   63   477
Prince Toronto     21   7    11   64
Putin Missing      18   6    5    33
Sydney Siege       220  89   98   700
Total              841  333  330  2734

Testing data
Rumour             S    D    Q    C
Ferguson           15   4    17   66
Ottawa Shooting    10   2    20   91
Sydney Siege       5    1    12   69
Charlie Hebdo      9    2    8    74
Germanwings        11   5    15   71
Marina Joyce       5    30   10   110
Hillary's Illness  39   27   24   297
Total              94   71   106  778

Table 1: SemEval-2017 Task 8 (A) dataset distribution.

Participants. Eight teams participated in the task. The best performing system was developed by Turing (78.4% accuracy). ECNU, MamaEdha, UWaterloo, and DFKI-DKT utilized ensemble classifiers.
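As a quick sanity check, the split totals reported in the text (4,238 training tweets, 1,049 test tweets) can be reproduced from the per-class counts in Table 1. The numbers below are copied from the table; the helper names are our own:

```python
# SDQC label totals from Table 1 (support, deny, query, comment)
TRAIN = {"support": 841, "deny": 333, "query": 330, "comment": 2734}
TEST = {"support": 94, "deny": 71, "query": 106, "comment": 778}

def total(dist):
    """Total number of tweets in a split."""
    return sum(dist.values())

def share(dist, label):
    """Fraction of a split carried by one label."""
    return dist[label] / total(dist)

print(total(TRAIN))                       # 4238, as reported in the text
print(total(TEST))                        # 1049, as reported in the text
print(round(share(TRAIN, "comment"), 2))  # 0.65: the comment class dominates
```

The last line makes the class imbalance discussed later in the paper concrete: roughly two thirds of the training tweets are comments.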
Some systems also used deep learning techniques, including Turing, IKM, and MamaEdha, while NileTRMG and IITP built their systems on classical classifiers (SVM). Most of the participants exploited word embeddings to construct their feature space, besides Twitter domain features. The task was very timely, due to the growing importance of rumour resolution in breaking news and to the urgency of preventing the spread of misinformation.

3 Proposed Method

We developed a new model exploiting several stylistic and structural features characterizing Twitter language. In addition, we propose to utilize conversation-based features exploiting the peculiar tree structure of the dataset. We also explored the use of affective features, extracting information from several affective resources, including dialogue-act inspired features.

3.1 Structural Features

These features were designed taking into account several characteristics of Twitter data, and then selecting the most relevant features to improve the classification performance. The set of structural features that we used is listed below.

- Retweet Count: the number of retweets of each tweet.
- Question Mark: presence of a question mark "?"; binary value (0 or 1).
- Question Mark Count: the number of question marks present in the tweet.
- Hashtag Presence: binary value, 0 if there is no hashtag in the tweet and 1 if there is at least one.
- Text Length: the number of characters after removing Twitter markers such as hashtags, mentions, and URLs.
- URL Count: the number of URL links in the tweet.

3.2 Conversation-Based Features

These features are devoted to exploiting the peculiar characteristics of the dataset, which has a tree structure reflecting the conversation threads. (The implementation of these features is inspired by unpublished shared code [Gra17].)

- Text Similarity to Source Tweet: Jaccard similarity of each tweet with its source tweet.
- Text Similarity to Replied Tweet: the degree of similarity between the tweet and the previous tweet in the thread (the tweet it replies to).
- Tweet Depth: the depth value, obtained by counting the nodes from the source (root) to each tweet in the hierarchy.

3.3 Affective-Based Features

The idea of using affective features in the context of our task was inspired by recent work on fake news detection focusing on emotional responses to true and false rumours [VRA18], and by the work in [NS13] reflecting on the role of affect in dialogue acts. Multi-faceted affective features have already proven effective in related tasks [LFPR16], including the stance detection task proposed at SemEval-2016 (Task 6). We used the following affective resources, relying on different emotion models.

- EmoLex: contains 14,182 words associated with eight primary emotions based on the Plutchik model [MT13, Plu01].
- EmoSenticNet (EmoSN): an enriched version of SenticNet [COR14] including 13,189 words labeled with Ekman's six basic emotions [PGH+13, Ekm92].
- Dictionary of Affect in Language (DAL): includes 8,742 English words labeled with three scores representing three dimensions: Pleasantness, Activation, and Imagery [Whi09].
- Affective Norms for English Words (ANEW): consists of 1,034 English words [BL99] rated according to the Valence-Arousal-Dominance (VAD) model [OST57].
- Linguistic Inquiry and Word Count (LIWC): this psycholinguistic resource [PFB01] includes 4,500 words distributed over 64 emotional categories, including positive (PosEMO) and negative (NegEMO).

3.4 Dialogue-Act Features

We also included 11 additional categories from LIWC, which were already proven effective for dialogue-act identification in previous work [NS13]. These features are essentially part of the affective feature group, but we present them separately because we are interested in exploring the contribution of this feature set on its own. The feature set was obtained by selecting four communicative goals related to the classes of the stance task: agree-accept (support), reject (deny), info-request (question), and opinion (comment). The 11 LIWC categories include:

- Agree-accept: Assent, Certain, Affect;
- Reject: Negate, Inhib;
- Info-request: You, Cause;
- Opinion: Future, Sad, Insight, Cogmech.

4 Experiments, Evaluation and Analysis

We used the RumourEval dataset from SemEval-2017 Task 8 described in Section 2. We defined the rumour stance detection problem as a simple four-way classification task, where every tweet in the dataset (source and direct or nested replies) should be classified into one of four classes: support, deny, query, and comment. We conducted a set of experiments in order to evaluate and analyze the effectiveness of our proposed feature set. The results are summarized in Table 2, showing that our system outperforms all of the other systems in terms of accuracy.

No.  System               Accuracy
1.   Turing's System      78.4
2.   Aker et al. System   79.02
3.   Our System           79.5
     RumourEval Baseline  74.1

Table 2: Results and comparison with the state of the art.

           S    D    Q    C
Support    27   0    3    64
Deny       2    0    1    68
Query      0    0    50   56
Comment    13   0    8    757

Table 3: Confusion matrix.

           S    D    Q    C
Support    39   14   5    13
Deny       8    28   5    30
Query      2    3    62   4
Comment    14   14   2    41

Table 4: Confusion matrix on the balanced dataset.
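The conversation-based features of Section 3.2 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes simple whitespace tokenization and a parent-pointer map encoding the thread tree, neither of which is specified in the paper.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two tweets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def depth(tweet_id, parent):
    """Number of hops from a tweet up to the source (root) of its thread."""
    d = 0
    while parent.get(tweet_id) is not None:
        tweet_id = parent[tweet_id]
        d += 1
    return d

# toy thread: t1 (source) <- t2 (reply) <- t3 (nested reply)
parent = {"t1": None, "t2": "t1", "t3": "t2"}
print(depth("t3", parent))                                   # 2
print(jaccard("the rumour is false", "the rumour is true"))  # 3/5 = 0.6
```

The similarity-to-replied-tweet feature is the same `jaccard` call applied to a tweet and its direct parent instead of the thread's source tweet.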
Our best result was obtained by a simple configuration: a support vector classifier with a radial basis function (RBF) kernel. Our model performed better than the best-performing system at SemEval-2017 Task 8, Subtask A (the Turing team [KLA17]), which exploited a deep learning approach based on a branch-LSTM model. In addition, we also obtained a higher accuracy than the system described in [ADB17], which exploits a Random Forest classifier and word-embedding-based features.

We experimented with several classifiers, including Naive Bayes, Decision Trees, Support Vector Machines, and Random Forests, noting that SVM outperforms the other classifiers on this task. We built our system using the scikit-learn Python library (http://scikit-learn.org/). We explored the parameter space by tuning the SVM hyperparameters, namely the penalty parameter C, the kernel type, and the class weights (to deal with class imbalance). We tested several values for C (0.001, 0.01, 0.1, 1, 10, 100, and 1000) and four different kernels (linear, RBF, polynomial, and sigmoid), and weighted the classes based on their distribution in the training data. The best result was obtained with C=1, an RBF kernel, and no class weighting.

An ablation test was conducted to explore the contribution of each feature set. Table 5 shows the result of our ablation test, obtained with several feature sets on the same classifier (SVM with RBF kernel); the source code is available on GitHub (https://github.com/dadangewp/SemEval2017-RumourEval). This evaluation includes macro-averages of precision, recall, and F1-score as well as accuracy. We also present the scores for each class in order to get a better understanding of our classifier's performance.

Using only conversational, affective, or dialogue-act features (without structural features) did not give good classification results. Set B (conversational features only) was not able to detect the query and deny classes, while set C (affective features only) and set D (dialogue-act features only) failed to catch the support, query, and deny classes. Conversational features improved the classifier performance significantly, especially in detecting the support class: sets E, H, I, and K, which utilize conversational features, induce an improvement on the prediction of the support class (roughly from 0.3 to 0.73 in precision). Meanwhile, the combination of affective and dialogue-act features slightly improved the classification of the query class: from set E to set K, the F1-score of the query class increased from 0.52 to 0.58. Overall, the best result was obtained by set K, which encompasses all feature sets. It is worth noting that our best system configuration does not use all of the affective and dialogue-act features: after several optimization steps, we found that some features were not improving the system's performance. Our final list of affective and dialogue-act based features includes: DAL Activation, ANEW Dominance, EmoLex Negative, EmoLex Fear, LIWC Assent, LIWC Cause, LIWC Certain, and LIWC Sad. Therefore, the best performing system uses only 17 feature columns, covering structural, conversational, affective, and dialogue-act features.

We conducted a further analysis of the classification results obtained by the best performing system (79.50% accuracy). Table 3 shows the confusion matrix of our results. On the one hand, the system is able to detect the comment tweets very well.
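The headline figures can be cross-checked directly from the confusion matrices: the diagonal of Table 3 reproduces the reported 79.5% accuracy, and Table 4 the 59.9% of the balanced experiment reported in Section 4. A small sketch, with matrix rows as gold labels in S, D, Q, C order:

```python
# Confusion matrices copied from Tables 3 and 4 (rows = gold, cols = predicted)
TABLE3 = [[27, 0, 3, 64],
          [2, 0, 1, 68],
          [0, 0, 50, 56],
          [13, 0, 8, 757]]
TABLE4 = [[39, 14, 5, 13],
          [8, 28, 5, 30],
          [2, 3, 62, 4],
          [14, 14, 2, 41]]

def accuracy(cm):
    """Share of instances on the diagonal of a confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def recall(cm, i):
    """Per-class recall: diagonal cell over the gold-row total."""
    return cm[i][i] / sum(cm[i])

print(round(accuracy(TABLE3), 3))   # 0.795, the reported 79.5%
print(round(accuracy(TABLE4), 3))   # 0.599, the reported 59.9%
print(round(recall(TABLE3, 3), 2))  # 0.97 recall on comment
print(recall(TABLE3, 1))            # 0.0 recall on deny
```

The per-class recalls make the imbalance effect explicit: nearly all comments are recovered, while no denying tweet is.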
However, this result is biased by the amount of comment data in the dataset. On the other hand, the system fails to detect denying tweets, which were falsely classified as comments (68 out of 71); a similar observation is reported by the best team at SemEval-2017 [KLA17]. Meanwhile, approximately two thirds of the supporting tweets and almost half of the querying tweets were classified into the correct class by the system.

Set  Features        Overall (Acc Prec Rec F1)  Support (Acc Prec Rec F1)  Query (Acc Prec Rec F1)   Comment (Acc Prec Rec F1)
A    Structural      0.731 0.41 0.37 0.38       0.18 0.28 0.18 0.22        0.39 0.56 0.39 0.46       0.91 0.78 0.91 0.84
B    Conversational  0.767 0.42 0.31 0.33       0.29 0.93 0.29 0.44        0    0    0    0          1    0.76 1    0.87
C    Affective       0.742 0.19 0.25 0.21       0    0    0    0           0    0    0    0          1    0.74 1    0.85
D    Dialogue-Act    0.742 0.19 0.25 0.21       0    0    0    0           0    0    0    0          1    0.74 1    0.85
E    A+B             0.783 0.54 0.43 0.45       0.29 0.73 0.29 0.41        0.42 0.62 0.42 0.52       0.96 0.8  0.96 0.87
F    A+C             0.741 0.42 0.36 0.38       0.14 0.27 0.14 0.18        0.39 0.62 0.39 0.48       0.93 0.77 0.93 0.84
G    A+D             0.736 0.42 0.37 0.38       0.18 0.3  0.18 0.23        0.37 0.59 0.37 0.45       0.92 0.77 0.92 0.84
H    E+C             0.788 0.56 0.42 0.46       0.28 0.74 0.28 0.4         0.44 0.7  0.44 0.54       0.97 0.8  0.97 0.87
I    E+D             0.784 0.53 0.43 0.46       0.3  0.65 0.3  0.41        0.45 0.67 0.45 0.54       0.96 0.8  0.96 0.87
J    F+D             0.749 0.43 0.36 0.38       0.14 0.33 0.14 0.19        0.38 0.63 0.38 0.47       0.94 0.77 0.94 0.85
K    All Features    0.795 0.57 0.43 0.47       0.29 0.73 0.29 0.41        0.47 0.75 0.47 0.58       0.97 0.8  0.97 0.88

*The deny class is not shown, since its scores are always zero.

Table 5: Ablation test on several feature sets.
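The class-rebalancing experiment described in Section 4 (330 training and 71 test instances per class) amounts to random undersampling. The following is a hypothetical sketch, since the paper does not describe the sampling procedure; `balance` and the toy data are our own:

```python
import random
from collections import defaultdict

def balance(instances, per_class, seed=0):
    """Randomly undersample a labeled dataset to `per_class` items per label."""
    by_label = defaultdict(list)
    for text, label in instances:
        by_label[label].append((text, label))
    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], per_class))
    return balanced

# toy data, imbalanced toward "comment" as in the real dataset
data = ([("c%d" % i, "comment") for i in range(10)]
        + [("s%d" % i, "support") for i in range(4)]
        + [("d%d" % i, "deny") for i in range(3)]
        + [("q%d" % i, "query") for i in range(3)])
subset = balance(data, per_class=3)
print(len(subset))  # 4 classes x 3 instances = 12
```

On the real data, `per_class` would be 330 for the training set and 71 for the test set, i.e. the size of the smallest class in each split.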
In order to assess the impact of class imbalance on the learning, we performed an additional experiment with a balanced dataset, using the best performing configuration. We took a subset of the instances equally distributed with respect to their class from the training set (330 instances for each class) and from the test set (71 instances for each class). As shown in Table 4, our classifier was able to predict the underrepresented classes much better, although the overall accuracy is lower (59.9%). The result of this analysis clearly indicates that class imbalance has a negative impact on the system performance.

4.1 Error Analysis

We conducted a qualitative error analysis of the 215 misclassified tweets in the test set, to shed some light on the issues and difficulties to be addressed in future work and to detect some notable error classes.

Denying by attacking the rumour's author. An interesting finding from the analysis of the Marina Joyce rumour data is that it contains a lot of denying tweets including insulting comments towards the author of the source tweet, as in the following cases:

Rumour: Marina Joyce
Misclassified tweets:
(da1) stfu you toxic sludge
(da2) @sampepper u need rehab
Misclassification type: deny (gold), comment (prediction)
Source tweet:
(s1) Anyone who knows Marina Joyce personally knows she has a serious drug addiction. she needs help, but in the form of rehab #savemarinajoyce

Tweets like (da1) and (da2) seem more inclined to show the respondents' personal hatred towards the author of (s1) than to deny the veracity of the rumour. In other words, they represent a peculiar form of denying the rumour, expressed by personal attacks and by showing negative attitudes or hatred towards the rumour's author. This is different from denying by attacking the content of the source tweet, and it was difficult for our system to comprehend: it often misclassified such tweets as comments.

Noisy text, specific jargon, very short text. In (da1) and (da2), as in many tweets in the test set, we also observe the use of noisy text (abbreviations, misspellings, slang words and slurs, question statements without a question mark, and so on) that our classifier struggles to handle. Moreover, especially in tweets from the Marina Joyce rumour group, we found some very short tweets in the denying class that do not provide enough information, e.g. tweets like "shut up!", "delete", and "stop it. get some help".

Argumentation context. We also observed misclassification cases that seem to require a deeper capability of dealing with the argumentation context underlying the conversation thread.

Rumour: Ferguson
Misclassified tweet:
(arg1) @QuadCityPat @AP I join you in this demand. Unconscionable.
Misclassification type: deny (gold), comment (prediction)
Source tweet:
(s2) @AP I demand you retract the lie that people in #Ferguson were shouting "kill the police", local reporting has refuted your ugly racism

Here the misclassified tweet is a reply including an explicit expression of agreement with the author of the source tweet ("I join you"). Tweet (s2) is one of the rare cases of source tweets denying the rumour (source tweets in the RumourEval17 dataset mostly support the rumour at issue). Our hypothesis is that it is difficult for a system to detect this kind of stance without a deeper comprehension of the argumentation context (e.g., if the author's stance is denying the rumour, and I agree with him, then I am denying the rumour as well). In general, we observed that when the source tweet is annotated with the deny label, most of the denying replies in the thread exhibit features typical of the support class (and vice versa), and this proved to be a critical issue for our classifier.

Mixed cases. Furthermore, we found some borderline mixed cases in the gold standard annotation. See for instance the following case:

Rumour: Ferguson
Misclassified tweet:
(mx1) @MichaelSkolnik @MediaLizzy Oh do tell where they keep track of "vigilante" stats. That's interesting.
Misclassification type: query (gold), comment (prediction)
Source tweet:
(s3) Every 28 hours a black male is killed in the United States by police or vigilantes. #Ferguson

Tweet (mx1) is annotated with a query label rather than as a comment (our system's prediction), but we can observe the presence of a comment ("That's interesting") after the request for clarification, so it seems to be a kind of mixed case, where both labels make sense.

Citation of the source tweet. We noticed many misclassified cases of reply tweets with the error pattern support (gold), comment (our prediction), where the text contains a literal citation of the source tweet, as in the following tweet: THIS HAS TO END "@MichaelSkolnik: Every 28 hours a black male is killed in the United States by police or vigilantes. #Ferguson" (the text enclosed in quotes is the source tweet). This kind of mistake could perhaps be addressed by applying some preprocessing to the data, for instance by detecting the literal citation and replacing it with a marker.

Figurative language devices. Finally, the use of figurative language (e.g., sarcasm) is also an issue that should be considered in future work. Let us consider, for instance, the following misclassified tweets:

Rumour: Hillary's Illness
Misclassified tweets:
(fg1) @mitchellvii True, after all she can open a pickle jar.
(fg2) @mitchellvii Also, except for having a 24/7 MD by her side giving her Valium injections, Hillary is in good health! https://t.co/GieNxwTXX7
(fg3) @mitchellvii @JoanieChesnutt At the very peak yes, almost time to go down a cliff and into the earth.
Misclassification type: support (gold), comment (prediction)
Source tweet:
(s4) Except for the coughing, fainting, apparent seizures and "short-circuits," Hillary is in the peak of health.

All the misclassified tweets (fg1-fg3) from the Hillary's Illness data are replies to a source tweet (s4) which features sarcasm. In such replies, the authors support the rumour by echoing the sarcastic tone of the source tweet. Such more sophisticated cases, where the supportive attitude is expressed implicitly, were challenging for our classifier, and they were quite systematically misclassified as simple comments.

5 Conclusion

In this paper we proposed a new classification model for rumour stance classification. We designed a set of features including structural, conversation-based, affective, and dialogue-act based features. Experiments on the SemEval-2017 Task 8, Subtask A dataset show that our system, based on a limited set of well-engineered features, outperforms the state-of-the-art systems for this task without relying on sophisticated deep learning approaches. Although it achieves a very good result, several research challenges related to this task are left open. Class imbalance was recognized as one of the main issues in this task: for instance, our system struggled to detect the deny class under the original dataset distribution, but it performed much better in that respect when we balanced the distribution across the classes.

A re-run of the RumourEval shared task has been proposed at SemEval 2019 (http://alt.qcri.org/semeval2019/), and it will be very interesting to participate in the new task with an evolution of the system described here.

Acknowledgements

Endang Wahyu Pamungkas, Valerio Basile and Viviana Patti were partially funded by Progetto di Ateneo/CSP 2016 (Immigrants, Hate and Prejudice in Social Media, S1618 L2 BOSC 01).

References

[ADB17] Ahmet Aker, Leon Derczynski, and Kalina Bontcheva. Simple open stance classification for rumour analysis. In Proc. of RANLP 2017, pages 31-39. INCOMA Ltd., 2017.
[BL99] Margaret M. Bradley and Peter J. Lang. Affective norms for English words (ANEW): Instruction manual and affective ratings. Technical Report C-1, The Center for Research in Psychophysiology, University of Florida, 1999.
[COR14] Erik Cambria, Daniel Olsher, and Dheeraj Rajagopal. SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis. In Proc. of AAAI 2014, 2014.
[DBL+17] Leon Derczynski, Kalina Bontcheva, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, and Arkaitz Zubiaga. SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours. In Proc. of SemEval-2017, pages 69-76. ACL, 2017.
[Ekm92] Paul Ekman. An argument for basic emotions. Cognition & Emotion, 6(3-4):169-200, 1992.
[Gra17] David Graf. Semeval-2017-t8, June 2017.
[KLA17] Elena Kochkina, Maria Liakata, and Isabelle Augenstein. Turing at SemEval-2017 Task 8: Sequential approach to rumour stance classification with branch-LSTM. In Proc. of SemEval-2017, pages 475-480. ACL, 2017.
[LFPR16] Mirko Lai, Delia Irazú Hernández Farías, Viviana Patti, and Paolo Rosso. Friends and enemies of Clinton and Trump: using context for detecting stance in political tweets. In Proc. of MICAI 2016, volume 10061 of LNCS, pages 155-168. Springer, 2016.
[MKS+16] Saif Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiao-Dan Zhu, and Colin Cherry. SemEval-2016 Task 6: Detecting stance in tweets. In Proc. of SemEval-2016, pages 31-41. ACL, 2016.
[MT13] Saif M. Mohammad and Peter D. Turney. Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3):436-465, 2013.
[NS13] Nicole Novielli and Carlo Strapparava. The role of affect analysis in dialogue act identification. IEEE Transactions on Affective Computing, 4(4):439-451, 2013.
[OST57] C. E. Osgood, G. J. Suci, and P. H. Tannenbaum. The Measurement of Meaning. University of Illinois Press, Urbana, 1957.
[PFB01] James W. Pennebaker, Martha E. Francis, and Roger J. Booth. Linguistic Inquiry and Word Count (LIWC): LIWC 2001. Mahwah: Lawrence Erlbaum Associates, 2001.
[PGH+13] Soujanya Poria, Alexander Gelbukh, Amir Hussain, Newton Howard, Dipankar Das, and Sivaji Bandyopadhyay. Enhanced SenticNet with affective labels for concept-based opinion mining. IEEE Intelligent Systems, 28(2):31-38, 2013.
[Plu01] Robert Plutchik. The nature of emotions. American Scientist, 89(4):344-350, 2001.
[PVV13] Rob Procter, Farida Vis, and Alex Voss. Reading the riots on Twitter: methodological innovation for the analysis of big data. International Journal of Social Research Methodology, 16(3):197-214, 2013.
[SSW+17] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1):22-36, 2017.
[VRA18] Soroush Vosoughi, Deb Roy, and Sinan Aral. The spread of true and false news online. Science, 359(6380):1146-1151, 2018.
[Whi09] Cynthia Whissell. Using the revised Dictionary of Affect in Language to quantify the emotional undertones of samples of natural language. Psychological Reports, 105(2):509-521, 2009.
[ZLP+15] Arkaitz Zubiaga, Maria Liakata, Rob Procter, Kalina Bontcheva, and Peter Tolmie. Towards detecting rumours in social media. In AAAI Workshop: AI for Cities, 2015.
[ZLP+16] Arkaitz Zubiaga, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, and Peter Tolmie. Analysing how people orient to and spread rumours in social media by looking at conversational threads. PLoS ONE, 11(3):e0150989, 2016.