=Paper=
{{Paper
|id=Vol-2696/paper_132
|storemode=property
|title=Analyzing User Profiles for Detection of Fake News Spreaders on Twitter
|pdfUrl=https://ceur-ws.org/Vol-2696/paper_132.pdf
|volume=Vol-2696
|authors=María S. Espinosa,Roberto Centeno,Álvaro Rodrigo
|dblpUrl=https://dblp.org/rec/conf/clef/EspinosaCR20
}}
==Analyzing User Profiles for Detection of Fake News Spreaders on Twitter==
Notebook for PAN at CLEF 2020

María S. Espinosa, Roberto Centeno, and Álvaro Rodrigo

Natural Language Processing and Information Retrieval Group, Universidad Nacional de Educación a Distancia, Spain. mespinosa@lsi.uned.es, rcenteno@lsi.uned.es, alvarory@lsi.uned.es

===Abstract===
The massive spread of digital information to which our society is subjected nowadays has led to a great amount of false or extremely biased information being shared and consumed by Internet users every day. Disinformation, including misleading and even false information, is a major issue for our current society. The impact of fake news on politics, the economy, and even public health has yet to be fully determined. Internet users face a large amount of false information in digital media, such as rumours, fake news, and extremely biased news. Given the crucial role that the spread of fake news plays in our current society, it is becoming essential to design tools that automatically verify the veracity of online information. To address this issue, the PAN@CLEF 2020 competition proposed a task focused on the detection of fake news spreaders on Twitter. In this paper, we offer a detailed description of the system developed for this competition. Our system relies on psychological features for modelling the behaviour of users.

===1 Introduction===
The rise of social media in the past years has changed the way people consume information, especially news. The amount of time spent online, immediacy, and low to nonexistent cost are decisive factors in this global change in news consumption.

According to a study conducted by the Pew Research Center in the United States in 2016, the percentage of American adults getting their news through social media increased from 49% in 2012 to 62% in 2016 (footnote 1). The same study in 2018 reported a value of 68%, confirming that this number is still increasing (footnote 2).
The shift in how people consume news during the past few years is undeniable. Nowadays, people are more likely than ever to use social networks for news instead of more traditional sources such as printed newspapers and television [23].

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020, Thessaloniki, Greece.

Footnote 1: https://www.journalism.org/2016/05/26/news-use-across-social-media-platforms-2016/

Footnote 2: https://www.journalism.org/2018/09/10/news-use-across-social-media-platforms-2018/

Our society is subjected to massive exposure to information nowadays, and this has led to a great amount of false or extremely biased information being shared and consumed by Internet users every day. In 2013, the need to combat fake news was already pointed out by the World Economic Forum’s Global Risks Report, which warned that “digital wildfires” could spread false information rapidly (footnote 3).

Fake news has become a global issue, especially since recent social and political events such as the 2016 U.S. Presidential Election. Online political discussion was strongly influenced by social media users and bots spreading misinformation, which potentially altered public opinion and endangered the integrity of the elections. Furthermore, a 2019 study of a dataset of 171 million tweets posted in the five months preceding election day found that, of the 30 million tweets containing a link to a news outlet, 25% spread either fake or extremely biased news [3].

The impact of fake news on the global economy, on public health, and even in creating panic in society has been extensively documented in the past few years with countless examples, such as [15], [8], [22] and [17].
These examples illustrate the high cost associated with the spread of fake news: the absence of control and verification of information makes social media a fertile ground for the spread of unverified or false information.

With this in mind, we can affirm that the magnitude, diversity and substantial dangers of fake news and, more generally, of the disinformation circulating on social media are becoming a cause for concern due to the potential social cost they may have in the near future [2]. As a consequence, the research community has launched several evaluation competitions to foster the development of systems able to detect false information. The Author Profiling task on Profiling Fake News Spreaders on Twitter at the PAN@CLEF 2020 competition is a good example of one such competition [14].

In this paper, we describe the proposal sent to the competition, analysing the main contributions and the errors detected. Our proposal focuses on the content created by a user rather than on content created by other users, such as retweets. We then use a combination of psychological and linguistic features aimed at modelling the behaviour of a user to detect whether they are a spreader or a non-spreader.

===2 Related Work===
Over the past few years, several definitions have been given for the term fake news. One of the most frequent definitions overlaps with those of misinformation and disinformation: fake news is fabricated information that mimics news media content and is intentionally created to deceive, mislead or misinform readers [9]. The intention behind the creation and dissemination of fake news often has a political or economic component.

Given the crucial role that the spread of fake news plays in our current society, research on this topic is developing significantly.
In fact, the number of published papers on fake news indexed in the Scopus database has increased considerably, from fewer than 20 in 2006 to more than 200 in 2018 [25]. These works concentrate on understanding how false information spreads through social media and how it can be efficiently detected in order to reduce its negative impact on society.

Footnote 3: http://www3.weforum.org/docs/WEF_GlobalRisks_Report_2013.pdf

This task has been approached from different perspectives, such as Natural Language Processing (NLP), Data Mining (DM), and Social Media Analysis (SMA). Recent research proposes an approach which combines text generation and fact-checking in order to mitigate the effects of fake news spreading [24].

In many cases the task is treated as a binary classification problem where a news piece is classified as fake or real. However, there are cases in which this classification may not be adequate, since the news could be partially true and partially false. For this reason, systems capable of multi-class classification have also been proposed [16].

In the field of Natural Language Processing, research has focused on the detection of and intervention against fake news using techniques such as Machine Learning and Deep Learning [18], taking into account:

– Content-based features, which contain information that can be extracted from the text, such as linguistic features.
– Context-based features, which contain surrounding information such as user characteristics, social network propagation features, or users’ reactions to the information.

Detecting fake news in the context of social media presents characteristics and challenges that make content-based methods ineffective on their own. Fake news is intentionally created to deceive, making it difficult to identify from its textual content alone.
For this reason, it is common to use surrounding information, such as the way in which the news is disseminated and the behaviour of the users involved in this dissemination, as well as information related to the author of the news [19]. Recent research has shown that the correlation between user profiles and the spread of fake news can be used to identify the users who are more likely to believe fake news and to distinguish them from those more likely to believe real news [20].

Approaches that combine context-based and content-based features have been gaining popularity in the past years due to the promising results obtained in recent studies, such as [21] and [4]. Three main aspects of this type of information can be studied:

– User information, such as location, age, number of followers, etc.
– The responses generated by fake news, which can be an important source for detection, not only because users use responses to express their opinions but also because they can help in the construction of a credibility index for users [7].
– The social networks through which the news is disseminated. The study of these networks is especially relevant because their rapid diffusion is used to reach the maximum number of users in the shortest possible time.

===3 Dataset Description and Preprocessing===
The dataset provided by the organizers of the task was divided into two collections of tweets: one in Spanish and one in English. Each collection contained 300 XML documents, each containing 100 tweets written or shared by one user. There was, therefore, information regarding 300 users in each collection. In addition, each directory contained a truth file with the user identifier and a 0 or a 1 determining the class label (footnote 4).

The user identification numbers were not the real Twitter IDs, since these were obfuscated for privacy reasons.
Their purpose was to make it possible to identify the users in the truth file as well as in the user XML documents.

In the same way, mentions, hashtags, URLs, and usernames in the tweets were also obfuscated: each occurrence was replaced by a keyword in capital letters between dollar symbols. Therefore, a tweet containing the following text:

“RT The new president of the USA is @bartsimpson! #usa https://thenews.com/new-president-bart-simpson”

would have the following content in the provided dataset:

“RT The new president of the USA is $USER$ ! $HASHTAG$ $URL$”

We participated in this task only for the English language; therefore, we only used the data in the English directory in our model.

With regard to the preprocessing applied to the data, we took three important steps before the feature extraction:

1. Tokenization. In this step, the words making up the tweets were separated into tokens using the RegexpTokenizer from the Natural Language Toolkit (NLTK) in Python [10], which splits a string into substrings using a regular expression that matches the tokens. In this case, we selected words of 3 or more alphabetic characters.
2. Stop words removal. After the words were tokenized, the stop words were removed from the text. For this task, we used the stop word set from the NLTK corpus combined with a small list of custom words added manually to the set. These were commonly used words, such as prepositions, coordinating conjunctions, and determiners, that were not included in the NLTK stop word set.
3. Tweet aggregation. After the two previous steps were executed, the resulting tokens were aggregated into a single document per user. The reason for this aggregation was to have a single piece of text per user before the processing phase.
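The three preprocessing steps above can be illustrated with a minimal sketch. It is not the code used for the submission: it relies only on the Python standard library so that it runs without downloading NLTK data, the regular expression is one plausible reading of "words of 3 or more alphabetic characters", the stop word set is a tiny hypothetical stand-in for the NLTK set plus the custom list, and lowercasing is an added assumption. The function names are illustrative.

```python
import re

# Assumption: a tiny illustrative stop word set. The paper used the NLTK
# English stop words plus a custom list that is not reproduced here.
STOP_WORDS = {"the", "and", "for", "with", "that", "this", "are", "was"}

# Assumption: "words of 3 or more alphabetic characters" read as this regex.
TOKEN_PATTERN = re.compile(r"[A-Za-z]{3,}")


def tokenize(tweet):
    """Step 1: keep runs of 3+ alphabetic characters (lowercased by assumption)."""
    return [token.lower() for token in TOKEN_PATTERN.findall(tweet)]


def remove_stop_words(tokens):
    """Step 2: drop every token that appears in the stop word set."""
    return [token for token in tokens if token not in STOP_WORDS]


def aggregate_user(tweets):
    """Step 3: merge the cleaned tokens of all tweets into one document."""
    cleaned = []
    for tweet in tweets:
        cleaned.extend(remove_stop_words(tokenize(tweet)))
    return " ".join(cleaned)


user_tweets = [
    "RT The new president of the USA is $USER$ ! $HASHTAG$ $URL$",
    "This is not what we voted for",
]
document = aggregate_user(user_tweets)
print(document)
```

With NLTK installed, the same pattern could be passed to `RegexpTokenizer` instead of `re.compile`. Note that in this sketch the obfuscation keywords survive as ordinary tokens ("user", "hashtag", "url"); whether the submitted system filtered them out is not stated in the paper.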
It is important to note that, as we will see in the following sections, some of this preprocessing had to be deferred to the processing phase, because some of the features of our model take into account metrics such as the number of stop words, the number of determiners, or the number of coordinating conjunctions.

Footnote 4: The information regarding which class label corresponds to fake news spreaders was not available due to GDPR reasons.

===4 Feature Engineering===
The main contents that users share in social media can be divided into (1) content created by the user and (2) content created by others. Our model for the task of profiling fake news spreaders on Twitter is based on the following hypothesis: establishing the difference between user-created and user-shared content will reveal more accurate features of the user’s online behaviour. That is, analysing these two groups of content separately will reveal more precise features of the user profiles.

For this reason, the process of feature extraction was applied differently to the content originally written by the user (i.e., their tweets) and to the content shared by the user but originally written by other users (i.e., retweets). In this section, we describe all the feature engineering applied to the data, detailing how the distinction between tweets and retweets was made in each case. The complete set of features extracted from the data is shown in Table 1.

The set of features used in the model can be divided into the following four main categories:

– Psychological features, which help define the user profile in order to differentiate between fake news spreaders and real news spreaders.
– Linguistic features, which help identify the linguistic traits that characterize each category.
– Twitter actions features. Exploring how users behave in the social network could offer some insights into the online behaviour of fake news spreaders.
– Headline analysis data, which can help us identify the news pieces shared by a user that are actually fake.

====4.1 Psychological Features====
The psychological features were extracted using a third-party API developed by Symanto (footnote 5). The documents containing the aggregated tweets for each user were sent to the API in order to retrieve (1) their personality traits, (2) their communication styles, and (3) the sentiment analysis of their text.

The personality trait value is either “emotional” or “rational”, depending on the analysis of the user’s text. The value returned by the API for the communication styles is a collection of traits: self-revealing, which means sharing one’s own experience and opinion; fact-oriented, which implies focusing on factual information, objective observations or statements; information-seeking, that is, posing questions; and action-seeking, that is, aiming to trigger someone’s action by giving recommendations, requests or advice. Finally, the sentiment analysis returns either “positive” or “negative”, depending on the sentiment found in the user’s text.

All the values returned by Symanto’s API included a percentage for the predicted value and had to be converted to a binary representation (0, 1).
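One possible reading of this binary conversion is sketched below. The paper does not describe the response format of the API, so the `(label, percentage)` pairs, the helper names, and the choice of which label maps to 1 are all hypothetical; only the feature names and their value sets are taken from Table 1. Under this reading the percentage is simply discarded.

```python
# Assumption: which label maps to 1 is an arbitrary choice made for this sketch.
POSITIVE_LABELS = {
    "pers_pred": "emotional",
    "self_pred": "yes",
    "info_pred": "yes",
    "action_pred": "yes",
    "fact_pred": "yes",
    "sent_pred": "positive",
}


def to_binary(predicted_label, positive_label):
    """Map a categorical prediction to 1 (positive_label) or 0 (anything else)."""
    return 1 if predicted_label == positive_label else 0


def encode_user(predictions):
    """Turn hypothetical (label, percentage) pairs into 0/1 features."""
    return {
        name: to_binary(predictions[name][0], positive)
        for name, positive in POSITIVE_LABELS.items()
    }


# Hypothetical predictions for one user; the numbers are made up.
example_predictions = {
    "pers_pred": ("rational", 0.81),
    "self_pred": ("yes", 0.64),
    "info_pred": ("no", 0.92),
    "action_pred": ("no", 0.77),
    "fact_pred": ("yes", 0.58),
    "sent_pred": ("negative", 0.70),
}
encoded = encode_user(example_predictions)
print(encoded)
```

An alternative reading would threshold the percentage itself; the text does not say which variant was used.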
Footnote 5: https://symanto-research.github.io/symanto-docs/

{| class="wikitable"
|+ Table 1: Complete set of features extracted from the data
! Category !! Feature name !! Description !! Set of values
|-
| rowspan="6" | psychological features || pers_pred || personality prediction value || {emotional, rational}
|-
| self_pred || self-revealing prediction value || {yes, no}
|-
| info_pred || information-seeking prediction value || {yes, no}
|-
| action_pred || action-seeking prediction value || {yes, no}
|-
| fact_pred || fact-oriented prediction value || {yes, no}
|-
| sent_pred || sentiment analysis prediction value || {positive, negative}
|-
| rowspan="19" | linguistic features || num_ADJ || number of adjectives || N
|-
| num_DET || number of determiners || N
|-
| num_ADP || number of adpositions || N
|-
| num_NOUN || number of nouns || N
|-
| num_VERB || number of verbs || N
|-
| num_PROPN || number of proper nouns || N
|-
| num_NUM || number of numerals || N
|-
| num_AUX || number of auxiliary verbs || N
|-
| num_ADV || number of adverbs || N
|-
| num_CONJ || number of coordinating conjunctions || N
|-
| num_INTJ || number of interjections || N
|-
| num_PART || number of particles || N
|-
| I-ORG || number of named entities (organizations) || N
|-
| I-LOC || number of named entities (locations) || N
|-
| I-PER || number of named entities (persons) || N
|-
| num_POS || number of positive words || N
|-
| num_NEG || number of negative words || N
|-
| num_NEUT || number of neutral words || N
|-
| num_words || total number of words || N
|-
| rowspan="4" | Twitter actions || num_htag || number of hashtags || N
|-
| num_rt || number of retweets || N
|-
| num_url || number of URLs || N
|-
| num_user || number of mentions || N
|-
| rowspan="3" | headline analysis data || all_caps || number of words all in caps || N
|-
| per_stop || percentage of stopwords || N
|-
| num_propn || number of proper nouns || N
|}

====4.2 Linguistic Features====
For the extraction of linguistic features, a natural language pipeline called Polyglot was used [1]. This library is built using distributed word representations (word embeddings) in conjunction with traditional NLP features for over 100 different languages in order to solve NLP tasks such as Part-of-Speech (POS) tagging, Named Entity Recognition (NER), and sentiment analysis. For our model’s set of features, we chose 12 POS tagging metrics, 3 named entity recognition metrics, and the total word count.
Details regarding the specific metrics can be found in Table 1.

====4.3 Twitter Actions Features====
The analysis of the users’ activity on Twitter was restricted by the data obfuscation described in Section 3. Therefore, only 4 metrics were recorded from the actions of the user within Twitter: the number of mentions, the number of URLs, the number of retweets and the number of hashtags. These metrics were counted over the complete aggregation of each user’s tweets.

====4.4 Headline Analysis Data====
In this category of the model, we study whether specific message characteristics accompany fake news articles being produced and widely shared. Recent studies suggest not only that these characteristics exist, but also that some of them can be found in the headline of the news article [6]. Therefore, we took the 3 most significant characteristics differentiating fake news headlines from real news headlines and applied them to the text in our dataset. Based on the assumption that our main hypothesis is true, we separated tweets from retweets and applied these measurements only to the retweet subset.

===5 Experiments and Results===
====5.1 Experiments====
For the creation of our model, we first ran some experiments in order to select the most important features as well as the best performing algorithms (footnote 6). We tested the model with the set of features available in each category separately, and we also tested the possible combinations of the features to evaluate their performance on the data.

With regard to the classification models, we used an open source machine learning library, scikit-learn [12]. We performed a comparative analysis in which we tested the model with some of the most popular classification algorithms: Logistic Regression, K-Neighbors, Random Forest, Decision Tree, and Support Vector Machines.
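Before any classifier can be tested, each user has to be turned into a feature vector. As one illustration, the Twitter action counts of Section 4.3 and the tweet/retweet separation used in Section 4.4 could be computed from the obfuscated text roughly as follows. This is a sketch under assumptions: the paper does not state how retweets were recognized, so the leading "RT" token (taken from the example tweet in Section 3) is a guess, and the function names are hypothetical. The dictionary keys follow the feature names in Table 1.

```python
def is_retweet(tweet):
    """Assumption: a retweet starts with the token "RT", as in the example tweet."""
    return tweet.split(maxsplit=1)[:1] == ["RT"]


def split_by_origin(tweets):
    """Separate user-written tweets from retweets."""
    originals = [tweet for tweet in tweets if not is_retweet(tweet)]
    retweets = [tweet for tweet in tweets if is_retweet(tweet)]
    return originals, retweets


def twitter_action_counts(tweets):
    """Count the obfuscation keywords and retweets over all tweets of one user."""
    return {
        "num_user": sum(tweet.count("$USER$") for tweet in tweets),
        "num_url": sum(tweet.count("$URL$") for tweet in tweets),
        "num_htag": sum(tweet.count("$HASHTAG$") for tweet in tweets),
        "num_rt": sum(1 for tweet in tweets if is_retweet(tweet)),
    }


sample_tweets = [
    "RT The new president of the USA is $USER$ ! $HASHTAG$ $URL$",
    "Look at this $URL$ $URL$",
    "Nothing to see here",
]
counts = twitter_action_counts(sample_tweets)
originals, retweets = split_by_origin(sample_tweets)
print(counts, len(originals), len(retweets))
```

Counting on the raw tweets, before tokenization, keeps the dollar-delimited keywords intact, which is one reason part of the preprocessing has to wait until the features are extracted.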
Footnote 6: All the experiments and results can be found in a notebook uploaded to https://github.com/mariaesp/PANCLEF_PAPER/blob/master/notebooks/Spreaders.ipynb.

{| class="wikitable"
|+ Table 2: Performance results with different classifiers for the evaluation of the feature categories described. Cat1: psychological features, Cat2: linguistic features, Cat3: Twitter actions, Cat4: headline analysis data.
! Features !! Measures !! LogisticRegression !! KNeighbors !! RandomForest !! DecisionTree !! SVM
|-
| rowspan="3" | Cat1 || accuracy || 0.57 || 0.55 || 0.62 || 0.62 || 0.6
|-
| precision || 0.58 || 0.56 || 0.63 || 0.63 || 0.59
|-
| recall || 0.56 || 0.55 || 0.59 || 0.59 || 0.61
|-
| rowspan="3" | Cat2 || accuracy || 0.65 || 0.55 || 0.66 || 0.58 || 0.62
|-
| precision || 0.65 || 0.56 || 0.68 || 0.59 || 0.60
|-
| recall || 0.72 || 0.54 || 0.59 || 0.57 || 0.77
|-
| rowspan="3" | Cat3 || accuracy || 0.56 || 0.53 || 0.59 || 0.55 || 0.53
|-
| precision || 0.55 || 0.53 || 0.58 || 0.55 || 0.53
|-
| recall || 0.7 || 0.55 || 0.57 || 0.55 || 0.65
|-
| rowspan="3" | Cat4 || accuracy || 0.61 || 0.55 || 0.59 || 0.57 || 0.56
|-
| precision || 0.58 || 0.48 || 0.56 || 0.55 || 0.54
|-
| recall || 0.79 || 0.66 || 0.82 || 0.85 || 0.88
|}

In order to use all the available data for the tests, we used cross-validation with 5 iterations. The results obtained for each classifier can be seen in Table 2. Results are given in terms of accuracy, the official measure, as well as precision and recall.

{| class="wikitable"
|+ Table 3: Performance results with different classifiers for the evaluation of the complete model.
! Features !! Measures !! LogisticRegression !! KNeighbors !! RandomForest !! DecisionTree !! SVM
|-
| rowspan="3" | All || accuracy || 0.61 || 0.6 || 0.68 || 0.61 || 0.62
|-
| precision || 0.60 || 0.62 || 0.7 || 0.61 || 0.61
|-
| recall || 0.65 || 0.59 || 0.64 || 0.61 || 0.77
|}

After comparing the results obtained with the different categories and classifiers, we trained the model using a combination of all categories. With regard to the classification algorithm, the Random Forest Classifier outperformed the others in 3 of the 4 categories, and also in an additional test in which all the features of the four categories were combined. The results of this experiment can be found in Table 3. Therefore, the Random Forest Classifier was chosen as the algorithm to train our model.
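To make the reported measures concrete, the sketch below shows how accuracy, precision and recall can be averaged over 5 held-out folds. It is a standard-library illustration only: the experiments themselves were run with scikit-learn, the 5 iterations are assumed here to be 5 folds, the folds are contiguous rather than shuffled or stratified, and the majority-class baseline and the toy labels are hypothetical stand-ins for the real classifiers and data.

```python
def scores(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels, with 1 as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(features, labels, fit_predict, k=5):
    """Average accuracy, precision and recall over k held-out folds."""
    totals = [0.0, 0.0, 0.0]
    for fold in k_fold_indices(len(labels), k):
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        predictions = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in fold],
        )
        fold_scores = scores([labels[i] for i in fold], predictions)
        totals = [total + score for total, score in zip(totals, fold_scores)]
    return tuple(total / k for total in totals)


def majority_baseline(train_x, train_y, test_x):
    """Hypothetical stand-in classifier: always predict the majority class."""
    majority = 1 if sum(train_y) * 2 >= len(train_y) else 0
    return [majority] * len(test_x)


# Toy data: 300 users with alternating labels and a dummy feature.
toy_labels = [0, 1] * 150
toy_features = [[0.0]] * 300
result = cross_validate(toy_features, toy_labels, majority_baseline)
print(result)  # (0.5, 0.5, 1.0)
```

The baseline result also shows why accuracy alone can mislead: predicting the positive class for every user yields perfect recall while accuracy and precision stay at chance level, a pattern worth keeping in mind when reading the high recall values of the Cat4 rows in Table 2.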
====5.2 Final Model Definition and Results====
Once the experiments had allowed us to choose the classifier and the set of features for our model, we trained the model with the data provided for the task and exported the trained model in order to make the submission and evaluation in the TIRA (footnote 7) environment [13].

Figure 1: Performance result plot with different classifiers for the evaluation of the final model

There were two evaluations of our model: an early-bird submission evaluation and the final submission evaluation. We participated in both, first with an early model and then with a final model. The evaluation results can be found in Table 4.

{| class="wikitable"
|+ Table 4: Early and final model evaluation results throughout the different evaluation phases.
! Data !! Model !! Phase !! Accuracy
|-
| rowspan="2" | development || early || experimentation || 0.67
|-
| final || experimentation || 0.68
|-
| rowspan="2" | test || early || early-bird submission || 0.67
|-
| final || final submission || 0.64
|}

With regard to the general results of the competition, our team was ranked 61st out of 66 in the classification. This classification considers results in both Spanish and English, calculating the average of both accuracies. However, since our team only participated in the English part of the task, we have taken into account only the results for English in order to determine our position. In this case, our team was ranked 45th out of 66 participants. Furthermore, if we aggregate the results, that is, if we count all the participants with the same result as just one participant, our result would be 16th out of 33.

Footnote 7: https://www.tira.io/

It is important to note that our model evolved from the early-bird submission to the final submission. Three main changes were made to the model:

– The psychological features could not be included in the first evaluation due to technical issues with the platform.
The organizers helped us to solve those issues, and we were able to add the psychological features in the final submission.
– The separation of the user’s original tweets from the user’s retweets was done after the early-bird submission was completed. As we explained in detail in Section 4, this decision was based on our main hypothesis. Therefore, the way in which several features of our model were calculated also changed for the final submission.
– The fourth category of features, namely the headline analysis data, was added to the model for the final submission.

As can be observed in the evaluation results, our model performed slightly better in the early-bird submission than in the final submission. The reasons for this performance drop are unknown to the authors, since the final model did perform better in our experimentation. One possible explanation is that the evaluation dataset has slightly different characteristics from the training dataset and, therefore, the results vary accordingly. Nevertheless, the difference in the results is too small to determine its causes with certainty.

===6 Conclusions and Future Work===
In this paper, we have described our proposal for the Author Profiling task on Profiling Fake News Spreaders on Twitter at the PAN@CLEF 2020 competition. Our model aimed to differentiate fake news spreaders from real news spreaders using a combination of psychological and linguistic traits extracted from the user’s data, together with characteristics extracted from both the user’s behaviour in the social network and the analysis of news headlines.

On the one hand, one of the next experiments will be to use deep learning in the training phase of the model. The advances made in the development of Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) in the past years have shown promising results in the field of natural language processing.
On the other hand, more work needs to be done with regard to the psychological and psycholinguistic dimension of the model. There are several psychological models that we want to explore in the following months, such as the Big5 personality model [5] and the Myers–Briggs Type Indicator (MBTI) model [11]. Due to the lack of existing tools for the automatic labelling of these indicators, we will work on the retrieval and labelling of a larger dataset in order to learn to automatically predict such personality traits.

As can be seen in the results presented in the previous section, our results in development and test are consistent, which means that, despite the work that needs to be done in order to improve it, it is a robust model with predictable performance when datasets vary.

This work is at a very early stage of development and will continue evolving towards a more efficient and better performing system. For this reason, the results shown here are preliminary, and there is still room for improvement.

===Acknowledgements===
This research project has been supported by the European Social Fund through the Youth Employment Initiative (YEI 2019) and the Spanish Ministry of Science, Innovation and Universities (DeepReading RTI2018-096846-B-C21, MCIU/AEI/FEDER, UE).

===References===
1. Al-Rfou, R., Perozzi, B., Skiena, S.: Polyglot: Distributed word representations for multilingual NLP. In: Proceedings of the Seventeenth Conference on Computational Natural Language Learning. pp. 183–192. Association for Computational Linguistics, Sofia, Bulgaria (August 2013), http://www.aclweb.org/anthology/W13-3520

2. Allcott, H., Gentzkow, M.: Social media and fake news in the 2016 election. Journal of Economic Perspectives 31(2), 211–36 (2017)

3. Bovet, A., Makse, H.A.: Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications 10(1), 1–14 (2019)

4. Crestani, F., Rosso, P.: The role of personality and linguistic patterns in discriminating between fake news spreaders and fact checkers. In: Natural Language Processing and Information Systems: 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020, Saarbrücken, Germany, June 24–26, 2020, Proceedings. p. 181. Springer Nature

5. Goldberg, L.R.: An alternative "description of personality": the big-five factor structure. Journal of Personality and Social Psychology 59(6), 1216 (1990)

6. Horne, B.D., Adali, S.: This just in: fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In: Eleventh International AAAI Conference on Web and Social Media (2017)

7. Jin, Z., Cao, J., Zhang, Y., Luo, J.: News verification by exploiting conflicting social viewpoints in microblogs. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)

8. Kang, C., Goldman, A.: In Washington pizzeria attack, fake news brought real guns. The New York Times (2016), https://www.nytimes.com/2016/12/05/business/media/comet-ping-pong-pizza-shooting-fake-news-consequences.html

9. Lazer, D.M., Baum, M.A., Benkler, Y., Berinsky, A.J., Greenhill, K.M., Menczer, F., Metzger, M.J., Nyhan, B., Pennycook, G., Rothschild, D., et al.: The science of fake news. Science 359(6380), 1094–1096 (2018)

10. Loper, E., Bird, S.: NLTK: The natural language toolkit. In: Proceedings of the ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics. Philadelphia: Association for Computational Linguistics (2002)

11. Myers, I.B.: The Myers-Briggs Type Indicator: Manual (1962)

12. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)

13. Potthast, M., Gollub, T., Wiegmann, M., Stein, B.: TIRA Integrated Research Architecture. In: Ferro, N., Peters, C. (eds.) Information Retrieval Evaluation in a Changing World. Springer (Sep 2019)

14. Rangel, F., Giachanou, A., Ghanem, B., Rosso, P.: Overview of the 8th Author Profiling Task at PAN 2020: Profiling Fake News Spreaders on Twitter. In: Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (eds.) CLEF 2020 Labs and Workshops, Notebook Papers. CEUR-WS.org (Sep 2020)

15. Rapoza, K.: Can ‘fake news’ impact the stock market? Forbes (2017)

16. Rashkin, H., Choi, E., Jang, J.Y., Volkova, S., Choi, Y.: Truth of varying shades: Analyzing language in fake news and political fact-checking. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pp. 2931–2937 (2017)

17. Rich, M.: As coronavirus spreads, so does anti-Chinese sentiment. The New York Times (2020), https://www.nytimes.com/2020/01/30/world/asia/coronavirus-chinese-racism.html

18. Ruchansky, N., Seo, S., Liu, Y.: CSI: A hybrid deep model for fake news detection. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. pp. 797–806 (2017)

19. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter 19(1), 22–36 (2017)

20. Shu, K., Wang, S., Liu, H.: Understanding user profiles on social media for fake news detection. In: 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). pp. 430–435. IEEE (2018)

21. Shu, K., Zhou, X., Wang, S., Zafarani, R., Liu, H.: The role of user profiles for fake news detection. In: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. pp. 436–439 (2019)

22. Takahashi, R.: Amid virus outbreak, Japan stores scramble to meet demand for face masks. Japan Times. Accessed on 1 (2020)

23. Tolmie, P., Procter, R., Randall, D.W., Rouncefield, M., Burger, C., Wong Sak Hoi, G., Zubiaga, A., Liakata, M.: Supporting the use of user generated content in journalistic practice. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. pp. 3632–3644 (2017)

24. Vo, N., Lee, K.: Learning from fact-checkers: Analysis and generation of fact-checking language. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 335–344 (2019)

25. Zhou, X., Zafarani, R.: Fake news: A survey of research, detection methods, and opportunities. arXiv preprint arXiv:1812.00315 (2018)