=Paper=
{{Paper
|id=Vol-2846/paper27
|storemode=property
|title=SAMS: Human-in-the-loop Approach to Combat the Sharing of Digital Misinformation
|pdfUrl=https://ceur-ws.org/Vol-2846/paper27.pdf
|volume=Vol-2846
|authors=Shaban Shabani,Zarina Charlesworth,Maria Sokhn,Heiko Schuldt
|dblpUrl=https://dblp.org/rec/conf/aaaiss/ShabaniCSS21
}}
==SAMS: Human-in-the-loop Approach to Combat the Sharing of Digital Misinformation==
Shaban Shabani (a,b), Zarina Charlesworth (b), Maria Sokhn (b) and Heiko Schuldt (a)

(a) Department of Mathematics and Computer Science, University of Basel, Switzerland
(b) Institut de Digitalisation, University of Applied Sciences Western Switzerland (HES-SO), Neuchâtel, Switzerland

Abstract
The spread of online misinformation is a ubiquitous problem, especially in the context of social media. In addition to the impact on global health caused by the current COVID-19 pandemic, the spread of related misinformation poses an additional health threat. Detecting and controlling the spread of misinformation using algorithmic methods is a challenging task. Relying on human fact-checking experts is the most reliable approach; however, it does not scale with the volume and speed at which digital misinformation is produced and disseminated. In this paper, we present the SAMS Human-in-the-loop (SAMS-HITL) approach to detect and combat the spread of digital misinformation. SAMS-HITL leverages the fact-checking skills of humans by collecting feedback on news stories about the source, author, message, and spelling. The SAMS features are jointly integrated into a machine learning pipeline for detecting misinformation. First results indicate that the SAMS features have a marked impact on the classification, improving accuracy by up to 7.1%. The SAMS-HITL approach goes one step further than traditional human-in-the-loop models in that it helps raise awareness about digital misinformation by allowing users to become fact-checkers in their own right.

Keywords: Digital Misinformation, Machine Learning, Crowdsourcing, Human-in-the-Loop

In A. Martin, K. Hinkelmann, H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.), Proceedings of the AAAI 2021 Spring Symposium on Combining Machine Learning and Knowledge Engineering (AAAI-MAKE 2021) – Stanford University, Palo Alto, California, USA, March 22-24, 2021.
email: shaban.shabani@unibas.ch (S. Shabani); zarina.charlesworth@he-arc.ch (Z. Charlesworth); maria.sokhn@hes-so.ch (M. Sokhn); heiko.schuldt@unibas.ch (H. Schuldt)
orcid: 0000-0003-4710-6091 (S. Shabani); 0000-0002-2898-5716 (Z. Charlesworth); 0000-0001-7586-0564 (M. Sokhn); 0000-0001-9865-6371 (H. Schuldt)
© 2021 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Advances in mobile technology have allowed for an unprecedented spread of information and of both mis- and disinformation. The ease of transmission and sharing and the use of social media and messaging apps, coupled with the increasing penetration of the Internet, provide a fertile ground for its spread [1]. As pointed out by Ciampaglia [2], the risk is the massive, uncontrolled, and often systemic spread of untrustworthy content. Digital misinformation comes in a variety of forms, from entirely false stories to the integration of one or two misleading sentences into a piece of real news, or just a provocative, misleading title introducing a correct piece of news. In addition, one also finds rumors, hoaxes, satire, and conspiracy theories contributing to what can be characterized as an online false information ecosystem [3]. One can group such information under the umbrella term of misinformation. More recently, one also sees a distinction between misinformation, which can be spread with or without intent to mislead, and disinformation, which intends to spread false information [1].

The problem with misinformation is that it is pervasive and runs through all types of media, from print to radio to online. The latter grew considerably during the 2016 American presidential election and, with the onset of the COVID-19 pandemic in March 2020, has taken on alarming proportions. In the words of T. A. Ghebreyesus, director-general of the WHO (https://www.who.int/dg/speeches/detail/munich-security-conference), speaking of COVID-19: “We are not just fighting an epidemic; we are fighting an infodemic”. Digital misinformation spreads faster and more easily than the virus, and is just as dangerous. Unlike the virus, however, COVID-19 related news comes in two strains, true and false, with the latter inundating social media channels and going largely unverified. The expression infodemic was first used in 2003 by Rothkopf [4] when writing about the SARS epidemic, highlighting the negative impact that misinformation had on controlling that health crisis – a crisis far from the size of what we are now experiencing with COVID-19. In today’s COVID-19 influenced world, we are indeed dealing with an infodemic, and the question is how best to control it and fight the spread of misinformation. To counter it, and in light of the high level of user distrust towards online fact-checking services, there is an urgent need to train individuals to evaluate the veracity of the information they are receiving and sharing, and to give them the tools to become fact-checkers in their own right.

In recent years, fact-checking services have become the norm for journalists and are now also easily accessible to the public. Although the majority address issues in the political arena, SNOPES (https://www.snopes.com/fact-check/), a well-known service, started out primarily by debunking urban legends. Such services certainly have a role to play in response to the challenge of online misinformation [5, 6]; however, there is increasing interest in coming up with an automated and scalable response [7]. Despite the availability of fact-checking services, research suggests that there is a high level of distrust for such services [8, 9] – yet another argument in support of developing an application for individual users.

This research focuses specifically on digital misinformation. As this is technology-based, the solutions considered are also often only technology-related [7]. Some advocate that the response to the online spread of misinformation lies in technology [10] and the integration of Artificial Intelligence (AI); others lean more towards human fact-checkers. An alternative is a combination of the two, allowing for the development of a high-performing model used by individuals. Research in the area of mixed-initiative fact-checking [11, 12, 13] suggests that AI alone cannot be as accurate as when the human element is integrated into the fact-checking process. Human-in-the-loop AI (HAI) systems face particular challenges in terms of effective performance due to the fact that individuals are involved. It is, however, possible to train models to a significant level of accuracy. We suggest going one step further in the battle to slow the flow by taking an HAI approach as well as involving those who are at the source, getting them on board as fact-checkers in their own right.
To do this, we have developed a user-friendly tool that both identifies the veracity of the news and calls on the user to self-check four critical indicators on their own. Our proposed framework is called SAMS. Following a review of the published research and literature on credibility indicators [14, 15, 16] and fact-checking guides [9, 17], the choice of a limited number of checks seemed appropriate. To be best armed to counter the spread of misinformation, it is important to see who is spreading it. Spring [18] suggests grouping the spreaders into seven categories, ranging from the “Joker” to the “Politician” and including “well-intentioned family members”. Zannettou et al. [3] go one step further, even including bots, for a total of ten categories. Keeping in mind that we were looking for a limited number of indicators, and yet ones that could be applied to all categories, the ones that repeatedly came to the top of the list were: source, author, message, and spelling (SAMS). We went with these and launched into the development of a prototype.

In this paper, we propose and evaluate the SAMS-HITL method. It uses supervised machine learning models to classify news articles as false or true. It leverages the fact-checking skills of humans, who provide answers to SAMS-related questions about the news articles. The human feedback about the four SAMS indicators is combined with the features automatically extracted from the text of the news articles. We evaluate this approach on a recently published dataset with articles related to COVID-19. Preliminary results show that the SAMS human-in-the-loop (SAMS-HITL) approach outperforms methods that rely only on automated techniques. Feeding the model with information about the source, author, message, and spelling provides higher accuracy in the classification task. Our contributions can be summarized as follows:

• We conceptualized SAMS-HITL as a user-friendly option to check information, which calls on human input and is backed up by machine learning models.
• We designed a crowdsourcing task that leverages the cognitive skills of online crowd workers to provide answers to the SAMS questions, using an aggregation method to infer the true answer from multiple judgments.
• We implemented and evaluated the SAMS-HITL approach on a dataset of COVID-19 related news articles. The proposed technique performs better than automatic classification models. Preliminary results indicate that the SAMS features have a marked impact on the classification, improving accuracy by up to 7.1%.

The remainder of the paper is structured as follows: Section 2 introduces the concept behind SAMS and describes the individual components. Section 3 introduces the dataset and describes the implementation. Section 4 details the evaluation and results of our methods. Section 5 presents related work and Section 6 concludes with suggestions for further research.

2. Concept

In this section, we present the components of the SAMS-HITL approach. Figure 1 illustrates the overall architecture of SAMS.

2.1. Machine Learning Component

Considering the rapid growth of online data and the spread of misinformation, and the high impact it has on society, efficient and effective data processing tools are essential. Approaches based on machine learning and deep learning techniques [19] have been comprehensively considered for fake news detection.
The core component of SAMS is the supervised machine learning model which analyzes the news content. This model consists of two phases: i) feature extraction, and ii) model construction.

Feature extraction is performed on the text of the headline and the body text. The headline of a news item is a short text meant to catch the attention of the reader, whereas the body text is the main part that details the news story. We consider two types of features: statistical features and sentiment features based on linguistic characteristics. The statistical features are extracted using the Term-Frequency Inverse-Document-Frequency (tf-idf) algorithm, which measures the importance of words in a text document. Analyzing the sentiment of the news is very important, especially considering that much of the misinformation being spread started out as disinformation with the intent to deceive rather than to report objectively – sometimes for political or financial gain and, in the COVID-19 pandemic situation, to exploit public fear and uncertainty. Therefore, the sentiment features focus on capturing the objectivity, mood, modality, and polarity of the reported news. Additionally, the length of news stories is an important aspect, as misinformation generated on social network channels tends to be short and catchy.

[Figure 1: SAMS overall architecture – the feature extractor module initially performs a text cleaning step and generates the tf-idf features together with sentiment features. The labeled data includes news articles that are categorized as false or true. Retrieving SAMS features is done by querying the crowdsourcing module and collecting answers from multiple users. Judgments from users are aggregated and injected into the set of automatically extracted features (tf-idf and sentiment), and finally the combined feature set is used by the model for prediction.]

Model construction builds the machine learning model that performs the classification, in order to differentiate between false and true news. For the evaluations, we selected four different state-of-the-art algorithms: Logistic Regression (LOG), Random Forest (RF), Gradient Boosting Classifier (GBC), and Support Vector Machines (SVM).

2.2. SAMS – Source, Author, Message, Spelling

Classifying news articles by relying solely on machine learning models over the news content is a challenging task. One reason is that spreaders of misinformation have advanced their writing style and the language used in the news, with the aim of distorting the truth and bypassing detection by style-based models. Another very important factor is the length of the news: false stories, especially in the era of the COVID-19 pandemic, tend to be short and alarming, making it difficult to automatically analyze the message conveyed by such stories. Research in the area of mixed-initiative fact-checking [11, 12, 13] suggests that machine learning alone cannot be as accurate as when the human element is integrated into the fact-checking process. We have identified the important features when performing fact-checking to be: source, author, message, and spelling. Answering the questions about the four SAMS features automatically is not viable. In contrast, humans have the potential to do better, as they can apply fact-checking skills by searching for facts in trustworthy data sources. We define a process with tips and tricks to easily answer each of the SAMS questions.
Source – taking a critical look at the source, both data and metadata, is the first step. The goal is to understand whether the news story cites sources and whether those sources are reliable. To better evaluate whether the sources are trustworthy, we look at where the information originated, inspect whether references are stated, and, if so, trace the references to check whether they are correct and trustworthy.

Author – in principle, real and serious news articles always have an author. Therefore, the first step is to identify whether the news item has an author. If so, further inspection includes whether the author is a journalist, their affiliation, and their academic or professional credentials. Furthermore, a check for related publications by the same author can be made.

Message – the message should be clear, balanced, and unbiased. Guidelines for identifying misinformation suggest checking for unsupported or outrageous claims, a push to share the information, a lack of quotes, references, or contributing sources, and headlines that provoke strong emotions.

Spelling – reputable sources proofread material prior to publishing. Misinformation tends to contain repeated spelling mistakes, poor grammar, incorrect punctuation, the use of different fonts, entire words or phrases written in capital letters, etc.

2.3. Human-in-the-Loop Approach

Obtaining reliable information related to the four aspects of SAMS is crucial for our approach to misinformation detection. While experts such as journalists are well trained to search the right data sources to find facts, employing them becomes expensive. Considering the amount and velocity of potentially disseminated digital misinformation, this approach does not scale in terms of time. On the other hand, crowdsourcing has been widely deployed for small tasks, as it leverages the collective skills of extensive online crowd worker communities. In specific scenarios, crowdsourcing has been shown to be an alternative to experts with specific domain knowledge for labelling. In this work, we design a crowdsourcing component which aggregates the inputs from multiple users to infer the true answers to the SAMS questions. The output of the crowd answers is a vector of four binary values, each value corresponding to one of the SAMS questions. As can be seen in Figure 1, the output is encoded into a binary feature vector which is later appended to the feature vector generated using the tf-idf algorithm and the sentiment features described in Section 2.1. Finally, the concatenated feature vector is used for training and evaluating the machine learning models.
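To make this pipeline concrete, here is a minimal sketch in Python (scikit-learn/SciPy). The toy articles, the placeholder sentiment values, and all variable names are illustrative assumptions rather than the authors' actual implementation; the sketch only shows how majority-voted SAMS answers can be encoded as binary features and concatenated with the automatically extracted ones.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (headline + body text concatenated); 0 = false, 1 = true.
articles = ["MIRACLE cure stops the virus in one day!!!",
            "Health authorities publish updated vaccination guidelines."]
labels = [0, 1]

# Statistical features: tf-idf over the article text.
X_tfidf = TfidfVectorizer(stop_words="english").fit_transform(articles)

# Placeholder sentiment features (polarity, subjectivity, modality, word count);
# in the paper these come from pattern.en (see Section 3.2).
X_sent = np.array([[-0.8, 0.9, 0.2, 7],
                   [ 0.1, 0.3, 0.7, 6]])

# Crowd answers to the four SAMS questions (source, author, message, spelling),
# three workers per article; a majority vote yields one binary vector per article.
crowd = np.array([[[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]],
                  [[1, 1, 1, 1], [1, 1, 1, 1], [1, 0, 1, 1]]])
X_sams = (crowd.sum(axis=1) >= 2).astype(int)   # majority of three workers

# Concatenate all feature blocks and train one of the evaluated models.
X = hstack([X_tfidf, csr_matrix(X_sent), csr_matrix(X_sams)])
model = RandomForestClassifier(random_state=0).fit(X, labels)
```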
3. Dataset and Implementation

In this section, after presenting the dataset we used to evaluate our approach, we describe the implementation of the SAMS pipeline.

3.1. Dataset

In this experimental setup, we use the CoAID dataset collected and annotated by Cui and Lee [20]. The dataset consists of true and false news about COVID-19 from diverse sources, mainly covering websites and social network platforms. There are several types of entries, such as “news articles” collected from reliable fact-checking sources, “claims” posted by official WHO channels, “user engagement” which includes Twitter posts and replies, and other “social platform posts” from Facebook, Instagram, etc. For each entry, there is the title, abstract, content, keywords, and URL of the article or post.

Our interest is in analyzing news posts that contain potentially longer text and are posted on various social media channels (online newspapers, blogs, communication apps, etc.); we therefore focus only on news articles, skipping the Twitter posts with short text. We filtered the false and true news from the CoAID dataset, ending up with 1,127 true samples and 266 false samples. Since the two classes are imbalanced, we selected all 266 available false entries and 264 randomly sampled true news articles. Figure 2 illustrates the distribution of news length by word count and Table 1 gives descriptive statistics of article word length. The average length of true and false news articles is 35 and 33 words, respectively.

[Figure 2: Distribution of news length by word count.]

Table 1: Descriptive statistics of article word length

Class | Min | Max | Mean  | StDev | Med | Total
false |   7 |  85 | 33.27 | 16.05 |  31 | 8,850
true  |  11 |  89 | 35.17 | 12.32 |  34 | 9,285

3.2. Implementation

The first step towards implementing a model is data pre-processing. Since this is a text classification task, text cleaning is a useful step; it includes removing special characters and stop words (frequent words that provide no unique information to the model), and applying stemming and lemmatization. This step is important for extracting features using the tf-idf technique. Extracting sentiment features, on the other hand, is possible on the raw text and is done with the pattern.en tool [21]. The sentiment features include: i) polarity, a value between -1.0 (completely negative) and +1.0 (completely positive); ii) subjectivity, a value between 0.0 (very objective) and 1.0 (very subjective); iii) modality, which represents the degree of certainty as a value between -1.0 and +1.0, where values higher than +0.5 represent facts; and iv) mood, a categorical value based on auxiliary verbs that can be either “indicative”, “imperative”, “conditional”, or “subjunctive”. Additionally, we add the word count to the feature vector.
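The sentiment extraction described above can be sketched as follows, assuming the pattern package's pattern.en module [21]; the example text is our own, and the paper's exact feature code may differ.

```python
from pattern.en import Sentence, modality, mood, parse, sentiment

text = "The government announced that the new measures will take effect soon."

# Polarity in [-1, 1] and subjectivity in [0, 1] come directly from sentiment().
polarity, subjectivity = sentiment(text)

# Modality and mood require a parsed sentence.
s = Sentence(parse(text, lemmata=True))
features = {
    "polarity": polarity,
    "subjectivity": subjectivity,
    "modality": modality(s),  # degree of certainty in [-1, 1]; > +0.5 suggests fact
    "mood": mood(s),          # "indicative", "imperative", "conditional", or "subjunctive"
    "word_count": len(text.split()),
}
```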
The aim of this work is to evaluate the SAMS-HITL approach and the importance of the four indicators in the classification task. Doing so called for having answers to the four SAMS questions for every news record in the dataset. Initially, we manually labelled the 530 dataset entries: a trained annotator used an in-house developed web annotation interface to answer the SAMS questions for each entry. The annotation was a tedious task that took approximately 30 hours. After that, we designed a crowdsourcing job on the Microworkers crowdsourcing platform (https://www.microworkers.com). Each news story was used to generate a Human Intelligence Task (HIT), asking online crowd participants to provide answers to the SAMS questions, which were stated as follows:

1. Is there a source in this news article? – Yes/No.
2. Is there an author in this news article? – Yes/No.
3. Is the message of this news article clear, unbiased, and balanced? – Yes/No.
4. Is the spelling correct in this news article? – Yes/No.

A HIT contained the URL of the original news story, the headline, and the body text. Crowd workers were instructed to click on the news link and to inspect and analyze the article, always considering the four questions they were asked to answer. We used data quality control mechanisms [22] such as redundancy, where for each task we asked three workers from three different regions: USA, Europe, and Asia. Analyses of the demographics and dynamics of crowd workers on crowdsourcing platforms have shown that these regions are the most represented [23], and this is the case on the Microworkers platform as well. Additionally, collecting judgments from crowd workers from different regions can be an important factor, as diversity matters when it comes to label quality [24]. Furthermore, task design techniques are an important element of high-quality data; therefore, guidelines with tips and examples were part of the instructions in the crowdsourced job.

Obtaining multiple judgments for the same question from different user demographics can increase the quality of crowdsourced data. Depending on the task complexity, various aggregation techniques can be applied, such as voting strategies and profile-based or iterative aggregation algorithms [25]. Majority voting is the simplest, as it is non-iterative and requires no pre-processing, aggregating each object independently by choosing the label with the most votes. In our scenario, a single HIT with the SAMS questions receives three responses for each of the four questions, and the aggregation selects the answer with the most votes for each question. Since crowd workers can have different levels of expertise, profile-based strategies take into account information from their past contributions to build a ranking score, and can incorporate additional information such as location, domains of interest, etc. The reputation score of crowd workers can be updated dynamically based on their performance on the task at hand. Iterative algorithms are based on a sequence of computational rounds in which the probabilities of the possible labels for each object are computed and updated repeatedly until convergence. For the answer aggregation in our SAMS-HITL approach, we applied the Dawid and Skene algorithm [26], a model based on the Expectation-Maximization (EM) principle that models each worker’s reliability with a confusion matrix.
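As an illustration of the EM idea, the following is a compact sketch of a Dawid-Skene-style aggregator simplified to binary yes/no answers; the full algorithm [26] models a per-worker confusion matrix over all label categories, so this simplified variant is our own, not the authors' code.

```python
import numpy as np

def dawid_skene_binary(votes: np.ndarray, n_iter: int = 50) -> np.ndarray:
    """votes: (n_items, n_workers) array of binary answers {0, 1}.
    Returns the posterior probability that each item's true answer is 1."""
    q = votes.mean(axis=1)  # initialize posteriors with the vote proportions
    for _ in range(n_iter):
        # M-step: per-worker reliabilities with add-one smoothing.
        pi1 = (q @ votes + 1.0) / (q.sum() + 2.0)                     # P(says 1 | truth 1)
        pi0 = ((1 - q) @ (1 - votes) + 1.0) / ((1 - q).sum() + 2.0)   # P(says 0 | truth 0)
        prior1 = q.mean()
        # E-step: update item posteriors given the worker reliabilities.
        like1 = prior1 * np.prod(np.where(votes == 1, pi1, 1 - pi1), axis=1)
        like0 = (1 - prior1) * np.prod(np.where(votes == 0, pi0, 1 - pi0), axis=1)
        q = like1 / (like1 + like0)
    return q

# Example: three workers answering the "source" question for four articles.
answers = np.array([[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]])
source_feature = (dawid_skene_binary(answers) > 0.5).astype(int)
```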
4. Evaluation

In this section, we outline the evaluation of the proposed SAMS-HITL approach. We used 10-fold cross-validation to evaluate the performance of the models, with accuracy and F1 score as evaluation metrics. To analyze the models and the importance of the three different sets of features described in Sections 2.1 and 2.2, we ran the evaluation for each setting separately. We consider the following combinations of features:

(i) tf-idf features (TF)
(ii) tf-idf + sentiment features (TFS)
(iii) tf-idf + sentiment features + SAMS trained annotator features (TFST)
(iv) tf-idf + sentiment features + SAMS crowd features (TFSC)

Table 2 shows the F1 score of the models under the different feature setups.

Table 2: Classification results (F1 score)

Features | LOG  | SVM  | GBC  | RF
TF       | 76.4 | 77.4 | 75.2 | 77.9
TFS      | 82.2 | 80.6 | 87.6 | 91.5
TFST     | 87.8 | 82.8 | 93.1 | 93.0
TFSC     | 87.1 | 83.7 | 94.7 | 93.6

When considering only the tf-idf (TF) features extracted from the headline and body text, the Random Forest model achieves the highest accuracy of 77.9%. Appending the sentiment features (TFS) has a considerable impact on the performance of the models, lifting the accuracy to 91.5%. Finally, adding the SAMS features, whether obtained via crowdsourcing (TFSC) or from the trained annotator (TFST), has a positive impact on the models’ performance. The Gradient Boosting Classifier achieves the highest accuracy of 94.7% with the SAMS features obtained by aggregating the answers from the crowd. Interestingly, the TFSC approach performs slightly better than TFST for three out of the four models. It can also be observed that the sentiment features (TFS) have a distinct impact on the accuracy of the Random Forest model compared to the TF features, improving accuracy by 13.6%; the TFSC features increase the accuracy by another 2.1%. This directs us to further inspect the evaluation with more samples and to test the models’ performance with additional data sources. In such a scenario, we would expect both SAMS feature sets (TFST and TFSC) to make an even larger difference compared to the other settings.

Figure 3 shows the effect of the four combinations of features during the evaluation. We can see that the sentiment features (TFS) have a significant impact on performance compared to the tf-idf (TF) features. Furthermore, both combinations with SAMS features show an overall significant difference compared to the TFS and TF approaches.

[Figure 3: Performance impact of different sets of features.]

Additionally, we evaluated feature importance with a forest of trees; the analysis shows that the most significant features are the SAMS features, with the message feature having the highest score, followed by spelling, source, and author. Among the sentiment features, modality and the text length (word count) appeared in the top-ten list. Finally, Pearson’s correlation analysis shows that the source feature has a moderate positive correlation of 0.57 with the class, followed by message at 0.51.
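The evaluation protocol can be sketched as follows, using a synthetic stand-in for the combined feature matrix X and label vector y produced by the pipeline above; this is an illustrative sketch, not the exact evaluation scripts.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the 530-article combined feature matrix and labels.
X, y = make_classification(n_samples=530, n_features=50, random_state=0)

# The four evaluated models; hyperparameters here are library defaults.
models = {
    "LOG": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "GBC": GradientBoostingClassifier(),
    "RF": RandomForestClassifier(),
}

# 10-fold cross-validation, reporting the mean F1 score per model.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```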
5. Related Work

In recent years, automatic misinformation classification has been extensively studied in specific supervised scenarios. Shu et al. [19] explore the characterization of fake news in social media from a data mining perspective. As an emerging topic, misinformation has drawn the attention of research communities in different disciplines. As a result, several datasets have been published, related to political news [27], rumor debunking [28], fake vs. satire detection [29], the FEVER dataset for verification of textual sources [30], and a more recent dataset with news related to the COVID-19 pandemic [20]. Significant efforts have been made to explore the potential of deep learning [27, 31] and the linguistic and semantic aspects [32, 33].

Crowdsourcing as a methodology can assist in the classification of news articles by fact-checking the statements. Tschiatschek et al. [34] propose a strategy of flagging news considered false by social network users and deploy an aggregation method to select a subset of those flagged news items for evaluation by experts. A recent work by Roitero et al. [35] analyzed how crowd users assessed the truthfulness of false and true statements related to the COVID-19 pandemic. The results show that the crowd is able to accurately classify statements and achieves a certain level of agreement with expert judgments. In response to combating COVID-19 misinformation, Li et al. [36] developed the Jennifer chatbot, which helps users easily find information related to COVID-19. The chatbot provides reliable sources and diverse topics maintained by a global group of volunteers.

Considering the sensitivity and the risk of misinformation spreading on the one hand, and the limitations of both automated and human-based methods on the other, hybrid human-machine approaches have been envisioned [12, 37]. For instance, a hybrid machine-crowd approach [38] has demonstrated higher accuracy for the classification of fake and satiric stories. It uses a high-confidence switching method in which crowd feedback is requested whenever the ensemble of machine learning models fails to achieve unanimity and high accuracy. A hybrid human-machine interactive approach [11] based on a probabilistic graphical model combines machine learning predictions with crowd annotations for fact-checking. The follow-up user study [39] shows that predictions from automated models can help users assess claims correctly, and hence users tend to trust the system even when the model predictions are wrong. However, enabling interaction and having transparent model predictions has the potential of training users to build their own fact-checking skills.

6. Conclusion and Future Work

In this paper, we have addressed the issue of classifying news stories related to the COVID-19 pandemic. We presented the SAMS-HITL framework for misinformation detection. SAMS-HITL combines statistical and sentiment-based features automatically extracted from the text of news articles with features related to the source, author, message, and spelling of the article obtained via crowdsourcing. Preliminary results showed that the four SAMS features are the most important features of the classification model and have a high impact on overall classification accuracy. In summary, our proposed framework leverages the efficiency of machine learning models over large amounts of data and the quality of human intelligence for fact-checking. This method is helpful for social networks, which could benefit from the high availability of their platform members and leverage their fact-checking skills to provide feedback on the SAMS questions for news articles posted and shared on their platforms. The SAMS-HITL approach goes one step further than traditional HAI models in that it calls on users to answer the four questions themselves, thus raising user awareness about digital misinformation.

The SAMS-HITL prototype application is currently being developed. Our objective is to help users check news articles and, over time, to train them to take a critical view and raise their awareness of misinformation. In the long run, the impact can only be positive, as even without the SAMS-HITL app people will think twice before passing news along.

A limitation of the results presented is the size of the dataset and the potential bias in the classes due to the limited diversity of sources and the length of the text in the news articles. Validating SAMS-HITL will call for its application on a much larger dataset. Having more data gives the opportunity to consider word-embedding techniques for feature extraction and the application of deep learning models for classification. Future work intends to further investigate the SAMS features. One direction is to explore the potential of automatically answering the questions related to author and spelling, which could reduce the human effort. Automated tools in combination with a customized text processing algorithm could be used to identify grammar mistakes and generate a score for the spelling. For news articles published on web portals, identifying and extracting metadata information related to the author could be done automatically. However, this is challenging for news stories published and shared via different social network channels.
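One hedged illustration of the automated spelling-score direction mentioned above is a dictionary-based check that maps a text to a score in [0, 1]. It assumes the third-party pyspellchecker package, and the scoring heuristic is our own rather than anything evaluated in this paper.

```python
import re
from spellchecker import SpellChecker  # pip install pyspellchecker

def spelling_score(text: str) -> float:
    """Fraction of alphabetic tokens found in the dictionary (1.0 = no issues)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 1.0
    unknown = SpellChecker().unknown(words)  # tokens missing from the dictionary
    return 1.0 - len(unknown) / len(words)

print(spelling_score("BREAKING!! Goverment hidding the TRUTH about the cure"))
```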
Further work on the SAMS features will investigate the impact of using a score range instead of the binary yes/no values.

Acknowledgments

This work was partly funded by the HES-SO (project no. 104353) and by the Hasler Foundation in the context of the project City-Stories (contract no. 17055).

References

[1] L. Ha, L. A. Perez, R. Ray, Mapping recent development in scholarship on fake news and misinformation, 2008-2017: Disciplinary contribution, topics, and impact, American Behavioral Scientist (2019) 1–26.
[2] G. L. Ciampaglia, Fighting fake news: a role for computational social science in the fight against digital misinformation, Journal of Computational Social Science 1 (2018) 147–153.
[3] S. Zannettou, M. Sirivianos, J. Blackburn, N. Kourtellis, The web of false information: Rumors, fake news, hoaxes, clickbait, and various other shenanigans, Journal of Data and Information Quality 11 (2019) 37.
[4] D. J. Rothkopf, When the buzz bites back, The Washington Post (2003) B01.
[5] J. R. Harman, Collateral Damage: The Imperiled Status of Truth in American Public Discourse and Why it Matters to You, Author House, Bloomington, Indiana, 2014.
[6] P. B. Brandtzaeg, A. Følstad, M. A. C. Dominguez, How journalists and social media users perceive online fact-checking and verification services, Journalism Practice 12 (2018) 1109–1129.
[7] L. Graves, Understanding the Promise and Limits of Automated Fact-Checking, Report, Reuters Institute for the Study of Journalism, University of Oxford, 2018.
[8] P. B. Brandtzaeg, A. Følstad, Trust and distrust in online fact-checking services, Communications of the ACM 60 (2017) 65–71.
[9] Evaluating resources: the CRAAP test, 2015. URL: https://researchguides.ben.edu/c.php?g=261612&p=2441794.
[10] L. Konstantinovskiy, O. Price, M. Babakar, A. Zubiaga, Towards automated factchecking: Developing an annotation schema and benchmark for consistent automated claim detection, 2020.
[11] A. T. Nguyen, A. Kharosekar, S. Krishnan, S. Krishnan, E. Tate, B. C. Wallace, M. Lease, Believe it or not: Designing a human-AI partnership for mixed-initiative fact-checking, in: UIST ’18: The 31st Annual ACM Symposium on User Interface Software and Technology, 2018, pp. 189–199.
[12] G. Demartini, S. Mizzaro, D. Spina, Human-in-the-loop artificial intelligence for fighting online misinformation: Challenges and opportunities, The Bulletin of the Technical Committee on Data Engineering 43 (2020) 65–74.
[13] G. Rehm, An infrastructure for empowering internet users to handle fake news and other online media phenomena, in: International Conference of the German Society for Computational Linguistics and Language Technology, Springer, 2017, pp. 216–231.
[14] D. Esteves, A. J. Reddy, P. Chawla, J. Lehmann, Belittling the source: Trustworthiness indicators to obfuscate fake news on the web, in: Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), 2018, pp. 50–59.
[15] A. X. Zhang, A. Ranganathan, S. E. Metz, S. Appling, C. M. Sehat, N. Gilmore, N. B. Adams, E. Vincent, J. Lee, M. Robbins, et al., A structured response to misinformation: Defining and annotating credibility indicators in news articles, in: The Web Conference 2018, 2018, pp. 603–612.
[16] D. M. Lazer, M. A. Baum, Y. Benkler, A. J. Berinsky, K. M. Greenhill, F. Menczer, M. J. Metzger, B. Nyhan, G. Pennycook, D. Rothschild, et al., The science of fake news, Science (2018).
[17] Library guides: News literacy, Accessed 2020. URL: https://nwtc.libguides.com/news.
[18] M. Spring, Coronavirus: The seven types of people who start and spread viral misinformation, BBC Trending (2020). URL: https://www.bbc.com/news/blogs-trending-52474347.
[19] K. Shu, A. Sliva, S. Wang, J. Tang, H. Liu, Fake news detection on social media: A data mining perspective, SIGKDD Explorations Newsletter 19 (2017).
[20] L. Cui, D. Lee, CoAID: COVID-19 Healthcare Misinformation Dataset, 2020. URL: https://arxiv.org/abs/2006.00885.
[21] T. De Smedt, W. Daelemans, Pattern for Python, The Journal of Machine Learning Research 13 (2012) 2063–2067.
[22] M. Allahbakhsh, B. Benatallah, A. Ignjatovic, H. R. Motahari-Nezhad, E. Bertino, S. Dustdar, Quality control in crowdsourcing systems: Issues and directions, Internet Computing (2013).
[23] D. Difallah, E. Filatova, P. Ipeirotis, Demographics and dynamics of Mechanical Turk workers, in: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM ’18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 135–143.
[24] G. Kazai, J. Kamps, N. Milic-Frayling, The face of quality in crowdsourcing relevance labels: Demographics, personality and labeling accuracy, in: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, 2012.
[25] N. Quoc Viet Hung, N. T. Tam, L. N. Tran, K. Aberer, An evaluation of aggregation techniques in crowdsourcing, in: Web Information Systems Engineering – WISE 2013, 2013, pp. 1–15.
[26] A. P. Dawid, A. M. Skene, Maximum likelihood estimation of observer error-rates using the EM algorithm, Journal of the Royal Statistical Society: Series C (Applied Statistics) 28 (1979) 20–28.
[27] W. Y. Wang, “Liar, liar pants on fire”: A new benchmark dataset for fake news detection, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017.
[28] W. Ferreira, A. Vlachos, Emergent: a novel data-set for stance classification, in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 1163–1168.
[29] J. Golbeck, M. Mauriello, B. Auxier, K. H. Bhanushali, C. Bonk, M. A. Bouzaghrane, C. Buntain, R. Chanduka, P. Cheakalos, J. B. Everett, et al., Fake news vs satire: A dataset and analysis, in: Proceedings of the 10th ACM Conference on Web Science, 2018, pp. 17–21.
[30] J. Thorne, A. Vlachos, C. Christodoulopoulos, A. Mittal, FEVER: a large-scale dataset for fact extraction and verification, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, 2018, pp. 809–819.
[31] N. Ruchansky, S. Seo, Y. Liu, CSI: A hybrid deep model for fake news detection, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 797–806.
[32] V. L. Rubin, N. Conroy, Y. Chen, S. Cornwell, Fake news or truth? Using satirical cues to detect potentially misleading news, in: Proceedings of the Second Workshop on Computational Approaches to Deception Detection, 2016, pp. 7–17.
[33] V. Pérez-Rosas, B. Kleinberg, A. Lefevre, R. Mihalcea, Automatic detection of fake news, in: Proceedings of the 27th International Conference on Computational Linguistics, 2018.
[34] S. Tschiatschek, A. Singla, M. Gomez Rodriguez, A. Merchant, A. Krause, Fake news detection in social networks via crowd signals, in: The Web Conference 2018, WWW ’18, 2018, pp. 517–524.
[35] K. Roitero, M. Soprano, B. Portelli, D. Spina, V. Della Mea, G. Serra, S. Mizzaro, G. Demartini, The COVID-19 infodemic: Can the crowd judge recent misinformation objectively?, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020.
[36] Y. Li, T. Grandison, P. Silveyra, A. Douraghy, X. Guan, T. Kieselbach, C. Li, H. Zhang, Jennifer for COVID-19: An NLP-powered chatbot built for the people and by the people to combat misinformation, in: Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020, 2020.
[37] S. Ahmed, K. Hinkelmann, F. Corradini, Combining machine learning with knowledge engineering to detect fake news in social networks – a survey, in: AAAI’19 Spring Symposium, 2019.
[38] S. Shabani, M. Sokhn, Hybrid machine-crowd approach for fake news detection, in: 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC), 2018, pp. 299–306.
[39] A. T. Nguyen, A. Kharosekar, M. Lease, B. C. Wallace, An interpretable joint graphical model for fact-checking from crowds, in: AAAI Conference on Artificial Intelligence, 2018, pp. 1511–1518.