FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020 Konstantin Pogorelov1 ,Daniel Thilo Schroeder13 ,Luk Burchard4 , Johannes Moe12 ,Stefan Brenner5 ,Petra Filkukova1 ,Johannes Langguth1 1 Simula Research Laboratory, Norway2 University of Oslo, Norway 3 Simula Metropolitan Center for Digital Engineering, Norway 4 Technical University of Berlin, Germany5 Stuttgart Media University, Germany {konstantin,daniels,langguth,petrafilkukova}@simula.no,l.burchard@campus.tu-berlin.de,\sb288@hdm-stuttgart.de, arenor.moe@gmail.com ABSTRACT field, e.g. [3, 5, 12, 16] that cover a wide range of approaches, in- The FakeNews: Corona Virus and 5G Conspiracy task, running cluding knowledge graphs, diffusion models, and natural language for the first time as part of MediaEval 2020, focuses on the clas- processing. These methods typically rely on labeled data. Conse- sification of tweet texts and retweet cascades for the detection quently, several such datasets have been published in recent years of fast-spreading misinformation, and therefore provides a low- [4, 6, 8, 14, 15, 17, 19, 21]. However, to our best knowledge, there threshold introduction to natural language processing and graph is no existing dataset that emphasizes Digital Wildfires and takes analysis. This paper describes the task, including use case and mo- retweet cascades into account. tivation, challenges, the dataset with ground truth, the required The task is intended to be of interest to researchers in the ar- participant runs, and the evaluation metrics. eas of online news, social media, multimedia analysis, multimedia information retrieval, natural language processing, and meaning understanding and situational awareness. 1 INTRODUCTION Digital wildfires, i.e., fast-spreading inaccurate, counterfactual, or 2 DATASET DETAILS intentionally misleading and information that can quickly permeate Our dataset’s creation can roughly be divided into four steps. First, public consciousness and have severe real-world implications, are we used Twitters’search API between January 17, 2020 and May 15, among the top global risks in the 21st century [10]. While misin- 2020 to collect a large number of statuses (i.e. tweets, retweets, quotes, formation is widespread on the internet, only a very small portion and replies) including key-words related to the COVID-19 pandemic. of it leads to harmful harmful acts in the real world. In2020. the Moreover, we filtered for those that mention 5G in any conceivable COVID-19 pandemic has severely affected people worldwide and spelling such as 5G, 5g, or #5g. Second, we restored as much of consequently dominated world news for months. Thus, it is no the Twitter threads as possible using our custom framework [18]. surprise that it has also been the topic of a massive amount of mis- The result is a graph of tweets, retweets, and quotes that does not information, which was most likely amplified by the fact that many only consist of statuses containing the obvious combination of details about the virus were unknown at the start of the pandemic. keywords but provides more subtle content like "All this to declare We are particularly interested in detecting content associated with martial law huh? Lol or do you wanna put fear in us so we can run a Digital Wildfire that relates COVID-19 to 5G wireless technology and get them vaccines to make us suseptible for these damn 5g tower and led to arson and attacks on telecommunications workers. De- radiations? Lol either way, the government is to NOT be trusted and spite the emphasis on COVID-19 and 5G, we further differentiate are up to something.". Some threads containing these statuses have between content that does not contain misinformation and content their origin long before the COVID-19 pandemic. Nevertheless, we attributed to other misinformation. Our task offers two subtasks: decided to include this data, too, since it contains context that led The first subtask includes text-based tweets classification, while to our Digital Wildfires’ emergence. In the third step, based on the second targets the classification of retweet cascades [11]. the number of statuses obtained in steps one and two, we started In contrast to text-only classification challenges, e.g., [1, 7, 13], the manual labeling. Therefore, we randomly selected a subset our dataset also contains retweet cascades, allowing us to consider of 10𝑘 tweets with their corresponding retweets. The annotation diffusion as a characteristic shown to be valuable for the spread of process has been performed by a team of researchers, postdocs, misinformation [20]. The final goal is the inclusion of various field Ph.Ds, and master students. Each team member received an part of experts aiming for efficient multi-modal approaches. Furthermore, the subsets and these data were then annotated manually. Most of we ask for evaluation of different approaches utilizing both as little the easy-to-annotate statuses were assessed and classified by one and as much training data as possible and evaluating the approaches annotator, but when assigning a class was not obvious, the tweet with respect to real-world imbalanced datasets [2]. was discussed with the entire group until consensus was reached. There are already many methods for automatic news analysis While the text dataset was prepared via manual labelling, extracting and fake content detection in the social media and news analysis the retweet cascades requires an additional step. A cascades root is Copyright 2020 for this paper by its authors. Use permitted under Creative always a labeled tweet while all other nodes correspond to retweets. Commons License Attribution 4.0 International (CC BY 4.0). We again made use of Twitter’s API to fetch these retweets and MediaEval’20, 14-15 December 2020, Online the underlying social network that connects users via follower MediaEval’20, December 2020, Online K. Pogorelov et al. relationships. Unfortunately, Twitter limits the number of available can submit four more runs for any of the described subtasks, i.e., retweets which narrows the cascade size to one hundred. Since each participants can submit up to ten runs in total. tweet and retweet contains a timestamp, one can track the temporal Text-Based Misinformation Detection Subtask: In this sub- diffusion. However, Twitter does not provide the true retweet path, task, the participants are asked to perform classification of the thus leaving it to the challenge participants to reconstruct it. We tweets based on the tweet text contents and other tweet-relevant use three classes to label tweets and retweet cascades: The 5G- multimedia and meta-information can be obtained from Twitter or Corona Conspiracy class corresponds to all tweets that claim or the Internet. The subtask requires one mandatory and four optional insinuate some deep or obvious connection between COVID-19 runs to be submitted. The required run implements a pure NLP clas- and 5G, such as the idea that 5G weakens the immune system and sification of tweets based only on tweet text content without using thus caused the current Corona-virus pandemic, or that there is any additional sources of data. Optional runs gradually extend the no pandemic and the COVID-19 victims were actually harmed by amount and types of allowed additional information implementing radiation emitted by 5G network towers. The crucial requirement is classification based on tweet text analysis in combination with vi- the claimed existence of some causal link. The Other Conspiracy sual information (images and/or videos) extracted from the original class corresponds to all tweets that spread conspiracy theories tweet and classification using any automatically scraped data from other than the ones discussed above. This includes ideas about an any external sources. intentional release of the virus, forced or harmful vaccinations, or Structure-Based Misinformation Detection Subtask: In this the virus being a hoax. The Non-Conspiracy class corresponds subtask, the participants are asked to perform a classification of to all tweets not belonging to the previous two classes and includes tweet graphs based on the tweet retweet graph, and additional those discussing COVID-19 pandemic itself, claiming that 5G is not retweet-tree-related information was obtained from Twitter. The proven to be absolutely safe or even can be harmful without linking subtask requires one mandatory and four optional runs to be sub- it to COVID-19, as well as claiming that authorities are pushing for mitted. The required run implements a pure tweet classification the installation of 5G while the Publicis distracted by COVID-19. In based only on the retweet graph structure only, without using any addition, tweets pointing out the existence of conspiracy theories additional data. Optional runs gradually extend the amount and or mocking them fall into this class since they do not spread the types of allowed additional information implementing classifica- conspiracy theories by inciting people to believe in them. tion based on a full set of retweet graph description, retweeting nodes’ properties, and using any automatically scraped data from 2.1 Dataset Contents any external sources. Thus, the participants are allowed to use only information that The development and test datasets consist of 6, 458 tweets and 2, 327 can be extracted from the provided tweets (including metadata) retweet graphs, and 3, 230 tweets and 1, 165 retweet graphs respec- and retweet cascades for generating the first and second run for tively, stored in two folders each: tweets and graphs. Both datasets both subtasks. In contrast, for other runs everything is allowed, are heavily unbalanced in terms of the number of samples per class, both from the data collection method perspective and the sources reflecting the distribution of tweet topics and people’s opinions. of information used. However, manual annotation of tweets or any To comply with the Twitter data publication policy, we provide externally scraped data is not allowed in any run. only tweet IDs, but not the tweet text itself. An additional tweet content download script is provided to obtain the tweets from their ids via the corresponding Twitter API using a user-supplied API 4 DISCUSSION AND OUTLOOK access keys. Retweet cascades are stored individually in a separate The task itself can be seen as very atypical and challenging due to a folder with three files. The edges.txt file contains a directed edge list fairly limited amount of information available to support the tweet source-node-ID to target-node-ID. The plot.png file contains a plot classification process. This reflects the real-world conditions in of the cascade. The nodes.csv contains an assignment from the node which online social media analysis systems are deployed. Thus, this ID to the following properties: id - an anonymized node ID which task is a practical attempt to make a step towards building a usable remains the same for all graphs in the dataset of all categories; time multi-modal social network analysis system that is able to combine - the time difference in seconds from each retweet to the original isolated data source properties with inter-source relations. Due to tweet. The original tweet always has a difference of 0 seconds to the importance of the use case, we hope to motivate researchers itself; friends - the next greater power of two of the follower count from different research fields to present their approaches, thereby from the user profile of the respective user; followers - the next performing research that can help society to fight against malicious greater power of two of the friend count from the user profile of manipulations of social networks and threats to society in general. the respective user. We hope that the FakeNews task can help to raise awareness of the topic, but also provide an interesting and meaningful use case to 3 EVALUATION METRICS AND SUBTASKS researchers interested in this application. The officially reported metric used for evaluating the multi-class classification performance is the multi-class generalization of the ACKNOWLEDGMENTS Matthews correlation coefficient (MCC) [9]. In case of equal metric This work was funded by the Norwegian SAMRISK-2 project ”UMOD” values, we use the timestamp of the official run submission to rank (#272019). It has benefited from the Experimental Infrastructure for the teams. For the evaluation, the participants must submit one Exploration of Exascale Computing (eX3), which is financially sup- run for both subtasks defined below. Additionally, they optionally ported by the Research Council of Norway under contract 270053. FakeNews: Corona Virus and 5G Conspiracy MediaEval’20, December 2020, Online REFERENCES In 2019 Sixth International Conference on Social Networks Analysis, [1] 2018. Toxic Comment Classification Challenge - Identify and clas- Management and Security (SNAMS). IEEE, 134–141. sify toxic online comments. (2018). https://www.kaggle.com/c/ [19] Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and jigsaw-toxic-comment-classification-challenge/ Huan Liu. 2018. Fakenewsnet: A data repository with news content, [2] Nitesh V Chawla, Nathalie Japkowicz, and Aleksander Kotcz. 2004. social context and dynamic information for studying fake news on Special issue on learning from imbalanced data sets. ACM SIGKDD social media. arXiv preprint arXiv:1809.01286 8 (2018). explorations newsletter 6, 1 (2004), 1–6. [20] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true [3] Limeng Cui, Haeseung Seo, Maryam Tabar, Fenglong Ma, Suhang and false news online. Science 359, 6380 (2018), 1146–1151. Wang, and Dongwon Lee. 2020. DETERRENT: Knowledge Guided [21] William Yang Wang. 2017. " liar, liar pants on fire": A new benchmark Graph Attention Network for Detecting Healthcare Misinformation. dataset for fake news detection. arXiv preprint arXiv:1705.00648 (2017). https://doi.org/10.1145/3394486.3403092. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’20). Association for Computing Machinery, New York, NY, USA, 492–502. https://doi.org/10.1145/3394486.3403092 [4] Enyan Dai, Yiwei Sun, and Suhang Wang. 2020. Ginger Cannot Cure Cancer: Battling Fake Health News with a Comprehensive Data Repos- itory. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 14. 853–862. [5] Dylan de Beer and Machdel Matthee. 2020. Approaches to Identify Fake News: A Systematic Literature Review. In Integrated Science in Digital Age 2020, Tatiana Antipova (Ed.). Springer International Publishing, Cham, 13–22. [6] Sameer Dhoju, Md Main Uddin Rony, Muhammad Ashad Kabir, and Naeemul Hassan. 2019. Differences in health news from reliable and unreliable media. In Companion Proceedings of The 2019 World Wide Web Conference. 981–987. [7] Quan Do. 2019. Jigsaw Unintended Bias in Toxicity Classification. (2019). [8] Amira Ghenai and Yelena Mejova. 2018. Fake cures: user-centric modeling of health misinformation in social media. Proceedings of the ACM on human-computer interaction 2, CSCW (2018), 1–20. [9] Jan Gorodkin. 2004. Comparing two K-category assignments by a K- category correlation coefficient. Computational biology and chemistry 28, 5-6 (2004), 367–374. [10] Lee Howell. 2013. Digital Wildfires in a Hyperconnected World. https: //bit.ly/2GiEF4f. (2013). [11] Andrey Kupavskii, Liudmila Ostroumova, Alexey Umnov, Svyatoslav Usachev, Pavel Serdyukov, Gleb Gusev, and Andrey Kustarev. 2012. Prediction of retweet cascade size over time. In Proceedings of the 21st ACM international conference on Information and knowledge manage- ment. 2335–2338. [12] Thai Le, Suhang Wang, and Dongwon Lee. 2020. MALCOM: Gener- ating Malicious Comments to Attack Neural Fake News Detection Models. arXiv preprint arXiv:2009.01048 (2020). [13] Akshay Mungekar, Nikita Parab, Prateek Nima, and Sanchit Pereira. 2019. Quora insincere question classification. National College of Ireland (2019). [14] Mahmoud Nabil, Mohamed Aly, and Amir Atiya. 2015. Astd: Arabic sentiment tweets dataset. In Proceedings of the 2015 conference on empirical methods in natural language processing. 2515–2519. [15] Kai Nakamura, Sharon Levy, and William Yang Wang. 2019. r/fakeddit: A new multimodal benchmark dataset for fine-grained fake news detection. arXiv preprint arXiv:1911.03854 (2019). [16] Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea. 2017. Automatic detection of fake news. arXiv preprint arXiv:1708.07104 (2017). [17] Fatima K Abu Salem, Roaa Al Feel, Shady Elbassuoni, Mohamad Jaber, and May Farah. 2019. Fa-kes: A fake news dataset around the syrian war. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 573–582. [18] Daniel Thilo Schroeder, Konstantin Pogorelov, and Johannes Langguth. 2019. FACT: a Framework for Analysis and Capture of Twitter Graphs.