
FakeNews: Corona Virus and Conspiracies Multimedia Analysis Task at MediaEval 2021

Konstantin Pogorelov

konstantin@simula.no

Daniel Thilo Schroeder

daniels@simula.no

Stefan Brenner

Johannes Langguth

langguth@simula.no

Simula Metropolitan Center for Digital Engineering, Norway; Simula Research Laboratory, Norway; Stuttgart Media University, Germany; Technical University of Berlin, Germany; University of Oslo, Norway

2021

13 15

The FakeNews: Corona Virus and Conspiracies Multimedia Analysis task, running for the second time as part of MediaEval 2021, focuses on the classification of tweet texts with the aim of detecting fast-spreading misinformation. This year's task extends the number of target conspiracy theories and introduces new challenges in terms of the analysis complexity of the imbalanced dataset. This paper describes the task, including the use case and motivation, the challenges, the dataset with ground truth, the required participant runs, and the evaluation metrics.

INTRODUCTION

During the development of the COVID-19 crisis, many new COVID-related conspiracy theories have arisen. Despite the efforts of the major social networks, mass-spread fake facts, irrational theories, and news-like posts are widely present in online media. Rumors and other fast-spreading inaccurate, counterfactual, or intentionally misleading information can quickly permeate public consciousness and have severe real-world implications. Public attention to the problem has already led to content moderation and partial limitations of freedom of speech in order to prevent manipulation of COVID-related public opinion. Nevertheless, fake news and intentional misinformation remain among the top global risks of the 21st century [6]. Consequently, we are particularly interested in detecting content associated with fake news and COVID-related misinformation. We further differentiate between content that does not contain misinformation and content attributed to other misinformation. Our task offers three subtasks, all of which require text-based tweet classification.

Similar to text-only classification challenges, e.g., [1, 4, 7], we expect to see NLP approaches for tweet text analysis, but we target a wider set of conspiracy theories and detection methodologies at different levels. Furthermore, we ask for an evaluation of different approaches with respect to real-world imbalanced datasets [3].

The task is intended to be of interest to researchers in the areas of online news, social media, multimedia analysis, multimedia information retrieval, natural language processing, and meaning understanding and situational awareness.

DATASET DETAILS

The creation of our dataset can roughly be divided into four steps. First, we used Twitter's search API between January 17, 2020 and Jun […] made publicly available, and they are sent to the members of the research team via direct emails.

After the challenge, the annotated datasets, containing only tweet IDs but not the tweet text itself, will be made publicly available. These publicly available datasets will be shuffled and supplemented with additional content to prevent linking them to the full-text datasets used by the research teams during the challenge. An additional download script will be provided to obtain the tweet content from the tweet IDs via the corresponding Twitter API, using user-supplied API access keys.
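The download script itself is not described here. As a rough sketch only, assuming the Twitter API v2 tweet lookup endpoint (`GET https://api.twitter.com/2/tweets`, which accepts up to 100 comma-separated IDs per request and bearer-token authentication), ID-based retrieval could batch the IDs and request one batch at a time. The function names are hypothetical, and access conditions for this API have changed since 2021, so the current documentation should be checked first.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, Iterator, List

# Assumption: Twitter API v2 tweet lookup endpoint; up to 100 IDs per call.
LOOKUP_URL = "https://api.twitter.com/2/tweets"
MAX_IDS_PER_REQUEST = 100


def chunk_ids(tweet_ids: Iterable[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` tweet IDs."""
    batch: List[str] = []
    for tweet_id in tweet_ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_lookup_request(batch: List[str], bearer_token: str) -> urllib.request.Request:
    """Build (but do not send) a lookup request for one batch of IDs."""
    query = urllib.parse.urlencode({"ids": ",".join(batch)})
    return urllib.request.Request(
        f"{LOOKUP_URL}?{query}",
        headers={"Authorization": f"Bearer {bearer_token}"},  # user-supplied key
    )


def hydrate(tweet_ids: Iterable[str], bearer_token: str) -> List[dict]:
    """Fetch tweet objects (id and text by default) for the given IDs.

    Deleted or protected tweets are simply missing from the result.
    """
    tweets: List[dict] = []
    for batch in chunk_ids(tweet_ids):
        request = build_lookup_request(batch, bearer_token)
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        tweets.extend(payload.get("data", []))
    return tweets
```

Because tweets can be deleted after the dataset is published, a hydrated copy may contain fewer tweets than the list of IDs.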

EVALUATION METRICS AND SUBTASKS

The officially reported metric for evaluating multi-class classification performance is the multi-class generalization of the Matthews correlation coefficient (MCC, Rk-statistic) [5]. This metric provides an efficient and reliable comparison of multi-class classifiers on both balanced and imbalanced datasets.
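To make the metric concrete, the following is a minimal sketch of the Rk-statistic computed from label counts, following the form of Gorodkin's K-category correlation coefficient [5]. The function name is hypothetical; scikit-learn's `sklearn.metrics.matthews_corrcoef` computes the same multi-class quantity and can serve as a cross-check.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def rk_statistic(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multi-class MCC (Gorodkin's R_K) computed from label counts.

    R_K = (c*s - sum_k p_k*t_k) / (sqrt(s^2 - sum_k p_k^2) * sqrt(s^2 - sum_k t_k^2))
    where s is the number of samples, c the number of correct predictions,
    t_k how often class k occurs in y_true, and p_k how often k is predicted.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    classes = set(t_counts) | set(p_counts)  # Counter returns 0 for absent keys
    cov_tp = c * s - sum(p_counts[k] * t_counts[k] for k in classes)
    cov_pp = s * s - sum(p_counts[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_counts[k] ** 2 for k in classes)
    if cov_pp == 0 or cov_tt == 0:
        # Degenerate case: only one class in y_true or y_pred; 0 by convention.
        return 0.0
    return cov_tp / math.sqrt(cov_pp * cov_tt)
```

For two classes the formula reduces to the ordinary binary MCC: labels [0, 0, 1, 1] against predictions [0, 1, 1, 1] give 1/sqrt(3), about 0.577, in both cases.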

In case of equal metric values, the timestamp of the official run submission is used to rank the teams. For the evaluation, participants must submit at least one run for at least one of the subtasks defined below. Additionally, participants can optionally submit up to four more runs for each subtask, i.e., up to 15 runs in total.
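The tie-breaking rule can be sketched as a two-key sort. This assumes that a higher metric value ranks better and that, on equal values, the earlier submission ranks higher; the text only says that the timestamp is used. The `Submission` record and `rank_submissions` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Submission:
    # Hypothetical record: team name, official metric value, submission time.
    team: str
    score: float
    submitted_at: datetime


def rank_submissions(submissions: List[Submission]) -> List[str]:
    """Order teams by score (higher first); assumed: earlier submission wins ties."""
    ordered = sorted(submissions, key=lambda s: (-s.score, s.submitted_at))
    return [s.team for s in ordered]
```

Exact equality of the metric values is taken as the tie condition here; whether scores are rounded before comparison is not specified.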

Text-Based Misinformation Detection: In this subtask, the participants receive a dataset consisting of tweet text blocks in English related to COVID-19 and various conspiracy theories. The participants are encouraged to build a multi-class classifier that can flag whether a tweet promotes/supports or discusses at least one (or many) of the conspiracy theories. If a tweet promotes/supports one conspiracy theory and merely discusses another, the expected label for that tweet is the "stronger" class, i.e., promote/support.
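The "stronger class" rule amounts to taking the maximum over an ordered set of labels. The sketch below assumes three hypothetical label names ordered from weakest to strongest; the dataset's actual label strings may differ.

```python
from enum import IntEnum
from typing import Iterable


class Stance(IntEnum):
    # Hypothetical label names; a larger value means a "stronger" class.
    NON_CONSPIRACY = 0
    DISCUSSES = 1
    PROMOTES_SUPPORTS = 2


def tweet_label(stances: Iterable[Stance]) -> Stance:
    """Collapse the per-theory stances of one tweet into a single label.

    The strongest stance wins, so promote/support overrides discuss.
    A tweet without any theory-related stance is labelled NON_CONSPIRACY.
    """
    return max(stances, default=Stance.NON_CONSPIRACY)
```

For example, a tweet that discusses one theory and promotes another receives `Stance.PROMOTES_SUPPORTS`.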

Text-Based Conspiracy Theories Recognition: In this subtask, the participants receive a dataset consisting of tweet text blocks in English related to COVID-19 and various conspiracy theories. The main goal of this subtask is to build a detector that recognizes whether a text mentions or refers to any of the predefined conspiracy topics in any form.
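Since a tweet may mention several topics at once, one natural output for this subtask is a 0/1 flag per predefined topic. This is a minimal sketch of that encoding; the topic names are hypothetical placeholders, not the task's actual topic list.

```python
from typing import Iterable, List

# Hypothetical placeholder names; the task defines its own list of topics.
TOPICS = ["topic_a", "topic_b", "topic_c"]


def mention_vector(detected: Iterable[str], topics: List[str] = TOPICS) -> List[int]:
    """Encode the set of topics a tweet mentions as one 0/1 flag per topic."""
    found = set(detected)
    unknown = found - set(topics)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    return [1 if topic in found else 0 for topic in topics]
```

Keeping the topic order fixed makes it possible to evaluate each position of the vector as a separate binary problem.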

Text-Based Combined Misinformation and Conspiracies Detection: In this subtask, the participants receive a dataset consisting of tweet text blocks in English related to COVID-19 and various conspiracy theories. The goal of this subtask is to build a complex multi-label, multi-class detector that, for each topic from a list of predefined conspiracy topics, can predict whether a tweet promotes/supports or just discusses that particular topic.
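In this combined subtask, each topic receives one of three codes instead of a binary flag. The sketch below assumes hypothetical integer codes and placeholder topic names, and also assumes that a topic counts as "mentioned" in the recognition subtask whenever it is discussed or promoted; the task description does not state this mapping explicitly.

```python
from typing import Dict, List

# Hypothetical encoding of the per-topic output of the combined subtask.
NONE, DISCUSSES, PROMOTES_SUPPORTS = 0, 1, 2
TOPICS = ["topic_a", "topic_b", "topic_c"]  # placeholder topic names


def stance_vector(stances: Dict[str, int], topics: List[str] = TOPICS) -> List[int]:
    """One stance code per predefined topic; unlisted topics default to NONE."""
    return [stances.get(topic, NONE) for topic in topics]


def mentions_from_stances(vector: List[int]) -> List[int]:
    """Assumption: a topic counts as mentioned if it is discussed or promoted."""
    return [1 if code != NONE else 0 for code in vector]
```

Under this assumption, a multi-class prediction for the combined subtask also yields a prediction for the recognition subtask.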

For each subtask in which a team decides to participate, one mandatory run must be submitted, and up to four optional runs can be added. The mandatory run implements a pure NLP classification of tweets based only on the tweet text, without using any additional sources of data. The optional runs gradually extend the amount and types of additional information allowed, from classification based on tweet text analysis combined with pretrained models to classification using any automatically scraped data from external sources. Manual annotation of tweets or of any externally scraped data is not allowed in any run.
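The run-count rules can be checked mechanically before submission. This is a hypothetical checker that interprets "one mandatory run" as exactly one per participated subtask; the restrictions on data sources cannot be verified this way and are left out.

```python
from collections import Counter
from typing import List, Tuple

MAX_OPTIONAL_RUNS = 4  # per subtask, as stated in the task description


def validate_runs(runs: List[Tuple[str, bool]]) -> List[str]:
    """Check (subtask, is_mandatory) pairs against the run-count rules.

    Returns a list of problems; an empty list means the counts are valid.
    """
    problems: List[str] = []
    mandatory = Counter(subtask for subtask, is_mandatory in runs if is_mandatory)
    optional = Counter(subtask for subtask, is_mandatory in runs if not is_mandatory)
    if not runs:
        problems.append("at least one run for at least one subtask is required")
    for subtask in sorted(set(mandatory) | set(optional)):
        if mandatory[subtask] != 1:
            problems.append(
                f"{subtask}: expected exactly 1 mandatory run, got {mandatory[subtask]}"
            )
        if optional[subtask] > MAX_OPTIONAL_RUNS:
            problems.append(f"{subtask}: at most {MAX_OPTIONAL_RUNS} optional runs allowed")
    return problems
```

With three subtasks, the maximum of one mandatory plus four optional runs each gives the 15 runs mentioned above.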

In the submitted runs, participants are allowed to use an additional Cannot Determine class. This class represents cases in which the output of the classifier is not reliable, and it is important for the evaluation of multi-class classifiers. The effect of using the Cannot Determine class is described in the related literature [2]. In short, marking a sample that the classifier cannot reliably classify as unknown affects the resulting classification performance less negatively than marking it with a wrong class label, exactly as expected in real-world classification tasks.
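One common way to produce a Cannot Determine output is to abstain when the classifier's top class probability falls below a threshold. This is only one possible strategy, sketched here with a hypothetical function and a hypothetical default threshold that would normally be tuned on validation data.

```python
from typing import Dict

CANNOT_DETERMINE = "Cannot Determine"


def predict_with_abstention(probabilities: Dict[str, float], threshold: float = 0.6) -> str:
    """Return the most probable class, or CANNOT_DETERMINE if it is not confident enough.

    `threshold` is a hypothetical tuning parameter; ties keep the first class listed.
    """
    if not probabilities:
        return CANNOT_DETERMINE
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    return label if confidence >= threshold else CANNOT_DETERMINE
```

Raising the threshold trades fewer wrong labels for more abstentions, which matters because an abstention is penalized less than a misclassification.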

With respect to the evaluation of the subtasks, the following methodology is used. The Text-Based Misinformation Detection subtask is evaluated directly with the Rk-statistic. The Text-Based Conspiracy Theories Recognition and Text-Based Combined Misinformation and Conspiracies Detection subtasks are evaluated with a two-step procedure. First, each conspiracy theory is evaluated individually and independently using the Rk-statistic. Then, the Rk-statistic values computed for all conspiracy theories are averaged, and the resulting average is used to compare the results of different teams. Additionally, the results in each conspiracy theory group are evaluated independently, but this evaluation is auxiliary and does not affect the final ranking of the teams.
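The two-step procedure can be sketched as a per-topic evaluation followed by an average. The sketch assumes an arithmetic mean and takes the metric as a parameter; in the task that metric would be the Rk-statistic. The function name and data layout (one label sequence per topic) are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, Hashable, Sequence, Tuple

Metric = Callable[[Sequence[Hashable], Sequence[Hashable]], float]


def mean_per_topic_score(
    y_true: Dict[str, Sequence[Hashable]],
    y_pred: Dict[str, Sequence[Hashable]],
    metric: Metric,
) -> Tuple[float, Dict[str, float]]:
    """Step 1: score every conspiracy topic separately.
    Step 2: average the per-topic scores (assumed: arithmetic mean).

    Returns the averaged score used for ranking and the per-topic scores.
    """
    if set(y_true) != set(y_pred):
        raise ValueError("ground truth and predictions must cover the same topics")
    per_topic = {topic: metric(y_true[topic], y_pred[topic]) for topic in y_true}
    return mean(per_topic.values()), per_topic
```

Returning the per-topic scores alongside the average keeps the auxiliary per-topic results available without recomputing them.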

DISCUSSION AND OUTLOOK

The task itself can be seen as atypical and challenging due to the fairly limited amount of information available to support the tweet classification process. This reflects the real-world conditions in which online social media analysis systems are deployed. Thus, this task is a practical step towards building a usable multi-modal social network analysis system that is able to combine the properties of isolated data sources with inter-source relations. Due to the importance of the use case, we hope to motivate researchers from different fields to present their approaches, thereby performing research that can help society fight malicious manipulation of social networks and threats to society in general. We hope that the FakeNews task can help raise awareness of the topic and also provide an interesting and meaningful use case to researchers interested in this application.

ACKNOWLEDGMENTS

This work was funded by the Norwegian Research Council under contracts #272019 and #303404 and has benefited from the Experimental Infrastructure for Exploration of Exascale Computing (eX3), which is financially supported by the Research Council of Norway under contract 270053. We also acknowledge support from Michael Kreil in the collection of Twitter data.

[1] 2018. Toxic Comment Classification Challenge - Identify and classify toxic online comments. (2018). https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/

[2] Sabri Boughorbel, Fethi Jarray, and Mohammed El-Anbari. 2017. Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PloS one 12, 6 (2017), e0177678.

[3] Nitesh V. Chawla, Nathalie Japkowicz, and Aleksander Kotcz. 2004. Special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter 6, 1 (2004), 1-6.

[4] Quan Do. 2019. Jigsaw Unintended Bias in Toxicity Classification. (2019).

[5] Jan Gorodkin. 2004. Comparing two K-category assignments by a K-category correlation coefficient. Computational Biology and Chemistry 28, 5-6 (2004), 367-374.

[6] Lee Howell. 2013. Digital Wildfires in a Hyperconnected World. https://bit.ly/2GiEF4f. (2013).

[7] Akshay Mungekar, Nikita Parab, Prateek Nima, and Sanchit Pereira. 2019. Quora insincere question classification. National College of Ireland (2019).

[8] Konstantin Pogorelov, Daniel Thilo Schroeder, Petra Filkuková, Stefan Brenner, and Johannes Langguth. 2021. WICO Text: A Labeled Dataset of Conspiracy Theory and 5G-Corona Misinformation Tweets. In Proceedings of the 2021 Workshop on Open Challenges in Online Social Networks. 21-25.