=Paper=
{{Paper
|id=Vol-3181/paper65
|storemode=property
|title=Detecting COVID-19-Related Conspiracy Theories in Tweets
|pdfUrl=https://ceur-ws.org/Vol-3181/paper65.pdf
|volume=Vol-3181
|authors=Youri Peskine,Giulio Alfarano,Ismail Harrando,Paolo Papotti,Raphaël Troncy
|dblpUrl=https://dblp.org/rec/conf/mediaeval/PeskineAHPT21
}}
==Detecting COVID-19-Related Conspiracy Theories in Tweets==
Detecting COVID-19-Related Conspiracy Theories in Tweets

Youri Peskine, Giulio Alfarano, Ismail Harrando, Paolo Papotti, Raphaël Troncy
EURECOM, France
firstName.lastName@eurecom.fr

ABSTRACT

Misinformation in online media has become a major research topic in the last few years, especially during the COVID-19 pandemic. Indeed, false or misleading news about the coronavirus has been characterized as an "infodemic" by the World Health Organization (https://www.who.int/health-topics/infodemic) because of how fast it can spread online. Conspiracy theories are a considerable vector for spreading misinformation. In this challenge, we tackled the problem of detecting COVID-19-related conspiracy theories in tweets. To perform this task, we used different approaches: a combination of TF-IDF and machine learning algorithms, transformer-based neural networks, and Natural Language Inference. Our best model obtains an MCC score of 0.726 for the main task on the validation set and an MCC score of 0.775 on the test set, making it the best performing method among the challenge competitors.

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION

The full description of the task is detailed in [7] and more information about the dataset can be found in [8]. Text classification is a problem widely studied in many different fields for various applications such as sentiment analysis or topic modeling. Standard machine-learning approaches used in combination with Term Frequency-Inverse Document Frequency (TF-IDF) are considered decent baselines [2] for text classification tasks. However, the recent introduction of transformer-based architectures such as BERT [1], RoBERTa [3] and DistilBERT [9] has brought significant improvements on various text-based problems [5].

2 APPROACH

To tackle this challenge, we studied three different kinds of approaches. The first uses a combination of TF-IDF and machine learning algorithms. The second uses Natural Language Inference (NLI) combined with metadata from Wikipedia. The third fine-tunes transformer-based models. In the following sections, we discuss the experiments we pursued for each of these approaches. To ease reproducibility, we release all our code at https://github.com/D2KLab/mediaeval-fakenews.

2.1 TFIDF-based approach

TF-IDF is one of the most widely used feature-extraction techniques in text processing, often used together with pre-processing techniques such as tokenization, lowercasing and stop-word removal, which we also applied to the dataset in question. The derived features are fed to different supervised machine learning methods that allow us to obtain a first baseline. Several algorithms have been tested: Decision Tree, Naive Bayes classifiers (Gaussian and Bernoulli), AdaBoost, Ridge and Logistic Regression. For Task 1, these were used in a multi-class setting. In the multi-label case of Task 2, we used a multi-output classifier with the different methods listed above as estimators: in this scenario, the algorithm instantiates a binary model for each conspiracy theory. Finally, for Task 3, only the strictly tree-based algorithms were tested, since they are the only ones that allow a multi-label and multi-class output.

2.2 NLI-based approach

This approach leverages pre-trained language models that are then fine-tuned on the task of NLI. Put simply, given two statements (a premise and a hypothesis), these models are trained to classify the logical relationship between them: entailment (agreement or support), contradiction (disagreement), or neutrality (undetermined). Since these models are trained to project statements that share similar opinions onto close points in their embedding space, our hypothesis was that we could identify the tweets that support or discuss the same conspiracies using this common embedding space.

For Task 1, we need to differentiate between the different stances (agreement, discussion, neutrality) regarding conspiracy theories. We therefore generate an embedding for all the tweets using the fine-tuned model, and we then classify them using a K-Nearest Neighbors classifier, the idea being that tweets sharing similar stances are embedded close to each other. This approach can also be applied as-is to the second task.

For Task 2, we provided the model with a definition of each conspiracy theory as a premise, relying mostly on the Wikipedia articles describing them, thus classifying a tweet as pertaining to one of the listed conspiracies if the pre-trained model predicts an entailment relationship between the definition of the conspiracy (premise) and the tweet text (hypothesis).

Finally, as a combination of both methods, we also used some annotated tweets related to a specific conspiracy as a premise instead of a definition of the conspiracy, and classified tweets by whether an entailment relationship is found.

2.3 Transformer-based approach

Transformer-based models have been performing remarkably well on text classification tasks in the last few years. For our submissions, we used RoBERTa [3] large pre-trained models and COVID-Twitter-BERT (CT-BERT) [6] pre-trained models in different ways to tackle each sub-task. The CT-BERT model is a BERT-large model pre-trained on COVID-related tweets. For this approach, we applied some pre-processing to the input tweets: we replaced all emojis with their textual meaning, and removed all '#' characters.
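The TF-IDF baseline of Section 2.1 can be sketched as follows. This is a minimal illustration using scikit-learn with toy tweets standing in for the challenge data; the labels, hyper-parameters, and two-theory label matrix are assumptions for demonstration only (the authors' actual code is in their repository linked above).

```python
# Minimal sketch of a Section 2.1-style baseline: TF-IDF features fed to
# classical classifiers. Toy tweets and labels stand in for the real data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

tweets = [
    "5g towers are spreading the virus",    # supports a conspiracy
    "new vaccine trial results published",  # neutral
    "they planned the pandemic years ago",  # supports a conspiracy
    "masks recommended in indoor spaces",   # neutral
]

# Task 1 style: one multi-class label per tweet (here binarized for brevity).
y_multiclass = [1, 0, 1, 0]

# Pre-processing (lowercasing, stop-word removal) is folded into the vectorizer.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(tweets)

clf = LogisticRegression().fit(X, y_multiclass)
print(clf.predict(vectorizer.transform(["5g caused the pandemic"])))

# Task 2 style: multi-label output. MultiOutputClassifier instantiates one
# binary model per conspiracy theory, as described in the paper.
y_multilabel = np.array([[1, 0], [0, 0], [0, 1], [0, 0]])
multi = MultiOutputClassifier(LogisticRegression()).fit(X, y_multilabel)
print(multi.predict(X).shape)  # (4, 2): one column per theory
```

Swapping `LogisticRegression` for `DecisionTreeClassifier`, `AdaBoostClassifier`, etc. reproduces the algorithm sweep described above; only the tree-based estimators support the joint multi-label, multi-class output needed for Task 3.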
MediaEval'21, December 13-15 2021, Online

Table 1: MCC results for each task, based on a stratified 5-fold cross-validation set and then on the test set (X: no test-set result)

Task 1
  Model                      Evaluation MCC   Test MCC
  TFIDF (SVC)                0.461            0.498
  NLI transformer            0.426            X
  RoBERTa                    0.624            X
  CT-BERT                    0.676            X
  RoBERTa-task3              0.667            X
  CT-BERT-task3              0.700            0.720
  Ensembling Models          0.716            0.733

Task 2
  Model                      Evaluation MCC   Test MCC
  TFIDF (Multi output clf)   0.585            0.317
  NLI Wikipedia              0.310            X
  RoBERTa                    0.731            X
  CT-BERT                    0.780            0.774
  RoBERTa-task3              0.734            X
  CT-BERT-task3              0.743            0.719
  Ensembling Models          0.781            0.768

Task 3
  Model                      Evaluation MCC   Test MCC
  TFIDF (DT)                 0.497            0.186
  RoBERTa-task1+task2        0.675            X
  CT-BERT-task1+task2        0.717            0.775
  RoBERTa-task3              0.690            X
  CT-BERT-task3              0.706            0.713
  Ensembling Models          0.726            0.676

[Figure 1: Training framework of transformer-based models performing Task 3. Each color represents a conspiracy theory.]

The simplest strategy when working with transformer-based models for the first task is to approach it as a 3-class classification problem. Both RoBERTa and CT-BERT are fine-tuned on the data to perform classification with a weighted Cross-Entropy loss function. We approach the second task as a multi-label binary classification problem; both models are fine-tuned for this objective with a weighted Binary Cross-Entropy loss function.

The third and main task can be performed with different strategies. We first try to combine our results from the first two tasks, labeling the tweets with the stance level detected in Task 1 for the conspiracy theories detected in Task 2. While this approach obtains convincing results, it cannot deal properly with cases where tweets discuss one conspiracy theory but support another. An alternative approach is to train both transformer-based models to perform the main task directly. These models are fine-tuned for nine different classification problems with nine Cross-Entropy loss functions, one for each conspiracy theory. The final loss is the weighted sum of the nine losses. This training framework is illustrated in Figure 1. The advantage of such an approach is that a single model is trained to perform all the tasks at once, since the first two tasks are simplifications of the main task.

In our experiments, the weights of all our loss functions are proportional to the inverse frequency of the class or sub-class they relate to, and all our models are trained using the AdamW optimizer [4]. We also performed late-fusion ensembling through majority voting over different combinations of transformer-based models to further improve our results on all the tasks. While this gave our best results on the stratified 5-fold cross-validation set, it was less competitive on the test set.

3 RESULTS AND ANALYSIS

Our results for this challenge are presented in Table 1. All models were first evaluated on a stratified 5-fold cross-validation set and then evaluated on the test set.

Transformer-based approaches obtained the most competitive results. First, we notice that RoBERTa models under-perform compared to CT-BERT models on all tasks. The latter models are better suited to this dataset, which contains tweets full of COVID-related vocabulary that the former models would not understand. It is also worth mentioning that models trained on the main task (Task 3) perform better on Task 1 than their task-specific counterparts.

For the TF-IDF approach, the best performing method for Task 1 is the Support Vector Machine, with an MCC score of 0.461. For Task 2, the Decision Tree gave the best result, 0.585, using the multi-output classifier. On Task 3, the best result was also given by the Decision Tree, with an MCC of 0.497.

For the NLI approach, the results observed on cross-validation did not measure up to the fully-trained models. These methods remain an interesting alternative when annotated data is lacking: a few tweets of each class or, minimally, just the definitions of the classes are enough to provide decent results.

Looking at individual conspiracy theories, our worst results on Tasks 2 and 3 concern the Intentional Pandemic theory, even though it is the most represented class in the dataset. Our best results are obtained on the Harmful Influence and New World Order theories. One possible explanation is that both conspiracies can be represented with very specific keywords ('5g' or 'NWO', for example).

4 DISCUSSION AND OUTLOOK

In this paper, we presented three different methods for detecting COVID-19-related conspiracy theories in tweets. The first approach consists of a combination of standard machine learning algorithms with TF-IDF. The second approach uses NLI with transformer-based models and Wikipedia enrichment. The last approach fine-tunes transformer-based models on the given dataset. Our best model obtains an MCC score of 0.775 for the main task on the test set, which outperforms all the other competitors in this challenge by a large margin.

ACKNOWLEDGMENTS

This work has been partially supported by CHIST-ERA within the CIMPLE project (CHIST-ERA-19-XAI-003).

REFERENCES

[1] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:cs.CL/1810.04805
[2] Kowsari, Jafari Meimandi, Heidarysafa, Mendu, Barnes, and Brown. 2019. Text Classification Algorithms: A Survey. Information 10, 4 (2019).
[3] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:cs.CL/1907.11692
[4] Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. arXiv:cs.LG/1711.05101
[5] Shervin Minaee, Nal Kalchbrenner, Erik Cambria, Narjes Nikzad, Meysam Chenaghlu, and Jianfeng Gao. 2021. Deep Learning Based Text Classification: A Comprehensive Review. arXiv:cs.CL/2004.03705
[6] Martin Müller, Marcel Salathé, and Per E Kummervold. 2020. COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter. arXiv:cs.CL/2005.07503
[7] Konstantin Pogorelov, Daniel Thilo Schroeder, Stefan Brenner, and Johannes Langguth. 2021. FakeNews: Corona Virus and Conspiracies Multimedia Analysis Task at MediaEval 2021. In Multimedia Benchmark Workshop.
[8] Konstantin Pogorelov, Daniel Thilo Schroeder, Petra Filkukova, and Johannes Langguth. 2021. WICO Text: A Labeled Dataset of Conspiracy Theory and 5G-Corona Misinformation Tweets. In Workshop on Open Challenges in Online Social Networks (OASIS).
[9] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2020. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:cs.CL/1910.01108
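As a numerical illustration of the Task 3 training objective of Section 2.3 (Figure 1): the model emits one set of logits per conspiracy theory (nine heads), each scored with a class-weighted cross-entropy, and the final loss sums the nine per-theory losses. The NumPy sketch below is an illustration only; the shapes, class counts, and exact weighting scheme are assumptions, not the authors' implementation.

```python
# Sketch of the nine-head, weighted cross-entropy objective (Figure 1).
import numpy as np

rng = np.random.default_rng(0)
N_THEORIES, N_CLASSES, BATCH = 9, 3, 4  # 9 heads, 3 stance classes

def cross_entropy(logits, targets, class_weights):
    """Class-weighted cross-entropy, averaged over the batch."""
    # Numerically stable log-softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return float((class_weights[targets] * nll).mean())

# One set of logits and one target per conspiracy theory (random stand-ins).
logits = rng.normal(size=(N_THEORIES, BATCH, N_CLASSES))
targets = rng.integers(0, N_CLASSES, size=(N_THEORIES, BATCH))

# Weights proportional to inverse class frequency, as stated in the paper;
# the class counts here are made up for illustration.
class_counts = np.array([50.0, 30.0, 20.0])
class_weights = class_counts.sum() / (N_CLASSES * class_counts)

# Final loss: sum of the nine per-theory weighted losses.
total_loss = sum(
    cross_entropy(logits[t], targets[t], class_weights)
    for t in range(N_THEORIES)
)
print(round(total_loss, 3))
```

In training, `total_loss` would be back-propagated through the shared transformer encoder, so a single model learns all nine per-theory classification problems at once.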