      Detecting COVID-19-Related Conspiracy Theories in Tweets
                    Youri Peskine, Giulio Alfarano, Ismail Harrando, Paolo Papotti, Raphaël Troncy
                                                                      EURECOM, France
                                                               firstName.lastName@eurecom.fr
ABSTRACT
Misinformation in online media has become a major research topic in the last few years, especially during the COVID-19 pandemic. Indeed, false or misleading news about the coronavirus has been characterized as an infodemic¹ by the World Health Organization because of how fast it can spread online. Conspiracy theories represent a considerable vector for spreading misinformation. During this challenge, we tackled the problem of detecting COVID-19-related conspiracy theories in tweets. To perform this task, we used different approaches such as a combination of TFIDF and machine learning algorithms, transformer-based neural networks, and Natural Language Inference. Our best model obtains an MCC score of 0.726 for the main task on the validation set and an MCC score of 0.775 on the test set, making it the best performing method among the challenge competitors.

¹ https://www.who.int/health-topics/infodemic

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval'21, December 13-15 2021, Online

1   INTRODUCTION
The full description of the task is detailed in [7] and more information about the dataset can be found in [8]. Text classification is a problem widely studied in many different fields for various applications such as sentiment analysis or topic modeling. Standard machine learning approaches used in combination with Term Frequency - Inverse Document Frequency (TFIDF) are considered decent baselines [2] for text classification tasks. However, the recent introduction of transformer-based architectures like BERT [1], RoBERTa [3] or DistilBERT [9] has allowed significant improvements in various text-based problems [5].
2   APPROACH
In order to tackle this challenge, we studied three different kinds of approaches. The first uses a combination of TFIDF and machine learning algorithms. The second approach uses Natural Language Inference (NLI) combined with metadata from Wikipedia. The third approach aims at fine-tuning transformer-based models. In the following sections, we discuss the experiments we pursued for each of these approaches. In order to ease reproducibility, we release all our code at https://github.com/D2KLab/mediaeval-fakenews.

2.1   TFIDF-based approach
TFIDF is one of the most widely used feature extraction techniques in the field of text processing, often used in parallel with pre-processing techniques such as tokenization, capitalization and stop word removal, which we also applied to the dataset in question. The derived features are fed to different supervised machine learning methods that allow us to obtain a first baseline. Several algorithms have been tested: Decision Tree, Naive Bayes classifiers (Gaussian and Bernoulli), AdaBoost, Ridge and Logistic Regression. In the case of Task 1, these were used in a multi-class setting. In the multi-label case of Task 2, we used a multi-output classifier with the different methods listed above as estimators: in this scenario, the algorithm instantiates a binary model for each conspiracy theory. Finally, for Task 3, only the strictly tree-based algorithms were tested, since they are the only ones that allow a multi-label and multi-class output.
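As an illustration, the sketch below assembles such a TFIDF baseline with scikit-learn for the multi-label setting of Task 2. The toy tweets, the label columns and the choice of Logistic Regression as base estimator are placeholders, not the exact configuration used for our submissions.

    # Illustrative TFIDF baseline for the multi-label setting of Task 2 (toy data only).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multioutput import MultiOutputClassifier
    from sklearn.pipeline import Pipeline

    tweets = [
        "5g towers are spreading the virus",
        "the vaccine agenda is about population control",
        "stay safe and wash your hands",
        "5g and the new world order are behind all of this",
    ]
    # One binary column per conspiracy theory, e.g. [Harmful Influence, New World Order].
    y_task2 = [[1, 0], [0, 1], [0, 0], [1, 1]]

    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
        # MultiOutputClassifier instantiates one binary estimator per label column.
        ("clf", MultiOutputClassifier(LogisticRegression(max_iter=1000))),
    ])
    pipeline.fit(tweets, y_task2)
    print(pipeline.predict(["the virus comes from 5g radiation"]))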
2.2   NLI-based approach
This approach relies on leveraging pre-trained language models that are then fine-tuned on the task of NLI. Put simply, given two statements (a premise and a hypothesis), these models are trained to classify the logical relationship between them: entailment (agreement or support), contradiction (disagreement), or neutrality (undetermined). Since these models are trained to project statements that share similar opinions into close points in their embedding space, our hypothesis was that we could identify the tweets that support or discuss the same conspiracies using this common embedding space.
   For Task 1, we need to differentiate between the different stances (agreement, discussion, neutrality) regarding conspiracy theories. Therefore, we generate an embedding for all the tweets using the fine-tuned model, and we then classify them using a K-nearest Neighbor classifier, the idea being that tweets sharing similar stances would be embedded close to each other. This approach can also be applied as is for the second task.
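A minimal sketch of this embed-then-classify idea is given below, assuming a sentence encoder fine-tuned on NLI data; the checkpoint name, the toy tweets and the neighborhood size are illustrative assumptions rather than the exact setup of our runs.

    # Embed tweets with an NLI-tuned sentence encoder, then classify stances with KNN.
    from sentence_transformers import SentenceTransformer
    from sklearn.neighbors import KNeighborsClassifier

    encoder = SentenceTransformer("sentence-transformers/nli-mpnet-base-v2")  # example checkpoint

    train_tweets = [
        "of course 5g is frying our brains, the virus is just the cover story",
        "some people claim that 5g causes covid, which is obviously nonsense",
        "just upgraded to a 5g phone, the download speeds are great",
    ]
    train_stances = ["support", "discuss", "non-related"]

    # Tweets that are close in the embedding space are expected to share the same stance.
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(encoder.encode(train_tweets), train_stances)
    print(knn.predict(encoder.encode(["they switched on 5g and suddenly everyone got sick"])))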
   For Task 2, we provided as a premise to the model a definition of each conspiracy theory, relying mostly on the Wikipedia articles describing them, thus classifying a tweet as pertaining to one of the listed conspiracies if the pre-trained model predicts an entailment relationship between the definition of the conspiracy (premise) and the tweet text (hypothesis).
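The following sketch illustrates this definition-as-premise formulation with an off-the-shelf NLI checkpoint; the model name and the heavily shortened definitions are assumptions made for the example, whereas the premises we actually used were longer, Wikipedia-based descriptions.

    # Score each (conspiracy definition, tweet) pair with a pre-trained NLI model.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "roberta-large-mnli"  # example NLI checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    # Look up which output index corresponds to "entailment" instead of hard-coding it.
    entail_id = {label.lower(): i for i, label in model.config.id2label.items()}["entailment"]

    definitions = {  # illustrative, heavily shortened premises
        "Harmful Influence": "The conspiracy theory that 5G radio waves cause or spread COVID-19.",
        "New World Order": "The conspiracy theory that a secret elite exploits the pandemic to impose a world government.",
    }
    tweet = "they switched on the 5g towers and now the whole town is sick"

    for conspiracy, definition in definitions.items():
        inputs = tokenizer(definition, tweet, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(dim=-1)[0]
        # The tweet is tagged with the conspiracy when entailment is the most likely relation.
        print(conspiracy, round(float(probs[entail_id]), 3))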
   Finally, as a combination of both methods, we also used some annotated tweets related to a specific conspiracy as a premise instead of a definition of that conspiracy, and proceeded to classify tweets according to whether an entailment relationship is found.

2.3   Transformer-based approach
Transformer-based models have been performing remarkably well on text classification tasks in the last few years. For our submissions, we used RoBERTa [3] large pre-trained models and COVID-Twitter-BERT (CT-BERT) [6] pre-trained models in different ways to tackle each sub-task. The CT-BERT model is a BERT-large model pre-trained on COVID-related tweets. For this approach, we decided to apply some pre-processing to the input tweets: we replaced all emojis with their textual meaning and removed all the '#' characters.
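A small sketch of this pre-processing step is shown below; it assumes the third-party emoji package and may differ in detail from the normalization used for our actual submissions.

    # Replace emojis with their textual description and strip '#' from hashtags.
    import emoji

    def preprocess(tweet: str) -> str:
        text = emoji.demojize(tweet)  # e.g. "😷" becomes ":face_with_medical_mask:"
        return text.replace("#", "")  # "#plandemic" -> "plandemic"

    print(preprocess("5g is the real virus 😷 #plandemic"))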
   The simplest strategy when working with transformer-based models for the first task is to approach it as a 3-class classification problem. Both RoBERTa and CT-BERT are fine-tuned on the data to perform classification with a weighted Cross Entropy loss function. We approach the second task as a multi-label binary classification problem. Both models are fine-tuned to perform this objective with a weighted Binary Cross Entropy loss function.
   The third and main task can be performed with different strategies. We first try to combine our results of the first two tasks, by labeling the tweets with the stance level detected in Task 1 for the conspiracy theories detected in Task 2. While this approach obtains convincing results, it is not able to deal properly with cases where tweets discuss one conspiracy theory but support another one. An alternative approach is to train both transformer-based models to perform the main task directly. These models are fine-tuned for nine different classification problems with nine Cross Entropy loss functions, one for each conspiracy theory. The final loss is the weighted sum of the nine losses. This training framework is illustrated in Figure 1. The advantage of such an approach is that a single model is trained to perform all the tasks at once, because the first two tasks are just simplifications of the main task.

Figure 1: Training framework of transformer-based models performing Task 3. Each color represents a conspiracy theory.
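The sketch below shows one way the shared-encoder, nine-head framework of Figure 1 could be written in PyTorch. The CT-BERT checkpoint name, the assumption of three stance levels per theory and the toy batch are illustrative, and the actual training loop (batching, loss weighting, scheduling) is omitted.

    # Shared CT-BERT encoder with one classification head per conspiracy theory (cf. Figure 1).
    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    N_CONSPIRACIES, N_STANCES = 9, 3  # nine theories, each with its own (assumed 3-way) stance label

    class MultiHeadClassifier(nn.Module):
        def __init__(self, checkpoint="digitalepidemiologylab/covid-twitter-bert-v2"):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(checkpoint)
            hidden = self.encoder.config.hidden_size
            # One linear head per conspiracy theory, all sharing the same encoder.
            self.heads = nn.ModuleList(nn.Linear(hidden, N_STANCES) for _ in range(N_CONSPIRACIES))

        def forward(self, **inputs):
            pooled = self.encoder(**inputs).last_hidden_state[:, 0]  # [CLS] token representation
            return [head(pooled) for head in self.heads]             # one logit vector per theory

    tokenizer = AutoTokenizer.from_pretrained("digitalepidemiologylab/covid-twitter-bert-v2")
    model = MultiHeadClassifier()
    loss_fns = [nn.CrossEntropyLoss() for _ in range(N_CONSPIRACIES)]  # per-head (weighted) CE

    batch = tokenizer(["5g causes covid", "masks work fine"], padding=True, return_tensors="pt")
    labels = torch.zeros(2, N_CONSPIRACIES, dtype=torch.long)          # placeholder stance labels
    logits = model(**batch)
    # The training loss is the (weighted) sum of the nine per-conspiracy losses.
    total_loss = sum(fn(l, labels[:, i]) for i, (fn, l) in enumerate(zip(loss_fns, logits)))
    total_loss.backward()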
   In our experiments, the weights of all our loss functions are proportional to the inverse frequency of each class or sub-class they are related to, and all our models are trained using the AdamW [4] optimizer.
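For concreteness, one possible implementation of this weighting and optimizer setup is sketched below; the label counts and the hyperparameter values are illustrative assumptions.

    # Inverse-frequency class weights for one head, plus the AdamW optimizer.
    import torch
    from torch.optim import AdamW

    class_counts = torch.tensor([850.0, 90.0, 60.0])   # assumed label counts for one sub-task
    class_weights = class_counts.sum() / class_counts  # rarer classes receive larger weights
    loss_fn = torch.nn.CrossEntropyLoss(weight=class_weights)

    model = torch.nn.Linear(1024, 3)  # stand-in for the fine-tuned transformer
    optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)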
3   RESULTS AND ANALYSIS
Our results for this challenge are presented in Table 1. All the models have been first evaluated on a stratified 5-fold cross-validation set and then evaluated on the test set.
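This evaluation protocol can be reproduced along the lines of the sketch below, which scores a model with the Matthews Correlation Coefficient (MCC) under stratified 5-fold cross-validation; the toy data and the TFIDF pipeline are placeholders for the actual dataset and models.

    # Stratified 5-fold cross-validation scored with MCC (toy placeholder data).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import make_scorer, matthews_corrcoef
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.pipeline import make_pipeline

    tweets = ["5g towers spread the virus, wake up"] * 5 + ["lovely weather in nice today"] * 5
    labels = ["conspiracy"] * 5 + ["other"] * 5

    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(model, tweets, labels, cv=cv, scoring=make_scorer(matthews_corrcoef))
    print(scores.mean())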
Table 1: MCC results for each task, based on a stratified 5-fold cross-validation set and then on the test set

  Task     Model                       Evaluation MCC   Test MCC
  Task 1   TFIDF (SVC)                      0.461         0.498
           NLI transformer                  0.426           X
           RoBERTa                          0.624           X
           CT-BERT                          0.676           X
           RoBERTa-task3                    0.667           X
           CT-BERT-task3                    0.700         0.720
           Ensembling Models                0.716         0.733
  Task 2   TFIDF (Multi output clf)         0.585         0.317
           NLI Wikipedia                    0.310           X
           RoBERTa                          0.731           X
           CT-BERT                          0.780         0.774
           RoBERTa-task3                    0.734           X
           CT-BERT-task3                    0.743         0.719
           Ensembling Models                0.781         0.768
  Task 3   TFIDF (DT)                       0.497         0.186
           RoBERTa-task1+task2              0.675           X
           CT-BERT-task1+task2              0.717         0.775
           RoBERTa-task3                    0.690           X
           CT-BERT-task3                    0.706         0.713
           Ensembling Models                0.726         0.676

   Transformer-based approaches obtained the most competitive results. First, we notice that RoBERTa models are under-performing compared to CT-BERT models on all the tasks. The latter models are more suited to this dataset because it contains tweets that use plenty of COVID-related vocabulary that would not be understood by the former models. It is also worth mentioning that models trained on the main task (Task 3) perform better on Task 1 than their task-specific counterparts.
   For the TFIDF approach, the best performing method for Task 1 is the Support Vector Machine with an MCC score of 0.461. For Task 2, the Decision Tree gave the best result with 0.585 using the multi-output classifier. On Task 3, the best result was given by the Decision Tree with an MCC of 0.497.
   For the NLI approach, the observed results of these methods on cross-validation did not measure up to the fully-trained models. They remain an interesting alternative in the case where annotated data is lacking: a few tweets of each class or, minimally, just the definition of the classes are enough to provide some decent results.
   Looking at conspiracy theories, our worst results on Tasks 2 and 3 concern the Intentional Pandemic theory, even though it is the most represented class in the dataset. Instead, our best results are obtained with the Harmful Influence and New World Order theories. One possible explanation is that both conspiracies can be represented with very specific keywords ('5g' or 'NWO' for example).
   We also performed late fusion ensembling through majority voting with different combinations of transformer-based models to further improve our results on all the tasks. While this gave our best results on the stratified 5-fold cross-validation set, it was less competitive on the test set.
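As an illustration, majority voting over the per-tweet predictions of several models can be implemented in a few lines; the prediction arrays below are made-up placeholders for the outputs of the fine-tuned models.

    # Late-fusion ensembling: majority vote over the predictions of several models.
    import numpy as np

    predictions = np.array([
        [0, 1, 2, 1],  # model A (e.g. a CT-BERT run), one stance id per tweet
        [0, 1, 1, 1],  # model B
        [0, 2, 2, 1],  # model C
    ])
    # For each tweet (column), keep the most frequently predicted label.
    majority = np.apply_along_axis(lambda votes: np.bincount(votes).argmax(), 0, predictions)
    print(majority)  # -> [0 1 2 1]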
4   DISCUSSION AND OUTLOOK
In this paper, we presented three different methods to perform COVID-19-related conspiracy theory detection in tweets. The first approach consists of a combination of standard machine learning algorithms with TFIDF. The second approach uses NLI with transformer-based models and Wikipedia enrichment. The last approach fine-tunes transformer-based models on the given dataset. Our best model obtains an MCC score of 0.775 for the main task on the test set, which outperforms all the other competitors in this challenge by a large margin.

ACKNOWLEDGMENTS
This work has been partially supported by CHIST-ERA within the CIMPLE project (CHIST-ERA-19-XAI-003).


REFERENCES
[1] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova.
    2019. BERT: Pre-training of Deep Bidirectional Transformers for
    Language Understanding. (2019). arXiv:cs.CL/1810.04805
[2] Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana
    Mendu, Laura Barnes, and Donald Brown. 2019. Text Classification
    Algorithms: A Survey. Information 10, 4 (2019).
[3] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi
    Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoy-
    anov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Ap-
    proach. (2019). arXiv:cs.CL/1907.11692
[4] Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay
    Regularization. (2019). arXiv:cs.LG/1711.05101
[5] Shervin Minaee, Nal Kalchbrenner, Erik Cambria, Narjes Nikzad,
    Meysam Chenaghlu, and Jianfeng Gao. 2021. Deep Learning Based
    Text Classification: A Comprehensive Review. (2021). arXiv:cs.CL/2004.03705
[6] Martin Müller, Marcel Salathé, and Per E Kummervold. 2020. COVID-
    Twitter-BERT: A Natural Language Processing Model to Analyse
    COVID-19 Content on Twitter. (2020). arXiv:cs.CL/2005.07503
[7] Konstantin Pogorelov, Daniel Thilo Schroeder, Stefan Brenner, and
    Johannes Langguth. 2021. FakeNews: Corona Virus and Conspiracies
    Multimedia Analysis Task at MediaEval 2021. In Multimedia Bench-
    mark Workshop.
[8] Konstantin Pogorelov, Daniel Thilo Schroeder, Petra Filkukova, and
    Johannes Langguth. 2021. WICO Text: A Labeled Dataset of Conspir-
    acy Theory and 5G-Corona Misinformation Tweets. In Workshop on
    Open Challenges in Online Social Networks (OASIS).
[9] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf.
    2020. DistilBERT, a distilled version of BERT: smaller, faster, cheaper
    and lighter. (2020). arXiv:cs.CL/1910.01108