<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Conspiracy Theory Detection using Transformers with Multi-task and Multilingual Approaches</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leon Zrnić</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Zagreb, Faculty of Electrical Engineering and Computing</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>The COVID-19 pandemic sparked a new age of conspiracy theories in society. This has become an issue, especially since these theories are mixed in with reasonable arguments that criticize the measures taken by governments and their effects. One way to help differentiate these two narratives is to use natural language processing (NLP) models such as Transformers. These working notes detail several approaches to classifying conspiracy and critical narratives and to identifying the key narrative elements present in these texts. We apply these models to two datasets of English and Spanish Telegram messages about the COVID-19 pandemic. Our approaches include using pre-trained BERT and RoBERTa models on monolingual datasets, a multilingual approach in which we either translate the Spanish texts into English or use a multilingual model on non-translated texts, and a multi-task model architecture for the identification of narrative elements. Our results show that BERT pre-trained on COVID-19 tweets performed similarly to RoBERTa in the binary classification task, while RoBERTa worked better in the token classification task. The monolingual English approach yielded better results than the multilingual one, which in turn outperformed the Spanish models. We conclude that transformer models can achieve good results in these classification tasks, making them an easy-to-deploy way to differentiate critical narratives from conspiracy theories.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine learning</kwd>
        <kwd>NLP</kwd>
        <kwd>conspiracy theories</kwd>
        <kwd>transformers</kwd>
        <kwd>multilingual</kwd>
        <kwd>multi-task model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The COVID-19 pandemic has flooded digital platforms with both essential updates and conspiracy
theories. This surge of information creates the challenge of distinguishing between legitimate critical
narratives and harmful conspiracy theories. Critical narratives question established systems using
evidence and reason, while conspiracy theories claim secret plots without substantial proof.
Differentiating between the two is vital for effective public health communication and social stability. It ensures informed
decision-making, as critical narratives drive constructive scrutiny based on evidence, while conspiracy
theories spread misinformation and cause societal divisions.</p>
      <p>One way to differentiate these narratives is through the use of natural language processing (NLP)
models. Such automatic classifiers can speed up the identification of conspiracy narratives, removing
the need for human annotation in the process.</p>
      <p>In these working notes, we describe our approach to creating and training such automatic classifiers
for two separate classification tasks. The first is a binary classification task in which a model
differentiates between conspiracy and critical narratives. The second task is the identification of
the key narrative elements present in conspiracy and critical narrative texts regarding the COVID-19
pandemic.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task descriptions</title>
      <p>
        As part of the PAN at CLEF 2024 Oppositional thinking analysis: Conspiracy theories vs critical thinking
narratives shared task [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], we participated in two different tasks. The first task was a binary classification
task in which participants needed to build models that differentiated between conspiracy and
critical narratives in Telegram messages about the COVID-19 pandemic. The second task was a token
classification task in which models needed to find six different key narrative elements present in the
Telegram messages.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>As mentioned, there are two training datasets: one with English Telegram messages and another with
Spanish Telegram messages, each containing 4000 annotated messages about the COVID-19 pandemic.</p>
      <p>Messages are labeled as either Critical or Conspiracy. Critical messages discuss the pandemic with
reasoned arguments, questioning government measures. Conspiracy messages claim hidden plots aim
to undermine freedom and establish a new world order.</p>
      <p>Each message also has token-level annotations for six narrative elements, as defined by the dataset
authors:
• Agents,
• Facilitators,
• Victims,
• Campaigners,
• Objectives, and
• Negative effects.</p>
      <p>To further analyze the dataset, we examined the differences between texts labeled as conspiracy and
critical in two ways. First, we looked at the lengths of the messages, as shown in Table 1. On average,
conspiratorial messages are almost twice as long as critical messages, and Spanish texts are longer than
English texts.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>This section describes the methodology used in our work. Subsection 4.1 briefly explains the transformer
architecture and the models we used. Subsection 4.2 details the two multilingual approaches we used,
namely text translation and multilingual models. In Subsection 4.3 we explain the multi-task model
architecture that was used for the token classification task, and in Subsection 4.4 we detail Stratified
K-fold cross-validation with which we evaluated the performance of our models during training.</p>
      <sec id="sec-4-1">
        <title>4.1. Transformer models</title>
        <p>
          Transformers [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] are deep learning neural network architectures primarily used in NLP. They leverage
the attention mechanism proposed by [4], allowing the model to focus on crucial parts of a text for
a given task. Transformers can be applied to various tasks, including text summarization, question
answering, and binary and token classification, as explored in our work.
        </p>
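        <p>For reference, the core operation of the architecture in [3] is scaled dot-product attention, in which the queries Q, keys K, and values V are learned projections of the token representations and d<sub>k</sub> is the key dimension: Attention(Q, K, V) = softmax(QK<sup>T</sup> / √d<sub>k</sub>) V.</p>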
        <p>Since their introduction, many transformer architectures have emerged, with BERT (Bidirectional
Encoder Representations from Transformers) [5] being one of the most popular. BERT improves on the
original transformer by using a bidirectional approach, analyzing the entire sentence to determine the
importance of each word, unlike the original architecture, which only considered preceding words.</p>
        <p>The availability of pre-trained models has significantly contributed to the widespread use of
transformers in the NLP community. All pre-trained models were sourced from the HuggingFace transformers
library [6]. The following models were used for the binary classification task (a minimal loading sketch follows this list):
• English:
– bert-base-cased [5],
– bert-large-cased [5],
– roberta-base [7],
– roberta-large [7], and
– digitalepidemiologylab/covid-twitter-bert-v2 [8] (referred to as ct-bert);
• Spanish:
– dccuchile/bert-base-spanish-wwm-cased [9] (referred to as bert-spanish) and
– PlanTL-GOB-ES/roberta-large-bne [10] (referred to as roberta-spanish).</p>
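        <p>As an illustration, the minimal sketch below shows how one of these pre-trained checkpoints can be loaded for the binary classification task; the checkpoint can be swapped for any of the models listed above, and the example message is a placeholder.</p>
        <preformat>
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Any of the checkpoints listed above can be substituted here.
checkpoint = "digitalepidemiologylab/covid-twitter-bert-v2"   # ct-bert

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=2,   # Critical vs. Conspiracy
)

# Tokenize a placeholder Telegram message and obtain class logits.
inputs = tokenizer(
    "Example Telegram message about the pandemic.",
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
logits = model(**inputs).logits   # shape: (1, 2)
        </preformat>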
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Multilingual approach</title>
        <sec id="sec-4-2-1">
          <title>4.2.1. Text translation</title>
          <p>One way to utilize all the available data is to translate the data from one language into the other. This
gives us twice the amount of data compared to the monolingual approach, in which we use only half
of all the data. To this end, we used the translate Python package (https://pypi.org/project/translate/).
From this package, we used the MyMemory [11] translation provider, which offers several different machine translation
models together with a linguistic database.</p>
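          <p>A minimal sketch of how the Spanish messages could be translated with this package is shown below; it assumes the package’s default MyMemory provider and a placeholder example sentence, and exact option names may vary between package versions.</p>
          <preformat>
from translate import Translator

# MyMemory is the package's default translation provider.
translator = Translator(from_lang="es", to_lang="en")

spanish_texts = ["Las medidas del gobierno deben ser cuestionadas."]  # placeholder
english_texts = [translator.translate(text) for text in spanish_texts]
          </preformat>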
          <p>Since English is a high-resource language, we decided that for this approach we would use monolingual
English models that worked on a dataset that combined the original English texts and the Spanish texts
translated into English.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Multilingual models</title>
          <p>Using multilingual models can address the issue of utilizing only half the available data. An example is
the xlm-roberta-base [12] model, which was “pre-trained on 2.5TB of CommonCrawl data in 100
languages”.</p>
          <p>Multilingual models leverage shared learning across languages, allowing for the use of double the
data compared to monolingual models. English, a high-resource language, can enhance the classification
of Spanish texts, improving performance through shared representations and transfer learning, which
also aids generalization.</p>
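          <p>A brief sketch of this setup is shown below: a single xlm-roberta-base classifier is loaded and fed a mixed English–Spanish batch through its shared subword vocabulary; the example sentences are placeholders.</p>
          <preformat>
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# English and Spanish messages share one training set; the multilingual
# subword vocabulary covers both languages (placeholder sentences).
mixed_batch = tokenizer(
    ["The measures only serve hidden interests.",
     "Las medidas solo sirven a intereses ocultos."],
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
logits = model(**mixed_batch).logits   # shape: (2, 2)
          </preformat>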
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Multi-task model</title>
        <p>Multi-task Learning (MTL) models [13] are trained on related tasks to create representations that can
handle multiple objectives. They use two main architectures: hard parameter sharing and soft parameter
sharing.</p>
        <p>In hard parameter sharing, the model has one main pipeline of shared layers while keeping task-specific
layers separate for each task. Figure 1 showcases this MTL architecture. This approach
reduces overfitting and enables knowledge transfer between tasks. For example, representations learned
from a binary classification task can aid in token classification.</p>
        <p>The second way of building an MTL model is soft parameter sharing. Figure 2 shows the structure
of one such model. Soft parameter sharing involves separate models for each task, with regularized
layers to keep parameters similar. [14] states that there are different ways of regularizing these models,
such as the L2 distance [15] or the trace norm [16].</p>
        <p>We employ an MTL model for token classification using a hard parameter-sharing transformer model.
It shares a common hidden-layer backbone with six separate classification heads for the different narrative
elements. Different pre-trained transformers serve as backbones for the two datasets. Figure 3 visualizes
the model architecture used (see also the multi-task transformer tutorial at https://towardsdatascience.com/how-to-create-and-train-a-multi-task-transformer-model-18c54a146240).</p>
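        <p>A minimal sketch of this hard parameter-sharing architecture is given below, assuming a PyTorch implementation on top of a HuggingFace backbone; the class name, element keys, and label counts are ours and are simplified for illustration.</p>
        <preformat>
from torch import nn
from transformers import AutoModel

# The six narrative elements annotated in the dataset.
NARRATIVE_ELEMENTS = ["agent", "facilitator", "victim",
                      "campaigner", "objective", "negative_effect"]

class MultiTaskTokenClassifier(nn.Module):
    """Shared transformer backbone with one token-classification head per element."""

    def __init__(self, checkpoint="roberta-large", num_labels_per_head=3):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(checkpoint)   # shared layers
        hidden = self.backbone.config.hidden_size
        # One independent linear head per narrative element (e.g. three BIO labels each).
        self.heads = nn.ModuleDict({
            name: nn.Linear(hidden, num_labels_per_head)
            for name in NARRATIVE_ELEMENTS
        })

    def forward(self, input_ids, attention_mask):
        token_states = self.backbone(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                                      # (batch, seq_len, hidden)
        # Each head produces per-token logits for its narrative element.
        return {name: head(token_states) for name, head in self.heads.items()}
        </preformat>
        <p>During training, the six per-head token-classification losses can simply be summed so that gradients from every task flow back through the shared backbone.</p>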
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Stratified K-fold cross-validation</title>
        <p>Since only the training dataset was available for most of our work, we used an artificial test dataset for
evaluation during training. We created this dataset using Stratified K-fold cross-validation, which splits
the training set into K equal-sized subsets while preserving the class label ratio in each fold. In each
epoch, the model is trained K times, using K − 1 folds for training and the remaining fold for validation.
The model’s performance is then averaged across all folds and epochs. This method, implemented with
Scikit-learn [17], allowed us to obtain performance scores without the official test dataset. The best
models were ultimately evaluated on the official test dataset at the competition’s end.</p>
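        <p>A sketch of this evaluation procedure with Scikit-learn’s StratifiedKFold is shown below (K = 5, as noted in Section 5); train_and_evaluate is a hypothetical placeholder for the model-specific fine-tuning and scoring code.</p>
        <preformat>
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(texts, labels, n_splits=5, seed=42):
    """Average validation score over stratified folds (class ratios preserved per fold)."""
    texts, labels = np.asarray(texts), np.asarray(labels)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in skf.split(texts, labels):
        # train_and_evaluate is a hypothetical placeholder for fine-tuning a
        # transformer on the training folds and scoring it on the held-out fold.
        scores.append(train_and_evaluate(texts[train_idx], labels[train_idx],
                                         texts[val_idx], labels[val_idx]))
    return float(np.mean(scores))
        </preformat>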
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental setup</title>
      <p>In this section, we present the technical details regarding our setup. For both tasks, we used Stratified
5-fold cross-validation. The hyperparameters for the transformers were 10 epochs, a learning rate of
2 × 10<sup>−5</sup>, a batch size of 32, a weight decay of 0.01, and a warmup ratio of 0.1. We
also increased the maximum sequence length from the base length of 256 to 512. The models were
trained on an Nvidia A100 graphics card with 40 GB of memory. All of the models and their tokenizers
were from the HuggingFace [6] library.</p>
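      <p>The sketch below shows how these hyperparameters could map onto the HuggingFace Trainer API under our interpretation; the warmup value is treated as a warmup ratio, and the output path, model, and dataset objects are placeholders.</p>
      <preformat>
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",            # placeholder output path
    num_train_epochs=10,
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    weight_decay=0.01,
    warmup_ratio=0.1,                  # interpreted as 0.1 of the total training steps
)

trainer = Trainer(
    model=model,                       # any of the pre-trained models from Section 4.1
    args=training_args,
    train_dataset=train_fold,          # placeholder: tokenized training folds
    eval_dataset=validation_fold,      # placeholder: tokenized validation fold
)
trainer.train()
      </preformat>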
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>Here we present the results from our different approaches to the two classification tasks. In Subsection 6.1
we detail the results obtained while self-validating on the training dataset. Subsection 6.2 shows the results
of the models we submitted for evaluation on the official test dataset.</p>
      <sec id="sec-6-1">
        <title>6.1. Experimental results</title>
        <sec id="sec-6-1-1">
          <title>6.1.1. Task 1: Binary classification</title>
          <p>During our own evaluation of the binary classification models, ct-bert performed on par with the RoBERTa models on the English dataset. The monolingual English models achieved the best results, followed by the multilingual xlm-roberta-base model and then the Spanish models, while the translation-based approach did not achieve good results.</p>
        </sec>
        <sec id="sec-6-1-2">
          <title>6.1.2. Task 2: Token classification</title>
          <p>Here we detail the results that the token classification models achieved during our own evaluation. Table 5
and Table 6 show the results for the English and Spanish models, respectively. The roberta-large model
achieved the best results on the English dataset, while roberta-spanish was the best model on the
Spanish dataset.</p>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Official results</title>
        <p>This subsection contains the tables with the results we achieved on the official test dataset. Table 7
contains the binary classification results and Table 8 the results for the token classification task.</p>
        <p>When comparing these results to the other teams competing at PAN, we placed sixth in the English
variant of the first task and ninth in the Spanish variant. In the second task, we placed second in both
the English and Spanish variants.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In our work, we explored the application of transformer models to the tasks of binary classification of
conspiracy and critical narratives and the token classification of key narrative elements in the dataset.
Our results show that transformer models such as BERT and RoBERTa are highly effective in both binary
and token classification tasks in the domain of COVID-19 messages. In the binary classification task, the
English transformers performed better than the Spanish ones. There are many possible reasons for this,
such as the quality and size of the pre-training data, the pre-training approaches, and the differences
between English and Spanish. The translation approach did not succeed in achieving good results.
We attribute this to the poor translation capabilities of MyMemory. Further work could use different
translation methods, such as transformer-based machine translation [18, 19]. On the other hand, the
multilingual transformer model had good results when compared to the monolingual approaches. In
the token classification task, the best-performing English and Spanish models had the same F1 score.
However, there were differences in the performance of the models when looking at the F1 scores for
each annotation. For example, the Spanish models were better at detecting the Negative effect and Victim
annotations. Further work should explore the differences between English and Spanish conspiracy
theories on a semantic level. Multilingual models could perhaps leverage these differences to achieve
better results than the monolingual models.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ayele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moskovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rizwan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Yimam</surname>
          </string-name>
          , E. Zangerle, Overview of PAN 2024:
          <article-title>Multiauthor writing style analysis, multilingual text detoxification, oppositional thinking analysis, and generative AI authorship verification - condensed lab overview, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction</article-title>
          .
          <source>Proceedings of the Fifteenth International Conference of the CLEF Association CLEF-2024</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bonet-Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <article-title>Overview of the oppositional thinking analysis PAN task at CLEF 2024</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.),
          <source>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, 2016. arXiv:1409.0473.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, 2018. URL: https://arxiv.org/abs/1810.04805. doi:10.48550/ARXIV.1810.04805.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, A. M. Rush, Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://www.aclweb.org/anthology/2020.emnlp-demos.6.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019. arXiv:1907.11692.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] M. Müller, M. Salathé, P. E. Kummervold, COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter, Frontiers in Artificial Intelligence 6 (2023). URL: https://www.frontiersin.org/articles/10.3389/frai.2023.1023281. doi:10.3389/frai.2023.1023281.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained BERT model and evaluation data, in: PML4DC at ICLR 2020, 2020.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] A. G. Fandiño, J. A. Estapé, M. Pàmies, J. L. Palao, J. S. Ocampo, C. P. Carrino, C. A. Oller, C. R. Penagos, A. G. Agirre, M. Villegas, MarIA: Spanish language models, Procesamiento del Lenguaje Natural 68 (2022). URL: https://upcommons.upc.edu/handle/2117/367156. doi:10.26342/2022-68-3.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] M. Trombetti, MyMemory: creating the world’s largest translation memory, in: Proceedings of Translating and the Computer 31, Aslib, London, UK, 2009. URL: https://aclanthology.org/2009.tc-1.12.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 8440–8451. URL: https://aclanthology.org/2020.acl-main.747. doi:10.18653/v1/2020.acl-main.747.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] R. Caruana, Multitask learning: A knowledge-based source of inductive bias, in: International Conference on Machine Learning, 1993. URL: https://api.semanticscholar.org/CorpusID:18522085.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] S. Ruder, An overview of multi-task learning in deep neural networks, 2017. arXiv:1706.05098.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] L. Duong, T. Cohn, S. Bird, P. Cook, Low resource dependency parsing: Cross-lingual parameter sharing in a neural network parser, in: C. Zong, M. Strube (Eds.), Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Association for Computational Linguistics, Beijing, China, 2015, pp. 845–850. URL: https://aclanthology.org/P15-2139. doi:10.3115/v1/P15-2139.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Y. Yang, T. M. Hospedales, Trace norm regularised deep multi-task learning, 2017. arXiv:1606.04038.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] T. Tian, C. Song, J. Ting, H. Huang, A French-to-English machine translation model using transformer network, Procedia Computer Science 199 (2022) 1438–1443. URL: https://www.sciencedirect.com/science/article/pii/S1877050922001831. doi:10.1016/j.procs.2022.01.182. The 8th International Conference on Information Technology and Quantitative Management (ITQM 2020 and 2021): Developing Global Digital Economy after COVID-19.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] T. J. Sefara, S. G. Zwane, N. Gama, H. Sibisi, P. N. Senoamadi, V. Marivate, Transformer-based machine translation for low-resourced languages embedded with language identification, in: 2021 Conference on Information Communications Technology and Society (ICTAS), 2021, pp. 127–132. doi:10.1109/ICTAS50802.2021.9394996.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>