<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Conference and Labs of the Evaluation Forum, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Overview of the Oppositional Thinking Analysis PAN Task at CLEF 2024</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Damir Korenčić</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Berta Chulvi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xavier Bonet-Casals</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mariona Taulé</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francisco Rangel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ruđer Bošković Institute</institution>
          ,
          <country country="HR">Croatia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Symanto Research</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universitat Politècnica de València</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Universitat de Barcelona</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Universitat de València</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>ValgrAI - Valencian Graduate School and Research Network of Artificial Intelligence</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>0</volume>
      <fpage>9</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>This paper describes the Oppositional Thinking Analysis task at CLEF 2024. The task focuses on analyzing conspiracy theories and critical thinking narratives, and is comprised of two subtasks. Subtask 1 is a binary classification task aimed at distinguishing between critical and conspiracy texts. Subtask 2 is a token classification task aimed at detecting text spans corresponding to the key elements of oppositional (critical and conspiracy) narratives. The subtasks are based on a dataset of English and Spanish COVID-19-related texts obtained from oppositional Telegram channels, and labeled using a topic-agnostic annotation scheme [1]. A total of 82 teams participated in the challenge, and 17 teams published working notes papers with system descriptions. The participants employed a range of NLP methods and pushed the state-of-the-art performance on both subtasks beyond the performance of the strong baseline systems [1] that were provided.</p>
      </abstract>
      <kwd-group>
        <kwd>Conspiracy Theories</kwd>
        <kwd>Oppositional Thinking</kwd>
        <kwd>Computational Social Science</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Text Classification</kwd>
        <kwd>Sequence Labeling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The first edition of the Oppositional Thinking Task, held at CLEF 2024, focused on distinguishing
automatically between conspiratorial narratives and critical narratives that do not convey a conspiratorial
mentality. Conspiracy Theories (CTs) are causal explanations of significant events that present them
as the result of covert plots orchestrated by secret, powerful, and malicious groups [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Since conspiracy
narratives tend to convey a critical vision of mainstream policies, a common mistake, especially in the
middle of a global crisis such as a pandemic or a war, is to categorize every critical narrative against the
official discourse as conspiratorial. Criticism and free discussion are key values in democratic societies;
however, conspiracy narratives severely weaken democratic systems because they place the ultimate
agent of the crisis outside the control of our systems of governance. As a result, it is important not to
confuse critical and conspiracy narratives.
      </p>
      <p>
        Interest in automating the critical-conspiracy distinction was recently highlighted by
Korenčić et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], who argued that, if models monitoring social media messages do not differentiate
between critical and conspiratorial thinking, there is a high risk of pushing people toward conspiracy
communities. The sociopsychological basis of this process lies in Social Identity Theory (SIT), which
has been a cornerstone in understanding group processes and intergroup relations
since its inception in the early 1970s [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This theory posits that individuals derive a part of their
self-concept from their membership in social groups, which influences their behavior and attitudes
towards in-group and out-group members [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. As a result, being labeled a conspiracist when you
are not can threaten your social identity. Once a person becomes the target of this accusation, one way to
repair the stigmatization is to join conspiracist groups that provide the social support needed to recover
a positive social identity. This process is not unusual. As several authors from the field of social sciences
suggest, a fully-fledged conspiratorial worldview is the final step in a progressive “spiritual journey”
that sets out by questioning social and political orthodoxies [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ]. Accordingly, the distinction between
conspiratorial and critical thinking is crucial for automated content moderation: without it, there is
a significant risk of driving individuals towards conspiracy communities. Specifically, mislabeling a
text as conspiratorial when it merely challenges mainstream perspectives could inadvertently steer
individuals who are simply questioning into the arms of conspiracy groups.
      </p>
      <p>
        Furthermore, in the area of computational linguistics, Korenčić et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] have shown that conspiracist
narratives and critical thinking differ in their potential social effect on public opinion discourse,
with the former being significantly more associated with violent words and expressions of anger. In
their corpus, the authors also labelled the key elements of oppositional narratives (goals, effects,
agents, and the two groups in conflict: facilitators of government decisions and campaigners against
them), demonstrating that a greater level of intergroup conflict between facilitators and campaigners is
associated especially with conspiracy narratives and correlates with a greater use of violent words and
emotional manifestations of anger.
      </p>
      <p>
        Based on this recent research [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the present task addresses two new challenges for the NLP research
community: (1) to distinguish the conspiracy narrative from other oppositional narratives that do
not express a conspiracy mentality (i.e., critical thinking); and (2) to identify the key elements of the
oppositional narrative in online messages. As demonstrated [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], predictive NLP systems for these
two tasks have value for computational social scientists who are interested in analyzing oppositional
narratives. Therefore, it is of interest to push the performance on these tasks beyond the previously
proposed NLP approaches [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This PAN task has attempted to achieve this goal.
      </p>
      <p>
        For the two tasks described above, we provide the XAI-Disinfodemic corpus [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a multilingual (English
and Spanish) corpus consisting of 10,000 annotated Telegram messages that focus on oppositional
narratives related to the COVID-19 pandemic. For each language, a training set of 4,000 messages has
been provided to the participants, while the outputs of the systems were computed and evaluated using
the testing set consisting of 1,000 messages. These messages contain oppositional non-mainstream
views on the COVID-19 pandemic, classified into two categories: critical and conspiratorial messages.
Messages have been annotated at the span level with a topic-agnostic schema that distinguishes the
key elements of an oppositional narrative: objectives, negative effects, agents, victims, and facilitators
and campaigners (the two groups in conflict). We also provide strong baseline solutions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The train
and test splits of the dataset, as well as the code of the baseline systems, are freely available at https://github.com/dkorenci/pan-clef-2024-oppositional.
      </p>
      <p>The following sections of this paper describe the key aspects of this task. Section 2 summarizes
the related work on the classification of conspiratorial narratives in NLP and on the span detection
of different elements of these narratives. Section 3 presents the dataset used in this task. Section 4
describes the two subtasks proposed above, as well as evaluation measures and baseline solutions.
Section 5 presents the systems used by the participants. Section 6 analyzes the results and the systems
of the participants. Finally, Section 7 contains conclusions and directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>A recent literature review by Mahl et al. [9] indicates a rising interest in conspiracy theories within
online environments, particularly within the Social Sciences. Approximately 80% of the research focuses
on written content, with about a third using automated content analysis methods. In this section, we
review research from the NLP area that is relevant to the present tasks.</p>
      <sec id="sec-2-1">
        <sec id="sec-2-1-1">
          <title>2.1. Conspiracy detection in NLP</title>
          <p>The COVID-19 pandemic has been one of the topics that has garnered the most attention in the study of
conspiracy narratives since 2020. The pandemic has been fertile ground for the expansion of conspiracy
theories. Among the works oriented in this direction, Uscinski et al. [10] collected a dataset of letters
sent to a mainstream US publication and labeled them as either containing a conspiracy or not. Another
available corpus dedicated to conspiracy theories is the LOCO corpus [11], containing 96,743 texts from a
diverse collection of mainstream and conspiracy outlets; the texts are enriched with website metadata
and auto-generated topics. Providing more detail about the content of conspiracy theories, the COCO
corpus [12] comprises 3,495 texts promoting COVID-19 conspiracies, manually annotated with a
fine-grained classification scheme encompassing conspiracy sub-topics.</p>
          <p>The problem has often been approached as a binary classification task with the goal of distinguishing
conspiratorial from non-conspiratorial text. A good example is the two recent MediaEval challenges,
which focused on the classification of conspiracy texts [13, 14] and led to a number of approaches
demonstrating that the state-of-the-art architecture is a multi-task classifier [15, 16, 17] based on
CT-BERT [18].</p>
          <p>More nuanced methodologies using fine-grained approaches, like multi-label or multi-class
classifications, have provided a detailed understanding [19, 20, 13, 14] of the diffusion of conspiracies. For
example, Moffitt et al. [20] developed a classifier of conspiracy tweets, identified
COVID-19 origin conspiracy theory tweets using this method, and then used social
cybersecurity methods to analyze communities, spreaders, and characteristics of the different origin-related
conspiracy theory narratives. This research found that tweets about conspiracy theories were supported
by news sites with low fact-checking scores and amplified by bots that were more likely to link to
prominent Twitter users than bots in non-conspiracy tweets.</p>
          <p>Other research in computational linguistics has dealt with different aspects of the
characteristics of the disseminators of conspiracy narratives, or has focused on the characteristics of the messages.
Bessi [21] employed a text scaling method to map conspiratorial texts to personality traits and analyze
these conspiracies. Giachanou et al. [19] used psychological and linguistic features to classify and
analyze the social media users who spread conspiracies. Topic modeling techniques were used by
other authors [22, 23] to extract and examine common themes within conspiracy texts. Levy et al. [24],
taking an approach different from the problem of classifying human texts, analyzed the capacity of
large language models to generate conspiracies.</p>
          <p>However, existing research fails to differentiate between critical thinking and conspiratorial thinking;
making this distinction is the main goal of this task.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.2. Span detection in conspiracy theories</title>
          <p>
            In the field of conspiracy theories, several papers have addressed the challenge of span detection.
Samory and Mitra [23] utilized syntactic parsing to identify “motifs” (agent-action-target triplets) and
analyze the patterns of their occurrence. Introne et al. [25] propose a span-level scheme of six categories
(event, actor, goal, action, consequence, target), and use it to analyze 236 messages from anti-vaccination
forums. They distinguish between conspiracy theories and conspiratorial thinking, a category that implies
only passive support for a conspiracy. This distinction is not based on annotations grounded in theory
but on the requirement of all the categories being present in a given text. However, in practice, fewer
elements can convey a conspiracy theory in a very strong manner. Although this research identifies
diferent elements of discourse, it also fails to consider the role played by intergroup conflict in the
conspiracy narrative, which is addressed in the XAI-DisInfodemic corpus [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ].
          </p>
          <p>Holur et al. [26] focus on oppositional elements in the conspiratorial narrative, detecting the so-called
insider and outsider entities within conspiracy texts by automatically labelling noun phrases. This insider
and outsider schema is based on the positive or negative sentiment that each user conveys for each
entity. Although this research starts a path that could lead to the consideration of the important role
of intergroup conflict in conspiratorial narratives, it fails to properly identify this intergroup
conflict, because objects and other inanimate entities that are clearly outside the social framework are
also identified as insiders or outsiders.</p>
          <p>
            The importance of detecting intergroup conflict, as proposed by Korenčić et al. [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], lies in the
growing and potentially violent participation of conspiratorial groups in political activities. This
connection implies that CTs aim to strengthen group cohesion and facilitate coordinated actions [27].
Consequently, detecting crucial aspects of the narrative at the level of span, such as intergroup conflict,
can provide significant insights for content moderation.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>
        This task uses the XAI-DisInfodemic corpus [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which consists of 10,000 annotated Telegram messages,
5,000 in English and 5,000 in Spanish. These messages contain oppositional, non-mainstream views
on the COVID-19 pandemic, and were obtained from public Telegram channels in which users tend
to post messages which oppose the mainstream discourse about the pandemic. They are classified
into two categories: critical messages and conspiratorial messages. For the creation of this corpus, the
authors developed an annotation scheme to differentiate between texts hinting at the existence of a
conspiracy and those criticizing mainstream views on COVID-19 but without suggesting the existence
of a conspiracy.
      </p>
      <table-wrap id="tbl-1">
        <caption>
          <p>Message length statistics per language: average, standard deviation, and minimum.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Language</th><th>Avg.</th><th>Std. dev.</th><th>Min.</th></tr>
          </thead>
          <tbody>
            <tr><td>Spanish</td><td>128</td><td>123</td><td>23</td></tr>
            <tr><td>English</td><td>265</td><td>528</td><td>12</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>In addition to the annotation into the two classes, the XAI-Disinfodemic corpus offers a second
annotation that presents the key elements in oppositional narratives. The tagset includes six labels
which can be applied both to messages containing a conspiracy theory and messages containing critical
thinking: goals, effects, agents, victims, facilitators (the group that collaborates with the mainstream
authorities), and campaigners (the group that conveys the oppositional message).</p>
      <p>
        Korenčić et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] identified the following six categories of narrative elements (see Figure 1 for an
example annotation of a Conspiracy message, and Figure 2 for an example annotation of a Critical
message):
1. Agents (A): Those responsible for the actions and/or negative effects described in the comment. In
Conspiracy, it could be the hidden power that pulls the strings (in Figure 1, “Private owned WHO”,
“investors like Bill Gates”, “pharma companies” and “very evil beings”). In Critical, it could be the
actors that design the mainstream public health policies (in Figure 2, “White House chief medical
advisor Dr. Anthony Fauci” and “the lead of CDC director Rochelle Walensky, who questioned natural
immunity”).
2. Facilitators (F): Those who collaborate with the agents and contribute to the execution of their
goals. In Conspiracy, they could be governments or institutions which, either intentionally or
unwittingly, collaborate with the conspirators and help the conspiracy move forward (in Figure 1,
“the world governments ruled by their puppets”, “their media”, “the media” and “governments”). In
Critical, the facilitators could be healthcare workers, mass media or authority figures who abide
by governmental instructions (in Figure 2, “university hospitals” and “the vaccinated work - from
home hospital administrators who are firing her for not being vaccinated ”).
3. Campaigners (C): Those who oppose the mainstream narrative. In Conspiracy, those who know
the truth and expose it to society at large (in Figure 1, “those awake already”). In Critical, those
who oppose the enforcement of laws and/or refuse to follow health-related instructions from the
authorities (in Figure 2, “Dr Martin Kulldorf ”).
4. Victims (V): Those who suffer the consequences of the actions and decisions of the agents and/or
the facilitators. In Conspiracy, the people who are deceived by those in power, and suffer, become
ill, lose their freedom, or die as a result of a hidden plan (in Figure 1, “people”, “most people” and
“regular people”). In Critical, the people who receive the negative consequences of the actions and
the decisions made by those in power, and also suffer, lose their freedom, become ill, or die as a
result of incorrect decisions (in Figure 2, “all nurses, doctors and other health care providers”).
5. Objectives (O): The intentions and purposes that the agents are trying to achieve. In Conspiracy,
the goals of the conspirators (in Figure 1, “agenda” and “destroying us”). In Critical, the goals of
public authorities, pharmaceutical companies, organizations, etc. (in Figure 2, “pushing vaccine
mandates”).
6. Negative Effects (E): The negative consequences suffered by the victims as a result of the actions and
decisions of those in power and/or their collaborators (in Figure 1, “the constant fear mongering”
and “pay a hefty price, often with their health, lives, the loss of their loved ones”; in Figure 2, “will
be fired if they do not get a Covid vaccine ”).
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Task Setup</title>
      <p>For each language, the corresponding dataset of 5,000 texts was divided into train and test sets using
stratified sampling. The train set consisted of 4,000 messages while the test set consisted of 1,000
messages. The participants had access to the train set from the start of the task, and prior to the
evaluation deadline they were provided with the unlabeled test set and asked to submit their predictions.
Each team was allowed to submit up to two predictions for each combination of subtask and language.</p>
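      <p>The stratified split described above can be sketched in a few lines of pure Python. This is an illustrative re-implementation, not the organizers' code; in practice a library routine such as scikit-learn's train_test_split with the stratify argument serves the same purpose:</p>

```python
import random

def stratified_split(items, labels, test_size, seed=0):
    """Split items into train/test sets, preserving label proportions.

    Illustrative sketch of stratified sampling, as used for the
    4,000-message train / 1,000-message test split per language.
    """
    rng = random.Random(seed)
    by_label = {}
    for item, label in zip(items, labels):
        by_label.setdefault(label, []).append(item)
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        # take this label's proportional share of the test set
        k = round(test_size * len(group) / len(items))
        test.extend(group[:k])
        train.extend(group[k:])
    return train, test

# Hypothetical 5,000-message dataset with a 40/60 class balance.
items = list(range(5000))
labels = ["CONSPIRACY"] * 2000 + ["CRITICAL"] * 3000
train, test = stratified_split(items, labels, test_size=1000)
```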
      <p>The dataset, the code for building and applying the baseline systems, as well as the evaluation code
and task instructions, are made available at https://github.com/dkorenci/pan-clef-2024-oppositional.</p>
      <p>
        Distinguishing Between Critical and Conspiratorial Messages (Subtask 1) This is a binary
classification task differentiating between (1) critical messages, i.e., those that question major decisions
in the public health domain but do not promote a conspiracist mentality [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]; and (2) conspiratorial
messages, i.e. those that view the pandemic or public health decisions as a result of a malevolent
conspiracy by secret, influential groups [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Input data consists of a set of messages, each of which is
associated with one of two categories: either CONSPIRACY or CRITICAL. The evaluation metric used
for this subtask is the Matthews Correlation Coefficient (MCC) [28].
      </p>
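      <p>For concreteness, the MCC used to score Subtask 1 can be computed as follows. This is a pure-Python sketch over hypothetical labels; scikit-learn's matthews_corrcoef gives the same result:</p>

```python
import math

def mcc(gold, pred, positive="CONSPIRACY"):
    """Matthews Correlation Coefficient for a binary task (pure-Python sketch).

    Returns 1.0 for perfect predictions, 0.0 for chance-level
    predictions, and -1.0 for perfectly inverted predictions.
    """
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    tn = sum(g != positive and p != positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical gold labels and predictions with one error.
gold = ["CONSPIRACY", "CRITICAL", "CRITICAL", "CONSPIRACY", "CRITICAL"]
pred = ["CONSPIRACY", "CRITICAL", "CONSPIRACY", "CONSPIRACY", "CRITICAL"]
score = mcc(gold, pred)
```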
      <sec id="sec-4-1">
        <title>Detecting Elements of Oppositional Narratives (Subtask 2)</title>
        <p>
          This is a token-level classification task aimed at recognizing text spans corresponding to the key elements of oppositional narratives [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
Input data consists of a set of messages, each of which is accompanied by a (possibly empty) list of
span annotations. Each annotation corresponds to a narrative element, and is described by its borders
(start and end characters), as well as its category. There are six distinct span categories: AGENTS,
FACILITATORS, VICTIMS, CAMPAIGNERS, OBJECTIVES, NEGATIVE_EFFECTS. The evaluation metric
used for this subtask is macro-averaged span-F1 [29].
        </p>
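        <p>A minimal sketch of the Subtask 2 data described above, one message with its span annotations; the text, offsets, and field names here are invented for illustration and are not necessarily those of the released files:</p>

```python
# Hypothetical Subtask 2 example: each annotation gives the span's
# borders (start/end character offsets) and one of the six categories.
message = {
    "text": "The government hid the truth from ordinary people.",
    "annotations": [
        {"start": 4, "end": 14, "category": "AGENTS"},
        {"start": 34, "end": 49, "category": "VICTIMS"},
    ],
}

# Recover the annotated surface strings from the character borders.
spans = [
    (a["category"], message["text"][a["start"]:a["end"]])
    for a in message["annotations"]
]
```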
      </sec>
      <sec id="sec-4-2">
        <title>4.1. Evaluation Measures</title>
        <p>As the main criterion for evaluation in Subtask 1, we used the MCC [28]. MCC serves the same purpose
as the macro-averaged F1 measure – it aggregates performance across both classes. We opted for the
MCC measure since it works well on imbalanced datasets, while being reliable and less optimistic than
the macro-averaged F1 [30], and comparing favorably to other alternatives [28].</p>
        <p>For evaluation in Subtask 2, we used the span-F1 measure [29], which is an adapted version of the
F1 measure that accounts for partially correct predictions by looking at span overlap. Specifically, a
predicted span is not required to exactly match a gold standard span in terms of start and end characters.
Instead, the proportion of overlapping characters is used to calculate precision and recall [29]. This
approach offers a fairer evaluation in tasks with long spans and inherent subjectivity of the span
boundaries. For tasks like traditional, non-nested Named Entity Recognition (NER), where named
entities are shorter and are expected to have well-defined boundaries, exact matching is a reasonable
method of evaluation.</p>
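        <p>The overlap-based matching behind span-F1 can be illustrated with a simplified sketch that computes character-overlap precision and recall for a single category. The official scorer [29] is more elaborate, so this is an illustration of the idea rather than the exact metric:</p>

```python
def overlap_len(a, b):
    """Number of characters shared by two (start, end) spans, end-exclusive."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def span_precision_recall(pred, gold):
    """Character-overlap precision/recall for single-category span lists.

    A predicted span earns partial credit proportional to its overlap
    with gold spans, instead of requiring exact boundary matches.
    """
    pred_chars = sum(end - start for start, end in pred)
    gold_chars = sum(end - start for start, end in gold)
    shared = sum(overlap_len(p, g) for p in pred for g in gold)
    precision = shared / pred_chars if pred_chars else 0.0
    recall = shared / gold_chars if gold_chars else 0.0
    return precision, recall

# Gold span covers characters 10..20; prediction covers 15..25,
# so half of each span overlaps the other.
p, r = span_precision_recall(pred=[(15, 25)], gold=[(10, 20)])
```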
        <p>As the main criterion for evaluation we used macro-averaged span-F1, i.e., span-F1 averaged over all
six span labels corresponding to six elements of oppositional narratives described in Section 3.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.2. Baseline Solutions</title>
        <p>
          Baselines for both subtasks are based on the approaches from Korenčić et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], where more details
can be found. For each subtask, we took as a baseline the version based on the transformer model
which resulted in the lowest performance in Korenčić et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Hyperparameters were not changed,
the models were trained on the entire train set, and then applied to the test set.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>Distinguishing Critical and Conspiratorial Messages (Subtask 1)</title>
        <p>The approach for this binary classification task is based on fine-tuning the BERT transformer model [31]
from the Hugging Face repository (https://huggingface.co/models), using the case-sensitive “base” version. The BETO [32] version of BERT was used for the
Spanish dataset. The number of tokens was set to 256. We tuned the models for three epochs using the
AdamW optimizer, a learning rate of 2e-5, a slanted triangular LR scheduler with a 10% warm-up period, a
batch size of 16, and a weight decay of 0.01. All the layers of the transformers were fine-tuned. The
dropout rate for the classification head was 0.1.</p>
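        <p>Collected in one place, the baseline's fine-tuning hyperparameters look as follows; the key names roughly follow Hugging Face Trainer conventions and are illustrative, not the authors' exact configuration code:</p>

```python
# Subtask 1 baseline hyperparameters as stated in the text.
# Key names are illustrative (roughly Hugging Face Trainer style).
baseline_config = {
    "model": "bert-base-cased",       # BETO used instead for Spanish
    "max_tokens": 256,
    "num_epochs": 3,
    "optimizer": "AdamW",
    "learning_rate": 2e-5,
    "lr_scheduler": "slanted_triangular",
    "warmup_ratio": 0.10,
    "batch_size": 16,
    "weight_decay": 0.01,
    "classifier_dropout": 0.1,
}
```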
        <p>Detecting Elements of Oppositional Narratives (Subtask 2) The baseline for this sequence
labeling task is based on fine-tuning a transformer model with added token classification heads. To
account for the possibility of overlapping spans with different categories, we used six separate
per-category heads that performed BIO sequence tagging. We employed multi-task learning [33] by
connecting the per-category taggers to the same transformer backbone. Multi-task learning has several
advantages, such as improved regularization and implicit data augmentation [33], and the described
approach was successfully deployed for a similar task of span-level skill extraction [34]. We used the
same configuration and hyperparameters as in the case of Subtask 1. The exception was the number of
epochs, which we increased to 10 in order to accommodate the increased task complexity. The BERT
model [31] was used as the base transformer for the English dataset, while for the Spanish dataset the
BETO version of BERT [32] was used.</p>
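        <p>The conversion of overlapping span annotations into six independent BIO sequences, one per category head, can be sketched as follows. This character-level version is illustrative; the actual baseline tags wordpiece tokens:</p>

```python
CATEGORIES = ["AGENTS", "FACILITATORS", "VICTIMS",
              "CAMPAIGNERS", "OBJECTIVES", "NEGATIVE_EFFECTS"]

def spans_to_bio(length, spans, category):
    """BIO tags for one category over a sequence of the given length.

    Each per-category head sees only its own spans, which is how
    overlapping spans of different categories can coexist.
    """
    tags = ["O"] * length
    for start, end, cat in spans:
        if cat != category:
            continue  # this head ignores other categories
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags

# Two overlapping spans of different categories on a 10-position sequence.
spans = [(0, 4, "AGENTS"), (2, 6, "VICTIMS")]
agent_tags = spans_to_bio(10, spans, "AGENTS")
victim_tags = spans_to_bio(10, spans, "VICTIMS")
```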
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Participating Systems</title>
      <p>A total of 82 teams submitted their solution for at least one of the tasks. The approaches included
pre-neural NLP models, small transformers such as BERT [31], and Large Language Models [35]. Techniques
such as ensemble methods [36] and data augmentation [37] were also used to improve performance.
Another important factor was the data on which the chosen transformer models were pretrained:
participants experimented with both domain-specific models such as CT-BERT [18] and multilingual
models such as mBERT [38].</p>
      <p>Most of the approaches relied on fine-tuning BERT-like transformers [31]. This is not surprising,
since these models yield strong results for both classification [31] and sequence labeling [31], and since
baselines based on this approach were provided to the participants.</p>
      <p>To describe the approaches based on transformer models [39], we use the abbreviation SLM
(“Small” Language Models) for transformers with fewer than one billion parameters, and the standard
abbreviation LLM (Large Language Models) for transformers with more than one billion parameters.</p>
      <p>
        Working Notes Submissions A total of 17 participating systems had their working notes papers
accepted. Huertas-García et al. [
        <xref ref-type="bibr" rid="ref9">40</xref>
        ] tackled Subtask 1, experimenting with a range of SLMs and with
the commercial LLM Claude (https://www.anthropic.com/claude). Vallecillo-Rodríguez et al. [
        <xref ref-type="bibr" rid="ref10">41</xref>
        ] experimented with the fine-tuning of two
LLMs: LLaMA3-8B-instruct [
        <xref ref-type="bibr" rid="ref11">42</xref>
        ] and GPT-3.5 [
        <xref ref-type="bibr" rid="ref12">43</xref>
        ]. Hu et al. [
        <xref ref-type="bibr" rid="ref13">44</xref>
        ] used SLMs with an added BiGRU LSTM
layer [
        <xref ref-type="bibr" rid="ref14">45</xref>
        ] to tackle both tasks. Damian et al. [
        <xref ref-type="bibr" rid="ref15">46</xref>
        ] approached both tasks using ensembles of mono- and
multi-lingual SLMs. Sánchez-Hermosilla et al. [
        <xref ref-type="bibr" rid="ref16">47</xref>
        ] focused on Subtask 1 using a range of SLMs, data
augmentation, and ensembling techniques. Zrnić [
        <xref ref-type="bibr" rid="ref17">48</xref>
        ] experimented with mono- and multilingual SLMs
in order to tackle both tasks. Sahitaj et al. [
        <xref ref-type="bibr" rid="ref18">49</xref>
        ] approached Subtask 1 using SLMs and a LLM-based data
augmentation technique. Gómez-Romero et al. [
        <xref ref-type="bibr" rid="ref19">50</xref>
        ] used an approach based on OpenAI Embeddings
and a deep feedforward network for Subtask 1 and, in addition, did entity masking in order to increase
the models’ generality. Mahesh et al. [
        <xref ref-type="bibr" rid="ref20">51</xref>
        ] experimented with SLMs and non-neural approaches on
Subtask 1 . Zeng et al. [
        <xref ref-type="bibr" rid="ref21">52</xref>
        ] employed mono- and multi-lingual SLMs for both Subtask 1 and Subtask 2 .
Huang et al. [53] used SLMs for both tasks, and employed ensembling for Subtask 1. Tulbure and Coll
Ardanuy [54] experimented with SLMs boosted by data augmentation and ensembling, and for Subtask
2 split the input texts into sentences. Liu et al. [55] experimented with a range of LLMs using zero-shot
chain-of-thought prompts to tackle Subtask 1, and used an SLM approach for Subtask 2. Mhalgi et al.
[56] approached Subtask 1 using data augmentation, non-neural classifiers, SLMs and LLMs, as well as
model ensembles.
      </p>
      <p>Several participants essentially replicated the baseline solution, i.e., fine-tuned
and applied one or several SLMs [57, 58, 59].</p>
      <p>Teams that did not submit working notes accounted for 65 submissions; these teams provided only
short descriptions of their approaches. Many of these submissions were minor modifications of the provided
baseline, i.e., a change of the SLM to be fine-tuned. However, a number of these teams achieved competitive
results or provided useful data points using, for example, ensembling techniques, data and feature
augmentation techniques, and non-neural NLP approaches.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Results and Analysis</title>
      <sec id="sec-6-1">
        <title>6.1. Distinguishing Critical and Conspiracy Texts (Subtask 1)</title>
        <p>
          Results for English The top IUCL team [56] employed the DeBERTa model [60] fine-tuned on an
augmented dataset comprising the Subtask 1 dataset and the conspiracy-labeled examples from the
LOCO corpus [11] (ca. 16,000 examples were selected). The AI_Fusion team came a close second,
relying simply on the fine-tuned ELECTRA model [61]. A close third was the SINAI team [
          <xref ref-type="bibr" rid="ref10">41</xref>
          ],
which used the fine-tuned LLaMA3-8B-instruct LLM [
          <xref ref-type="bibr" rid="ref11">42</xref>
          ] as a solution. Additionally, their experiments
demonstrated that fine-tuned LLMs outperform the LLM-based zero-shot approaches by a large margin
[
          <xref ref-type="bibr" rid="ref10">41</xref>
          ].
        </p>
        <p>
          The rest of the top-performing models on English based their approaches on SLMs, with several
teams using techniques such as ensembling and data augmentation. The Covid-twitter-BERT [18], used
by the teams ezio [
          <xref ref-type="bibr" rid="ref13">44</xref>
          ], hinlole [53], Zleon [
          <xref ref-type="bibr" rid="ref17">48</xref>
          ], and inaki [
          <xref ref-type="bibr" rid="ref16">47</xref>
          ], seems to be a successful transformer
model for this use-case. Some teams with competitive results used standard transformer models: the
theateam, trustno1, and ojo-bes teams used standard RoBERTa [62], while the virmel team used BERT
[31] and the yeste team relied on the ELECTRA model [61].
        </p>
        <p>
          Two fully multilingual approaches performed competitively, those of the auxR and RD-IA-FUN
[
          <xref ref-type="bibr" rid="ref9">40</xref>
          ] teams. Both approaches were based on a multilingual transformer trained on joint English and
Spanish data. The auxR team employed the Twitter-XLM-RoBERTa-large model, a derivative of the
XLM-RoBERTa model [63] domain-adapted using Twitter data, while the RD-IA-FUN [
          <xref ref-type="bibr" rid="ref9">40</xref>
          ] team used
the multilingual-e5-large model [64], a derivative of XLM-RoBERTa. The Elias&amp;Sergio team used
monolingual RoBERTa, but fine-tuned the model using the Spanish dataset translated to English (in
addition to the English dataset).
        </p>
        <p>
          Notably different was the approach of the sail team [
          <xref ref-type="bibr" rid="ref19">50</xref>
          ], who used OpenAI Embeddings (https://platform.openai.com/docs/guides/embeddings) in
combination with a deep feed-forward neural network for fine-tuning. Additionally, they pre-processed
the texts by replacing named entities with entity classes such as ’PERSON’, in order to “enhance the
model’s generalization capabilities” [
          <xref ref-type="bibr" rid="ref19">50</xref>
          ]. They showed that, for Subtask 1 , the masked model performs
better than the non-masked one.
        </p>
        <p>Results for Spanish Many of the teams that did well on Spanish also achieved top results on English.
For these teams, we will briefly describe the differences between the two approaches, and we refer the
reader to the English section of Subtask 1 for details.</p>
        <p>
          Top performance was obtained by the SINAI team [
          <xref ref-type="bibr" rid="ref10">41</xref>
          ], which relied on LLMs. In contrast to what
happened in English, the fine-tuned GPT-3.5 model [
          <xref ref-type="bibr" rid="ref12">43</xref>
          ] outperformed LLaMA3-8B-instruct [
          <xref ref-type="bibr" rid="ref11">42</xref>
          ] by a
large margin, yielding the best overall solution.
        </p>
        <p>
          The second and third positions are held by the two fully multilingual approaches of the auxR and
RD-IA-FUN teams [
          <xref ref-type="bibr" rid="ref9">40</xref>
          ], which also performed well on English.
        </p>
        <p>
          Interestingly, five out of the six following teams (Elias&amp;Sergio, AI_Fusion, zhengqiaozeng, virmel,
trustno1, Zleon) employed standard SLM fine-tuning with PlanTL-GOB-ES/roberta-base-bne [ 65] as
the base model. The exception is the zhengqiaozeng team [
          <xref ref-type="bibr" rid="ref21">52</xref>
          ], which relied on the multilingual
XLM-RoBERTa model. The tulbure team [54] relied on an ensemble of three Spanish SLMs.
        </p>
        <p>
          The sail team [
          <xref ref-type="bibr" rid="ref19">50</xref>
          ] used the same approach as for English, based on multilingual OpenAI Embeddings.
        </p>
        <p>The nlpln team [55] surpassed the baseline using an approach unconventional in the context of this
challenge: zero-shot prompting of LLMs with the chain-of-thought prompting technique [66].
We note that the same approach scored competitively on the English classification subtask, achieving
an MCC of 0.7844 (see Table A). The nlpln team [55] tested a number of LLMs, including GPT, Claude,
and Gemini, on the full training set. The DeepSeek V2 model [67], a large mixture-of-experts LLM,
achieved the best results. Surprisingly, the results on the test data showed this model to be relatively
competitive with fine-tuned LLMs.</p>
        <p>
          Analysis The results of the top teams suggest that the most successful English transformer-based
models are the DeBERTa model [60], the ELECTRA model [61] and the large LLaMA3-8B-instruct LLM
[
          <xref ref-type="bibr" rid="ref11">42</xref>
          ].
        </p>
        <sec id="sec-6-1-1">
          <p>The Covid-twitter-BERT [18] model was used by a number of high-performing teams, suggesting
that pre-training on social media data probably influences performance. However, both BERT [31] and
RoBERTa [62] were shown to be able to perform competitively. The performance edge obtained by
the IUCL team [56] suggests that the LOCO conspiracy corpus [11] is a useful resource for boosting
conspiracy-related classifiers for other use-cases.</p>
          <p>
            In Spanish, the choice of a model seems to be more important, and many of the best teams used the
Spanish ’Maria’ RoBERTa model [65], trained exclusively on the data crawled from the web, while none
of the top teams employed either the BETO [32] or BERTIN [68] models. Moreover, the top three teams
employed either fine-tuned LLMs [
            <xref ref-type="bibr" rid="ref10">41</xref>
            ] (GPT-3.5 [
            <xref ref-type="bibr" rid="ref12">43</xref>
            ]) or multilingual models [
            <xref ref-type="bibr" rid="ref9">40, 63</xref>
            ]. These teams,
especially the top one based on LLMs, outperformed the others by a significant margin. Interestingly,
none of the participants used RoBERTuito [69], a model pretrained on Spanish social media text.
          </p>
          <p>It would be interesting to perform ablation studies in both languages in order to measure the influence
of both architectural improvements and the choice of the pretraining dataset on performance.</p>
          <p>
            As for the application of LLMs [35], the results on English show no large difference between
fine-tuned LLMs and fine-tuned SLMs. Therefore, we hypothesize that the superiority of fine-tuned GPT-3.5
[
            <xref ref-type="bibr" rid="ref12">43</xref>
            ] on Spanish is due to the pre-training data (GPT-3.5 has probably “seen” many more social media
texts than the Spanish SLMs). The results of the nlpln team [55] demonstrate the competitiveness,
in both languages, of the DeepSeek V2 model [67] in combination with chain-of-thought prompting
[66]. Therefore, this approach seems to be a good way to quickly bootstrap a conspiracy vs. critical
classifier for other use-cases and other supported languages. The approach of Sahitaj et al. [
            <xref ref-type="bibr" rid="ref18">49</xref>
            ], which
used an LLM to elaborate on a text’s context and argumentation as additional input for
classification, might prove beneficial for improving LLM-based zero-shot prompting.
          </p>
          <p>A number of teams opted to use non-neural text classifiers, such as LinearSVM [70] or Random Forest
[71] in combination with tf-idf- or n-gram-based features. The average score of these approaches is
0.7080 MCC for English, and 0.5814 MCC for Spanish.</p>
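          <p>A pipeline of this kind can be sketched in a few lines with scikit-learn. This is a toy illustration of the general technique (tf-idf features, a linear SVM, and MCC for evaluation), not any particular team’s system; the example texts are invented placeholders, not items from the task dataset.</p>

```python
# Toy illustration of the non-neural pipeline used by several teams:
# tf-idf features with a linear SVM, evaluated with MCC [28, 30].
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import matthews_corrcoef
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented placeholder texts, not examples from the dataset.
train_texts = [
    "they are hiding the truth about the vaccine agenda",
    "a secret group planned the pandemic from the start",
    "I disagree with the lockdown policy, it seems ineffective",
    "the mask mandate evidence looks weak to me",
]
train_labels = ["CONSPIRACY", "CONSPIRACY", "CRITICAL", "CRITICAL"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_labels)

test_texts = ["the elites planned this in secret", "the policy evidence seems weak"]
test_labels = ["CONSPIRACY", "CRITICAL"]
preds = clf.predict(test_texts)
print(preds, matthews_corrcoef(test_labels, preds))
```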
          <p>
            The baseline systems [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] were based on BERT [31] and BETO [32], respectively, for the English and
Spanish dataset. These models were chosen as the baseline as they yielded the weakest performance in
Korenčić et al. [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]. The best performance, corresponding to the state-of-the-art before this challenge, was
obtained for the DeBERTaV3 [72] and ’BERTIN’ RoBERTa [68] models. When these models were applied to
the train-test split of the challenge, MCC scores of 0.8259 and 0.6681 were obtained, respectively, for
English and Spanish. The score of DeBERTaV3 represents an improvement over BERT; even
with this improvement, the participants managed to improve upon the state-of-the-art performance.
          </p>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Detecting Elements of the Oppositional Narratives (Subtask 2)</title>
        <p>
          Results for English The most successful team, tulbure [54], relied on a combination of preprocessing
techniques and data augmentation. While the provided baseline used multi-task learning to account
for overlapping spans of different categories [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], Tulbure and Coll Ardanuy [54] opted to use a single
model for all the span categories and modified the data accordingly. Additionally, each Telegram text
was segmented into sentences which were used as examples for learning. This solved the problem of
texts longer than the maximum length supported by a transformer. Data augmentation was performed
by “replacing words in the texts by synonyms or semantically-related words”, and the RoBERTa model
was used [62].
        </p>
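        <p>The two preprocessing ideas above (sentence-level examples and word-replacement augmentation) can be sketched as follows. The naive sentence splitter and the tiny synonym table are simplifications for illustration, not the tulbure team’s implementation.</p>

```python
# Sketch of the tulbure team's two preprocessing ideas: segmenting each long
# Telegram text into sentence-level training examples, and augmenting the data
# by replacing words with synonyms or semantically related words.
import re

def split_into_sentences(text):
    """Naive sentence segmentation on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

SYNONYMS = {"plan": "scheme", "hidden": "secret", "group": "cabal"}  # toy table

def augment(sentence, synonyms=SYNONYMS):
    """Create an extra training example by swapping in related words."""
    out = []
    for w in sentence.split():
        core = w.strip(".,!?")          # separate trailing punctuation
        tail = w[len(core):]
        out.append(synonyms.get(core.lower(), core) + tail)
    return " ".join(out)

text = "They have a hidden plan. Nobody talks about it!"
sentences = split_into_sentences(text)
print(sentences)              # two sentence-level examples
print(augment(sentences[0]))  # augmented copy of the first sentence
```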
        <p>
          As the remaining teams mostly relied on modifying the multi-task sequence labeling approach of the
baseline [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], this will be the assumed default approach. Only if another approach was used will the
difference be described.
        </p>
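        <p>The multi-task sequence labeling setup can be pictured as one shared encoder feeding several task-specific token classification heads, one per span category, so that overlapping spans of different categories are predicted independently. The sketch below uses a random stub in place of the transformer encoder, and plain NumPy linear layers as heads; dimensions are illustrative.</p>

```python
# Schematic of the baseline's multi-task sequence labeling setup [1]: a shared
# encoder (here a random stub standing in for a transformer) feeds one
# token-level BIO classification head per span category of the annotation
# scheme, so overlapping spans of different categories can be predicted.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, N_TOKENS, N_TAGS = 16, 6, 3  # N_TAGS: B, I, O
TASKS = ["AGENT", "FACILITATOR", "VICTIM", "CAMPAIGNER",
         "OBJECTIVE", "NEGATIVE_EFFECT"]  # span categories (illustrative names)

encoder_out = rng.normal(size=(N_TOKENS, HIDDEN))  # stand-in transformer states
heads = {t: rng.normal(size=(HIDDEN, N_TAGS)) for t in TASKS}

def predict_spans(states, heads):
    """Per task, choose the best BIO tag for every token independently."""
    return {task: (states @ w).argmax(axis=1) for task, w in heads.items()}

tags = predict_spans(encoder_out, heads)
for task in TASKS:
    print(task, tags[task])  # one BIO tag sequence per span category
```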
        <p>
          The second-placed team, Zleon [
          <xref ref-type="bibr" rid="ref17">48</xref>
          ], used a large variant of RoBERTa [62] and increased the model’s
maximum sequence length to 512. The third-placed team, hinlole [53], used Covid-twitter-BERT [18] as
the base model.
        </p>
        <p>Teams on the English dataset, ranked by span-F1 (best first): tulbure [54], Zleon [
          <xref ref-type="bibr" rid="ref17">48</xref>
          ], hinlole [53], oppositional_opposition, AI_Fusion, virmel, miqarn, TargaMarhuenda, ezio [
          <xref ref-type="bibr" rid="ref13">44</xref>
          ], zhengqiaozeng [
          <xref ref-type="bibr" rid="ref21">52</xref>
          ], Elias&amp;Sergio, DSVS [
          <xref ref-type="bibr" rid="ref15">46</xref>
          ], CHEEXIST, rfenthusiasts, ALC-UPV-JD-2, baseline-BERT.
        </p>
        <p>
          Span-F1 scores on the Spanish dataset: tulbure [54] 0.6129, Zleon [
          <xref ref-type="bibr" rid="ref17">48</xref>
          ] 0.5875, AI_Fusion 0.5777, virmel 0.5616, CHEEXIST 0.5621, miqarn 0.5603, DSVS [
          <xref ref-type="bibr" rid="ref15">46</xref>
          ] 0.5529, TargaMarhuenda 0.5364, Elias&amp;Sergio 0.5151, hinlole [53] 0.4994, baseline-BETO 0.4934.
        </p>
        <p>
          The oppositional_opposition team used the DistilBERT model [73] in combination with Conditional
Random Fields [74]. Interestingly, the same type of model was used for Subtask 2 in Spanish, but achieved
a very low result (see Table 10 in Appendix A), as if overfitting or failing to converge. The AI_Fusion
team used the RoBERTa model [62] and selected the best model over the 50 fine-tuning epochs. The virmel
team used the RoBERTa model with the maximum sequence length set to 512. The zhengqiaozeng team
[
          <xref ref-type="bibr" rid="ref21">52</xref>
          ] employed the RoBERTa model, while the ALC_UPV_JD_2 team relied on the small ALBERT model
[75].
        </p>
        <p>The miqarn team used the multilingual mBERT model [38], trained on datasets in both languages.
This approach also performed well on the Spanish dataset.</p>
        <p>The TargaMarhuenda team used the RoBERTa model, and added pre-computed POS tags as input
by concatenating them to the model’s token embeddings to construct input to the initial layer of the
transformer. The Elias&amp;Sergio team used a similar approach, but concatenated one-hot POS vectors
with the token representations of the final layer of the transformer to construct input to the token
classification head.</p>
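        <p>The latter variant, concatenating one-hot POS vectors to the final-layer token representations before the classification head, can be sketched as follows. Dimensions, the POS inventory, and the random values are illustrative, not taken from either team’s system.</p>

```python
# Sketch of injecting POS information into a token classifier by concatenating
# one-hot POS vectors to the transformer's final token representations before
# the classification head. All dimensions and values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
N_TOKENS, HIDDEN = 5, 8
POS_TAGS = ["NOUN", "VERB", "ADJ", "DET", "OTHER"]  # toy POS inventory

token_states = rng.normal(size=(N_TOKENS, HIDDEN))  # final transformer layer
pos_ids = np.array([3, 0, 1, 3, 0])                 # pre-computed POS per token
pos_onehot = np.eye(len(POS_TAGS))[pos_ids]         # (N_TOKENS, 5)

# Concatenate along the feature axis and apply a linear classification head.
features = np.concatenate([token_states, pos_onehot], axis=1)  # (5, 13)
head = rng.normal(size=(features.shape[1], 3))                 # 3 BIO tags
logits = features @ head
print(features.shape, logits.argmax(axis=1))
```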
        <p>
          The ezio team [
          <xref ref-type="bibr" rid="ref13">44</xref>
          ] modified the multi-tasking approach using “BiGRU LSTM”, a bidirectional LSTM
network based on gated recurrent units [
          <xref ref-type="bibr" rid="ref14">45</xref>
          ]. Instead of using simple per-task classification heads,
each task was assigned both a task-specific LSTM network and a task-specific classification head.
Covid-twitter-BERT [18] was used as the base model.
        </p>
        <p>
          The DSVS [
          <xref ref-type="bibr" rid="ref15">46</xref>
          ] team created an ensemble of token classifiers based on different SLMs such as BERT,
RoBERTa and ELECTRA, and performed “logit averaging” to obtain their final predictions.
        </p>
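        <p>Logit averaging can be sketched in a few lines: the per-token logits of the ensemble members are averaged before taking the argmax. The logit values below are invented for illustration.</p>

```python
# Minimal sketch of "logit averaging" ensembling: per-token logits from
# several token classifiers are averaged, then the argmax gives the final tag.
import numpy as np

# Invented logits of three models for 4 tokens over 3 tags (B, I, O).
model_logits = np.array([
    [[2.0, 0.1, 0.5], [0.2, 1.5, 0.3], [0.1, 0.2, 2.2], [0.0, 0.1, 1.9]],
    [[1.8, 0.3, 0.2], [1.4, 1.0, 0.1], [0.3, 0.1, 1.7], [0.2, 0.0, 1.5]],
    [[0.5, 0.2, 1.0], [0.1, 2.0, 0.4], [0.2, 0.3, 1.1], [0.1, 0.4, 1.2]],
])

avg = model_logits.mean(axis=0)  # average over the ensemble members
tags = avg.argmax(axis=1)        # final per-token prediction
print(tags)                      # [0 1 2 2]
```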
        <p>The CHEEXIST team used the Fake-News-Bert-Detect model, a domain-adapted version of RoBERTa.
Additionally, they replaced the final classification layer with a shallow neural network.</p>
        <p>The rfenthusiasts team used the DeBERTaV3 model [72] and performed data augmentation by replacing
characters in the text. The same approach, when used in combination with the XLM-RoBERTa model [63],
did not work well on the Spanish dataset.</p>
        <p>
          Results for Spanish All of the teams that achieved top results on the Spanish dataset did the same on
the English dataset. Therefore, here we will only briefly describe the differences, which mostly pertain
to a different choice of transformer model. As for English, the majority of the approaches
relied on the multi-task sequence labeling approach of the baseline [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>The same two teams - tulbure and Zleon - took the first and second place, as on the English dataset.
Both relied on the same respective approaches that they used on English, with the difference of using the
Spanish ’Maria’ RoBERTa model [65].</p>
        <p>The AI_Fusion team, placed third, relied on the XLM-RoBERTa model [63], while the virmel team
relied on Spanish ’BERTIN’ RoBERTa model [68]. The CHEEXIST team used the ’Maria’ RoBERTa
model [65].</p>
        <p>
          The miqarn team used a single mBERT [38] model fine-tuned on both datasets, and achieved good
results on Spanish. The DSVS [
          <xref ref-type="bibr" rid="ref15">46</xref>
          ] team’s ensemble approach also achieved good results in the case of
the Spanish dataset. The ensemble consisted of a number of Spanish and multilingual models [
          <xref ref-type="bibr" rid="ref15">46</xref>
          ].
        </p>
        <p>Two approaches based on using POS tags as additional input to the model, used by the
TargaMarhuenda and Elias&amp;Sergio teams, relied on the Spanish RoBERTa model. The hinlole team [53] relied
on the Spanish BETO model [32].</p>
        <p>Analysis The system that clearly outperformed the others in both languages was that of the tulbure
team [54]. Its sentence-level processing of texts shows that signals for the inference of the elements of
oppositional narrative are largely sentence-local. It would be interesting to perform ablation studies to
determine how much data augmentation influences performance in contrast to sentence segmentation.
Further improvements might be achieved by using multi-task learning and transformers other
than RoBERTa, as well as other data augmentation techniques, possibly based on LLMs.</p>
        <p>
          The competitive results of the Zleon team [
          <xref ref-type="bibr" rid="ref17">48</xref>
          ] and several other teams relying on the multi-task
baseline approach show its effectiveness in combination with an improved choice of the backbone
SLM and increased maximum sequence length. Covid-twitter-BERT [18], used by the second- and
third-placed teams, seems to be a successful choice for English.
        </p>
        <p>Performance on Subtask 2 seems to be less influenced by the choice of the transformer model,
especially in the case of Spanish. Concretely, a larger variety of models appear among the top teams
and, in the case of Spanish, all three families of models (BETO [32], BERTIN [68], and ’Maria’ [65]) are
represented.</p>
        <p>The approach of the miqarn team, based on the multilingual mBERT model [38], worked well for
both languages and could be a good approach for the task of inferring the elements of oppositional
narrative in other languages, especially under-resourced ones.</p>
        <p>
          The baseline systems [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] were based on BERT [31] and BETO [32] models, respectively, for the
English and Spanish dataset. They were chosen since they yielded the weakest performance in Korenčić
et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Top performance, corresponding to the state-of-the-art before this challenge, was obtained for the
DeBERTaV3 [72] and BERTIN [68] models. When these models were applied to the train-test split of
the challenge, span-F1 scores of 0.5786 and 0.5369 were obtained, respectively, for English and Spanish.
These scores represent an improvement in relation to the baseline, but even so the participants managed
to significantly raise the state-of-the-art performance on the task.
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>
        The Oppositional Thinking Analysis PAN Task presented to the NLP community two subtasks:
distinguishing between critical and conspiratorial messages, and detecting elements of oppositional narratives.
These subtasks are of interest to computational social scientists interested in text-based analysis of
oppositional thinking [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        A total of 82 teams participated in the challenge, while 17 teams provided working notes papers. The
teams devised a range of solutions, the most successful of which exceeded previous state-of-the-art
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for both subtasks. The new solutions have the potential to facilitate researchers in applying the
domain-agnostic annotation schemes proposed in Korenčić et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to new corpora.
      </p>
      <p>
        For Subtask 1 the most successful submitted English system [56] relied on augmentation using the
large news conspiracy corpus LOCO [11]. The best result for Spanish was achieved using a fine-tuned
GPT-3.5 [
        <xref ref-type="bibr" rid="ref10">41</xref>
        ]. The multilingual approach of Huertas-García et al. [
        <xref ref-type="bibr" rid="ref9">40</xref>
          ] also proved competitive. The
LLM-based zero-shot approach of Liu et al. [55] achieved results competitive with supervised baselines
on Subtask 1 and demonstrated a cost-effective way to bootstrap conspiracy vs. critical classifiers
for new use-cases. The experiments also point to the need to create better small-scale transformer
models for Spanish, as the solutions that work best on the Spanish dataset rely either on LLMs or on
multilingual SLMs.
      </p>
      <p>For Subtask 2, the top system in both languages relied on a combination of data augmentation by
word replacement and sentence-level processing [54]. Most of the other systems relied on improving
the provided baseline solution by changing the underlying transformer model, or by modifying the
training procedure.</p>
      <p>
        There are many possible directions for creating even better-performing systems. Crafting new
domain-specific SLMs would probably be beneficial, as demonstrated by the effectiveness of Covid-twitter-BERT
[18] on both subtasks. Having in mind the difficulty of creating high-quality annotated data, further
work on the LLM-based zero- and few-shot approaches would be beneficial for practitioners. Similarly,
multi-lingual approaches adaptable to new languages with few annotated examples [76] would also be
an interesting and potentially effective direction to pursue. If the topic-agnostic annotation scheme [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
used for this task is applied to create new labeled corpora, it would be interesting to use these corpora
for benchmarking the approach of Gómez-Romero et al. [
        <xref ref-type="bibr" rid="ref19">50</xref>
        ], which focuses on the generalization
capabilities of the models.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The shared task on Oppositional Thinking Analysis was organised in the framework of the
XAIDisInfodemics: eXplainable AI for disinformation and conspiracy detection during infodemics (MICIN
PLEC2021-007681), funded by MCIN/AEI/ 10.13039/501100011033 and by European Union
NextGenerationEU/PRTR. The work of Damir Korenčić and Berta Chulvi was conducted while at Universitat
Politècnica de València.</p>
      <p>conspiracist worldviews, Frontiers in Psychology 8 (2017). URL: https://www.frontiersin.org/
articles/10.3389/fpsyg.2017.00861. doi:10.3389/fpsyg.2017.00861.
[9] D. Mahl, M. S. Schäfer, J. Zeng, Conspiracy theories in online environments: An
interdisciplinary literature review and agenda for future research, New Media &amp; Society
0 (2022) 14614448221075759. URL: https://doi.org/10.1177/14614448221075759. doi:10.1177/
14614448221075759. arXiv:https://doi.org/10.1177/14614448221075759.
[10] J. E. Uscinski, J. Parent, B. Torres, Conspiracy Theories are for Losers, 2011. URL: https://papers.</p>
      <p>ssrn.com/abstract=1901755, aPSA 2011 Annual Meeting Paper.
[11] A. Miani, T. Hills, A. Bangerter, Loco: The 88-million-word language of conspiracy corpus,</p>
      <p>Behavior research methods (2021) 1–24.
[12] J. Langguth, D. T. Schroeder, P. Filkuková, S. Brenner, J. Phillips, K. Pogorelov, Coco: an annotated
twitter dataset of covid-19 conspiracy theories, Journal of Computational Social Science (2023)
1–42.
[13] K. Pogorelov, D. T. Schroeder, S. Brenner, J. Langguth, FakeNews: Corona Virus and Conspiracies
Multimedia Analysis Task at MediaEval 2021, in: Working Notes Proceedings of the MediaEval
2021 Workshop Bergen, Norway and Online, 2021.
[14] K. Pogorelov, D. T. Schroeder, S. Brenner, A. Maulana, J. Langguth, Combining tweets and
connections graph for fakenews detection at mediaeval 2022, in: Proceedings of the MediaEval
2022 Workshop, Bergen, Norway and Online, 12-13 January 2023., 2023.
[15] Y. Peskine, G. Alfarano, I. Harrando, P. Papotti, R. Troncy, Detecting covid-19-related conspiracy
theories in tweets, in: MediaEval 2021, MediaEval Benchmarking Initiative for Multimedia
Evaluation Workshop, 13-15 December 2021, 2021.
[16] Y. Peskine, P. Papotti, R. Troncy, Detection of COVID-19-Related Conspiracy Theories in Tweets
using Transformer-Based Models and Node Embedding Techniques, in: Working Notes Proceedings
of the MediaEval 2022 Workshop Bergen, Norway and Online, 2023.
[17] D. Korenčić, I. Grubišić, A. H. Toselli, B. Chulvi, P. Rosso, Tackling Covid-19 Conspiracies on
Twitter using BERT Ensembles, GPT-3 Augmentation, and Graph NNs, in: Working Notes
Proceedings of the MediaEval 2022 Workshop Bergen, Norway and Online, 2023. URL: https:
//2022.multimediaeval.com/paper8969.pdf.
[18] M. Müller, M. Salathé, P. E. Kummervold, Covid-twitter-bert: A natural language processing model
to analyse covid-19 content on twitter, Frontiers in Artificial Intelligence 6 (2023). URL: https:
//www.frontiersin.org/articles/10.3389/frai.2023.1023281. doi:10.3389/frai.2023.1023281.
[19] A. Giachanou, B. Ghanem, P. Rosso, Detection of conspiracy propagators using psycho-linguistic
characteristics, Journal of Information Science 49 (2021) 3–17. doi:10.1177/0165551520985486.
[20] J. D. Moffitt, C. King, K. M. Carley, Hunting conspiracy theories during the covid-19 pandemic,
      <p>Social Media + Society 7 (2021). doi:10.1177/20563051211043212.
[21] A. Bessi, Personality traits and echo chambers on facebook, Computers in Human Behavior
65 (2016) 319–324. URL: https://www.sciencedirect.com/science/article/pii/S0747563216305817.
doi:10.1016/j.chb.2016.08.016.
[22] C. Klein, P. Clutton, V. Polito, Topic Modeling Reveals Distinct Interests within an Online
Conspiracy Forum, Frontiers in Psychology 9 (2018). URL: https://www.frontiersin.org/articles/10.3389/
fpsyg.2018.00189.
[23] M. Samory, T. Mitra, ’The Government Spies Using Our Webcams’: The Language of Conspiracy
Theories in Online Discussions, Proceedings of the ACM on Human-Computer Interaction 2 (2018)
1–24. URL: https://dl.acm.org/doi/10.1145/3274421. doi:10.1145/3274421.
[24] S. Levy, M. Saxon, W. Y. Wang, Investigating Memorization of Conspiracy Theories in Text
Generation, in: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021,
Association for Computational Linguistics, Online, 2021, pp. 4718–4729. URL: https://aclanthology.
org/2021.findings-acl.416. doi:10.18653/v1/2021.findings-acl.416.
[25] J. Introne, A. Korsunska, L. Krsova, Z. Zhang, Mapping the Narrative Ecosystem of Conspiracy
Theories in Online Anti-vaccination Discussions, in: International Conference on Social Media
and Society, Association for Computing Machinery, 2020, pp. 184–192. URL: https://dl.acm.org/
doi/10.1145/3400806.3400828. doi:10.1145/3400806.3400828.
[26] P. Holur, T. Wang, S. Shahsavari, T. Tangherlini, V. Roychowdhury, Which side are you on?
Insider-Outsider classification in conspiracy-theoretic social media, in: Proceedings of the 60th
Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers),
Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 4975–4987. URL: https:
//aclanthology.org/2022.acl-long.341. doi:10.18653/v1/2022.acl-long.341.
[27] P. Wagner-Egger, A. Bangerter, S. Delouvée, S. Dieguez, Awake together: Sociopsychological
processes of engagement in conspiracist communities, Current Opinion in Psychology 47 (2022)
101417. URL: https://www.sciencedirect.com/science/article/pii/S2352250X22001385. doi:https:
//doi.org/10.1016/j.copsyc.2022.101417.
[28] D. Chicco, N. Tötsch, G. Jurman, The Matthews correlation coefficient (MCC) is more reliable
than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix
evaluation, BioData Mining 14 (2021) 13. URL: https://doi.org/10.1186/s13040-021-00244-z. doi:10.
1186/s13040-021-00244-z.
[29] G. Da San Martino, S. Yu, A. Barrón-Cedeño, R. Petrov, P. Nakov, Fine-Grained Analysis of
Propaganda in News Articles, in: Proceedings of the 2019 Conference on Empirical Methods in
Natural Language Processing and the 9th International Joint Conference on Natural Language
Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019,
pp. 5636–5646. URL: https://aclanthology.org/D19-1565. doi:10.18653/v1/D19-1565.
[30] D. Chicco, G. Jurman, The advantages of the Matthews correlation coefficient (MCC) over
F1 score and accuracy in binary classification evaluation, BMC Genomics 21 (2020) 6. URL:
https://doi.org/10.1186/s12864-019-6413-7. doi:10.1186/s12864-019-6413-7.
[31] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding, in: Proceedings of the 2019 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume
1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota,
2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
[32] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish Pre-trained BERT
Model and Evaluation Data, 2023. URL: http://arxiv.org/abs/2308.02976. arXiv:2308.02976,
arXiv:2308.02976.
[33] S. Ruder, An Overview of Multi-Task Learning in Deep Neural Networks, 2017. URL: http://arxiv.</p>
      <p>org/abs/1706.05098, arXiv:1706.05098.
[34] M. Zhang, K. Jensen, S. Sonniks, B. Plank, SkillSpan: Hard and soft skill extraction from
English job postings, in: Proceedings of the 2022 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies,
Association for Computational Linguistics, Seattle, United States, 2022, pp. 4962–4984. URL: https:
//aclanthology.org/2022.naacl-main.366. doi:10.18653/v1/2022.naacl-main.366.
[35] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, Y. Du,
C. Yang, Y. Chen, Z. Chen, J. Jiang, R. Ren, Y. Li, X. Tang, Z. Liu, P. Liu, J.-Y. Nie, J.-R. Wen, A survey
of large language models, 2023. URL: https://arxiv.org/abs/2303.18223. arXiv:2303.18223.
[36] T. G. Dietterich, Ensemble methods in machine learning, in: Multiple Classifier Systems, Springer</p>
      <p>Berlin Heidelberg, Berlin, Heidelberg, 2000, pp. 1–15.
[37] C. Shorten, T. M. Khoshgoftaar, B. Furht, Text data augmentation for deep learning, Journal of big</p>
      <p>Data 8 (2021) 101.
[38] T. Pires, E. Schlinger, D. Garrette, How multilingual is multilingual BERT?, in: A. Korhonen,
D. Traum, L. Màrquez (Eds.), Proceedings of the 57th Annual Meeting of the Association for
Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp.
4996–5001. URL: https://aclanthology.org/P19-1493. doi:10.18653/v1/P19-1493.
[39] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. u. Kaiser, I. Polosukhin,
Attention is all you need, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S.
Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 30,
Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/paper_files/paper/2017/file/
CEUR-WS.org, 2024.
[53] J. Huang, Z. Han, R. Zhu, M. Guo, K. Sun, Conspiracy Theory Text Classification Based on CT-BERT
and BETO Models, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working
Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org, 2024.
[54] A. Tulbure, M. Coll Ardanuy, Conspiracy vs critical thinking using an ensemble of transformers
with data augmentation techniques, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera
(Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org,
2024.
[55] B. Liu, Z. Han, H. Cao, An Approach to Classifying Conspiratorial and Critical Public Health
Narratives, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working Notes of
CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org, 2024.
[56] S. Mhalgi, S. Pulipaka, S. Kübler, IUCL at PAN 2024: Using Data Augmentation for Conspiracy
Theory Detection, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working
Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org, 2024.
[57] P. Balasundaram, K. Swaminathan, O. Sampath, P. Km, Oppositional Thinking Analysis: Conspiracy
Theories vs Critical Thinking Narratives, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S.
de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum,
CEUR-WS.org, 2024.
[58] A. Albladi, C. Seals, Detection of Conspiracy vs. Critical Narratives and Their Elements using NLP,
in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 -
Conference and Labs of the Evaluation Forum, CEUR-WS.org, 2024.
[59] D. Espinosa, G. Sidorov, E. Ricárdez-Vázquez, Using BERT to Identify Conspiracy Theories, in:
G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 -
Conference and Labs of the Evaluation Forum, CEUR-WS.org, 2024.
[60] P. He, X. Liu, J. Gao, W. Chen, DeBERTa: Decoding-enhanced BERT with disentangled attention, in:
International Conference on Learning Representations, 2021. URL: https://openreview.net/forum?
id=XPZIaotutsD.
[61] K. Clark, M.-T. Luong, Q. V. Le, C. D. Manning, ELECTRA: Pre-training text encoders as discriminators
rather than generators, 2020. URL: https://arxiv.org/abs/2003.10555. arXiv:2003.10555.
[62] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V.
Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019. URL: https://arxiv.org/abs/
1907.11692. arXiv:1907.11692.
[63] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott,
L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, 2020.
URL: https://arxiv.org/abs/1911.02116. arXiv:1911.02116.
[64] L. Wang, N. Yang, X. Huang, L. Yang, R. Majumder, F. Wei, Multilingual e5 text embeddings: A
technical report, 2024. URL: https://arxiv.org/abs/2402.05672. arXiv:2402.05672.
[65] A. Gutiérrez-Fandiño, J. Armengol-Estapé, M. Pàmies, J. Llop-Palao, J. Silveira-Ocampo, C. P.
Carrino, C. Armentano-Oller, C. Rodriguez-Penagos, A. Gonzalez-Agirre, M. Villegas, MarIA:
Spanish language models, Procesamiento del Lenguaje Natural (2022) 39–60. URL: https://doi.org/
10.26342/2022-68-3. doi:10.26342/2022-68-3.
[66] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, D. Zhou, Chain-of-thought
prompting elicits reasoning in large language models, 2023. URL: https://arxiv.org/abs/2201.11903.
arXiv:2201.11903.
[67] DeepSeek-AI, A. Liu, B. Feng, B. Wang, B. Wang, B. Liu, C. Zhao, C. Dengr, C. Ruan, D. Dai, D. Guo,
D. Yang, D. Chen, D. Ji, E. Li, F. Lin, F. Luo, G. Hao, G. Chen, G. Li, H. Zhang, H. Xu, H. Yang,
H. Zhang, H. Ding, H. Xin, H. Gao, H. Li, H. Qu, J. L. Cai, J. Liang, J. Guo, J. Ni, J. Li, J. Chen, J. Yuan,
J. Qiu, J. Song, K. Dong, K. Gao, K. Guan, L. Wang, L. Zhang, L. Xu, L. Xia, L. Zhao, L. Zhang, M. Li,
M. Wang, M. Zhang, M. Zhang, M. Tang, M. Li, N. Tian, P. Huang, P. Wang, P. Zhang, Q. Zhu,
Q. Chen, Q. Du, R. J. Chen, R. L. Jin, R. Ge, R. Pan, R. Xu, R. Chen, S. S. Li, S. Lu, S. Zhou, S. Chen,
S. Wu, S. Ye, S. Ma, S. Wang, S. Zhou, S. Yu, S. Zhou, S. Zheng, T. Wang, T. Pei, T. Yuan, T. Sun,
W. L. Xiao, W. Zeng, W. An, W. Liu, W. Liang, W. Gao, W. Zhang, X. Q. Li, X. Jin, X. Wang, X. Bi,
X. Liu, X. Wang, X. Shen, X. Chen, X. Chen, X. Nie, X. Sun, X. Wang, X. Liu, X. Xie, X. Yu, X. Song,
X. Zhou, X. Yang, X. Lu, X. Su, Y. Wu, Y. K. Li, Y. X. Wei, Y. X. Zhu, Y. Xu, Y. Huang, Y. Li, Y. Zhao,
Y. Sun, Y. Li, Y. Wang, Y. Zheng, Y. Zhang, Y. Xiong, Y. Zhao, Y. He, Y. Tang, Y. Piao, Y. Dong, Y. Tan,
Y. Liu, Y. Wang, Y. Guo, Y. Zhu, Y. Wang, Y. Zou, Y. Zha, Y. Ma, Y. Yan, Y. You, Y. Liu, Z. Z. Ren,
Z. Ren, Z. Sha, Z. Fu, Z. Huang, Z. Zhang, Z. Xie, Z. Hao, Z. Shao, Z. Wen, Z. Xu, Z. Zhang, Z. Li,
Z. Wang, Z. Gu, Z. Li, Z. Xie, DeepSeek-V2: A strong, economical, and efficient mixture-of-experts
language model, 2024. URL: https://arxiv.org/abs/2405.04434. arXiv:2405.04434.
[68] J. D. l. Rosa, E. G. Ponferrada, M. Romero, P. Villegas, P. G. d. P. Salas, M. Grandury, BERTIN:
Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling, Procesamiento
del Lenguaje Natural 68 (2022) 13–23. URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/
article/view/6403.
[69] J. M. Pérez, D. A. Furman, L. A. Alemany, F. Luque, RoBERTuito: a pre-trained language model for
social media text in Spanish, 2022. URL: https://arxiv.org/abs/2111.09453. arXiv:2111.09453.
[70] T. Joachims, Text categorization with support vector machines: Learning with many relevant
features, in: C. Nédellec, C. Rouveirol (Eds.), Machine Learning: ECML-98, Springer Berlin
Heidelberg, Berlin, Heidelberg, 1998, pp. 137–142.
[71] L. Breiman, Random forests, Machine learning 45 (2001) 5–32.
[72] P. He, J. Gao, W. Chen, DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training
with Gradient-Disentangled Embedding Sharing, in: International Conference on Learning
Representations, 2023. URL: https://openreview.net/forum?id=sE7-XhLxHA.
[73] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster,
cheaper and lighter, 2020. URL: https://arxiv.org/abs/1910.01108. arXiv:1910.01108.
[74] J. D. Lafferty, A. McCallum, F. C. N. Pereira, Conditional random fields: Probabilistic models
for segmenting and labeling sequence data, in: Proceedings of the Eighteenth International
Conference on Machine Learning, ICML ’01, Morgan Kaufmann Publishers Inc., San Francisco,
CA, USA, 2001, p. 282–289.
[75] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, R. Soricut, ALBERT: A lite BERT for
self-supervised learning of language representations, 2020. URL: https://arxiv.org/abs/1909.11942.
arXiv:1909.11942.
[76] F. D. Schmidt, I. Vulić, G. Glavaš, Don’t stop fine-tuning: On training regimes for few-shot
cross-lingual transfer with multilingual language models, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.),
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing,
Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 10725–10742. URL:
https://aclanthology.org/2022.emnlp-main.736. doi:10.18653/v1/2022.emnlp-main.736.
</p>
    </sec>
    <sec id="sec-9">
      <title>A. Appendix: Detailed Results</title>
      <p>TASK 1 - ENGLISH (cont.)
POSITION
TEAM
MCC</p>
      <p>F1-MACRO
F1-CONSPIRACY</p>
      <p>TASK 1 - SPANISH
POSITION
TASK 1 - SPANISH (cont.)
POSITION</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bonet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Toselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>What distinguishes conspiracy from critical narratives? A computational analysis of oppositional discourse</article-title>
          ,
          <source>Expert Systems</source>
          (
          <year>2024</year>
          ). doi:10.1111/exsy.13671.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Douglas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <article-title>What are conspiracy theories? A definitional approach to their correlates, consequences, and communication</article-title>
          ,
          <source>Annual Review of Psychology</source>
          <volume>74</volume>
          (
          <year>2023</year>
          )
          <fpage>271</fpage>
          -
          <lpage>298</lpage>
          . URL: https://doi.org/10.1146/annurev-psych-032420-031329.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Tajfel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Turner</surname>
          </string-name>
          ,
          <article-title>An integrative theory of intergroup relations</article-title>
          ,
          <source>Psychology of intergroup relations</source>
          (
          <year>1979</year>
          )
          <fpage>33</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <article-title>Social identity theory: past achievements, current problems and future challenges</article-title>
          ,
          <source>European Journal of Social Psychology</source>
          <volume>30</volume>
          (
          <year>2000</year>
          )
          <fpage>745</fpage>
          -
          <lpage>778</lpage>
          . doi:10.1002/1099-0992(200011/12)30:6&lt;745::AID-EJSP24&gt;3.0.CO;2-O.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hogg</surname>
          </string-name>
          , Social identity theory (
          <year>2016</year>
          ). doi:10.1007/978-3-319-29869-6_1.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Douglas</surname>
          </string-name>
          ,
          <article-title>Rabbit hole syndrome: Inadvertent, accelerating, and entrenched commitment to conspiracy beliefs</article-title>
          ,
          <source>Current Opinion in Psychology</source>
          <volume>48</volume>
          (
          <year>2022</year>
          )
          101462. URL: https://www.sciencedirect.com/science/article/pii/S2352250X2200183X. doi:10.1016/j.copsyc.2022.101462.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Funkhouser</surname>
          </string-name>
          ,
          <article-title>A tribal mind: Beliefs that signal group identity or commitment</article-title>
          ,
          <source>Mind &amp; Language</source>
          <volume>37</volume>
          (
          <year>2022</year>
          )
          <fpage>444</fpage>
          -
          <lpage>464</lpage>
          . URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/mila.12326. doi:10.1111/mila.12326.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Franks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bangerter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Noort</surname>
          </string-name>
          , Beyond “monologicality”? Exploring conspiracist worldviews,
          <source>Frontiers in Psychology</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Á.</given-names>
            <surname>Huertas-García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Martí-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ambite</surname>
          </string-name>
          ,
          <article-title>Small Language Models and Large Language Models in Oppositional thinking analysis: Capabilities and Biases and Challenges</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vallecillo-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo-Ráez</surname>
          </string-name>
          ,
          <article-title>SINAI at PAN 2024 Oppositional Thinking Analysis: Exploring the fine-tuning performance of LLMs</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rozière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hambro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Azhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , E. Grave, G. Lample,
          <article-title>Llama: Open and efficient foundation language models</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2302.13971. arXiv:2302.13971.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          , G. Krueger,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          , E. Sigler,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>33</volume>
          ,
          Curran Associates, Inc.,
          <year>2020</year>
          , pp.
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          . URL: https://proceedings.neurips.cc/paper/2020/hash/ 1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>An Oppositional Thinking Analysis Method Using BERT-based Model with BiGRU</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>van Merriënboer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>On the properties of neural machine translation: Encoder-decoder approaches</article-title>
          ,
          <source>in: Proceedings of SSST-8</source>
          , Eighth Workshop on Syntax,
          <source>Semantics and Structure in Statistical Translation</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>103</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>S.</given-names>
            <surname>Damian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Herrera-Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vazquez-Santana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Calvo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Felipe-Riverón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yáñez-Márquez</surname>
          </string-name>
          , DSVS at PAN 2024:
          <article-title>Ensemble Approach of Large Language Models for Analyzing Conspiracy Theories Against Critical Thinking Narratives</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sánchez-Hermosilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panizo-Lledot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Camacho</surname>
          </string-name>
          ,
          <article-title>A Study on NLP Model Ensembles and Data Augmentation Techniques for Separating Critical Thinking from Conspiracy Theories in English Texts</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zrnić</surname>
          </string-name>
          ,
          <article-title>Conspiracy theory detection using transformers with multi-task and multilingual approaches</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sahitaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sahitaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohtaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Schmitt</surname>
          </string-name>
          ,
          <article-title>Towards a Computational Framework for Distinguishing Critical and Conspiratorial Texts by Elaborating on the Context and Argumentation with LLMs</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gómez-Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>González-Silot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montoro-Montarroso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Molina-Solana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Martínez Cámara</surname>
          </string-name>
          ,
          <article-title>Detection of conspiracy-related messages in Telegram with anonymized named entities</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 -
          <article-title>Conference and Labs of the Evaluation Forum, CEUR-WS</article-title>
          .org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mahesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Divakaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Girish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lakshmaiah</surname>
          </string-name>
          ,
          <article-title>Binary Battle: Leveraging ML and TL Models to Distinguish between Conspiracy Theories and Critical Thinking</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 -
          <article-title>Conference and Labs of the Evaluation Forum, CEUR-WS</article-title>
          .org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>A Conspiracy Theory Text Detection Method based on RoBERTa and XLM-RoBERTa Models</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 -
          <article-title>Conference and Labs of the Evaluation Forum, CEUR-WS</article-title>
          .org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>