<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Conference and Labs of the Evaluation Forum</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>SINAI at PAN 2024 Oppositional Thinking Analysis: Exploring the Fine-Tuning Performance of Large Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>María Estrella Vallecillo-Rodríguez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>María Teresa Martín-Valdivia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arturo Montejo-Ráez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, SINAI, CEATIC, Universidad de Jaén</institution>
          ,
          <addr-line>23071</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>0</volume>
      <fpage>9</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>This article describes the participation of the SINAI research group in the shared task “Oppositional Thinking Analysis: Conspiracy theories vs critical thinking narratives” at CLEF 2024. The task comprises two subtasks: Subtask 1, a binary classification between critical and conspiracy texts, and Subtask 2, a token-level classification of the elements of the oppositional narrative. The proposed system for both subtasks relies on LLMs (LLaMA3 or GPT-3.5) to which we apply instruction tuning for the specific subtask. We believe that these types of models hold broad knowledge and can reason to distinguish each type of text or element of the texts, and that instruction tuning will strengthen this ability, helping the models to distinguish between the classes. In the final leaderboard, our proposal obtained 3rd and 1st place for Subtask 1 in English and Spanish, respectively. In Subtask 2, our systems reached 18th position for English and 17th for Spanish.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>QLoRA</kwd>
        <kwd>Zero-Shot Learning</kwd>
        <kwd>Oppositional Thinking Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Nowadays, social networks are among the most widely used means of communication. On them,
users share various aspects of their lives, express their opinions and ideas, and even share current news.
The problem is that not all the news published on social networks is true, and users sometimes share
items after merely reading them, spreading them across the network without stopping to verify the information.
One type of message that is harmful to the social networking community is the conspiracy theory, defined
by the European Union [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as: "The belief that certain events or situations are secretly manipulated
behind the scenes by powerful forces with negative intentions". These theories are harmful because
they can have serious consequences for society, such as spreading distrust in public institutions or
scientific information, feeding discrimination, and justifying hate crimes, among others.
However, another type of text found on social networks is the critical thinking
narrative. In these, users express their opinions, sometimes argued and sometimes based on events
that have happened to them or to acquaintances. A more concrete definition of critical thinking
is provided by the Oxford dictionary [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]: “the process of analyzing information in order
to make a logical decision about the extent to which you believe something to be true or false”. If this
critical thinking issues a judgment that opposes the main idea, we speak of an oppositional
critical thinking narrative.
      </p>
      <p>
        These two narratives are challenging to distinguish, especially for language models
that analyze social network content. Therefore, the organizers of the shared task “Oppositional Thinking
Analysis” in the PAN Lab [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] of CLEF 2024 propose two subtasks. The first subtask is to distinguish
the conspiratorial narrative from other oppositional narratives that do not express a conspiratorial
mindset (i.e., critical thinking). This is a binary classification task between two classes (CRITICAL or
CONSPIRACY). The second subtask is to identify the key elements of a narrative that fuel intergroup
conflict in oppositional thinking in online messages. This is a token-level classification task in
which models have to recognize text spans corresponding to the key elements of oppositional
narratives (AGENT, FACILITATOR, VICTIM, CAMPAIGNER, OBJECTIVE, NEGATIVE_EFFECT). These
two subtasks are proposed for both English and Spanish messages. The data of the task is
extracted from the Telegram social network and is related to the COVID-19 pandemic.
      </p>
      <p>
        Our proposal consists of the use of generative LLMs such as GPT-3.5 and LLaMA3-8B-instruct,
trained with instructions to detect conspiratorial and critical texts as well as the different elements
of these narratives. To train the models we apply QLoRA, a method for training LLMs
efficiently, and the OpenAI API. We believe this adaptation to the task is crucial to help the
models learn the differences between the classes. With this proposal, we intend to study how
training the models affects the achievement of the objectives of the proposed subtasks, and how
model size relates to performance. There are recent studies that
focus on conspiracy theories in social networks, such as [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] in which the authors create a
dataset that includes accounts dedicated to conspiracy theories and a control group of randomly selected
users. They then perform a comparative analysis of the topics covered, profile characteristics, and
behaviors. Using machine learning algorithms and features from the bot, troll, and linguistic literature, they
successfully classify conspiracy theory users with high accuracy. In contrast, other studies
analyze the performance of generative LLMs in detecting conspiratorial texts. Diab et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] address
the detection of conspiracy theories by training a BERT model and then comparing it with the
performance of a GPT model without applying any training to it. Their study finds that GPT fails to
apply logical reasoning. However, other studies such as [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which focuses on detecting conspiratorial
Telegram messages in German, compare supervised fine-tuning approaches
(BERT models) with instruction-based approaches (LLaMA2, GPT-3.5 and GPT-4), which require little or
no additional training data. Their work shows that both approaches can be used effectively, highlighting
that among the best results is GPT-4 with a Zero-Shot Learning (ZSL) instruction that includes
a definition of what a conspiracy theory is. Peskine et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] generate definitions from
examples and use them for zero-shot classification of fine-grained multi-label conspiracy theories. They show
that improving class label definitions has a direct effect on subsequent classification results. This
suggests that it is very important to refine the instruction we give to the model. Some studies
analyze how well instruction-based models perform when adapted to a task. An example is
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], in which the authors use a LLaMA model containing emotional information and apply training based on
different instructions (emotion recognition, sentiment, and conspiracy theories). Their results show
that this model largely outperforms several open-source domain-general LLMs.
      </p>
      <p>The remainder of the paper is organized as follows: Section 2 presents an overview of the
proposed system for the shared task. The data used and the methodology followed to
achieve the goal of the task are described in Section 3. In Section 4 we show the results obtained in
our experiments during the development phase and the evaluation phase. Finally, we conclude with a
discussion in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. System overview</title>
      <p>
        This section describes the system developed for the Oppositional Thinking Analysis shared task [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] at CLEF 2024.
      </p>
      <p>
        To address both subtasks, we study how generative LLMs such as LLaMA3 or GPT-3.5
can be adapted to a classification task as proposed in this shared task. In addition,
we want to study whether differences in model size influence the classification
of each text. For this reason, we apply instruction-based training of the models. The first
step of this method is to create a good instruction, or prompt, with which the models show good results in
preliminary tests. To do this, we provide different examples to the models and ask the selected models
for the differences between critical and conspiratorial texts for Subtask 1, and for a definition of each
element of oppositional narratives for Subtask 2. We feed the prompt with the information these
models give us in their responses, as we believe this information helps the model to detect each type
of text or element. The prompts used are presented in Appendix A. To train the GPT-3.5 model, we use
the OpenAI API with 1 epoch, and to train LLaMA we use a method called QLoRA [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This approach
facilitates a faster and more affordable process as it significantly reduces the hardware requirements.
The model was loaded in 4 bits with the quantization data type NF4, with bf16 as the computational
data type. Finally, LoRA update matrices were applied to the linear layers of the model. The LoRA rank
was set to 16, the scaling factor (LoRA alpha) to 64, and the dropout to 0.05. We used a learning rate of
2e-4, a batch size of 1, and 10 epochs with an early stop of 3 epochs.
      </p>
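      <p>As a rough illustration of why QLoRA's low-rank updates cut the hardware requirements so sharply, the arithmetic below counts the trainable weights LoRA adds to a single linear layer; the hidden size of 4096 is an illustrative assumption, not a figure from the paper.</p>

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA trains for one d_out x d_in linear layer:
    the update is B @ A with A of shape (r, d_in) and B of shape (d_out, r),
    so only r * (d_in + d_out) weights are learned."""
    return r * (d_in + d_out)

# Rank 16 as in our configuration; 4096 is an assumed hidden size.
full = 4096 * 4096                                # weights in the frozen layer
lora = lora_trainable_params(4096, 4096, r=16)    # weights actually trained
print(full, lora)  # 16777216 131072 -> under 1% of the layer is trainable
```

<p>The frozen base weights are additionally stored in 4-bit NF4, which is what brings the memory footprint within reach of a single GPU.</p>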
      <p>Furthermore, as shown in Section 3.1, the dataset for Subtask 2 is unbalanced, especially the
Spanish dataset, where the class ‘OBJECTIVE’ appears far fewer times. Since we do not have enough
instances for the model to learn this class, and considering that it inserts noise into the training of the
models, we exclude it during training.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental setup</title>
      <p>
        3.1. Data
The dataset of this shared task [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is composed of 10,000 Telegram messages written in English
and Spanish. These messages are related to the COVID-19 pandemic and labelled according to the
annotation schemes for Subtask 1 and Subtask 2. The labels of Subtask 1 are CONSPIRACY and CRITICAL, and
the labels that can be associated with the text spans of Subtask 2 are AGENT, FACILITATOR, VICTIM,
CAMPAIGNER, OBJECTIVE, and NEGATIVE_EFFECT. The dataset is divided into two splits, the first to
train the developed systems and the second to test them.
      </p>
      <p>The distributions of the train split of the dataset for Subtask 1 (distinguishing between critical and
conspiracy texts) and Subtask 2 (detecting elements of the oppositional narratives) are presented in
Figures 1 and 2, respectively. We can observe that, for Subtask 1, the majority class is ‘CRITICAL’,
although the dataset is not very unbalanced. Since Subtask 2 is a token-level classification, each text
can have multiple labels and the same label can be repeated within a text instance. So, for Subtask 2, we
show two figures: the first (Subfigure 2a) represents the number of texts in the dataset
in which each label appears, where the label ‘X’ represents texts with no label for the task; the second
figure (Subfigure 2b) represents the number of times each label appears in the dataset. In both figures,
we can see unbalanced data: the minority class is ‘OBJECTIVE’, which in Spanish
appears in only 338 texts with 493 occurrences, compared with 898 texts and 1602 occurrences
in the English dataset.</p>
      <p>To carry out the experiments proposed in Section 3.2, we divide the training set provided by the
task organizers into three subsets: one to train the models, another to validate our systems
during training, and finally a test subset to evaluate how well our systems perform and to select the best
experiments for submitting the final results to the task. The partitioning was done in a stratified
way to maintain the same percentage of labels in the partitions created. The number of text instances
of each class in each subset can be seen in Table 1.</p>
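      <p>The stratified partitioning described above can be sketched as follows; this is an illustrative reimplementation under assumed 80/10/10 split fractions, not the code actually used for the experiments.</p>

```python
import random
from collections import defaultdict

def stratified_split(items, labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split (item, label) pairs into train/val/test subsets while keeping
    the per-label proportions of the full dataset in every subset."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    splits = ([], [], [])
    for label, group in by_label.items():
        rng.shuffle(group)
        n = len(group)
        cut1 = int(n * fractions[0])
        cut2 = cut1 + int(n * fractions[1])
        chunks = (group[:cut1], group[cut1:cut2], group[cut2:])
        for split, chunk in zip(splits, chunks):
            split.extend((x, label) for x in chunk)
    return splits
```

<p>Because each label is shuffled and cut independently, a class that makes up 80% of the full train split also makes up 80% of each generated subset.</p>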
      <sec id="sec-3-1">
        <title>3.2. Experiments and Selected Models</title>
        <p>
          To achieve the goal of the Oppositional Thinking Analysis shared task, we selected the following models:
LLaMA3-8B-instruct [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and GPT-3.5 [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. With these models, we want to study how training
the models affects the achievement of the objectives of the proposed subtasks, and how
model size relates to performance. Moreover, we propose two experiments for
each task, each with a different configuration:
• Baseline. This experiment uses the model with a prompt strategy based on ZSL,
asking it to provide reasoning for its responses. Our goal is to establish a reference experiment to evaluate
the effectiveness of the proposed systems. In this case, we selected the GPT-3.5 model because we
consider that a model with more parameters has more knowledge of the task and will be able to
distinguish between the different classes of both subtasks without prior knowledge of them.
• Fine-tuning. This experiment applies techniques for efficient instruction learning of LLMs. For
this experiment we select the LLaMA3-8B-instruct model, which is an open
model over whose parameters we have full control, and the GPT-3.5 model, which belongs to a
company, is not free to use, and exposes more restricted parameters. To train LLaMA we use
a technique for efficient learning of LLMs called QLoRA (Quantized Low-Rank Adaptation) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
This method accelerates the training process and makes it more accessible. QLoRA enables us to
train models with a large number of parameters using minimal hardware resources. This is
achieved by not training all model parameters and by quantizing
the numbers used during the training process. In this experiment, we load the selected model
in 4 bits with the quantization data type NF4 and bf16 as the computational data type.
The LoRA update matrices were applied to the linear layers of the model. The LoRA rank
was set to 16, the scaling factor (LoRA alpha) to 64, and the dropout to 0.05. In addition, we used a
learning rate of 2e-4, a batch size of 1, and 10 epochs with an early stop of 3 epochs.
In the case of the GPT-3.5 model, we use the OpenAI API and train the model with 1 epoch.
For Subtask 2, as seen in Section 3.1, we have unbalanced data, so we propose two variants of
this experiment:
– FT_all: Using all the labels in the dataset.
        </p>
        <p>– FT_withoutObjective: Excluding the minority class, ‘OBJECTIVE’.
Because the use of GPT-3.5 is not free, to train the GPT-3.5 model for Subtask 2 we only apply the
fine-tuning of the best variant of LLaMA training for Subtask 2. As we can see in Section 4.1, the
best variants for each language are to use all labels for English and to exclude the OBJECTIVE
class for Spanish.</p>
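        <p>The “10 epochs with an early stop of 3 epochs” schedule used in the fine-tuning experiments amounts to patience-based early stopping on the validation loss; the sketch below is an illustrative reimplementation of that stopping rule, not the actual training loop.</p>

```python
def stopping_epoch(val_losses, max_epochs=10, patience=3):
    """Given per-epoch validation losses, return the epoch training stops:
    after `patience` consecutive epochs without improvement, or at max_epochs."""
    best, bad = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:
                return epoch
    return min(len(val_losses), max_epochs)

# Loss stops improving after epoch 2, so training halts at epoch 5 (2 + patience 3).
print(stopping_epoch([1.0, 0.9, 0.95, 0.96, 0.97, 0.98]))  # -> 5
```

<p>With a batch size of 1, this cap keeps the per-experiment cost bounded even when the validation loss plateaus early.</p>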
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>
        In this section, we present the results obtained by the systems developed as part of our participation in
the “Oppositional Thinking Analysis” task. To evaluate our systems, we use the official metrics given by
the organizers. Specifically, the MCC metric [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] is a single-value classification metric that summarizes
the confusion (error) matrix. The MCC ranges between -1 and +1: a coefficient
of +1 represents perfect prediction, 0 represents average random prediction, and -1 represents inverse
prediction. Moreover, for this task, the macro F1 score (harmonic mean of precision and recall for a
more balanced summarization of model performance) and the macro F1 score for each class are
provided. For Subtask 2, the span F1 score [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] is used as the official metric. This metric calculates the F1
measure for each class of the dataset and for each span identified. In addition, the organizers provide
span recall and precision and the micro span F1 score. The experiments are conducted in two phases:
the development phase, where we select the best models, and the evaluation phase, where we evaluate
the selected models and choose the best model to appear in the leaderboard of the evaluation campaign.
      </p>
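      <p>For concreteness, the MCC described above can be computed directly from the four cells of a binary confusion matrix; a minimal sketch (not the organizers' evaluation code):</p>

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient of a binary confusion matrix.
    Ranges from -1 (inverse prediction) through 0 (random) to +1 (perfect)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(mcc(50, 50, 0, 0))    # perfect prediction  -> 1.0
print(mcc(0, 0, 50, 50))    # inverse prediction  -> -1.0
print(mcc(25, 25, 25, 25))  # chance-level        -> 0.0
```

<p>Unlike accuracy, MCC only reaches high values when both classes are predicted well, which is why it is preferred for the mildly unbalanced Subtask 1 data.</p>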
      <sec id="sec-4-1">
        <title>4.1. Development Phase</title>
        <p>In order to select the best model for each subtask, we trained the models selected in Section 3.2 on a
subset of the train split provided by the organizers and evaluated them on another subset of the train
split. The results obtained in the development phase are shown in Tables 2 and 3.</p>
        <p>Table 2 shows the results obtained in the experiments proposed above for Subtask 1. As can be seen,
the fine-tuning of LLaMA3-8B-instruct shows promising results in all the metrics evaluated and obtains
the best result when applied to English data. Its performance in Spanish is good, although it does not
outperform the fine-tuned GPT-3.5 model. This may be because LLaMA3-8B-instruct does not have
as extensive knowledge of Spanish as GPT-3.5, having been trained on more English data. In addition,
the GPT-3.5 model shows greater consistency across languages, obtaining very similar results in
both. Looking at the performance of the GPT-3.5 model in the ZSL experiment, we see a
big difference compared with the fine-tuned models, as it does not achieve such promising results. This highlights the
importance of adapting the models to improve their performance on the proposed task.</p>
        <p>The results of Subtask 2 are shown in Table 3. For this subtask, we can see how LLaMA3-8B-instruct
shows better results for English when trained with all classes than when trained without
the minority class. However, for Spanish, the results of training with all classes show
lower performance than expected, falling even below the ZSL strategy in which the GPT-3.5 model has
not been fitted to the task, probably because a very underrepresented class with few examples
inserts noise during the model training process. For that reason, if we remove the minority class
(OBJECTIVE) from Spanish we get a result closer to what we would expect. As in the previous
task, LLaMA3-8B-instruct performs better in English and GPT-3.5 fits better in Spanish. Looking at the
ZSL experiment, we can see that this strategy is not effective for the task compared with model fitting,
demonstrating the need to train the models to understand the differences between the classes
and where in the text they may appear.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation Phase</title>
        <p>In the evaluation phase, we use the trained models of the development phase and evaluate them on the
test set provided by the organizers. The systems submitted and their results for each run in subtasks 1
and 2 are presented in Tables 4, and 5 respectively.</p>
        <p>Regarding Subtask 1, because the LLaMA3-8B-instruct fine-tuning obtained the best results for English
and because it was free, we decided to send these results in both runs. For Spanish, we decided to
send the fine-tuned LLaMA3-8B-instruct model in one run and GPT-3.5 in the other. As can be seen
in Table 4, the performance of the fine-tuned GPT-3.5 model outperforms the fine-tuned
LLaMA3-8B-instruct model. This is not surprising, since the same happened in the development
phase, and it may be due to the fact that the prior knowledge GPT-3.5 has of Spanish is greater than
that of LLaMA3-8B-instruct.</p>
        <p>Table 4 summarizes the Subtask 1 test results. Per run, the MCC, F1-macro, F1-conspiracy, and
F1-critical scores were: 0.8297, 0.9149, 0.8886, 0.9412; 0.6780, 0.8363, 0.7841, 0.8886;
0.8297, 0.9149, 0.8886, 0.9412; and 0.7429, 0.8705, 0.8319, 0.9091.</p>
        <p>The results of the systems submitted for Subtask 2 can be seen in Table 5. For each submission we were
allowed for this task, we decided to submit a model fine-tuned with all classes for English and without
the minority class for Spanish. Since the differences between GPT-3.5 and LLaMA3-8B-instruct for Spanish
are minimal, we decided not to combine these models for Spanish and instead sent the
predictions made by each model for Spanish and English. As can be seen in this table, the model that
best fits the task is LLaMA3-8B-instruct, probably because it was trained for more epochs than
GPT-3.5 and this task is more complex than the first one, since we have to choose between 6
classes and locate the spans in which they appear.
Finally, we want to emphasize that the results obtained in both tasks by LLaMA3-8B-instruct are
striking given the large difference in the number of parameters between LLaMA3-8B-instruct
and GPT-3.5. This makes us think that as long as we have quality data that is
representative of the classes, it is not so important to select very large models, since by
training smaller models for a few more epochs we can obtain results very similar to, and even better than, those of models with a
large number of parameters.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Error Analysis</title>
        <p>For each subtask, we present an error analysis of the final selected models on the test split used during
our development phase.</p>
        <p>For the first task, Table 6 shows how difficult it is to recognize each of the labeled classes.
For example, in the first text for Spanish (id 9256) we can see that the comment is a criticism of the
decision to change the brand when administering the third dose of a vaccine, but it also has a
conspiratorial element in claiming that, to kill everyone, all doses carry the same thing, so the model
assigns the CONSPIRACY class. In the second example for Spanish (id 9076) we see a sentence typical
of conspiracy theories (“they try to make us believe”), but the model is not able to detect it and
considers the text more oriented toward criticizing how the different COVID variants are created. On
the other hand, if we look at the English texts, we can see how the mere conspiratorial title of a
conversation thread where opinions are going to be exposed already leads the model to classify it as
CONSPIRACY instead of CRITICAL (id 151). Moreover, in the second English text (id 177), the purpose
of the message is conspiratorial, but the LLaMA3-8B-instruct model labels it as critical, probably
because it considers that the message is spreading an opinion about something said in a podcast like
the AlexJonesShow.</p>
        <p>On the other hand, in the texts related to Subtask 2, we find examples such as the ones shown in
Table 7. If we look at the Spanish example (id 4263), since we removed the OBJECTIVE class from the
Spanish model, the model should not predict anything. However, it predicts various CAMPAIGNERs
that are not even entities promoting something in the conspiracy. This suggests that the model
struggles to recognize these types of entities. In the case of the English text, we can see how difficult
it is for the model to recognize negative effects that do not carry negations or negative words such as
“death”. We can also see how it confuses the class CAMPAIGNER with FACILITATOR, as in the case of
“the " " scientific clerisy " "”.</p>
        <p>Table 7 (LLaMA3-8B-instruct model error analysis for Subtask 2, detecting elements of the
oppositional narratives) presents examples of predictions on the test split created from the train split
of the Oppositional Thinking Analysis shared task, showing the text, gold labels, and predicted labels
for each example.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper presents the participation of the SINAI research group in the Oppositional Thinking Analysis
shared task at CLEF 2024. In the two subtasks, we explore how different fine-tuned LLMs (GPT-3.5
and LLaMA3-8B-instruct) perform using prior knowledge. For the first subtask, we have seen that the
GPT-3.5 model works better for Spanish than the LLaMA3-8B-instruct model when fine-tuned for the task,
while LLaMA3-8B-instruct performs better for English. In the second subtask, we found that
LLaMA3-8B-instruct achieved better results than GPT-3.5 in both languages.</p>
      <p>We conclude that, in general,
fine-tuning LLMs is effective for oppositional thinking analysis tasks, especially when
the number of classes is small. Furthermore, the good performance obtained by LLaMA3-8B-instruct
demonstrates that it is not always necessary to use larger models; rather, we need models trained with
quality data and given well-constructed input prompts so that they can effectively understand the task
at hand. As future work, we plan to further analyze the misclassifications of each class and provide the
model with a complete definition to help its reasoning. Additionally, since the detection of critical
thinking is subjective, we aim to study how model classification is affected by texts with lower
agreement among annotators and whether annotators’ sociodemographic characteristics influence
their reasoning. Finally, we want to investigate whether the models are overfitted to the task data and whether they
perform well on other datasets.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by Project CONSENSO (PID2021-122263OB-C21), Project
MODERATES (TED2021-130145B-I00), and Project SocialTox (PDC2022-133146-C21) funded by
MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Used Prompt</title>
      <p>The prompts used for our experiments with ZSL and for fine-tuning the selected models are presented in
Table 8.</p>
      <sec id="sec-7-1">
        <title>Prompt</title>
        <p>You are an expert in the classification of critical and conspiratorial texts. Your task is to identify these
CRITICAL and CONSPIRACY texts.</p>
        <p>CRITICAL messages criticize decisions made by an individual, a group of people, or a committee of experts.
They may also expose personal concerns or opinions on an issue or on decisions that have been made over
time and are contradictory. Moreover, they make a claim about the theme without delving into complex
or implausible theories.
CONSPIRACY messages, on the other hand, see decisions as the result of a malevolent conspiracy by
secret and influential groups. There are some differences between CRITICAL and CONSPIRACY messages:
1. Degree of Speculation: CRITICAL texts may contain unsubstantiated personal claims, but CONSPIRACY
texts often go further by proposing complex and implausible theories. These theories lack solid evidence
and are based on extreme speculation.
2. Level of Alarmism: CRITICAL texts may use alarming language. CONSPIRACY texts tend to be even
more sensationalist and apocalyptic. They often include claims of impending catastrophic events or the
existence of an ‘imminent danger’ that only the ‘awakened’ can see.
3. Global Conspiracy Tone: CRITICAL texts suggest specific concerns, while CONSPIRACY texts often
address much broader issues, such as the existence of a ‘secret world government’ or the manipulation of
reality by unknown entities.</p>
        <p>Now you are going to receive a TEXT and based on everything explained above, argue your response,
reasoning step by step, and put at the end of your answer the keyword ‘LABEL’ with the assigned class
(CRITICAL or CONSPIRACY).</p>
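        <p>As an illustration only (this parser, its name, and the sample completion below are our own assumptions, not part of the submitted system), the class that the prompt asks the model to append after the keyword ‘LABEL’ could be recovered from the free-text answer as follows:</p>

```python
import re

def extract_label(completion: str) -> str:
    """Return the class following the last 'LABEL' keyword in a model answer.

    The subtask 1 prompt asks the model to reason step by step and end with
    'LABEL' plus the assigned class, so we take the final occurrence.
    """
    matches = re.findall(r"LABEL\W*(CRITICAL|CONSPIRACY)", completion)
    return matches[-1] if matches else "UNKNOWN"  # UNKNOWN: format not followed

# Hypothetical model answer that follows the required output format.
answer = (
    "The text blames a secret world government for an imminent catastrophe, "
    "going beyond substantiated criticism. LABEL: CONSPIRACY"
)
print(extract_label(answer))  # CONSPIRACY
```

        <p>Taking the final occurrence guards against the words CRITICAL or CONSPIRACY appearing earlier in the step-by-step reasoning that the prompt also requests.</p>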
        <p>TEXT: " "
You are an expert in detecting elements of the texts. Since conspiracy narratives are a special type of
causal explanation, your task consists in the recognition of text spans corresponding to the key elements
of a text.</p>
        <p>Step 1: Identify all of the negative effects mentioned in the text and relate them to the oppositional
narrative. A negative effect is a harmful consequence or negative impact related to conspiracy theories or
critical aspects. Put these negative effects, in the same form as they appear in the text, on different lines
with the keyword “NEGATIVE_EFFECT”.</p>
        <p>Step 2: Identify if there is an explicitly stated objective of the oppositional narrative. An explicit objective
refers to a clear and direct statement outlining the goal or purpose of the narrative being presented. This
objective is typically stated overtly within the text, providing insight into what the proponents of the
narrative are trying to achieve or promote. Put these objectives, in the same form as they appear in the
text, on different lines with the keyword “OBJECTIVE”.</p>
        <p>Step 3: Identify if there are victims in the oppositional texts. A victim is a specific individual or group that
is negatively affected by the negative effects identified in Step 1 or by the harmful actions or policies described in
the text. Put all victims with the keyword “VICTIM”.</p>
        <p>Step 4: Identify if there are conspirators in the text. A conspirator refers to the entity responsible for
planning, executing, or supporting the main action or policy being discussed in the text. Moreover, a
conspirator is responsible for the NEGATIVE_EFFECTS. Put all the conspirators identified with the keyword
“AGENT”.</p>
        <p>Step 5: Identify if there is any facilitator in the text. A facilitator is a collaborator or entity that supports
the agents in executing the main actions or policies discussed in the text. They assist in the achievement
of the objectives outlined by the conspirators, often playing a role in enabling or promoting the negative
effects on the victims. Put all the facilitators identified with the keyword “FACILITATOR”.
Step 6: Detect the campaigners that appear in the text. A campaigner is an entity or someone who
unmasks the conspiracy agenda, opposes the conspiracy narrative, and works to expose or challenge it.
Moreover, a campaigner actively opposes the mainstream narrative and promotes his own opinion. Put
all the campaigners identified with the keyword “CAMPAIGNERS”.</p>
        <p>Please answer each step with the exact part of the text and explain your answer for each step. If there is
not a specific and clear element, do not provide it.</p>
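        <p>For illustration (a minimal post-processing sketch under our own assumptions; the function and the example strings below are hypothetical, not the system's actual code), the keyword-prefixed lines that the six steps request could be mapped back to character offsets in the source text:</p>

```python
# Keywords that the subtask 2 prompt instructs the model to emit per step.
KEYWORDS = {"NEGATIVE_EFFECT", "OBJECTIVE", "VICTIM", "AGENT",
            "FACILITATOR", "CAMPAIGNERS"}

def spans_from_answer(text: str, answer: str):
    """Turn 'KEYWORD: span' lines from a model answer into char-offset annotations."""
    annotations = []
    for line in answer.splitlines():
        key, sep, span = line.partition(":")
        key, span = key.strip(), span.strip()
        if not sep or key not in KEYWORDS or not span:
            continue  # skip explanation lines that carry no keyword
        start = text.find(span)  # locate the first verbatim occurrence
        if start != -1:
            annotations.append({"category": key, "text": span,
                                "chars": (start, start + len(span))})
    return annotations

# Hypothetical source text and model reply.
doc = "The WHO pushed the mandates while independent doctors exposed the plan."
reply = "AGENT: The WHO\nCAMPAIGNERS: independent doctors\nnot a keyword line"
print(spans_from_answer(doc, reply))
```

        <p>Because `str.find` requires a verbatim match, spans that the model paraphrased rather than copied from the text are dropped, which matches the prompt's instruction to answer with the exact part of the text.</p>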
        <p>TEXT: " "</p>
      </sec>
    </sec>
  </body>
</article>