<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HiTZ@Disargue: Few-shot Learning and Argumentation to Detect and Fight Misinformation in Social Media</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rodrigo Agerri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeremy Barnes</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaione Bengoetxea</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Blanca Calvo Figueras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joseba Fernandez de Landa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iker García-Ferrero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olia Toporkov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irune Zubiaga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HiTZ Center - Ixa, University of the Basque Country UPV/EHU</institution>
          ,
          <addr-line>Donostia-San Sebastián</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>DISARGUE opens a new and exciting avenue of research in AI-based explanatory argumentation to fight misinformation. The project will investigate and develop new methods based on automatic argumentation to provide explanations of misinformation detection systems and to generate automatic counterspeech to counteract misinformation in social media. This vision constitutes a disruptive approach with respect to current research: (i) regarding explainability, most previous research has focused on post-hoc or simple flagging methods and, (ii) regarding counter-argumentation to refute misinformation in real time, no previous work has been done in the AI field, although some psychological and communication studies exist. Furthermore, DISARGUE's vision is made possible by the huge leaps in performance in Natural Language Understanding and Generation provided by Transformer-based Large Language Models, for which DISARGUE will investigate new methods to exploit them in few-shot learning settings. Additionally, the project aims to follow recent trends in human-centric AI where humans are in the loop by design. Being aligned with many of the hot topics in AI research (argumentation, few-shot learning, explainability), DISARGUE will benefit from the advances being achieved in those disciplines. Apart from the project description, we also provide an overview of the project's first contributions.</p>
      </abstract>
      <kwd-group>
        <kwd>Argumentation</kwd>
        <kwd>Text Generation</kwd>
        <kwd>Social Media</kwd>
        <kwd>Automated Journalism</kwd>
        <kwd>Media Discourse</kwd>
        <kwd>Online Communication</kwd>
        <kwd>Misinformation</kwd>
        <kwd>Hate Speech</kwd>
        <kwd>Counter Narratives</kwd>
        <kwd>Natural Language Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>SEPLN-CEDI-PD 2024: Seminar of the Spanish Society for Natural
Language Processing: Projects and System Demonstrations, June
19-20, 2024, A Coruña, Spain.
Contact: rodrigo.agerri@ehu.eus (R. Agerri)</p>
      <p>© 2024 Copyright for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <p>For the sake of brevity, we will use the term “misinformation”
to refer to “misbehaviour”, namely, both “misinformation” (spreading
fake news) and “disinformation” (intention to do harm by spreading
fake news). Most of the time we will also refer to hate speech,
another kind of “misbehaviour” in social media.</p>
      <sec id="sec-1-1">
        <p>
          With respect to counteracting misinformation, most of the recommendations regarding the type
of response that may be adequate refer, in one way or the
other, to the fact that an appropriate mitigation strategy
should include an explanation or argument providing
reasons of various possible types (factual, rhetorical, ...) [
          <xref ref-type="bibr" rid="ref1 ref4">4, 1</xref>
          ].
Another important aspect is to adapt to the language of
the message spreading misinformation. The aim of such
explanations would be to convince, or at least to sow doubts in,
the person sharing the message and, perhaps most
importantly, the large number of users reading the
interaction on social media.
        </p>
        <p>Taking these considerations into account,
DISARGUE’s vision is to develop new techniques based on
automatic argumentation to address both aspects of
explainability thereby improving current techniques on
misinformation detection and mitigation. By including
argumentation-based explanations, DISARGUE will
advance the state of the art in misinformation detection and
mitigation by: (i) improving the interpretability of the
predictions given by misinformation detection systems,
(ii) automatically fighting misinformation by providing
high-quality argumentative-based explanations and, (iii)
using automatic natural language argumentation to
provide a more interactive experience for the fact-checker
using AI technology as an assistance. Thus, in the
detection step, argumentation-based explanations would
help domain-experts to better understand the decisions
of the system. After detection, argumentation would
focus on providing appropriate explanatory responses
to counter items suspected of spreading misinformation
thereby mitigating their overall effect on the public.</p>
        <p>Figure 1 depicts the use-case scenario envisioned for
DISARGUE.</p>
      </sec>
      <sec id="sec-1-2">
        <p>
          Steps 1 to 6 in the figure are originally from Augenstein [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], and describe the process that
misinformation detection and mitigation pipelines tend to follow:
claims crawled from social media are examined for
checkworthiness. If deemed worthy, evidence is retrieved and
ranked. Then, a stance detection process assesses
agreement or disagreement with the claim. Finally, claim
verification determines the claim’s truthfulness based on
obtained evidence. The basis of the use-case scenario
envisaged by our project is already implemented in many
professional fact-checking teams, where human fact-checkers
use AI technology as an assistant to detect
misinformation in social media.
        </p>
        <p>However, the next steps illustrated in Figure 1 are where
DISARGUE’s novelty lies: (7) human fact-checkers
request explanatory arguments about the automatic
detection results. The system then (8) provides arguments
based on input and evidence, leading to two outcomes: (9)
acceptance, with the fact-checker flagging input as
misinformation, followed by the (10) optional publication of
an automated response or, if unconvinced, (11) rejection,
with the fact-checker downranking the message, thus
having to repeat the claim verification (6) and
argumentative explanation steps when new evidence becomes
available.</p>
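        <p>The pipeline just described can be sketched as a simple chain of stages. In the following minimal sketch every stage body is a placeholder (the function names and toy outputs are ours for illustration, not the project's actual components):</p>
        <preformat>
```python
# Skeleton of the fact-checking pipeline (steps 1-6) plus DISARGUE's
# explanatory step; every stage body is a placeholder, not a real model.
def check_worthiness(claim):
    return True  # placeholder check-worthiness classifier

def retrieve_evidence(claim):
    return ["evidence snippet"]  # placeholder evidence retrieval and ranking

def stance(claim, evidence):
    return "disagree"  # placeholder stance detection

def verify(claim, stances):
    return "refuted" if "disagree" in stances else "supported"

def explain(claim, evidence, label):
    # Placeholder for the explanatory argument generation of step (8).
    return "Claim judged " + label + " given: " + evidence[0]

def process(claim):
    if not check_worthiness(claim):
        return None
    evidence = retrieve_evidence(claim)
    label = verify(claim, [stance(claim, e) for e in evidence])
    return label, explain(claim, evidence, label)

print(process("5G towers spread COVID-19."))
```
        </preformat>
        <p>In the envisioned system each placeholder would be a learned model, and the returned explanation would be reviewed by a human fact-checker before any response is published.</p>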
        <p>DISARGUE’s vision faces several scientific and
interdisciplinary challenges related to
misinformation and argumentation theory, explainability and
few-shot learning. First, how to leverage and produce
new research from domain-experts (fact-checkers, social
scientists, journalists) to guide the argumentation-based
counteracting of misinformation in social media. Second,
misinformation is spread nowadays in a variety of
modalities (video, audio, images, text), and DISARGUE will face
the challenge of offering explanations of attribution
in a multimodal environment and for different social
media platforms. Third, by the nature of the problem itself and of
the current AI technology, misinformation detection and
mitigation perennially suffers from a lack of annotated
data. Thus, DISARGUE will need to research new
methods of leveraging large pre-trained transformer-based
language models to apply few-shot learning (learning
with few examples from a specific topic or domain) for
multimodal and multilingual misinformation detection,
including the generation of argumentation-based
explanations.</p>
        <p>Although DISARGUE’s vision as explained above may
be applicable to any topic of misbehaviour in social
media, this project will focus on tackling misinformation
about: (i) public health and vaccines, (ii) immigration
and, (iii) climate change, in a number of social media
(Twitter, YouTube, TikTok, etc.) and for Spanish, Catalan,
Basque and English. The choice of topics is based on
their perceived universality and cross-cultural character,
namely, on the fact that misinformation on these three
topics follows a number of common themes independently
of the specific countries, languages and local policies.</p>
      </sec>
      <sec id="sec-1-3">
        <title>2. Related Work</title>
        <p>In this section we review the most relevant previous
work focusing on explainable misinformation detection
and generation for misbehaviour mitigation, as well as
few-shot learning and evaluation challenges in Natural
Language Generation (NLG) tasks.</p>
        <sec>
          <title>2.1. Explainable Misinformation Detection</title>
          <p>
            A commonly accepted trend in Natural Language
Processing (NLP) is to consider fact-checking as a multi-step
automatic process usually performed sequentially, in a
pipeline architecture, as depicted in Figure 1, steps 1-6.
Thus, in the last step, claim verification, misinformation
detection is essentially modelled as a pairwise classification
task where the objective is to infer a label from a
given claim with respect to a piece of evidence or a
predefined topic, in what is usually also known as a Natural
Language Inference (NLI) or Textual Entailment task [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ].
          </p>
          <p>
            Nowadays, as is the case with many NLP tasks, the
large majority of the best performing approaches address
the task by considering only the textual content [
            <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
            ] and, from 2018 onwards, by applying (in one way or
another) large pre-trained language models [8, 9, 10]. This
trend has recently been changing by incorporating user-based
interaction information from social media to improve the
performance of the text-based classifiers [11, 12, 13].
          </p>
          <p>
            In any case, most approaches simply provide a prediction
label, without aiming to provide any explanation
to justify the classifier’s decision. In an effort to make
the decisions of the detection models more transparent,
explainability has been addressed by post-hoc and by
generation methods. Post-hoc methods focus on finding
specific regions of the input that may explain the
predicted label [14], while generation methods aim to
generate a summary of the evidence used to predict the
label in a simplified setting [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ].
          </p>
          <p>DISARGUE will develop unified vector-based representations
for both textual and interaction data with the
aim of providing a common approach to misinformation
detection which exploits not only the text but also any
network-based information characteristic of social media.
Furthermore, it will integrate argument mining and
explanatory argument generation in the decision making,
addressing both positive and negative evidence
supporting the prediction. This would provide domain-experts
with argumentation-based explanations, also using
evidence from external knowledge, to support the decision
taken by the misinformation detection system.</p>
        </sec>
        <sec>
          <title>2.2. Misinformation Mitigation</title>
          <p>Automatic techniques to counteract and mitigate the
effects of misinformation in social media are mostly based
on explicitly flagging a given message as being suspicious
(without any specific explanation to justify the decision).
Other approaches include the chatbot service created by
the WHO and Facebook to combat misinformation
regarding COVID-19. However, while the chatbot allows users
to get factual and accurate information about the
pandemic, it is not a service to counteract misinformation
being spread in social media. Therefore, there is a clear
lack of AI-based automated approaches to mitigate
misinformation by generating appropriate counter-arguments
in real time. The closest to this is the work undertaken
within the HATEMETER project, where they propose
using text generation techniques to generate
counter-narratives to tackle anti-Muslim hate speech. However,
the aim of generating counter-narratives is substantially
different from generating arguments to address
misinformation [15] and it should work under different
domain-experts’ informed guidelines.</p>
          <p>Natural Language Generation (NLG) has become one
of the most important yet challenging tasks in NLP, and
is currently being addressed by the intense development
and release of many Large Language Models (LLMs)
[16, 17, 18]. One of the advantages of these neural
models is that they enable end-to-end learning of
semantic mappings from input to output in text generation.
Encoder-decoder Transformer models such as T5 [19] or
decoder-only Transformer models like Llama 2 or Mistral
[16, 17, 18] are currently the standard architectures
for generating high quality text.</p>
          <p>DISARGUE will provide novel AI technology by
leveraging the latest advances in NLG to automatically
generate counter-arguments guided by Retrieval Augmented
Generation (RAG) [20] with the aim of counteracting the
spread of misinformation in social media. This endeavour
requires multidisciplinary work between domain
experts on misinformation (fact-checkers, journalists,
policy makers, etc.) and AI researchers to generate
arguments that fulfil a number of task-specific objectives
related to fact-checking and reason-checking. In this
sense, legitimate objectives could be to provide
arguments based on factual or rhetorical reasons (assessing the quality
of premises and reasoning in persuasive or explanatory
texts), or simply to alert other users of the social media
platform that a particular message might be spreading
misinformation (while arguing the justification for doing so).</p>
        </sec>
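        <p>As a rough illustration of the retrieval-augmented generation flow discussed above, the following sketch retrieves evidence by simple word overlap (a stand-in for a real dense retriever) and assembles the prompt a generator LLM would complete; the evidence snippets and prompt wording are invented for illustration, not the project's actual components:</p>
        <preformat>
```python
# Toy evidence store; in practice this would be an index of fact-checked sources.
EVIDENCE = [
    "WHO trials found the vaccine safe and effective in adults.",
    "Sea levels have risen about 20 cm since 1900 according to IPCC reports.",
]

def retrieve(claim, docs, k=1):
    """Rank documents by word overlap with the claim (stand-in for a dense retriever)."""
    claim_words = set(claim.lower().split())
    def overlap(doc):
        return len(claim_words.intersection(doc.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def build_counterargument_prompt(claim, evidence):
    """Assemble the RAG prompt that a generator LLM would then complete."""
    context = "\n".join("- " + e for e in evidence)
    return (
        "Evidence:\n" + context + "\n"
        "Claim: " + claim + "\n"
        "Write a polite counter-argument grounded only in the evidence above:"
    )

claim = "The vaccine was never tested on adults."
prompt = build_counterargument_prompt(claim, retrieve(claim, EVIDENCE))
print(prompt)
```
        </preformat>
        <p>Grounding the generator exclusively in retrieved evidence is what distinguishes this setup from free-form counterspeech generation.</p>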
        <sec id="sec-1-3-1">
          <title>2.3. Few-shot Learning</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Methodology and Work Plan</title>
      <p>DISARGUE will focus on two novel models in the
misinformation detection and mitigation pipeline, as depicted
in Figure 1: (i) the Argumentation Model, which provides
arguments based on both the input message and the
evidence available to justify the prediction; (ii) the
Generation model, which focuses on automatically generating
arguments to counteract a perceived misinformation.</p>
      <p>
        The currently available data for misinformation tasks is
highly compartmentalized and topic-specific, meaning
that each topic requires its own data in order to learn 3.1. Work Plan
relevant classifiers for fact-checking. This results in a The Work Plan is structured in six Work Packages of
general lack of data for the misinformation detection task, which four are focused on the scientific contributions of
as many of the available data is also small in size, or has the project.
incompatible labelling schemes [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. WP2: Methodology. The aim is to define, adapt and
      </p>
      <p>Recent work has shown that pre-trained language mod- integrate the modules, resources, data structures, data
els can robustly perform classification tasks in a few-shot formats, and module APIs within the DISARGUE
aror even in zero-shot fashion, when given an adequate chitecture. Additionally, focus will be given to the
detask description in its natural language prompt [16]. Un- velopment of evaluation datasets and corpora to train
like traditional supervised learning, which trains a model argumentation-based explainable AI systems.
to take in an input and predict an output, prompt-based WP3: Explainable Misinformation Detection. The
purlearning is based on exploiting pre-trained language mod- pose of this WP is to work on joint and multitask models
els to solve a task using text directly [9]. Thus, some NLP for explainable misinformation detection beyond
posttasks can be solved in an almost unsupervised fashion hoc explainable methods. Novel approaches to exploit
by providing a pre-trained language model with task de- the full potential of LLMs will be developed, including
scriptions in natural language [19, 21]. Surprisingly, fine- prompting, generation and multimodal training, in order
tuning pre-trained language models on a collection of to make these models usable for the various tasks and
tasks described via instructions (or prompts) substantially languages of DISARGUE with minimal preparation efort,
boosts zero-shot performance on unseen tasks [22, 23]. through zero-shot and, especially, few-shot learning.
WP4: Argument Generation. WP4 focuses on (i) defining
2.4. Evaluation of Generated Text and analyzing counter-argumentative patterns, creating
natural language counter-arguments against detected
misinformation and (ii) improving counter-argument
generation by mining textual arguments from reliable
sources via RAG. In summary, this task aims to prompt
and train generative language models to enhance their
text generation abilities for producing clear and
understandable argumentation.</p>
      <p>WP5: Evaluation of misinformation. WP5 aims to
improve qualitative and quantitative evaluation of text
generation-based tasks such for argument generation.</p>
      <p>More specifically, the objective will be to evaluate: (i)
the efectiveness and quality of the prediction; (ii) the
quality of the generated arguments for explanation and
counter-argumentation, (iii) the efect of the
counterargumentation strategy via user-based evaluation guided
by domain-experts.</p>
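      <p>The zero- and few-shot prompting on which this methodology relies amounts to serializing a handful of labelled demonstrations plus the target instance into a single natural language prompt. A minimal sketch follows; the label set and demonstration examples are invented for illustration:</p>
      <preformat>
```python
# Illustrative demonstrations; labels and examples are invented for this sketch.
FEW_SHOT = [
    ("Vaccines contain microchips.", "refuted"),
    ("The Ebro is a river in Spain.", "supported"),
]

def build_prompt(claim, examples=FEW_SHOT):
    """Serialize demonstrations plus the target claim into one prompt."""
    blocks = ["Decide whether each claim is supported or refuted."]
    for text, label in examples:
        blocks.append("Claim: " + text + "\nLabel: " + label)
    blocks.append("Claim: " + claim + "\nLabel:")  # the LLM completes this line
    return "\n\n".join(blocks)

print(build_prompt("Drinking bleach cures COVID-19."))
```
      </preformat>
      <p>With an instruction-tuned LLM, the model's continuation of the final "Label:" line serves as the prediction, so no task-specific training data beyond the demonstrations is required.</p>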
      <p>NLG tasks such as the one proposed in DISARGUE
present a considerable evaluation challenge. Thus, while
it is possible to evaluate the generated text using the usual
distance-based metrics such as ROUGE, BLEU or BERTScore
[24], other works have proposed to use quality-based
metrics such as Diversity and Novelty to evaluate the
capacity of the model to generate diverse responses and
the ability to generate sequences different from the data
seen during training or fine-tuning [25, 26].</p>
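      <p>For instance, a Diversity-style distinct-n metric can be computed directly as the ratio of unique to total n-grams over a set of generated responses; this is a common formulation, though exact implementations vary across the cited works:</p>
      <preformat>
```python
def distinct_n(texts, n=2):
    """Distinct-n: unique n-grams divided by total n-grams across generated texts."""
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

outputs = ["the claim is false", "the claim is misleading", "the claim is false"]
print(round(distinct_n(outputs, n=2), 3))  # repeated responses lower the score
```
      </preformat>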
      <p>However, a proper evaluation of the explanatory
arguments generated in DISARGUE to explain the label
prediction (in the detection phase) or to counteract
misinformation (in the mitigation phase) requires considering
task-specific issues not taken into account in previous
NLG or argumentation work. This implies evaluating the
quality of the generated counter-arguments with respect to
the supporting evidence found in trusted resources. A
promising new avenue is that represented by JudgeLM,
a scalable language model judge designed for evaluating
LLMs in open-ended scenarios [27].</p>
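      <p>An LLM-as-judge protocol in the spirit of JudgeLM can be sketched as a pairwise scoring prompt plus a parser for the judge's reply; the prompt wording and reply format below are our assumptions for illustration, not JudgeLM's actual interface:</p>
      <preformat>
```python
import re

def build_judge_prompt(question, answer_a, answer_b):
    """Pairwise judging prompt asking for two 1-10 scores (format is our assumption)."""
    return (
        "Question: " + question + "\n"
        "Answer A: " + answer_a + "\n"
        "Answer B: " + answer_b + "\n"
        "Rate both answers for factuality and argument quality, replying "
        "on one line as 'Scores: A=x B=y' with x and y from 1 to 10."
    )

def parse_scores(judge_reply):
    """Extract the two numeric scores from the judge model's reply."""
    match = re.search(r"A=(\d+)\s+B=(\d+)", judge_reply)
    if match is None:
        raise ValueError("unparseable judge reply")
    return int(match.group(1)), int(match.group(2))

# A mocked reply stands in for the actual judge-LLM call.
print(parse_scores("Scores: A=8 B=5"))
```
      </preformat>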
    </sec>
    <sec id="sec-3">
      <title>4. Ongoing Work</title>
      <sec id="sec-3-1">
        <p>There are a number of tasks currently being undertaken within the project. In this section we provide details of the most important ones with respect to the objectives and motivation provided in the introduction.</p>
        <sec id="sec-3-1-1">
          <title>4.1. CONAN-EUS</title>
          <p>CONAN-EUS (https://huggingface.co/datasets/HiTZ/CONAN-EUS)
is a new parallel Basque and Spanish
dataset for CN generation consisting of automatic
translations and professional post-editions of the original
English CONAN. The corpus consists of 6654 machine-translated
HS-CN pairs and 6654 gold-standard human-curated
HS-CN pairs (per language), which makes it a unique
resource to investigate CN generation from a multilingual
and crosslingual perspective. Experimental results show
that CN generation is better when mT5 is fine-tuned on
post-edited training data rather than on the output of
MT. The paper will appear at LREC-COLING 2024 [28].</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <p>Currently, ongoing work has focused on analyzing the
automatic generation of counterarguments in Basque
and Spanish, as well as novel experimentation on critical
question generation and text veracity evaluation via
the development of new benchmarks such as TruthfulQA
for Basque, Catalan and Spanish. Future work includes
further experimentation on argument generation using LLMs
and on the evaluation of the generated text, a crucial
topic to understand the performance of our models.</p>
        <sec id="sec-3-2-1">
          <title>4.2. Automatic Generation of Critical</title>
        </sec>
        <sec id="sec-3-2-2">
          <title>Questions</title>
          <p>Critical questions can be particularly helpful in the
debunking process of misinformation. DISARGUE will
study the automatic generation of these questions by
exploring argumentation schemes, which represent
different types of arguments illustrated through different
premises. In argumentation theory, each argumentation
scheme may be associated with a set of critical questions
[29].</p>
          <p>Based on this theory, we are currently working on
building a model that, given an argument, outputs the
critical questions needed to question the argument.
Additionally, the automatic generation of critical
questions would potentially enhance DISARGUE’s quality of
argumentation-based explainability. The limitations we
are currently facing include: the few and small datasets
annotated with argumentation schemes, mainly in English;
the great number of different argumentation schemes
(over 60, and not a closed set); and the fact that the automated
transformation of the datasets does not result in
particularly natural critical questions.</p>
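          <p>In practice, scheme-based critical question generation can start from a mapping of schemes to question templates instantiated with the argument's slots. The sketch below paraphrases Walton-style critical questions for the argument from expert opinion; the exact wording and slot names are illustrative, not the project's templates:</p>
          <preformat>
```python
# Scheme-to-critical-questions templates, paraphrased from Walton-style
# argumentation schemes; the exact wording is illustrative, not canonical.
SCHEME_CQS = {
    "expert_opinion": [
        "How credible is {expert} as an expert source?",
        "Is {expert} an expert in the field that the claim '{claim}' belongs to?",
        "Is '{claim}' consistent with what other experts assert?",
    ],
}

def critical_questions(scheme, **slots):
    """Instantiate the critical questions of a scheme with the argument's slots."""
    return [template.format(**slots) for template in SCHEME_CQS[scheme]]

for question in critical_questions(
    "expert_opinion", expert="Dr. Smith", claim="the vaccine alters DNA"
):
    print(question)
```
          </preformat>
          <p>A learned model would replace the fixed templates, but the scheme-to-questions mapping illustrates why scheme identification is a prerequisite for natural critical questions.</p>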
        </sec>
        <sec id="sec-3-2-3">
          <title>4.3. Multilingual TruthfulQA</title>
          <p>A popular benchmark to evaluate the truthfulness of
current LLMs is TruthfulQA, which evaluates truthfulness
in English [30]. The dataset consists of question-answer
pairs, each question with both true and false reference
answers. No similar task on truthfulness has been done
before for Basque, Catalan or Spanish, which means that
it is currently not possible to evaluate the truthfulness
of LLMs for those languages. DISARGUE will explore the
truthfulness of monolingual and multilingual LLMs for those
languages and English. The manually translated dataset
and complementary experiments will be released soon.</p>
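          <p>A multiple-choice truthfulness evaluation over such question-answer pairs can be sketched as follows: the model scores every reference answer, and the item counts as correct if the top-scored answer is a true one. The scoring function below is a dummy placeholder for a model log-likelihood, and the example item is invented:</p>
          <preformat>
```python
# One TruthfulQA-style item: a question paired with true and false reference
# answers. The item below is invented for illustration.
ITEM = {
    "question": "Does drinking bleach cure COVID-19?",
    "true_answers": ["No, bleach is toxic and cures nothing."],
    "false_answers": ["Yes, bleach kills the virus in the body."],
}

def score(question, answer):
    """Placeholder for a model's log-likelihood of the answer given the question."""
    return -len(answer)  # dummy scorer so the sketch stays self-contained

def mc1_correct(item):
    """True if the model's top-scored reference answer is one of the true answers."""
    candidates = [(a, True) for a in item["true_answers"]]
    candidates += [(a, False) for a in item["false_answers"]]
    _, is_true = max(candidates, key=lambda c: score(item["question"], c[0]))
    return is_true

print(mc1_correct(ITEM))
```
          </preformat>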
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Concluding Remarks</title>
      <sec id="sec-4-1">
        <title>This paper outlines the DISARGUE project, which focuses on developing novel automatic argumentation techniques to enhance explainability and improve existing methods for detecting and mitigating misinformation.</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <sec id="sec-5-1">
        <p>DISARGUE (TED2021-130810B-C21) is a project funded
by MCIN/AEI/10.13039/501100011033 and by the
European Union NextGenerationEU/PRTR. Iker García-Ferrero
is supported by a doctoral grant from the
Basque Government (PRE_2021_2_0219). Rodrigo Agerri
was also funded by the RYC-2017-23647 fellowship
(MCIN/AEI/10.13039/501100011033 and by ESF Investing
in your future).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>U. K.</given-names>
            <surname>Ecker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>O'Reilly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Reid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. P.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>The effectiveness of short-format refutational fact-checks</article-title>
          ,
          <source>British Journal of Psychology</source>
          <volume>111</volume>
          (
          <year>2020</year>
          )
          <fpage>36</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kouzy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Jaoude</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kraitem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B. E.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Karam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Adib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zarka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Traboulsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. W.</given-names>
            <surname>Akl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Baddour</surname>
          </string-name>
          ,
          <article-title>Coronavirus goes viral: Quantifying the covid-19 misinformation epidemic on twitter</article-title>
          ,
          <source>Cureus</source>
          <volume>12</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Augenstein</surname>
          </string-name>
          ,
          <article-title>Towards explainable fact checking</article-title>
          ,
          <source>arXiv preprint arXiv:2108.10274</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>U. K.</given-names>
            <surname>Ecker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lewandowsky</surname>
          </string-name>
          ,
          <article-title>Reminders and repetition of misinformation: Helping or hindering its retraction?</article-title>
          ,
          <source>Journal of Applied Research in Memory and Cognition</source>
          <volume>6</volume>
          (
          <year>2017</year>
          )
          <fpage>185</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Christodoulopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <article-title>FEVER: a large-scale dataset for fact extraction and VERification</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stent</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers)
          ,
          <year>2018</year>
          , pp.
          <fpage>809</fpage>
          -
          <lpage>819</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Augenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          ,
          <article-title>Stance detection with bidirectional conditional encoding</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Carreras</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>876</fpage>
          -
          <lpage>885</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohammad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kiritchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sobhani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cherry</surname>
          </string-name>
          ,
          <article-title>SemEval-2016 task 6: Detecting stance in tweets</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>