<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HiTZ@Disargue: Few-shot Learning and Argumentation to Detect and Fight Misinformation in Social Media</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rodrigo Agerri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeremy Barnes</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaione Bengoetxea</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Blanca Calvo Figueras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joseba Fernandez de Landa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iker García-Ferrero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olia Toporkov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irune Zubiaga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HiTZ Center - Ixa, University of the Basque Country UPV/EHU</institution>
          ,
          <addr-line>Donostia-San Sebastián</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>DISARGUE opens a new and exciting avenue of research in AI-based explanatory argumentation to fight misinformation. The project will investigate and develop new methods based on automatic argumentation to provide explanations of misinformation detection systems and to generate automatic counterspeech to counteract misinformation in social media. This vision constitutes a disruptive approach with respect to current research: (i) regarding explainability, most previous research has focused on post-hoc or simple flagging methods and, (ii) regarding counter-argumentation to refute misinformation in real time, no previous work has been done in the AI field, although some psychological and communication studies exist. Furthermore, DISARGUE's vision is made possible by the huge leaps in performance in Natural Language Understanding and Generation provided by Transformer-based Large Language Models, for which DISARGUE will investigate new methods to exploit them in few-shot learning settings. Additionally, the project aims to follow recent trends in human-centric AI where humans are in the loop by design. Being aligned with many of the hot topics in AI research (argumentation, few-shot learning, explainability), DISARGUE will benefit from the advances being achieved in those disciplines. Apart from the project description, we also provide an overview of the project's first contributions.</p>
      </abstract>
      <kwd-group>
        <kwd>Argumentation</kwd>
        <kwd>Text Generation</kwd>
        <kwd>Social Media</kwd>
        <kwd>Automated Journalism</kwd>
        <kwd>Media Discourse</kwd>
        <kwd>Online Communication</kwd>
        <kwd>Misinformation</kwd>
        <kwd>Hate Speech</kwd>
        <kwd>Counter Narratives</kwd>
        <kwd>Natural Language Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>SEPLN-CEDI-PD 2024: Seminar of the Spanish Society for Natural
Language Processing: Projects and System Demonstrations, June
19-20, 2024, A Coruña, Spain.
Contact: rodrigo.agerri@ehu.eus (R. Agerri)</p>
      <p>© 2024 Copyright for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <p>For the sake of brevity, we will use the term “misinformation”
to refer to “misbehaviour”, namely, both “misinformation” (spreading
fake news) and “disinformation” (intention to do harm by spreading
fake news). Most of the time we will also refer to hate speech,
another kind of “misbehaviour” in social media.</p>
      <sec id="sec-1-1">
        <p>
          With respect to counteracting misinformation, most of the recommendations regarding the type
of response that may be adequate refer, in one way or the
other, to the fact that an appropriate mitigation strategy
should include an explanation or argument providing
reasons of various possible types (factual, rhetorical, ...) [
          <xref ref-type="bibr" rid="ref1 ref4">4, 1</xref>
          ].
Another important aspect is to adapt to the language of
the message spreading misinformation. The aim of such
explanations would be to convince, or at least to sow doubts in,
the person sharing the message and, perhaps most
importantly, the large number of users reading the
interaction on social media.
        </p>
        <p>Taking these considerations into account,
DISARGUE’s vision is to develop new techniques based on
automatic argumentation to address both aspects of
explainability thereby improving current techniques on
misinformation detection and mitigation. By including
argumentation-based explanations, DISARGUE will
advance the state of the art in misinformation detection and
mitigation by: (i) improving the interpretability of the
predictions given by misinformation detection systems,
(ii) automatically fighting misinformation by providing
high-quality argumentative-based explanations and, (iii)
using automatic natural language argumentation to
provide a more interactive experience for the fact-checker
using AI technology as an assistance. Thus, in the
detection step, argumentation-based explanations would
help domain-experts to better understand the decisions
of the system. After detection, argumentation would
focus on providing appropriate explanatory responses
to counter items suspected of spreading misinformation
thereby mitigating their overall effect on the public.</p>
        <p>Figure 1 depicts the use-case scenario envisioned for
DISARGUE.</p>
      </sec>
      <sec id="sec-1-2">
        <p>
          Steps 1 to 6 in the figure are originally from Augenstein [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], and describe the process that
misinformation detection and mitigation pipelines tend to follow:
claims crawled from social media are examined for
checkworthiness. If deemed worthy, evidence is retrieved and
ranked. Then, a stance detection process assesses
agreement or disagreement with the claim. Finally, claim
verification determines the claim’s truthfulness based on
obtained evidence. The basis of the use-case scenario
envisaged by our project is already implemented in many
professional fact-checking teams, where human fact-checkers
use AI technology as an assistant to detect
misinformation in social media.
        </p>
        <p>However, the next steps illustrated in Figure 1 are where
DISARGUE’s novelty lies: (7) human fact-checkers
request explanatory arguments about the automatic
detection results. The system then (8) provides arguments
based on input and evidence, leading to two outcomes: (9)
acceptance, with the fact-checker flagging input as
misinformation, followed by the (10) optional publication of
an automated response or, if unconvinced, (11) rejection,
with the fact-checker downranking the message, thus
having to repeat the claim verification (6) and
argumentative explanation steps when new evidence becomes
available.</p>
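        <p>The pipeline just described can be sketched as a simple chain of stages. In the following minimal sketch every stage body is a placeholder (the function names and toy outputs are ours for illustration, not the project's actual components):</p>
        <preformat>
```python
# Skeleton of the fact-checking pipeline (steps 1-6) plus DISARGUE's
# explanatory step; every stage body is a placeholder, not a real model.
def check_worthiness(claim):
    return True  # placeholder check-worthiness classifier

def retrieve_evidence(claim):
    return ["evidence snippet"]  # placeholder evidence retrieval and ranking

def stance(claim, evidence):
    return "disagree"  # placeholder stance detection

def verify(claim, stances):
    return "refuted" if "disagree" in stances else "supported"

def explain(claim, evidence, label):
    # Placeholder for the explanatory argument generation of step (8).
    return "Claim judged " + label + " given: " + evidence[0]

def process(claim):
    if not check_worthiness(claim):
        return None
    evidence = retrieve_evidence(claim)
    label = verify(claim, [stance(claim, e) for e in evidence])
    return label, explain(claim, evidence, label)

print(process("5G towers spread COVID-19."))
```
        </preformat>
        <p>In the envisioned system each placeholder would be a learned model, and the returned explanation would be reviewed by a human fact-checker before any response is published.</p>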
        <p>DISARGUE’s vision faces several scientific and
interdisciplinary challenges related to
misinformation and argumentation theory, explainability and
few-shot learning. First, how to leverage and produce
new research from domain-experts (fact-checkers, social
scientists, journalists) to guide the argumentation-based
counteracting of misinformation in social media. Second,
misinformation is spread nowadays in a variety of
modalities (video, audio, images, text), and DISARGUE will face
the challenge of offering explanations of attribution
in a multimodal environment and for different social
media platforms. Third, by the nature of the problem itself and of
the current AI technology, misinformation detection and
mitigation perennially suffers from a lack of annotated
data. Thus, DISARGUE will need to research new
methods of leveraging large pre-trained transformer-based
language models to apply few-shot learning (learning
with few examples from a specific topic or domain) for
multimodal and multilingual misinformation detection,
including the generation of argumentation-based
explanations.</p>
        <p>Although DISARGUE’s vision as explained above may
be applicable to any topic of misbehaviour in social
media, this project will focus on tackling misinformation
about: (i) public health and vaccines, (ii) immigration
and, (iii) climate change, in a number of social media
(Twitter, YouTube, TikTok, etc.) and for Spanish, Catalan,
Basque and English. The choice of topics is based on
their perceived universality and cross-cultural character,
namely, on the fact that misinformation on these three
topics follows a number of common themes independently
of the specific countries, languages and local policies.</p>
      </sec>
      <sec id="sec-1-3">
        <title>2. Related Work</title>
        <p>In this section we review the most relevant previous
work focusing on explainable misinformation detection
and generation for misbehaviour mitigation, as well as
few-shot learning and evaluation challenges in Natural
Language Generation (NLG) tasks.</p>
        <sec>
          <title>2.1. Explainable Misinformation Detection</title>
          <p>
            A commonly accepted trend in Natural Language
Processing (NLP) is to consider fact-checking as a multi-step
automatic process usually performed sequentially, in a
pipeline architecture, as depicted in Figure 1, steps 1-6.
Thus, in the last step, claim verification, misinformation
detection is essentially modelled as a pairwise classification
task where the objective is to infer a label from a
given claim with respect to a piece of evidence or a
predefined topic, in what is usually also known as a Natural
Language Inference (NLI) or Textual Entailment task [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ].
          </p>
          <p>
            Nowadays, as is the case with many NLP tasks, the
large majority of the best performing approaches address
the task by considering only the textual content [
            <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
            ] and, from 2018 onwards, by applying (in one way or
another) large pre-trained language models [8, 9, 10]. This
trend has recently been changing by incorporating user-based
interaction information from social media to improve the
performance of the text-based classifiers [11, 12, 13].
          </p>
          <p>
            In any case, most approaches simply provide a prediction
label, without aiming to provide any explanation
to justify the classifier’s decision. In an effort to make
the decisions of the detection models more transparent,
explainability has been addressed by post-hoc and by
generation methods. Post-hoc methods focus on finding
specific regions of the input that may explain the
predicted label [14], while generation methods aim to
generate a summary of the evidence used to predict the
label in a simplified setting [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ].
          </p>
          <p>DISARGUE will develop unified vector-based representations
for both textual and interaction data with the
aim of providing a common approach to misinformation
detection which exploits not only the text but also any
network-based information characteristic of social media.
Furthermore, it will integrate argument mining and
explanatory argument generation in the decision making,
addressing both positive and negative evidence
supporting the prediction. This would provide domain-experts
with argumentation-based explanations, also using
evidence from external knowledge, to support the decision
taken by the misinformation detection system.</p>
        </sec>
        <sec>
          <title>2.2. Misinformation Mitigation</title>
          <p>Automatic techniques to counteract and mitigate the
effects of misinformation in social media are mostly based
on explicitly flagging a given message as being suspicious
(without any specific explanation to justify the decision).
Other approaches include the chatbot service created by
the WHO and Facebook to combat misinformation
regarding COVID-19. However, while the chatbot allows users
to get factual and accurate information about the
pandemic, it is not a service to counteract misinformation
being spread in social media. Therefore, there is a clear
lack of AI-based automated approaches to mitigate
misinformation by generating appropriate counter-arguments
in real time. The closest to this is the work undertaken
within the HATEMETER project, where they propose
using text generation techniques to generate
counter-narratives to tackle anti-Muslim hate speech. However,
the aim of generating counter-narratives is substantially
different from generating arguments to address
misinformation [15] and it should work under different
domain-experts’ informed guidelines.</p>
          <p>Natural Language Generation (NLG) has become one
of the most important yet challenging tasks in NLP, and
is currently being addressed by the intense development
and release of many Large Language Models (LLMs)
[16, 17, 18]. One of the advantages of these neural
models is that they enable end-to-end learning of
semantic mappings from input to output in text generation.
Encoder-decoder Transformer models such as T5 [19] or
decoder-only Transformer models like Llama 2 or Mistral
[16, 17, 18] are currently the standard architectures
for generating high quality text.</p>
          <p>DISARGUE will provide novel AI technology by
leveraging the latest advances in NLG to automatically
generate counter-arguments guided by Retrieval Augmented
Generation (RAG) [20] with the aim of counteracting the
spread of misinformation in social media. This endeavour
requires multidisciplinary work between domain
experts on misinformation (fact-checkers, journalists,
policy makers, etc.) and AI researchers to generate
arguments that fulfil a number of task-specific objectives
related to fact-checking and reason-checking. In this
sense, legitimate objectives could be to provide
arguments based on factual or rhetorical reasons (assessing the quality
of premises and reasoning in persuasive or explanatory
texts), or simply to alert other users of the social media
platform that a particular message might be spreading
misinformation (while arguing the justification for doing so).</p>
        </sec>
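        <p>As a rough illustration of the retrieval-augmented generation flow discussed above, the following sketch retrieves evidence by simple word overlap (a stand-in for a real dense retriever) and assembles the prompt a generator LLM would complete; the evidence snippets and prompt wording are invented for illustration, not the project's actual components:</p>
        <preformat>
```python
# Toy evidence store; in practice this would be an index of fact-checked sources.
EVIDENCE = [
    "WHO trials found the vaccine safe and effective in adults.",
    "Sea levels have risen about 20 cm since 1900 according to IPCC reports.",
]

def retrieve(claim, docs, k=1):
    """Rank documents by word overlap with the claim (stand-in for a dense retriever)."""
    claim_words = set(claim.lower().split())
    def overlap(doc):
        return len(claim_words.intersection(doc.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def build_counterargument_prompt(claim, evidence):
    """Assemble the RAG prompt that a generator LLM would then complete."""
    context = "\n".join("- " + e for e in evidence)
    return (
        "Evidence:\n" + context + "\n"
        "Claim: " + claim + "\n"
        "Write a polite counter-argument grounded only in the evidence above:"
    )

claim = "The vaccine was never tested on adults."
prompt = build_counterargument_prompt(claim, retrieve(claim, EVIDENCE))
print(prompt)
```
        </preformat>
        <p>Grounding the generator exclusively in retrieved evidence is what distinguishes this setup from free-form counterspeech generation.</p>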
        <sec id="sec-1-3-1">
          <title>2.3. Few-shot Learning</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Methodology and Work Plan</title>
      <p>DISARGUE will focus on two novel models in the
misinformation detection and mitigation pipeline, as depicted
in Figure 1: (i) the Argumentation Model, which provides
arguments based on both the input message and the
evidence available to justify the prediction; (ii) the
Generation model, which focuses on automatically generating
arguments to counteract a perceived misinformation.</p>
      <p>
        The currently available data for misinformation tasks is
highly compartmentalized and topic-specific, meaning
that each topic requires its own data in order to learn 3.1. Work Plan
relevant classifiers for fact-checking. This results in a The Work Plan is structured in six Work Packages of
general lack of data for the misinformation detection task, which four are focused on the scientific contributions of
as many of the available data is also small in size, or has the project.
incompatible labelling schemes [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. WP2: Methodology. The aim is to define, adapt and
      </p>
      <p>Recent work has shown that pre-trained language mod- integrate the modules, resources, data structures, data
els can robustly perform classification tasks in a few-shot formats, and module APIs within the DISARGUE
aror even in zero-shot fashion, when given an adequate chitecture. Additionally, focus will be given to the
detask description in its natural language prompt [16]. Un- velopment of evaluation datasets and corpora to train
like traditional supervised learning, which trains a model argumentation-based explainable AI systems.
to take in an input and predict an output, prompt-based WP3: Explainable Misinformation Detection. The
purlearning is based on exploiting pre-trained language mod- pose of this WP is to work on joint and multitask models
els to solve a task using text directly [9]. Thus, some NLP for explainable misinformation detection beyond
posttasks can be solved in an almost unsupervised fashion hoc explainable methods. Novel approaches to exploit
by providing a pre-trained language model with task de- the full potential of LLMs will be developed, including
scriptions in natural language [19, 21]. Surprisingly, fine- prompting, generation and multimodal training, in order
tuning pre-trained language models on a collection of to make these models usable for the various tasks and
tasks described via instructions (or prompts) substantially languages of DISARGUE with minimal preparation efort,
boosts zero-shot performance on unseen tasks [22, 23]. through zero-shot and, especially, few-shot learning.
WP4: Argument Generation. WP4 focuses on (i) defining
2.4. Evaluation of Generated Text and analyzing counter-argumentative patterns, creating
natural language counter-arguments against detected
misinformation and (ii) improving counter-argument
generation by mining textual arguments from reliable
sources via RAG. In summary, this task aims to prompt
and train generative language models to enhance their
text generation abilities for producing clear and
understandable argumentation.</p>
      <p>WP5: Evaluation of misinformation. WP5 aims to
improve qualitative and quantitative evaluation of text
generation-based tasks such for argument generation.</p>
      <p>More specifically, the objective will be to evaluate: (i)
the efectiveness and quality of the prediction; (ii) the
quality of the generated arguments for explanation and
counter-argumentation, (iii) the efect of the
counterargumentation strategy via user-based evaluation guided
by domain-experts.</p>
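      <p>The zero- and few-shot prompting on which this methodology relies amounts to serializing a handful of labelled demonstrations plus the target instance into a single natural language prompt. A minimal sketch follows; the label set and demonstration examples are invented for illustration:</p>
      <preformat>
```python
# Illustrative demonstrations; labels and examples are invented for this sketch.
FEW_SHOT = [
    ("Vaccines contain microchips.", "refuted"),
    ("The Ebro is a river in Spain.", "supported"),
]

def build_prompt(claim, examples=FEW_SHOT):
    """Serialize demonstrations plus the target claim into one prompt."""
    blocks = ["Decide whether each claim is supported or refuted."]
    for text, label in examples:
        blocks.append("Claim: " + text + "\nLabel: " + label)
    blocks.append("Claim: " + claim + "\nLabel:")  # the LLM completes this line
    return "\n\n".join(blocks)

print(build_prompt("Drinking bleach cures COVID-19."))
```
      </preformat>
      <p>With an instruction-tuned LLM, the model's continuation of the final "Label:" line serves as the prediction, so no task-specific training data beyond the demonstrations is required.</p>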
      <p>NLG tasks such as the one proposed in DISARGUE
present a considerable evaluation challenge. Thus, while
it is possible to evaluate the generated text using the usual
distance-based metrics such as ROUGE, BLEU or BERTScore
[24], other works have proposed to use quality-based
metrics such as Diversity and Novelty to evaluate the
capacity of the model to generate diverse responses and
the ability to generate sequences different from the data
seen during training or fine-tuning [25, 26].</p>
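      <p>For instance, a Diversity-style distinct-n metric can be computed directly as the ratio of unique to total n-grams over a set of generated responses; this is a common formulation, though exact implementations vary across the cited works:</p>
      <preformat>
```python
def distinct_n(texts, n=2):
    """Distinct-n: unique n-grams divided by total n-grams across generated texts."""
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

outputs = ["the claim is false", "the claim is misleading", "the claim is false"]
print(round(distinct_n(outputs, n=2), 3))  # repeated responses lower the score
```
      </preformat>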
      <p>However, a proper evaluation of the explanatory
arguments generated in DISARGUE to explain the label
prediction (in the detection phase) or to counteract
misinformation (in the mitigation phase) requires considering
task-specific issues not taken into account in previous
NLG or argumentation work. This implies evaluating the
quality of the generated counter-arguments with respect to
the supporting evidence found in trusted resources. A
promising new avenue is that represented by JudgeLM,
a scalable language model judge designed for evaluating
LLMs in open-ended scenarios [27].</p>
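      <p>An LLM-as-judge protocol in the spirit of JudgeLM can be sketched as a pairwise scoring prompt plus a parser for the judge's reply; the prompt wording and reply format below are our assumptions for illustration, not JudgeLM's actual interface:</p>
      <preformat>
```python
import re

def build_judge_prompt(question, answer_a, answer_b):
    """Pairwise judging prompt asking for two 1-10 scores (format is our assumption)."""
    return (
        "Question: " + question + "\n"
        "Answer A: " + answer_a + "\n"
        "Answer B: " + answer_b + "\n"
        "Rate both answers for factuality and argument quality, replying "
        "on one line as 'Scores: A=x B=y' with x and y from 1 to 10."
    )

def parse_scores(judge_reply):
    """Extract the two numeric scores from the judge model's reply."""
    match = re.search(r"A=(\d+)\s+B=(\d+)", judge_reply)
    if match is None:
        raise ValueError("unparseable judge reply")
    return int(match.group(1)), int(match.group(2))

# A mocked reply stands in for the actual judge-LLM call.
print(parse_scores("Scores: A=8 B=5"))
```
      </preformat>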
    </sec>
    <sec id="sec-3">
      <title>4. Ongoing Work</title>
      <sec id="sec-3-1">
        <p>There are a number of tasks currently being undertaken within the project. In this section we provide details of the most important ones with respect to the objectives and motivation provided in the introduction.</p>
        <sec id="sec-3-1-1">
          <title>4.1. CONAN-EUS</title>
          <p>CONAN-EUS (https://huggingface.co/datasets/HiTZ/CONAN-EUS)
is a new parallel Basque and Spanish
dataset for CN generation consisting of automatic
translations and professional post-editions of the original
English CONAN. The corpus consists of 6654 machine-translated
HS-CN pairs and 6654 gold-standard human-curated
HS-CN pairs (per language), which makes it a unique
resource to investigate CN generation from a multilingual
and crosslingual perspective. Experimental results show
that CN generation is better when mT5 is fine-tuned on
post-edited training data rather than on the output of
MT. The paper will appear at LREC-COLING 2024 [28].</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <p>Currently, ongoing work has focused on analyzing the
automatic generation of counterarguments in Basque
and Spanish, as well as novel experimentation on critical
question generation and text veracity evaluation via
the development of new benchmarks such as TruthfulQA
for Basque, Catalan and Spanish. Future work includes
further experimentation on argument generation using LLMs
and on the evaluation of the generated text, a crucial
topic to understand the performance of our models.</p>
        <sec id="sec-3-2-1">
          <title>4.2. Automatic Generation of Critical</title>
        </sec>
        <sec id="sec-3-2-2">
          <title>Questions</title>
          <p>Critical questions can be particularly helpful in the
debunking process of misinformation. DISARGUE will
study the automatic generation of these questions by
exploring argumentation schemes, which represent
different types of arguments illustrated through different
premises. In argumentation theory, each argumentation
scheme may be associated with a set of critical questions
[29].</p>
          <p>Based on this theory, we are currently working on
building a model that, given an argument, outputs the
critical questions needed to question the argument.
Additionally, the automatic generation of critical
questions would potentially enhance DISARGUE’s quality of
argumentation-based explainability. The limitations we
are currently facing include: the few and small datasets
annotated with argumentation schemes, mainly in English;
the great number of different argumentation schemes
(over 60, and not a closed set); and the fact that the automated
transformation of the datasets does not result in
particularly natural critical questions.</p>
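          <p>In practice, scheme-based critical question generation can start from a mapping of schemes to question templates instantiated with the argument's slots. The sketch below paraphrases Walton-style critical questions for the argument from expert opinion; the exact wording and slot names are illustrative, not the project's templates:</p>
          <preformat>
```python
# Scheme-to-critical-questions templates, paraphrased from Walton-style
# argumentation schemes; the exact wording is illustrative, not canonical.
SCHEME_CQS = {
    "expert_opinion": [
        "How credible is {expert} as an expert source?",
        "Is {expert} an expert in the field that the claim '{claim}' belongs to?",
        "Is '{claim}' consistent with what other experts assert?",
    ],
}

def critical_questions(scheme, **slots):
    """Instantiate the critical questions of a scheme with the argument's slots."""
    return [template.format(**slots) for template in SCHEME_CQS[scheme]]

for question in critical_questions(
    "expert_opinion", expert="Dr. Smith", claim="the vaccine alters DNA"
):
    print(question)
```
          </preformat>
          <p>A learned model would replace the fixed templates, but the scheme-to-questions mapping illustrates why scheme identification is a prerequisite for natural critical questions.</p>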
        </sec>
        <sec id="sec-3-2-3">
          <title>4.3. Multilingual TruthfulQA</title>
          <p>A popular benchmark to evaluate the truthfulness of
current LLMs is TruthfulQA, which evaluates truthfulness
in English [30]. The dataset consists of question-answer
pairs, each question with both true and false reference
answers. No similar task on truthfulness has been done
before for Basque, Catalan or Spanish, which means that
it is currently not possible to evaluate the truthfulness
of LLMs for those languages. DISARGUE will explore the
truthfulness of monolingual and multilingual LLMs for those
languages and English. The manually translated dataset
and complementary experiments will be released soon.</p>
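          <p>A multiple-choice truthfulness evaluation over such question-answer pairs can be sketched as follows: the model scores every reference answer, and the item counts as correct if the top-scored answer is a true one. The scoring function below is a dummy placeholder for a model log-likelihood, and the example item is invented:</p>
          <preformat>
```python
# One TruthfulQA-style item: a question paired with true and false reference
# answers. The item below is invented for illustration.
ITEM = {
    "question": "Does drinking bleach cure COVID-19?",
    "true_answers": ["No, bleach is toxic and cures nothing."],
    "false_answers": ["Yes, bleach kills the virus in the body."],
}

def score(question, answer):
    """Placeholder for a model's log-likelihood of the answer given the question."""
    return -len(answer)  # dummy scorer so the sketch stays self-contained

def mc1_correct(item):
    """True if the model's top-scored reference answer is one of the true answers."""
    candidates = [(a, True) for a in item["true_answers"]]
    candidates += [(a, False) for a in item["false_answers"]]
    _, is_true = max(candidates, key=lambda c: score(item["question"], c[0]))
    return is_true

print(mc1_correct(ITEM))
```
          </preformat>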
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Concluding Remarks</title>
      <sec id="sec-4-1">
        <title>This paper outlines the DISARGUE project, which focuses on developing novel automatic argumentation techniques to enhance explainability and improve existing methods for detecting and mitigating misinformation.</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <sec id="sec-5-1">
        <p>DISARGUE (TED2021-130810B-C21) is a project funded
by MCIN/AEI/10.13039/501100011033 and by the
European Union NextGenerationEU/PRTR. Iker García-Ferrero
is supported by a doctoral grant from the
Basque Government (PRE_2021_2_0219). Rodrigo Agerri
was also funded by the RYC-2017-23647 fellowship
(MCIN/AEI/10.13039/501100011033 and by ESF Investing
in your future).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>U. K.</given-names>
            <surname>Ecker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>O'Reilly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Reid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. P.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>The effectiveness of short-format refutational fact-checks</article-title>
          ,
          <source>British Journal of Psychology</source>
          <volume>111</volume>
          (
          <year>2020</year>
          )
          <fpage>36</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kouzy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Jaoude</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kraitem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B. E.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Karam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Adib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zarka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Traboulsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. W.</given-names>
            <surname>Akl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Baddour</surname>
          </string-name>
          ,
          <article-title>Coronavirus goes viral: Quantifying the covid-19 misinformation epidemic on twitter</article-title>
          ,
          <source>Cureus</source>
          <volume>12</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Augenstein</surname>
          </string-name>
          ,
          <article-title>Towards explainable fact checking</article-title>
          ,
          <source>arXiv preprint arXiv:2108.10274</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>U. K.</given-names>
            <surname>Ecker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lewandowsky</surname>
          </string-name>
          ,
          <article-title>Reminders and repetition of misinformation: Helping or hindering its retraction?</article-title>
          ,
          <source>Journal of Applied Research in Memory and Cognition</source>
          <volume>6</volume>
          (
          <year>2017</year>
          )
          <fpage>185</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Christodoulopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <article-title>FEVER: a large-scale dataset for fact extraction and VERification</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stent</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers)
          ,
          <year>2018</year>
          , pp.
          <fpage>809</fpage>
          -
          <lpage>819</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Augenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          ,
          <article-title>Stance detection with bidirectional conditional encoding</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Carreras</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>876</fpage>
          -
          <lpage>885</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohammad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kiritchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sobhani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cherry</surname>
          </string-name>
          ,
          <article-title>SemEval-2016 task 6: Detecting stance in tweets</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>