To Click it or not to Click it: An Italian Dataset for Neutralising Clickbait Headlines

Daniel Russo1,2,∗, Oscar Araque3 and Marco Guerini2
1 University of Trento, Trento, Italy
2 Fondazione Bruno Kessler, Trento, Italy
3 Universidad Politécnica de Madrid, Madrid, Spain

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
∗ Corresponding author.
Email: drusso@fbk.eu (D. Russo); o.araque@upm.es (O. Araque); guerini@fbk.eu (M. Guerini)
ORCID: 0009-0006-9123-5316 (D. Russo); 0000-0003-3224-0001 (O. Araque); 0000-0003-1582-6617 (M. Guerini)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Abstract
Clickbait is a common technique aimed at attracting a reader’s attention, although it can result in inaccuracies and lead to misinformation. This work explores the role of current Natural Language Processing methods in reducing its negative impact. To do so, a novel Italian dataset is created, containing manual annotations for classification, spoiling, and neutralisation of clickbait. In addition, several experimental evaluations are performed to assess the performance of current language models. On the one hand, we evaluate clickbait detection in a multilingual setting, showing that augmenting the data with English instances largely improves overall performance. On the other hand, the generation tasks of clickbait spoiling and neutralisation are explored. The latter is a novel task, designed to increase the informativeness of a headline, thus removing the information gap. This work opens a research avenue that has been largely uncharted in the Italian language.

Keywords
clickbait, natural language processing, natural language generation, large language model, language resource

1. Introduction

Accuracy and truthfulness are essential characteristics of journalism. Nevertheless, in an effort to improve revenue, a large number of newspapers and magazines publish clickbait articles, a viral journalism strategy that seeks to attract users to click on a link to a page through tactics such as sensationalist stories and catchy headlines that act as bait. The use of these tactics harms the quality of news pieces and thus hinders the ability of citizens to obtain reliable and objective information. The literature distinguishes between two main types of clickbait. (i) Classical clickbait [1] embeds information gaps, also known as curiosity gaps [2, 3], within the headline in order to arouse the reader's curiosity; the reader is thus forced to access the article’s content, which is ultimately disappointing. Classical clickbait usually makes use of hyperbolic language, caps lock, demonstrative pronouns and superlatives to grasp the user’s attention [1, 4, 5]. (ii) Deceptive clickbait [5] refers to headlines that resemble traditional media headlines by offering a summary of the article, yet lead to content that differs from the reader’s expectations. These headlines promise high news value but deliver content with low news value, resulting in reader disappointment.

Although clickbait headlines are considered one of the less harmful forms of fake news, as their main goal is to increase profit by driving traffic to their website [6, 7], they can sometimes pose a danger, especially when they deal with potentially harmful topics such as health and science. To address this problem, Natural Language Processing techniques have been widely employed to detect clickbait headlines, with a particular focus on the English language [8, 9]. Hagen et al. [10] proposed the clickbait spoiling task, i.e., the generation of a short text that satisfies the curiosity induced by a clickbait post.

In light of this, this work addresses the issue of clickbait in the Italian language, studying its characteristics and the possibilities of current technology to reduce its negative impact. In doing so, we have created a novel Italian dataset that gathers a large collection of clickbait articles, which is made public for the community to use (the dataset is available at https://github.com/oaraque/ClickBaIT). We named the dataset ClickBaIT. It contains instances manually annotated as clickbait/non-clickbait, as well as manually produced spoilers and neutralised headlines. We have also performed a thorough multilingual evaluation, exploiting the availability of English data to complement our dataset in the task of clickbait detection. Finally, this work also explores the use of our annotated dataset and large language models to automatically generate both spoilers and, as a novel task, a neutralised version of clickbait headlines. A graphical illustration of the experimental design is presented in Figure 1.

[Figure 1 (diagram). Panels: 1. Clickbait detection; 2. Spoiler generation (with question rewriting); 3. Clickbait neutralisation. Running example: headline “Una malattia che colpisce 500mila persone”; rewritten question “Quale malattia colpisce 500mila persone?”; spoiler “La psoriasi”; neutralised headline “La psoriasi: una malattia che colpisce circa 500mila persone in Italia”.]

Figure 1: The experimental design, encompassing three tasks: clickbait detection, spoiler generation, and clickbait neutralisation. The robot icon represents the language model used for either classification or generation. We used DistilBERT and Llama3-8B for task 1, and LLaMAntino-3-8B for tasks 2 and 3. The models were tested for generative tasks using zero-shot, few-shot, and fine-tuning configurations, except for question rewriting, for which we employed a few-shot approach.

2. Related Work

The use of clickbait is common in many news outlets, and thus it has been extensively studied.

Several works address clickbait detection: Potthast et al. [8] collected a corpus of clickbait articles, posted by well-known English-speaking newspapers on Twitter, and proposed a set of lexical and semantic features to be used with a Random Forest classifier. Following the general trend in the Natural Language Processing (NLP) field, clickbait detection has also been explored using deep learning methods, such as convolutional [11] and recurrent [12] neural networks, as well as more recent Transformer-based approaches [9].

Other works leveraged Natural Language Generation (NLG) strategies to create a piece of text, the spoiler, comprising the information needed to fill the curiosity gap present in clickbait headlines. This task was proposed by Fröbe et al. [13] under the name of spoiling generation. The authors created the Webis Clickbait Spoiling Corpus 2022 and cast spoiler generation as a Question Answering task. They subsequently opened the challenge to the community through a SemEval-2023 shared task [13, 14]. The best-performing spoiler generator uses five independent sequence-to-sequence generative models and selects the best spoiler through a majority vote, determined by comparing edit distances among the outputs [15].

Regarding the languages studied, the majority of works are based on English. Other works were performed in Chinese [16], Turkish [17, 18] and Spanish [19, 20]. To the best of our knowledge, this is the first work that fully addresses the study of clickbait detection and spoiling in the Italian language. Moreover, we propose a novel task, i.e., clickbait neutralisation, which aims at filling the curiosity gap by rewriting the headline, leveraging the information of the spoiler.

3. Dataset

3.1. Dataset Creation

Data were collected from fourteen news websites (Essere Informati, TGNewsItalia, Voxnews, DirettaNews, Informati, Italia, Jeda News, News Cronaca, TG5Stelle, TG24-ore, ByoBlu, Mag24, WorldNotix, lo sapevi che, Fortementein), notorious for acting as news aggregators, engaging in plagiarism, lacking fact-checking, and using sensational headlines to draw in readers. On all the websites, articles are labelled according to specific categories; we decided to focus on four macro-categories: health, science, economy, and environment. These categories have been selected to cover some of the most frequent, and potentially hazardous, domains where clickbait is usually found. Since the categories varied considerably from website to website, we manually mapped each category into one of the four macro-categories under analysis. Two annotators, knowledgeable in the area, were then provided with the headlines and the related articles and were asked to label whether a headline was clickbait. To aid in this task, we used as reference the clickbait measure as computed by Arthur et al. [21].

Finally, given the clickbait dataset, the two annotators were required to extract the gold spoilers from the article’s text and to produce the neutralised form of each headline. To this end, we employed an author-reviewer strategy [22]: an LLM (ChatGPT gpt-3.5-turbo-0125, https://chat.openai.com) was used to generate both the spoilers and the neutralised forms (author component; the prompt employed is provided in Appendix A.3), and the native Italian-speaking annotators were asked to manually post-edit the generations (reviewer component; details in Appendix A.2). This procedure has been shown to be more effective and less time-consuming than writing the data from scratch [23]. To assess the amount of post-editing required, we employed the Human-targeted Translation Edit Rate [HTER; 24]. HTER quantifies the minimum edit distance, i.e., the least number of editing operations needed, between a machine-generated text and its post-edited counterpart. HTER values exceeding 0.4 indicate low-quality outputs; under such circumstances, rewriting the text from scratch or extensive post-editing would require comparable effort [25].

The HTER obtained for spoiler generation (0.4) is higher than that computed for neutralisation (0.3); the two values are, respectively, on par with and slightly below the 0.4 threshold. The high HTER values, especially for the spoiler annotation, can be attributed to the model’s tendency to generate spoilers comprising more details than those necessary to fill the curiosity gap. While in some cases a simple deletion was sufficient, in others the annotator had to rewrite the spoiler almost completely. Regarding the annotation of the neutralisation texts, the relatively high values are a consequence of the spoiler generation, as the model was required to generate both outputs simultaneously.

With this, we have generated the golden set of the dataset, in which all the instances were manually annotated. Further details regarding the dataset creation can be found in Appendix A. To expand this set, we have used a clickbait classifier (see Sect. 4.1) to automatically detect clickbait headlines. This new set of automatically annotated data constitutes the silver set of our dataset. Several examples of dataset entries are provided in Table 1.

Table 1
An excerpt of the presented dataset showing the most relevant fields. Article bodies are shortened for space reasons. Translated text can be found in Table 9 (Appendix B).

Entry 1
  Category: Health
  Headline: Frutto o fiore? gustosissima e attraente, una celebrità sulle nostre tavole, sveliamo chi è
  Article: Tutti la conosciamo, immancabile sulle nostre tavole, celebre in tutto il mondo ma misteriosa la sua natura, frutto da gustare o fiore...
  Clickbait: True
  Spoiler: La fragola
  Neutralised title: Fragola: gustosissima e attraente, una celebrità sulle nostre tavole

Entry 2
  Category: Science
  Headline: Scoperto un metallo che si auto-ripara. Scienziati sbalorditi
  Article: Il recente esperimento ha rivelato un fenomeno straordinario...
  Clickbait: True
  Spoiler: Il platino
  Neutralised title: Il metallo che si auto-ripara: il platino

Entry 3
  Category: Health
  Headline: Una malattia che colpisce 500mila persone
  Article: Parliamo di una malattia sistemica cronica mediata dal sistema immunitario che interessa...
  Clickbait: True
  Spoiler: La psoriasi colpisce circa 500 mila persone
  Neutralised title: La psoriasi: una malattia che colpisce circa 500mila persone in Italia

Entry 4
  Category: Environment
  Headline: Zanzare, ecco come eliminarle senza insetticidi
  Article: Con l’arrivo del caldo, anche le zanzare si fanno largo nelle nostre case o nei nostri giardini...
  Clickbait: True
  Spoiler: Per eliminare una volta per tutte le zanzare dalla vostra casa, dovreste acquistare un pipistrello
  Neutralised title: Zanzare, ecco come eliminarle senza insetticidi: basta acquistare un pipistrello

3.2. Dataset Analysis

The complete ClickBaIT dataset consists of 4,114 entries. Each entry includes the following fields: (i) source website, which specifies the source of the article; (ii) publication date, as captured from the original source; (iii) headline text; (iv) article text; (v) original URL; (vi) macro category, inferred from the original category extracted from the source; (vii) image URL associated with the article, as specified in the source; (viii) clickbait annotation; (ix) the associated spoiler; and (x) the neutralised version of the title.

Table 2 shows the main statistics of the final version of the dataset. The golden set is manually annotated and thus contains high-quality information. The silver set has been annotated automatically, as described above, and therefore contains a larger number of instances.

Table 2
Size of the presented dataset, considering both golden and silver sets.

Set      Clickbait (%)   Non-clickbait (%)   Total
Golden   698 (53%)       629 (47%)           1,327
Silver   1,563 (56%)     1,224 (44%)         2,787
Total    2,261           1,853               4,114

To gain a deeper understanding of the content of the dataset we have used Variationist [26], a tool for inspecting useful statistics and patterns in textual data. Upon inspection of the data, we have detected several patterns frequently used for generating the curiosity gap. One of the most common strategies used in clickbait headlines is the formulation of a question that is later answered in the article, even though sometimes it is not. In the instance “Quanto è green il gas?” (How green is gas?) the article explains that gas is not considered green. Another frequent strategy is an introduction to the content of the article that invites the reader to click: “Beve un cucchiaio di aceto di mele nell’acqua tutti i giorni, ecco cosa succede” (Drinks a tablespoon of apple cider vinegar in water every day, this is what happens).

Another usual pattern is the reference to enumerations, frequently using round and manageable numbers such as 10, 8, and 5. This can be done to introduce numbered content, as in “Le 10 fantasie femminili più segrete” (The 10 most secret female fantasies), or even to provoke a reaction in the reader: “Hai solo 10 secondi per salvarti. Ecco cosa devi fare:” (You only have 10 seconds to save yourself. Here’s what you have to do:). Other means can be used to make headlines noticeable, such as introducing text in all caps, using striking vocabulary or even punctuation marks, as in “[ALLARME] Truffa AUTO USATE, fate attenzione!” ([ALERT] USED CAR scam, beware!).

See Table 8 (Appendix A.2) for a collection of patterns that have been considered during the manual annotation of the dataset. In addition, Appendix B includes a graphical summary of the dataset, while its interactive version can be accessed online (https://oaraque.github.io/ClickBaIT/clickbait.html). Details are provided in Appendix C.

4. Experimental Design

The experimental design comprises three steps: clickbait detection, spoiler generation and clickbait neutralisation.

4.1. Clickbait Detection

This is the first and most basic task aimed at addressing the clickbait phenomenon. To explore the effect of using additional data in the training process, we use Webis-Clickbait-17 [27], an English clickbait dataset that is also annotated in a binary fashion.

Following the insights by Araque et al. [28], we use training on English data to improve the classification of Italian data. The main idea is to harness the availability of large amounts of English data, generating a compound dataset with a smaller number of Italian instances. To do so, a multilingual mixture dataset is created so that 35% of the final dataset comprises Italian instances, while the rest are in English.

We model the detection challenge as a binary classification task: clickbait/non-clickbait. To study the complexity of the task, we explore two different models for classification: (i) a DistilBERT [29] model (distilbert-base-multilingual-cased, https://huggingface.co/distilbert-base-multilingual-cased) trained in a multilingual setting, and (ii) the Llama3-8B language model (meta-llama/Meta-Llama-3-8B, https://huggingface.co/meta-llama/Meta-Llama-3-8B). The composed dataset has been split into train and test splits, which have been used to fine-tune and evaluate these models, respectively.

To assess the effect of using a mixture of both English and Italian instances in the dataset, we evaluate the performance of the two models in a monolingual setting (e.g., fine-tuning in Italian and predicting in the same language) as well as in the multilingual variant (e.g., fine-tuning on English and Italian text, and predicting on Italian instances).

4.2. Spoiler Generation

The spoiler generation task consists in generating a short message that fills the curiosity gap present in a given clickbait title, by extracting the information from the linked article. To this end, we tested LLaMAntino-3-ANITA-8B-Inst-DPO-ITA (LLaMAntino-3-8B hereafter) [30] on our clickbait dataset. The model was tested both in in-context learning (zero- and few-shot) and fine-tuning settings.

Building on prior research that frames spoiler generation as a Question Answering task [31], we prompt the model to rewrite clickbait headlines as questions and extract the corresponding answers, i.e., the spoilers, from the linked articles.

4.3. Clickbait Neutralisation

The best-performing configuration was employed for the neutralisation of the clickbait headlines. To this end, we instructed the LLM to perform a style transfer task, from a clickbait headline style to a more journalistic one, while integrating the spoiler information into the original headline.

5. Results and Discussion

5.1. Evaluation Metrics

For the evaluation of the clickbait detection task we use the macro-averaged precision, recall and f-score. This allows us to assess the performance even in an unbalanced scenario. For the generation tasks, we assessed lexical similarity through the ROUGE score [32], as well as semantic similarity. For the latter, text embeddings computed using sentence-bert-base-italian-xxl-uncased (https://huggingface.co/nickprock/sentence-bert-base-italian-xxl-uncased) were compared using cosine similarity.

5.2. Clickbait detection

Table 4 shows the results of the evaluation in the task of clickbait classification. As expected, introducing data instances in English improves the performance in Italian. In the case of classification in Italian, we see a marked improvement of 8.43 macro-F1 points for the Llama3 model. This further supports previous results [28]. We argue that augmenting the training set with instances in a different language is an effective strategy that can be generalised to other tasks.

We also see that the best model for the classification of clickbait is the one obtained with Llama3, trained with both English and Italian data. Hence, we use this model to predict on the silver set of our dataset.

Table 4
Results for Clickbait detection. The ‘Test’ and ‘Train’ columns indicate the languages of the test and train sets, respectively.

Test   Train   Model        Prec.   Rec.    M-F1
EN     EN      DistilBERT   67.15   70.34   66.94
EN     EN      Llama3       68.42   66.46   67.18
EN     EN+IT   DistilBERT   70.28   70.14   70.12
EN     EN+IT   Llama3       71.20   71.15   71.15
IT     IT      DistilBERT   68.85   70.47   68.65
IT     IT      Llama3       66.96   67.19   67.07
IT     EN+IT   DistilBERT   72.87   74.85   71.77
IT     EN+IT   Llama3       76.32   75.51   75.50

5.3. Spoiler Generation Results

Results for the spoiler generation task are reported in Table 3. We evaluated the capabilities of LLaMAntino-3-8B in both in-context learning scenarios (zero- and few-shot) and through fine-tuning. As inputs, we used clickbait headlines and questions generated by ChatGPT, instructing the model to execute a Question Answering task for the latter.

Table 3
LLaMAntino-3-8B results for the spoiler generation task. We report ROUGE 1 and L (R1, RL) and semantic similarity (SemSim).

             zero-shot               few-shot                fine-tuning
Input        R1      RL      SemSim  R1      RL      SemSim  R1      RL      SemSim
headlines    0.189   0.157   0.567   0.250   0.221   0.667   0.260   0.234   0.659
questions    0.271   0.249   0.645   0.286   0.258   0.630   0.250   0.224   0.646

When using headlines as input, few-shot and fine-tuning approaches outperform zero-shot methods. Few-shot approaches demonstrate higher performance in terms of semantic similarity, while fine-tuning exhibits stronger lexical adherence to the source document, as reflected in ROUGE scores. This can be attributed to the few examples provided in the few-shot approach, which make the model aware of the task while allowing more creative outputs (resulting in lower ROUGE scores). Conversely, the fine-tuned model learned from the training data to adhere more closely to the source article, which comes at the expense of producing semantically richer responses (evidenced by lower SemSim scores).

Interestingly, casting spoiler generation as a question-answering task yields higher results in the zero-shot setting compared to using headlines as input. However, the results for few-shot and fine-tuning scenarios tend to be on par. This can be explained by the fact that headlines may contain multiple gaps that the human-annotated dataset accounted for, but that the non-supervised “question generation” module could not fully capture. Generally, this approach leads to sufficiently good results; however, we believe that more attention should be given to the quality of the questions, either through more effective prompts or with human-generated/curated data.

5.4. Clickbait Neutralisation Results

In Table 5, we report the results for clickbait neutralisation. For this task, we prompted LLaMAntino-3-8B with a few-shot approach, employing the spoilers generated with the three configurations of the previous experiments (headlines as input). Using spoilers generated with the fine-tuned model leads to higher results for both lexical and semantic metrics. Interestingly, scores tend to increase with the complexity of the configuration used to produce the input spoilers (from zero-shot to few-shot to fine-tuning). In Table 6 we report examples of headlines along with their generated spoilers (through the fine-tuned model) and their neutralisation.

Table 5
Neutralisation generation results. Automatically generated spoilers from the previous experiments were used as input for the few-shot generation of the data. We report ROUGE 1 and L (R1 and RL) and the semantic similarity scores.

input data    R1      RL      SemSim
zero-shot     0.250   0.212   0.675
few-shot      0.265   0.223   0.706
fine-tuning   0.286   0.247   0.715

Table 6
Examples of clickbait headlines, along with the automatically generated spoiler and neutralised version.

Example 1
  Headline: “Juventus in Serie B”: perché c’è panico tra i tifosi, la scoperta delle ultime ore
  Spoiler: 15 punti di penalizzazione
  Neutralisation: Juventus in grave difficoltà: 15 punti di penalizzazione e il rischio di cadere in Serie B

Example 2
  Headline: Lutto tremendo nello sport italiano, morto giovanissimo dopo un malore
  Spoiler: “Samuel Dilas era un giocatore di pallacanestro che militava nel Virtus Lumezzane a Brescia, in Serie B” e “aveva 24 anni” e “era alto 206 centimetri” e “nato a Novellara (Reggio Emilia)” e “aveva un padre di nome Torsen, una madre di nome Chiara e una sorella minore di nome Maia” e “era in convalescenza dopo una polmonite” e “era arrivato alla Virtus Lumezzane nella scorsa stagione”.
  Neutralisation: Tragico decesso del pallacanestrista Samuel Dilas, 24 anni, ex convalescente da polmonite e giocatore della Virtus Lumezzane

Example 3
  Headline: Un papà si rifiuta di mangiare accanto a un bambino Down di 5 anni, il cameriere decide di fare questo
  Spoiler: Il cameriere ha sbattuto fuori il maleducato padre che voleva essere spostato a causa della presenza di un bambino con sindrome di Down.
  Neutralisation: Un cameriere espelle un cliente maleducato che chiede di essere spostato per non sedersi accanto a un bambino con sindrome di Down.

Example 4
  Headline: E’ doloroso e si forma tra le dita dei piedi, ecco come rimuoverlo
  Spoiler: “L’occhio di pernice è causato principalmente dalla pressione della scarpa che favorisce la formazione di un’ispessimento di pelle che provoca dolore, in quanto è soggetto all’attrito tra le dita. Per rimuovere l’occhio di pernice è fondamentale ammorbidire prima la zona interessata per poi provare a rimuovere l’ispessimento utilizzando rimedi naturali senza dolore e in modo semplice.”
  Neutralisation: Come rimuovere l’occhio di pernice, un problema di pressione e attrito causato dalle scarpe

Example 5
  Headline: La chiamano “LA BOMBA” la miscela che in sole 24-48 ore elimina influenza, raffreddore e tosse
  Spoiler: Lo zenzero è un rimedio naturale per il trattamento di tosse, raffreddore e influenza. La miscela limone, zenzero e miele è ideale per alleviare i sintomi delle comuni malattie. Basta prendere 2 o 3 cucchiai della miscela naturale, riempire una tazza con acqua calda e lasciare in infusione per 3 o 4 minuti.
  Neutralisation: Miscela naturale di limone, zenzero e miele allevia i sintomi di tosse, raffreddore e influenza in pochi giorni.

6. Conclusion

This work presents ClickBaIT, a novel Italian dataset for clickbait modelling, as well as a diverse set of experiments to assess the effectiveness of current models for clickbait detection, spoiling and neutralisation. The dataset includes news articles that have been manually annotated to indicate the presence of clickbait, spoilers associated with clickbait headlines, and their respective neutral headlines.

The experiments explore the effectiveness of current NLP methods for the modelling of clickbait headlines in Italian through ClickBaIT. The evaluation for clickbait detection shows how training data can be augmented in a multilingual setting, which leads to classification improvements that are in line with previous research [28]. The generation experiments, for both spoiling and neutralisation, show that the evaluated model does benefit from in-domain knowledge extracted from the proposed dataset. As seen, these informed generations are more accurate and align better with the golden text.

Considering the effect of clickbait, we argue that while some articles are initially harmless, lack of accuracy can have a detrimental effect on readers. This is clear when considering certain sensitive domains such as health. Thus, we hope that this work facilitates future research on the topic, for example by addressing the link between clickbait and misinformation, considering both in a unified framework.

Acknowledgments

This work was partly supported by: the AI4TRUST project - AI-based-technologies for trustworthy solutions against disinformation (ID: 101070190), the European Union’s CERV fund under grant agreement No. 101143249 (HATEDEMICS), the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101135437 (AI-CODE). Oscar Araque acknowledges the support of the project UNICO I+D Cloud - AMOR, financed by the Ministry of Economic Affairs and Digital Transformation, and the European Union through Next Generation EU; as well as the support of the project CPP2023-010437 financed by the MCIN / AEI / 10.13039/501100011033 / FEDER, UE.

References

[1] K. Scott, You won’t believe what’s in this paper! clickbait, relevance and the curiosity gap, Journal of Pragmatics 175 (2021) 53–66. URL: https://www.sciencedirect.com/science/article/pii/S0378216621000229. doi:10.1016/j.pragma.2020.12.023.
[2] J. N. Blom, K. R. Hansen, Click bait: Forward-reference as lure in online news headlines, Journal of Pragmatics 76 (2015) 87–100. URL: https://www.sciencedirect.com/science/article/pii/S0378216614002410. doi:10.1016/j.pragma.2014.11.010.
[3] G. Loewenstein, The psychology of curiosity: A review and reinterpretation, Psychological Bulletin 116 (1994) 75–98. doi:10.1037/0033-2909.116.1.75.
[4] K. Scott, R. Jackson, When everything stands out, nothing does, Relevance theory, figuration, and continuity in pragmatics 8 (2020) 167–192.
[5] K. Scott, “deceptive” clickbait headlines: Relevance, intentions, and lies, Journal of Pragmatics 218 (2023) 71–82. URL: https://www.sciencedirect.com/science/article/pii/S0378216623002643. doi:10.1016/j.pragma.2023.10.004.
[6] S. Zannettou, M. Sirivianos, J. Blackburn, N. Kourtellis, The web of false information: Rumors, fake news, hoaxes, clickbait, and various other shenanigans, J. Data and Information Quality 11 (2019). URL: https://doi.org/10.1145/3309699. doi:10.1145/3309699.
[7] E. Aïmeur, S. Amri, G. Brassard, Fake news, disinformation and misinformation in social media: a review, Social Network Analysis and Mining 13 (2023) 30.
[8] M. Potthast, S. Köpsel, B. Stein, M. Hagen, Clickbait detection, in: Advances in Information Retrieval: 38th European Conference on IR Research, ECIR 2016, Padua, Italy, March 20–23, 2016. Proceedings 38, Springer, 2016, pp. 810–817.
[9] P. Rajapaksha, R. Farahbakhsh, N. Crespi, Bert, xlnet or roberta: The best transfer learning model to detect clickbaits, IEEE Access 9 (2021) 154704–154716. doi:10.1109/ACCESS.2021.3128742.
[10] M. Hagen, M. Fröbe, A. Jurk, M. Potthast, Clickbait spoiling via question answering and passage retrieval, in: S. Muresan, P. Nakov, A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 7025–7036. URL: https://aclanthology.org/2022.acl-long.484. doi:10.18653/v1/2022.acl-long.484.
[11] A. Agrawal, Clickbait detection using deep learning, in: 2016 2nd International Conference on Next Generation Computing Technologies (NGCT), 2016, pp. 268–272. doi:10.1109/NGCT.2016.7877426.
[12] S. Kaur, P. Kumar, P. Kumaraguru, Detecting clickbaits using two-phase hybrid cnn-lstm biterm model, Expert Systems with Applications 151 (2020) 113350. URL: https://www.sciencedirect.com/science/article/pii/S0957417420301755. doi:10.1016/j.eswa.2020.113350.
[13] M. Fröbe, B. Stein, T. Gollub, M. Hagen, M. Potthast, SemEval-2023 task 5: Clickbait spoiling, in: A. K. Ojha, A. S. Doğruöz, G. Da San Martino, H. Tayyar Madabushi, R. Kumar, E. Sartori (Eds.), Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 2275–2286. URL: https://aclanthology.org/2023.semeval-1.312. doi:10.18653/v1/2023.semeval-1.312.
[14] A. K. Ojha, A. S. Doğruöz, G. Da San Martino, H. Tayyar Madabushi, R. Kumar, E. Sartori (Eds.), Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Association for Computational Linguistics, Toronto, Canada, 2023. URL: https://aclanthology.org/2023.semeval-1.0.
[15] H. Kurita, I. Ito, H. Funayama, S. Sasaki, S. Moriya, Y. Mengyu, K. Kokuta, R. Hatakeyama, S. Sone, K. Inui, TohokuNLP at SemEval-2023 task 5: Clickbait spoiling via simple Seq2Seq generation and ensembling, in: A. K. Ojha, A. S. Doğruöz, G. Da San Martino, H. Tayyar Madabushi, R. Kumar, E. Sartori (Eds.), Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 1756–1762. URL: https://aclanthology.org/2023.semeval-1.243. doi:10.18653/v1/2023.semeval-1.243.
[16] T. Liu, K. Yu, L. Wang, X. Zhang, H. Zhou, X. Wu, Clickbait detection on wechat: A deep model integrating semantic and syntactic information, Knowledge-Based Systems 245 (2022) 108605. URL: https://www.sciencedirect.com/science/article/pii/S0950705122002714. doi:10.1016/j.knosys.2022.108605.
[17] Şura Genç, E. Surer, Clickbaittr: Dataset for clickbait detection from turkish news sites and social media with a comparative analysis via machine learning algorithms, Journal of Information Science 49 (2023) 480–499. doi:10.1177/01655515211007746.
[18] A. Geçkil, A. A. Müngen, E. Gündogan, M. Kaya, A clickbait detection method on news sites, in: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2018, pp. 932–937. doi:10.1109/ASONAM.2018.8508452.
[19] C. Oliva, I. Palacio-Marín, L. F. Lago-Fernández, D. Arroyo, Rumor and clickbait detection by combining information divergence measures and deep learning techniques, in: Proceedings of the 17th International Conference on Availability, Reliability and Security, ARES ’22, Association for Computing Machinery, New York, NY, USA, 2022. URL: https://doi.org/10.1145/3538969.3543791. doi:10.1145/3538969.3543791.
[20] I. García-Ferrero, B. Altuna, Noticia: A clickbait article summarization dataset in spanish, arXiv preprint arXiv:2404.07611 (2024).
[21] T. E. C. L. Arthur, A. T. Cignarella, S. Frenda, M. Lai, M. A. Stranisci, A. Urbinati, et al., Debunker assistant: a support for detecting online misinformation, in: Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023), volume 3596, Federico Boschetti, Gianluca E. Lebani, Bernardo Magnini, Nicole Novielli, 2023, pp. 1–5.
[22] S. S. Tekiroğlu, Y.-L. Chung, M. Guerini, Generating counter narratives against online hate speech: Data and strategies, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 1177–1190. URL: https://aclanthology.org/2020.acl-main.110. doi:10.18653/v1/2020.acl-main.110.
[23] D. Russo, S. Kaszefski-Yaschuk, J. Staiano, M. Guerini, Countering misinformation via emotional response generation, in: H. Bouamor, J. Pino, K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Singapore, 2023, pp. 11476–11492. URL: https://aclanthology.org/2023.emnlp-main.703. doi:10.18653/v1/2023.emnlp-main.703.
[24] M. Snover, B. Dorr, R. Schwartz, L. Micciulla, J. Makhoul, A study of translation edit rate with targeted human annotation, in: Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, Association for Machine Translation in the Americas, Cambridge, Massachusetts, USA, 2006, pp. 223–231. URL: https://aclanthology.org/2006.amta-papers.25.
[25] M. Turchi, M. Negri, M. Federico, Coping with the subjectivity of human judgements in MT quality estimation, in: Proceedings of the Eighth Workshop on Statistical Machine Translation, Association for Computational Linguistics, Sofia, Bulgaria, 2013, pp. 240–251. URL: https://aclanthology.org/W13-2231.
[26] A. Ramponi, C. Casula, S. [...] Exploring multifaceted variation and bias in written language data, arXiv preprint arxiv:2406.17647 (2024). URL: https://arxiv.org/abs/2406.17647.
[27] M. Potthast, T. Gollub, K. Komlossy, S. Schuster, M. Wiegmann, E. Garces Fernandez, M. Hagen, B. Stein, Crowdsourcing a Large Corpus of Clickbait on Twitter, in: E. Bender, L. Derczynski, P. Isabelle (Eds.), 27th International Conference on Computational Linguistics (COLING 2018), Association for Computational Linguistics, 2018, pp. 1498–1507. URL: https://aclanthology.org/C18-1127/.
[28] O. Araque, M. F. L. Corniel, K. Kalimeri, Towards a multilingual system for vaccine hesitancy using a data mixture approach, in: Proceedings of the 9th Italian Conference on Computational Linguistics, 2023.
[29] V. Sanh, L. Debut, J. Chaumond, T. Wolf, Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter, arXiv preprint arXiv:1910.01108 (2019).
[30] M. Polignano, P. Basile, G. Semeraro, Advanced natural-based interaction for the italian language: Llamantino-3-anita, 2024. arXiv:2405.07101.
[31] M. Woźny, M. Lango, Generating clickbait spoilers with an ensemble of large language models, arXiv preprint arXiv:2405.16284 (2024).
[32] C.-Y. Lin, Rouge: A package for automatic evaluation of summaries, in: Text summarization branches out, 2004, pp. 74–81.
Menini, Variationist: scienza insetti, animali, AI, scienza, smartphone, Spazio, tecnologia, TECNOLOGIE, SCIENZA, ufo, biochimica, eclissi, bomba atomica, terra piatta, idroelettrico, temperatura, coltivazione, robot, fisica quantistica, macchie solari, ricerca, vulcano, titanio, universo, fotovoltaico, intelligenza, iPhone, hacker, microonde, motori di ricerca, onde elettromagnetiche, tecnologia, sole, scienza, radioterapia, pesticidi, armi chimiche, comete, case farmaceutiche, psichiatria, smartphone, formiche, elettrodomestici, solare, macrobiologi, mondo, lampadine a basso consumo, tecnologia, scienze-e-tech, scienza, scienza, innovazione, scienza, tecnologia-2, animali intelligenti, funzione cognitiva, microchip, cani, samsung, wi fi, tecnologia-e-tv, SCIENZE, TECNOLOGIA, bioetica, biologia, fisica, covid, coronavirus salute Salute, CORONAVIRUS, VAIOLO SCIMMIE, TUBERCOLOSI, SALUTE, SCABBIA, AIDS, salute, hiv, cocaina, antidepressivi, veleni, infezioni, carne, tabacco, infibulazione, fluoro, alcool, alimentari, aids, antibatterico, dieta, insetticida, cibo, benessere, farmaci, digitopressione, caffè, sigarette, ministero della salute, autismo, limoni, cure naturali, paracetamolo, cancro, antiossidante, droga, olio, medicina alternativa, fragole, vegetariano, eroina, dislessia, veleno, zenzero, virus, psicologia, biologico, magne- sio, frutta, psicofarmaci, pollo al cloro, fiori di bach, medico, sonno, birra, vitamina e, ulivi, proteine, stress, banana, pensieri negativi, tumori, benzodiazepine, latte, miele, cuore, epilessia, longevità, mari- juana, diabete, sale, ibernazione, vecchiaia, fegato, vegan, prevenzione, dentifricio, cervello, sistema immunitario, sodio, suicidio, rimedi naturali, maltempo, canapa, pillola, mal di gola, depressione, psiche, alimentazione, ebola, aspartame, dentifricio senza fluoro, tiroide, mangiare, cure proibite, Alzheimer, smog, gas, malattie, calamità, mammografia, verdura, aloe, masticazione, farmaco, igiene, batteri, medicina, 
vitamina c, epatite c, forfora, energia, vaccini, ormoni, flora batterica, sorbitolo, antibiotici, piedi, obesità, arsenico, cortisolo, chemioterapia, contraccezione, Neurotrasmettitori, semi, melograno, celiachia, Coca cola, salute-benessere, salute, salute-e-benessere, bellezza, dimagrante, benessere, salute-benessere, rimedi-naturali, pianeta-mamma, grano antico, acqua ossigenata, alimet- nazione, ansia, dentisti, curcuma, casa-e-cucina, hobby-e-sport, SPORT, crescita-consapevolezza, la-salute-che-viene, sport, stile-di-vita, consigli, lifestyle, pomodori ambiente Cambiamenti climatici, energia, energia elettrica, Natura, AMBIENTE, ECOLOGIA, global warming, geoingegneria, alberi, pianeta terra, natura, inquinamento, mare, terra, manipolazione climatica, clima, rinnovabili, Dissesto idrogeologico, ecologia, ambiente, green, ambiente-attuale, ecologia, salute-benessere, natura, ambiente, METEO, tempesta solare, astronomia, acido economia affari-online, economia, ECONOMIA, consumi-risparmi, microchip r-fid, bollo auto, tasso d’interesse, finanza, bollette, banche, profitto, spese, economia-finanza, economia, economia, economia-dellanima, fisco-e-tasse, economia, economia, economia, economia-e-finanza Table 7 Split of the categories into the four macro-categories. A. Dataset Creation Details notators received both a score indicating how much the headline was clickbait and automatic ChatGPT gpt-3.5- A.1. Category Assessment turbo-0125 generated suggestions for the spoilers and the neutralized versions of the headlines. Below, we have In Table 7 we report how the heterogeneous categories outlined the annotation guidelines that the annotators scraped directly from the misleading websites were di- were to follow. vided into the four macro-category of scienza (science), salute (well-being), ambiente (environment), economia (economy). 
Clickbait labelling In order to select the clickbait headlines present in the scraped data, the annotators were provided with specific guidelines. Table 8 provides A.2. Annotation Guidelines the main key points taken into consideration in order to Three components of our datasets were subject to human label the data. intervention to: (i) determine if the headline was click- bait, (ii) identify the related article’s spoiler, that is, the Spoiler post-editing For the post-editing of the information required to satisfy the curiosity gap within spoiler the annotator was required to spot in the headline the headline, and (iii) revise the headline to include the the information gap and to check if the generated spoiler spoiler information, thereby neutralizing it. During all was providing that information checking the related ar- three annotation stages, we employed a machine-human ticle. If the model failed to find the proper spoiler, the collaboration to expedite the work of annotators. The an- annotator had to rewrite it sticking as much as possible to Characteristic Original example (IT) Translated example (EN) Lack of essential information, “Ora riposa in pace”. Calcio in lutto, morto “Now rest in peace”. Football in mourning, one i.e., the subject the article is talk- uno dei grandi protagonisti dell’Italia of Italy’s great protagonists dead ing about Sensationalist tone Fan ubriaca le salta addosso sul palco. La Drunk fan jumps on her on stage. Her reaction sua reazione è incredibile e sconvolge tutti i is incredible and shocks everyone present presenti Questions raised but answered Tratti della nostra colonna: quali sono? Come Traits of our column: what are they? How to in the article body evitare lesioni? avoid injuries? Enumeration of elements 10 cibi per sbarazzarsi del gonfiore di stomaco 10 foods to get rid of bloated stomach and e pancia tummy Use of capitalization INFARTO: sopravvivere quando si è soli. Hai HEART ATTACK: surviving when alone. 
You solo 10 secondi per salvarti. Ecco cosa devi only have 10 seconds to save yourself. Here’s fare: what you have to do: Introduction of the content Zanzare, ecco come eliminarle senza insetti- Mosquitoes, this is how to eliminate them with- without actually giving the in- cidi out insecticide formation Use of quotations that do not Omicron, Ilaria Capua: “Ecco perché i vacci- Omicron, Ilaria Capua: “This is why the vacci- give information nati si infettano di più rispetto a prima” nated get more infected than before” Table 8 Key points used for the annotation of the dataset. Please note that some instances can exemplify more than one point. the document’s text. If the spoiler was correct but added The clickbait headline typically omits key extra info, the annotator had to keep those extra informa- information to create a curiosity gap for tion only if those were essential for having a complete the reader. Your task is to extract this headline. If the spoiler was correct, then the annotator missing information, known as a “spoiler,” could leave it as it was. from the article’s text. The spoiler can be a single keyword, a short text passage, or a Neutralised Clickbait Post-Editing The annotator list of keywords. Once you have identified was required to check if the neutralised forms comprises the spoiler, rewrite the clickbait headline both the headline and the spoiler information. If the by incorporating this information to elim- spoiler was very long (e.g., long listing), then the anno- inate the curiosity gap. The output must tator had to summarise the spoiler as much as possible be in JSON format and written in Italian. aiming to embed in the final novel headline enough in- The JSON should include two entries: one formation to reduce or remove the information gap. 
If called “spoiler” that contains the extracted the model failed at addressing the spoiler information in spoiler(s), and another called “new_head- the neutralised version of the headline, then the anno- line” that has the revised headline. tator had to manually add it. Moreover, the annotator Example Input: was required to remove sensationalist tones as much as Clickbait headline: “Questo attore ha possible, if this tone was still creating useless curiosity fatto qualcosa di incredibile sul set di un in the reader. famoso film!” Article: “Durante le riprese del film ‘Il Gladiatore’, l’attore Russell A.3. Author Component Instruction Crowe ha deciso di fare un gesto di grande Hereafter, we provide the instruction employed to au- generosità donando una parte significa- tomatically generate spoilers and the neutralised ver- tiva del suo stipendio al fondo per i mem- sions of the clickbait headlines through ChaGPT gpt- bri della troupe.” 3.5-turbo-0125 . Example Output: {“spoiler”: “Russell Crowe ha donato I have a clickbait headline and its corre- una parte significativa del suo stipen- sponding article, both written in Italian. 
Search the chart Top Non-Clickbait Characteristic Non-Clickbait Frequency sinner lutto e la Frequent perina grazie tumore i di con del l’ coronavirus covid il italia ha gli le question contro infarto coronavirus un’nel che all’ dei lo anche vaccino contro alla su cosa ma italia sintomi sinner anche rischio dell’ lutto italiani d’ grazie ospedale come si mondo covid morto cancro dopo più ecco all’ scoperta così ’ sarà dove ora ci sarà scoperto in lutto nella tumore ai sul è morto euro alle tutti primo l' vaccino addio alla campionessa nuova 3 tra | virus giorno fa italiano sui muore sui storia consigli suo arriva vita mi può delle sport % vialli governo cibi governo sta solo allerta moglie perché contro il dall’ europa gianluca italiana nuovo tumori italiani segnali nei campionessa prima funziona 10 questo nel capitato potrebbe sempre questa cura c’ notizia scienza studiocos’ oggi acqua primo allerta mihajlovic casa cos’ è fanno foto tumore al tutto quello che mai benefici salute Average figli è un’ il suo quale sulla c’ è farmaci muore quello Top Clickbait uccide olio mamma vi quali malore 2naturale tempo allarme ecco tre giorni dieta attenzione malore l’ annuncio essere molto prezzo ogni ecco cosa salvarti nell’ scoperta ecco come fare opinioni e farmacia question malattia dal quali trova la moglie in farmacia sognare modo figlio farmacia sintomi bene video in casa choc sintomi succede poi quando capelli dolore ne infarto corpo fare dovete polmoni bere cellule parla la sua caso 4 il campione sua benefici cosa significa hanno ed casa morto mangiamo due al giorno prima di significa 5 coming fatto dai chi cosa malattia naturali sono le scoperto previene nello mai morte sapere incidente quali sono dovete uno le donne i sintomi donne quando causemese avere sotto choc cui hai se previene famiglia trucco incredibile sono i davvero uomo da non soli ti mangiamo rimedi 7 terribile di sognare 8 sotto più succede è mai si è rimedi la malattia mai capitato a cui auto è soli 
quali sono sottovalutare devi fare vi è capitato questo pulire Infrequent trucchi salvarti ecco perché segnali ti cibi devi ecco cosa se terribile 6 pulire alimenti Clickbait Frequency Infrequent Average Frequent Non-Clickbait document count: 481; word count: 5,642 Clickbait document count: 846; word count: 10,647 Figure 2: Frequency of words for both clickbait and non-clickbait categories. On the right, most frequent words for each class, and both (Characteristic). An interactive version of the graph can be accessed at the following link https://oaraque.github.io/clickIT/clickbait.html dio al fondo per i membri della troupe”, B.2. Dataset Excerpt Translation “new_headline”: “Russell Crowe ha fatto Table 9 includes the English translations for the Italian qualcosa di incredibile sul set di ‘Il Gladi- examples presented in Table 1. atore’: ha donato una parte significativa del suo stipendio al fondo per i membri della troupe”} C. Experimental Design Details Please ensure the output is formatted in JSON as specified and that all content is C.1. Question Generation in Italian. Questions were generated with ChatGPT gpt-3.5- Now do it for the following headline. turbo-0125 using the following prompt: Clickbait headline: “{headline}” You will be provided with a clickbait head- Article:“{article}” line written in Italian. Your task is to gen- erate a question that addresses any miss- ing or vague information in the headline. B. Additional Dataset Details Here are some examples: B.1. Dataset Visualisation Headline: Si chiama la benedizione di Dio: rimuove l’alta pressione, il diabete e il Figure 2 shows a frequency-based visualization of the grasso nel sangue Question: Che cosa dataset. It considers the frequency of appearance of rel- viene chiamato ’benedizione di Dio’? evant uni and bi-grams for both the clickbait and non- Headline: “Emorragia cerebrale”. Italia in clickbait categories. 
The figure shows common strategies apprensione per il suo campione: ricover- that are frequent in clickbait content, such as the use of ato in condizioni gravissime “ecco cosa” (this is what) or “quali sono” (what are) that can be seen in the lower right part. Question: Chi è il campione? Please generate the question in Italian, en- suring it seeks to clarify the ambiguous or incomplete details present in the headline. Category Headline Article Clickbait Spoiler Neutralised title Health Fruit or flower? Tasty We all know it, inevitable True The strawberry Strawberry: tasty and at- and attractive, a celebrity on our tables, world- tractive, a celebrity on on our tables, we reveal famous, but mysterious is our tables who she is its nature, fruit to enjoy or flower to decorate? Science Self-repairing metal dis- The recent experiment re- True Platinum The metal that repairs it- covered. Scientists as- vealed an extraordinary self: platinum tounded phenomenon... Health A disease that affects We are talking about True Psoriasis affects about Psoriasis: a disease that 500,000 people a chronic immune- 500,000 people affects about 500,000 peo- mediated systemic ple in Italy disease that affects about 1.8 million patients... Environment Mosquitoes, here’s how With the arrival of hot True To eliminate mosquitoes Mosquitoes, here’s how to get rid of them with- weather, mosquitoes also from your home once and to get rid of them with- out insecticides make their way into our for all, you should buy a out insecticides: just buy homes or gardens... bat a bat Table 9 Translated from the original Italian. An excerpt of the presented dataset showing the most relevant fields. Article bodies are shortened for space reasons. C.2. Spoiler Generation C.3. 
Fine-Tuning Details For the zero-shot spoiler generation task we employed The LLaMAntino-3-8B [30] model underwent training the following prompt: on a single Ampere A40 GPU with 48GB of memory, employing the QLoRA strategy with a low-rank approxi- Ti verranno forniti un titolo clickbait e il mation of 64, a low-rank adaptation of 16, and a dropout suo articolo corrispondente. Il titolo click- rate of 0.1. It was set to evaluate every 50 steps, with a bait di solito omette, o non esplicita, in- batch size of 4, across 3 epochs, using a learning rate of formazioni chiave per creare curiosità nel 10−4 . lettore. Estrai dall’articolo le informazioni In the clickbait detection experiments, the DistilBERT mancanti o vaghe nel titolo che servono and Llama3-8b models have been fine-tuned on the same per colmare questa curiosità. La risposta GPU. The DistilBERT model has been trained on 10 può essere un messaggio estremamente epochs with a learning rate of 2 ⋅ 10−4 . For the Llama3 coinciso oppure un elenco. Formatta la model, we have used QLoRa with the same characteris- risposta nel seguente modo. “Risposta: tics as described above, trained on two epochs, with a ” learning rate of 2 ⋅ 10−4 . Titolo: {headline} Articolo: {article} C.4. Neutralised Clickbait Generation The following system prompt (enriched with three exam- The same instruction was employed with the fine- ples) has been utilised with LLaMAntino-3-8B: tuned model. For few-shot generation of the spoiler, we enriched the instruction with two examples. Ti verrano forniti due testi: un titolo click- When casting spoiler generation as a Question An- bait e un testo, chiamato spoiler, che con- swering task, the following instruction was employed: tiene le informazioni mancanti nel titolo. Il tuo compito è di riscrivere il titolo Ti verrà fornita una domanda e un doc- clickbait integrando le informazioni dello umento. Trova nel documento le infor- spoiler. 
Il nuovo titolo deve essere infor- mazioni per rispondere alla domanda. La mativo, privo di toni sensazionalistici, e risposta può essere un messaggio conciso breve. Se Lo spoiler contine tante infor- oppure un elenco. Formatta la risposta mazioni, puoi riassumerle in concetti più nel seguente modo. “Risposta: ” generali. Titolo: {headline} Spoiler: {spoiler} D. Ethical Statement No specific ethical conflicts have been reported during the development of this work. The dataset was compiled from publicly available sources. It is important to ac- knowledge that the examples in this document are not indicative of the authors’ opinions or beliefs. Addition- ally, the ideas or assertions contained within these texts may be misleading or harmful; therefore, the dataset should be utilized strictly for research purposes.
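The prompt templates in Appendices A.3 and C.2 specify placeholders ({headline}, {article}) and output formats (a JSON object with the entries "spoiler" and "new_headline", or a string introduced by "Risposta:"), but not how the placeholders are filled or how the replies are read back. The following is a minimal sketch of one way to do both, not the authors' implementation. The function names and the abbreviated template are hypothetical, and the sketch assumes the model returns straight ASCII quotes in its JSON.

```python
import json
import re

# Abbreviated stand-in for the last two lines of the A.3 prompt (assumption:
# the full prompt text would be prepended to this in practice).
PROMPT_TAIL = 'Clickbait headline: "{headline}"\nArticle: "{article}"'


def fill_prompt(template: str, **fields: str) -> str:
    """Fill {name} placeholders in a single pass.

    str.format is avoided on purpose: the A.3 prompt contains a literal JSON
    example, whose braces str.format would try to interpret.
    Unknown placeholders are left untouched.
    """
    return re.sub(
        r"\{(\w+)\}",
        lambda m: fields.get(m.group(1), m.group(0)),
        template,
    )


def parse_json_reply(reply: str) -> dict:
    """Extract the JSON object from a reply and check the two expected entries."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])
    missing = {"spoiler", "new_headline"} - set(data)
    if missing:
        raise ValueError(f"missing entries: {sorted(missing)}")
    return data


def parse_risposta(reply: str) -> str:
    """Return the text after the first 'Risposta:' marker (whole reply if absent)."""
    match = re.search(r"Risposta:\s*(.*)", reply, flags=re.DOTALL)
    text = match.group(1) if match else reply
    return text.strip().strip('"“”').strip()
```

For instance, `fill_prompt(PROMPT_TAIL, headline=h, article=a)` produces the final two lines of the A.3 prompt for a given pair, while `parse_json_reply` raises a `ValueError` when either entry is missing, which gives a natural point at which to flag an item for fully manual post-editing.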