=Paper=
{{Paper
|id=Vol-3878/90_main_long
|storemode=property
|title=To Click It or Not to Click It: An Italian Dataset for Neutralising Clickbait Headlines
|pdfUrl=https://ceur-ws.org/Vol-3878/90_main_long.pdf
|volume=Vol-3878
|authors=Daniel Russo,Oscar Araque,Marco Guerini
|dblpUrl=https://dblp.org/rec/conf/clic-it/0004AG24
}}
==To Click It or Not to Click It: An Italian Dataset for Neutralising Clickbait Headlines==
Daniel Russo¹,²,∗, Oscar Araque³ and Marco Guerini²
¹ University of Trento, Trento, Italy
² Fondazione Bruno Kessler, Trento, Italy
³ Universidad Politécnica de Madrid, Madrid, Spain
Abstract
Clickbait is a common technique aimed at attracting a reader’s attention, although it can result in inaccuracies and lead to
misinformation. This work explores the role of current Natural Language Processing methods to reduce its negative impact.
To do so, a novel Italian dataset is generated, containing manual annotations for classification, spoiling, and neutralisation of
clickbait. Besides, several experimental evaluations are performed, assessing the performance of current language models.
On the one hand, we evaluate the performance in the task of clickbait detection in a multilingual setting, showing that
augmenting the data with English instances largely improves overall performance. On the other hand, the generation tasks of
clickbait spoiling and neutralisation are explored. The latter is a novel task, designed to increase the informativeness of a
headline, thus removing the information gap. This work opens a new research avenue that has been largely uncharted in the
Italian language.
Keywords
clickbait, natural language processing, natural language generation, large language model, language resource
1. Introduction

Accuracy and truthfulness are essential characteristics of journalism. Nevertheless, in an effort to improve revenue, a large number of newspapers and magazines publish clickbait articles, a viral journalism strategy that seeks to attract users to click on a link to a page through tactics such as sensationalist stories and catchy headlines that act as bait. The use of these tactics harms the quality of news pieces and thus hinders the ability of citizens to obtain reliable and objective information.

The literature distinguishes between two main types of clickbait. (i) Classical clickbait [1] embeds information gaps, also known as curiosity gaps [2, 3], within the headlines in order to arouse curiosity in the reader, who is forced to access the article's content, which is ultimately disappointing. Classical clickbait usually makes use of hyperbolic language, caps lock, demonstrative pronouns and superlatives to grasp the user's attention [1, 4, 5]. (ii) Deceptive clickbait [5] refers to headlines that resemble traditional media headlines by offering a summary of the article, while still leading to content that differs from the reader's expectations. These headlines promise high news value but deliver content with low news value, resulting in reader disappointment.

Although clickbait headlines are considered one of the less harmful forms of fake news, as their main goal is to increase profit by driving traffic to their website [6, 7], they can sometimes pose a danger, especially when they deal with potentially harmful topics such as health and science. To address this problem, Natural Language Processing techniques have been widely employed to detect clickbait headlines, with a particular focus on the English language [8, 9]. Hagen et al. [10] proposed the clickbait spoiling task, i.e., the generation of a short text that satisfies the curiosity induced by a clickbait post.

In light of this, this work addresses the issue of clickbait in the Italian language, studying its characteristics and the possibilities of current technology to reduce its negative impact. In doing so, we have generated a novel Italian dataset that gathers a large collection of clickbait articles, which is made public for the community to use.¹ We named the dataset ClickBaIT. This dataset contains instances manually annotated as clickbait/non-clickbait, as well as manually generated spoilers and neutralised headlines. We have also performed a thorough multilingual evaluation, exploiting the availability of English data to complement our dataset in the task of clickbait detection. Finally, this work also explores the use of our annotated dataset and large language models to automatically generate both spoilers and, as a novel task, a neutralised version of clickbait headlines. A graphical illustration of the experimental design is presented in Figure 1.

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
∗ Corresponding author.
Email: drusso@fbk.eu (D. Russo); o.araque@upm.es (O. Araque); guerini@fbk.eu (M. Guerini)
ORCID: 0009-0006-9123-5316 (D. Russo); 0000-0003-3224-0001 (O. Araque); 0000-0003-1582-6617 (M. Guerini)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
¹ The dataset is available at https://github.com/oaraque/ClickBaIT
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
[Figure 1 diagram. Panels: 1. Clickbait detection; 2. Spoiler generation; 3. Clickbait neutralisation. Example shown: headline "Una malattia che colpisce 500mila persone"; rewritten question "Quale malattia colpisce 500mila persone?"; spoiler "La psoriasi"; neutralised headline "La psoriasi: una malattia che colpisce circa 500mila persone in Italia".]
Figure 1: The experimental design is depicted, encompassing three tasks: clickbait detection, spoiler generation, and clickbait
neutralisation. The robot icon represents the language model used for either classification or generation. We utilized DistilBERT
and Llama3-8B for task 1, and LLaMAntino-3-8B for tasks 2 and 3. The models were tested for generative tasks using zero-shot,
few-shot, and fine-tuning configurations, except for question rewriting, for which we employed a few-shot approach.
2. Related Work

The use of clickbait is common in many news outlets, and thus it has been extensively studied.

Several works address clickbait detection: Potthast et al. [8] collected a corpus of clickbait articles, posted by well-known English-speaking newspapers on Twitter, and proposed a set of lexical and semantic features to be used with a Random Forest classifier. Following the general trend in the Natural Language Processing (NLP) field, clickbait detection has also been explored using deep learning methods, such as convolutional [11] and recurrent [12] neural networks, as well as more recent Transformer-based approaches [9].

Other works leveraged Natural Language Generation (NLG) strategies to create a piece of text, the spoiler, comprising the information needed to fill the curiosity gap present in clickbait headlines. This task was proposed by Fröbe et al. [13] with the name of spoiling generation. The authors created the Webis Clickbait Spoiling Corpus 2022, and cast spoiler generation as a Question Answering task. Subsequently, they opened the challenge to the community through a SemEval-2023 shared task [13, 14]. The best-performing spoiler generator operates with five independent sequence-to-sequence generative models. It selects the best spoiler through a majority vote, determined by comparing edit distances among the outputs [15].

Regarding the languages studied, the majority of works are based on English. Other works were performed in Chinese [16], Turkish [17, 18] and Spanish [19, 20]. To the best of our knowledge, this is the first work that fully addresses the study of clickbait detection and spoiling in the Italian language. Moreover, we propose a novel task, i.e., clickbait neutralisation, which aims at filling the curiosity gap by rewriting the headline leveraging the information of the spoiler.

3. Dataset

3.1. Dataset Creation

Data were collected from fourteen news websites,² notorious for acting as news aggregators, engaging in plagiarism, lacking fact-checking, and using sensational headlines to draw in readers. In all the websites, articles are labelled according to specific categories; we decided to focus on four macro-categories: health, science, economy, and environment. These categories have been selected to cover some of the most frequent, and potentially hazardous, domains where clickbait is usually found. Since the categories varied a lot from website to website, we manually mapped each category into one of the four macro-categories under analysis. Two annotators, knowledgeable in the area, were then provided with the headlines and the related articles and were asked to label whether a headline was clickbait. To aid in this task, we used as reference the clickbait measure as computed by Arthur et al. [21]. Then, given the clickbait dataset, the two annotators were required to extract the gold spoilers from the article's text and to produce the neutralised forms for each headline. To this end, we employed an author-reviewer strategy [22]: an LLM (ChatGPT gpt-3.5-turbo-0125³) was used to generate both the spoilers and the neutralised forms (author component),⁴ and the native Italian-speaking annotators were asked to manually post-edit the generations (reviewer component).⁵ This procedure was proven to be more effective and less time-consuming than writing the data

² Essere Informati, TGNewsItalia, Voxnews, DirettaNews, Informati, Italia, Jeda News, News Cronaca, TG5Stelle, TG24-ore, ByoBlu, Mag24, WorldNotix, lo sapevi che, Fortementein
³ https://chat.openai.com
⁴ In Appendix A.3 we provide the prompt employed
⁵ Details in Appendix A.2
Category | Headline | Article | Clickbait | Spoiler | Neutralised title
Health | Frutto o fiore? gustosissima e attraente, una celebrità sulle nostre tavole, sveliamo chi è | Tutti la conosciamo, immancabile sulle nostre tavole, celebre in tutto il mondo ma misteriosa la sua natura, frutto da gustare o fiore... | True | La fragola | Fragola: gustosissima e attraente, una celebrità sulle nostre tavole
Science | Scoperto un metallo che si auto-ripara. Scienziati sbalorditi | Il recente esperimento ha rivelato un fenomeno straordinario... | True | Il platino | Il metallo che si auto-ripara: il platino
Health | Una malattia che colpisce 500mila persone | Parliamo di una malattia sistemica cronica mediata dal sistema immunitario che interessa... | True | La psoriasi colpisce circa 500 mila persone | La psoriasi: una malattia che colpisce circa 500mila persone in Italia
Environment | Zanzare, ecco come eliminarle senza insetticidi | Con l’arrivo del caldo, anche le zanzare si fanno largo nelle nostre case o nei nostri giardini... | True | Per eliminare una volta per tutte le zanzare dalla vostra casa, dovreste acquistare un pipistrello | Zanzare, ecco come eliminarle senza insetticidi: basta acquistare un pipistrello
Table 1
An excerpt of the presented dataset showing the most relevant fields. Article bodies are shortened for space reasons. Translated text can be found in Table 9 (Appendix B).
from scratch [23]. To assess the amount of post-editing required, we employed Human-targeted Translation Edit Rate [HTER; 24]. HTER quantifies the minimum edit distance, which is the least number of editing operations needed, between a machine-generated text and its post-edited counterpart. HTER values exceeding 0.4 indicate low-quality outputs; under such circumstances, rewriting the text from scratch or extensive post-editing would necessitate comparable effort [25].

The HTER obtained for spoiler generation (0.4) is higher than that obtained for neutralisation (0.3); both are on par with or slightly below the 0.4 threshold. The high HTER values, especially for the spoiler annotation, can be attributed to the model's tendency to generate spoilers comprising more details than those necessary to fill the curiosity gap. While in some cases a simple deletion was sufficient, in others the annotator had to rewrite the spoiler almost completely. Regarding the annotation of the neutralisation texts, the high values are a consequence of the spoiler generation, as the model was required to generate both simultaneously.

With this, we have generated the golden set of the dataset, in which all the instances were manually annotated. Further details regarding the dataset creation can be found in Appendix A. To expand this set, we have used a clickbait classifier (see Sect. 4.1) to automatically detect clickbait headlines. This new set of automatically annotated data constitutes the silver set of our dataset. Several examples of dataset entries are provided in Table 1.

3.2. Dataset Analysis

The complete ClickBaIT dataset consists of 4,114 entries. Each entry includes the following fields: (i) source website, that specifies the source of the article; (ii) publication date, which is captured from the original source; (iii) headline text; (iv) article text; (v) original URL; (vi) macro category inferred from the original category extracted from the source; (vii) image URL associated with the article as specified in the source; (viii) clickbait annotation; (ix) the associated spoiler; and (x) the neutralised version of the title.

Table 2 shows the main statistics of the final version of the dataset. The golden set is manually annotated and thus contains high-quality information. Additionally, the silver set has been annotated automatically as described and therefore contains a larger number of instances.

Set | Clickbait (%) | Non-clickbait (%) | Total
Golden | 698 (53%) | 629 (47%) | 1,327
Silver | 1,563 (56%) | 1,224 (44%) | 2,787
Total | 2,261 | 1,853 | 4,114
Table 2
Size of the presented dataset, considering both golden and silver sets.

To gain a deeper understanding of the content of the dataset we have used Variationist [26], a tool for inspecting useful statistics and patterns in textual data. Upon inspection of the data, we have detected several patterns frequently used for generating the curiosity gap. Of course, one of the most common strategies used in
clickbait headlines is the formulation of a question that is later answered in the article, even though sometimes it is not. In the instance “Quanto è green il gas?” (How green is gas?) the article explains that gas is not considered green. Another frequent strategy we have detected is the introduction to the content of the article, which invites the reader to click it: “Beve un cucchiaio di aceto di mele nell’acqua tutti i giorni, ecco cosa succede” (Drinks a tablespoon of apple cider vinegar in water every day, this is what happens).

Another usual pattern is the reference to enumerations, frequently using round and manageable numbers such as 10, 8, and 5. This can be done for introducing numbered content, as in “Le 10 fantasie femminili più segrete” (The 10 most secret female fantasies), or even to generate a reaction in the reader: “Hai solo 10 secondi per salvarti. Ecco cosa devi fare:” (You only have 10 seconds to save yourself. Here’s what you have to do:). Other means can be used to make headlines noticeable, such as introducing text in all caps, using striking vocabulary or even punctuation marks, as in “[ALLARME] Truffa AUTO USATE, fate attenzione!” ([ALERT] USED CAR scam, beware!).

See Table 8 (Appendix A.2) for a collection of patterns that have been considered during the manual annotation of the dataset. Besides, Appendix B includes a graphical summary of the dataset, while its interactive version can be accessed online.⁶ Details are provided in Appendix C.

4. Experimental Design

The experimental design comprises three steps: clickbait detection, spoiler generation and clickbait neutralisation.

4.1. Clickbait Detection

This is the first and most basic task aimed at addressing the clickbait phenomenon. To explore the effect of using additional data in the training process, we use the Webis-Clickbait-17 [27], an English dataset containing clickbait that is also annotated in a binary fashion.

Following the insights by Araque et al. [28], we use training on English data to improve the classification of Italian data. The main idea is to harness the availability of large amounts of English data, generating a compound dataset with a lower amount of Italian instances. To do so, a multilingual mixture dataset is created so that 35% of the final dataset comprises Italian instances, while the rest are in English.

We model the detection challenge as a binary classification task: clickbait/non-clickbait. To study the complexity of the task, we explore two different models for classification: (i) a DistilBERT [29] (distilbert-base-multilingual-cased⁷) model trained in a multilingual setting, and (ii) the Llama3-8B language model (meta-llama/Meta-Llama-3-8B⁸). The composed dataset has been split into train and test splits, which have been used to fine-tune and evaluate these models, respectively.

To assess the effect of using a mixture of both English and Italian instances in the dataset, we evaluate the performance of the two models in a monolingual setting (e.g., fine-tuning in Italian and predicting in the same language) as well as the multilingual variant (e.g., fine-tuning in English and Italian text, and predicting on Italian instances).

4.2. Spoiler Generation

The spoiler generation task consists in generating a short message that fills the curiosity gap present in a given clickbait title, by extracting the information from the linked article. To this end, we tested LLaMAntino-3-ANITA-8B-Inst-DPO-ITA (LLaMAntino-3-8B hereafter) [30] on our clickbait dataset. The model was tested both in in-context learning (zero- and few-shot) and fine-tuning settings.

Building on prior research that frames spoiler generation as a Question Answering task [31], we prompt the model to rewrite clickbait headlines as questions and extract the corresponding answers, i.e., the spoilers, from the linked articles.

4.3. Clickbait Neutralisation

The best-performing configuration was employed for the neutralisation of the clickbait headlines. To this end, we instructed the LLM to perform a style transfer task, from a clickbait headline style to a more journalistic one, while integrating the spoiler information into the original headline.

5. Results and Discussion

5.1. Evaluation Metrics

Firstly, for the evaluation of the clickbait detection task we use the macro-averaged precision, recall and f-score. This allows us to assess the performance even in an unbalanced scenario. For the generation tasks, we assessed lexical similarity through ROUGE score [32] and semantic similarity. For the latter, text embeddings, computed using sentence-bert-base-italian-xxl-uncased,⁹ were compared using cosine similarity.

⁶ https://oaraque.github.io/ClickBaIT/clickbait.html
⁷ https://huggingface.co/distilbert-base-multilingual-cased
⁸ https://huggingface.co/meta-llama/Meta-Llama-3-8B
⁹ https://huggingface.co/nickprock/sentence-bert-base-italian-xxl-uncased
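As an illustration of the detection metrics named in Section 5.1, the following is a minimal sketch of macro-averaged precision, recall and F1. It is not the authors' evaluation code: the function name `macro_precision_recall_f1` is hypothetical, and the sketch assumes that macro-F1 is the unweighted mean of the per-class F1 scores (one common convention), which the paper does not state explicitly.

```python
from collections import Counter


def macro_precision_recall_f1(gold, pred):
    """Return macro-averaged (precision, recall, F1) for two equal-length label lists.

    Hypothetical helper: each class contributes equally to the average,
    regardless of how many instances it has.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    labels = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # label p was predicted, but wrongly
            fn[g] += 1  # gold label g was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with invented labels (not taken from ClickBaIT).
gold = ["clickbait", "clickbait", "clickbait", "non-clickbait"]
pred = ["clickbait", "clickbait", "non-clickbait", "non-clickbait"]
print(macro_precision_recall_f1(gold, pred))
```

Because every class weighs the same in the average, a classifier that ignores the minority class is penalised, which is why macro-averaging suits an unbalanced scenario.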
Input | zero-shot R1 | zero-shot RL | zero-shot SemSim | few-shot R1 | few-shot RL | few-shot SemSim | fine-tuning R1 | fine-tuning RL | fine-tuning SemSim
headlines | 0.189 | 0.157 | 0.567 | 0.250 | 0.221 | 0.667 | 0.260 | 0.234 | 0.659
questions | 0.271 | 0.249 | 0.645 | 0.286 | 0.258 | 0.630 | 0.250 | 0.224 | 0.646
Table 3
LLaMAntino-3-8B results for the spoiler generation task. We report ROUGE 1 and L (R1, RL) and semantic similarity (SemSim).
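To make the R1, RL and SemSim columns of Table 3 more concrete, here is a simplified sketch of the three quantities. It rests on assumptions that are not in the paper: tokens are obtained by lowercasing and splitting on whitespace (standard ROUGE implementations tokenise differently and may apply stemming), the F-measure variant of ROUGE is reported, and the embedding vectors passed to `cosine_similarity` are toy values rather than outputs of the sentence-bert-base-italian-xxl-uncased model. All function names are hypothetical.

```python
import math
from collections import Counter


def _f1(overlap, cand_len, ref_len):
    """F-measure from an overlap count and the two sequence lengths."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_1_f1(candidate, reference):
    """Unigram-overlap F1 (simplified ROUGE-1, whitespace tokens)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Counter intersection clips each token count to the smaller of the two.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def rouge_l_f1(candidate, reference):
    """Longest-common-subsequence F1 (simplified ROUGE-L, whitespace tokens)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    prev = [0] * (len(ref) + 1)  # LCS lengths for the previous candidate token
    for c in cand:
        curr = [0]
        for j, r in enumerate(ref, start=1):
            curr.append(prev[j - 1] + 1 if c == r else max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(cand), len(ref))


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Spoiler strings taken from the examples in Table 1; the vectors are invented.
generated = "La psoriasi"
gold = "La psoriasi colpisce circa 500 mila persone"
print(rouge_1_f1(generated, gold), rouge_l_f1(generated, gold))
print(cosine_similarity([0.2, 0.7, 0.1], [0.25, 0.6, 0.05]))
```

ROUGE rewards shared surface tokens, whereas cosine similarity over sentence embeddings can stay high when a generated spoiler paraphrases the gold one, which is why the two kinds of score can diverge.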
5.2. Clickbait Detection

Table 4 shows the results of the evaluation in the task of clickbait classification. As expected, introducing data instances in English improves the performance in Italian. In the case of classification in Italian, we see a staggering improvement for the Llama3 model of 8.43 points. This further supports previous results [28]. We argue that augmenting the training set with instances in a different language is an effective strategy that can be generalised to other tasks.

We also see that the best model for the classification of clickbait is the one obtained with Llama3, trained with both English and Italian data. Hence, we use this model to predict on the silver set of our dataset.

Test | Train | Model | Prec. | Rec. | M-F1
EN | EN | DistilBERT | 67.15 | 70.34 | 66.94
EN | EN | Llama3 | 68.42 | 66.46 | 67.18
EN | EN+IT | DistilBERT | 70.28 | 70.14 | 70.12
EN | EN+IT | Llama3 | 71.20 | 71.15 | 71.15
IT | IT | DistilBERT | 68.85 | 70.47 | 68.65
IT | IT | Llama3 | 66.96 | 67.19 | 67.07
IT | EN+IT | DistilBERT | 72.87 | 74.85 | 71.77
IT | EN+IT | Llama3 | 76.32 | 75.51 | 75.50
Table 4
Results for Clickbait detection. The ‘Test’ and ‘Train’ columns indicate the languages of the test and train sets, respectively.

5.3. Spoiler Generation Results

Results for the spoiler generation task are reported in Table 3. We evaluated the capabilities of LLaMAntino-3-8B in both in-context learning scenarios (zero- and few-shot) and through fine-tuning. As inputs, we used clickbait headlines and questions generated by ChatGPT, instructing the model to execute a Question Answering task for the latter. When using headlines as input, few-shot and fine-tuning approaches outperform zero-shot methods. Few-shot approaches demonstrate higher performance in terms of semantic similarity, while fine-tuning exhibits stronger lexical adherence to the source document, as reflected in ROUGE scores. This can be attributed to the few examples provided in the few-shot approach, which make the model aware of the task while allowing more creative outputs (resulting in lower ROUGE scores). Conversely, the fine-tuned model learned from the training data to adhere more closely to the source article, which comes at the expense of producing semantically richer responses (evidenced by lower SemSim scores).

Interestingly, casting spoiler generation as a question-answering task yields higher results in the zero-shot setting compared to using headlines as input. However, the results for few-shot and fine-tuning scenarios tend to be on par. This can be explained by the fact that headlines may contain multiple gaps that the human-annotated dataset accounted for, but the non-supervised “question generation” module could not fully capture. Generally, this approach leads to sufficiently good results; however, we believe that more attention should be given to the quality of the questions, either through more efficient prompts or with human-generated/curated data.

5.4. Clickbait Neutralisation Results

In Table 5, we report the results for clickbait neutralisation. For this task, we prompted LLaMAntino-3-8B with a few-shot approach, employing the spoilers generated with the three configurations of the previous experiments (headlines as input). Using spoilers generated with the fine-tuned models leads to higher results both for lexical and semantic metrics. Interestingly, scores tend to increase when the training complexity of the input data increases. In Table 6 we report examples of headlines along with their generated spoilers (through the fine-tuned model) and their neutralisation.

Input data | R1 | RL | SemSim
zero-shot | 0.250 | 0.212 | 0.675
few-shot | 0.265 | 0.223 | 0.706
fine-tuning | 0.286 | 0.247 | 0.715
Table 5
Neutralisation generation results. Automatically generated spoilers from the previous experiments were used as input for the few-shot generation of the data. We report ROUGE 1 and L (R1 and RL) and the semantic similarity scores.
Headline | Spoiler | Neutralisation
“Juventus in Serie B”: perché c’è panico tra i tifosi, la scoperta delle ultime ore | 15 punti di penalizzazione | Juventus in grave difficoltà: 15 punti di penalizzazione e il rischio di cadere in Serie B
Lutto tremendo nello sport italiano, morto giovanissimo dopo un malore | “Samuel Dilas era un giocatore di pallacanestro che militava nel Virtus Lumezzane a Brescia, in Serie B” e “aveva 24 anni” e “era alto 206 centimetri” e “nato a Novellara (Reggio Emilia)” e “aveva un padre di nome Torsen, una madre di nome Chiara e una sorella minore di nome Maia” e “era in convalescenza dopo una polmonite” e “era arrivato alla Virtus Lumezzane nella scorsa stagione”. | Tragico decesso del pallacanestrista Samuel Dilas, 24 anni, ex convalescente da polmonite e giocatore della Virtus Lumezzane
Un papà si rifiuta di mangiare accanto a un bambino Down di 5 anni, il cameriere decide di fare questo | Il cameriere ha sbattuto fuori il maleducato padre che voleva essere spostato a causa della presenza di un bambino con sindrome di Down. | Un cameriere espelle un cliente maleducato che chiede di essere spostato per non sedersi accanto a un bambino con sindrome di Down.
E’ doloroso e si forma tra le dita dei piedi, ecco come rimuoverlo | “L’occhio di pernice è causato principalmente dalla pressione della scarpa che favorisce la formazione di un’ispessimento di pelle che provoca dolore, in quanto è soggetto all’attrito tra le dita. Per rimuovere l’occhio di pernice è fondamentale ammorbidire prima la zona interessata per poi provare a rimuovere l’ispessimento utilizzando rimedi naturali senza dolore e in modo semplice.” | Come rimuovere l’occhio di pernice, un problema di pressione e attrito causato dalle scarpe
La chiamano “LA BOMBA” la miscela che in sole 24-48 ore elimina influenza, raffreddore e tosse | Lo zenzero è un rimedio naturale per il trattamento di tosse, raffreddore e influenza. La miscela limone, zenzero e miele è ideale per alleviare i sintomi delle comuni malattie. Basta prendere 2 o 3 cucchiai della miscela naturale, riempire una tazza con acqua calda e lasciare in infusione per 3 o 4 minuti. | Miscela naturale di limone, zenzero e miele allevia i sintomi di tosse, raffreddore e influenza in pochi giorni.
Table 6
Examples of clickbait headlines, along with the automatically generated spoiler and neutralised version.
6. Conclusion

This work presents ClickBaIT, a novel Italian dataset for clickbait modelling, as well as a diverse set of experiments to assess the effectiveness of current models for clickbait detection, spoiling and neutralisation. The dataset includes news articles that have been manually annotated to indicate the presence of clickbait, spoilers associated with clickbait headlines, and their respective neutral headlines.

The experiments explore the effectiveness of current NLP methods for the modelling of clickbait headlines in Italian through ClickBaIT. The evaluation for clickbait detection shows how training data can be augmented in a multilingual setting, which leads to classification improvements that are in line with previous research [28]. The generation experiments, for both spoiling and neutralisation, show that the evaluated model does benefit from in-domain knowledge extracted from the proposed dataset. As seen, these informed generations are more accurate and align better with the golden text.

Considering the effect of clickbait, we argue that while some articles are initially harmless, lack of accuracy can have a detrimental effect on readers. This is clear when considering certain sensitive domains such as health. Thus, we hope that this work facilitates future research on the topic, for example by addressing the link between clickbait and misinformation, considering both in a unified framework.

Acknowledgments

This work was partly supported by: the AI4TRUST project - AI-based-technologies for trustworthy solutions against disinformation (ID: 101070190), the European Union’s CERV fund under grant agreement No. 101143249 (HATEDEMICS), the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101135437 (AI-CODE). Oscar Araque acknowledges the support of the project UNICO I+D Cloud - AMOR, financed by the Ministry of Economic Affairs and Digital Transformation, and the European Union through Next Generation EU; as well as the support of the project CPP2023-010437 financed by the MCIN / AEI / 10.13039/501100011033 / FEDER, UE.
References

[1] K. Scott, You won’t believe what’s in this paper! clickbait, relevance and the curiosity gap, Journal of Pragmatics 175 (2021) 53–66. URL: https://www.sciencedirect.com/science/article/pii/S0378216621000229. doi:https://doi.org/10.1016/j.pragma.2020.12.023.
[2] J. N. Blom, K. R. Hansen, Click bait: Forward-reference as lure in online news headlines, Journal of Pragmatics 76 (2015) 87–100. URL: https://www.sciencedirect.com/science/article/pii/S0378216614002410. doi:https://doi.org/10.1016/j.pragma.2014.11.010.
[3] G. Loewenstein, The psychology of curiosity: A review and reinterpretation, Psychological Bulletin 116 (1994) 75–98. doi:10.1037/0033-2909.116.1.75.
[4] K. Scott, R. Jackson, When everything stands out, nothing does, Relevance theory, figuration, and continuity in pragmatics 8 (2020) 167–192.
[5] K. Scott, “deceptive” clickbait headlines: Relevance, intentions, and lies, Journal of Pragmatics 218 (2023) 71–82. URL: https://www.sciencedirect.com/science/article/pii/S0378216623002643. doi:https://doi.org/10.1016/j.pragma.2023.10.004.
[6] S. Zannettou, M. Sirivianos, J. Blackburn, N. Kourtellis, The web of false information: Rumors, fake news, hoaxes, clickbait, and various other shenanigans, J. Data and Information Quality 11 (2019). URL: https://doi.org/10.1145/3309699. doi:10.1145/3309699.
[7] E. Aïmeur, S. Amri, G. Brassard, Fake news, disinformation and misinformation in social media: a review, Social Network Analysis and Mining 13 (2023) 30.
[8] M. Potthast, S. Köpsel, B. Stein, M. Hagen, Clickbait detection, in: Advances in Information Retrieval: 38th European Conference on IR Research, ECIR 2016, Padua, Italy, March 20–23, 2016. Proceedings 38, Springer, 2016, pp. 810–817.
[9] P. Rajapaksha, R. Farahbakhsh, N. Crespi, Bert, xlnet or roberta: The best transfer learning model to detect clickbaits, IEEE Access 9 (2021) 154704–154716. doi:10.1109/ACCESS.2021.3128742.
[10] M. Hagen, M. Fröbe, A. Jurk, M. Potthast, Clickbait spoiling via question answering and passage retrieval, in: S. Muresan, P. Nakov, A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 7025–7036. URL: https://aclanthology.org/2022.acl-long.484. doi:10.18653/v1/2022.acl-long.484.
[11] A. Agrawal, Clickbait detection using deep learning, in: 2016 2nd International Conference on Next Generation Computing Technologies (NGCT), 2016, pp. 268–272. doi:10.1109/NGCT.2016.7877426.
[12] S. Kaur, P. Kumar, P. Kumaraguru, Detecting clickbaits using two-phase hybrid cnn-lstm biterm model, Expert Systems with Applications 151 (2020) 113350. URL: https://www.sciencedirect.com/science/article/pii/S0957417420301755. doi:https://doi.org/10.1016/j.eswa.2020.113350.
[13] M. Fröbe, B. Stein, T. Gollub, M. Hagen, M. Potthast, SemEval-2023 task 5: Clickbait spoiling, in: A. K. Ojha, A. S. Doğruöz, G. Da San Martino, H. Tayyar Madabushi, R. Kumar, E. Sartori (Eds.), Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 2275–2286. URL: https://aclanthology.org/2023.semeval-1.312. doi:10.18653/v1/2023.semeval-1.312.
[14] A. K. Ojha, A. S. Doğruöz, G. Da San Martino, H. Tayyar Madabushi, R. Kumar, E. Sartori (Eds.), Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Association for Computational Linguistics, Toronto, Canada, 2023. URL: https://aclanthology.org/2023.semeval-1.0.
[15] H. Kurita, I. Ito, H. Funayama, S. Sasaki, S. Moriya, Y. Mengyu, K. Kokuta, R. Hatakeyama, S. Sone, K. Inui, TohokuNLP at SemEval-2023 task 5: Clickbait spoiling via simple Seq2Seq generation and ensembling, in: A. K. Ojha, A. S. Doğruöz, G. Da San Martino, H. Tayyar Madabushi, R. Kumar, E. Sartori (Eds.), Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 1756–1762. URL: https://aclanthology.org/2023.semeval-1.243. doi:10.18653/v1/2023.semeval-1.243.
[16] T. Liu, K. Yu, L. Wang, X. Zhang, H. Zhou, X. Wu, Clickbait detection on wechat: A deep model integrating semantic and syntactic information, Knowledge-Based Systems 245 (2022) 108605. URL: https://www.sciencedirect.com/science/article/pii/S0950705122002714. doi:https://doi.org/10.1016/j.knosys.2022.108605.
[17] Şura Genç, E. Surer, Clickbaittr: Dataset for clickbait detection from turkish news sites and social media with a comparative analysis via machine learning algorithms, Journal of Information Science 49 (2023) 480–499. doi:10.1177/01655515211007746.
[18] A. Geçkil, A. A. Müngen, E. Gündogan, M. Kaya, A clickbait detection method on news sites, in: 2018 IEEE/ACM International Conference on Ad-
vances in Social Networks Analysis and Min- Exploring multifaceted variation and bias in writ-
ing (ASONAM), 2018, pp. 932–937. doi:10.1109/ ten language data, arXiv preprint arxiv:2406.17647
ASONAM.2018.8508452 . (2024). URL: https://arxiv.org/abs/2406.17647.
[19] C. Oliva, I. Palacio-Marín, L. F. Lago-Fernández, [27] M. Potthast, T. Gollub, K. Komlossy, S. Schuster,
D. Arroyo, Rumor and clickbait detection by M. Wiegmann, E. Garces Fernandez, M. Hagen,
combining information divergence measures and B. Stein, Crowdsourcing a Large Corpus of Clickbait
deep learning techniques, in: Proceedings of on Twitter, in: E. Bender, L. Derczynski, P. Isabelle
the 17th International Conference on Availabil- (Eds.), 27th International Conference on Compu-
ity, Reliability and Security, ARES ’22, Association tational Linguistics (COLING 2018), Association
for Computing Machinery, New York, NY, USA, for Computational Linguistics, 2018, pp. 1498–1507.
2022. URL: https://doi.org/10.1145/3538969.3543791. URL: https://aclanthology.org/C18-1127/.
doi:10.1145/3538969.3543791 . [28] O. Araque, M. F. L. Corniel, K. Kalimeri, Towards a
[20] I. García-Ferrero, B. Altuna, Noticia: A clickbait multilingual system for vaccine hesitancy using a
article summarization dataset in spanish, arXiv data mixture approach., in: Proceedings of the 9th
preprint arXiv:2404.07611 (2024). Italian Conference on Computational Linguistics,
[21] T. E. C. L. Arthur, A. T. Cignarella, S. Frenda, M. Lai, 2023.
M. A. Stranisci, A. Urbinati, et al., Debunker assis- [29] V. Sanh, L. Debut, J. Chaumond, T. Wolf, Distilbert,
tant: a support for detecting online misinformation, a distilled version of bert: smaller, faster, cheaper
in: Proceedings of the Ninth Italian Conference and lighter, arXiv preprint arXiv:1910.01108 (2019).
on Computational Linguistics (CLiC-it 2023), vol- [30] M. Polignano, P. Basile, G. Semeraro, Advanced
ume 3596, Federico Boschetti, Gianluca E. Lebani, natural-based interaction for the italian language:
Bernardo Magnini, Nicole Novielli, 2023, pp. 1–5. Llamantino-3-anita, 2024. arXiv:2405.07101 .
[22] S. S. Tekiroğlu, Y.-L. Chung, M. Guerini, Generat- [31] M. Woźny, M. Lango, Generating clickbait spoilers
ing counter narratives against online hate speech: with an ensemble of large language models, arXiv
Data and strategies, in: D. Jurafsky, J. Chai, preprint arXiv:2405.16284 (2024).
N. Schluter, J. Tetreault (Eds.), Proceedings of the [32] C.-Y. Lin, Rouge: A package for automatic eval-
58th Annual Meeting of the Association for Com- uation of summaries, in: Text summarization
putational Linguistics, Association for Computa- branches out, 2004, pp. 74–81.
tional Linguistics, Online, 2020, pp. 1177–1190. URL:
https://aclanthology.org/2020.acl-main.110. doi:10.
18653/v1/2020.acl- main.110 .
[23] D. Russo, S. Kaszefski-Yaschuk, J. Staiano,
M. Guerini, Countering misinformation via
emotional response generation, in: H. Bouamor,
J. Pino, K. Bali (Eds.), Proceedings of the 2023
Conference on Empirical Methods in Natural Lan-
guage Processing, Association for Computational
Linguistics, Singapore, 2023, pp. 11476–11492. URL:
https://aclanthology.org/2023.emnlp-main.703.
doi:10.18653/v1/2023.emnlp- main.703 .
[24] M. Snover, B. Dorr, R. Schwartz, L. Micciulla,
J. Makhoul, A study of translation edit rate with tar-
geted human annotation, in: Proceedings of the 7th
Conference of the Association for Machine Trans-
lation in the Americas: Technical Papers, Associa-
tion for Machine Translation in the Americas, Cam-
bridge, Massachusetts, USA, 2006, pp. 223–231. URL:
https://aclanthology.org/2006.amta-papers.25.
[25] M. Turchi, M. Negri, M. Federico, Coping with the
subjectivity of human judgements in MT quality
estimation, in: Proceedings of the Eighth Workshop
on Statistical Machine Translation, Association for
Computational Linguistics, Sofia, Bulgaria, 2013, pp.
240–251. URL: https://aclanthology.org/W13-2231.
[26] A. Ramponi, C. Casula, S. Menini, Variationist:
Macro-category | Categories
scienza | insetti, animali, AI, scienza, smartphone, Spazio, tecnologia, TECNOLOGIE, SCIENZA, ufo, biochimica, eclissi, bomba atomica, terra piatta, idroelettrico, temperatura, coltivazione, robot, fisica quantistica, macchie solari, ricerca, vulcano, titanio, universo, fotovoltaico, intelligenza, iPhone, hacker, microonde, motori di ricerca, onde elettromagnetiche, tecnologia, sole, scienza, radioterapia, pesticidi, armi chimiche, comete, case farmaceutiche, psichiatria, smartphone, formiche, elettrodomestici, solare, macrobiologi, mondo, lampadine a basso consumo, tecnologia, scienze-e-tech, scienza, scienza, innovazione, scienza, tecnologia-2, animali intelligenti, funzione cognitiva, microchip, cani, samsung, wi fi, tecnologia-e-tv, SCIENZE, TECNOLOGIA, bioetica, biologia, fisica, covid, coronavirus
salute | Salute, CORONAVIRUS, VAIOLO SCIMMIE, TUBERCOLOSI, SALUTE, SCABBIA, AIDS, salute, hiv, cocaina, antidepressivi, veleni, infezioni, carne, tabacco, infibulazione, fluoro, alcool, alimentari, aids, antibatterico, dieta, insetticida, cibo, benessere, farmaci, digitopressione, caffè, sigarette, ministero della salute, autismo, limoni, cure naturali, paracetamolo, cancro, antiossidante, droga, olio, medicina alternativa, fragole, vegetariano, eroina, dislessia, veleno, zenzero, virus, psicologia, biologico, magnesio, frutta, psicofarmaci, pollo al cloro, fiori di bach, medico, sonno, birra, vitamina e, ulivi, proteine, stress, banana, pensieri negativi, tumori, benzodiazepine, latte, miele, cuore, epilessia, longevità, marijuana, diabete, sale, ibernazione, vecchiaia, fegato, vegan, prevenzione, dentifricio, cervello, sistema immunitario, sodio, suicidio, rimedi naturali, maltempo, canapa, pillola, mal di gola, depressione, psiche, alimentazione, ebola, aspartame, dentifricio senza fluoro, tiroide, mangiare, cure proibite, Alzheimer, smog, gas, malattie, calamità, mammografia, verdura, aloe, masticazione, farmaco, igiene, batteri, medicina, vitamina c, epatite c, forfora, energia, vaccini, ormoni, flora batterica, sorbitolo, antibiotici, piedi, obesità, arsenico, cortisolo, chemioterapia, contraccezione, Neurotrasmettitori, semi, melograno, celiachia, Coca cola, salute-benessere, salute, salute-e-benessere, bellezza, dimagrante, benessere, salute-benessere, rimedi-naturali, pianeta-mamma, grano antico, acqua ossigenata, alimetnazione, ansia, dentisti, curcuma, casa-e-cucina, hobby-e-sport, SPORT, crescita-consapevolezza, la-salute-che-viene, sport, stile-di-vita, consigli, lifestyle, pomodori
ambiente | Cambiamenti climatici, energia, energia elettrica, Natura, AMBIENTE, ECOLOGIA, global warming, geoingegneria, alberi, pianeta terra, natura, inquinamento, mare, terra, manipolazione climatica, clima, rinnovabili, Dissesto idrogeologico, ecologia, ambiente, green, ambiente-attuale, ecologia, salute-benessere, natura, ambiente, METEO, tempesta solare, astronomia, acido
economia | affari-online, economia, ECONOMIA, consumi-risparmi, microchip r-fid, bollo auto, tasso d’interesse, finanza, bollette, banche, profitto, spese, economia-finanza, economia, economia, economia-dellanima, fisco-e-tasse, economia, economia, economia, economia-e-finanza
Table 7
Split of the categories into the four macro-categories.
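The split in Table 7 can be read as a lookup from a scraped category label to one or more macro-categories. The sketch below is only an illustration: it copies a handful of labels from the table, the helper names are hypothetical, and case-insensitive matching is an assumption rather than something the paper states. Note that the same label can occur under more than one macro-category (for instance, energia is listed under both salute and ambiente), so the lookup returns a list.

```python
from collections import defaultdict

# Illustrative subset of Table 7; the full table lists many more labels.
MACRO_CATEGORIES = {
    "scienza": ["insetti", "animali", "AI", "SCIENZA", "tecnologia"],
    "salute": ["Salute", "CORONAVIRUS", "energia", "salute-benessere", "vaccini"],
    "ambiente": ["Cambiamenti climatici", "energia", "salute-benessere", "METEO"],
    "economia": ["affari-online", "ECONOMIA", "bollette", "fisco-e-tasse"],
}


def build_index(table):
    """Invert macro-category -> labels into label -> set of macro-categories."""
    index = defaultdict(set)
    for macro, labels in table.items():
        for label in labels:
            # Assumption: labels are matched case-insensitively.
            index[label.casefold()].add(macro)
    return index


INDEX = build_index(MACRO_CATEGORIES)


def macro_categories(label):
    """Return the sorted macro-categories for a scraped label (empty if unknown)."""
    return sorted(INDEX.get(label.strip().casefold(), set()))
```

For example, `macro_categories("energia")` yields both ambiente and salute, which suggests that the source website would be needed to resolve such labels to a single macro-category.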
A. Dataset Creation Details

A.1. Category Assessment
In Table 7 we report how the heterogeneous categories scraped directly from the misleading websites were divided into the four macro-categories of scienza (science), salute (well-being), ambiente (environment), and economia (economy).

A.2. Annotation Guidelines
Three components of our dataset were subject to human intervention to: (i) determine if the headline was clickbait, (ii) identify the related article’s spoiler, that is, the information required to satisfy the curiosity gap within the headline, and (iii) revise the headline to include the spoiler information, thereby neutralizing it. During all three annotation stages, we employed a machine-human collaboration to expedite the work of annotators. The annotators received both a score indicating how much the headline was clickbait and suggestions for the spoilers and the neutralized versions of the headlines, generated automatically with ChatGPT gpt-3.5-turbo-0125. Below, we outline the annotation guidelines that the annotators were to follow.

Clickbait labelling In order to select the clickbait headlines present in the scraped data, the annotators were provided with specific guidelines. Table 8 provides the key points taken into consideration to label the data.

Characteristic | Original example (IT) | Translated example (EN)
Lack of essential information, i.e., the subject the article is talking about | “Ora riposa in pace”. Calcio in lutto, morto uno dei grandi protagonisti dell’Italia | “Now rest in peace”. Football in mourning, one of Italy’s great protagonists dead
Sensationalist tone | Fan ubriaca le salta addosso sul palco. La sua reazione è incredibile e sconvolge tutti i presenti | Drunk fan jumps on her on stage. Her reaction is incredible and shocks everyone present
Questions raised but answered in the article body | Tratti della nostra colonna: quali sono? Come evitare lesioni? | Traits of our column: what are they? How to avoid injuries?
Enumeration of elements | 10 cibi per sbarazzarsi del gonfiore di stomaco e pancia | 10 foods to get rid of bloated stomach and tummy
Use of capitalization | INFARTO: sopravvivere quando si è soli. Hai solo 10 secondi per salvarti. Ecco cosa devi fare: | HEART ATTACK: surviving when alone. You only have 10 seconds to save yourself. Here’s what you have to do:
Introduction of the content without actually giving the information | Zanzare, ecco come eliminarle senza insetticidi | Mosquitoes, this is how to eliminate them without insecticide
Use of quotations that do not give information | Omicron, Ilaria Capua: “Ecco perché i vaccinati si infettano di più rispetto a prima” | Omicron, Ilaria Capua: “This is why the vaccinated get more infected than before”
Table 8
Key points used for the annotation of the dataset. Please note that some instances can exemplify more than one point.

Spoiler post-editing For the post-editing of the spoiler, the annotator was required to spot the information gap in the headline and to check, against the related article, whether the generated spoiler provided that information. If the model failed to find the proper spoiler, the annotator had to rewrite it sticking as much as possible to the document’s text. If the spoiler was correct but contained extra information, the annotator had to keep that information only if it was essential for a complete headline. If the spoiler was correct, the annotator could leave it as it was.

Neutralised Clickbait Post-Editing The annotator was required to check that the neutralised form comprised both the headline and the spoiler information. If the spoiler was very long (e.g., a long list), the annotator had to summarise it as much as possible, aiming to embed enough information in the new headline to reduce or remove the information gap. If the model failed to include the spoiler information in the neutralised version of the headline, the annotator had to add it manually. Moreover, the annotator was required to remove sensationalist tones as much as possible whenever they still created unnecessary curiosity in the reader.

A.3. Author Component Instruction
Hereafter, we provide the instruction employed to automatically generate spoilers and the neutralised versions of the clickbait headlines through ChatGPT gpt-3.5-turbo-0125.

I have a clickbait headline and its corresponding article, both written in Italian. The clickbait headline typically omits key information to create a curiosity gap for the reader. Your task is to extract this missing information, known as a “spoiler,” from the article’s text. The spoiler can be a single keyword, a short text passage, or a list of keywords. Once you have identified the spoiler, rewrite the clickbait headline by incorporating this information to eliminate the curiosity gap. The output must be in JSON format and written in Italian. The JSON should include two entries: one called “spoiler” that contains the extracted spoiler(s), and another called “new_headline” that has the revised headline.
Example Input:
Clickbait headline: “Questo attore ha fatto qualcosa di incredibile sul set di un famoso film!” Article: “Durante le riprese del film ‘Il Gladiatore’, l’attore Russell Crowe ha deciso di fare un gesto di grande generosità donando una parte significativa del suo stipendio al fondo per i membri della troupe.”
Example Output:
{“spoiler”: “Russell Crowe ha donato una parte significativa del suo stipendio al fondo per i membri della troupe”, “new_headline”: “Russell Crowe ha fatto qualcosa di incredibile sul set di ‘Il Gladiatore’: ha donato una parte significativa del suo stipendio al fondo per i membri della troupe”}
Please ensure the output is formatted in JSON as specified and that all content is in Italian.
Now do it for the following headline.
Clickbait headline: “{headline}”
Article:“{article}”

B. Additional Dataset Details

B.1. Dataset Visualisation
Figure 2 shows a frequency-based visualization of the dataset. It considers the frequency of appearance of relevant uni- and bi-grams for both the clickbait and non-clickbait categories. The figure shows common strategies that are frequent in clickbait content, such as the use of “ecco cosa” (this is what) or “quali sono” (what are), which can be seen in the lower right part.

[Figure 2: scatter plot of term frequencies. Horizontal axis: Clickbait Frequency; vertical axis: Non-Clickbait Frequency; both range over Infrequent, Average, Frequent. Side lists: Top Non-Clickbait, Top Clickbait, Characteristic. Non-Clickbait document count: 481; word count: 5,642. Clickbait document count: 846; word count: 10,647.]
Figure 2: Frequency of words for both clickbait and non-clickbait categories. On the right, most frequent words for each class, and both (Characteristic). An interactive version of the graph can be accessed at the following link https://oaraque.github.io/clickIT/clickbait.html

B.2. Dataset Excerpt Translation
Table 9 includes the English translations for the Italian examples presented in Table 1.

C. Experimental Design Details

C.1. Question Generation
Questions were generated with ChatGPT gpt-3.5-turbo-0125 using the following prompt:

You will be provided with a clickbait headline written in Italian. Your task is to generate a question that addresses any missing or vague information in the headline. Here are some examples:
Headline: Si chiama la benedizione di Dio: rimuove l’alta pressione, il diabete e il grasso nel sangue
Question: Che cosa viene chiamato ’benedizione di Dio’?
Headline: “Emorragia cerebrale”. Italia in apprensione per il suo campione: ricoverato in condizioni gravissime
Question: Chi è il campione?
Please generate the question in Italian, ensuring it seeks to clarify the ambiguous or incomplete details present in the headline.
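The instruction in Appendix A.3 ends with two slots, {headline} and {article}, and asks for a JSON reply with the entries "spoiler" and "new_headline". As a hedged illustration of how such a template might be filled and how a reply might be checked before it is shown to annotators, consider the following sketch. The function names, the abridged template tail, and the choice to join a list of spoilers with commas are all assumptions made for this example; the paper does not describe its post-processing, and no model API is called here.

```python
import json

# Abridged stand-in for the end of the A.3 instruction (assumed layout).
PROMPT_TAIL = (
    "Now do it for the following headline.\n"
    'Clickbait headline: "{headline}"\n'
    'Article: "{article}"'
)


def build_prompt(instruction, headline, article):
    """Append the filled-in slots to the fixed instruction text."""
    # The instruction itself is not passed through str.format, because it
    # contains literal JSON braces in its worked example.
    return instruction.rstrip() + "\n" + PROMPT_TAIL.format(
        headline=headline, article=article
    )


def _as_text(value):
    """Normalise a JSON value to text; lists of keywords are comma-joined."""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return ", ".join(v.strip() for v in value)
    return ""


def parse_suggestion(raw):
    """Return (spoiler, new_headline), or None if the reply is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    spoiler = _as_text(data.get("spoiler"))
    new_headline = _as_text(data.get("new_headline"))
    if not spoiler or not new_headline:
        return None
    return spoiler, new_headline
```

Returning None for malformed replies keeps the decision about what to do next (retry, or hand the item to an annotator without a suggestion) outside the parser.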
Category | Headline | Article | Clickbait | Spoiler | Neutralised title
Health | Fruit or flower? Tasty and attractive, a celebrity on our tables, we reveal who she is | We all know it, inevitable on our tables, world-famous, but mysterious is its nature, fruit to enjoy or flower to decorate? | True | The strawberry | Strawberry: tasty and attractive, a celebrity on our tables
Science | Self-repairing metal discovered. Scientists astounded | The recent experiment revealed an extraordinary phenomenon... | True | Platinum | The metal that repairs itself: platinum
Health | A disease that affects 500,000 people | We are talking about a chronic immune-mediated systemic disease that affects about 1.8 million patients... | True | Psoriasis affects about 500,000 people | Psoriasis: a disease that affects about 500,000 people in Italy
Environment | Mosquitoes, here’s how to get rid of them without insecticides | With the arrival of hot weather, mosquitoes also make their way into our homes or gardens... | True | To eliminate mosquitoes from your home once and for all, you should buy a bat | Mosquitoes, here’s how to get rid of them without insecticides: just buy a bat
Table 9
Translated from the original Italian. An excerpt of the presented dataset showing the most relevant fields. Article bodies are shortened for space reasons.
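The columns of Table 9 suggest a simple record layout for one dataset instance. The sketch below is a hypothetical representation, not the released schema: the class name, the field names, and the substring check are assumptions introduced for illustration, and the example values are copied from the Science row of the table.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ClickbaitRecord:
    """One dataset instance, with fields named after the Table 9 columns."""
    category: str
    headline: str
    article: str
    clickbait: bool
    spoiler: str
    neutralised_title: str


example = ClickbaitRecord(
    category="Science",
    headline="Self-repairing metal discovered. Scientists astounded",
    article="The recent experiment revealed an extraordinary phenomenon...",
    clickbait=True,
    spoiler="Platinum",
    neutralised_title="The metal that repairs itself: platinum",
)


def spoiler_in_title(record):
    """Crude check (an assumption): is the spoiler text quoted in the title?"""
    return record.spoiler.casefold() in record.neutralised_title.casefold()
```

A substring test of this kind is only a rough sanity check. For the Environment row of Table 9 the spoiler is a full sentence that the neutralised title paraphrases, so the test would not match even though the information gap is closed.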
C.2. Spoiler Generation
For the zero-shot spoiler generation task we employed the following prompt:

Ti verranno forniti un titolo clickbait e il suo articolo corrispondente. Il titolo clickbait di solito omette, o non esplicita, informazioni chiave per creare curiosità nel lettore. Estrai dall’articolo le informazioni mancanti o vaghe nel titolo che servono per colmare questa curiosità. La risposta può essere un messaggio estremamente coinciso oppure un elenco. Formatta la risposta nel seguente modo. “Risposta:

C.3. Fine-Tuning Details
The LLaMAntino-3-8B [30] model underwent training on a single Ampere A40 GPU with 48GB of memory, employing the QLoRA strategy with a low-rank approximation of 64, a low-rank adaptation of 16, and a dropout rate of 0.1. It was set to evaluate every 50 steps, with a batch size of 4, across 3 epochs, using a learning rate of $10^{-4}$.
In the clickbait detection experiments, the DistilBERT and Llama3-8b models have been fine-tuned on the same GPU. The DistilBERT model has been trained on 10 epochs with a learning rate of $2 \cdot 10^{-4}$. For the Llama3 model, we have used QLoRA with the same characteristics as described above, trained on two epochs, with a