<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Participation of ESCOM's NLP Group at TA1C-IberLEF2025: RoBERTa Model Fine-Tuned for Clickbait Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Omar Juárez Gambino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José-Emiliano Ledesma-Ramírez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raul Rodas-Rodríguez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Omar-Alejandro Velázquez-Cruz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yahir Arias-Morales</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oscar-Galo Ayala-García</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel-Maximiliano Rivera-García</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Ramírez-Rosas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Consuelo-Varinia García-Mendoza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Escuela Superior de Cómputo (ESCOM-IPN), Instituto Politécnico Nacional</institution>
          ,
          <addr-line>J. D. Batiz e/ M. O. de Mendizabal s/n, 07738</addr-line>
          ,
          <addr-line>Mexico City</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>In this paper, we describe the participation of the ESCOM NLP group in the TA1C 2025 Clickbait detection task. The contest proposes a binary classification of tweets to determine whether the content is clickbait or not. We trained several models using traditional machine learning methods and LLMs. Our best model achieved an F1-score of 0.8152 and ranked second in the final competition results.</p>
        <p>Internet access has revolutionized how people consume and share information, enabling a massive content exchange. New avenues have opened for companies to attract customers online, giving rise to digital marketing. Today, products or services can be promoted using digital technologies through electronic media [1]. Web pages can generate revenue through various monetization models. One of the most common is Pay Per Click, where site owners earn money each time a user clicks on an advertisement link. This type of advertising, known as contextual advertising, allows companies to pay only when their ads receive interaction, ensuring greater control over their digital marketing investment [2]. The excessive and unethical use of this type of advertising has led to the rise of clickbait. This strategy has proven to be effective in attracting traffic. However, it has simultaneously harmed information quality, causing a decline in public trust due to sensationalist headlines that promise more than the content actually delivers. This practice has become common on digital platforms and social networks, where the competition for user attention is fierce [3].</p>
      </abstract>
      <kwd-group>
        <kwd>Clickbait detection</kwd>
        <kwd>Text Classification</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In prior work, a corpus was collected from 18 media outlets and manually labeled as either clickbait
or non-clickbait. The authors report a baseline machine learning method that achieved an F1-score
of 0.84.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task and corpus description</title>
      <p>The work presented in [6] was revisited as a task by the IberLEF 2025 conference [7]. TA1C [8] proposes
two subtasks: Clickbait Detection and Clickbait Spoiling. The first subtask aims to determine whether a
tweet teasing a news item is considered clickbait. The second subtask involves generating
or extracting a brief text from the article that fills the information gap, satisfies the curiosity generated,
or, conversely, indicates that the article offers no answer on the matter.</p>
      <p>We decided to participate solely in the clickbait detection subtask; therefore, we will only
describe the corpus created for this purpose. The event organizers provided three datasets:
TA1C_dataset_detection_train (training), TA1C_dataset_detection_dev_gold (development),
and TA1C_dataset_detection_test (test), all in CSV format. The training dataset contains 2,800
instances, while the development and test datasets each contain 700 instances.</p>
      <p>The training and development datasets include the following columns: Tweet ID, Tweet Date,
Media Name, Media Origin, Teaser Text, and Tag Value (indicating whether the teaser is clickbait
or not). The datasets exhibit a class imbalance, with 71% of instances belonging to the ”No clickbait”
class and the remaining 29% belonging to the ”Clickbait” class. This imbalance poses a significant
challenge for machine learning models, as it often biases their predictions towards the more represented
class. The effect of this issue on the results of the developed models is detailed in Section 4.</p>
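      <p>As an illustration, a split with the columns above can be loaded with Python’s csv module. The rows and exact label strings below are invented for the sketch; only the column names come from the task description.</p>

```python
import csv
import io

# Toy rows mimicking the structure described above; the real files
# (e.g. TA1C_dataset_detection_train) are distributed as CSV by the
# organizers. The label strings here are assumptions.
SAMPLE = """Tweet ID,Tweet Date,Media Name,Media Origin,Teaser Text,Tag Value
1,2024-01-05,MediaA,ES,"No creerás lo que pasó después",Clickbait
2,2024-01-06,MediaB,MX,"El banco central sube la tasa de interés",No clickbait
3,2024-01-07,MediaA,ES,"Esto cambiará tu vida para siempre",Clickbait
"""

def load_detection_split(fileobj):
    """Read a detection split and return (texts, labels) for the classifiers."""
    reader = csv.DictReader(fileobj)
    texts, labels = [], []
    for row in reader:
        texts.append(row["Teaser Text"])
        labels.append(row["Tag Value"])
    return texts, labels

texts, labels = load_detection_split(io.StringIO(SAMPLE))
print(len(texts), labels.count("Clickbait"))
```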
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>Two approaches were pursued in developing the clickbait detector. The first relied on traditional
Machine Learning methods, while the second used Large Language Models. Below, we describe the
processes applied in each approach.</p>
      <sec id="sec-3-1">
        <title>3.1. Traditional Machine Learning methods</title>
        <p>Traditional Machine Learning methods (TMLM) require the text to be preprocessed and converted into
a suitable representation before use. This approach involved applying diverse text preprocessing and
representation techniques.</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Preprocessing</title>
          <p>Since the corpus consists of tweets, it contains special strings like user mentions, hashtags, and URLs. We
performed a cleaning process to remove this information, which was considered unhelpful for identifying
clickbait. Additionally, POS tagging was used to identify determiners, prepositions, conjunctions, and
pronouns; words with these grammatical categories were removed because they were considered
stopwords. Finally, a lemmatization process was applied to reduce words to their base form. The spaCy
tool was used for this normalization process.</p>
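          <p>A minimal sketch of the cleaning step, assuming simple regular expressions for mentions, hashtags, and URLs (the exact patterns, and the subsequent POS-based stop-word removal and lemmatization with spaCy, are not specified here):</p>

```python
import re

# Patterns for the Twitter-specific strings removed during cleaning.
# These regexes are illustrative; the exact patterns are an assumption.
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")

def clean_tweet(text: str) -> str:
    """Strip user mentions, hashtags, and URLs, then collapse whitespace."""
    for pattern in (URL, MENTION, HASHTAG):
        text = pattern.sub(" ", text)
    return " ".join(text.split())

print(clean_tweet("Mira esto @usuario #increible https://t.co/abc ¡No lo creerás!"))
# → Mira esto ¡No lo creerás!
```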
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Text representation</title>
          <p>After preprocessing, unigrams, bigrams, and a combination of both were extracted as features. These
features were transformed into a vector space using the following techniques:
• Term frequency: Transforms the text into a word-count vector, where each entry indicates how
many times a specific word appears in the document. Although simple, it can be effective in tasks
where absolute frequency is relevant.
• Binarization: Similar to the previous method, but only indicates the presence or absence of a word,
disregarding its frequency. This helps reduce the impact of extremely frequent words.
• Term Frequency - Inverse Document Frequency (TF-IDF): This approach weighs words based on
their relative frequency within a document compared to the rest of the document collection.
Common words across the entire collection are assigned a lower weight, allowing terms more
representative of each class or category to stand out.</p>
          <p>Details on how the preprocessing and text representation techniques were integrated into the pipeline can be
found in Section 4.</p>
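          <p>The three weighting schemes can be sketched in plain Python on a toy two-document corpus (whitespace tokenization and a smoothed IDF variant are our assumptions):</p>

```python
import math
from collections import Counter

docs = [
    "gran oferta gran descuento",
    "noticia importante descuento",
]
vocab = sorted({w for d in docs for w in d.split()})

def term_frequency(doc):
    """Raw count of each vocabulary word in the document."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]

def binary(doc):
    """Presence/absence of each vocabulary word, ignoring frequency."""
    present = set(doc.split())
    return [1 if w in present else 0 for w in vocab]

def tfidf(doc):
    """Term frequency weighted by a smoothed inverse document frequency."""
    n = len(docs)
    counts = Counter(doc.split())
    vec = []
    for w in vocab:
        df = sum(1 for d in docs if w in d.split())
        # One common smoothed-idf variant; the exact formula used is an
        # assumption here.
        idf = math.log((1 + n) / (1 + df)) + 1
        vec.append(counts[w] * idf)
    return vec

print(vocab)
print(term_frequency(docs[0]))
print(binary(docs[0]))
```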
        </sec>
        <sec id="sec-3-1-3">
          <title>3.1.3. Classification</title>
          <p>The task of determining whether a tweet is clickbait was tackled as a binary classification
problem. We used the following machine learning methods: Logistic Regression (LR), Multinomial Naïve
Bayes (MNB), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), Random Forest (RF), and
Gradient Boosting (GB). These methods were chosen based on their established efficacy in text
classification tasks [9]. The classifiers were implemented with the scikit-learn library [10]. Further
details regarding the training process, hyperparameter tuning, and results can be found in
Section 4.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Large Language Models</title>
        <p>In addition to traditional machine learning approaches, we explored the use of Large Language Models
(LLMs), which have demonstrated strong performance across a range of Natural Language Processing
(NLP) tasks due to their capacity to capture deep semantic and contextual information [11]. For this
task, we fine-tuned two pre-trained transformer-based models: a Spanish version of BERT [12] known
as BETO [13], and RoBERTa [14]. Unlike classical models that rely on predefined text vectorization
methods, these models learn contextual representations of text directly from raw input. They use
self-attention mechanisms to capture semantic relationships between words, making them highly
effective for tasks involving subtle linguistic cues, such as detecting clickbait.</p>
        <p>To optimize the training process and reduce computational costs, we applied the LoRA (Low-Rank
Adaptation) technique [15]. LoRA enables parameter-efficient fine-tuning by injecting trainable rank
decomposition matrices into the transformer layers, allowing us to adapt large models to our specific
dataset without retraining the entire model.</p>
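        <p>The parameter saving from LoRA can be illustrated with a small bookkeeping sketch (the dimensions and rank below are toy values, not the actual RoBERTa shapes): for a frozen weight matrix W of shape d×k, LoRA trains only B (d×r) and A (r×k), and the adapted weight is W + (α/r)·BA.</p>

```python
# Illustrative LoRA bookkeeping. For a frozen weight matrix of shape
# (d, k), LoRA trains two low-rank factors B (d, r) and A (r, k), so
# only d*r + r*k parameters are updated instead of d*k. The values
# below are toy shapes, not RoBERTa's real dimensions.
d, k, r = 768, 768, 8

full_params = d * k            # parameters updated by full fine-tuning
lora_params = d * r + r * k    # parameters updated by LoRA

print(full_params, lora_params, lora_params / full_params)
```

With these toy shapes, LoRA updates roughly 2% of the parameters of the corresponding full layer, which is where the reduction in training cost comes from.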
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <p>In this section, we present the experiments and results of the clickbait detection task. As explained
above, we used TMLM and LLMs. For all of the experiments, we used the train and gold development
corpora described in Section 2. The column Teaser Text was provided as data input and column Tag
Value as a target for the classifiers.</p>
      <sec id="sec-4-1">
        <title>4.1. TMLM experiments</title>
        <p>The training corpus was split into 75% of the data for training and 25% for development. The gold
development corpus was used as a test set. The corpus analysis revealed a substantial class imbalance,
with the ’No clickbait’ class being overrepresented (71% of instances). Consequently, a stratified
partitioning method was employed to ensure that the training and development sets accurately reflect
the class distribution in the original corpus.</p>
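        <p>A stratified 75/25 split of this kind can be sketched with scikit-learn (the labels below are synthetic, mimicking the 71%/29% imbalance described above):</p>

```python
from sklearn.model_selection import train_test_split

# Synthetic texts and labels reproducing the 71% / 29% class imbalance.
texts = [f"teaser {i}" for i in range(100)]
labels = ["No clickbait"] * 71 + ["Clickbait"] * 29

X_train, X_dev, y_train, y_dev = train_test_split(
    texts, labels,
    test_size=0.25,     # 75% train / 25% development
    stratify=labels,    # preserve the class distribution in both splits
    random_state=0,
)

print(len(X_train), len(X_dev))
print(y_train.count("Clickbait") / len(y_train))
```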
        <p>For the experiments, we tried several combinations of the preprocessing and text representation
techniques described in subsections 3.1.1 and 3.1.2. A grid search was employed to conduct a systematic
search for optimal hyperparameters for the classifiers. The hyperparameters adjusted for each machine
learning method are listed below.</p>
        <p>• LR: max_iter and penalty
• SVM: kernel, C, and gamma
• MLP: hidden_layer_sizes, alpha, and solver
• RF: n_estimators, max_depth, min_samples_leaf, and min_samples_split
• GB: n_estimators, learning_rate, subsample, and max_depth</p>
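        <p>A grid search of this kind can be sketched with scikit-learn’s GridSearchCV over a vectorizer-plus-classifier pipeline (the toy data and the parameter grid are illustrative, not the exact search spaces used):</p>

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy teasers standing in for the corpus; repeated so cross-validation
# has enough samples per fold.
texts = ["no creeras lo que paso", "sube la tasa de interes",
         "esto te sorprendera", "informe economico trimestral"] * 10
labels = ["Clickbait", "No clickbait", "Clickbait", "No clickbait"] * 10

pipeline = Pipeline([
    ("vect", CountVectorizer(ngram_range=(1, 2))),  # unigrams + bigrams
    ("clf", LogisticRegression()),
])

# Illustrative LR grid (max_iter and a regularization strength C);
# the paper's actual grids are not reproduced here.
grid = GridSearchCV(
    pipeline,
    param_grid={"clf__max_iter": [200, 500], "clf__C": [0.1, 1.0]},
    scoring="f1_macro",
    cv=3,
)
grid.fit(texts, labels)
print(grid.best_params_, round(grid.best_score_, 3))
```

The pipeline keeps vectorization inside the cross-validation loop, so the vocabulary is refit on each training fold rather than leaking information from the held-out fold.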
        <p>As observed in the table, text cleaning and lemmatization proved to be the most effective preprocessing
steps across all machine learning methods. Stop-word removal did not enhance performance, which we
attribute to the already short texts being further diminished by their exclusion. For text representation,
unigram and bigram frequency counts yielded the best results. This suggests that the presence of
individual words and two-word sequences helps identify relevant features in the texts.</p>
        <p>Subsequently, the best-performing model (MLP) was retrained using 100% of the training corpus data.
The adjusted model was then used to predict instances in the gold development corpus, achieving an
F1-score (macro) of 0.649. The model’s performance indicates limited generalization power, and based
on our analysis, this is primarily due to the class imbalance. Given the poor results, we decided not to
use this model for predictions on the test corpus.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. LLMs experiments</title>
        <p>Given the nature of the task, LLMs offer a significant advantage by capturing subtle semantic and
syntactic cues that traditional models may overlook. This is particularly beneficial in our case, as the
corpus presented a notable class imbalance, with a greater number of non-clickbait instances. LLMs have
been shown to be more robust in handling unbalanced datasets due to their contextual understanding
and capacity to learn from limited examples of the minority class.</p>
        <p>A fine-tuning process was performed on three LLMs:
• dccuchile/bert-base-spanish-wwm-cased (BETO)
• PlanTL-GOB-ES/roberta-base-bne (RoBERTa)
• PlanTL-GOB-ES/roberta-base-bne + LoRA (RoBERTa_LoRA)</p>
        <p>The input text was tokenized using each model’s specific tokenizer, with sequences truncated to the
model’s maximum length (512 for both BERT and RoBERTa) and padding activated. We did not apply
text normalization techniques (e.g., text cleaning, stop-word removal, and lemmatization) before feeding
the input into BERT and RoBERTa. This decision is based on the fact that these models are pre-trained
on raw text, and altering the input could introduce a distributional mismatch that affects performance.
Since the tokenizers for BERT and RoBERTa are designed to handle casing, punctuation, and subword
variations, preserving the original text allows the models to leverage their pretraining fully.</p>
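        <p>The truncation-and-padding behavior can be sketched independently of any particular tokenizer (the token ids and the pad id below are placeholders, not real vocabulary entries):</p>

```python
MAX_LEN = 512  # maximum sequence length used for both models here
PAD_ID = 0     # placeholder padding id; real tokenizers define their own

def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Truncate sequences longer than max_len; pad shorter ones.

    Returns the padded id sequence and an attention mask marking which
    positions hold real tokens (1) versus padding (0).
    """
    ids = token_ids[:max_len]
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [pad_id] * (max_len - len(ids))
    return ids, mask

ids, mask = pad_or_truncate([101, 7592, 2088, 102], max_len=8)
print(ids)
print(mask)
```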
        <p>Below are the results obtained by the three LLMs on the gold development set. Tables 3, 4 and 5
show the classification reports of the three fine-tuned LLMs.</p>
        <p>Based on these results, the RoBERTa model fine-tuned with LoRA performed best. This model’s
macro F1-score was 0.90, a 38% relative improvement over the 0.649 achieved by the best TMLM (MLP).
Additionally, despite the imbalance favoring the ”No clickbait” class, the LLM achieved an F1-score of
0.86 on the minority class. This result shows that the bias towards the majority class was
significantly reduced.</p>
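        <p>The reported relative improvement can be checked directly from the two macro F1-scores:</p>

```python
# Relative improvement of the LoRA-tuned RoBERTa over the best
# traditional model (MLP), using the macro F1-scores reported above.
f1_llm, f1_mlp = 0.90, 0.649

relative_gain = (f1_llm - f1_mlp) / f1_mlp
print(round(relative_gain * 100, 1))  # ≈ 38.7%, consistent with the ~38% stated
```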
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Competition test set results</title>
        <p>The RoBERTa_LoRA model was used to generate predictions on the final, unlabeled test file provided
for the competition. The same tokenization method was applied to the 700 instances of the test set.
According to the results published by the competition organizers, our model achieved a macro F1-score
of 0.815249, placing us in second place, just 0.00039 behind first place. Table 8 shows the official
competition results.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and future work</title>
      <p>Clickbait has become a widespread advertising strategy designed to drive traffic to websites. While
the information provided through this mechanism can sometimes be interesting, clickbait text often
exaggerates or provides incomplete information to entice users to follow the link. Task TA1C, proposed
as part of the IberLEF 2025 conference, aims to detect clickbait text automatically. During the
competition, we developed a machine learning model that ranked second at correctly identifying both
clickbait and non-clickbait texts. The use of the RoBERTa LLM, coupled with LoRA for parameter-efficient
fine-tuning, significantly improved the results compared to traditional machine learning methods. Data
imbalance was a challenge due to the bias it commonly induces in models; however, our LLM successfully
overcame this issue.</p>
      <p>In future work, we propose data augmentation for the minority class (clickbait) using an LLM for
text generation, given that traditional oversampling techniques were ineffective in our experiments.
Additionally, we plan to explore the use of more robust LLMs, such as DeepSeek or Gemini, and employ
prompting techniques to guide the models in this task.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research was partially funded by SECIHTI-SNII and Instituto Politécnico Nacional (IPN), through
grants SIP-20254348, COFAA-SIBE and EDI.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Gemini and Grammarly to check grammar and
spelling. After using these tools, the authors reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1"><mixed-citation>[1] A. Puthussery, Digital marketing: An overview, 2020.</mixed-citation></ref>
      <ref id="ref2"><mixed-citation>[2] K. K. Kapoor, Y. K. Dwivedi, N. C. Piercy, Pay-per-click advertising: A literature review, The Marketing Review 16 (2016) 183–202.</mixed-citation></ref>
      <ref id="ref3"><mixed-citation>[3] A. B. Araujo, M. F. N. Jaso, J. Serrano-Puche, Use of clickbait in Spanish digital native media. An analysis of El Confidencial, El Español, eldiario.es and OK, Dígitos. Revista de Comunicación Digital (2021) 185–210.</mixed-citation></ref>
      <ref id="ref4"><mixed-citation>[4] Supriya, J. P. Singh, G. Kumar, Identification of clickbait news articles using SBERT and correlation matrix, Social Network Analysis and Mining 13 (2023) 153.</mixed-citation></ref>
      <ref id="ref5"><mixed-citation>[5] A. Muqadas, H. U. Khan, M. Ramzan, A. Naz, T. Alsahfi, A. Daud, Deep learning and sentence embeddings for detection of clickbait news from online content, Scientific Reports 15 (2025) 13251.</mixed-citation></ref>
      <ref id="ref6"><mixed-citation>[6] G. Mordecki, G. Moncecchi, J. Couto, Te ahorré un click: A revised definition of clickbait and detection in Spanish news, in: L. Correia, A. Rosá, F. Garijo (Eds.), Advances in Artificial Intelligence – IBERAMIA 2024, Springer Nature Switzerland, Cham, 2025, pp. 387–399.</mixed-citation></ref>
      <ref id="ref7"><mixed-citation>[7] J. Á. González-Barba, L. Chiruzzo, S. M. Jiménez-Zafra, Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEUR-WS.org, 2025.</mixed-citation></ref>
      <ref id="ref8"><mixed-citation>[8] G. Mordecki, L. Chiruzzo, R. Laguna, J. Prada, A. Rosá, I. Sastre, G. Moncecchi, Overview of TA1C at IberLEF 2025: Detecting and Spoiling Clickbait in Spanish-Language News, Procesamiento del Lenguaje Natural 75 (2025).</mixed-citation></ref>
      <ref id="ref9"><mixed-citation>[9] A. Gasparetto, M. Marcuzzo, A. Zangari, A. Albarelli, A survey on text classification algorithms: From text to predictions, Information 13 (2022). URL: https://www.mdpi.com/2078-2489/13/2/83. doi:10.3390/info13020083.</mixed-citation></ref>
      <ref id="ref10"><mixed-citation>[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.</mixed-citation></ref>
      <ref id="ref11"><mixed-citation>[11] L. Tunstall, L. von Werra, T. Wolf, Natural Language Processing with Transformers, O’Reilly Media, Inc., 2022.</mixed-citation></ref>
      <ref id="ref12"><mixed-citation>[12] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186.</mixed-citation></ref>
      <ref id="ref13"><mixed-citation>[13] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained BERT model and evaluation data, in: PML4DC at ICLR 2020, 2020.</mixed-citation></ref>
      <ref id="ref14"><mixed-citation>[14] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, CoRR abs/1911.02116 (2019). URL: http://arxiv.org/abs/1911.02116. arXiv:1911.02116.</mixed-citation></ref>
      <ref id="ref15"><mixed-citation>[15] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, LoRA: Low-rank adaptation of large language models, in: International Conference on Learning Representations, 2022. URL: https://openreview.net/forum?id=nZeVKeeFYf9.</mixed-citation></ref>
    </ref-list>
  </back>
</article>