<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>VerbaNexAI at MentalRiskES 2025: Early Detection of Gambling Disorders using Transformer Architectures and Machine Learning Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jeison D. Jimenez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jairo E. Serrano</string-name>
          <email>jserrano@utb.edu.co</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan C. Martinez-Santos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edwin Puertas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Tecnológica de Bolívar, School of Digital Transformation</institution>
          ,
          <addr-line>Cartagena de Indias 130010</addr-line>
          ,
          <country country="CO">Colombia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Gambling disorder represents a significant public health challenge with severe psychological and socioeconomic consequences, affecting approximately 80 million individuals worldwide. Early detection and intervention are crucial for mitigating its harmful effects. In this paper, we present VerbaNexAI’s approach to the MentalRiskES 2025 shared task on the early detection of gambling disorders in Spanish social media content. Our methodology combines transformer-based embeddings with traditional machine learning algorithms and is applied to a novel dataset of Spanish-language user profiles from the Telegram and Twitch platforms. We implemented a comprehensive pipeline comprising text preprocessing, contextual embedding generation with a Spanish BERT model, class balancing through random oversampling, and systematic model selection. Our LightGBM classifier showed moderate classification performance for risk detection but achieved the lowest ERDE5 score among all participating teams, indicating strong early detection capability. Our Logistic Regression model delivered robust results across multiple evaluation criteria for addiction type classification. This research demonstrates the effectiveness of combining transformer architectures with traditional machine learning for the early detection of gambling disorders in Spanish text, with particular emphasis on the timeliness of detection as a critical factor for effective mental health monitoring and intervention.</p>
      </abstract>
      <kwd-group>
        <kwd>Mental Risk</kwd>
        <kwd>Gambling</kwd>
        <kwd>Embedding</kwd>
        <kwd>Transformers</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Mental health represents one of the most pressing healthcare challenges worldwide: one in eight people
suffers from a mental disorder, yet the majority lack access to adequate treatment [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The COVID-19 health crisis aggravated this situation, increasing the global prevalence of anxiety and depression
by 25% during its first year and exacerbating preexisting structural deficiencies in care services
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Suicide remains the fourth leading cause of death among individuals aged 15 to 29, accounting for
over 700,000 deaths annually [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In this context of collective psychological vulnerability, gambling disorder has emerged as a significant
public health threat: recent Lancet Commission research estimates that approximately 80 million
individuals meet diagnostic criteria for this disorder. At the same time, up to 450 million engage in
harmful gambling behaviours worldwide [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Conceptualized in the DSM-5 as persistent, recurrent
problematic gambling behaviour leading to clinically significant impairment or distress [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], gambling
disorder exhibits the highest suicide rate among addictive disorders, with clinically relevant rates
of suicidal ideation (31.6%) and suicide attempts (13.2%) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Its socioeconomic impact is equally
alarming: social-cost analyses estimate per-adult harms ranging from USD 16 to USD 36,144 annually,
encompassing economic, relational, and psychological consequences [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Paradoxically, only one in five
affected individuals seeks professional assistance [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Automated systems for mental-health risk detection on digital platforms offer a
scalable approach to early intervention. The CLEF eRisk lab has driven significant advances in early
risk prediction in English [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]; however, Spanish remains under-represented. To address this gap, IberLEF
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] incorporated the Early-Risk Identification task (MentalRiskES) into SEPLN (the Spanish Society
for Natural Language Processing) in 2023 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and 2024 [12], thereby establishing a methodological
framework for mental-risk evaluation in Spanish. In this third edition [13], two subtasks have been
defined:
• Task 1: Risk Detection of Gambling Disorders. This online binary classification task requires
systems to decide, for each incoming message in a chronological stream of Telegram and Twitch
user comments, whether the user is at high risk (label = 1) or low risk (label = 0) of developing a
gambling-related disorder. Early detection is paramount: systems are evaluated both on classification
accuracy and on the promptness with which they issue a correct “high risk” label once sufficient
evidence is available.
• Task 2: Type of Addiction Detection. This risk-conditioned multiclass classification task requires
systems to assign each user exactly one addiction type (Betting, Online Gaming, Trading, or
Lootboxes) based on their message history, regardless of prior risk flagging. The final prediction
submitted in the last round is used for evaluation, emphasizing both label correctness and decision
timeliness.
      </p>
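      <p>The round-based protocol of Task 1 can be sketched as a replay loop over one user’s messages. This is a minimal illustration, not the organizers’ evaluation code: it assumes, as in eRisk, that the first “high risk” decision emitted for a user is final, and predict_risk is a hypothetical callback standing in for any classifier that scores the message history seen so far.</p>

```python
from typing import Callable, List, Sequence, Tuple


def run_online(
    messages: Sequence[str],
    predict_risk: Callable[[List[str]], int],
) -> Tuple[int, int]:
    """Replay one user's chronological stream, one message per round.

    Returns (decision, k): decision is 1 if the user was flagged as high
    risk, and k is the number of messages seen when the decision was made
    (the full stream length if the user was never flagged).
    """
    history: List[str] = []
    for k, message in enumerate(messages, start=1):
        history.append(message)
        # Assumption: a positive decision is final and ends the stream for this user.
        if predict_risk(history) == 1:
            return 1, k
    return 0, len(messages)
```

      <p>The returned k is the delay that early-detection metrics such as ERDE and latencyTP (Section 5.1) take into account.</p>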
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>In recent years, the automatic detection of pathological gambling using machine learning and natural
language processing methods has gained significant attention. Researchers have explored various
approaches to identify gambling disorder signals in text data, leveraging linguistic patterns and behavioral
indicators. This section provides an overview of key studies in this domain, highlighting the most
effective approaches and the results they obtained.</p>
      <p>ELiRF-UPV [14] conducted one of the most successful studies in pathological gambling detection,
employing Support Vector Machines (SVM) with TF-IDF features for the eRisk 2023 shared task. Their
approach prioritized handling long texts effectively, achieving remarkable results with an F1 score
of 0.935 on validation data and perfect precision (1.000) in test scenarios. Their system ranked first
among 49 submissions in the eRisk 2023 challenge, detecting 75% of gamblers within the first 10 posts,
demonstrating the continued viability of classical machine learning methods even as transformer models
gain popularity.</p>
      <p>NLP-UNED-2 [15] introduced an innovative approach for the eRisk 2023 challenge using Approximate
Nearest Neighbors (ANN) for dataset relabeling to convert user-level annotations to message-level
labels. They refined the training data using Universal Sentence Encoder embeddings and HNSW graphs.
They implemented RNN-based models that achieved impressive results, ranking second in the eRisk
2023 decision-based evaluation with high precision (0.896) and recall (0.922). Their latency-weighted F1
score of 0.877 demonstrated the effectiveness of neural networks with properly labeled data.</p>
      <p>SINAI [16] presented an approach to the eRisk 2023 challenge, leveraging pre-trained
Transformer-based models (RoBERTa-Large and XLM-RoBERTa-Large) combined with Long Short-Term Memory
(LSTM) architectures to analyze social media posts in sequential order. Their methodology emphasized
comprehensive data preprocessing, including normalization of text (replacing URLs, emojis, and special
characters), handling imbalanced datasets through sub-sampling, and integrating sequential modeling
for early detection. The team implemented five systems: four utilizing Transformer-based models
with feedforward neural networks (FFNN) and a fifth introducing an innovative hybrid architecture
combining RoBERTa with LSTM to capture temporal dependencies in user posts. Despite ranking 7th
out of 49 submissions with an F1 score of 0.126, SINAI achieved notable performance in recall and early
detection metrics (ERDE50 = 0.020–0.029), demonstrating strong capability in identifying high-risk
users in early stages. Their work highlights the potential of combining transformer architectures with
sequential models for temporal analysis of user behavior.</p>
      <p>UNSL [17] leveraged transformer-based models for early gambling detection in the eRisk 2023
challenge, employing BERT architectures with domain-specific vocabulary enrichment from external
models. Their approach incorporated a decision policy based on historical predictions, which optimized
early classification by applying thresholds and delay parameters. Their models achieved competitive
results in the challenge, particularly excelling in decision-based metrics (F1, ERDE50) and runtime
efficiency, with UNSL ranking among the fastest teams in the competition.</p>
      <p>UNED-NLP [18] applied an Approximate Nearest Neighbors method with semantic embeddings for
the eRisk 2022 challenge, achieving competitive results in early detection (ERDE50 = 0.018). Their
methodology involved constructing a reference database of labeled user profiles and classifying new
users based on their nearest neighbors in the embedded space. This lightweight approach demonstrated
scalability and low runtime latency, suggesting its utility in resource-constrained environments.</p>
      <p>Beyond academic competitions, researchers [19] have developed predictive models using player
account data, such as transaction logs and betting patterns. Their model achieved strong discriminatory
power (AUC &gt; 0.85) in identifying at-risk gamblers by analyzing features like loss-to-win ratios, betting
frequency spikes, and late-night gambling activity. This approach complements text-based methods by
incorporating behavioral data that may reveal gambling disorders before they manifest in social media
discourse.</p>
      <p>Research on pathological gambling detection highlights diverse methodologies with complementary
strengths. Classical machine learning models (e.g., SVM) demonstrate robust performance when
rigorously implemented. At the same time, neural architectures with engineered features enable
effective early identification. Hybrid frameworks, such as SINAI’s combination of transformers and
LSTMs, integrate semantic analysis with temporal modeling, underscoring the value of combined
approaches for precise and timely detection.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Data</title>
      <p>For the MentalRiskES challenge, the organizers compiled a novel dataset comprising 517 anonymized
Spanish-language user profiles (7 trial, 350 training, 160 testing) [20]. The data were sourced from public Telegram
groups dedicated to gambling-related topics, where users exchange messages about wagering activities,
and from live-chat streams on Twitch discussing gambling. The organizers extracted, anonymized,
annotated, and curated each user’s message history to support the two subtasks and, as previously outlined,
divided the corpus into trial, training, and test sets. Table 1 summarizes
the distribution of users by risk category for Task 1, while Table 2 gives an overview of the
distribution by gambling-type label for Task 2.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Architecture</title>
      <p>In this section, we describe the architecture of our system for the early detection of gambling disorders
based on user comments from Telegram and Twitch. As Figure 1 shows, the processing workflow
consists of five main stages: preprocessing, where messages are cleaned, normalized, and standardized;
feature extraction, in which contextual embeddings are generated with a Spanish Transformer model
and relevant lexical features are incorporated; regularization, in which classes are balanced through random
oversampling; training and validation, which compares several Machine Learning algorithms through
cross-validation to ensure robustness; and model evaluation and selection, based on metrics including
Accuracy, Recall, F1, Precision, Kappa, and MCC, as well as the earliness of detection.</p>
      <p>[Figure 1: System architecture. Input; Pre-Processing (read and load data; normalize and standardize text; lemmatize; data augmentation); Features extraction (Transformer); Regularization (random oversampling; splitting data); Models training (train classifiers; cross-validation); Evaluation (evaluate models; select the best model); Output.]</p>
      <sec id="sec-4-1">
        <title>4.1. Pre-Processing</title>
        <p>The first step in our pre-processing pipeline is to convert all text to lowercase. It standardizes the text
and ensures consistency in our analysis. Next, we replace email addresses and usernames with tags
[EMAIL] and [USER]. Then, we replace emojis with textual descriptions and remove extra spaces. We
also normalize quotation marks and reduce repeated punctuation to a maximum of three consecutive
instances.</p>
        <p>Subsequently, we tokenize the text into lexical units and apply lemmatization to the comments,
thus reducing words to their base forms. We then concatenate the
messages of each user, using the [SEP] separator to distinguish between different interactions and
maintain the conversational structure. This process is applied to both Task 1 and Task 2, as they
share the same dataset of messages.</p>
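        <p>A minimal, standard-library sketch of the message-level cleaning and per-user concatenation described above follows. The regular expressions, the quote mapping, and the function names are illustrative assumptions rather than our exact implementation; emoji-to-text conversion and lemmatization are omitted because they depend on external resources (for example, the third-party emoji package and a Spanish lemmatizer).</p>

```python
import re

# Typographic quotes mapped to plain ASCII quotes (illustrative set).
QUOTE_MAP = str.maketrans({"“": '"', "”": '"', "«": '"', "»": '"', "‘": "'", "’": "'"})
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
USER_RE = re.compile(r"@\w+")
REPEATED_PUNCT_RE = re.compile(r"([!?.,;:])\1{3,}")  # four or more identical marks
SPACES_RE = re.compile(r"\s+")


def preprocess_message(text):
    """Clean one message: lowercase, tag emails/usernames, normalize quotes and punctuation."""
    text = text.lower()
    # Tags are inserted after lowercasing, so they keep their uppercase form.
    text = EMAIL_RE.sub("[EMAIL]", text)  # emails first, so their "@" is not read as a username
    text = USER_RE.sub("[USER]", text)
    # Emoji-to-text conversion would go here; omitted in this sketch.
    text = text.translate(QUOTE_MAP)
    text = REPEATED_PUNCT_RE.sub(r"\1\1\1", text)  # keep at most three consecutive marks
    return SPACES_RE.sub(" ", text).strip()


def build_user_document(messages):
    """Concatenate a user's cleaned, non-empty messages with the [SEP] separator."""
    cleaned = (preprocess_message(m) for m in messages)
    return " [SEP] ".join(c for c in cleaned if c)
```

        <p>For example, "Hola @Juan!!!!! escribe a juan.perez@mail.com   ya" becomes "hola [USER]!!! escribe a [EMAIL] ya".</p>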
        <p>Furthermore, for Task 2, we apply data augmentation specifically to the Lootboxes
class because this category is markedly underrepresented compared to the others, with only 26
instances, as shown in Table 2. To mitigate this limitation, we generated 59 additional message instances
using three techniques: back translation (Spanish-English-Spanish), synonym substitution via Spanish
WordNet, and paraphrasing by combining both methods. The augmentation was applied at the message
level rather than creating synthetic user profiles, increasing the Lootboxes class from 26 to 85 instances.
This approach allowed us to expand the training set while preserving the semantic integrity of the
original examples and achieving better class balance across all gambling categories.</p>
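        <p>Of the three augmentation techniques, synonym substitution is the simplest to illustrate. The sketch below assumes a small hypothetical synonym dictionary in place of Spanish WordNet (which could be queried, for instance, through NLTK’s Open Multilingual WordNet); back translation and paraphrasing are not shown. The second function cycles over the original messages to produce a requested number of variants, mirroring the 26 + 59 = 85 arithmetic above.</p>

```python
import random


def synonym_substitution(text, synonyms, p=0.3, seed=0):
    """Replace each token that has entries in `synonyms` with probability p."""
    rng = random.Random(seed)
    out = []
    for token in text.split():
        options = synonyms.get(token)
        if options and p > rng.random():
            out.append(rng.choice(options))
        else:
            out.append(token)
    return " ".join(out)


def augment_minority(messages, synonyms, n_new, p=0.3, seed=0):
    """Create n_new variants by cycling over the original minority-class messages."""
    return [
        synonym_substitution(messages[i % len(messages)], synonyms, p, seed + i)
        for i in range(n_new)
    ]
```

        <p>Token-level substitution ignores agreement and context, which is one reason to combine it with back translation and paraphrasing as described above.</p>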
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Feature Extraction</title>
        <p>We extracted the features by tokenizing the concatenated messages of each user with the tokenizer of
bert-base-spanish-wwm-cased [22]. Sequences exceeding the 512-token limit were truncated to their first
512 tokens, and shorter sequences were padded to a uniform length. The tokenized
sequences were then processed by the transformer model, and 768-dimensional embeddings were
obtained by applying mean pooling to the last hidden-state outputs. These embeddings constituted the
feature matrices for Tasks 1 and 2, which we fed directly into our classification pipelines.</p>
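        <p>The feature-extraction step can be sketched with the Hugging Face transformers library. The model identifier dccuchile/bert-base-spanish-wwm-cased (the public release of the model in [22]), the batch size, and the helper names are assumptions. Masked mean pooling averages token vectors while ignoring padding positions, so padding every sequence to 512 tokens does not dilute the user embedding.</p>

```python
import numpy as np

# Assumed Hugging Face identifier for the Spanish BERT model cited as [22].
MODEL_NAME = "dccuchile/bert-base-spanish-wwm-cased"


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, skipping padding positions (mask == 0)."""
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against empty sequences
    return summed / counts


def embed_users(documents, batch_size=8):
    """Return one 768-dimensional vector per user document."""
    # Heavy dependencies are imported here so that mean_pool stays usable without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME).eval()
    vectors = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        # Keep the first 512 tokens and pad every sequence to the same length.
        enc = tokenizer(batch, truncation=True, max_length=512,
                        padding="max_length", return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state
        vectors.append(mean_pool(hidden.numpy(), enc["attention_mask"].numpy()))
    return np.vstack(vectors)  # shape: (number of users, 768)
```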
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Regularization</title>
        <p>To mitigate class imbalance in Task 1 and Task 2, we applied random oversampling (using
RandomOverSampler with a fixed seed) directly to the 768-dimensional embedding matrices, duplicating randomly
selected examples of the minority classes until they reached parity with the majority class. The balanced datasets were
then split into training (80%) and validation (20%) subsets to support downstream cross-validation and
model selection.</p>
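        <p>The balancing and splitting order described above can be sketched in NumPy. The first function reproduces the default behavior of imbalanced-learn’s RandomOverSampler (random duplication of minority rows until every class matches the majority count) without the extra dependency; the seed value and helper names are assumptions.</p>

```python
import numpy as np


def random_oversample(X, y, seed=42):
    """Duplicate randomly chosen minority rows until all classes reach the majority count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=target - count, replace=True)  # empty for the majority class
        keep.append(np.concatenate([idx, extra]))
    order = np.concatenate(keep)
    return X[order], y[order]


def split_train_validation(X, y, train_fraction=0.8, seed=42):
    """Shuffle and split into training (80%) and validation (20%) subsets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    cut = int(train_fraction * len(y))
    return X[perm[:cut]], X[perm[cut:]], y[perm[:cut]], y[perm[cut:]]
```

        <p>With this ordering, copies of the same minority row can land in both the training and validation subsets, which tends to make validation scores optimistic; oversampling only the training portion avoids this.</p>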
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Models Training</title>
        <p>This subsection gives an overview of the supervised classifiers used in Task 1 (binary risk detection) and Task 2
(multiclass addiction-type classification). We used PyCaret, a low-code machine learning library that facilitates model
development and evaluation by automating preprocessing, model training, and algorithm comparison.
Additionally, MLflow, an open-source platform for managing the machine learning lifecycle, was
integrated to systematically track experiment runs, log model parameters, and ensure reproducibility. We
evaluated the classifiers [23] detailed in Table 3 on the balanced 768-dimensional embeddings.</p>
        <p>We trained and evaluated each model using 10-fold cross-validation on 80% of the training data,
with Accuracy and F1 score as the primary evaluation metrics. MLflow automatically recorded all
experimental configurations and validation results, enabling a systematic and reproducible comparison
of classifiers, with the best-performing model selected for final evaluation on the test set.</p>
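        <p>PyCaret’s model comparison can be approximated with plain scikit-learn, which makes the cross-validation protocol explicit. In this sketch, the three candidate models, the use of stratified folds, and macro-averaged F1 are assumptions; the classifiers actually compared are those listed in Table 3.</p>

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier


def compare_classifiers(X, y, n_splits=10, seed=42):
    """Rank candidate models by mean macro-F1 under stratified k-fold cross-validation."""
    # Illustrative candidates only; Table 3 lists the models actually compared.
    candidates = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "k-NN": KNeighborsClassifier(n_neighbors=5),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for name, model in candidates.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "f1_macro"))
        rows.append((name, scores["test_accuracy"].mean(), scores["test_f1_macro"].mean()))
    # Best model first, mirroring a leaderboard sorted by F1.
    return sorted(rows, key=lambda row: row[2], reverse=True)
```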
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Evaluation</title>
        <p>At this stage, we compared the results obtained by the different models for
both Task 1 (binary risk detection of gambling disorder) and Task 2 (multiclass classification of gambling
type). We selected the best-performing model based on the following metrics: Accuracy,
Precision, Recall, F1 score, AUC, Kappa, and Matthews Correlation Coefficient (MCC). These metrics
capture different dimensions of predictive quality and are particularly relevant for evaluating model
behavior under class imbalance and uncertainty. Assessing the classifiers against this comprehensive set
of criteria yields a rigorous and interpretable comparison of model performance for each task.</p>
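        <p>Most of these metrics are available in scikit-learn; a small helper is sketched below, with the averaging mode as an assumption. AUC is omitted because it requires predicted probabilities rather than labels. One property worth keeping in mind when reading the results in Section 5 is that, for single-label classification, micro-averaged Precision, Recall, and F1 all equal Accuracy.</p>

```python
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    f1_score,
    matthews_corrcoef,
    precision_score,
    recall_score,
)


def evaluation_report(y_true, y_pred, average="macro"):
    """Label-based selection metrics (AUC needs scores, so it is left out)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average=average, zero_division=0),
        "recall": recall_score(y_true, y_pred, average=average, zero_division=0),
        "f1": f1_score(y_true, y_pred, average=average, zero_division=0),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "mcc": matthews_corrcoef(y_true, y_pred),
    }
```

        <p>For example, with y_true = [1, 0, 1, 1, 0, 0] and y_pred = [1, 0, 0, 1, 0, 1], Accuracy is 4/6 and both Kappa and MCC equal 1/3.</p>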
        <sec id="sec-4-5-1">
          <title>4.5.1. Results of Training Process for Binary Classification Task 1</title>
          <p>Based on the evaluation results presented in Table 4, LightGBM was selected as the optimal classifier
for Task 1 due to its strong performance across key metrics, particularly Accuracy (0.7505), F1 score
(0.7315), and MCC (0.5072). The specific hyperparameter configuration for this model is detailed in
Table 5.</p>
        </sec>
        <sec id="sec-4-5-2">
          <title>4.5.2. Results of Training Process for Classification Task 2</title>
          <p>The results for Task 2 presented in Table 6 demonstrate that Logistic Regression is the top-performing
classifier, achieving outstanding metrics with Accuracy (0.9907) and F1 score (0.9906). Table 7 details
the hyperparameter configuration for this model to ensure reproducibility.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results of VerbaNexAI in MentalRiskES Task Evaluation</title>
      <p>This section outlines the results obtained by the VerbaNexAI team in the official evaluation of the MentalRiskES
shared task. The challenge focused on the detection of mental health disorders, with particular attention
to the early identification of gambling-related behavior in Spanish-language comments collected from
Telegram and Twitch. We used the same model across the three evaluation runs for each subtask:
LightGBM for Task 1 and Logistic Regression for Task 2.</p>
      <sec id="sec-5-1">
        <title>5.1. Task 1: Risk Detection of Gambling Disorders</title>
        <p>The VerbaNexAI Lab team approached Task 1 by deploying the LightGBM classifier to detect early
signs of gambling behavior in Telegram and Twitch messages. Table 8 presents the classification results for Task
1, including Accuracy and macro- and micro-averaged Precision, Recall, and F1 scores. In this subtask,
the model achieved an overall Accuracy of 0.519 and a macro-F1 of 0.342, indicating moderate
performance across classes. The micro-F1 score of 0.519 equals the Accuracy, as expected in single-label
classification, so the gap between micro- and macro-F1 reflects uneven performance across the two classes.</p>
        <p>In addition to classification metrics, the system’s efficiency in timely detection was evaluated using
ERDE5, ERDE30, latencyTP, detection speed, and latency-weighted F1. As shown in Table 9, VerbaNexAI
Lab achieved the lowest ERDE5 score (0.274) among all participating teams, demonstrating strong
capability for early detection of gambling-related risks. The team also obtained a competitive ERDE30
(0.250), a latencyTP of 2, and a top detection speed of 0.990, resulting in a latency-weighted F1 score of
0.677. While the classification results leave room for improvement, particularly in balancing performance
across classes, the system’s timeliness metrics suggest that it is well suited to real-time monitoring scenarios.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Task 2: Type of Addiction Detection</title>
        <p>Task 2 focused on a multiclass classification challenge to identify each user’s specific type of gambling
based on their message history. The VerbaNexAI Lab team used a Logistic Regression model for all
three submitted runs. The evaluation considered the assigned label’s correctness and the prediction’s
timeliness.</p>
        <p>The model achieved an Accuracy of 0.813, along with a Macro Precision of 0.846, Macro Recall
of 0.769, and Macro F1 score of 0.780, as shown in Table 10. These macro-averaged scores indicate reasonably
balanced performance across the different classes, with precision somewhat higher than recall. The micro-averaged
Precision, Recall, and F1 each reached 0.813, matching the Accuracy, as expected in single-label multiclass classification.</p>
        <p>These results highlight the effectiveness of the Logistic Regression model in distinguishing
between different addiction types, delivering solid performance in a multiclass classification setting.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Error Analysis</title>
      <p>To evaluate the limitations of our models, we analyzed misclassified cases using the
official gold labels published for the test set. This analysis reveals distinctive error patterns that provide
insights into the limitations and strengths of our approach.</p>
      <p>The analysis identified three primary sources of error affecting both models: the inability to capture
subtle mood and emotional changes characteristic of pathological gambling behavior,
insufficient message context for users with a limited posting history, and gambling-specific
terminology that appears across multiple categories and is not interpreted in context.
Additionally, we observed confusions between categories with overlapping vocabulary, particularly
between Trading and Betting, which share financial terminology.</p>
      <p>In contrast, the data augmentation strategy for the Lootboxes category appeared effective, improving
its classification despite the original class imbalance. These error patterns point to two priorities:
incorporating temporal emotional modeling and developing domain-specific lexicons for
Spanish gambling terminology.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Carbon Emission</title>
      <p>We tracked resource consumption and emissions during the complete pipeline execution using
the CodeCarbon library [24] to assess the computational requirements and environmental impact of our approach.
Understanding these metrics is crucial for identifying approaches suitable for deployment on personal
computers or resource-constrained environments. Table 11 details our computational hardware
configuration used for the experiments, while Table 12 presents the energy consumption and carbon emissions
recorded during the complete pipeline execution.</p>
      <p>The results show that our approach maintains reasonable computational efficiency for
resource-constrained environments: most of the energy is consumed during the embedding
generation phase with the Spanish BERT model, while the downstream classification tasks require
minimal additional resources.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusions and Future Work</title>
      <p>This paper presented VerbaNexAI’s approach to the MentalRiskES 2025 shared task for early detection
of gambling disorders in Spanish social media content. Our methodology combined rigorous text
preprocessing, contextual embeddings from Spanish BERT models, class balancing through random
oversampling, and systematic model selection.</p>
      <p>For Task 1 (risk detection), LightGBM achieved an accuracy of 0.7505 and an F1 score of 0.7315 during
cross-validation, while Logistic Regression reached an accuracy of 0.9907 in Task
2 (addiction type classification). However, when evaluated on the official test set,
our system reached 0.519 accuracy and a Macro-F1 of 0.342 in Task 1, alongside more robust results in
Task 2 with 0.813 accuracy and a Macro-F1 of 0.780.</p>
      <p>Notably, our approach showed strong early detection performance, securing the lowest ERDE5
score (0.274) among all participating teams, together with a detection speed of 0.990 and a
latency-weighted F1 of 0.677. These metrics highlight the system’s potential for real-time monitoring
scenarios.</p>
      <p>The primary contribution of this work is to show that transformer-based embeddings combined with
traditional machine learning algorithms can detect early signs of gambling disorders in Spanish text.
Our approach prioritized detection timeliness, which is crucial for real-world mental health monitoring, where
early intervention significantly improves outcomes. The strong early detection metrics support
the system’s practical potential for mental health risk monitoring. In addition, our class imbalance
handling techniques proved effective, particularly for Task 2’s underrepresented Lootboxes class.</p>
      <p>The noticeable difference between cross-validation performance and official evaluation results in Task
1 highlights the challenge of generalizing mental health risk detection across diverse user populations
and communication contexts. Our preprocessing approach may have overlooked certain language
nuances specific to Spanish-speaking gambling communities, particularly evolving slang or
context-dependent expressions used on Telegram and Twitch.</p>
      <p>Future research should incorporate temporal modeling techniques to better capture gambling behavior
progression, develop domain-specific lexicons for Spanish gambling terminology, combine text analysis
with behavioral metrics (message frequency, time patterns), and investigate explainable AI techniques
to enhance transparency for mental health professionals. These improvements would strengthen
early detection systems for mental health risks in Spanish-language communities, supporting timely
interventions for vulnerable individuals.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>The authors would like to acknowledge the support provided by the master’s degree scholarship program
in engineering at the Universidad Tecnológica de Bolívar (UTB) in Cartagena, Colombia.</p>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this investigation, Claude Sonnet 4 was used for the revision of translations
into English, as well as for grammatical and spelling correction. After using this tool, the content was
reviewed and edited as necessary, and full responsibility for the content of the publication is assumed.
Early detection of mental disorders risk in spanish, Procesamiento del Lenguaje Natural 71 (2023)
329–350. URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6564.
[12] A. M. Mármol-Romero, A. Moreno-Muñoz, F. M. Plaza-del Arco, M. D. Molina-González, M. T.</p>
      <p>Martín-Valdivia, L. A. Ureña-López, A. Montejo-Ráez, Overview of MentalRiskES at IberLEF 2024:
Early detection of mental disorders risk in spanish, Procesamiento del Lenguaje Natural 73 (2024)
435–448. URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6629.
[13] A. M. Mármol-Romero, P. Álvarez-Ojeda, A. Moreno-Muñoz, F. M. P. del Arco, M. D.
MolinaGonzález, M.-T. Martín-Valdivia, L. A. Ureña-López, A. Montejo-Ráez, Overview of MentalRiskES
at IberLEF 2025: Early detection of mental disorders risk in spanish, Procesamiento del Lenguaje
Natural 75 (2025).
[14] A. Molina, X. Huang, L.-F. Hurtado, F. Pla, Elirf-upv at eRisk 2023: Early detection of pathological
gambling using svm, CEUR-WS 3497 (2023) 736–742. URL: https://ceur-ws.org/Vol-3497/paper-062.
pdf.
[15] H. Fabregat, A. Duque, L. Araujo, J. Martinez-Romo, Nlp-uned-2 at eRisk 2023: Detecting
pathological gambling in social media through dataset relabeling and neural networks, CEUR-WS 3497
(2023) 672–683. URL: https://ceur-ws.org/Vol-3497/paper-056.pdf/.
[16] A. M. Mármol-Romero, F. M. Plaza-Del-Arco, A. Montejo-Ráez, Sinai at eRisk@CLEF 2023:
Approaching early detection of gambling with natural language processing, CEUR-WS 3497 (2023)
743–751. URL: https://ceur-ws.org/Vol-3497/paper-063.pdf.
[17] H. Thompson, L. Cagnina, M. Errecalde, Strategies to harness the transformers’ potential: Unsl at
eRisk 2023, CEUR-WS 3497 (2023) 791–804. URL: https://ceur-ws.org/Vol-3497/paper-068.pdf.
[18] H. Fabregat, A. Duque, L. Araujo, J. Martinez-Romo, Uned-nlp at eRisk 2022: Analyzing gambling
disorders in social media using approximate nearest neighbors, CEUR-WS 3180 (2022) 894–904.</p>
      <p>URL: https://ceur-ws.org/Vol-3180/paper-71.pdf.
[19] B. Perrot, J. B. Hardouin, E. Thiabaud, A. Saillard, M. Grall-Bronnec, G. Challet-Bouju, Development
and validation of a prediction model for online gambling problems based on players’ account
data, Journal of behavioral addictions 11 (2022) 874–889. URL: https://pubmed.ncbi.nlm.nih.gov/
36125924/. doi:10.1556/2006.2022.00063.
[20] Álvarez Ojeda Pablo, C.-R. M. Victoria, S. Anastasia, M.-R. Arturo, The precom-sm corpus:
Gambling in spanish social media, in: Proceedings of the 31st International Conference on
Computational Linguistics, 2025, pp. 17–28.
[21] J. Cuadrado, E. Martinez, J. Cuadrado, J. C. Martinez-Santos, E. Puertas, Verbanex ai at dipromats
2024: Enhancing propaganda detection in diplomatic tweets with fine tuned bert and integrated
nlp techniques, CEUR-WS 3756 (2024). URL: https://ceur-ws.org/Vol-3756/{MentalRiskES}2024_
paper8.pdf.
[22] Cañete, José, Chaperon, Gabriel, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained bert
model and evaluation data, in: PML4DC at ICLR 2020, 2020.
[23] E. Martinez, J. Cuadrado, J. C. Martinez-Santos, E. Puertas, Automated detection of depression
and anxiety using lexical and phonestheme features in Spanish texts, CEUR-WS 3756 (2024). URL:
https://ceur-ws.org/Vol-3756/MentalRiskES2024_paper8.pdf.
[24] B. Courty, V. Schmidt, S. Luccioni, Goyal-Kamal, MarionCoutarel, B. Feld, J. Lecourt, LiamConnell,
A. Saboni, Inimaz, supatomic, M. Léval, L. Blanche, A. Cruveiller, ouminasara, F. Zhao, A. Joshi,
A. Bogroff, H. de Lavoreille, N. Laskaris, E. Abati, D. Blank, Z. Wang, A. Catovic, M. Alencon,
M. Stęchły, C. Bauer, L. O. N. de Araújo, JPW, MinervaBooks, mlco2/codecarbon: v2.4.1, 2024. URL:
https://doi.org/10.5281/zenodo.11171501. doi:10.5281/zenodo.11171501.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>World Health Organization</surname>
          </string-name>
          ,
          <article-title>WHO special initiative for mental health</article-title>
          , https://www.who.int/initiatives/who-special-initiative-for-mental-health,
          <year>2025</year>
          . Accessed: 2025-05-04.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>World Health Organization</surname>
          </string-name>
          ,
          <article-title>COVID-19 pandemic triggers 25% increase in prevalence of anxiety and depression worldwide</article-title>
          , https://n9.cl/wesf4,
          <year>2022</year>
          . Accessed: 2025-05-04.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>World Health Organization</surname>
          </string-name>
          ,
          <source>World Report on Universal Health Coverage</source>
          , Technical Report 9789240026643, World Health Organization,
          <year>2022</year>
          . Accessed: 2025-05-04.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wardle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Degenhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Marionneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Reith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Livingstone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sparrow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. T.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Biggar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bunn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Farrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kesaite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Poznyak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Quan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rehm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rintoul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shiffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Siste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ukhova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Volberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Yendork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <article-title>The <italic>Lancet Public Health</italic> Commission on gambling</article-title>
          ,
          <source>The Lancet Public Health</source>
          <volume>9</volume>
          (
          <year>2024</year>
          )
          <fpage>e950</fpage>
          -
          <lpage>e994</lpage>
          . URL: https://doi.org/10.1016/S2468-2667(24)00167-1. doi:10.1016/S2468-2667(24)00167-1.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>National Center for Biotechnology Information</surname>
          </string-name>
          ,
          <article-title>Table 3.39: Prevalence of mental disorders by WHO region</article-title>
          , in:
          <source>Mental Health Competency Frameworks: A Global Perspective</source>
          , National Academies Press (US),
          <year>2020</year>
          . URL: https://www.ncbi.nlm.nih.gov/books/NBK519704/table/ch3.t39. Accessed: 2025-05-04.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Kristensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pallesen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Leino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Griffiths</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. K.</given-names>
            <surname>Erevik</surname>
          </string-name>
          ,
          <article-title>Suicidality among individuals with gambling problems: A meta-analytic literature review</article-title>
          ,
          <source>Psychol Bull</source>
          <volume>150</volume>
          (
          <year>2024</year>
          )
          <fpage>82</fpage>
          -
          <lpage>106</lpage>
          . doi:10.1037/bul0000411.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hautamäki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Marionneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Castrén</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palomäki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Raisamo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lintonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pörtfors</surname>
          </string-name>
          , T. Latvala,
          <article-title>Methodologies and estimates of social costs of gambling: A scoping review</article-title>
          ,
          <source>Social Science &amp; Medicine</source>
          <volume>371</volume>
          (
          <year>2025</year>
          )
          117940
          . doi:10.1016/j.socscimed.2025.117940.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bijker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Booth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Merkouris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Dowling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. N.</given-names>
            <surname>Rodda</surname>
          </string-name>
          ,
          <article-title>Global prevalence of help-seeking for problem gambling: A systematic review and meta-analysis</article-title>
          ,
          <source>Addiction</source>
          <volume>117</volume>
          (
          <year>2022</year>
          )
          <fpage>2972</fpage>
          -
          <lpage>2985</lpage>
          . doi:10.1111/add.15952.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk 2023: Early risk prediction on the internet</article-title>
          , in:
          <source>CLEF 2023: Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Springer-Verlag,
          <year>2023</year>
          , pp.
          <fpage>294</fpage>
          -
          <lpage>315</lpage>
          . doi:10.1007/978-3-031-42448-9_22.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. Á.</given-names>
            <surname>González-Barba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>
          , in:
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025)</source>
          , CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Mármol-Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno-Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Plaza-del-Arco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Molina-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo-Ráez</surname>
          </string-name>
          ,
          <article-title>Overview of MentalRiskES at IberLEF 2023:</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>