<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>wangkongqiang at MentalRiskES@IberLEF 2025: Early Detection of Mental Disorders Risk in Spanish</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kongqiang Wang</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Yunnan University, School of Information Science and Engineering</institution>
          ,
          <addr-line>Kunming, Yunnan, 650500</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>According to a recent report by the World Health Organization, 1 in every 8 people in the world suffers from a mental disorder. The organizers of MentalRiskES at IberLEF 2025 consider early identification a key effective intervention against these problems. The task we participated in was Task 1: Risk Detection of Gambling Disorders, a binary classification task aimed at determining whether a user is at high risk ( label = 1 ) or low risk ( label = 0 ) of developing a gambling-related disorder based on their messages. The objective is to enable early detection and facilitate timely interventions. We compare the performance of two different modeling approaches: fine-tuning a roberta-base model and using sentence embeddings as inputs to a linear regressor, with the latter yielding better results. Our final experimental results are Accuracy 0.519, Macro_R 0.500, ERDE_5 0.332, and ERDE_30 0.250.</p>
      </abstract>
      <kwd-group>
        <kwd>Mental Health</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Transformers</kwd>
        <kwd>Sentence Embedding</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The first thing we did was to group all the messages by the user they belonged to and concatenate
them into a single string, obtaining a total of 357 concatenated texts ( one per user ). This was done to
obtain a single representation of each user’s conversation history ( from which the labels were
assigned ) that could be used as input for the models.
• Augment the training data for the training dataset settings.</p>
      <p>To increase the amount of data available for training and, at the same time, attempt to model early
detection ( obtaining predictions early in the lifetime of the message history ), we augmented
the training set by adding observations that contained only half of their messages ( the first half
and the second half ) or one third of their messages ( the first, middle, and last thirds of the
data ). This was done by first sorting the messages of each user in the training set by their date
and then taking only these partial histories; the resulting dataset was appended to the original
training set to obtain a new one with six times the number of observations to be used for training.
• Pre-trained model for model training.</p>
      <p>
        We used two models: PlanTL-GOB-ES/roberta-base-bne [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
and somosnlp-hackathon-2023/roberta-base-bne-finetuned-suicide-es [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The model we
fine-tuned was a version of RoBERTa pre-trained for detecting suicidal behavior in Spanish
texts. We chose this model because it had previously been trained for a task that
shares similar characteristics with ours.
• Embedding model: an optional package, required if regression estimators are used.
      </p>
      <p>The sentence embeddings were obtained after concatenating the messages of each user into a
single string. The difference in performance between roberta-base-bne-suicide-es encodings and
the other embeddings can be explained by the fact that we take advantage of the information
gained from this model’s prior fine-tuning for suicide detection, a task which likely shares semantic
similarities with our data.
• Regression estimators: the optional sklearn package, for using sentence embeddings as inputs
to a linear regressor.</p>
      <p>
        The estimators mentioned in the paper are implementations of common regressors from Python’s
Scikit-Learn library. These include: Ordinary (" lr ") and Ridge (" ridge ") [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] Least Squares
Regression, AdaBoost regression (" ada "), Light Gradient Boosting Machine (" lgbm "),
Support Vector Regression (" svr "), Random Forests (" rf ") [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and a Multi-Layer Perceptron (" mlp "). References for
the implementations of these models can be found in the Scikit-Learn documentation.
      </p>
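      <p>As an illustration, a minimal sketch of this regressor collection ( the hyperparameters are our assumptions, not the paper’s settings; " lgbm " comes from the separate lightgbm package rather than Scikit-Learn ):</p>
      <preformat>
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from lightgbm import LGBMRegressor  # "lgbm" requires the lightgbm package

# Hyperparameters below are illustrative defaults, not the paper's settings.
ESTIMATORS = {
    "lr": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "ada": AdaBoostRegressor(n_estimators=100),
    "lgbm": LGBMRegressor(n_estimators=100),
    "svr": SVR(kernel="rbf"),
    "rf": RandomForestRegressor(n_estimators=300),
    "mlp": MLPRegressor(hidden_layer_sizes=(128,), max_iter=500),
}
      </preformat>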
      <p>The rest of the paper is organized as follows: In the next section, we analyze the dataset used for the
task ( Section 2 ). Then, we describe in detail our methodology for training and evaluating the models (
Section 3 ). Finally, we discuss the results obtained ( Section 4 ) and present our conclusions and future
lines of work ( Section 5 ).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset Analysis</title>
      <p>The dataset given for the task consisted of a total of thousands of individual messages from 357 Telegram
/ Twitch / Reddit users ( see Table 1 ), each with a variable number of messages. The annotation process
consisted of labeling each user based on the evidence in their conversation history of suffering from
a gambling disorder. Thus, a total of 2 labels were used for the task. Annotators were asked to assign one of
the following two labels to each user: at high risk ( label = 1 ) or low risk ( label = 0 ).</p>
      <p>To increase the amount of data available for training and, at the same time, attempt to model early
detection ( obtaining predictions early in the lifetime of the message history ), we augmented the
training set by adding observations that contained only part of their messages. This was done by first
sorting the messages of each user in the training set by their date and then taking only these partial
histories; the resulting dataset was appended to the original training set to obtain a new one with six
times the number of observations to be used for training. Taking the sample with the subject id user1002
as an instance, Table 2 shows how five additional partial observations are created from the original data.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>We proceeded to evaluate different techniques to solve this subtask. Two main predictive-modeling
approaches were explored: the first involved fine-tuning a pre-trained language model on this
subtask, and the second involved training a standard ML regressor using sentence embeddings encoded
from the user’s messages as features. The following section describes the steps taken for each approach,
first describing how the data was pre-processed and later explaining the training and evaluation process
for this subtask.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Processing and Augmentation</title>
        <p>Based on the detailed description and practical examples provided earlier in the article, a brief account
is given here, covering the following steps ( a code sketch follows this list ):
• concat_messages : the messages of the Telegram users are spliced together in
chronological order, that is, in round sequence, keyed on their user
ids.
• augment_data : the dataset is expanded with the halves and thirds of each user’s history, eventually
growing to six times the original size ( including the original data; only the official dataset was
provided, and no additional extended datasets were used ).</p>
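        <p>A minimal sketch of these two steps, assuming a pandas DataFrame with subject_id, date, and message columns ( the column names are illustrative ):</p>
        <preformat>
import pandas as pd

def concat_messages(df: pd.DataFrame) -> pd.Series:
    # One chronologically ordered document per user.
    return (df.sort_values("date")
              .groupby("subject_id")["message"]
              .apply(" ".join))

def augment_data(msgs: list) -> list:
    # Original history plus its two halves and three thirds: six views per user.
    h, t = len(msgs) // 2, len(msgs) // 3
    return [msgs,
            msgs[:h], msgs[h:],
            msgs[:t], msgs[t:2 * t], msgs[2 * t:]]
        </preformat>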
        <p>To prepare for training, the original data was split into training and validation sets, leaving a random
54 users ( 15% ) in the latter as a stratified validation split, where each set receives the same proportion
of samples of each class. The stratification was done using the labels of Subtask 1 to ensure equal
representation of the classes in both sets.</p>
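        <p>A sketch of this split, assuming per-user texts and Subtask 1 labels ( the variable names are illustrative ):</p>
        <preformat>
from sklearn.model_selection import train_test_split

# Hold out 15% of users (54 of 357), stratified on the Subtask 1 labels
# so both sets keep the same class proportions.
train_texts, val_texts, y_train, y_val = train_test_split(
    user_texts, labels, test_size=0.15, stratify=labels, random_state=42)
        </preformat>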
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Solving Subtask 1 by Solving for Regression</title>
        <p>
          From the discussion in Section 2, it should be clear that all labels of the subtask give the same
amount of information about the condition of the subject and the likelihood of predicting it from
the available data. This observation led us to consider models that solve this one subtask by
training only with the labels of Subtask 1. This allowed us to reduce the number of models that had to
be trained and focus on solving for a single data modality ( regression on [ 0, 1 ] ).
        </p>
        <p>
          We approached simple regression in a standard way, training models to minimize
the Mean Squared Error between the output values and the real ones. Additionally, we included a
post-processing step of clipping the output predictions of models of this type to the [ 0, 1 ] range to
ensure that they were valid probabilities. Single-output regression using standard machine learning
regressors, on the other hand, was not as trivial as the simple binary classification case. We used many
regressors from the sklearn Python package.
        </p>
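        <p>A minimal sketch of this post-processing step ( the 0.5 decision threshold is our assumption, not stated in the paper ):</p>
        <preformat>
import numpy as np

raw = np.array([-0.1, 0.42, 1.3])    # example regressor outputs
probs = np.clip(raw, 0.0, 1.0)       # clip to valid probabilities in [0, 1]
labels = (probs >= 0.5).astype(int)  # binary high-risk / low-risk decision
        </preformat>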
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Modeling Approaches</title>
        <p>
          3.3.1. Training a Regressor with Sentence Embeddings
A sentence embedding [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] is a semantically meaningful real-valued vector representation of a sentence,
obtained from the outputs of the hidden layers of a language model. The properties of this representation
are such that sentences that express similar meanings are mapped ( encoded ) close to each other in the
vector space.
        </p>
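        <p>A minimal sketch of obtaining such embeddings with the Transformers library; mean pooling over the token states is our assumption, as the paper does not state the pooling strategy used:</p>
        <preformat>
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "hackathon-somos-nlp-2023/roberta-base-bne-finetuned-suicide-es"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
encoder = AutoModel.from_pretrained(MODEL)

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state   # (batch, tokens, 768)
    mask = batch["attention_mask"].unsqueeze(-1)      # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)       # masked mean pooling
        </preformat>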
        <p>In this way, the process of encoding text as numeric vectors can be used directly to extract features
for a classifier or regressor, which will try to learn from the semantic information of these encodings to
predict the label of their corresponding messages. Note, however, that this approach requires
a pre-trained model to perform the encoding. Furthermore, it assumes that the model will be
good enough at capturing the semantic information of the input texts for the classifier
/ regressor to learn from it.</p>
        <p>Assuming that this is the case, this approach has the advantage that it is much faster to train
these kinds of regressors on regular CPUs, with the most time-consuming part being obtaining the
embeddings of the training / evaluation messages, which only has to be done once. However, it is
necessary to evaluate different encoding models and different classifiers / regressors ( prediction models
) to find the best combination for the task at hand.</p>
        <p>
          As such, we conducted experiments using different language models to find the best encoding model.
In particular, we tested two different versions of RoBERTa trained on different corpora in Spanish. These
versions are described in Table 3. Additionally, we experimented with over 10 different regressors,
including Least Squares Linear regression, Random Forest, and Gradient Boosting [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], among others.
These models were chosen due to their ease of implementation and the fact that they are commonly
used in the literature [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>The process of training and evaluating these models then proceeded as follows: First, the training
set was encoded using the language model, and the resulting embeddings were used as features for a
regressor. The regressor was then trained using the labels of Subtask 1 ( the most informative ones ),
and the resulting model was used to predict the labels of the validation set. The predictions were then
evaluated with the root mean squared error ( RMSE ). This process was repeated for each combination
of language model and regressor.</p>
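        <p>A sketch of this loop, reusing the embed function and the ESTIMATORS dictionary sketched earlier ( all names are illustrative ):</p>
        <preformat>
from sklearn.metrics import mean_squared_error

X_train = embed(train_texts).numpy()
X_val = embed(val_texts).numpy()
rmse = {}
for name, est in ESTIMATORS.items():
    est.fit(X_train, y_train)                  # train on Subtask 1 labels
    preds = est.predict(X_val).clip(0.0, 1.0)  # keep outputs in [0, 1]
    rmse[name] = mean_squared_error(y_val, preds) ** 0.5
best = min(rmse, key=rmse.get)                 # lowest validation RMSE wins
        </preformat>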
        <p>The Appendix contains the results of this experiment. Based on them, roberta-suicide-es was deemed to be
the best model for encoding the texts. Additionally, Table 8 shows a detailed report of the evaluation of
the best regression model with these embeddings.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Fine-tuning a Language Model for Regression</title>
        <p>
          Apart from the approach mentioned above, we also experimented with the pure Deep Learning ( DL )
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] approach of taking a language model and fine-tuning it with the labels of the corresponding subtask.
The model we fine-tuned was a version of RoBERTa pre-trained for detecting suicidal behavior in
Spanish texts. We chose this model because it had previously been trained for a task that
shares similar characteristics with ours. Intermediate fine-tuning has been shown in prior literature to improve the results
of downstream tasks. The HuggingFace Transformers and PyTorch libraries in Python
were used for loading the model weights and implementing the training loop. We changed the head
of the pre-trained model to a linear layer with output dimension 1 for simple regression. The
models were trained on an NVIDIA GeForce RTX 3090 24G GPU for a total of 30 epochs, where the
weights of the pre-trained model remained fully frozen for the first half and were then progressively
unfrozen [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] one step per epoch after that ( see Table 4 ).
        </p>
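        <p>A sketch of this setup; the gradual-unfreezing helper is our illustration, while the actual schedule and hyperparameters are those of Table 4:</p>
        <preformat>
from transformers import AutoModelForSequenceClassification

# num_labels=1 turns the classification head into a single-output linear
# regression head (HuggingFace then applies an MSE loss to the labels).
model = AutoModelForSequenceClassification.from_pretrained(
    "hackathon-somos-nlp-2023/roberta-base-bne-finetuned-suicide-es",
    num_labels=1)

def set_trainable(model, n_unfrozen):
    # Freeze the whole encoder, then unfreeze its top n_unfrozen layers.
    for p in model.roberta.parameters():
        p.requires_grad = False
    if n_unfrozen > 0:
        for layer in model.roberta.encoder.layer[-n_unfrozen:]:
            for p in layer.parameters():
                p.requires_grad = True
        </preformat>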
        <p>We used an Adam optimizer with a Mean Squared Error ( MSE ) loss for the simple regression models.
However, this did not improve the results empirically as compared with simply normalizing the outputs of
the predictions after inference. The formula of this loss is LOSS_custom = LOSS_cross-entropy.
Other hyperparameters are shown in Table 4.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>Using the approaches mentioned in the prior section, we came up with different models to solve the
subtask of Task 1 of MentalRiskES. The results in this section were obtained by selecting the
best-performing models after evaluating the different approaches and hyperparameters on the validation
set. The final predictions were obtained from a test set of messages from 136 subjects never observed
during the training process and evaluated against the task’s true labels.</p>
      <p>
        In the tables below ( Table 5, Table 6 ), we report the relevant metrics obtained for this subtask
and compare them against the ones obtained from baseline models provided by the organizers of the
competition. In particular, we report both absolute metrics, obtained after observing all the messages
of each subject, and early detection metrics, obtained after incrementally observing the messages
across several rounds. Additionally, Table 7 displays the inference-time CO2 emissions and energy
consumption of each model, based on computing their absolute predictions on the test set. These values
were estimated using the codecarbon [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] Python library.
      </p>
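      <p>A minimal sketch of how such estimates can be collected with codecarbon ( the wrapped inference call is illustrative ):</p>
      <preformat>
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()         # samples CPU / GPU / RAM energy use
tracker.start()
predictions = model.predict(X_test)  # the measured inference step
emissions_kg = tracker.stop()        # estimated kg of CO2-equivalent
      </preformat>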
      <p>[ Tables 5-8: Tables 5 and 6 compare our runs ( run0, run1, run2, which use
roberta-base-bne-finetuned-suicide-es embeddings with a Ridge regressor ) against the organizers’
baselines ( baseline1: Robertuito; baseline2: RoBERTa Base ). Table 7 reports the inference-time CO2
emissions and energy consumption of each model ( duration_mean, emissions_mean, cpu_energy_mean,
gpu_energy_mean, ram_energy_mean, energy_consumed_mean ), with mean emissions on the order of
1.2E-05 for our runs and no values reported for the baselines. Table 8 reports, per estimator,
r2_score_risk and mean_squared_error_risk for the embedding-and-regressor combinations. ]</p>
      <p>For the absolute metrics, we show the accuracy, precision, recall, and F1 scores for the classification
task ( Subtask 1 ) and the root mean squared error ( RMSE ) and coefficient of determination ( R2 ) for
the regression task ( the regression view of Subtask 1 ). The early detection metrics include the early-risk detection
error ( ERDE ) computed after observing different rounds of messages, as well as other metrics ( more
details are provided in the competition guidelines ).</p>
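      <p>For reference, a sketch of the ERDE metric in the standard form used in the early-risk literature ( the competition guidelines give the authoritative definition and cost values ):</p>
      <preformat>
\mathrm{ERDE}_o(d, k) =
\begin{cases}
c_{fp} &amp; \text{if } d \text{ is a false positive} \\
c_{fn} &amp; \text{if } d \text{ is a false negative} \\
lc_o(k) \cdot c_{tp} &amp; \text{if } d \text{ is a true positive} \\
0 &amp; \text{otherwise}
\end{cases}
\qquad
lc_o(k) = 1 - \frac{1}{1 + e^{k - o}}
      </preformat>
      <p>where k is the number of messages observed before decision d was made and o is the deadline parameter ( 5 or 30 ), so correct alarms are penalized increasingly as they are delayed.</p>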
      <p>The metrics are shown along with the name of the model used to obtain them. The models are named
as follows: [ model name ]-[ approach ]. For example, roberta-suicide-es-fine-tuning refers to the model
trained with the Task 1 ( binary classification ) labels by fine-tuning the RoBERTa model pre-trained for
suicide detection. The " approach " can be either embeddings or fine-tuning, for the two approaches
described in Section 3.</p>
      <p>Furthermore, all ML regressors trained with embeddings as features were simple regressors, and all
embeddings were obtained using roberta-suicide-es encodings, as this combination yielded the best
results on the evaluation set. The embedding-and-regressor combinations for Task 1 are shown in Table 8.</p>
      <p>The R2-score can be understood in a simple way as using the mean as the error reference, to
see whether the prediction error is greater or smaller than that reference error. R2-score = 1: the
predicted values in the sample are exactly equal to the true values, without any error, indicating that
the independent variables in the regression analysis explain the dependent variable
perfectly. R2-score = 0: the numerator is equal to the denominator, and each predicted value
of the sample equals the mean. The R2-score is not the square of r; it may also be negative ( numerator &gt;
denominator ), in which case the model is equivalent to blind guessing and it would be better to directly predict the average
value of the target variable. See Figure 1 for the specific formula.</p>
      <p>MSE is the abbreviation of Mean Squared Error, a commonly used indicator of the
prediction accuracy of regression models. It represents the average of the sum of squares of the
differences between the predicted values and the true values ( see Figure 2 ). RMSE is the abbreviation
of Root Mean Squared Error, likewise a commonly used indicator of prediction accuracy. It
represents the average magnitude of the difference between the predicted value and the true value
( see Figure 3 ). Here, y_i is
the true value of the i-th sample, ŷ_i is the predicted value of the model for the i-th sample, and m is the
number of samples.</p>
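      <p>Assuming the usual textbook forms ( the figures themselves are not reproduced here ), the formulas behind Figures 1-3 are:</p>
      <preformat>
R^2 = 1 - \frac{\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2},
\qquad
\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
      </preformat>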
      <p>The smaller the MSE and RMSE are, the higher the prediction accuracy of the model. However, it
should be noted that MSE and RMSE are greatly affected by outliers. Therefore, in practical applications,
a comprehensive evaluation needs to be conducted in combination with other indicators ( such as the
maximum error, max-error ).</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The results show that the approaches considered in this work were successful at modeling the
predictive subtask, with at least one of our models outperforming the baselines in most cases. We can
make the following observations:</p>
      <p>• The best-performing approach for Task 1 seems to be the one that uses the embeddings of the
messages as input to a simple-output regression model. At least one model trained with this approach
reached the top ranking for the task’s absolute ranking metrics and outperformed the baseline absolute
metrics across this task.</p>
      <p>• Most notably, the regression method obtained the best metrics for the task across
all models, outperforming the fine-tuning approach by over 20% in the absolute metrics and reaching
the highest spot in the early-risk metrics for this task on our validation dataset ( the training dataset split
with 15% of subject ids held out for validation ).</p>
      <p>• Models trained for single-output regression perform very well on the binary classification and simple
regression tasks, even outperforming the models trained with transformer-specific targets on their own
subtask. This suggests that using one model with an MLPRegressor to solve for single targets was indeed
a good approach to this problem.</p>
      <p>
        • The models obtained with the pure DL approach of fine-tuning a RoBERTa [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] model are estimated
to produce 3-4x lower emissions at inference time than the hybrid approach of training linear
regressors on sentence embeddings. This gap is likely because the fine-tuning approach requires less
computation at inference time: the hybrid approach must compute the sentence
embeddings before feeding them to the regressors, while the fine-tuned model makes its prediction in one forward
pass.
      </p>
      <p>Another conclusion we can draw from these insights is that while our models achieve great results
on the absolute ranking metrics, they do not perform as well on the metrics that assess early-risk
performance. In our work, we did not model explicitly for an early detection scenario; we only added
information about prior messages through data augmentation. This limitation means our models may
not perform as well in real-world situations where we aim to detect signs of gambling disorder early in
a conversation.</p>
      <p>Thus, it may be important to explore different training approaches to improve the performance of
early-risk detection. This might include directly employing online learning to predict and update the
model as new messages come in, or incorporating an ensemble of models that make independent decisions
about a message’s risk level and combining them for a final decision. Additionally, we may also look
into more efficient implementations of the hybrid approach to minimize the disparity in emissions
compared to pure DL models. These improvements are crucial when considering the deployment of our
models in real-world situations and will be the focus of future work.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>We thank the MentalRiskES@IberLEF 2025 organizers for running the competition and providing the dataset and
other support, and we thank the students of Yunnan University, as well as the individuals and groups that assisted in the
research and the preparation of this work.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Declaration on Generative AI</title>
      <sec id="sec-7-1">
        <title>The author(s) have not employed any Generative AI tools.</title>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>A. Appendices</title>
      <sec id="sec-8-1">
        <title>The sources for the fine-tuned pre-trained models are available via:</title>
        <p>• fine-tuned pre-trained RoBERTa model ( PlanTL-GOB-ES/roberta-base-bne ), see Figure
4.
• fine-tuned pre-trained RoBERTa model (
somosnlp-hackathon-2023/roberta-base-bne-finetuned-suicide-es ), see Figure 5.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] P. Álvarez-Ojeda, M. V. Cantero-Romero, A. Semikozova, A. Montejo-Ráez, The precom-sm corpus: Gambling in Spanish social media, in: Proceedings of the 31st International Conference on Computational Linguistics, 2025, pp. 17-28.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] J. Xiong, O. Lipsitz, F. Nasri, L. M. W. Lui, H. Gill, L. Phan, D. Chen-Li, M. Iacobucci, R. Ho, A. Majeed, R. S. McIntyre, Impact of COVID-19 pandemic on mental health in the general population: A systematic review, Journal of Affective Disorders 277 (2020) 55-64. URL: https://www.sciencedirect.com/science/article/pii/S0165032720325891. doi:10.1016/j.jad.2020.08.001.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] J. Á. González-Barba, L. Chiruzzo, S. M. Jiménez-Zafra, Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEUR-WS.org, 2025.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] A. M. Mármol-Romero, P. Álvarez-Ojeda, A. Moreno-Muñoz, F. M. P. del Arco, M. D. Molina-González, M.-T. Martín-Valdivia, L. A. Ureña-López, A. Montejo-Ráez, Overview of MentalRiskES at IberLEF 2025: Early detection of mental disorders risk in Spanish, Procesamiento del Lenguaje Natural 75 (2025).</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] A. G. Fandiño, J. A. Estapé, M. Pàmies, J. L. Palao, J. S. Ocampo, C. P. Carrino, C. A. Oller, C. R. Penagos, A. G. Agirre, M. Villegas, MarIA: Spanish language models, Procesamiento del Lenguaje Natural 68 (2022). URL: https://upcommons.upc.edu/handle/2117/367156#.YyMTB4X9A-0.mendeley. doi:10.26342/2022-68-3.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] D. L. Padial, D. Gómez, hackathon-somos-nlp-2023/roberta-base-bne-finetuned-suicide-es · Hugging Face (2023). URL: https://huggingface.co/hackathon-somos-nlp-2023/roberta-base-bne-finetuned-suicide-es.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] A. E. Hoerl, R. W. Kennard, Ridge regression: Biased estimation for nonorthogonal problems, Technometrics 12 (1970) 55-67. URL: https://www.jstor.org/stable/1267351. doi:10.2307/1267351.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] L. Breiman, Random forests, Machine Learning 45 (2001) 5-32. URL: https://doi.org/10.1023/A:1010933404324. doi:10.1023/A:1010933404324.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] C. S. Perone, R. Silveira, T. S. Paula, Evaluation of sentence embeddings in downstream and linguistic probing tasks, 2018. URL: http://arxiv.org/abs/1806.06259.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] J. Friedman, Greedy function approximation: A gradient boosting machine, The Annals of Statistics 29 (2001). doi:10.1214/aos/1013203451.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825-2830.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024-8035. URL: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] C. C. Liu, J. Pfeiffer, I. Vulić, I. Gurevych, Improving generalization of adapter-based cross-lingual transfer with scheduled unfreezing, 2023. URL: http://arxiv.org/abs/2301.05487.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] V. Schmidt, K. Goyal, A. Joshi, B. Feld, L. Conell, N. Laskaris, D. Blank, J. Wilson, S. Friedler, S. Luccioni, CodeCarbon: Estimate and track carbon emissions from machine learning computing, 2021.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019. URL: http://arxiv.org/abs/1907.11692. doi:10.48550/arXiv.1907.11692.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>