<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>ELiRF-VRAIN at MentalRiskES 2024: Using LongFormer for Early Detection of Mental Disorders Risk</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andreu Casamayor</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vicent Ahuir</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Molina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lluís-Felip Hurtado</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València</institution>
          ,
          <addr-line>Camino de Vera s/n, 46022 Valencia.</addr-line>
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>This paper describes the approaches taken by the ELiRF-VRAIN team at the shared tasks of MentalRiskES at IberLEF 2024 [1]. These shared tasks involved two activities focused on identifying mental illness on Spanish-language social media: detection of disorder and context detection. Our work consisted of three approaches: one approach based on a Support Vector Machine and the other two based on Transformer architecture pre-trained models, one using BERT-like models and the other using LongFormer models. In order to fine-tune our models, we used a data augmentation process on the data provided by the organization. According to the results, our approaches fit the task correctly.</p>
      </abstract>
      <kwd-group>
        <kwd>Longformer</kwd>
        <kwd>Transformers</kwd>
        <kwd>Support Vector Machine</kwd>
        <kwd>Mental disorder detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        A mental disorder is characterized by a clinically significant disturbance in an individual’s
cognition, emotional regulation, or behavior. It is usually associated with distress or impairment
in important areas of functioning [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        According to the World Health Organization (WHO), 1 in every 8 people is living with a
mental disorder, with anxiety and depressive disorders the most common [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Although the
problem is widely known, the number of people affected is still increasing, and discrimination against
them persists. Governments are working to prevent and treat mental illness. However,
the lack of human and material resources means that many people receive inadequate
treatment or none at all. In addition, early detection of mental disorders is often
difficult.
      </p>
      <p>In this context, detecting mental disorders risk through analyzing social media interactions
has acquired great relevance in recent years. Many factors make the problem of mental disorders
detection complicated, such as availability, amount, and quality of data. Providing quality labeled
data in Spanish and promoting the creation of models for this early detection is precisely the
objective of the MentalRiskES shared tasks.</p>
      <p>
        In the 2024 edition, the competition consisted of three tasks [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]: (1) Detection of mental
disorder, (2) Context Detection, and (3) Suicidal ideation detection. Our team participated in the
first two tasks.
      </p>
      <p>
        To tackle task 1, we considered three different approaches.
        <list list-type="order">
          <list-item>
            <p>The first approach is based on a classic machine learning algorithm: Support Vector
Machines (SVM). SVMs have demonstrated adequate behavior in long text classification
tasks such as this one. We consider this approach an assessment of the performance
of classical models.</p>
          </list-item>
          <list-item>
            <p>The second approach is based on Transformers [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We use a pre-trained RoBERTa model
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] as a basis and then run a fine-tuning process to adjust it to the task domain. We
considered two different datasets for fine-tuning: the one provided by the organization
and an expanded version of the dataset obtained through a data augmentation process.</p>
          </list-item>
          <list-item>
            <p>The last approach is similar to the second one; however, to capture more context, we use
a pre-trained LongFormer model [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This model can capture more context
because of its larger input size. We used the same datasets as in the previous
approach for the fine-tuning phase.</p>
          </list-item>
        </list>
      </p>
      <p>We submitted three runs for task 1, one for each approach. The best model of each approach
was chosen through a prior validation stage in which different parameters and datasets were
considered.</p>
      <p>To tackle task 2, we submitted one system based on the third approach of the first task, a
LongFormer-based solution. We chose that approach since it was the most promising one based on the
evaluation results of task 1.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Description of Dataset and Tasks</title>
      <p>
        The datasets delivered by the organization consisted of a collection of messages sent to different
groups on Telegram [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. These public groups have the characteristic of being in Spanish and
related to mental illnesses. The messages were anonymized and, subsequently, labeled by ten
annotators at the user level; that is, each user was labeled considering his/her messages.
      </p>
      <p>Two different datasets were delivered: one for the first two tasks and a different one for
the third task. The first dataset, the one with which we worked, has the following sample
distribution: 20 users for trial, 465 users for train, and 400 users for test.</p>
      <p>As stated above, the main objective of this competition is to predict mental disorders as soon
as possible. To achieve realistic behavior, the organization emulated a real conversation by
setting up a server that gives out packets of data containing a message for each user. The system
must predict the label of each user, considering the current message and all their previous
messages, before it receives the next packet. The goal is to predict
each user’s mental disorder, if any, as quickly as possible.</p>
      <sec id="sec-2-1">
        <title>2.1. Task 1: Disorder Detection</title>
        <p>Task 1 is a multiclass classification task whose objective is to predict whether users suffer from
depression, anxiety, or no disorder.</p>
        <p>Table 1 shows the distribution among the different labels in the dataset for the first task.</p>
        <p>[Table 1: number of users per label (None, Depression, Anxiety, Total).]</p>
        <p>To maximize the available samples for the training process, we joined the Train and Trial
partitions to train our systems; the Total column of Table 1 shows the final sample distribution
of our training dataset.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Task 2: Context Detection</title>
        <p>Task 2 is a two-level multiclass multilabel task: in addition to detecting the mental illness,
the context or contexts in which it appears must be detected. There are 7 contexts: addiction,
emergency, family, work, social, other, and none.</p>
        <p>The label distribution in the full dataset can be seen in Table 2. It shows that the contexts
Family, Social, Other, and None are the most common.</p>
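        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th>Context</th><th>Depression</th><th>Anxiety</th><th>Total</th></tr>
            </thead>
            <tbody>
              <tr><td>Addiction</td><td>9</td><td>3</td><td>12</td></tr>
              <tr><td>Emergency</td><td>7</td><td>10</td><td>17</td></tr>
            </tbody>
          </table>
        </table-wrap>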
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. System architecture and Techniques</title>
      <p>In this kind of task, an important aspect to consider is the amount of context required to perform
the detection correctly. Since each user can have many messages, the size of the input to the
system is a factor to take into account. One goal of our team was to study the impact of context
on these tasks, that is, to measure the capabilities of different systems depending on how much
context they can manage. We selected three different systems to achieve this goal: the first
based on Support Vector Machines (SVM), the second based on a RoBERTa model, and the third
based on a LongFormer model. Each system has a different context size:</p>
      <list list-type="bullet">
        <list-item><p>SVM has no limit on the input size; it creates a vector as long as the vocabulary size.</p></list-item>
        <list-item><p>The selected RoBERTa model has a limit of 512 input tokens.</p></list-item>
        <list-item><p>The selected LongFormer model has a limit of 4096 input tokens.</p></list-item>
      </list>
      <p>
        Regarding the dataset, we translated all the data into English because the Transformers base
models were pre-trained using documents in this language. We used the library EasyNMT
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and the model OPUS-MT Spanish-English [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] (https://huggingface.co/Helsinki-NLP/opus-mt-es-en). Furthermore, we created two different datasets to train and evaluate the
performance of the transformer-based systems.
      </p>
      <p>Dataset 1. We created only one sample per user by accumulating all his/her messages, for
both positive and negative labeled users.</p>
      <p>Dataset 2. If we had some a priori evidence of the message in which a user begins to present
symptoms of mental illness risk, we could label the samples built from previous messages as negative,
and the samples containing that message and subsequent ones as positive. In this way, we could
increase the number of positive samples in order to obtain a more precise model. This data
augmentation process is explained below.</p>
      <p>To carry out our experimentation, we divided the original dataset into two partitions: training
(80% of users) and development (20% of users), maintaining the proportions of positive and
negative samples in each of the partitions. Table 4 shows the distribution of samples in Dataset
1.</p>
      <p>[Table 4: number of samples per label (None, Depression, Anxiety, Total) in Dataset 1.]</p>
      <sec id="sec-3-1">
        <title>3.1. Data Augmentation</title>
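        <p>The data augmentation process aims to create more samples per positive user. As stated above,
we need some evidence of the message in which a user begins to express symptoms of
illness. To obtain it, we relied on the predictions of the SVM-based classifier, assuming that
the messages preceding the SVM decision do not express symptoms of illness. We followed these steps:</p>
        <list list-type="order">
          <list-item><p>For positive users, we calculated how many messages the SVM needs to classify the user
as positive (depression or anxiety). Each user has a different trigger value.</p></list-item>
          <list-item><p>For false negatives, we used the mean of the true positive trigger values as the trigger
value.</p></list-item>
          <list-item><p>For each positive user in the original dataset, let <inline-formula><tex-math>n</tex-math></inline-formula> be the number of messages that the
SVM model needs to determine this user’s mental disorder risk, <inline-formula><tex-math>\mathit{MAX}</tex-math></inline-formula> be the maximum
number of messages the model supports as input, and <inline-formula><tex-math>m_i</tex-math></inline-formula> the ith message from the user.
(a) We created <inline-formula><tex-math>n - 1</tex-math></inline-formula> negative samples as follows:
<disp-formula><tex-math>(m_1),\ (m_1 m_2),\ (m_1 m_2 m_3),\ \ldots,\ (m_1 \ldots m_{n-1})</tex-math></disp-formula>
(b) and <inline-formula><tex-math>\mathit{MAX} - n + 1</tex-math></inline-formula> positive samples:
<disp-formula><tex-math>(m_1 \ldots m_n),\ (m_1 \ldots m_{n+1}),\ \ldots,\ (m_1 \ldots m_n \ldots m_{\mathit{MAX}})</tex-math></disp-formula></p></list-item>
          <list-item><p>Note that the value of <inline-formula><tex-math>\mathit{MAX}</tex-math></inline-formula> depends on which model was used and on the number of
tokens in the messages. That is, we discard messages once the accumulated history exceeds
512 tokens for RoBERTa and 4096 for LongFormer. So, if <inline-formula><tex-math>n &gt; \mathit{MAX}</tex-math></inline-formula>, only
negative samples are generated.</p></list-item>
          <list-item><p>For negative users, we created new samples by accumulating the history as before, stopping
when <inline-formula><tex-math>\mathit{MAX}</tex-math></inline-formula> was reached.</p></list-item>
        </list>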
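        <p>The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name <monospace>augment_user</monospace> and its parameters are hypothetical, and the input limit is simplified to a number of messages, whereas the paper derives it from a token budget (512 or 4096 tokens).</p>
        <preformat>
```python
from typing import List, Optional, Tuple


def augment_user(
    messages: List[str],
    user_label: str,
    trigger: Optional[int],
    max_messages: int,
    negative_label: str = "none",
) -> List[Tuple[str, str]]:
    """Build one sample per cumulative message history of a single user.

    trigger is the 1-based index n of the message at which the SVM first
    flags the user; pass None for negative users.
    max_messages is a simplification (assumption): the paper caps the
    history by token count, not by message count.
    """
    samples = []
    limit = min(len(messages), max_messages)
    for k in range(1, limit + 1):
        history = " ".join(messages[:k])
        # Histories shorter than the trigger are labeled negative;
        # histories that include message n keep the user's label.
        is_positive = trigger is not None and k >= trigger
        samples.append((history, user_label if is_positive else negative_label))
    return samples
```
        </preformat>
        <p>With five messages, a trigger of 3, and a limit of 4 messages, the sketch yields two negative samples (n - 1) and two positive samples (MAX - n + 1); when the trigger exceeds the limit, only negative samples are produced.</p>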
        <p>The result of this technique is a new dataset with a higher number of positive samples for
the training.</p>
        <p>[Table: number of samples per label (None, Depression, Anxiety, Total) in the augmented dataset.]</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Task 1: Disorder Detection</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Classical Machine Learning Classifier Approach</title>
          <p>To evaluate the importance of context, we wanted to use a classical machine learning classifier that
can handle the whole context. One of the most important issues of Transformer-based models
is their poor ability to deal with long texts because of their limited input size. This
issue can affect performance, since the input cannot hold the full sample and valuable
information may be lost.</p>
          <p>
            First, we ran an experiment comparing different types of classical machine learning
classifiers. The Scikit-learn library [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] provided us with the tools to develop this experiment. The
configuration was to use all the default classifiers to select the better one. The results can be
seen in Table 6. The table shows that the best classifier was the Linear SVM.
          </p>
          <p>[Table 6: classifiers compared (Linear SVM, Gradient Boosting, K-Neighbors, Random Forest).]</p>
          <p>
            Once the classifier was chosen, we tested different approaches:
            <list list-type="bullet">
              <list-item><p>Data preprocessing:</p>
                <list list-type="order">
                  <list-item><p>First approach: transform the text into tokens using TweetTokenizer and then
remove stop words.</p></list-item>
                  <list-item><p>Second approach: same as the first approach, plus methods to clean
the text (removing non-alphanumeric characters and others) and lemmatize tokens.</p></list-item>
                </list>
              </list-item>
              <list-item><p>Sentiment analysis: we used the model
"lxyuan/distilbert-base-multilingual-cased-sentiments-student" [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] to perform sentiment analysis of every message
per user. We obtained three values per user (positive, negative, and neutral
messages), all normalized at the end. We add these values as new features alongside the TF-IDF features.</p></list-item>
              <list-item><p>TF-IDF: the class TfidfVectorizer in Scikit-learn was used to vectorize the data. We
tested different configurations for the analyzer and ngram_range parameters, and used the
default values for the other parameters.</p></list-item>
            </list>
          </p>
          <p>To find the best model for every approach, we ran an exhaustive grid search over some
specific parameters: the regularization parameter C, the tolerance for the stopping
criterion (tol), and the loss function.</p>
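          <p>A grid search of this kind could be set up with Scikit-learn roughly as follows. This is a sketch under assumptions: the candidate values in <monospace>param_grid</monospace>, the macro-F1 scoring, and the helper name <monospace>build_svm_search</monospace> are illustrative choices, and the sentiment features and custom preprocessing described above are omitted.</p>
          <preformat>
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_svm_search(cv: int = 5) -> GridSearchCV:
    """TF-IDF features followed by a linear SVM, tuned by grid search."""
    pipeline = Pipeline([
        # Character n-grams inside word boundaries, as in the "char_wb" setups.
        ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(4, 5))),
        ("svm", LinearSVC()),
    ])
    # Hypothetical candidate values; the paper does not list the full grid.
    param_grid = {
        "svm__C": [0.1, 1, 10],
        "svm__tol": [1e-4, 1e-2, 0.1],
        "svm__loss": ["hinge", "squared_hinge"],
    }
    # Macro F1 is an assumed selection metric for the three-class task.
    return GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=cv)
```
          </preformat>
          <p>After calling <monospace>fit(texts, labels)</monospace> on the returned object, the selected configuration is available in its <monospace>best_params_</monospace> attribute.</p>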
          <p>We obtained 6 different approaches. Table 7 shows the different configurations used in the
experimentation. The TF-IDF column gives the type of analyzer (word or char) and the
n-gram range. The last column gives the best model found by the grid search.</p>
          <p>Table 8 shows that the best configuration is SVM-4, which uses the most complete
preprocessing, sentiment analysis, "char_wb" as the analyzer, and (4, 5) as ngram_range.
This model was used for Run0 in Task 1.</p>
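          <table-wrap>
            <label>Table 7</label>
            <table>
              <thead>
                <tr><th>Model</th><th>TF-IDF</th><th>Best Model</th></tr>
              </thead>
              <tbody>
                <tr><td>SVM-1</td><td>"char_wb", 4-5 n-gram</td><td>'C': 1, 'loss': 'squared_hinge', 'tol': 0.1</td></tr>
                <tr><td>SVM-2</td><td>"char_wb", 4-5 n-gram</td><td>'C': 1, 'loss': 'squared_hinge', 'tol': 0.1</td></tr>
                <tr><td>SVM-3</td><td>"char_wb", 4-5 n-gram</td><td>'C': 1, 'loss': 'hinge', 'tol': 0.1</td></tr>
                <tr><td>SVM-4</td><td>"char_wb", 4-5 n-gram</td><td>'C': 1, 'loss': 'hinge', 'tol': 0.1</td></tr>
                <tr><td>SVM-5</td><td>"word", 1-2 n-gram</td><td>'C': 10, 'loss': 'hinge', 'tol': 0.1</td></tr>
                <tr><td>SVM-6</td><td>"word", 1-2 n-gram</td><td>'C': 10, 'loss': 'hinge', 'tol': 0.1</td></tr>
                <tr><td>SVM-7</td><td>"word", 1-2 n-gram</td><td>'C': 10, 'loss': 'hinge', 'tol': 0.1</td></tr>
                <tr><td>SVM-8</td><td>"word", 1-2 n-gram</td><td>'C': 10, 'loss': 'hinge', 'tol': 0.1</td></tr>
              </tbody>
            </table>
          </table-wrap>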
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. BERT-like Model Approach</title>
          <p>
            It is well known that state-of-the-art models in NLP are based on Transformers. Models
like BERT or RoBERTa are usually versatile in classification tasks. However, these
models usually cannot handle more than 512 tokens, which could be a problem for
tasks with long contexts such as the current ones. Therefore, we used one of these models as a
baseline against which to compare models with a better capacity to handle long contexts. Research
by Alireza Porkeyvan [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] shows that the state of the art in mental disorder detection
is MentalRoBERTa [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. MentalRoBERTa is a RoBERTa-like model specialized in mental
health. This model is pre-trained using a special corpus of texts from mental health forums,
clinical notes, and normal corpus. Consequently, MentalRoBERTa provides better adaptation
for the mental health-related language, which brings a lot of possible applications related to
this domain.
          </p>
          <p>The model chosen was AIMH/mental-roberta-large [15], a RoBERTa model trained with
posts on Reddit related to mental health. This model can be found in HuggingFace [16] public
hub (https://huggingface.co/AIMH/mental-roberta-large). Furthermore, we wanted to compare
a specific domain RoBERTa model, like MentalRoBERTa, with the non-domain RoBERTa model,
the baseline of the competition (RoBERTa base).</p>
          <p>Once we had chosen our pre-trained model, we ran an experiment that tested
two fine-tuning processes: one with Dataset 1 (RoBERTa-1) and the other with Dataset
2 (RoBERTa-2), the dataset with data augmentation. Table 9 shows the
configuration used in the fine-tuning process.</p>
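          <table-wrap>
            <label>Table 9</label>
            <table>
              <thead>
                <tr><th>Parameter</th><th>Value</th></tr>
              </thead>
              <tbody>
                <tr><td>optimizer</td><td>AdamW</td></tr>
                <tr><td>learning rate</td><td>7e-5</td></tr>
                <tr><td>lr scheduler type</td><td>linear</td></tr>
                <tr><td>weight decay</td><td>0.01</td></tr>
                <tr><td>number of epochs</td><td>10</td></tr>
                <tr><td>training batch size</td><td>16</td></tr>
              </tbody>
            </table>
          </table-wrap>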
          <p>Table 10 shows the results of each model on the development partition. The results show that
the best model is RoBERTa-2, the one fine-tuned with data augmentation. In our participation,
this model was used for Run1 in Task 1.</p>
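          <table-wrap>
            <label>Table 10</label>
            <table>
              <thead>
                <tr><th>Model</th><th>Data Augmentation</th><th>Precision</th><th>Recall</th><th>F1-score</th></tr>
              </thead>
              <tbody>
                <tr><td>RoBERTa-1</td><td>No</td><td>0.81</td><td>0.82</td><td>0.81</td></tr>
                <tr><td>RoBERTa-2</td><td>Yes</td><td>0.94</td><td>0.94</td><td>0.93</td></tr>
              </tbody>
            </table>
          </table-wrap>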
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. LongFormer Approach</title>
          <p>
            As stated before, one of the main disadvantages of BERT-like or RoBERTa-like
Transformer models is their limited capacity to handle long contexts. However, a
Transformer variant called LongFormer can handle long texts [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ].
          </p>
          <p>LongFormer is short for “Long-Document Transformer”; it can process long
contexts more efficiently than Transformer models such as BERT or RoBERTa. The LongFormer
architecture has the following characteristics:</p>
          <list list-type="bullet">
            <list-item><p>New attention mechanism: an efficient attention mechanism that uses a sliding
window, where each token attends only to a fixed number of neighboring tokens, reducing
the complexity.</p></list-item>
            <list-item><p>Global attention selection: the architecture can select which tokens are
attended globally and which are attended only locally.</p></list-item>
          </list>
          <p>The pre-trained model chosen was AIMH/mental-longformer-base-4096 [17], a pre-trained
LongFormer for the mental health domain. This model can be found at
https://huggingface.co/AIMH/mental-longformer-base-4096.</p>
          <p>As with the RoBERTa model, we fine-tuned the LongFormer with the two datasets: Dataset
1 without data augmentation (LongFormer-T1-1), and Dataset 2 with data augmentation
(LongFormer-T1-2). We used the same fine-tuning parameters as in the RoBERTa
experimentation; the configuration is in Table 9.</p>
          <p>Table 11 shows the results of the experimentation, where LongFormer-T1-2 (fine-tuned
with data augmentation) achieves better performance than LongFormer-T1-1 (fine-tuned
without data augmentation). This model was Run2 in our participation.</p>
          <p>[Table 11: results of LongFormer-T1-1 and LongFormer-T1-2.]</p>
          <p>The experimentation for Task 1 shows that the best system is LongFormer-T1-2, so we chose
only this approach for Task 2. We used the pre-trained LongFormer model as the
base model, increased the number of samples of the competition dataset with data augmentation,
replaced the labels with those of Task 2, and fine-tuned the model. The resulting LongFormer-T2 model was
used for Run0 in the second task.</p>
          <p>Table 12 shows the results of the fine-tuning process.</p>
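          <table-wrap>
            <label>Table 12</label>
            <table>
              <thead>
                <tr><th></th><th>Model</th><th>Precision</th><th>Recall</th><th>F1-score</th></tr>
              </thead>
              <tbody>
                <tr><td>LongFormer-T2</td><td>LongFormer</td><td>0.99</td><td>0.98</td><td>0.97</td></tr>
              </tbody>
            </table>
          </table-wrap>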
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Runs</title>
      <p>We chose these models to assess the importance of context in predicting
mental illness. Each model has a different maximum input length and can therefore handle larger or
smaller contexts.</p>
      <p>On the one hand, BERT-like models performed better than SVMs in the first task, even
though BERT-like models can handle less context than SVMs. On the other hand, LongFormer
performed slightly better than BERT-like models in the first task since LongFormer can handle
larger contexts.</p>
      <sec id="sec-4-1">
        <title>4.1. Run Configuration</title>
        <p>Besides selecting the model for each run, we had to set additional
parameters of the classification systems:</p>
        <p>Task 1:</p>
        <list list-type="bullet">
          <list-item><p>In every round of the competition, the classifier input was a new sample created by
combining the user's new message with the previous ones.</p></list-item>
          <list-item><p>Each system has an initial context; in other words, we made our systems wait until the
accumulated context was sufficiently large. This context was different for each system:</p>
            <list list-type="bullet">
              <list-item><p>SVM: an initial context of 50 tokens after preprocessing.</p></list-item>
              <list-item><p>RoBERTa and LongFormer: an initial context of 100 tokens.</p></list-item>
            </list>
          </list-item>
          <list-item><p>The RoBERTa and LongFormer systems have a token limit; once the input was full,
we simply returned the last prediction made.</p></list-item>
        </list>
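        <p>The per-user decision loop described above might look like the following Python sketch. The class name <monospace>EarlyRiskDecider</monospace>, the injected <monospace>classify</monospace> and <monospace>count_tokens</monospace> callables, and the choice to answer "none" while waiting for the initial context are all assumptions made for illustration; the paper does not state what was returned before the initial context was reached.</p>
        <preformat>
```python
from typing import Callable, List, Optional


class EarlyRiskDecider:
    """Per-user loop: wait for an initial context, stop when the input is full."""

    def __init__(
        self,
        classify: Callable[[str], str],
        count_tokens: Callable[[str], int],
        min_tokens: int,
        max_tokens: Optional[int] = None,
        default_label: str = "none",
    ) -> None:
        self.classify = classify
        self.count_tokens = count_tokens
        self.min_tokens = min_tokens
        self.max_tokens = max_tokens  # None means no input limit (the SVM case)
        self.history: List[str] = []
        # Assumption: answer the negative label until enough context exists.
        self.last_prediction = default_label
        self.full = False

    def step(self, message: str) -> str:
        """Consume this round's message and return the label to send."""
        if self.full:
            return self.last_prediction
        candidate = " ".join(self.history + [message])
        n_tokens = self.count_tokens(candidate)
        if self.max_tokens is not None and n_tokens > self.max_tokens:
            # The model input is full: keep returning the last prediction.
            self.full = True
            return self.last_prediction
        self.history.append(message)
        if n_tokens >= self.min_tokens:
            self.last_prediction = self.classify(candidate)
        return self.last_prediction
```
        </preformat>
        <p>Under these assumptions, the RoBERTa system would correspond to <monospace>min_tokens=100, max_tokens=512</monospace>, the LongFormer system to <monospace>min_tokens=100, max_tokens=4096</monospace>, and the SVM system to <monospace>min_tokens=50</monospace> with no upper limit.</p>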
        <p>Task 2:</p>
        <p>For the second task, we combined the best model from Task 1 (LongFormer-T1-2) and the one
fine-tuned specifically for Task 2 (LongFormer-T2). The first model was used to discriminate
between negative cases and positive cases. If the sample was detected as positive, then the
LongFormer-T2 was used to predict the context.</p>
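        <p>This two-stage combination can be sketched as a small cascade. The function and parameter names below are hypothetical, the two models are passed in as plain callables, and returning an empty context list for negative users is an assumption rather than documented behavior.</p>
        <preformat>
```python
from typing import Callable, List, Tuple


def predict_with_context(
    text: str,
    detect_disorder: Callable[[str], str],
    detect_contexts: Callable[[str], List[str]],
    negative_label: str = "none",
) -> Tuple[str, List[str]]:
    """Run the Task 1 model first; call the Task 2 model only for positives."""
    disorder = detect_disorder(text)
    if disorder == negative_label:
        # Assumption: negative users get no context labels.
        return disorder, []
    return disorder, detect_contexts(text)
```
        </preformat>
        <p>A design consequence of the cascade is that any false negative of the first stage cannot be recovered by the second one.</p>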
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>5.1. Task 1</p>
      <p>Table 14 shows that the best system is Run 2, which corresponds to LongFormer-T1-2:
pre-trained LongFormer fine-tuned with the data augmentation. This run achieved the first
position in the competition. The only two runs that beat the Baseline were our Run1 and Run2,
indicating the importance of appropriate data selection.</p>
      <p>Although the best runs used Transformer-based models, the run with SVM achieves
a similar result, only 1% less than Run1. This indicates that classical approaches like SVMs
continue to be useful in detecting mental illnesses because of their ability to handle large
contexts. Therefore, SVMs are still well suited to situations with low computational resources.</p>
      <p>5.2. Task 2</p>
      <p>As can be seen from Table 15, the results obtained by our system in the competition are not
as good as those obtained in the development partition, which might indicate that the model
was overfitted during the fine-tuning process. Further analysis is needed to find the source of
the low generalization capabilities of the developed model.</p>
      <sec id="sec-5-1">
        <title>5.3. Carbon emission</title>
        <p>
          One of the main goals of the competition is to identify systems that complete tasks with minimal
resource consumption [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. This helps the organizers pinpoint technologies that can operate on mobile
devices or personal computers and those with the lowest carbon emissions. Therefore, we
report the following information:
          <list list-type="bullet">
            <list-item><p>Total processing time (in milliseconds).</p></list-item>
            <list-item><p>CO2 emissions (in kg).</p></list-item>
          </list>
        </p>
        <p>We measured emissions with the provided script, which uses the CodeCarbon API [18].
Table 16 presents our team’s computer configuration: the types and
quantities of CPUs and GPUs employed, as well as the total RAM. We report the results
for Run 2 (LongFormer-T1-2).</p>
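        <table-wrap>
          <label>Table 16</label>
          <table>
            <thead>
              <tr><th>Measurements</th><th>Values</th></tr>
            </thead>
            <tbody>
              <tr><td>CPU_Count</td><td>24</td></tr>
              <tr><td>GPU_Count</td><td>1</td></tr>
              <tr><td>CPU_Model</td><td>12th Gen Intel(R) Core(TM) i9-12900K</td></tr>
              <tr><td>GPU_Model</td><td>NVIDIA GeForce RTX 4090</td></tr>
              <tr><td>RAM_Total_Size</td><td>128 GB</td></tr>
              <tr><td>Country_ISO_Code</td><td>ESP</td></tr>
            </tbody>
          </table>
        </table-wrap>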
        <p>Figure 1 illustrates the variation in emissions and duration during the experimentation.
A direct correlation exists between each measurement, indicating that rounds with longer
durations emitted more CO2. Since every round utilized the same models and configurations,
the primary factor influencing emissions was the length of the round and the accumulated
context of the user.</p>
        <p>[Figure 1: (a) CO2 emissions (kg) of each round; (b) duration (milliseconds) of each round.]</p>
        <p>Figure 2 displays the cumulative energy consumption of each component. The GPU is the
highest energy-consuming component, accounting for approximately 83% of the total energy
usage. The CPU follows, consuming 16.5%, while RAM accounts for only 0.5% of the total energy
consumption.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper, we have presented the participation of the ELiRF-VRAIN team in the shared tasks
of MentalRiskES at IberLEF 2024. In addition to testing classic classification models and
state-of-the-art transformer models, our team’s most innovative contributions were using LongFormer
models to expand the context available for the decision and enlarging the training corpus through
data augmentation.</p>
      <p>The results obtained support our proposal: we were the only team to exceed
the baseline presented by the organization of the shared task.</p>
      <p>For future work, two lines of improvement are identified. On the one hand, try to improve
early detection so that the system does not need as much initial context to make the right
decision; on the other hand, use Explainable Artificial Intelligence (XAI) techniques to better
understand the system’s behavior.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work is partially supported by MCIN/AEI/10.13039/501100011033 and "ERDF A way of
making Europe" under grant PID2021-126061OB-C41. It is also partially supported by the Vicerrectorado
de Investigación de la Universitat Politècnica de València under PAID-01-23, by the Spanish
Ministerio de Universidades under grant FPU21/05288 for university
teacher training, and by the Generalitat Valenciana under the CIPROM/2021/023 project.</p>
      <p>pretrained language models for mental healthcare, in: N. Calzolari, F. Béchet, P. Blache,
K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo,
J. Odijk, S. Piperidis (Eds.), Proceedings of the Thirteenth Language Resources and
Evaluation Conference, European Language Resources Association, Marseille, France, 2022, pp.
7184–7190. URL: https://aclanthology.org/2022.lrec-1.778.</p>
      <p>[15] AIMH, MentalRoBERTa: A robustly optimized BERT pretraining approach for mental health,
2024. URL: https://huggingface.co/AIMH/mental-roberta-large, accessed: 2024-05-15.</p>
      <p>[16] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf,
M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L.
Scao, S. Gugger, M. Drame, Q. Lhoest, A. M. Rush, Transformers: State-of-the-art natural
language processing, 2020. URL: https://arxiv.org/abs/1910.03771. arXiv:1910.03771.</p>
      <p>[17] AIMH, MentalLongformer: A long-document transformer model for mental health, 2024.
URL: https://huggingface.co/AIMH/mental-longformer-base-4096, accessed: 2024-05-15.</p>
      <p>[18] CodeCarbon, CodeCarbon: Track and reduce your carbon emissions from machine learning
workloads, https://mlco2.github.io/codecarbon/index.html, 2024. Accessed: 2024-05-15.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2024: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>
          , in:
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024)</source>
          , CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>World Health Organization</collab>
          , Mental disorders,
          <year>2022</year>
          . URL: https://www.who.int/news-room/fact-sheets/detail/mental-disorders, accessed: 2024-05-15.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <collab>World Health Organization</collab>
          ,
          <source>Mental disorders fact sheet</source>
          ,
          <year>2022</year>
          . URL: https://www.who.int/news-room/fact-sheets/detail/mental-disorders, accessed: 2024-05-21.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Mármol-Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno-Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M. P.</given-names>
            <surname>del Arco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Molina-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo-Ráez</surname>
          </string-name>
          ,
          <article-title>Overview of MentalRiskES at IberLEF 2024: Early detection of mental disorders risk in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>73</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
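          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>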
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ). URL: https://arxiv.org/abs/1706.03762, accessed: 2024-05-15.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          , arXiv preprint arXiv:1907.11692 (
          <year>2019</year>
          ). URL: https://arxiv.org/abs/1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <article-title>Longformer: The long-document transformer</article-title>
          , arXiv preprint arXiv:2004.05150 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2004.05150.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Mármol Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Plaza-del Arco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Molina González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo Ráez</surname>
          </string-name>
          ,
          <article-title>MentalRiskES: A new corpus for early detection of mental disorders in Spanish</article-title>
          , in:
          <string-name>
            <given-names>N.</given-names>
            <surname>Calzolari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Kan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Hoste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lenci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sakti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xue</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</source>
          , ELRA and ICCL, Torino, Italia,
          <year>2024</year>
          , pp.
          <fpage>11204</fpage>
          -
          <lpage>11214</lpage>
          . URL: https://aclanthology.org/2024.lrec-main.978.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <article-title>EasyNMT: A simple interface to state-of-the-art machine translation models</article-title>
          ,
          <year>2020</year>
          . URL: https://github.com/UKPLab/EasyNMT, accessed: 2024-05-15.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tiedemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thottingal</surname>
          </string-name>
          ,
          <article-title>OPUS-MT - Building open translation services for the World</article-title>
          , in:
          <source>Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)</source>
          , Lisbon, Portugal,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>É.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          ,
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          . URL: https://jmlr.org/papers/v12/pedregosa11a.html.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L. X.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <source>distilbert-base-multilingual-cased-sentiments-student (revision 2e33845)</source>
          ,
          <year>2023</year>
          . URL: https://huggingface.co/lxyuan/distilbert-base-multilingual-cased-sentiments-student. doi:10.57967/hf/1422.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pourkeyvan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Safa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sorourkhah</surname>
          </string-name>
          ,
          <article-title>Harnessing the power of hugging face transformers for predicting mental health disorders in social networks</article-title>
          ,
          <source>IEEE Access</source>
          <volume>12</volume>
          (
          <year>2024</year>
          )
          <fpage>28025</fpage>
          -
          <lpage>28035</lpage>
          . URL: http://dx.doi.org/10.1109/ACCESS.2024.3366653. doi:10.1109/access.2024.3366653.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ansari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cambria</surname>
          </string-name>
          , MentalBERT: Publicly available
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>